25 Not-To-Dos of Data
- Parviz Foroughian
- Oct 13, 2024
- 7 min read
Starting any journey is tough, and more so when it involves data! Since I decided to join the Blog community, I have been thinking actively about what to write, and it suddenly dawned on me that highlighting a few Not-To-Dos is an easy way to start this Blog experiment. After all, it is always easier to tell people what not to do, even if what not to do is what you actually do yourself (and know you should not do!). Hope you are feeling me. Anyway, here comes the first Blog post on this site, so I hope you agree with the contents. I am happy to hear from you on how to improve this and future articles. Oh, and remember, 25 is an arbitrary number: somewhat round, not too small and not too big, but we could go to 50 or more, as what is definitely not in short supply is bad data habits :-).
Finally, this is meant to be a way for me to get back to heavy writing, find like-minded colleagues, and get constructive feedback to learn from. Each item here really only scratches the surface; I am hoping to pick a few of them to explore further in short articles over the next few weeks and months. It is too early to say exactly how, but now you know what the plan is!
For now, here is a list of 25 "Not-To-Dos" of data, each of which, one way or another, undermines effective Data Management:
Neglecting Data Governance
What is it: Failing to implement a framework that outlines how data is managed, who owns it, and how it’s used.
Potential Impact: Poor data quality, lack of accountability, and compliance issues.
Best Practice Fix: Establish a formal data governance framework with clear roles, policies, and ownership for all data assets.
Ignoring Data Quality Management
What is it: Not monitoring or enforcing the accuracy, completeness, and reliability of data.
Potential Impact: Decisions based on inaccurate data lead to poor outcomes and financial losses.
Example: JP Morgan’s London Whale incident, where poor data quality in risk models contributed to roughly $6 billion in losses (Google it!).
Best Practice Fix: Implement regular data quality checks and establish clear KPIs around data integrity.
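For illustration, here is a minimal sketch of what a recurring, automated quality check might look like in Python with pandas. The file name `orders.csv` and the `order_total` column are made up; the point is that the checks run on every refresh rather than relying on someone's good memory.

```python
import pandas as pd

def quality_report(df: pd.DataFrame) -> dict:
    """Compute a few basic data quality KPIs for a dataframe."""
    return {
        "row_count": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "null_pct_per_column": (df.isna().mean() * 100).round(2).to_dict(),
    }

# Hypothetical dataset: an orders extract with made-up column names.
orders = pd.read_csv("orders.csv")
report = quality_report(orders)

# Example of a simple integrity rule: order totals must be non-negative.
invalid_totals = orders[orders["order_total"] < 0]

print(report)
print(f"{len(invalid_totals)} rows violate the non-negative total rule")
```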
Allowing/Encouraging/Tolerating Siloed Data
What is it: Storing data in isolated systems where it cannot be easily accessed or shared.
Potential Impact: Difficulty in gaining cross-departmental insights, leading to fragmented decision-making.
Best Practice Fix: Break down data silos by using centralized platforms or data lakes that integrate all data sources. Remember that a centralized solution is not necessarily physical (I hope to touch on this one in more detail sooner rather than later).
Over-Complicating Data Architecture
What is it: Building overly complex data pipelines and systems that slow down processing and decision-making.
Potential Impact: Increased operational overhead and slower time to insights.
Best Practice Fix: Design data architecture to be scalable and modular, focusing on simplicity and agility.
Relying on Manual Data Processes
What is it: Using manual data handling methods such as data entry and validation.
Potential Impact: Human errors, inefficiencies, and high labor costs.
Example: Just Google for examples of financial services firms that lost millions due to manual data entry errors in transactions.
Best Practice Fix: Automate data workflows and validations to reduce human intervention and error.
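As a small sketch of what "automate the validation" can mean in practice, the snippet below replaces eyeballing each record with a rule-based check. The `Trade` fields and the currency list are invented for illustration.

```python
# A minimal sketch of replacing manual row-by-row checking with automated
# validation. The record fields (trade_id, amount, currency) are made up.
from dataclasses import dataclass

VALID_CURRENCIES = {"USD", "EUR", "GBP"}

@dataclass
class Trade:
    trade_id: str
    amount: float
    currency: str

def validate(trade: Trade) -> list[str]:
    """Return a list of validation errors; an empty list means the record is clean."""
    errors = []
    if not trade.trade_id:
        errors.append("missing trade_id")
    if trade.amount <= 0:
        errors.append("amount must be positive")
    if trade.currency not in VALID_CURRENCIES:
        errors.append(f"unknown currency: {trade.currency}")
    return errors

incoming = [Trade("T-1001", 250.0, "USD"), Trade("", -10.0, "XYZ")]
for t in incoming:
    problems = validate(t)
    status = "OK" if not problems else f"REJECTED ({', '.join(problems)})"
    print(t.trade_id or "<no id>", status)
```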
Lack of Standardization Across Data Sources
What is it: Failing to standardize data formats, structures, and naming conventions across systems.
Potential Impact: Inconsistent reporting, confusion, and difficulty in combining datasets. Or maybe I should really say: too many reports on the same topics! I feel many of you know exactly what I mean here.
Best Practice Fix: Establish data standards and enforce them across all data entry points and sources.
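A small sketch of one such standard being enforced in code: normalising column names to lower snake_case so extracts from different systems can be combined. The two sample datasets and their column names are made up.

```python
import re
import pandas as pd

def standardise_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Rename columns to lower snake_case so datasets from different
    systems can be combined without guesswork."""
    def to_snake(name: str) -> str:
        name = re.sub(r"[\s\-]+", "_", name.strip())
        name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)
        return name.lower()
    return df.rename(columns={c: to_snake(c) for c in df.columns})

# Two hypothetical extracts describing the same thing with different conventions.
crm = pd.DataFrame({"CustomerID": [1], "Signup Date": ["2024-01-05"]})
billing = pd.DataFrame({"customer_id": [2], "signup_date": ["2024-02-11"]})

crm = standardise_columns(crm)        # -> customer_id, signup_date
combined = pd.concat([crm, billing])  # now the columns line up
print(combined)
```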
Underinvesting in Data Security
What is it: Neglecting to implement proper data security measures such as encryption, access control, and monitoring.
Potential Impact: Data breaches, compliance violations, and loss of customer trust.
Example: Equifax’s 2017 breach exposed 147 million records due to poor security practices.
Best Practice Fix: Implement a multi-layered security approach with encryption, access controls, and real-time monitoring.
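The snippet below sketches only one layer of such an approach: encrypting a sensitive value at rest with the `cryptography` package's Fernet scheme. Key management, access control, and monitoring are separate concerns and are not shown.

```python
# A minimal sketch of one security layer only: encrypting a sensitive field
# at rest using the `cryptography` package's Fernet (symmetric) scheme.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # in practice, fetch this from a secrets manager
cipher = Fernet(key)

national_id = b"123-45-6789"         # made-up sensitive value
token = cipher.encrypt(national_id)  # store the token, never the plain value

print(token)
print(cipher.decrypt(token))         # only callers holding the key can read it
```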
Not Defining Clear Data Ownership
What is it: Failing to assign responsibility for specific data assets to individuals or departments.
Potential Impact: Lack of accountability, leading to mismanagement or neglect of critical data.
Best Practice Fix: Assign data owners who are responsible for the quality, usage, and lifecycle of their data.
Skipping Proper Data Documentation
What is it: Failing to maintain documentation of data sources, transformations, and governance. This is one place where I tend to go on about the dangers of "Agile Methodology". Not that it is bad on its own, but documentation is typically the first thing that suffers when you chase agility in an organization!
Potential Impact: Difficulties in understanding and trusting the data, leading to errors in decision-making.
Best Practice Fix: Maintain comprehensive metadata and data lineage documentation to track the flow of data.
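As a minimal illustration, a lineage record can start as something as simple as structured metadata captured alongside each transformation. The dataset and team names below are invented, and a real setup would push this into a metadata store or catalog rather than a plain dict.

```python
# A minimal sketch of capturing lineage metadata alongside a transformation,
# rather than leaving it in someone's head. All names here are made up.
from datetime import datetime, timezone

lineage_record = {
    "dataset": "monthly_revenue_report",
    "sources": ["crm.orders", "billing.invoices"],
    "transformation": "join on customer_id, aggregate revenue by month",
    "owner": "finance-data-team",
    "produced_at": datetime.now(timezone.utc).isoformat(),
    "refresh_schedule": "monthly, 1st business day",
}

print(lineage_record)
```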
Overlooking Data Privacy Regulations
What is it: Ignoring or inadequately adhering to data privacy laws such as GDPR or CCPA.
Potential Impact: Heavy fines, reputational damage, and loss of customer trust.
Example: Google was fined $57 million for GDPR violations due to improper consent practices. An interesting one to Google, of course :-).
Best Practice Fix: Regularly audit data processes for compliance and update practices according to evolving regulations.
Using Outdated or Unsupported Data Tools
What is it: Relying on legacy systems or tools that are no longer maintained or secure.
Potential Impact: System failures, security vulnerabilities, and reduced performance.
Best Practice Fix: Regularly evaluate and update data infrastructure to ensure tools are modern, secure, and supported.
Failing to Prioritize Data Integration
What is it: Not connecting data from different systems, leading to fragmentation and incomplete insights.
Potential Impact: Inaccurate reporting, poor customer service, and disjointed operations.
Best Practice Fix: Use ETL (Extract, Transform, Load) processes, middleware, or other relevant methodologies/technologies to integrate data from disparate sources into a single repository.
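A minimal ETL sketch using only the Python standard library is shown below. The source file, table, and column names are made up, and a production pipeline would add error handling, incremental loads, and scheduling.

```python
# A minimal extract-transform-load sketch using only the standard library.
# Source file, table and column names are made up for illustration.
import csv
import sqlite3

# Extract: read rows from a departmental CSV export.
with open("sales_export.csv", newline="") as f:
    rows = list(csv.DictReader(f))

# Transform: normalise types and values so they match the central schema.
cleaned = [(r["order_id"], float(r["amount"]), r["region"].upper()) for r in rows]

# Load: write into the shared repository (here, a local SQLite database).
conn = sqlite3.connect("warehouse.db")
conn.execute("CREATE TABLE IF NOT EXISTS sales (order_id TEXT, amount REAL, region TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", cleaned)
conn.commit()
conn.close()
```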
Assuming All Data is Valuable
What is it: Collecting and storing every piece of data without assessing its relevance or usefulness.
Potential Impact: Increased storage costs, complexity in data management, and lower performance.
Best Practice Fix: Focus on collecting high-quality, relevant data and regularly purge unnecessary or outdated data.
Not Providing Data Literacy Training
What is it: Failing to train employees on how to interpret and use data effectively.
Potential Impact: Misinterpretation of data, poor decision-making, and underutilization of data tools.
Best Practice Fix: Implement regular data literacy training programs tailored to different levels of expertise within the organization.
Lack of Scalability in Data Infrastructure
What is it: Building data systems that cannot grow with increasing data volumes or business needs.
Potential Impact: Performance bottlenecks, downtime, and costly system overhauls.
Best Practice Fix: Design data systems to scale dynamically by using cloud-native architectures and horizontal scaling techniques.
Not Establishing Clear Data Use Policies
What is it: Lacking formal policies on how data should be accessed, used, and shared within the organization.
Potential Impact: Data misuse, security breaches, or legal violations.
Best Practice Fix: Create comprehensive data use policies that define access levels, usage rules, and protocols for handling sensitive data.
Rushing to Implement AI Without Clean Data
What is it: Deploying AI models without ensuring the underlying data is accurate, consistent, and complete.
Potential Impact: Poor AI model performance and inaccurate predictions.
Example: IBM Watson for Oncology made incorrect treatment recommendations due to poor training data.
Best Practice Fix: Focus on data cleansing and quality assurance before feeding data into AI models.
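For illustration, here is a minimal sketch of basic cleansing before training, using pandas and scikit-learn. The file, feature, and label names are invented, and a real project would profile the data far more thoroughly before any model sees it.

```python
# A minimal sketch of basic cleansing before model training. All file,
# feature and label names below are made up for illustration.
import pandas as pd
from sklearn.linear_model import LogisticRegression

df = pd.read_csv("patients.csv")            # hypothetical training extract

df = df.drop_duplicates()                   # remove repeated records
df = df.dropna(subset=["age", "dosage"])    # drop rows missing key features
df = df[df["age"].between(0, 120)]          # reject implausible values

X = df[["age", "dosage"]]
y = df["responded"]                         # made-up binary outcome column

model = LogisticRegression().fit(X, y)
```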
Failing to Align Data Strategy with Business Objectives
What is it: Implementing data initiatives without considering their alignment with the organization’s overall goals.
Potential Impact: Underutilized systems, wasted resources, and missed business opportunities.
Best Practice Fix: Develop a data strategy that directly supports key business objectives and regularly review its effectiveness.
Not Archiving Old or Unused Data
What is it: Retaining data indefinitely without determining its relevance or necessity.
Potential Impact: Increased storage costs and legal risks from holding onto unnecessary or sensitive data.
Best Practice Fix: Implement data lifecycle management policies that archive or delete old data based on usage patterns and regulatory requirements.
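A small sketch of one such lifecycle rule: files untouched for longer than a retention window are moved from a hot directory to an archive. The paths and retention period below are made up, and regulated data may require deletion rather than archiving.

```python
# A minimal sketch of a lifecycle rule: move files untouched for more than
# RETENTION_DAYS from a hot directory into an archive directory.
import shutil
import time
from pathlib import Path

HOT_DIR = Path("data/hot")          # made-up paths
ARCHIVE_DIR = Path("data/archive")
RETENTION_DAYS = 365                # made-up retention window

ARCHIVE_DIR.mkdir(parents=True, exist_ok=True)
cutoff = time.time() - RETENTION_DAYS * 24 * 3600

for path in HOT_DIR.glob("*.parquet"):
    if path.stat().st_mtime < cutoff:
        shutil.move(str(path), ARCHIVE_DIR / path.name)
        print(f"archived {path.name}")
```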
Assuming Cloud Migration Solves All Data Problems
What is it: Believing that moving data to the cloud will automatically resolve governance, quality, or integration issues.
Potential Impact: Continued data problems, only now in a cloud environment, along with unexpected costs.
Best Practice Fix: Plan cloud migrations carefully, addressing data governance and quality beforehand, and ensure cloud solutions are cost-effective and scalable.
Lack of Real-Time Data Access
What is it: Failing to provide access to real-time data, limiting the ability to respond quickly to changes.
Potential Impact: Decision-making based on outdated information, leading to missed opportunities or poor outcomes.
Best Practice Fix: Implement real-time data streaming or event-driven architectures to provide up-to-date insights.
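As a hedged sketch, an event-driven consumer might look like the snippet below, written against the kafka-python package. It assumes a broker running at localhost:9092 and a made-up "orders" topic; any comparable streaming platform would do.

```python
# A minimal sketch of event-driven consumption using the kafka-python package.
# Assumes a Kafka broker at localhost:9092 and a made-up "orders" topic.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="localhost:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:          # reacts to each event as it arrives
    order = message.value
    print(f"new order {order.get('order_id')} for {order.get('amount')}")
```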
Relying Too Much on One Vendor
What is it: Becoming overly dependent on a single vendor for critical data infrastructure or tools.
Potential Impact: Vendor lock-in, pricing power shifts, and increased risk during outages or failures.
Best Practice Fix: Diversify vendors and maintain vendor-neutral solutions where possible, with contingency plans in place.
Over-Reliance on Shadow IT for Data Management
What is it: Allowing departments to create and manage their own data solutions outside the oversight of the central IT team.
Potential Impact: Uncontrolled data sprawl, security risks, and lack of governance.
Best Practice Fix: Integrate shadow IT initiatives into the broader data strategy, providing IT governance while allowing flexibility.
Not Having a Disaster Recovery Plan for Data
What is it: Failing to create a backup or disaster recovery strategy for critical data systems.
Potential Impact: Data loss, operational disruptions, and severe financial consequences.
Example: In 2011, a Japanese automotive company lost critical data due to a natural disaster, as they had no offsite backup.
Best Practice Fix: Establish a disaster recovery plan with regular backups (offsite or cloud-based) and conduct recovery drills to ensure business continuity.
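One small piece of such a plan, sketched below: shipping a nightly database dump to offsite object storage with boto3. The bucket and file names are made up, and backups are only half the story without regular restore drills.

```python
# A minimal sketch of one piece of a recovery plan: copying a nightly dump
# to offsite object storage. Bucket and file names are made up.
from datetime import date
import boto3

s3 = boto3.client("s3")
backup_file = "backups/warehouse_dump.sql.gz"   # produced by a separate job
remote_key = f"warehouse/{date.today().isoformat()}/warehouse_dump.sql.gz"

s3.upload_file(backup_file, "example-offsite-backups", remote_key)
print(f"uploaded {backup_file} to s3://example-offsite-backups/{remote_key}")
```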
Failing to Automate Data Management
What is it: Relying on manual processes for managing and processing data when automation solutions are available.
Potential Impact: Higher operational costs, increased human error, and inefficiency in data handling.
Best Practice Fix: Invest in automation tools such as ETL processes, data cataloging, and AI-based monitoring to streamline data workflows and reduce the risk of human error.
As I said already, we could discuss more than the above 25 items and, for sure, we could dissect each one in more detail. But those are activities for other days. See you soon in a different post...