For this assignment, you will research topics relating to Data Warehousing and Business Intelligence. Please pick 2-3 of the topics listed below, and no more. Along with the paper, please create a PowerPoint presentation of at least 10 slides, not including the title or reference slides. Examples of acceptable topics include:
Introduction to Data Warehousing:
Data Warehouse Architecture:
ETL (Extract, Transform, Load) Processes:
Data Modeling for Data Warehousing:
Data Quality and Data Governance:
Data Warehouse Tools and Technologies:
Data Warehouse Security:
DAMA-DMBOK
DATA MANAGEMENT BODY OF KNOWLEDGE
SECOND EDITION
DAMA International
Technics Publications
BASKING RIDGE, NEW JERSEY
Dedicated to the memory of
Patricia Cupoli, MLS, MBA, CCP, CDMP
(May 25, 1948 – July 28, 2015)
for her lifelong commitment to the Data Management profession
and her contributions to this publication.
Published by:
Technics Publications
2 Lindsley Road
Basking Ridge, NJ 07920 USA
https://www.TechnicsPub.com
Senior Editor: Deborah Henderson, CDMP
Editor: Susan Earley, CDMP
Production Editor: Laura Sebastian-Coleman, CDMP, IQCP
Bibliography Researcher: Elena Sykora, DGSP
Collaboration Tool Manager: Eva Smith, CDMP
Cover design by Lorena Molinari
All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means, electronic
or mechanical, including photocopying, recording or by any information storage and retrieval system, without
written permission from the publisher, except for the inclusion of brief quotations in a review.
The author and publisher have taken care in the preparation of this book, but make no expressed or implied
warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or
consequential damages in connection with or arising out of the use of the information or programs contained herein.
All trade and product names are trademarks, registered trademarks or service marks of their respective companies
and are the property of their respective holders and should be treated as such.
Second Edition
First Printing 2017
Copyright © 2017 DAMA International
ISBN, Print ed.: 9781634622349
ISBN, PDF ed.: 9781634622363
ISBN, Server ed.: 9781634622486
ISBN, Enterprise ed.: 9781634622479
Library of Congress Control Number: 2017941854
Contents
Preface _________________________________________________________ 15
Chapter 1: Data Management _______________________________________ 17
1. Introduction ____________________________________________________________ 17
2. Essential Concepts _______________________________________________________ 18
2.1 Data ______________________________________________________________________ 18
2.2 Data and Information ________________________________________________________ 20
2.3 Data as an Organizational Asset _______________________________________________ 20
2.4 Data Management Principles __________________________________________________ 21
2.5 Data Management Challenges _________________________________________________ 23
2.6 Data Management Strategy ___________________________________________________ 31
3. Data Management Frameworks ____________________________________________ 33
3.1 Strategic Alignment Model____________________________________________________ 33
3.2 The Amsterdam Information Model ____________________________________________ 34
3.3 The DAMA-DMBOK Framework _______________________________________________ 35
3.4 DMBOK Pyramid (Aiken) _____________________________________________________ 39
3.5 DAMA Data Management Framework Evolved ___________________________________ 40
4. DAMA and the DMBOK ___________________________________________________ 43
5. Works Cited / Recommended ______________________________________________ 46
Chapter 2: Data Handling Ethics ____________________________________ 49
1. Introduction ____________________________________________________________ 49
2. Business Drivers ________________________________________________________ 51
3. Essential Concepts _______________________________________________________ 52
3.1 Ethical Principles for Data ____________________________________________________ 52
3.2 Principles Behind Data Privacy Law ____________________________________________ 53
3.3 Online Data in an Ethical Context ______________________________________________ 56
3.4 Risks of Unethical Data Handling Practices ______________________________________ 56
3.5 Establishing an Ethical Data Culture ____________________________________________ 60
3.6 Data Ethics and Governance __________________________________________________ 64
4. Works Cited / Recommended ______________________________________________ 65
Chapter 3: Data Governance ________________________________________ 67
1. Introduction ____________________________________________________________ 67
1.1 Business Drivers ____________________________________________________________ 70
1.2 Goals and Principles _________________________________________________________ 71
1.3 Essential Concepts __________________________________________________________ 72
2. Activities _______________________________________________________________ 79
2.1 Define Data Governance for the Organization ____________________________________ 79
2.2 Perform Readiness Assessment _______________________________________________ 79
2.3 Perform Discovery and Business Alignment _____________________________________ 80
2.4 Develop Organizational Touch Points___________________________________________ 81
2.5 Develop Data Governance Strategy _____________________________________________ 82
2.6 Define the DG Operating Framework ___________________________________________ 82
2.7 Develop Goals, Principles, and Policies __________________________________________ 83
2.8 Underwrite Data Management Projects _________________________________________ 84
2.9 Engage Change Management __________________________________________________ 85
2 • DMBOK2
2.10 Engage in Issue Management ________________________________________________ 86
2.11 Assess Regulatory Compliance Requirements___________________________________ 87
2.12 Implement Data Governance _________________________________________________ 88
2.13 Sponsor Data Standards and Procedures _______________________________________ 88
2.14 Develop a Business Glossary _________________________________________________ 90
2.15 Coordinate with Architecture Groups _________________________________________ 90
2.16 Sponsor Data Asset Valuation ________________________________________________ 91
2.17 Embed Data Governance ____________________________________________________ 91
3. Tools and Techniques____________________________________________________ 92
3.1 Online Presence / Websites___________________________________________________ 92
3.2 Business Glossary ___________________________________________________________ 92
3.3 Workflow Tools ____________________________________________________________ 93
3.4 Document Management Tools_________________________________________________ 93
3.5 Data Governance Scorecards __________________________________________________ 93
4. Implementation Guidelines _______________________________________________ 93
4.1 Organization and Culture_____________________________________________________ 93
4.2 Adjustment and Communication ______________________________________________ 94
5. Metrics ________________________________________________________________ 94
6. Works Cited / Recommended _____________________________________________ 95
Chapter 4: Data Architecture _______________________________________ 97
1. Introduction ___________________________________________________________ 97
1.1 Business Drivers ____________________________________________________________ 99
1.2 Data Architecture Outcomes and Practices _____________________________________ 100
1.3 Essential Concepts _________________________________________________________ 101
2. Activities _____________________________________________________________ 109
2.1 Establish Data Architecture Practice __________________________________________ 110
2.2 Integrate with Enterprise Architecture ________________________________________ 115
3. Tools ________________________________________________________________ 115
3.1 Data Modeling Tools________________________________________________________ 115
3.2 Asset Management Software _________________________________________________ 115
3.3 Graphical Design Applications _______________________________________________ 115
4. Techniques ___________________________________________________________ 116
4.1 Lifecycle Projections _______________________________________________________ 116
4.2 Diagramming Clarity _______________________________________________________ 116
5. Implementation Guidelines ______________________________________________ 117
5.1 Readiness Assessment / Risk Assessment ______________________________________ 118
5.2 Organization and Cultural Change ____________________________________________ 119
6. Data Architecture Governance ___________________________________________ 119
6.1 Metrics ___________________________________________________________________ 120
7. Works Cited / Recommended ____________________________________________ 120
Chapter 5: Data Modeling and Design _______________________________ 123
1. Introduction __________________________________________________________ 123
1.1 Business Drivers ___________________________________________________________ 125
1.2 Goals and Principles ________________________________________________________ 125
1.3 Essential Concepts _________________________________________________________ 126
2. Activities _____________________________________________________________ 152
2.1 Plan for Data Modeling______________________________________________________ 152
CONTENTS • 3
2.2 Build the Data Model _______________________________________________________ 153
2.3 Review the Data Models _____________________________________________________ 158
2.4 Maintain the Data Models ___________________________________________________ 159
3. Tools _________________________________________________________________ 159
3.1 Data Modeling Tools ________________________________________________________ 159
3.2 Lineage Tools _____________________________________________________________ 159
3.3 Data Profiling Tools ________________________________________________________ 160
3.4 Metadata Repositories ______________________________________________________ 160
3.5 Data Model Patterns ________________________________________________________ 160
3.6 Industry Data Models _______________________________________________________ 160
4. Best Practices __________________________________________________________ 161
4.1 Best Practices in Naming Conventions _________________________________________ 161
4.2 Best Practices in Database Design _____________________________________________ 161
5. Data Model Governance _________________________________________________ 162
5.1 Data Model and Design Quality Management ___________________________________ 162
5.2 Data Modeling Metrics ______________________________________________________ 164
6. Works Cited / Recommended _____________________________________________ 166
Chapter 6: Data Storage and Operations _____________________________ 169
1. Introduction ___________________________________________________________ 169
1.1 Business Drivers ___________________________________________________________ 171
1.2 Goals and Principles ________________________________________________________ 171
1.3 Essential Concepts _________________________________________________________ 172
2. Activities ______________________________________________________________ 193
2.1 Manage Database Technology ________________________________________________ 194
2.2 Manage Databases _________________________________________________________ 196
3. Tools _________________________________________________________________ 209
3.1 Data Modeling Tools ________________________________________________________ 209
3.2 Database Monitoring Tools __________________________________________________ 209
3.3 Database Management Tools _________________________________________________ 209
3.4 Developer Support Tools ____________________________________________________ 209
4. Techniques ____________________________________________________________ 210
4.1 Test in Lower Environments _________________________________________________ 210
4.2 Physical Naming Standards __________________________________________________ 210
4.3 Script Usage for All Changes _________________________________________________ 210
5. Implementation Guidelines_______________________________________________ 210
5.1 Readiness Assessment / Risk Assessment ______________________________________ 210
5.2 Organization and Cultural Change ____________________________________________ 211
6. Data Storage and Operations Governance ___________________________________ 212
6.1 Metrics ___________________________________________________________________ 212
6.2 Information Asset Tracking __________________________________________________ 213
6.3 Data Audits and Data Validation ______________________________________________ 213
7. Works Cited / Recommended _____________________________________________ 214
Chapter 7: Data Security __________________________________________ 217
1. Introduction ___________________________________________________________ 217
1.1 Business Drivers ___________________________________________________________ 220
1.2 Goals and Principles ________________________________________________________ 222
1.3 Essential Concepts _________________________________________________________ 223
2. Activities _____________________________________________________________ 245
2.1 Identify Data Security Requirements __________________________________________ 245
2.2 Define Data Security Policy __________________________________________________ 247
2.3 Define Data Security Standards_______________________________________________ 248
3. Tools ________________________________________________________________ 256
3.1 Anti-Virus Software / Security Software _______________________________________ 256
3.2 HTTPS ___________________________________________________________________ 256
3.3 Identity Management Technology ____________________________________________ 257
3.4 Intrusion Detection and Prevention Software ___________________________________ 257
3.5 Firewalls (Prevention) ______________________________________________________ 257
3.6 Metadata Tracking _________________________________________________________ 257
3.7 Data Masking/Encryption ___________________________________________________ 258
4. Techniques ___________________________________________________________ 258
4.1 CRUD Matrix Usage ________________________________________________________ 258
4.2 Immediate Security Patch Deployment ________________________________________ 258
4.3 Data Security Attributes in Metadata __________________________________________ 258
4.4 Metrics ___________________________________________________________________ 259
4.5 Security Needs in Project Requirements _______________________________________ 261
4.6 Efficient Search of Encrypted Data ____________________________________________ 262
4.7 Document Sanitization ______________________________________________________ 262
5. Implementation Guidelines ______________________________________________ 262
5.1 Readiness Assessment / Risk Assessment ______________________________________ 262
5.2 Organization and Cultural Change ____________________________________________ 263
5.3 Visibility into User Data Entitlement __________________________________________ 263
5.4 Data Security in an Outsourced World _________________________________________ 264
5.5 Data Security in Cloud Environments __________________________________________ 265
6. Data Security Governance _______________________________________________ 265
6.1 Data Security and Enterprise Architecture _____________________________________ 265
7. Works Cited / Recommended ____________________________________________ 266
Chapter 8: Data Integration and Interoperability______________________ 269
1. Introduction __________________________________________________________ 269
1.1 Business Drivers ___________________________________________________________ 270
1.2 Goals and Principles ________________________________________________________ 272
1.3 Essential Concepts _________________________________________________________ 273
2. Data Integration Activities _______________________________________________ 286
2.1 Plan and Analyze __________________________________________________________ 286
2.2 Design Data Integration Solutions ____________________________________________ 289
2.3 Develop Data Integration Solutions ___________________________________________ 291
2.4 Implement and Monitor _____________________________________________________ 293
3. Tools ________________________________________________________________ 294
3.1 Data Transformation Engine/ETL Tool ________________________________________ 294
3.2 Data Virtualization Server ___________________________________________________ 294
3.3 Enterprise Service Bus ______________________________________________________ 294
3.4 Business Rules Engine ______________________________________________________ 295
3.5 Data and Process Modeling Tools _____________________________________________ 295
3.6 Data Profiling Tool _________________________________________________________ 295
3.7 Metadata Repository _______________________________________________________ 296
4. Techniques ___________________________________________________________ 296
5. Implementation Guidelines_______________________________________________ 296
5.1 Readiness Assessment / Risk Assessment ______________________________________ 296
5.2 Organization and Cultural Change ____________________________________________ 297
6. DII Governance_________________________________________________________ 297
6.1 Data Sharing Agreements ___________________________________________________ 298
6.2 DII and Data Lineage _______________________________________________________ 298
6.3 Data Integration Metrics ____________________________________________________ 299
7. Works Cited / Recommended _____________________________________________ 299
Chapter 9: Document and Content Management_______________________ 303
1. Introduction ___________________________________________________________ 303
1.1 Business Drivers ___________________________________________________________ 305
1.2 Goals and Principles ________________________________________________________ 305
1.3 Essential Concepts _________________________________________________________ 307
2. Activities ______________________________________________________________ 323
2.1 Plan for Lifecycle Management _______________________________________________ 323
2.2 Manage the Lifecycle _______________________________________________________ 326
2.3 Publish and Deliver Content _________________________________________________ 329
3. Tools _________________________________________________________________ 330
3.1 Enterprise Content Management Systems ______________________________________ 330
3.2 Collaboration Tools ________________________________________________________ 333
3.3 Controlled Vocabulary and Metadata Tools _____________________________________ 333
3.4 Standard Markup and Exchange Formats ______________________________________ 333
3.5 E-discovery Technology _____________________________________________________ 336
4. Techniques ____________________________________________________________ 336
4.1 Litigation Response Playbook ________________________________________________ 336
4.2 Litigation Response Data Map ________________________________________________ 337
5. Implementation Guidelines_______________________________________________ 337
5.1 Readiness Assessment / Risk Assessment ______________________________________ 338
5.2 Organization and Cultural Change ____________________________________________ 339
6. Documents and Content Governance _______________________________________ 340
6.1 Information Governance Frameworks _________________________________________ 340
6.2 Proliferation of Information _________________________________________________ 342
6.3 Govern for Quality Content __________________________________________________ 342
6.4 Metrics ___________________________________________________________________ 343
7. Works Cited / Recommended _____________________________________________ 344
Chapter 10: Reference and Master Data _____________________________ 347
1. Introduction ___________________________________________________________ 347
1.1 Business Drivers ___________________________________________________________ 349
1.2 Goals and Principles ________________________________________________________ 349
1.3 Essential Concepts _________________________________________________________ 350
2. Activities ______________________________________________________________ 370
2.1 MDM Activities ____________________________________________________________ 371
2.2 Reference Data Activities ____________________________________________________ 373
3. Tools and Techniques ___________________________________________________ 375
4. Implementation Guidelines_______________________________________________ 375
4.1 Adhere to Master Data Architecture ___________________________________________ 376
4.2 Monitor Data Movement ____________________________________________________ 376
4.3 Manage Reference Data Change ______________________________________________ 376
4.4 Data Sharing Agreements ___________________________________________________ 377
5. Organization and Cultural Change ________________________________________ 378
6. Reference and Master Data Governance____________________________________ 378
6.1 Metrics ___________________________________________________________________ 379
7. Works Cited / Recommended ____________________________________________ 379
Chapter 11: Data Warehousing and Business Intelligence_______________ 381
1. Introduction __________________________________________________________ 381
1.1 Business Drivers ___________________________________________________________ 383
1.2 Goals and Principles ________________________________________________________ 383
1.3 Essential Concepts _________________________________________________________ 384
2. Activities _____________________________________________________________ 394
2.1 Understand Requirements __________________________________________________ 394
2.2 Define and Maintain the DW/BI Architecture ___________________________________ 395
2.3 Develop the Data Warehouse and Data Marts ___________________________________ 396
2.4 Populate the Data Warehouse ________________________________________________ 397
2.5 Implement the Business Intelligence Portfolio __________________________________ 398
2.6 Maintain Data Products _____________________________________________________ 399
3. Tools ________________________________________________________________ 402
3.1 Metadata Repository _______________________________________________________ 402
3.2 Data Integration Tools ______________________________________________________ 403
3.3 Business Intelligence Tools Types ____________________________________________ 403
4. Techniques ___________________________________________________________ 407
4.1 Prototypes to Drive Requirements ____________________________________________ 407
4.2 Self-Service BI _____________________________________________________________ 408
4.3 Audit Data that can be Queried _______________________________________________ 408
5. Implementation Guidelines ______________________________________________ 408
5.1 Readiness Assessment / Risk Assessment ______________________________________ 408
5.2 Release Roadmap __________________________________________________________ 409
5.3 Configuration Management __________________________________________________ 409
5.4 Organization and Cultural Change ____________________________________________ 410
6. DW/BI Governance_____________________________________________________ 411
6.1 Enabling Business Acceptance _______________________________________________ 411
6.2 Customer / User Satisfaction _________________________________________________ 412
6.3 Service Level Agreements ___________________________________________________ 412
6.4 Reporting Strategy _________________________________________________________ 412
6.5 Metrics ___________________________________________________________________ 413
7. Works Cited / Recommended ____________________________________________ 414
Chapter 12: Metadata Management ________________________________ 417
1. Introduction __________________________________________________________ 417
1.1 Business Drivers ___________________________________________________________ 420
1.2 Goals and Principles ________________________________________________________ 420
1.3 Essential Concepts _________________________________________________________ 421
2. Activities _____________________________________________________________ 434
2.1 Define Metadata Strategy____________________________________________________ 434
2.2 Understand Metadata Requirements __________________________________________ 435
2.3 Define Metadata Architecture ________________________________________________ 436
2.4 Create and Maintain Metadata________________________________________________ 438
2.5 Query, Report, and Analyze Metadata__________________________________________ 440
3. Tools _________________________________________________________________ 440
3.1 Metadata Repository Management Tools _______________________________________ 440
4. Techniques ____________________________________________________________ 441
4.1 Data Lineage and Impact Analysis_____________________________________________ 441
4.2 Metadata for Big Data Ingest _________________________________________________ 443
5. Implementation Guidelines_______________________________________________ 444
5.1 Readiness Assessment / Risk Assessment ______________________________________ 444
5.2 Organizational and Cultural Change ___________________________________________ 445
6. Metadata Governance ___________________________________________________ 445
6.1 Process Controls ___________________________________________________________ 445
6.2 Documentation of Metadata Solutions _________________________________________ 446
6.3 Metadata Standards and Guidelines ___________________________________________ 446
6.4 Metrics ___________________________________________________________________ 447
7. Works Cited / Recommended _____________________________________________ 448
Chapter 13: Data Quality _________________________________________ 449
1. Introduction ___________________________________________________________ 449
1.1 Business Drivers ___________________________________________________________ 452
1.2 Goals and Principles ________________________________________________________ 452
1.3 Essential Concepts _________________________________________________________ 453
2. Activities ______________________________________________________________ 473
2.1 Define High Quality Data ____________________________________________________ 473
2.2 Define a Data Quality Strategy ________________________________________________ 474
2.3 Identify Critical Data and Business Rules _______________________________________ 474
2.4 Perform an Initial Data Quality Assessment_____________________________________ 475
2.5 Identify and Prioritize Potential Improvements _________________________________ 476
2.6 Define Goals for Data Quality Improvement ____________________________________ 477
2.7 Develop and Deploy Data Quality Operations ___________________________________ 477
3. Tools _________________________________________________________________ 484
3.1 Data Profiling Tools ________________________________________________________ 485
3.2 Data Querying Tools ________________________________________________________ 485
3.3 Modeling and ETL Tools_____________________________________________________ 485
3.4 Data Quality Rule Templates _________________________________________________ 485
3.5 Metadata Repositories ______________________________________________________ 485
4. Techniques ____________________________________________________________ 486
4.1 Preventive Actions _________________________________________________________ 486
4.2 Corrective Actions _________________________________________________________ 486
4.3 Quality Check and Audit Code Modules ________________________________________ 487
4.4 Effective Data Quality Metrics ________________________________________________ 487
4.5 Statistical Process Control ___________________________________________________ 488
4.6 Root Cause Analysis ________________________________________________________ 490
5. Implementation Guidelines_______________________________________________ 490
5.1 Readiness Assessment / Risk Assessment ______________________________________ 491
5.2 Organization and Cultural Change ____________________________________________ 492
6. Data Quality and Data Governance_________________________________________ 493
6.1 Data Quality Policy _________________________________________________________ 493
6.2 Metrics ___________________________________________________________________ 494
7. Works Cited / Recommended ____________________________________________ 494
Chapter 14: Big Data and Data Science ______________________________ 497
1. Introduction __________________________________________________________ 497
1.1 Business Drivers ___________________________________________________________ 498
1.2 Principles ________________________________________________________________ 500
1.3 Essential Concepts _________________________________________________________ 500
2. Activities _____________________________________________________________ 511
2.1 Define Big Data Strategy and Business Needs ___________________________________ 511
2.2 Choose Data Sources _______________________________________________________ 512
2.3 Acquire and Ingest Data Sources______________________________________________ 513
2.4 Develop Data Hypotheses and Methods ________________________________________ 514
2.5 Integrate / Align Data for Analysis ____________________________________________ 514
2.6 Explore Data Using Models __________________________________________________ 514
2.7 Deploy and Monitor ________________________________________________________ 516
3. Tools ________________________________________________________________ 517
3.1 MPP Shared-nothing Technologies and Architecture _____________________________ 518
3.2 Distributed File-based Databases _____________________________________________ 519
3.3 In-database Algorithms _____________________________________________________ 520
3.4 Big Data Cloud Solutions ____________________________________________________ 520
3.5 Statistical Computing and Graphical Languages _________________________________ 520
3.6 Data Visualization Tools ____________________________________________________ 520
4. Techniques ___________________________________________________________ 521
4.1 Analytic Modeling __________________________________________________________ 521
4.2 Big Data Modeling _________________________________________________________ 522
5. Implementation Guidelines ______________________________________________ 523
5.1 Strategy Alignment _________________________________________________________ 523
5.2 Readiness Assessment / Risk Assessment ______________________________________ 523
5.3 Organization and Cultural Change ____________________________________________ 524
6. Big Data and Data Science Governance_____________________________________ 525
6.1 Visualization Channels Management __________________________________________ 525
6.2 Data Science and Visualization Standards ______________________________________ 525
6.3 Data Security______________________________________________________________ 526
6.4 Metadata _________________________________________________________________ 526
6.5 Data Quality ______________________________________________________________ 527
6.6 Metrics ___________________________________________________________________ 527
7. Works Cited / Recommended ____________________________________________ 528
Chapter 15: Data Management Maturity Assessment __________________ 531
1. Introduction __________________________________________________________ 531
1.1 Business Drivers ___________________________________________________________ 532
1.2 Goals and Principles ________________________________________________________ 534
1.3 Essential Concepts _________________________________________________________ 534
2. Activities _____________________________________________________________ 539
2.1 Plan Assessment Activities __________________________________________________ 540
2.2 Perform Maturity Assessment________________________________________________ 542
2.3 Interpret Results __________________________________________________________ 543
2.4 Create a Targeted Program for Improvements __________________________________ 544
2.5 Re-assess Maturity _________________________________________________________ 545
3. Tools _________________________________________________________________ 545
4. Techniques ____________________________________________________________ 546
4.1 Selecting a DMM Framework _________________________________________________ 546
4.2 DAMA-DMBOK Framework Use ______________________________________________ 546
5. Guidelines for a DMMA __________________________________________________ 547
5.1 Readiness Assessment / Risk Assessment ______________________________________ 547
5.2 Organizational and Cultural Change ___________________________________________ 548
6. Maturity Management Governance ________________________________________ 548
6.1 DMMA Process Oversight ____________________________________________________ 548
6.2 Metrics ___________________________________________________________________ 548
7. Works Cited / Recommended _____________________________________________ 549
Chapter 16: Data Management Organization and Role Expectations _______ 551
1. Introduction ___________________________________________________________ 551
2. Understand Existing Organization and Cultural Norms ________________________ 551
3. Data Management Organizational Constructs ________________________________ 553
3.1 Decentralized Operating Model _______________________________________________ 553
3.2 Network Operating Model ___________________________________________________ 554
3.3 Centralized Operating Model _________________________________________________ 555
3.4 Hybrid Operating Model ____________________________________________________ 556
3.5 Federated Operating Model __________________________________________________ 557
3.6 Identifying the Best Model for an Organization __________________________________ 557
3.7 DMO Alternatives and Design Considerations ___________________________________ 558
4. Critical Success Factors __________________________________________________ 559
4.1 Executive Sponsorship ______________________________________________________ 559
4.2 Clear Vision _______________________________________________________________ 559
4.3 Proactive Change Management _______________________________________________ 559
4.4 Leadership Alignment ______________________________________________________ 560
4.5 Communication____________________________________________________________ 560
4.6 Stakeholder Engagement ____________________________________________________ 560
4.7 Orientation and Training ____________________________________________________ 560
4.8 Adoption Measurement _____________________________________________________ 561
4.9 Adherence to Guiding Principles ______________________________________________ 561
4.10 Evolution Not Revolution __________________________________________________ 561
5. Build the Data Management Organization ___________________________________ 562
5.1 Identify Current Data Management Participants _________________________________ 562
5.2 Identify Committee Participants ______________________________________________ 562
5.3 Identify and Analyze Stakeholders ____________________________________________ 563
5.4 Involve the Stakeholders ____________________________________________________ 563
6. Interactions Between the DMO and Other Data-oriented Bodies ________________ 564
6.1 The Chief Data Officer_______________________________________________________ 564
6.2 Data Governance ___________________________________________________________ 565
6.3 Data Quality_______________________________________________________________ 566
6.4 Enterprise Architecture _____________________________________________________ 566
6.5 Managing a Global Organization ______________________________________________ 567
7. Data Management Roles _________________________________________________ 568
7.1 Organizational Roles _______________________________________________________ 568
7.2 Individual Roles ___________________________________________________________ 568
8. Works Cited / Recommended _____________________________________________ 571
Chapter 17: Data Management and Organizational Change Management __ 573
1. Introduction __________________________________________________________ 573
2. Laws of Change ________________________________________________________ 574
3. Not Managing a Change: Managing a Transition _____________________________ 575
4. Kotter’s Eight Errors of Change Management _______________________________ 577
4.1 Error #1: Allowing Too Much Complacency ____________________________________ 577
4.2 Error #2: Failing to Create a Sufficiently Powerful Guiding Coalition ________________ 578
4.3 Error #3: Underestimating the Power of Vision _________________________________ 578
4.4 Error #4: Under Communicating the Vision by a Factor of 10, 100, or 1000 __________ 579
4.5 Error #5: Permitting Obstacles to Block the Vision_______________________________ 580
4.6 Error #6: Failing to Create Short-Term Wins ___________________________________ 580
4.7 Error #7: Declaring Victory Too Soon _________________________________________ 581
4.8 Error #8: Neglecting to Anchor Changes Firmly in the Corporate Culture____________ 581
5. Kotter’s Eight Stage Process for Major Change ______________________________ 582
5.1 Establishing a Sense of Urgency ______________________________________________ 583
5.2 The Guiding Coalition_______________________________________________________ 586
5.3 Developing a Vision and Strategy _____________________________________________ 590
5.4 Communicating the Change Vision ____________________________________________ 594
6. The Formula for Change_________________________________________________ 598
7. Diffusion of Innovations and Sustaining Change _____________________________ 599
7.1 The Challenges to be Overcome as Innovations Spread ___________________________ 601
7.2 Key Elements in the Diffusion of Innovation ____________________________________ 601
7.3 The Five Stages of Adoption _________________________________________________ 601
7.4 Factors Affecting Acceptance or Rejection of an Innovation or Change ______________ 602
8. Sustaining Change _____________________________________________________ 603
8.1 Sense of Urgency / Dissatisfaction ____________________________________________ 604
8.2 Framing the Vision _________________________________________________________ 604
8.3 The Guiding Coalition_______________________________________________________ 605
8.4 Relative Advantage and Observability _________________________________________ 605
9. Communicating Data Management Value __________________________________ 605
9.1 Communications Principles __________________________________________________ 605
9.2 Audience Evaluation and Preparation _________________________________________ 606
9.3 The Human Element________________________________________________________ 607
9.4 Communication Plan _______________________________________________________ 608
9.5 Keep Communicating _______________________________________________________ 609
10. Works Cited / Recommended ___________________________________________ 609
Acknowledgements _____________________________________________ 611
Index _________________________________________________________ 615
Figures
Figure 1 Data Management Principles ____________________________________________________________ 22
Figure 2 Data Lifecycle Key Activities_____________________________________________________________ 29
Figure 3 Strategic Alignment Model (Henderson and Venkatraman) _____________________________________ 34
Figure 4 Amsterdam Information Model (adapted) __________________________________________________ 35
Figure 5 The DAMA-DMBOK2 Data Management Framework (The DAMA Wheel) ___________________________ 36
Figure 6 DAMA Environmental Factors Hexagon ____________________________________________________ 36
Figure 7 Knowledge Area Context Diagram ________________________________________________________ 37
Figure 8 Purchased or Built Database Capability ____________________________________________________ 40
Figure 9 DAMA Functional Area Dependencies _____________________________________________________ 41
Figure 10 DAMA Data Management Function Framework _____________________________________________ 42
Figure 11 DAMA Wheel Evolved ________________________________________________________________ 44
Figure 12 Context Diagram: Data Handling Ethics ___________________________________________________ 50
Figure 13 Ethical Risk Model for Sampling Projects __________________________________________________ 64
Figure 14 Context Diagram: Data Governance and Stewardship _________________________________________ 69
Figure 15 Data Governance and Data Management __________________________________________________ 72
Figure 16 Data Governance Organization Parts _____________________________________________________ 74
Figure 17 Enterprise DG Operating Framework Examples _____________________________________________ 75
Figure 18 CDO Organizational Touch Points ________________________________________________________ 81
Figure 19 An Example of an Operating Framework __________________________________________________ 83
Figure 20 Data Issue Escalation Path _____________________________________________________________ 86
Figure 21 Context Diagram: Data Architecture _____________________________________________________ 100
Figure 22 Simplified Zachman Framework________________________________________________________ 103
Figure 23 Enterprise Data Model _______________________________________________________________ 106
Figure 24 Subject Area Models Diagram Example___________________________________________________ 107
Figure 25 Data Flow Depicted in a Matrix ________________________________________________________ 108
Figure 26 Data Flow Diagram Example __________________________________________________________ 109
Figure 27 The Data Dependencies of Business Capabilities ____________________________________________ 112
Figure 28 Context Diagram: Data Modeling and Design ______________________________________________ 124
Figure 29 Entities __________________________________________________________________________ 129
Figure 30 Relationships______________________________________________________________________ 130
Figure 31 Cardinality Symbols _________________________________________________________________ 131
Figure 32 Unary Relationship – Hierarchy ________________________________________________________ 131
Figure 33 Unary Relationship – Network _________________________________________________________ 131
Figure 34 Binary Relationship _________________________________________________________________ 132
Figure 35 Ternary Relationship ________________________________________________________________ 132
Figure 36 Foreign Keys ______________________________________________________________________ 133
Figure 37 Attributes ________________________________________________________________________ 133
Figure 38 Dependent and Independent Entity _____________________________________________________ 134
Figure 39 IE Notation _______________________________________________________________________ 137
Figure 40 Axis Notation for Dimensional Models ___________________________________________________ 138
Figure 41 UML Class Model ___________________________________________________________________ 140
Figure 42 ORM Model _______________________________________________________________________ 141
Figure 43 FCO-IM Model _____________________________________________________________________ 142
Figure 44 Data Vault Model ___________________________________________________________________ 143
Figure 45 Anchor Model _____________________________________________________________________ 143
Figure 46 Relational Conceptual Model __________________________________________________________ 145
Figure 47 Dimensional Conceptual Model ________________________________________________________ 146
Figure 48 Relational Logical Data Model _________________________________________________________ 146
Figure 49 Dimensional Logical Data Model _______________________________________________________ 147
Figure 50 Relational Physical Data Model ________________________________________________________ 148
Figure 51 Dimensional Physical Data Model _______________________________________________________ 149
Figure 52 Supertype and Subtype Relationships ___________________________________________________ 152
Figure 53 Modeling is Iterative ________________________________________________________________ 153
Figure 54 Context Diagram: Data Storage and Operations _____________________________________________ 170
Figure 55 Centralized vs. Distributed ____________________________________________________________ 175
Figure 56 Federated Databases ________________________________________________________________ 176
Figure 57 Coupling __________________________________________________________________________ 177
Figure 58 CAP Theorem ______________________________________________________________________ 180
Figure 59 Database Organization Spectrum _______________________________________________________ 184
Figure 60 Log Shipping vs. Mirroring ____________________________________________________________ 192
Figure 61 SLAs for System and Database Performance _______________________________________________ 203
Figure 62 Sources of Data Security Requirements ___________________________________________________ 218
Figure 63 Context Diagram: Data Security_________________________________________________________ 219
Figure 64 DMZ Example ______________________________________________________________________ 231
Figure 65 Security Role Hierarchy Example Diagram ________________________________________________ 251
Figure 66 Context Diagram: Data Integration and Interoperability ______________________________________ 271
Figure 67 ETL Process Flow ___________________________________________________________________ 274
Figure 68 ELT Process Flow ___________________________________________________________________ 275
Figure 69 Application Coupling ________________________________________________________________ 282
Figure 70 Enterprise Service Bus _______________________________________________________________ 283
Figure 71 Context Diagram: Documents and Content ________________________________________________ 304
Figure 72 Document Hierarchy based on ISO 9001-4.2 _______________________________________________ 317
Figure 73 Electronic Discovery Reference Model ___________________________________________________ 319
Figure 74 Information Governance Reference Model_________________________________________________ 341
Figure 75 Context Diagram: Reference and Master Data ______________________________________________ 348
Figure 76 Key Processing Steps for MDM _________________________________________________________ 361
Figure 77 Master Data Sharing Architecture Example ________________________________________________ 370
Figure 78 Reference Data Change Request Process __________________________________________________ 377
Figure 79 Context Diagram: DW/BI _____________________________________________________________ 382
Figure 80 The Corporate Information Factory______________________________________________________ 388
Figure 81 Kimball’s Data Warehouse Chess Pieces __________________________________________________ 390
Figure 82 Conceptual DW/BI and Big Data Architecture ______________________________________________ 391
Figure 83 Release Process Example _____________________________________________________________ 400
Figure 84 Context Diagram: Metadata____________________________________________________________ 419
Figure 85 Centralized Metadata Architecture ______________________________________________________ 432
Figure 86 Distributed Metadata Architecture ______________________________________________________ 433
Figure 87 Hybrid Metadata Architecture __________________________________________________________ 434
Figure 88 Example Metadata Repository Metamodel ________________________________________________ 437
Figure 89 Sample Data Element Lineage Flow Diagram _______________________________________________ 442
Figure 90 Sample System Lineage Flow Diagram ___________________________________________________ 442
Figure 91 Context Diagram: Data Quality _________________________________________________________ 451
Figure 92 Relationship Between Data Quality Dimensions ____________________________________________ 460
Figure 93 A Data Quality Management Cycle Based on the Shewhart Chart ________________________________ 463
Figure 94 Barriers to Managing Information as a Business Asset ________________________________________ 467
Figure 95 Control Chart of a Process in Statistical Control _____________________________________________ 489
Figure 96 Abate Information Triangle____________________________________________________________ 498
Figure 97 Context Diagram: Big Data and Data Science _______________________________________________ 499
Figure 98 Data Science Process ________________________________________________________________ 501
Figure 99 Data Storage Challenges ______________________________________________________________ 503
Figure 100 Conceptual DW/BI and Big Data Architecture _____________________________________________ 504
Figure 101 Services-based Architecture __________________________________________________________ 506
Figure 102 Columnar Appliance Architecture ______________________________________________________ 519
Figure 103 Context Diagram: Data Management Maturity Assessment ___________________________________ 533
Figure 104 Data Management Maturity Model Example ______________________________________________ 535
Figure 105 Example of a Data Management Maturity Assessment Visualization _____________________________ 537
Figure 106 Assess Current State to Create an Operating Model _________________________________________ 552
Figure 107 Decentralized Operating Model ________________________________________________________ 554
Figure 108 Network Operating Model ___________________________________________________________ 554
Figure 109 Centralized Operating Model _________________________________________________________ 555
Figure 110 Hybrid Operating Model ____________________________________________________________ 556
Figure 111 Federated Operating Model __________________________________________________________ 557
Figure 112 Stakeholder Interest Map ____________________________________________________________ 564
Figure 113 Bridges’s Transition Phases __________________________________________________________ 576
Figure 114 Kotter’s Eight Stage Process for Major Change ____________________________________________ 583
Figure 115 Sources of Complacency_____________________________________________________________ 585
Figure 116 Vision Breaks Through Status Quo _____________________________________________________ 591
Figure 117 Management/Leadership Contrast _____________________________________________________ 593
Figure 118 Everett Rogers Diffusion of Innovations _________________________________________________ 600
Figure 119 The Stages of Adoption _____________________________________________________________ 602
Tables
Table 1 GDPR Principles ______________________________________________________________________ 54
Table 2 Canadian Privacy Statutory Obligations _____________________________________________________ 55
Table 3 United States Privacy Program Criteria _____________________________________________________ 55
Table 4 Typical Data Governance Committees / Bodies _______________________________________________ 74
Table 5 Principles for Data Asset Accounting _______________________________________________________ 78
Table 6 Architecture Domains _________________________________________________________________ 101
Table 7 Commonly Used Entity Categories ________________________________________________________ 127
Table 8 Entity, Entity Type, and Entity Instance ____________________________________________________ 128
Table 9 Modeling Schemes and Notations ________________________________________________________ 136
Table 10 Scheme to Database Cross Reference _____________________________________________________ 137
Table 11 Data Model Scorecard® Template _______________________________________________________ 164
Table 12 ACID vs BASE ______________________________________________________________________ 180
Table 13 Sample Regulation Inventory Table ______________________________________________________ 246
Table 14 Role Assignment Grid Example _________________________________________________________ 250
Table 15 Levels of Control for Documents per ANSI-859 _____________________________________________ 327
Table 16 Sample Audit Measures _______________________________________________________________ 329
Table 17 Simple Reference List ________________________________________________________________ 353
Table 18 Simple Reference List Expanded ________________________________________________________ 354
Table 19 Cross-Reference List _________________________________________________________________ 354
Table 20 Multi-Language Reference List _________________________________________________________ 354
Table 21 UNSPSC (Universal Standard Products and Services Classification) ______________________________ 355
Table 22 NAICS (North America Industry Classification System) _______________________________________ 355
Table 23 Critical Reference Data Metadata Attributes _______________________________________________ 357
Table 24 Source Data as Received by the MDM System _______________________________________________ 361
Table 25 Standardized and Enriched Input Data ___________________________________________________ 362
Table 26 Candidate Identification and Identity Resolution ____________________________________________ 364
Table 27 DW-Bus Matrix Example ______________________________________________________________ 389
Table 28 CDC Technique Comparison ___________________________________________________________ 393
Table 29 Common Dimensions of Data Quality _____________________________________________________ 458
Table 30 DQ Metric Examples _________________________________________________________________ 480
Table 31 Data Quality Monitoring Techniques _____________________________________________________ 481
Table 32 Analytics Progression ________________________________________________________________ 501
Table 33 Typical Risks and Mitigations for a DMMA _________________________________________________ 547
Table 34 Bridges’s Transition Phases ____________________________________________________________ 575
Table 35 Complacency Scenarios _______________________________________________________________ 578
Table 36 Declaring Victory Too Soon Scenarios ____________________________________________________ 581
Table 37 Diffusion of Innovations Categories Adapted to Information Management _________________________ 600
Table 38 The Stages of Adoption (Adapted from Rogers, 1964) ________________________________________ 602
Table 39 Communication Plan Elements _________________________________________________________ 608
Preface
DAMA International is pleased to release the second edition of the DAMA Guide to the Data Management
Body of Knowledge (DAMA-DMBOK2). Since the publication of the first edition in 2009, significant
developments have taken place in the field of data management. Data Governance has become a standard
structure in many organizations, new technologies have enabled the collection and use of ‘Big Data’ (semi-structured
and unstructured data in a wide range of formats), and the importance of data ethics has grown along
with our ability to explore and exploit the vast amount of data and information produced as part of our daily lives.
These changes are exciting. They also place new and increasing demands on our profession. DAMA has responded
to these changes by reformulating the DAMA Data Management Framework (the DAMA Wheel), adding detail
and clarification, and expanding the scope of the DMBOK:
• Context diagrams for all Knowledge Areas have been improved and updated.
• Data Integration and Interoperability has been added as a new Knowledge Area to highlight its importance
(Chapter 8).
• Data Ethics has been called out as a separate chapter due to the increasing necessity of an ethical approach
to all aspects of data management (Chapter 2).
• The role of governance has been described both as a function (Chapter 3) and in relation to each
Knowledge Area.
• A similar approach has been taken with organizational change management, which is described in Chapter
17 and incorporated into the Knowledge Area chapters.
• New chapters on Big Data and Data Science (Chapter 14) and Data Management Maturity Assessment
(Chapter 15) help organizations understand where they want to go and give them the tools to get there.
• The second edition also includes a newly formulated set of data management principles to support the
ability of organizations to manage their data effectively and get value from their data assets (Chapter 1).
We hope the DMBOK2 will serve data management professionals across the globe as a valuable resource and
guide. Nevertheless, we also recognize it is only a starting point. Real advancement will come as we apply and
learn from these ideas. DAMA exists to enable members to learn continuously, by sharing ideas, trends, problems,
and solutions.
Sue Geuens, President, DAMA International
Laura Sebastian-Coleman, Publications Officer, DAMA International
CHAPTER 1
Data Management
1. Introduction
Many organizations recognize that their data is a vital enterprise asset. Data and information can give
them insight about their customers, products, and services. It can help them innovate and reach
strategic goals. Despite that recognition, few organizations actively manage data as an asset from
which they can derive ongoing value (Evans and Price, 2012). Deriving value from data does not happen in a
vacuum or by accident. It requires intention, planning, coordination, and commitment. It requires management and
leadership.
Data Management is the development, execution, and supervision of plans, policies, programs, and practices that
deliver, control, protect, and enhance the value of data and information assets throughout their lifecycles.
A Data Management Professional is any person who works in any facet of data management (from technical
management of data throughout its lifecycle to ensuring that data is properly utilized and leveraged) to meet
strategic organizational goals. Data management professionals fill numerous roles, from the highly technical (e.g.,
database administrators, network administrators, programmers) to strategic business (e.g., Data Stewards, Data
Strategists, Chief Data Officers).
Data management activities are wide-ranging. They include everything from the ability to make consistent
decisions about how to get strategic value from data to the technical deployment and performance of databases.
Thus, data management requires both technical and non-technical (i.e., ‘business’) skills. Responsibility for
managing data must be shared between business and information technology roles, and people in both areas must
be able to collaborate to ensure an organization has high quality data that meets its strategic needs.
Data and information are not just assets in the sense that organizations invest in them in order to derive future
value. Data and information are also vital to the day-to-day operations of most organizations. They have been
called the ‘currency’, the ‘life blood’, and even the ‘new oil’ of the information economy. 1 Whether or not an
organization gets value from its analytics, it cannot even transact business without data.
To support the data management professionals who carry out the work, DAMA International (The Data
Management Association) has produced this book, the second edition of The DAMA Guide to the Data
Management Body of Knowledge (DMBOK2). This edition builds on the first one, published in 2009, which
provided foundational knowledge on which to build as the profession advanced and matured.
1 Google ‘data as currency’, ‘data as life blood’, and ‘the new oil’, for numerous references.
This chapter outlines a set of principles for data management. It discusses challenges related to following those
principles and suggests approaches for meeting these challenges. The chapter also describes the DAMA Data
Management Framework, which provides the context for the work carried out by data management professionals
within various Data Management Knowledge Areas.
1.1 Business Drivers
Information and knowledge hold the key to competitive advantage. Organizations that have reliable, high quality
data about their customers, products, services, and operations can make better decisions than those without data or
with unreliable data. Failure to manage data is similar to failure to manage capital. It results in waste and lost
opportunity. The primary driver for data management is to enable organizations to get value from their data assets,
just as effective management of financial and physical assets enables organizations to get value from those assets.
1.2 Goals
Within an organization, data management goals include:
• Understanding and supporting the information needs of the enterprise and its stakeholders, including
customers, employees, and business partners
• Capturing, storing, protecting, and ensuring the integrity of data assets
• Ensuring the quality of data and information
• Ensuring the privacy and confidentiality of stakeholder data
• Preventing unauthorized or inappropriate access, manipulation, or use of data and information
• Ensuring data can be used effectively to add value to the enterprise
2. Essential Concepts
2.1 Data
Long-standing definitions of data emphasize its role in representing facts about the world. 2 In relation to
information technology, data is also understood as information that has been stored in digital form (though data is
not limited to information that has been digitized and data management principles apply to data captured on paper
as well as in databases). Still, because today we can capture so much information electronically, we call many
things ‘data’ that would not have been called ‘data’ in earlier times – things like names, addresses, birthdates, what
one ate for dinner on Saturday, the most recent book one purchased.
2 The New Oxford American Dictionary defines data as “facts and statistics collected together for analysis.” The American
Society for Quality (ASQ) defines data as “A set of collected facts” and describes two kinds of numerical data: measured or
variable and counted or attributed. The International Standards Organization (ISO) defines data as “re-interpretable
Such facts about individual people can be aggregated, analyzed, and used to make a profit, improve health, or
influence public policy. Moreover our technological capacity to measure a wide range of events and activities (from
the repercussions of the Big Bang to our own heartbeats) and to collect, store, and analyze electronic versions of
things that were not previously thought of as data (videos, pictures, sound recordings, documents) is close to
surpassing our ability to synthesize these data into usable information. 3 To take advantage of the variety of data
without being overwhelmed by its volume and velocity requires reliable, extensible data management practices.
Most people assume that, because data represents facts, it is a form of truth about the world and that the facts will
fit together. But ‘facts’ are not always simple or straightforward. Data is a means of representation. It stands for
things other than itself (Chisholm, 2010). Data is both an interpretation of the objects it represents and an object
that must be interpreted (Sebastian-Coleman, 2013). This is another way of saying that we need context for data to
be meaningful. Context can be thought of as data’s representational system; such a system includes a common
vocabulary and a set of relationships between components. If we know the conventions of such a system, then we
can interpret the data within it. 4 These conventions are often documented in a specific kind of data referred to as
Metadata.
However, because people often make different choices about how to represent concepts, they create different ways
of representing the same concepts. From these choices, data takes on different shapes. Think of the range of ways
we have to represent calendar dates, a concept about which there is an agreed-to definition. Now consider more
complex concepts (such as customer or product), where the granularity and level of detail of what needs to be
represented is not always self-evident, and the process of representation grows more complex, as does the process
of managing that information over time. (See Chapter 10).
Even within a single organization, there are often multiple ways of representing the same idea. Hence the need for
Data Architecture, modeling, governance, and stewardship, and Metadata and Data Quality management, all of
which help people understand and use data. Across organizations, the problem of multiplicity multiplies. Hence the
need for industry-level data standards that can bring more consistency to data.
Organizations have always needed to manage their data, but changes in technology have expanded the scope of this
management need as they have changed people’s understanding of what data is. These changes have enabled
organizations to use data in new ways to create products, share information, create knowledge, and improve
representation of information in a formalized manner suitable for communication, interpretation, or processing” (ISO 11179).
This definition emphasizes the electronic nature of data and assumes, correctly, that data requires standards because it is
managed through information technology systems. That said, it does not speak to the challenges of formalizing data in a
consistent way, across disparate systems. Nor does it account well for the concept of unstructured data.
3 http://ubm.io/2c4yPOJ (Accessed 20016-12-04). http://bit.ly/1rOQkt1 (Accessed 20016-12-04).
For additional information on the constructed-ness of data see: Kent, Data and Reality (2012) and Devlin, Business
Unintelligence (2013).
4
20 • DMBOK2
organizational success. But the rapid growth of technology and with it human capacity to produce, capture, and
mine data for meaning has intensified the need to manage data effectively.
2.2 Data and Information
Much ink has been spilled over the relationship between data and information. Data has been called the “raw
material of information” and information has been called “data in context”. 5 Often a layered pyramid is used to
describe the relationship between data (at the base), information, knowledge, and wisdom (at the very top). While
the pyramid can be helpful in describing why data needs to be well-managed, this representation presents several
challenges for data management.
• It is based on the assumption that data simply exists. But data does not simply exist. Data has to be
created.
• By describing a linear sequence from data through wisdom, it fails to recognize that it takes knowledge to
create data in the first place.
• It implies that data and information are separate things, when in reality, the two concepts are intertwined
with and dependent on each other. Data is a form of information and information is a form of data.
Within an organization, it may be helpful to draw a line between information and data for purposes of clear
communication about the requirements and expectations of different uses by different stakeholders. (“Here is a
sales report for the last quarter [information]. It is based on data from our data warehouse [data]. Next quarter these
results [data] will be used to generate our quarter-over-quarter performance measures [information]”). Recognizing
data and information need to be prepared for different purposes drives home a central tenet of data management:
Both data and information need to be managed. Both will be of higher quality if they are managed together with
uses and customer requirements in mind. Throughout the DMBOK, the terms will be used interchangeably.
2.3 Data as an Organizational Asset
An asset is an economic resource that can be owned or controlled and that holds or produces value. Assets can be
converted to money. Data is widely recognized as an enterprise asset, though understanding of what it means to
manage data as an asset is still evolving. In the early 1990s, some organizations found it questionable whether the
value of goodwill should be given a monetary value. Now, the ‘value of goodwill’ commonly shows up as an item
on the Profit and Loss Statement (P&L). Similarly, while not universally adopted, monetization of data is becoming
increasingly common. It will not be too long before we see this as a feature of P&Ls. (See Chapter 3.)
Today’s organizations rely on their data assets to make more effective decisions and to operate more efficiently.
Businesses use data to understand their customers, create new products and services, and improve operational
efficiency by cutting costs and controlling risks. Government agencies, educational institutions, and not-for-profit
organizations also need high quality data to guide their operational, tactical, and strategic activities. As
organizations increasingly depend on data, the value of data assets can be more clearly established.
5 See English, 1999 and DAMA, 2009.
Many organizations identify themselves as ‘data-driven’. Businesses aiming to stay competitive must stop making
decisions based on gut feelings or instincts, and instead use event triggers and apply analytics to gain actionable
insight. Being data-driven includes the recognition that data must be managed efficiently and with professional
discipline, through a partnership of business leadership and technical expertise.
Furthermore, the pace of business today means that change is no longer optional; digital disruption is the norm. To
react to this, business must co-create information solutions with technical data professionals working alongside
line-of-business counterparts. They must plan for how to obtain and manage data that they know they need to
support business strategy. They must also position themselves to take advantage of opportunities to leverage data in
new ways.
2.4 Data Management Principles
Data management shares characteristics with other forms of asset management, as seen in Figure 1. It involves
knowing what data an organization has and what might be accomplished with it, then determining how best to use
data assets to reach organizational goals.
Like other management processes, it must balance strategic and operational needs. This balance can best be struck
by following a set of principles that recognize salient features of data management and guide data management
practice.
• Data is an asset with unique properties: Data is an asset, but it differs from other assets in important
ways that influence how it is managed. The most obvious of these properties is that data is not consumed
when it is used, as are financial and physical assets.
• The value of data can and should be expressed in economic terms: Calling data an asset implies that it
has value. While there are techniques for measuring data’s qualitative and quantitative value, there are not
yet standards for doing so. Organizations that want to make better decisions about their data should
develop consistent ways to quantify that value. They should also measure both the costs of low quality
data and the benefits of high quality data.
• Managing data means managing the quality of data: Ensuring that data is fit for purpose is a primary
goal of data management. To manage quality, organizations must ensure they understand stakeholders’
requirements for quality and measure data against these requirements.
• It takes Metadata to manage data: Managing any asset requires having data about that asset (number of
employees, accounting codes, etc.). The data used to manage and use data is called Metadata. Because
data cannot be held or touched, to understand what it is and how to use it requires definition and
knowledge in the form of Metadata. Metadata originates from a range of processes related to data creation,
processing, and use, including architecture, modeling, stewardship, governance, Data Quality
management, systems development, IT and business operations, and analytics.
[Figure 1 groups the principles under five headings: Data is valuable; Data Management Requirements are Business
Requirements; Data Management depends on diverse skills; Data Management is lifecycle management; Effective data
management requires leadership commitment.]
Figure 1 Data Management Principles
• It takes planning to manage data: Even small organizations can have complex technical and business
process landscapes. Data is created in many places and is moved between places for use. To coordinate
work and keep the end results aligned requires planning from an architectural and process perspective.
• Data management is cross-functional; it requires a range of skills and expertise: A single team cannot
manage all of an organization’s data. Data management requires both technical and non-technical skills
and the ability to collaborate.
• Data management requires an enterprise perspective: Data management has local applications, but it
must be applied across the enterprise to be as effective as possible. This is one reason why data
management and data governance are intertwined.
• Data management must account for a range of perspectives: Data is fluid. Data management must
constantly evolve to keep up with the ways data is created and used and the data consumers who use it.
• Data management is lifecycle management: Data has a lifecycle and managing data requires managing
its lifecycle. Because data begets more data, the data lifecycle itself can be very complex. Data
management practices need to account for the data lifecycle.
• Different types of data have different lifecycle characteristics: And for this reason, they have different
management requirements. Data management practices have to recognize these differences and be flexible
enough to meet different kinds of data lifecycle requirements.
• Managing data includes managing the risks associated with data: In addition to being an asset, data
also represents risk to an organization. Data can be lost, stolen, or misused. Organizations must consider
the ethical implications of their uses of data. Data-related risks must be managed as part of the data
lifecycle.
• Data management requirements must drive Information Technology decisions: Data and data
management are deeply intertwined with information technology and information technology
management. Managing data requires an approach that ensures technology serves, rather than drives, an
organization’s strategic data needs.
• Effective data management requires leadership commitment: Data management involves a complex
set of processes that, to be effective, require coordination, collaboration, and commitment. Getting there
requires not only management skills, but also the vision and purpose that come from committed
leadership.
2.5 Data Management Challenges
Because data management has distinct characteristics derived from the properties of data itself, it also presents
challenges in following these principles. Details of these challenges are discussed in Sections 2.5.1 through 2.5.13.
Many of these challenges refer to more than one principle.
2.5.1 Data Differs from Other Assets 6
Physical assets can be pointed to, touched, and moved around. They can be in only one place at a time. Financial
assets must be accounted for on a balance sheet. However, data is different. Data is not tangible. Yet it is durable; it
does not wear out, though the value of data often changes as it ages. Data is easy to copy and transport. But it is not
easy to reproduce if it is lost or destroyed. Because it is not consumed when used, it can even be stolen without
being gone. Data is dynamic and can be used for multiple purposes. The same data can even be used by multiple
people at the same time – something that is impossible with physical or financial assets. Many uses of data beget
more data. Most organizations must manage increasing volumes of data and the relation between data sets.
6 This section derives from Redman, Thomas. Data Quality for the Information Age (1996) pp. 41-42, 232-36; and Data Driven
(2008), Chapter One, “The Wondrous and Perilous Properties of Data and Information.”
These differences make it challenging to put a monetary value on data. Without this monetary value, it is difficult
to measure how data contributes to organizational success. These differences also raise other issues that affect data
management, such as defining data ownership, inventorying how much data an organization has, protecting against
the misuse of data, managing risk associated with data redundancy, and defining and enforcing standards for Data
Quality.
Despite the challenges with measuring the value of data, most people recognize that data, indeed, has value. An
organization’s data is unique to itself. Were organizationally unique data (such as customer lists, product
inventories, or claim history) to be lost or destroyed, replacing it would be impossible or extremely costly. Data is
also the means by which an organization knows itself – it is a meta-asset that describes other assets. As such, it
provides the foundation for organizational insight.
Within and between organizations, data and information are essential to conducting business. Most operational
business transactions involve the exchange of information. Most information is exchanged electronically, creating a
data trail. This data trail can serve purposes in addition to marking the exchanges that have taken place. It can
provide information about how an organization functions.
Because of the important role that data plays in any organization, it needs to be managed with care.
2.5.2 Data Valuation
Value is the difference between the cost of a thing and the benefit derived from that thing. For some assets, like
stock, calculating value is easy. It is the difference between what the stock cost when it was purchased and what it
was sold for. But for data, these calculations are more complicated, because neither the costs nor the benefits of
data are standardized.
Since each organization’s data is unique to itself, an approach to data valuation needs to begin by articulating
general cost and benefit categories that can be applied consistently within an organization. Sample categories
include 7:
• Cost of obtaining and storing data
• Cost of replacing data if it were lost
• Impact to the organization if data were missing
• Cost of risk mitigation and potential cost of risks associated with data
• Cost of improving data
• Benefits of higher quality data
• What competitors would pay for data
• What the data could be sold for
• Expected revenue from innovative uses of data
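As a rough illustration of applying such categories consistently, the sketch below tallies cost and benefit estimates for a single data set and reports the difference between them. The category keys, the figures, and the DataValuation structure are all hypothetical assumptions chosen for illustration; this is not a standard valuation method.

```python
from dataclasses import dataclass, field


@dataclass
class DataValuation:
    """Cost and benefit estimates for one data set, in a single currency."""
    costs: dict[str, float] = field(default_factory=dict)
    benefits: dict[str, float] = field(default_factory=dict)

    def total_cost(self) -> float:
        return sum(self.costs.values())

    def total_benefit(self) -> float:
        return sum(self.benefits.values())

    def net_value(self) -> float:
        # Value taken as benefit derived minus cost incurred.
        return self.total_benefit() - self.total_cost()


# Hypothetical estimates for a customer data set (illustrative numbers only).
customer_data = DataValuation(
    costs={
        "obtaining_and_storing": 120_000.0,
        "risk_mitigation": 30_000.0,
        "improvement": 50_000.0,
    },
    benefits={
        "higher_quality_data": 90_000.0,
        "expected_revenue_from_new_uses": 200_000.0,
    },
)

print(customer_data.net_value())
```

Keeping the categories fixed across data sets matters more than the precision of any one estimate, since it is consistency that allows comparison within an organization.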
7 While the DMBOK2 was preparing to go to press, another means of valuing data was in the news: the WannaCry ransomware
attack (17 May 2017) impacted more than 100K organizations in 150 countries. The culprits used the software to hold data
hostage until victims paid ransom to get their data released. http://bit.ly/2tNoyQ7.
A primary challenge to data asset valuation is that the value of data is contextual (what is of value to one
organization may not be of value to another) and often temporal (what was valuable yesterday may not be valuable
today). That said, within an organization, certain types of data are likely to be consistently valuable over time. Take
reliable customer information, for example. Customer information may even grow more valuable over time, as
more data accumulates related to customer activity.
In relation to data management, establishing ways to associate financial value with data is critical, since
organizations need to understand assets in financial terms in order to make consistent decisions. Putting value on
data becomes the basis of putting value on data management activities. 8 The process of data valuation can also be
used as a means of change management. Asking data management professionals and the stakeholders they support to
understand the financial meaning of their work can help an organization transform its understanding of its own data
and, through that, its approach to data management.
2.5.3 Data Quality
Ensuring that data is of high quality is central to data management. Organizations manage their data because they
want to use it. If they cannot rely on it to meet business needs, then the effort to collect, store, secure, and enable
access to it is wasted. To ensure data meets business needs, they must work with data consumers to define these
needs, including characteristics that make data of high quality.
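One way to make such needs concrete is to express each as a rule and measure records against it. The sketch below assumes two hypothetical requirements agreed with data consumers (every customer record has an email address, and every birthdate is a valid ISO 8601 calendar date) and reports the share of records that meet each. The field names, rules, and sample records are illustrative assumptions only.

```python
from datetime import date


def has_email(record: dict) -> bool:
    """Completeness rule: the email field is present and non-empty."""
    return bool(record.get("email"))


def has_valid_birthdate(record: dict) -> bool:
    """Validity rule: birthdate parses as an ISO 8601 calendar date."""
    try:
        date.fromisoformat(record.get("birthdate", ""))
        return True
    except (TypeError, ValueError):
        return False


def pass_rate(records: list[dict], rule) -> float:
    """Fraction of records that satisfy the rule (0.0 for an empty list)."""
    if not records:
        return 0.0
    return sum(1 for r in records if rule(r)) / len(records)


# Hypothetical customer records.
customers = [
    {"email": "a@example.com", "birthdate": "1980-04-12"},
    {"email": "", "birthdate": "1975-13-40"},               # missing email, invalid date
    {"email": "c@example.com", "birthdate": "12/04/1990"},  # wrong date format
    {"email": "d@example.com", "birthdate": "2001-09-30"},
]

print(pass_rate(customers, has_email))            # 3 of 4 records pass
print(pass_rate(customers, has_valid_birthdate))  # 2 of 4 records pass
```

Which rules to write, and what pass rate counts as acceptable, are decisions for the data consumers; the code only measures against whatever they define.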
Largely because data has been associated so closely with information technology, managing Data Quality has
historically been treated as an afterthought. IT teams are often dismissive of the data that the systems they create
are supposed to store. It was probably a programmer who first observed ‘garbage in, garbage out’ – and who no
doubt wanted to let it go at that. But the people who want to use the data cannot afford to be dismissive of quality.
They generally assume data is reliable and trustworthy, until they have a reason to doubt these things. Once they
lose trust, it is difficult to regain it.
Most uses of data involve learning from it in order to apply that learning and create value. Examples include
understanding customer habits in order to improve a product or service and assessing organizational performance or
market trends in order to develop a better business strategy. Poor quality data will have a negative impact on
these decisions.
Just as importantly, poor quality data is simply costly to any organization. Estimates differ, but experts think
organizations spend between 10% and 30% of revenue handling data quality issues. IBM estimated the cost of poor
quality data in the US in 2016 was $3.1 trillion. 9 Many of the costs of poor quality data are hidden, indirect, and
therefore hard to measure. Others, like fines, are direct and easy to calculate. Costs come from:
• Scrap and rework
• Work-arounds and hidden correction processes
• Organizational inefficiencies or low productivity
• Organizational conflict
• Low job satisfaction
• Customer dissatisfaction
• Opportunity costs, including inability to innovate
• Compliance costs or fines
• Reputational costs
8 For case studies and examples, see Aiken and Billings, Monetizing Data Management (2014).
9 Reported in Redman, Thomas. “Bad Data Costs the U.S. $3 Trillion per Year.” Harvard Business Review. 22 September 2016.
https://hbr.org/2016/09/bad-data-costs-the-u-s-3-trillion-per-year.
The corresponding benefits of high quality data include:
• Improved customer experience
• Higher productivity
• Reduced risk
• Ability to act on opportunities
• Increased revenue
• Competitive advantage gained from insights on customers, products, processes, and opportunities
As these costs and benefits imply, managing Data Quality is not a one-time job. Producing high quality data
requires planning, commitment, and a mindset that builds quality into processes and systems. All data management
functions can influence Data Quality, for good or bad, so all of them must account for it as they execute their work.
(See Chapter 13).
2.5.4 Planning for Better Data
As stated in the chapter introduction, deriving value from data does not happen by accident. It requires planning in
many forms. It starts with the recognition that organizations can control how they obtain and create data. If they
view data as a product that they create, they will make better decisions about it throughout its lifecycle. These
decisions require systems thinking because they involve:
• The ways data connects business processes that might otherwise be seen as separate
• The relationship between business processes and the technology that supports them
• The design and architecture of systems and the data they produce and store
• The ways data might be used to advance organizational strategy
Planning for better data requires a strategic approach to architecture, modeling, and other design functions. It also
depends on strategic collaboration between business and IT leadership. And, of course, it depends on the ability to
execute effectively on individual projects.
The challenge is that there are usually organizational pressures, as well as the perennial pressures of time and
money, that get in the way of better planning. Organizations must balance long- and short-term goals as they
execute their strategies. Having clarity about the trade-offs leads to better decisions.
2.5.5 Metadata and Data Management
Organizations require reliable Metadata to manage data as an asset. Metadata in this sense should be understood
comprehensively. It includes not only the business, technical, and operational Metadata described in Chapter 12,
but also the Metadata embedded in Data Architecture, data models, data security requirements, data integration
standards, and data operational processes. (See Chapters 4 – 11.)
Metadata describes what data an organization has, what it represents, how it is classified, where it came from, how
it moves within the organization, how it evolves through use, who can and cannot use it, and whether it is of high
quality. Data is abstract. Definitions and other descriptions of context enable it to be understood. They make data,
the data lifecycle, and the complex systems that contain data comprehensible.
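The kinds of description listed above can be captured in a simple record. The following sketch assumes a hypothetical DatasetMetadata structure whose fields mirror some of the questions Metadata is said to answer; it is not drawn from any particular Metadata standard or tool, and all names and values are illustrative.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetMetadata:
    """Hypothetical Metadata record for one data set."""
    name: str
    definition: str        # what the data represents
    classification: str    # how it is classified
    source_system: str     # where it came from
    allowed_roles: set[str] = field(default_factory=set)  # who can use it
    quality_checked: bool = False  # whether its quality has been assessed

    def can_access(self, role: str) -> bool:
        """Return True if the given role is permitted to use the data set."""
        return role in self.allowed_roles


customer_master = DatasetMetadata(
    name="customer_master",
    definition="One row per customer with current contact details",
    classification="Master Data",
    source_system="CRM",
    allowed_roles={"sales_analyst", "data_steward"},
)

print(customer_master.can_access("sales_analyst"))  # True
print(customer_master.can_access("contractor"))     # False
```

Because the record is itself data, it needs the same care as the data set it describes: an out-of-date definition or access list is misleading in the same way as an out-of-date customer address.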
The challenge is that Metadata is a form of data and needs to be managed as such. Organizations that do not
manage their data well generally do not manage their Metadata at all. Metadata management often provides a
starting point for improvements in data management overall.
2.5.6 Data Management is Cross-functional
Data management is a complex process. Data is managed in different places within an organization by teams that
have responsibility for different phases of the data lifecycle. Data management requires design skills to plan for
systems, highly technical skills to administer hardware and build software, data analysis skills to understand issues
and problems, analytic skills to interpret data, language skills to bring consensus to definitions and models, as well
as strategic thinking to see opportunities to serve customers and meet goals.
The challenge is getting people with this range of skills and perspectives to recognize how the pieces fit together so
that they collaborate well as they work toward common goals.
2.5.7 Establishing an Enterprise Perspective
Managing data requires understanding the scope and range of data within an organization. Data is one of the
‘horizontals’ of an organization. It moves across verticals, such as sales, marketing, and operations… Or at least it
should. Data is not only unique to an organization; sometimes it is unique to a department or other sub-part of an
organization. Because data is often viewed simply as a by-product of operational processes (for example, sales
transaction records are the by-product of the selling process), it is not always planned for beyond the immediate
need.
Even within an organization, data can be disparate. Data originates in multiple places within an organization.
Different departments may have different ways of representing the same concept (e.g., customer, product, vendor).
As anyone involved in a data integration or Master Data Management project can testify, subtle (or blatant)
differences in representational choices present challenges in managing data across an organization. At the same
time, stakeholders assume that an organization’s data should be coherent, and a goal of managing data is to make it
fit together in common sense ways so that it is usable by a wide range of data consumers.
One reason data governance has become increasingly important is to help organizations make decisions about data
across verticals. (See Chapter 3.)
2.5.8 Accounting for Other Perspectives
Today’s organizations use data that they create internally, as well as data that they acquire from external sources.
They have to account for different legal and compliance requirements across national and industry lines. People
who create data often forget that someone else will use that data later. Knowledge of the potential uses of data
enables better planning for the data lifecycle and, with that, for better quality data. Data can also be misused.
Accounting for this risk reduces the likelihood of misuse.
2.5.9 The Data Lifecycle
Like other assets, data has a lifecycle. To effectively manage data assets, organizations need to understand and plan
for the data lifecycle. Well-managed data is managed strategically, with a vision of how the organization will use
its data. A strategic organization will define not only its data content requirements, but also its data management
requirements. These include policies and expectations for use, quality, controls, and security; an enterprise
approach to architecture and design; and a sustainable approach to both infrastructure and software development.
The data lifecycle is based on the product lifecycle. It should not be confused with the systems development
lifecycle. Conceptually, the data lifecycle is easy to describe (see Figure 2). It includes processes that create or
obtain data, those that move, transform, and store it and enable it to be maintained and shared, and those that use or
apply it, as well as those that dispose of it. 10 Throughout its lifecycle, data may be cleansed, transformed, merged,
enhanced, or aggregated. As data is used or enhanced, new data is often created, so the lifecycle has internal
iterations that are not shown on the diagram. Data is rarely static. Managing data involves a set of interconnected
processes aligned with the data lifecycle.
The specifics of the data lifecycle within a given organization can be quite complicated, because data not only has a
lifecycle, it also has lineage (i.e., a pathway along which it moves from its point of origin to its point of usage,
sometimes called the data chain). Understanding the data lineage requires documenting the origin of data sets, as
well as their movement and transformation through systems where they are accessed and used. Lifecycle and
lineage intersect and can be understood in relation to each other. The better an organization understands the
lifecycle and lineage of its data, the better able it will be to manage its data.
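Lineage documentation can be as simple as recording, for each data set, the data sets it was derived from. The sketch below assumes a hypothetical set of data set names and traces a report back to its points of origin. It illustrates the idea only and does not describe any particular lineage tool.

```python
# Each key is a data set; each value lists the data sets it is derived from.
# All names are hypothetical, and the graph is assumed to contain no cycles.
lineage = {
    "sales_report": ["warehouse_sales"],
    "warehouse_sales": ["staging_orders", "staging_customers"],
    "staging_orders": ["order_entry_system"],
    "staging_customers": ["crm_system"],
    "order_entry_system": [],
    "crm_system": [],
}


def origins(dataset: str, graph: dict[str, list[str]]) -> set[str]:
    """Return the data sets with no upstream source that feed `dataset`."""
    upstream = graph.get(dataset, [])
    if not upstream:
        return {dataset}
    found: set[str] = set()
    for parent in upstream:
        found |= origins(parent, graph)
    return found


print(sorted(origins("sales_report", lineage)))
# ['crm_system', 'order_entry_system']
```

Walking the same structure in the other direction would show which downstream data sets are affected when a source changes, which is where lineage and lifecycle questions meet.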
The focus of data management on the data lifecycle has several important implications:
• Creation and usage are the most critical points in the data lifecycle: Data management must be
executed with an understanding of how data is produced, or obtained, as well as how data is used. It costs
money to produce data. Data is valuable only when it is consumed or applied. (See Chapters 5, 6, 8, 11,
and 14.)
10 See McGilvray (2008) and English (1999) for information on the product lifecycle and data.
[Figure 2 shows the key data lifecycle activities: Plan; Design & Enable; Create / Obtain; Enhance; Use;
Store / Maintain; Dispose of.]
Figure 2 Data Lifecycle Key Activities
• Data Quality must be managed throughout the data lifecycle: Data Quality Management is central to
data management. Low quality data represents cost and risk, rather than value. Organizations often find it
challenging to manage the quality of data because, as described previously, data is often created as a
by-product of operational processes and organizations often do not set explicit standards for quality. Because
the quality of data can be impacted by a range of lifecycle events, quality must be planned for as part of
the data lifecycle (see Chapter 13).
• Metadata Quality must be managed through the data lifecycle: Because Metadata is a form of data,
and because organizations rely on it to manage other data, Metadata quality must be managed in the same
way as the quality of other data (see Chapter 12).
• Data Security must be managed throughout the data lifecycle: Data management also includes
ensuring that data is secure and that risks associated with data are mitigated. Data that requires protection
must be protected throughout its lifecycle, from creation to disposal (see Chapter 7 Data Security).
• Data Management efforts should focus on the most critical data: Organizations produce a lot of data, a
large portion of which is never actually used. Trying to manage every piece of data is not possible.
Lifecycle management requires focusing on an organization’s most critical data and minimizing data ROT
(Data that is Redundant, Obsolete, Trivial) (Aiken, 2014).
2.5.10 Different Types of Data
Managing data is made more complicated by the fact that there are different types of data that have different
lifecycle management requirements. Any management system needs to classify the objects that are managed. Data
can be classified by type of data (e.g., transactional data, Reference Data, Master Data, Metadata; alternatively
category data, resource data, event data, detailed transaction data) or by content (e.g., data domains, subject areas)
or by format or by the level of protection the data requires. Data can also be classified by how and where it is stored
or accessed. (See Chapters 5 and 10.)
Because different types of data have different requirements, are associated with different risks, and play different
roles within an organization, many of the tools of data management are focused on aspects of classification and
control (Bryce, 2005). For example, Master Data has different uses and consequently different management
requirements than does transactional data. (See Chapters 9, 10, 12, and 14.)
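The classification dimensions described above (type of data, content, format, level of protection, and storage) can be sketched as a simple record attached to each managed object. This is an illustrative sketch only: the class names, the three protection levels, the catalog entries, and the select helper are assumptions made for the example, not a structure defined by the DMBOK.

```python
from dataclasses import dataclass
from enum import Enum


class DataType(Enum):
    """Types of data named in the text."""
    TRANSACTIONAL = "transactional data"
    REFERENCE = "Reference Data"
    MASTER = "Master Data"
    METADATA = "Metadata"


class Protection(Enum):
    """Hypothetical protection levels; real schemes vary by organization."""
    PUBLIC = 1
    INTERNAL = 2
    RESTRICTED = 3


@dataclass(frozen=True)
class Classification:
    data_type: DataType      # classification by type of data
    subject_area: str        # classification by content
    data_format: str         # classification by format
    protection: Protection   # classification by required protection
    storage: str             # classification by where it is stored


# Made-up catalog entries for the example.
catalog = {
    "customer": Classification(
        DataType.MASTER, "Party", "relational", Protection.RESTRICTED, "MDM hub"),
    "country_codes": Classification(
        DataType.REFERENCE, "Geography", "relational", Protection.PUBLIC, "MDM hub"),
    "sales_order": Classification(
        DataType.TRANSACTIONAL, "Sales", "relational", Protection.INTERNAL, "ERP"),
}


def select(entries, **criteria):
    """Return the sorted names of entries matching every given attribute value."""
    return sorted(
        name
        for name, c in entries.items()
        if all(getattr(c, key) == value for key, value in criteria.items())
    )


master_data = select(catalog, data_type=DataType.MASTER)
in_mdm_hub = select(catalog, storage="MDM hub")
```

Because each object carries several independent classifications, the same catalog can answer different control questions, for example which objects are Master Data or which objects share a storage location.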
2.5.11 Data and Risk
Data represents not only value but also risk. Low-quality data (inaccurate, incomplete, or out-of-date)
obviously represents risk because the information it conveys is not right. But data is also risky because it can be misunderstood
and misused.
Organizations get the most value from the highest quality data – available, relevant, complete, accurate, consistent,
timely, usable, meaningful, and understood. Yet, for many important decisions, we have information gaps – the
difference between what we know and what we need to know to make an effective decision. Information gaps
represent enterprise liabilities with potentially profound impacts on operational effectiveness and profitability.
Organizations that recognize the value of high quality data can take concrete, proactive steps to improve the quality
and usability of data and information within regulatory and ethical cultural frameworks.
The increased role of information as an organizational asset across all sectors has led to an increased focus by
regulators and legislators on the potential uses and abuses of information. From Sarbanes-Oxley (focusing on
controls over accuracy and validity of financial transaction data from transaction to balance sheet) to Solvency II
(focusing on data lineage and quality of data underpinning risk models and capital adequacy in the insurance
sector), to the rapid growth in the last decade of data privacy regulations (covering the processing of data about
people across a wide range of industries and jurisdictions), it is clear that, while we are still waiting for Accounting
to put Information on the balance sheet as an asset, the regulatory environment increasingly expects to see it on the
risk register, with appropriate mitigations and controls being applied.
Likewise, as consumers become more aware of how their data is used, they expect not only smoother and more
efficient operation of processes, but also protection of their information and respect for their privacy. This means
the scope of who our strategic stakeholders are as data management professionals can often be broader than might
have traditionally been the case. (See Chapters 2 Data Handling Ethics and 7 Data Security.)
Unfortunately, the balance sheet impact of information management increasingly arises when these
risks are not managed: shareholders vote with their share portfolios, regulators impose fines or restrictions on
operations, and customers vote with their wallets.
2.5.12 Data Management and Technology
As noted in the chapter introduction and elsewhere, data management activities are wide-ranging and require both
technical and business skills. Because almost all of today’s data is stored electronically, data management tactics
are strongly influenced by technology. From its inception, the concept of data management has been deeply
intertwined with management of technology. That legacy continues. In many organizations, there is ongoing
tension between the drive to build new technology and the desire to have more reliable data – as if the two…