Data Integrity—Information Security’s Poor Relation

Date Published: 1 November 2011

Information security has become a visible issue in business, on the move and at home. Its practice places emphasis on preventing attacks that target availability (e.g., denial of service) and those that result in infections by malicious software (malware) that allows a third party to perform unauthorized actions on data and information (e.g., theft, disclosure, modification, destruction).

The Stuxnet worm reported in 2010 altered the operation of an industrial process and was designed to damage physical equipment and modify the operator’s monitoring indications to show that the equipment was working normally.1 This was an attack on data integrity (also referred to as a “semantic attack”) that, if and when replicated on other targets, could cause major problems in critical information infrastructures such as utilities, emergency services, air traffic control and others with a large IT component on which society relies. Data governance is an essential component for strengthening data integrity.

A recent article in the ISACA Journal presents a data governance framework developed by Microsoft for privacy, confidentiality and compliance. It discusses the roles of people, process and technology; the data life cycle; and the principles of data privacy and confidentiality. It also provides links to more detailed papers on the subject of trustworthy computing.2

Here, these topics will be expanded upon, focusing on data integrity, the standards and best practices that support it, and the role of data governance. This article also introduces a nonproprietary data governance framework.

Of the three main domains of information security, availability is closely associated with technology and lends itself to being measured. Downtime is visible and can be expressed as an absolute value (e.g., in minutes per incident) or as a percentage, and it is simple enough to understand that “five nines” (99.999 percent) availability means a total cumulative downtime of around five minutes in a year. Data center operators know what it takes to achieve this.
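
The arithmetic behind these figures is worth making explicit. A minimal sketch in Python follows; the availability levels shown are illustrative:

    # Allowed annual downtime for a given availability percentage.
    MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes in a non-leap year

    def allowed_downtime_minutes(availability_pct):
        """Return the maximum cumulative downtime per year, in minutes."""
        return MINUTES_PER_YEAR * (1 - availability_pct / 100)

    for level in (99.9, 99.99, 99.999):
        print(f"{level}% availability -> {allowed_downtime_minutes(level):.1f} minutes/year")
    # 99.999 percent ("five nines") allows roughly 5.3 minutes of downtime per year.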

Confidentiality is easy enough to explain, but makes sense only if data and documents have been classified into categories that reflect the business need to protect them, such as “public,” “restricted to,” “embargoed until” and “secret.”

The technical people who provide IT infrastructure and services should not be expected to perform this classification, as they may not have enough business knowledge to do so and, through outsourcing and/or cloud computing, they may even be external to the business. Therefore, business functions must take ownership of the data and their classification process, while IT service and technology providers support this with tools and processes such as identity access management (IAM) controls and encryption.

The simplest metric for confidentiality is binary: An item that should not be disclosed either has not been (confidentiality is preserved) or has been (confidentiality is lost). Unfortunately, this is not a very useful metric, as it does not reveal the impact of such a disclosure, which can range from mild embarrassment to a breach of national security.

When it comes to integrity, the situation is more complex because the word means different things to different people. This creates fertile ground for miscommunication and misunderstandings, with the risk that the activity will not be done well enough because of unclear accountabilities.

What is Meant by “Integrity”?

The importance of data integrity can be illustrated simply: A person needs hospital treatment that includes taking a daily medication dosage of 10 milligrams (mg). By accidental or deliberate intervention, the electronic record of the treatment is changed to a dosage of 100 mg—with fatal consequences. In another example, what if, as in a work of fiction that predates the Stuxnet attack of 2010, the control systems of a nuclear power station are interfered with to show normal conditions while, in fact, a chain reaction has been triggered?3 Are professionals aware of the many definitions of “data integrity”? According to:

  • A security officer—“Data integrity” may mean that data cannot be modified undetectably. From the perspective of data and network security, data integrity is the assurance that information can be accessed or modified only by those authorized to do so. An examination of this concept could show that “integrity” also includes personal integrity (i.e., trust, trustworthiness, reliability) in addition to systems integrity (i.e., antivirus protection, structured system development life cycles [SDLCs], peer review of source code, extensive testing).
  • A database administrator—“Data integrity” may mean that data entered into a database are accurate, valid and consistent. Database administrators would more than likely also discuss entity integrity, domain integrity and referential integrity—concepts that may be unfamiliar to an infrastructure expert well versed in ISO 27000 or the US National Institute of Standards and Technology (NIST) Special Publication (SP) 800 series.
  • A data architect or modeler—“Data integrity” may mean that primary entities should be unique and not null. Uniqueness of the entities within a data set means that there are no duplicates within the data set and that there is a key that can be used to uniquely access each entity within the data set.
  • A data owner (i.e., the subject matter expert)—“Data integrity” may be a measure of quality, as it ensures that there are appropriate business rules defining the relationships between entities and that these provide validation mechanisms, such as testing for orphaned records (illustrated in the sketch after this list).
  • A vendor—“Data integrity” is:
    Accuracy and consistency of stored data, indicated by an absence of any alteration in data between two updates of a data record. Data integrity is imposed within a database at its design stage through the use of standard rules and procedures and is maintained through the use of error checking and validation routines.4
  • An online dictionary—“Data integrity” is the:
    Quality of correctness, completeness, wholeness, soundness and compliance with the intention of the creators of the data. It is achieved by preventing accidental or deliberate, but unauthorized, insertion, modification or destruction of data in a database. Data integrity is one of the six fundamental components of information security.5
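
To make the database-centered notions above concrete, here is a minimal Python sketch of entity, domain and referential integrity checks over in-memory records. The tables, columns and the dosage limit are invented for illustration (they echo the hospital example given earlier):

    # Minimal sketch: entity, domain and referential integrity checks,
    # expressed over in-memory records. All names and limits are illustrative.

    patients = [
        {"patient_id": 1, "name": "A. Smith"},
        {"patient_id": 2, "name": "B. Jones"},
    ]
    prescriptions = [
        {"rx_id": 10, "patient_id": 1, "dose_mg": 10},
        {"rx_id": 11, "patient_id": 9, "dose_mg": 10},  # orphaned: no patient 9
    ]

    # Entity integrity: primary keys must be unique and not null.
    keys = [p["patient_id"] for p in patients]
    assert all(k is not None for k in keys), "null primary key"
    assert len(keys) == len(set(keys)), "duplicate primary key"

    # Domain integrity: values must fall within the attribute's legal range
    # (an assumed 0-50 mg limit would catch the 10 mg -> 100 mg error).
    assert all(0 < rx["dose_mg"] <= 50 for rx in prescriptions), "dose out of range"

    # Referential integrity: every foreign key must point to an existing entity.
    valid = set(keys)
    orphans = [rx for rx in prescriptions if rx["patient_id"] not in valid]
    print("orphaned records:", orphans)  # flags rx_id 11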

There is no doubt that more definitions can be found. They overlap, address different issues and create semantic confusion, which may help explain why databases are often among the least protected objects in the IT infrastructure.

This is not the end of the problem statement. The decentralization of information systems and the availability of powerful programming environments for end users, particularly spreadsheets, have created potentially uncontrolled integrity vulnerabilities, because such spreadsheets are used to support executive decisions, possibly without due consideration of data quality and data integrity. How should this be categorized? It could be considered as:

  • An information security issue, given that data integrity cannot be guaranteed
  • A software quality issue, given that most spreadsheets are not subject to life-cycle management
  • A business intelligence issue, given that it leads to garbage in, garbage out (GIGO)

Perhaps it could be considered as all three, in which case it must be determined who (i.e., the data owner, the end user who designed the spreadsheet, the IT department or service provider, or all of them working together) should address these.
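
As an illustration of the first point, the sketch below validates manually entered values in a spreadsheet exported to CSV. The file name, column name and acceptable range are assumptions made for the example, not a prescription:

    # Minimal sketch: range-check a numeric column in a CSV export of a
    # spreadsheet. File, column and range are illustrative assumptions.
    import csv

    def validate(path, column, low, high):
        """Return (row_number, value) pairs that fail the range rule."""
        failures = []
        with open(path, newline="") as f:
            for i, row in enumerate(csv.DictReader(f), start=2):  # row 1 = header
                try:
                    value = float(row[column])
                except (KeyError, TypeError, ValueError):
                    failures.append((i, row.get(column)))
                    continue
                if not (low <= value <= high):
                    failures.append((i, value))
        return failures

    # Flag any dosage outside the valid range before the spreadsheet
    # feeds an executive report or a treatment record.
    print(validate("treatments.csv", "dose_mg", 0.5, 50.0))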

Triggers of Data Integrity Loss

The previous section used, as an example, untested and undocumented user-designed spreadsheets (aggravated by manual input, particularly when not assisted by validation of the values entered), but there are other, potentially more serious, triggers, such as the following (a detection sketch appears at the end of this section):

  • Changes to access permissions and privileges
  • Inability to track the use of privileged passwords, particularly when passwords are shared
  • End-user error that impacts production data
  • Vulnerable code in applications (e.g., backdoors)
  • Weak or immature change control and accreditation processes
  • Misconfiguration of security devices and software
  • Incorrectly or incompletely applied patches
  • Unauthorized devices connected to the corporate network
  • Unauthorized applications on devices connected to the corporate network
  • Inadequate or unapplied segregation of duties (SoD)

To complicate matters, the IT audit function may not have the critical mass to undertake audits covering all of these areas.
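
Several of these triggers (e.g., misconfiguration, incompletely applied patches, changes made outside change control) can be caught by baseline-and-compare integrity monitoring, the approach behind tools such as Tripwire. A minimal sketch, with illustrative file paths:

    # Minimal sketch: hash a set of watched files, store the digests as a
    # trusted baseline, then re-hash later and report any differences.
    import hashlib
    import json

    def snapshot(paths):
        """Map each file path to the SHA-256 digest of its contents."""
        digests = {}
        for path in paths:
            with open(path, "rb") as f:
                digests[path] = hashlib.sha256(f.read()).hexdigest()
        return digests

    def changed_files(baseline, current):
        """Return paths whose digests differ from the trusted baseline."""
        return sorted(p for p, d in current.items() if baseline.get(p) != d)

    watched = ["/etc/app/config.yaml", "/opt/app/startup.sh"]  # illustrative

    # Step 1: record the baseline (in practice, store it off-host, signed).
    with open("baseline.json", "w") as f:
        json.dump(snapshot(watched), f)

    # Step 2, on a schedule: compare the current state against the baseline.
    with open("baseline.json") as f:
        baseline = json.load(f)
    print("changed outside change control:", changed_files(baseline, snapshot(watched)))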

Attacks on Data Integrity

Attacks on data integrity involve intentional, unauthorized modifications of data at some point in their life cycle. For the purpose of this article, the data life cycle consists of:

  • Entering, creating and/or acquiring data
  • Processing and/or deriving data
  • Storing, replicating and distributing data
  • Archiving and recalling data
  • Backing up and restoring data
  • Deleting, removing and destroying data

Fraud is the oldest form of attack on data integrity, and it exists in many variants. These will not be discussed in this article, other than to mention an example that, in 2008, made front-page news worldwide: The “abuse of trust, forgery and unauthorized use of the bank’s computer systems” by a trader at Societe Generale (France) resulted in losses estimated at €4.9 billion.6 Judging from the number of publications and international conferences that deal with fraud, this issue is likely to remain high on the agenda for some time.

Web site defacements have affected many organizations in the private and public sectors for many years, but apart from some reputational damage, none could be considered as having been “catastrophic.”

Logic bombs (unauthorized software introduced into a system by one or more of its programmers/maintainers, through Trojan horses or by other means) can also impact data integrity, either by modifying data (as when a formula in a spreadsheet is incorrect) or by encrypting data and then demanding a ransom for the decryption key. There have been several such attacks in recent years, mainly affecting hard drives in personal computers. It should be expected that attacks of this type will, sooner or later, be launched against servers.

Unauthorized modifications of operating systems (OSs) (server and network) and/or applications software (such as undocumented backdoors), database tables, production data and infrastructure configuration are also considered to be attacks on data integrity. It can be assumed that the findings of IT audits regularly include weaknesses in key processes, particularly the management of privileged access, change management, SoD and the monitoring of logs. These weaknesses make such modifications possible and hard to detect (until an incident occurs).

Another form of attack on data integrity is interference with Supervisory Control and Data Acquisition (SCADA) systems, such as those used by critical infrastructures (e.g., electricity, water supply) and in industrial processes. Frequently, these are not installed, operated or managed by the IT function. The attack on the Iranian uranium enrichment facilities in 2010 was designed to modify the behavior of the centrifuges while displaying normal conditions in the control panels.7

It should be noted that many of these control systems are not connected to the Internet and, in the case of the injection of Stuxnet software, required a manual intervention,8 which confirms that “people” remain the weakest link in information security/assurance.

Aligning with Standards and Best Practices for Risk Management and Compliance

For enterprises that have not already done so, a good place to start planning defenses is the adoption of best practices such as COBIT Deliver and Support (DS) 11.6 Security requirements for data management, used in conjunction with its related section in the IT Assurance Guide: Using COBIT.9

These publications summarize the control objective and its value and risk drivers and offer a list of recommended tests of the control design.

ISACA has also published a series of documents mapping information security standards to COBIT 4.1; these are extremely valuable for practitioners and auditors. Additionally, an excellent article mapping the Payment Card Industry Data Security Standard (PCI DSS) v2.0 to COBIT 4.1 was recently published in COBIT Focus.10

An additional resource is available from Data Management Association International (DAMA): The DAMA Guide to the Data Management Body of Knowledge (DMBOK), specifically chapters 3 (Data Governance), 7 (Data Security Management) and 12 (Data Quality Management).11

From a compliance perspective, there is a growing body of legislation that places accountability for data integrity and information assurance (IA) on organizations. In the US, this includes the Data Quality Act, Sarbanes-Oxley Act, Gramm-Leach-Bliley Act, Health Insurance Portability and Accountability Act, and Fair Credit Reporting Act—all of which impose severe penalties for noncompliance. There is also the Federal Information Security Management Act, which can impose budgetary penalties for noncompliance. (A discussion of legislation outside of the US is beyond the scope of this article; however, two major pieces of comparable legislation are the European Union [EU] “Directive on Data Protection” and the EU “8th Company Law Directive” on statutory audit12, 13).

Improving Data Integrity

The adoption of best practices needs to be complemented by formalizing accountabilities for the business and IT processes that support and enhance data security.

Business Responsibilities
A program of data integrity assurance needs to address Detect, Deter (2D); Prevent, Prepare (2P); and Respond, Recover (2R).14 Because the business owns the data, the initiative must come from the business, and the role of the IT service provider (in-house or outsourced) should be one of implementation.

Good practices to adopt include:

  • Taking ownership of data and accountability for data integrity—There is no one else in the organization who can do this other than people in the appropriate business unit. This should be obvious when IT services and operations are outsourced, but when these are provided in-house, it is tempting and easy to believe that the data are owned by IT and that IT is responsible for maintaining confidentiality and integrity.

    Ownership requires a value assessment in the form of an estimation of the potential cost of lost data integrity, including direct financial losses (as is the case in fraud or major operational disruption), legal costs and reputational damage.
  • Access rights and privileges—The principles of need to know (NtK) and least privilege (LP) are good practice and, in theory, are not difficult to apply. However, social networks and the notion that everyone is an information producer push for greater openness and sharing, and they are becoming a force that resists and challenges the implementation of NtK and LP.

    The processes for requesting, changing and removing access rights should be formalized, documented, and regularly reviewed and audited. Privilege creep—when individuals change responsibilities and carry forward historical privileges—constitutes a serious business risk that can undermine proper SoD.

    It is common for organizations to lack a complete, up-to-date inventory of who has access to what and a complete list of user privileges. Several vendors offer products that collect such privileges in an automated manner (a minimal review sketch appears after this list).

    Even when NtK and LP have been implemented and supported by a strong IAM process, privileged access remains a sensitive area to be addressed and controlled. Privileged access gives unrestricted access to production data and source code; when users can bypass change control procedures, there is potential for serious damage.

    Business units that have database administrators and/or programmers responsible for applications should, at least, maintain an inventory of who has access to what and ensure that change logs are kept and reviewed. When privileged passwords are shared due to the nature of the technologies used, consideration should be given to applying tools that clearly identify the individual who accessed the facilities and that log the date, time and changes made.
  • SoD—This is a well-proven concept and is something that internal audit is more than likely to insist upon for all sensitive systems and transactions. The relentless pressure to reduce costs and run lean organizations is a powerful opposing force and, therefore, a domain of business risk.
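
The access-rights review described above can be sketched minimally: compare each user’s actual entitlements with those the user’s current role allows, and flag the excess as privilege creep. Roles, users and entitlement names here are invented for illustration:

    # Minimal sketch: flag entitlements that exceed what a user's current
    # role allows. All role and entitlement names are illustrative.

    role_entitlements = {
        "accounts_clerk": {"ledger.read", "ledger.post"},
        "dba": {"db.read", "db.write", "db.admin"},
    }

    users = [
        {"id": "jdoe", "role": "accounts_clerk",
         "entitlements": {"ledger.read", "ledger.post", "db.admin"}},  # creep
        {"id": "asmith", "role": "dba",
         "entitlements": {"db.read", "db.write"}},
    ]

    for user in users:
        excess = user["entitlements"] - role_entitlements[user["role"]]
        if excess:
            # Candidates for revocation and for the SoD review.
            print(f"{user['id']}: entitlements beyond current role: {sorted(excess)}")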

IT and End-user Support Responsibilities
Whoever provides information systems and technology operational services (i.e., business unit, in-house IT department, outsourcing service provider) has to demonstrate that appropriate measures—such as those defined in COBIT DS11 Manage data—are carried out to an appropriate level of maturity, and that appropriate performance and risk metrics are collected, monitored and reported.

End-user support teams (either part of IT or independent) are usually responsible for creating accounts and credentials for access to systems and data. These accounts and credentials must be documented fully and implemented only if the relevant authorizations have been formally issued.

Internal Audit Responsibilities
The role of auditors is to provide independent and objective assessments of the extent to which business and IT responsibilities for data integrity have been addressed and applied.

Imbalances in Accountabilities for Information Security and Assurance

IA is the practice of managing risks related to the use, processing, storage and transmission of information or data and the systems and processes used for those purposes. IA has grown from the practice of information security, which, in turn, grew out of practices and procedures of computer security.

Service providers (e.g., IT organizations, outsourcers) are clearly responsible for technologies and their operation and put measures in place to provide confidentiality, integrity and availability (CIA) in the operational environment. With regard to protecting data, they provide services such as backups and disaster recovery arrangements with clearly defined recovery time and recovery point objectives (i.e., how quickly service is restored and how much data may be lost in an incident), documented in service level agreements (SLAs). However, service providers do not have responsibility for data governance and its many related activities.

SLAs place clearly defined responsibilities on IT service providers, but not on data and system owners. This results in a lack of clarity related to accountabilities and, therefore, an inability to ensure that data have been properly classified and that the roles and responsibilities of data users and, in particular, privileged users are managed in a way that reflects their critical roles. As a result, data integrity remains the poor relation of information security and IA.

Data Integrity Metrics

There is little published material on key metrics, performance indicators and key risk indicators for data integrity in an information security context. The following may be helpful starting points (a sketch of one such metric appears after the list):

  • An inventory of privileged access rights, indicating who has access to what, who has permission to do what, and the date when a document was last reviewed and updated
  • An inventory of data that are subject to extraction, transformation and loading to another system
  • The number of users who have carried forward historical access rights and privileges
  • The number of orphaned or dormant accounts
  • The number of application systems that contain hard-coded access rights or backdoors
  • The number of instances in which production data had to be accessed in order to modify or correct them
  • The number or percentage of identified unauthorized accesses and/or changes to production data
  • The number of security issues that are associated with data (in one year/one month)
  • The number of systems that are not covered by the main corporate IAM solution
  • An index of incorrect or inconsistent data
  • The percentage of the enterprise (or critical application) data model that is covered by measures to protect integrity
  • The number of measures that are included in databases and applications to detect data inconsistencies
  • The number of measures that are implemented to detect unauthorized accesses to production data
  • The number of measures that are implemented to detect unauthorized accesses to OSs
  • The number of measures that are implemented to detect changes not subject to change-control procedures
  • Annual financial losses due to fraud perpetrated through computer systems
  • The number of data integrity attacks on SCADA systems
  • The number of press reports that arose from data integrity problems
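
As an example of how one of these metrics might be computed, the sketch below counts dormant accounts, assuming (arbitrarily) that “dormant” means no login in the past 90 days; the account data are invented:

    # Minimal sketch: count dormant accounts, defined here as no login in
    # 90 days. The data and the threshold are illustrative assumptions.
    from datetime import datetime, timedelta

    last_login = {                      # in practice, pulled from the IAM system
        "jdoe": datetime(2011, 10, 28),
        "asmith": datetime(2011, 3, 1),
        "svc_backup": None,             # never logged in
    }

    cutoff = datetime(2011, 11, 1) - timedelta(days=90)
    dormant = sorted(u for u, t in last_login.items() if t is None or t < cutoff)
    print(f"dormant accounts: {len(dormant)} ({', '.join(dormant)})")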

The Need for Data Governance

Data governance specifically addresses the information resources that are processed and disseminated. The key elements of data governance can be categorized into six major areas: data accessibility, data availability, data quality, data consistency, data security and data auditability. DAMA produced the DMBOK,15 which presents a comprehensive framework for data management and governance, including tasks to be performed and inputs, outputs, processes and controls.

Conclusion

GIGO is as valid today as it was when it was first formulated some 60 years ago. The difference between then and now is that the volume of data in digital form has grown exponentially, and this growth has not been accompanied by the development and strengthening of data governance disciplines. Of the three pillars of information security (CIA), availability remains the only component for which metrics are well defined and generally accepted.

The absence of data integrity metrics is an obstacle: without them, an enterprise cannot demonstrate that confidentiality or integrity is “better” or “worse” than it was before procedures and processes were introduced to manage them.

As long as data governance does not receive the same degree of attention as IT governance (and the latter often remains the weak link in corporate governance), organizations will be exposed to significant operational, financial, noncompliance and reputational risk.

Endnotes

1 Farwell, James P.; Rafal Rohozinski; “Stuxnet and the Future of Cyber War,” Survival, vol. 53, issue 1, 2011
2 Salido, Javier; “Data Governance for Privacy, Confidentiality and Compliance: A Holistic Approach,” ISACA Journal, vol. 6, 2010
3 Dobbs, Michael; The Edge of Madness, Simon & Schuster UK Ltd., UK, 2008
4 IBM, Top 3 Keys to Higher ROI From Data Mining, IBM SPSS white paper
5 YourDictionary.com, http://computer.yourdictionary.com/data-integrity
6 See Kerviel, Jerome; L’engrenage: Mémoires d’un trader, Flammarion, France, 2010, and Societe Generale, www.societegenerale.com/en/search/node/kerviel.
7 Op cit, Farwell
8 Broad, William J.; John Markoff; David E. Sanger; “Israeli Test on Worm Called Crucial in Iran Nuclear Delay,” The New York Times, 15 January 2011, www.nytimes.com/2011/01/16/world/middleeast/16stuxnet.html?pagewanted=all
9 IT Governance Institute, IT Assurance Guide: Using COBIT, USA, 2007, p. 212
10 Bankar, Pritam; Sharad Verma; “Mapping PCI DSS v2.0 With COBIT 4.1,” COBIT Focus, vol. 2, 2011
11 Data Management Association International (DAMA), The DAMA Guide to the Data Management Body of Knowledge, Technics Publications LLC, USA, 2009
12 European Union (EU), Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the Protection of Individuals With Regard to the Processing of Personal Data and on the Free Movement of Such Data
13 EU, Directive 2006/43/EC of the European Parliament and of the Council of 17 May 2006 on Statutory Audits of Annual Accounts and Consolidated Accounts, Amending Council Directives 78/660/EEC and 83/349/EEC and Repealing Council Directive 84/253/EEC
14 Adapted from the US Joint Chiefs of Staff, Joint Publication 3-28, “Civil Support,” USA, 14 September 2007
15 Op cit, DAMA

Ed Gelbstein, Ph.D.,
has worked in IT for more than 40 years and is the former director of the United Nations (UN) International Computing Centre, a service organization providing IT services around the globe to most of the organizations in the UN system. Since leaving the UN, Gelbstein has been an advisor on IT matters to the UN Board of Auditors and the French National Audit Office (Cour des Comptes), and is also a faculty member of Webster University, Geneva, Switzerland. He is a regular speaker at international conferences covering audit, risk, governance and information security, and is the author of several publications. Gelbstein lives in France and may be contacted at ed.gelbstein@gmail.com.