Auditing Big Data in Enterprises

Author: Abdullah Al-Mansour, Security+
Date Published: 1 November 2017

There is no stagnation in information security. One major national incident often leads to more robust reporting requirements, paperwork and additional duties. In 2013, Edward Snowden’s actions became a catalyst of change for accountability, insider threat programs and the auditing of privileged users. Though the resulting good practices highlight what needs to be done to adapt to new threats posed by those with privileged access, the strategy to accomplish this mission can be outdated.

How can an information systems security officer (ISSO) or information systems security manager (ISSM) find suspicious behavior among the breadth and depth of information that comes pouring out of information systems? A system might have 10 users or 100 users, each putting in eight hours of activity per day, in addition to continuous background chatter mixed with various service groups and working group accounts. Most auditors are responsible for multiple systems and are likely updating plans and baselines, performing compliance checks, giving security education classes and briefings, attending mandatory meetings, approving or denying requests for accounts, and addressing myriad other activities. Depending on the size of a system and the auditor’s review logs, the system may produce a week’s worth of data in one day. The auditor is expected not only to perform the due care of ensuring the log exists and is uncorrupted, but also to review the logs for abnormalities and malicious behavior.

The amount of data reviewed has changed the scope of an information security professional from an auditor to a data mining and analytics expert. That change demands a new set of skills.

System Audit

A system audit is a countermeasure used to review and analyze the actions of users on a system. In an age where separation of duties is best practice, the system audit is typically performed by a designated security professional rather than a system administrator. An audit is, by design, a shallow review of system events that concludes quickly, as opposed to a deep analysis that continues for weeks or months. Data mining may seem counterintuitive to an auditor. After all, one of the primary roles of an ISSO or ISSM is to ensure information integrity. Defending against the manipulation of data by authorized or unauthorized persons is a founding principle of information security. However, data mining and analytics are rooted in manipulating data.

To audit big data, the word “audit” must be left behind; it is an insufficient term to describe the security review of such a system. Just as “objective,” “goal” and “mission” should not be used interchangeably but instead denote different durations (short term, medium term and long term, respectively), “auditing” properly describes a technique for smaller amounts of data. A standalone system can be audited. A peer-to-peer system can be audited. A networked system that exists over a wide area network (WAN) or a local area network (LAN) and produces encyclopedic volumes of data weekly cannot be audited. These more complex networks must be data mined.

Preexisting Resources

Thankfully, popular software packages, such as MATLAB, have inadvertently addressed the needs of the security professional by proving to be highly reliable tools in fields that use mathematics and scientific modeling to analyze data. These same tools, which already underpin other professional communities’ methodologies, must be embraced and adapted for a post-Edward Snowden security environment.

At the base level, ISSOs must speak the language of computing, not just compliance. Whether the language is C++, Visual Basic or Python, to data mine event-related logs effectively, the ISSO will need to become familiar with a programming language. The ISSO must understand, on a conceptual basis, dictionaries, lists, arrays (e.g., two-dimensional and three-dimensional arrays), Boolean values, function definitions, conditional statements and loops, to name a few. The ISSO is often the first line of defense and must become more engineering-minded as the burden of catching and detecting malicious activity increases. Security departments will need to invest in engineering education as it pertains to the science of data manipulation and analytics.
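The concepts listed above can be illustrated with a minimal Python sketch. The log format and account names here are invented for illustration; a real event log would require parsing specific to its layout.

```python
# Minimal sketch: dictionaries, lists, loops and conditionals applied to
# event-log triage. The "user,event" record format is a hypothetical example.

def count_failures_by_user(log_lines):
    """Tally failed logon events per user with a dictionary and a loop."""
    failures = {}                      # dictionary: user -> failure count
    for line in log_lines:             # loop over the list of log entries
        user, event = line.split(",")  # each entry: "user,event"
        if event == "LOGON_FAILURE":   # conditional statement
            failures[user] = failures.get(user, 0) + 1
    return failures

sample_log = [                         # list of raw entries
    "alice,LOGON_SUCCESS",
    "bob,LOGON_FAILURE",
    "bob,LOGON_FAILURE",
    "svc_backup,LOGON_SUCCESS",
]
print(count_failures_by_user(sample_log))  # {'bob': 2}
```

Even a toy function such as this demonstrates why conceptual fluency matters: each construct (the dictionary, the loop, the conditional) maps directly to a question an auditor already asks of a log.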

Patterns

How does one find the proverbial needle in the haystack? The answer is through patterns.

No one has time to sift through mountains of data. The use of patterns makes data mining scalable. A system has patterns, and the patterns form baselines.

An information system is not limited to a primary baseline. Service accounts, privilege accounts, general user accounts, first shift, second shift and testing times can all be grouped individually and cohesively for the purpose of finding commonalities or discrepancies. A typical security event log can be divided into successes and failures and then compared for recurring failures that later lead to a success. Patterns can be found through coding, conditional statements, loops and the like.
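The recurring-failures-then-success pattern described above can be sketched in a few lines of Python. The event tuples and the threshold value are assumptions for illustration, not a prescribed detection rule.

```python
# Hypothetical sketch: flag accounts with repeated logon failures that are
# later followed by a success, a classic brute-force indicator.

def failures_then_success(events, threshold=3):
    """Return users whose run of >= threshold failures ends in a success."""
    streaks = {}       # user -> current run of consecutive failures
    flagged = set()
    for user, outcome in events:
        if outcome == "FAILURE":
            streaks[user] = streaks.get(user, 0) + 1
        else:  # SUCCESS resets the streak, flagging long runs first
            if streaks.get(user, 0) >= threshold:
                flagged.add(user)
            streaks[user] = 0
    return flagged

events = [
    ("mallory", "FAILURE"), ("mallory", "FAILURE"),
    ("mallory", "FAILURE"), ("mallory", "SUCCESS"),
    ("alice", "FAILURE"), ("alice", "SUCCESS"),
]
print(failures_then_success(events))  # {'mallory'}
```

The same loop-and-conditional structure generalizes to other groupings mentioned above, such as service accounts or shift windows, by changing what the key and the outcome represent.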

Analytics

Once patterns are gathered and centralized, analytics can be employed to measure the frequency of occurrence, file sizes, the number of files executed and average times of use. The math involved allows a data miner to grasp the big picture. Individuals are normally overwhelmed by the sheer volume of information, but automating pattern-recognition techniques makes big data manageable.
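As a sketch of such measurements, the following Python fragment computes a few of the statistics named above over hypothetical per-session records (the field names and values are invented):

```python
# Hypothetical sketch: simple analytics over per-session records.
from statistics import mean

sessions = [
    {"user": "alice", "duration_min": 480, "bytes": 1_200_000},
    {"user": "bob",   "duration_min": 465, "bytes": 900_000},
    {"user": "alice", "duration_min": 490, "bytes": 1_350_000},
]

avg_duration = mean(s["duration_min"] for s in sessions)  # average time of use
total_bytes = sum(s["bytes"] for s in sessions)           # data volume
logons_per_user = {}                                      # frequency of occurrence
for s in sessions:
    logons_per_user[s["user"]] = logons_per_user.get(s["user"], 0) + 1

print(logons_per_user)  # {'alice': 2, 'bob': 1}
```

Aggregates like these are the raw material for the baselines discussed in the previous section.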

The larger the sample size, the easier it is to determine patterns of normal and abnormal behavior. Network haystacks are bombarded by algorithms that notify the information archeologist about the probes of an insider threat.

Education

As with all new developments, education is a founding necessity of data mining. The benefits of coding to gather information and analytics to dissect it are lost if a data miner does not know how to interpret the information. The ones and zeros must have substance. The averages that make up the bell curves of statistics determine the likelihood that an event has occurred, is occurring or will occur.

Such statistics are useless to the untrained reviewer. There are several reputable organizations that offer free classes for those who want to pursue careers as data analysts. Udacity1 is an online learning platform that offers beginner, intermediate and expert classes and teaches data analysis in Python using the NumPy and pandas libraries. edX2 is another free website that has formed educational partnerships with Harvard University (Cambridge, Massachusetts, USA), Microsoft, Massachusetts Institute of Technology (MIT) (Cambridge, USA) and others. edX offers an introduction to data analysis using Microsoft Excel.

Lessons Learned

There is no one magic formula for auditing big data. Experiences have to be translated into code and built upon. For example, a script was deployed to check for the daily audit logs; should the day’s log not exist, the ISSO would be notified through automation. Each day the script checked for the expected date of the audit and, as expected, the audit log bearing that date was in the appropriate write-protected folder. However, the script was not checking that the previous days’ audit logs remained in the same folder, and each day the new file overwrote the original. The error in audit scripting may have been an isolated event, or the overwrite could have been systemic. Small lessons learned, such as this, help to develop a more refined automation process that measures information assurance, and that system of measurement can be spread across the enterprise to find similar outliers on other networks.
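A corrected version of that check might look like the following sketch. The folder path, the naming convention and the seven-day window are all assumptions for illustration; the point is that the script verifies the whole recent history, not just today’s file.

```python
# Hypothetical sketch of the corrected check: verify not only that today's
# audit log exists, but that prior days' logs are still present, guarding
# against silent overwrites. The naming convention is an assumption.
import os
from datetime import date, timedelta

def missing_audit_logs(folder, days_back=7, today=None):
    """Return the expected daily log filenames that are absent from folder."""
    today = today or date.today()
    missing = []
    for offset in range(days_back):
        day = today - timedelta(days=offset)
        name = f"audit_{day.isoformat()}.log"   # assumed naming convention
        if not os.path.exists(os.path.join(folder, name)):
            missing.append(name)
    return missing
```

In practice, a non-empty return value would trigger the same automated notification to the ISSO that the original script already provided.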

Data Structure

Another aid in the war on outliers is data representation. There is nothing worse than being an ISSO for a system that has only raw data. From the old Windows event viewer to a Solaris audit log, raw files are heinous to survey. The least a system administrator could do is delimit the lines and include some column headers. A descending numerical index could also help.

The flow of the data should be organized and, given today’s functionality, employ the option of graphical representation: Linear representation, bar graphs, pie charts, analytics and color can help make the data much easier to interpret. Thousands of lines of data and countless hours of scrolling can, in fact, become a five-minute study of a line graph accurately displaying patterns of activity for the day, week or even month.
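The collapse from thousands of lines to a single graph starts with aggregation. The following sketch, using invented timestamps, reduces raw entries to an hourly activity series that any plotting tool can then render as a line graph:

```python
# Hypothetical sketch: collapse raw timestamped entries into an hourly
# activity series suitable for a line graph. Timestamps are invented.
from collections import Counter

timestamps = ["08:15", "08:47", "09:05", "09:30", "09:55", "13:10"]
hourly = Counter(t.split(":")[0] for t in timestamps)  # hour -> event count

series = sorted(hourly.items())
print(series)  # [('08', 2), ('09', 3), ('13', 1)]
```

With pandas installed, an expression such as `pd.Series(dict(series)).plot()` would turn the same aggregate into the day-at-a-glance line graph described above.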

Communication

A key ingredient in any process, including data mining, is communication. If an ISSM is unaware of the anomalies or trends occurring across the enterprise, the definitions and pattern identification that can mitigate and prevent those trends may not manifest. The user computing habits of Company A, which is located on the west coast of a country, need to be juxtaposed with those of Company A’s satellite factory residing on the east coast of that country. An obstruction to good communication is self-preservation. There exists a natural reluctance to share information because it could paint a negative portrayal of a person or work location, and this reluctance hinders the overarching data mining process. If one site participates in honest data collection and another site does not, eventually neither will. Without communication, data mining on an enterprise level will always be hindered.

Communication provides a centralized location where analytics can be gathered and assessed to find trends and patterns. If the data are padded, the ability to develop countermeasures is slowed and their effectiveness is reduced. As with most successful enterprise-level endeavors, effective communication starts at the top levels. If there is a policy in place to foster cohesion, functional managers can execute that policy and craft processes that support and sustain it. Weak communication is indicative of a poorly constructed policy, which translates into a misunderstood vision and inevitably leads to restrictive communication.

Networks are producing more information than ever before. Auditors must be equipped with the tools needed to meet the challenges of ensuring confidentiality, integrity, access control and availability. To achieve this mission, an auditor’s mind-set must evolve beyond a small-data management skill set. Without the tools that come from data mining and analytics, the auditor will be overwhelmed on a daily and weekly basis. As a result, the quality of review will degrade from assured due diligence to mere due care, or perhaps due diligence for only the first couple hundred lines of captured data.

Learning how to write scripts that loop through audit logs in search of specific patterns is crucial. Graphical representation of the data opens the door to analytics and allows the auditor to see the big picture and identify trends. Communication will facilitate the distribution of data on user behavior and increase the pool of information for better statistical analysis. These are keys for effective enterprise auditing.

Quantitative

Success at an enterprise level requires an ISSO to write scripts that provide a greater analysis of events. This data might be the number of people who log in every week or the daily size of an audit log. The aggregation of this data can be used to determine the average rate of occurrence, which, in turn, establishes a baseline of normality for a system. Recording the frequency of occurrence can also be used to anticipate events such as malfunctioning scripts or influxes of user activity. Quantitative analysis adds depth to an audit and introduces models by which events can be predicted based upon numerical trends.
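A baseline-and-deviation check of the kind described above can be sketched as follows. The log sizes and the two-standard-deviation threshold are invented for illustration; a real baseline would be tuned to the system being measured.

```python
# Hypothetical sketch: establish a numerical baseline (mean and standard
# deviation of daily audit-log sizes in MB) and flag the latest day if it
# falls outside two standard deviations of that baseline.
from statistics import mean, stdev

daily_log_mb = [102, 98, 110, 105, 97, 101, 240]  # invented sample data

mu = mean(daily_log_mb[:-1])       # baseline average from prior days
sigma = stdev(daily_log_mb[:-1])   # baseline spread
latest = daily_log_mb[-1]
is_anomalous = abs(latest - mu) > 2 * sigma

print(is_anomalous)  # True: 240 MB far exceeds the ~102 MB baseline
```

A spike like this could indicate an influx of user activity, a malfunctioning script or, as in the lessons-learned example, a logging process behaving unexpectedly; the statistic only says that the day deserves a closer look.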

Conclusion

Without analytics, enterprise-level auditing is a diminished discipline, limited in scope and effectiveness. Without an educated auditing workforce, armed with a programming language for automation and a data-mining philosophy and skill set, the needs of leaders at the enterprise level will go unmet. Leaders will have neither the data needed for large-scale analysis nor a workforce capable of delivering that data on a weekly or daily basis.

The beauty of analytics, from a security perspective, is that it allows the security department to align with the critical functions of corporate business. It can be used to discover recurring incidents and common trends that might otherwise have been missed. Establishing numerical baselines or quantified data can supplement a normal auditor’s tasks and enhance the auditor’s ability to see beneath the surface of what is presented in an audit. Good communication of analyzed data gives decision makers a better view of their systems through a holistic approach, which can aid in the creation of enterprise-level goals. Data mining adds dimension and depth to the auditing process at the enterprise level.

Endnotes

1 Udacity, https://www.udacity.com/
2 edX, https://www.edx.org/

Abdullah Al-Mansour, Security+
Is an information systems security professional. His interests include analytics, data mining and technology.