Detecting Insider Threat Behaviors Using Social Media Platforms

Authors: Michael Williams, DSc, and Babur Kohy, DSc
Date Published: 12 October 2022

Insider threats have long been a problem for organizations, and they continue to surface in what should be the most trusted organizations. An insider threat is the potential for an individual who has or had authorized access to an organization’s assets to use that access, either maliciously or unintentionally, in a way that could negatively affect the organization.1 These threats originate from malicious users, careless users and compromised users,2 which concerns public industries, governments and private citizens. Historically, 47% of breaches have been due to insider threats.3

There is no single established method for detecting and mitigating insider threats, and organizations need more effective measures. One approach to detecting potential insider threats is using machine learning (ML) to analyze data retrieved from social media. However, gathering and analyzing such data can be considered an invasion of privacy.

Protecting private information is essential, and it is important to understand the ethical concerns and costs that arise when analyzed information leaks beyond the people who need to know it. Incidents that threaten security or privacy cost consumers and taxpayers millions of US dollars each year in training and staffing. There is also the cost of the leaked information itself and of hardening infrastructure after a breach has already caused harm.

It is important for IT professionals to understand the availability of social media data, the ML methods that can be used to identify possible insider threats, and the privacy and policy implications of retrieving information from multiple social media platforms. The possibilities that ML opens up have drawn the attention of policy makers and government leaders.

Using ML to Identify Threats

Research was conducted to investigate possible insider threats using ML techniques, investigate policies that could be put in place to protect employees’ data and broaden the understanding of this evolving capability in terms of privacy. Using Facepager and ML, the researchers analyzed 500,000 social media posts to determine whether they could be associated with users who could pose insider threats. Facepager is a free, open-source software tool that can be used to extract data from various social media platforms including YouTube, Facebook, Amazon and Twitter (figure 1).4 It can retrieve millions of posts from a social media page in minutes, including information such as the creation date, the last updated date, emoticons, identification numbers, replies and messages.

Figure 1—FacePager Application Tool

Source: Facepager, https://facepager.software.informer.com/3.6/

A trust score algorithm was written in R to analyze the data. Figure 2 shows the posts that were exported to a structured comma-separated values (CSV) file. Once in this format, the data can be analyzed using R or Python. In R, beginners can quickly solve complex problems with prebuilt packages written by other developers.5 Although Python has hundreds of libraries that perform complex operations useful in data analytics, web development and other areas, it does not have the thousands of prebuilt packages that R does.6

Figure 2—Retrieving Data From Social Media

Source: Facepager, https://facepager.software.informer.com/3.6/
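
Once posts are in a CSV file, loading them for analysis takes only a few lines. The following is a minimal Python sketch, not the researchers’ code. The column names (id, created_time, message), the delimiter and the sample rows are all assumptions made for illustration; the columns in a real export depend on the platform and on the fields selected in the tool.

```python
import csv
import io


def load_posts(file_obj, text_column="message", delimiter=","):
    """Read exported posts and keep only the rows that contain text."""
    reader = csv.DictReader(file_obj, delimiter=delimiter)
    posts = []
    for row in reader:
        text = (row.get(text_column) or "").strip()
        if text:
            row[text_column] = text
            posts.append(row)
    return posts


# Hypothetical sample standing in for an exported file.
SAMPLE_EXPORT = (
    "id,created_time,message\n"
    "1,2022-01-05T10:00:00,Great game last night\n"
    "2,2022-01-06T11:30:00,\n"
    "3,2022-01-07T09:15:00,\"Budget vote today, officials disagree\"\n"
)

posts = load_posts(io.StringIO(SAMPLE_EXPORT))
print(len(posts), "posts with text")
```

With a real export, the in-memory sample would be replaced by an opened file, for example open(path, newline="", encoding="utf-8").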

ML can be used to query a data set of thousands or even millions of posts and give each a score based on a baseline social media data set. There are many other social media data sets, but this analysis focuses on a data set retrieved from Fox News posts. In this example, although many posts were considered normal and posed no risk for an organization, a few hundred were concerning and one was categorized as extremely concerning. The higher a post’s word impact score, which reflects an abundance of words about government sectors and officials or other potentially concerning words, the higher the chance of an insider threat.
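
The article does not publish the trust score algorithm, so the following Python sketch shows only one simple way a word impact score of this kind could be computed: by summing the weights of watch-list terms that appear in a post. The term list, the weights, the thresholds and the sample posts are all hypothetical and are not taken from the research.

```python
import re
from collections import Counter

# Hypothetical watch list (term -> weight); not taken from the research.
TERM_WEIGHTS = {
    "government": 1.0,
    "senator": 1.0,
    "agency": 1.0,
    "leak": 2.0,
    "classified": 3.0,
}


def word_impact_score(text, weights=TERM_WEIGHTS):
    """Sum the weights of watch-list terms in the text, counting repeats."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    return sum(weight * counts[term] for term, weight in weights.items())


def categorize(score, concerning=2.0, extreme=5.0):
    """Map a score to a label; the thresholds are illustrative only."""
    if score >= extreme:
        return "extremely concerning"
    if score >= concerning:
        return "concerning"
    return "normal"


sample_posts = [
    "Great game last night",
    "The agency should explain the leak",
    "I could leak classified government files",
]
for post in sample_posts:
    score = word_impact_score(post)
    print(f"{score:4.1f}  {categorize(score):22}  {post}")
```

A plain keyword count like this cannot tell a threat from ordinary discussion of the news, so a high score is at most a prompt for human review, not a finding.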

Security and Privacy Considerations

Privacy is a human need. Although there are benefits to using scraping tools such as Facepager, these tools can also breach the privacy of social media users. The security practitioner must find a balance between advancing technology to help protect against insider threats and enforcing the individual’s privacy protections. Although some social media platforms such as Facebook have prevented the scraping of personally identifiable information (PII), separate pieces of information can still be linked, filtered and searched using ML to reveal it.

If the posts retrieved from social media and their associated trust scores are compromised, whether through vulnerabilities in cloud storage, processing or upgrades, the leaked information could damage reputations for years and potentially affect careers. Lawsuits are therefore possible if organizations are found negligent when handling sensitive data, regardless of whether the social media pages are open to the public. The mishandling and unauthorized public release of data could be unethical at best and warrant a lawsuit at worst. Many malicious actors’ sole purpose is to obtain reputation-damaging information to hold organizations and individuals hostage until ransoms are paid. For this reason, data should be protected with the highest regard. Data auditing should be implemented to monitor the downloading of information, encryption should be used to uphold the need-to-know policy when sending data in the cloud, and authentication should be enforced to ensure that only the intended personnel access the data. For physical security, the systems that store this information should be kept behind a secure door protected by either a badge reader or a camera directed at the entry point.

Although data collection using Facepager can benefit organizations, analyzing the results can be a major challenge. Validation and attribution online are difficult; therefore, a dedicated team of investigators is needed to process the related data and safeguard the data collection process from start to end.

Investigating other social media platforms and comparing the results to this industry perspective could be a useful next step in the process.

The concern around insider threats goes further than current employees. Protections may also need to be enacted for future employment opportunities. Although it is within the confines of the law for an employer to inquire about an employee’s social media posts, there should be laws that prevent a past employer from sharing retrieved data with other organizations where the employee seeks employment. A policy at the federal level could prevent organizations from preserving social media data, or the results derived from it, for longer than 3 years. This policy could prevent organizations from perpetually barring an individual from employment.
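
A retention limit like the 3-year policy suggested above could be enforced with a periodic purge of stored records. The following Python sketch illustrates the idea under stated assumptions: the record layout (post_id, collected_at) is hypothetical, and 3 years is approximated as 3 x 365 days, which ignores leap days.

```python
from datetime import datetime, timedelta, timezone

# Approximates the 3-year limit; ignores leap days (assumption).
RETENTION = timedelta(days=3 * 365)


def purge_expired(records, now=None, retention=RETENTION):
    """Return only the records collected within the retention window."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - retention
    return [r for r in records if r["collected_at"] >= cutoff]


# Hypothetical stored records; the field names are illustrative.
now = datetime(2022, 10, 12, tzinfo=timezone.utc)
records = [
    {"post_id": "a1", "collected_at": datetime(2018, 6, 1, tzinfo=timezone.utc)},
    {"post_id": "b2", "collected_at": datetime(2021, 3, 15, tzinfo=timezone.utc)},
]
kept = purge_expired(records, now=now)
print([r["post_id"] for r in kept])
```

In practice the purge would also need to cover derived results such as trust scores and any backups, since the proposed policy applies to both the data and the results.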

Conclusion

Despite the multiple detection methods used to protect the most sensitive information, insiders who are assumed to be trusted employees and have unimpeded access to sensitive information continue to cost organizations. In 2021, insider threats cost US organizations an average of US$15.4 million annually, up 34% from 2020.7, 8 Roughly two-thirds of US citizens have social media accounts.9 People use social media accounts to share ideas, explore new ideas and engage with those who have similar and dissimilar ideas. Therefore, there is an abundance of information on social media pages that can be retrieved and linked by available PII to obtain work locations, schools or organizations with which users are associated. These associations could possibly be stored in a cloud environment for the use of multiple organizations.

This loss of privacy negatively affects the consumer, but the loss of intellectual property could also negatively affect national security and put lives in jeopardy, such as with stolen military weapon technology, methods and sources. The security of systems holding sensitive data needs to include safeguards for protecting privacy. Organizations and governments must find a balance between protecting privacy and proactively detecting insider threats.

Endnotes

1 Park, W.; Y. You; K. Lee; “Detecting Potential Insider Threat: Analyzing Insiders’ Sentiment Exposed in Social Media,” Security and Communication Networks, vol. 3, 2018
2 Santos, E. E.; et al.; “Modeling Insider Threat Types in Cyber Organizations,” 2017 IEEE International Symposium on Technologies for Homeland Security, USA, 2017
3 Ibid.
4 Saeed, A. M.; S. R. Hussein; C. M. Ali; T. A. Rashid; “Medical Dataset Classification for Kurdish Short Text Over Social Media,” Data in Brief, March 2022
5 Tippmann, S.; “Programming Tools: Adventures With R,” Nature, vol. 517, 29 December 2014
6 White, M.; “The Need for Speed: Julia vs. Python,” January 2022
7 Moses, L.; “How Fox News Became the Most-Engaged News Site on Facebook,” Digiday, 27 June 2016
8 Bloomberg, “Global Cybersecurity Study: Insider Threats Cost Organizations $15.4 Million Annually, Up 34 Percent From 2020,” 25 January 2022
9 Op cit Santos, et al.

Michael Williams, DSc

Is a technology leader with more than 20 years of experience. He has developed innovative technologies for public and private organizations to achieve actionable outcomes. Williams’s academic research intends to pioneer insider threat detection using publicly available information to predict an individual’s trustworthiness using trust scores. Prior to joining private industry, Williams served in the United States military for 20 years. He volunteers regularly with local academic and nonprofit organizations.

Babur Kohy, DSc

Is a results-oriented cybersecurity leader with hands-on experience within multiple cybersecurity domains. Kohy lectures extensively on deep and dark web techniques for the identification and exploitation of dark net gateways to enhance personal security and anonymity. His current research focus is on cyberresiliency, nonattributable communication networks, the metaverse and actionable defense.