TROOPERS26: Integrating Incident Analysis and Digital Forensics Tooling for Automated Compromise Detection

Last week I gave a talk at #TROOPERS26: Integrating Incident Analysis and Digital Forensics Tooling for Automated Compromise Detection. I discussed the challenges of incident analysis, such as growing storage capacities and the lack of integration between tools, and presented a modular framework that integrates established forensic and analysis tools using a decision-tree-based control mechanism. A workflow controls the execution of 14 integrated analysis tools in order to reproduce the manual analysis process usually performed by analysts. The framework also assesses whether a system has been compromised and compiles an analyst-oriented report, which we looked at together with the audience in a live demonstration. The evaluation results were promising, as the framework identified all compromised systems. However, a significant number of false positive classifications were also observed. Possible future extensions include recovering already deleted files to detect Indicators of Compromise that are currently missed. Additionally, our team wants to integrate artificial intelligence into the workflow to help with data processing and to make more decisions automatically. The slides will be published next week on the conference website, and I will add the link to this blog post when they become available. A more detailed description of the talk can be found in the following sections. Looking forward to #TROOPERS27!

Introduction

As the frequency of cybersecurity incidents continues to increase, the need for effective Incident Analysis (IA) has never been greater. Although incidents themselves may be unavoidable, organizations can certainly influence how they manage and respond to these events. The IA process seeks to understand how and why an incident occurred, often revisiting definitions of unlawful, unauthorized, or unacceptable actions to identify their origin. IA involves collecting evidence, reconstructing events, and assessing the impact on the organization’s systems and data.

In 2010, Garfinkel predicted several challenges for the next ten years of Digital Forensics (DF) that also apply to IA, including the growing size of storage devices, the insufficient time available to analyze all data, and the increasing complexity of analyses due to a wider variety of operating systems and file formats, which necessitates the use of multiple analysis tools. As forensic analyses become more time-consuming and costly, the need for efficient tools becomes critical. For instance, Garfinkel noted that imaging a 2 TB hard drive already took around seven hours, and storage capacities have since increased significantly. Furthermore, forensic tools must be constantly updated or integrated with others to remain relevant. Large vendors struggle to cover all use cases, making it necessary to rely on smaller, often open-source tools. However, these tools are often outdated, poorly documented, or abandoned due to lack of funding. A final issue is the lack of triage functionality in forensic tools. Given the length and complexity of forensic analyses, a more efficient approach would be to prioritize critical findings and immediately return them to the analyst for further investigation. However, few existing tools implement such a system, highlighting a gap in current forensic methodologies.

These predictions have largely materialized, leading to a widespread need for multiple analysis tools. This challenge motivated our work.

Contemporary works increasingly focus on implementing new tools and technologies to automate IA and on generalizing workflows for Incident Response (IR). One example is Akatosh, a framework developed within the scope of a project by the U.S. Department of Homeland Security’s Transition to Practice Program (TTP). Akatosh automates IA by capturing time snapshots of memory images. When an Intrusion Detection System (IDS) triggers an alert, the corresponding snapshots are automatically analyzed, and a report is generated. The analysis is performed using Volatility and Rekall. However, Akatosh is limited to memory analysis and does not extend to disk image analysis.

The GRR Rapid Response Framework (GRR) is an open-source, multi-platform tool designed for continuous remote live forensic investigations within enterprise environments. It allows analysts to collaboratively manage investigations through a centralized web interface. GRR is built to support automated analysis and integrates additional forensic tools, such as The Sleuth Kit (TSK). The framework operates on a client-server model, where flows are initiated from the server to client machines. For example, an analyst can configure a flow to have client machines search for specific malware hashes, with any findings returned to the server for further analysis. However, a key limitation of the GRR framework is its real-time investigative focus, which may not align with workflows requiring non-live analysis.

The AUDIT toolkit integrates open-source tools within a Java Expert System Shell (Jess). It allows users with limited technical expertise to create rules for automated task execution during forensic investigations of disk images. It consists of three main components: a MySQL database, a knowledge base, and a core component. The database stores information about forensic tools, including their input requirements and expected output. Examples of integrated tools include bulk_extractor, which extracts artifacts such as emails, credit card numbers, and internet history. The knowledge base contains facts and rules that enable forward chaining, allowing the system to infer new information about the target disk based on previously collected data. The core component is responsible for executing forensic tools according to the rules defined in the knowledge base. While the system generates reports, these do not contain specific investigative findings but rather document which tools were executed, when, and why. The AUDIT toolkit primarily focuses on extracting personal data from suspect computers, such as images, emails, and documents, rather than detecting malware infections or other system compromises.
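To make the idea of forward chaining more concrete, here is a minimal sketch in plain Python. It is not Jess syntax and does not reproduce AUDIT's actual rules: the fact strings, the rules, and the function name forward_chain are all made up for illustration. The loop keeps applying rules whose premises are already known until no new fact can be derived.

```python
def forward_chain(facts, rules):
    """Derive new facts until a fixed point is reached.

    rules is a list of (premises, conclusion) pairs; a rule fires when
    all of its premises are contained in the current set of facts.
    """
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= known and conclusion not in known:
                known.add(conclusion)
                changed = True
    return known


# Hypothetical rules, invented for this example only.
rules = [
    (frozenset({"filesystem:ntfs"}), "os:windows"),
    (frozenset({"os:windows", "goal:find_emails"}), "run:bulk_extractor"),
]

derived = forward_chain({"filesystem:ntfs", "goal:find_emails"}, rules)
print(sorted(derived))
```

The second rule can only fire after the first one has added "os:windows", which is the kind of step-by-step inference the knowledge base enables.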

The SCAlable Realtime Forensics (SCARF) framework is a container-based software tool designed to support automated scalable forensic analysis. SCARF accepts disk images as input and generates a searchable database of forensic artifacts, which is maintained as a cluster of Elasticsearch nodes to enable efficient querying. The actual analysis is performed by workers, each operating as an independent Docker container, with every worker handling one specific task. Tasks can include operations such as SHA-1 hashing, running bulk_extractor, or using ExifTool to extract metadata from files. SCARF offers fast processing and scalability, as its containerized architecture allows for dynamic resource allocation. Additional containers can be deployed as needed, depending on the available hardware resources, making the system adaptable to different workloads. However, its primary focus is on evidence retrieval for DF rather than on understanding or reporting the root causes of a compromise, limiting its use in IA scenarios.

Farrell presents the development of an automated reporting system built on top of the PyFlag forensic tool, an open-source solution implemented by the Australian government for forensic media and network analysis. PyFlag, short for Python Forensic and Log Analysis GUI, utilizes TSK for file system analysis and includes its own scripting language, PyFlash, which allows users to develop extensions. In this work, PyFlag was extended with two new plugins focusing on improved reporting and the integration of an AOL Instant Messenger file identifier and extractor. However, the study also highlights several limitations. The system suffers from usability issues, making installation and setup challenging. Additionally, PyFlag faces significant integration challenges, as every new tool or component must be specifically adapted to fit within its architecture, limiting its flexibility and extensibility.

Methodology

The foundation of the incident analysis framework is a decision tree that models the investigative steps a human analyst would typically follow based on the results of previous tools. This structure was developed based on insights gained from interviews with ten experienced security analysts at ERNW Research. The following figure provides a summarized overview of the workflow. The input is a disk image, followed by multiple analysis steps. Initially, fundamental data about the image is retrieved. The process then diverges into two main paths: event analysis and artifact analysis. From the artifact analysis, specific files are extracted and subjected to a detailed malware analysis in a subprocess. All analysis results are stored in a central TinyDB database, so that subsequent analysis steps can read earlier findings and persist their own. This structure enables a comprehensive evaluation of the results and facilitates the final report generation with Jinja2 templates.

Figure: Simplified workflow of the framework
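The control flow described above can be sketched roughly as follows. This is a simplified illustration under several assumptions, not the framework's real code: the Step class, the execute function, and the four dummy tool wrappers are hypothetical, and a plain dictionary stands in for the central TinyDB database so that the example runs with the standard library only.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    """One node of the decision tree: run a tool, then choose follow-up steps."""
    name: str
    run: Callable[[str, dict], dict]  # (image path, results so far) -> findings
    branches: list = field(default_factory=list)  # list of (condition, Step)


def execute(step, image, results):
    """Run a step, store its findings, and follow every branch whose condition holds."""
    findings = step.run(image, results)
    results[step.name] = findings  # stand-in for a write to the central database
    for condition, next_step in step.branches:
        if condition(findings):
            execute(next_step, image, results)
    return results


# Dummy tool wrappers with made-up output (hypothetical, for illustration only).
def image_info(image, results):
    return {"os": "windows"}

def event_analysis(image, results):
    return {"alerts": 0}

def artifact_analysis(image, results):
    return {"suspicious_files": ["C:/Users/demo/evil.exe"]}

def malware_analysis(image, results):
    # Later steps can read what earlier steps stored.
    files = results["artifact_analysis"]["suspicious_files"]
    return {"analyzed": len(files)}


malware = Step("malware_analysis", malware_analysis)
artifacts = Step(
    "artifact_analysis",
    artifact_analysis,
    [(lambda f: bool(f["suspicious_files"]), malware)],
)
events = Step("event_analysis", event_analysis)
root = Step(
    "image_info",
    image_info,
    [(lambda f: True, events), (lambda f: True, artifacts)],
)

results = execute(root, "disk.img", {})
print(list(results))
```

In this sketch the malware analysis only runs if the artifact analysis actually extracted files, which mirrors how an analyst would decide on the next step based on the previous result.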

The following list provides an overview of the tools currently orchestrated by the framework. Some internal tools are omitted for privacy and confidentiality reasons.

Evaluation & Conclusion

The framework was capable of correctly identifying the genuinely compromised disk images, assigning them an appropriate compromise rating. In total, approximately 69 % of IoCs were successfully detected. Missed indicators were primarily due to deleted files. However, the integration with event log analysis proved effective, as associated event traces could still point to malicious activity even in the absence of the original files. Nonetheless, the overall classification accuracy of the framework was relatively low, at 0.33, primarily due to a tendency to overestimate compromise on clean images. This was largely attributed to high false positive rates in third-party tools used in the analysis, particularly VMRay (23 %) and Sigma (96 %).
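To show how a framework can detect every compromised image and still end up with low accuracy, here is a small sketch of the underlying arithmetic. The counts are invented purely for illustration and are not the evaluation data; the function name classification_metrics is likewise hypothetical.

```python
def classification_metrics(tp, fp, tn, fn):
    """Compute accuracy, recall, and false positive rate from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
    }


# Made-up example: 4 compromised images, all flagged; 6 clean images, 5 of them flagged.
metrics = classification_metrics(tp=4, fp=5, tn=1, fn=0)
print(metrics)
```

With these invented numbers the recall is perfect, yet the accuracy is only 0.5, because almost all clean images are misclassified. The same effect, driven by false positives on clean images, explains the low accuracy reported above.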

Despite this limitation, the framework demonstrated strong potential in data reduction, managing to exclude 99.86 % of all files from further analysis and focusing only on relevant artifacts. This represents a significant advantage both in terms of execution time and in reducing the workload for analysts reviewing the results. In conclusion, while the compromise rating mechanism requires further refinement to improve accuracy, the analysis results highlight the framework’s value as a preliminary triage tool. It supports analysts by filtering out large volumes of irrelevant data, identifying potential IoCs, and offering an initial assessment of disk images. However, the final determination of whether a system has been compromised must ultimately be made by experienced security analysts.

The execution times of the framework’s analyses were evaluated. While execution time naturally increased with the size of the disk image, even the largest image in the evaluation was processed in under three hours, excluding the malware analysis conducted with VMRay. The malware analysis component, although highly valuable, presented a performance bottleneck. This challenge was mitigated through a triage mechanism and the generation of preliminary reports, ensuring that the most relevant findings were prioritized for analysts.
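One way such a triage step could look is sketched below. This is an assumption about the general idea, not the framework's implementation: the Finding class, the numeric score, and the preliminary_report function are hypothetical. Findings that are already available are ranked and reported, while slow steps are listed as still pending.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    source: str       # analysis step that produced the finding
    description: str
    score: int        # hypothetical relevance score, higher means more urgent


def preliminary_report(findings, pending, top_n=3):
    """Return the top-ranked findings as text, plus a note on unfinished steps."""
    ranked = sorted(findings, key=lambda f: f.score, reverse=True)
    lines = [f"[{f.score:3d}] {f.source}: {f.description}" for f in ranked[:top_n]]
    if pending:
        lines.append("Pending: " + ", ".join(pending))
    return "\n".join(lines)


# Made-up findings for illustration.
findings = [
    Finding("event_analysis", "Suspicious service installation", 40),
    Finding("artifact_analysis", "Executable in temp directory", 90),
    Finding("event_analysis", "Failed logon burst", 20),
]

report = preliminary_report(findings, pending=["malware_analysis"], top_n=2)
print(report)
```

The analyst can start working on the highest-ranked findings right away, and the report can be regenerated once the malware analysis has finished.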

The implementation and evaluation of the framework offer several opportunities for future extension and improvement. Additional modules with extended functionality could be developed to enhance the analytical capabilities of the framework. For example, the integration of file carving techniques could allow for the recovery of deleted files, thereby reducing the number of artifacts that are currently missed because they are no longer present in the file system. Beyond that, the integration of artificial intelligence techniques offers substantial potential to enhance detection accuracy and further narrow the gap between automated and human-driven analysis in future work.

Cheers!
Anne