Accelerating Cognitive Cybersecurity
Executive Summary
Human analysts bring to bear background knowledge, rules of thumb, contextualized reasoning, flexible assessments of similarity, and dynamic information gathering to identify and counter cyber-attacks, but they are easily overwhelmed by data. Machines can quickly look for known attack signatures in large volumes of data, yet they are myopic and easily fooled by adversaries who introduce syntactic variations of old attacks or invent new types of attacks. We will design and prototype a more semantic approach to cognitive cybersecurity that integrates diverse information sources and reasons about attacks as humans do, leveraging hardware acceleration at key points to scale information integration up to the enterprise and beyond.
Technical Challenge/Activities
At the core of our approach is an ontology, a knowledge base (KB), and a reasoner. A variety of data sources (sensors) assert facts into the KB. These include traditional facts about the state of host systems (e.g., from top or monit), network activity (e.g., from Wireshark), and the output of existing intrusion detection systems. But they also include facts extracted from unstructured data sources, such as vulnerability description feeds (e.g., CVE and CCE), hacker forums and chat rooms on the dark web, and blog posts. The facts will be encoded as RDF assertions supported by an extensive OWL ontology that, at the highest level, has classes for the means, consequences, and targets of attacks. Rules supplied by domain experts are encoded over the ontology using SWRL, Jena rules, and the SPARQL query language. This allows the system to combine background knowledge with current sensor data to reason about and detect attacks, identifying the means, consequences, and targets of each attack. The figure to the right depicts the system's architecture.
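To make the data flow concrete, the sketch below shows how sensor observations could be asserted as RDF triples and retrieved with SPARQL using the Apache Jena API mentioned above. It is a minimal illustration under assumed names: the ex: namespace and properties such as runsApplication, hasVersion, and visitedSite are hypothetical placeholders, not the project ontology.

```java
// Minimal sketch: asserting sensor facts into a Jena model and querying with SPARQL.
// The ex: namespace and property names are illustrative assumptions only.
import org.apache.jena.rdf.model.*;
import org.apache.jena.query.*;

public class SensorFactDemo {
    static final String EX = "http://example.org/cybersec#";

    public static void main(String[] args) {
        Model kb = ModelFactory.createDefaultModel();

        // A host sensor reports that host1 is running Internet Explorer 7.
        Resource host = kb.createResource(EX + "host1");
        Resource ie = kb.createResource(EX + "InternetExplorer7");
        host.addProperty(kb.createProperty(EX, "runsApplication"), ie);
        ie.addProperty(kb.createProperty(EX, "hasVersion"), "7");

        // An application-level gateway reports a visit to a previously unvisited site.
        host.addProperty(kb.createProperty(EX, "visitedSite"),
                         kb.createResource(EX + "site42"));

        // A SPARQL query that an analyst or a rule might use over the same KB.
        String q = "PREFIX ex: <" + EX + "> " +
                   "SELECT ?host ?app WHERE { ?host ex:runsApplication ?app }";
        try (QueryExecution qe = QueryExecutionFactory.create(QueryFactory.create(q), kb)) {
            ResultSet rs = qe.execSelect();
            while (rs.hasNext()) {
                QuerySolution sol = rs.next();
                System.out.println(sol.getResource("host") + " runs " + sol.getResource("app"));
            }
        }
    }
}
```

In the full system these assertions would come from the host, network, and text-extraction sensors, with the OWL ontology and rule base layered over the same model.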
For example, a domain expert could assert the fact that Microsoft Internet Explorer versions 6 through 8 have a “use after free” vulnerability. In the ontology, use-after-free is an instance of backdoor, which is a subclass of malicious-code-execution, which in turn is a subclass of means. A rule that integrates a variety of sensor data might say that if an affected version of IE is running (as detected by a host sensor), the user visited a previously unvisited site (as detected by an application-level gateway) that has a negative reputation (as reported by a commercial provider), and a connection was subsequently opened to a machine in a known range of zombie addresses (as detected by Wireshark and listed by SORBS), then an attack is likely occurring. Given this line of reasoning, it is straightforward to generate a concise alert that lets a human analyst drill down to the underlying raw data that produced the facts at the head of the chain of inference.
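For illustration only, the detection logic in this example might be sketched as a forward-chaining rule for Jena's GenericRuleReasoner, as shown below. The ex: vocabulary (vulnerableVersionOfIE, hasNegativeReputation, zombieAddress, likelyUnderAttack) is a hypothetical stand-in for the project ontology; the production rules would be expressed in SWRL or Jena rule syntax over that ontology.

```java
// Hedged sketch of the IE "use after free" detection chain as a Jena rule.
// All ex: terms are hypothetical placeholders for the project ontology.
import org.apache.jena.rdf.model.*;
import org.apache.jena.reasoner.rulesys.GenericRuleReasoner;
import org.apache.jena.reasoner.rulesys.Rule;
import org.apache.jena.util.PrintUtil;

public class AttackRuleDemo {
    static final String EX = "http://example.org/cybersec#";

    public static void main(String[] args) {
        // Register the prefix so the rule parser can expand ex: qualified names.
        PrintUtil.registerPrefix("ex", EX);

        // If a host runs a vulnerable IE version, visits a badly reputed site,
        // and then connects to a known zombie address, flag a likely attack.
        String rules =
            "[ieUseAfterFree: " +
            " (?host ex:runsApplication ?app) (?app ex:vulnerableVersionOfIE 'true') " +
            " (?host ex:visitedSite ?site) (?site ex:hasNegativeReputation 'true') " +
            " (?host ex:connectedTo ?addr) (?addr ex:zombieAddress 'true') " +
            " -> (?host ex:likelyUnderAttack 'true') ]";

        Model facts = ModelFactory.createDefaultModel();
        // ... sensor facts asserted here, as in the earlier sketch ...

        GenericRuleReasoner reasoner = new GenericRuleReasoner(Rule.parseRules(rules));
        InfModel inf = ModelFactory.createInfModel(reasoner, facts);

        // Each inferred ex:likelyUnderAttack triple can back a concise alert whose
        // provenance is the chain of sensor facts that fired the rule.
        StmtIterator alerts = inf.listStatements(
            null, inf.getProperty(EX + "likelyUnderAttack"), (RDFNode) null);
        while (alerts.hasNext()) {
            System.out.println("ALERT: " + alerts.next());
        }
    }
}
```

Because the inferred conclusion is derived from explicit sensor facts, the provenance of those facts supports the drill-down alert described above.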
Potential Impact
A theme that will pervade our work is the adversarial nature of the task. The use of abduction/induction is one approach to being robust to new attack vectors and types introduced to circumvent existing facts in the KB. A more pernicious type of attack targets the KB itself by manipulating sensor data to, for example, probe its contents by observing responses, overwhelm the reasoner by forcing complex chains of inference, or increase the number of false positives and thereby degrade trust in the system. We will do an initial exploration in year one, but hope to dive deeper into this issue in the out-years.
The expected outputs of the first year of this effort are research papers and functional demos; the demos will be produced quarterly and the papers are expected near the end of the year. The first year is scheduled roughly as follows:
- Q1: Stand up a simulation environment and an end-to-end demo using an existing ontology and a KB pre-populated with facts from prior extraction efforts. Identify and obtain historical data from a “real” network, e.g., from UMBC’s OIT. Construct a typology of adversarial attacks on the KB. Measure precision, recall, and run times.
- Q2: Develop and prototype acceleration of deductive inference. Use core graph analytics to add rules to the reasoner based on analysis of known attacks. Develop methods for countering the adversary for selected elements of the typology. Measure run times.
- Q3: Develop and evaluate hardware acceleration for graph analytics. Develop and demonstrate abductive reasoning that locally identifies previously unknown attack means. Measure the impact of the adversary-countering methods on precision, recall, and run time.
- Q4: Measure fully accelerated run times. Demonstrate abductive reasoning coupled with global inductive inference to identify previously unknown attack means, along with resistance to adversaries.
Resources
Member | Affiliation | Area
Tim Oates (lead) | Computer Science and Electrical Engineering | Artificial Intelligence, Machine Learning
Anupam Joshi | Computer Science and Electrical Engineering | Semantic Web, Security
Tim Finin | Computer Science and Electrical Engineering | Knowledge Representation and Reasoning, Semantic Web
Milt Halem | CSEE and CHMPR | High Performance Computing
Karuna Joshi | Computer Science and Electrical Engineering | Cloud Computing, Data Science
Zhiyuan Chen | Information Systems | Semantic Search and Data Integration, Privacy Preservation
Aryya Gangopadhyay | Information Systems | Graph Analytics, Data Visualization
Yelena Yesha | CSEE and CHMPR | Analytics
Postdoc (30%) | Computer Science and Electrical Engineering | Acceleration
2.5 GRAs | CSEE/IS | Various