Threat Intelligence (TI) is one of the trendy new terms in the cybersecurity world, and many vendors offer their own threat intelligence solution. In the information age, the challenge is finding the right solution in time. Sometimes it is like finding a needle in a haystack, but luckily not always. This is what TI is about: going through huge amounts of data to find relevant information and put it to use.
This newsletter dives into the underlying issues of TI and describes the typical pitfalls encountered when learning to use it.
Overview
What is it?
In the course of a forensic investigation, it is common practice to list the various artefacts encountered, such as impacted or suspicious hosts or networks, URLs, or even registry keys or specific paths on the targeted machine. In a nutshell, this covers every piece of information related to the breach that could help identify the intruder or recognize the behavior of the malware. These artefacts are called Indicators of Compromise (IOCs).
To understand what an IOC is, let’s use an example which is not related to the IT security field: The Adventures of Tintin, the famous European comics.
So, what is the first thing that comes to your mind when you think of Captain Haddock? Most probably, a few colorful insults. Given the list below, try to guess which artefacts identify the Captain clearly and unambiguously:
“Bashi-bazouk”, “Coconuts”, “Freshwater Spacemen”, “MRKRPXZKRMTFRZ”,
“Heretic” and “Ectoplasms”.
Taken separately, only “Freshwater Spacemen” and “MRKRPXZKRMTFRZ” would lead a search engine clearly to our dear Archibald, although “Bashi-bazouk” might also be accepted if you set history aside. “Coconuts”, “Heretic” and “Ectoplasms”, on the other hand, are far too common without context to be good indicators.
To return to our subject: if Captain Haddock were a piece of malware, good IOCs to quickly recognize it would be “Freshwater Spacemen”, “MRKRPXZKRMTFRZ” and “Bashi-bazouk”.
That is all well and good, but it implies building your own database of IOCs, preferably aggregated from other trusted sources so that as much relevant information as possible is readily available. Threat intelligence feeds are one of these sources: a stream of data that can include IOCs. You can generate your own feeds, use external ones, or mix them together. Nowadays, plenty of feeds are freely available, but you can also turn to specialized vendors, at a cost.
Feeds are generated from multiple sources and for multiple purposes: researchers, vendors, CSIRT teams, abuse websites (anti-phishing, for example) and many more. In these feeds, you will usually find network footprints (IP addresses, domain names or URLs) that are ready to use, for example by injecting them into your SIEM. But first, gathering more information about them is crucial to avoid an overload of false positives. We will see why later.
Time is a key factor
Organized crime actors are now real professionals. Trying to defeat them on your own is nearly impossible. You can use simple block lists for a particular botnet, but that might not be enough. Creating IOCs takes time, and even when vendors release IOCs, they are already a few days old. Why is that?
Usually, when new malware is installed on a computer (D-5, where D is the day the IOCs are extracted), it takes a few days to detect it through its behavior (D-3). It then takes a few more days to analyze it and extract relevant IOCs from it (D). Add one more day to share the information (D+1) and probably another to integrate it into your TI process (D+2). By the time the IOCs reach your organization, they can be a week old. If you do not search your historical logs, a threat can stay hidden for a while, especially if the malware copies itself into a new file with a new signature.
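This delay is why a freshly received IOC is worth checking against logs that predate its arrival. The following is a minimal Python sketch of such a retroactive search. It assumes a hypothetical in-memory list of log entries with `timestamp` and `host` fields; the function name `retro_hunt` and the sample data are illustrative only, and a real deployment would query a SIEM or log store instead.

```python
from datetime import datetime, timedelta

def retro_hunt(logs, ioc_host, received_at, lookback_days=7):
    """Return past log entries that match an IOC received only now.

    `logs` is an assumed list of dicts with 'timestamp' (datetime) and
    'host' (str) keys; the field names are hypothetical.
    """
    window_start = received_at - timedelta(days=lookback_days)
    return [
        entry for entry in logs
        if window_start <= entry["timestamp"] <= received_at
        and entry["host"].lower() == ioc_host.lower()
    ]

# Hypothetical data: the IOC arrives about a week after the infection.
received = datetime(2024, 1, 10)
logs = [
    {"timestamp": datetime(2024, 1, 3, 9, 30), "host": "bad.example"},
    {"timestamp": datetime(2024, 1, 9, 14, 0), "host": "intranet.example"},
    {"timestamp": datetime(2023, 12, 1, 8, 0), "host": "bad.example"},
]
hits = retro_hunt(logs, "bad.example", received)
```

The lookback window is a parameter because the right depth depends on how long logs are retained and how old the IOC is estimated to be.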
This example also applies to other types of IOC. If a non-technical person clicks on a link in an email, reaches a phishing website and gives away their credentials, you will not detect the leak unless that person reports it.
So, as you can see, when you dive into TI, both past and future activity are important to check.
Context
The most important information in TI is not the list of IOCs, but the context they come from:
- Where does this IOC come from?
- What is it? A panel for a Command and Control (C2) server? A phishing website?
Having a list containing only IPs is meaningless. Of course, you could give this list to your security team and ask them to tell you whether these IPs are “real” threats or not. But again, without context, this mind-numbing process would probably be a waste of time, with no knowledge gained at the end.
There is no magic box for finding information, only digging and correlation. An IOC handed over without any additional information is like being asked to find where a screw goes when building something without the manual. For example, public IPs belong to providers and are usually shared, so a single IP without context will not be a good IOC. Without context such as related URLs, a time period, or a protocol, the digging ends there and you might as well discard it.
As previously said, context is paramount in threat intelligence. A security team's response will change drastically depending on whether an IOC corresponds to a C2 server or to a single bot. Another piece of information often missing or disregarded for IP IOCs is the origin and direction of the communication. An alert raised by your SIEM about external IPs linked to brute-force attacks should definitely be treated with lower priority than an alert indicating a connection from your internal network to an IP linked to a C2 server.
A good threat intelligence program needs as much information as possible to classify IOCs: information that comes from the feed itself, correlated with data you already possess. In our company, we use these enriched IOCs to create dedicated detection rules, which limits the false positive rate.
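The effect of IOC type and traffic direction on priority can be sketched in a few lines of Python. The labels (`c2`, `bruteforce`, `outbound`, and so on), the weights and the thresholds below are arbitrary assumptions chosen for illustration, not values from any product or standard.

```python
def alert_priority(ioc_type, direction):
    """Rank an alert from the IOC's role and the traffic direction.

    `ioc_type` and `direction` are assumed labels produced during
    enrichment; the numeric weights are illustration values only.
    """
    type_weight = {"c2": 3, "phishing": 2, "bot": 1, "bruteforce": 1}
    # Outbound traffic means an internal host initiated the connection,
    # which suggests an asset may already be compromised.
    direction_weight = {"outbound": 2, "inbound": 1}
    score = type_weight.get(ioc_type, 1) * direction_weight.get(direction, 1)
    if score >= 5:
        return "high"
    if score >= 3:
        return "medium"
    return "low"
```

With these assumed weights, an internal host contacting a C2 address is rated "high", while an inbound brute-force attempt from an external address is rated "low".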
Dark Side
Now you know what TI is about, and you are ready to collect your first feeds. How will you use them to create detection rules, and when? Why not just add a blacklist, define a few detection rules in your security assets and call it a day?
Ideally, you would be right. Unfortunately, although feeds are great resources, most of the time they cannot be used as is. You need at least to validate each entry, especially network ones, to avoid problems. If you are lucky, your devices will simply ignore any wrong entry, but in the worst-case scenario you might block your own assets or, for more complex IOCs, your devices might even crash.
Most of the time, feeds contain information coming from different sources: sandboxes, abuse websites, automated processes or manual input. A TI program usually means creating a dedicated database for gathering IOCs. You should also enrich it with information from public resources or from restricted resources related to your business (for example, SWIFT for banking). You can do this with scripts or dedicated tools like IntelMQ. For those familiar with the Business Intelligence field, Threat Intelligence is a sibling: BI tooling such as ETL (extract, transform, load) tools can be used just as well.
In any case, feeds are like the input of any other “application” and should be treated as such: verification and cleaning are a must!
Let’s consider another example: phishing feeds. Frequently, what you get is exactly what users reported on the website. You can find various scripts and encodings, such as Cyrillic characters, Punycode, any flavor of UTF or other code pages, e.g.
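As an illustration of validating network entries, the sketch below uses Python's standard `ipaddress` module to reject malformed values, addresses inside your own ranges, and addresses that are not globally routable. The function name `classify_ip_entry`, the returned labels and the sample range are assumptions made for this example.

```python
import ipaddress

def classify_ip_entry(value, own_networks):
    """Classify a feed entry before it reaches a block list.

    Returns one of the hypothetical labels "malformed", "own_asset",
    "not_routable" or "ok".
    """
    try:
        ip = ipaddress.ip_address(value.strip())
    except ValueError:
        return "malformed"      # e.g. "10.0.0.999" or free text
    if any(ip in net for net in own_networks):
        return "own_asset"      # blocking this would hit your own systems
    if not ip.is_global:
        return "not_routable"   # private, loopback, reserved ranges, etc.
    return "ok"

# Hypothetical range standing in for the organization's public addresses.
own = [ipaddress.ip_network("203.0.113.0/24")]
results = {
    entry: classify_ip_entry(entry, own)
    for entry in ["8.8.8.8", "10.0.0.999", "203.0.113.10", "192.168.1.1"]
}
```

Only entries labelled "ok" would move on to enrichment; the others are better logged and reviewed than silently dropped, since they say something about the quality of the feed.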
“hxxp://inter.xn--bttrx-y3a5604c.com”.
Are you sure all your security assets can deal with such a URL?
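One way to cope with internationalized host names is to store both the ASCII (Punycode) form and the Unicode form of each entry, so that each security asset can be fed the representation it understands. The sketch below relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper names are hypothetical, and hosts that fail to convert are returned unchanged so they can be flagged for review.

```python
def to_unicode_host(host):
    """Decode Punycode ("xn--") labels so analysts see the real characters.

    Returns the host unchanged when decoding fails, so that malformed
    entries can be flagged instead of crashing the pipeline.
    """
    try:
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        return host

def to_ascii_host(host):
    """Encode a Unicode host name into its ASCII (Punycode) form."""
    try:
        return host.encode("idna").decode("ascii")
    except UnicodeError:
        return host

# A well-known example pair: the Unicode and ASCII forms of one host.
unicode_form = to_unicode_host("xn--mnchen-3ya.de")
ascii_form = to_ascii_host("münchen.de")
```

Hosts taken from real feeds may follow newer IDNA rules or be deliberately malformed, so a conversion that returns its input unchanged is worth treating as a signal in itself.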
Another issue related to phishing is the links themselves. Most of the time, they contain a parameter with the email address of the targeted user:
“hxxp://hellios.ml/pic/conta/records/control/?email=info@another_company.com”
What are the chances of this exact IOC matching in your environment? As you have probably guessed, absolutely none.
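A common remedy is to generalize such URLs before using them, keeping only the parts that are shared by all victims. The sketch below uses Python's `urllib.parse` to drop the query string and fragment; the function name `generalize_phishing_url` is hypothetical, and it assumes the feed writes the scheme as `hxxp` or `hxxps`.

```python
from urllib.parse import urlsplit, urlunsplit

def generalize_phishing_url(url):
    """Drop the query string and fragment, which often carry the victim's email."""
    url = url.strip()
    # Feeds commonly write the scheme as hxxp/hxxps; restore it for parsing.
    if url.lower().startswith("hxxp"):
        url = "http" + url[4:]
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc.lower(), parts.path, "", ""))

generalized = generalize_phishing_url(
    "hxxp://hellios.ml/pic/conta/records/control/?email=info@another_company.com"
)
```

The result keeps only the scheme, host and path, which is far more likely to match traffic from other victims. Whether to keep the full path or only the host is a trade-off between precision and coverage that depends on the detection rule.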
Finally, there is the human factor: mistakes or intentional modifications. For example, it is not uncommon to find URLs like these:
“hxxp://p”
“hxxps://miro(.)meyxier(.)fr”
In the first case, the false positive rate will simply be astronomical if the entry is used in a detection rule with the “contains” operator. In the second case, the dots have been deliberately replaced with “(.)” (a practice known as defanging), so you have almost no chance of detecting anything if you search for the exact content. Used as a regular expression, this example happens to work, because “(.)” matches any single character. Remember to apply input validation to each field you plan to use, and adapt the content to the type of match (exact content or regex).
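Both cases can be handled by a small cleaning step: restore entries whose scheme or dots were deliberately altered ("defanged"), reject hosts that are too short to be meaningful, and escape the host before building a regex. The following Python sketch shows one possible approach; the helper names and the host pattern are assumptions, and the pattern is intentionally strict rather than a complete implementation of host name syntax.

```python
import re

# Assumed minimal shape of a usable host: at least two dot-separated labels.
HOST_RE = re.compile(r"^(?=.{4,253}$)([a-z0-9-]{1,63}\.)+[a-z0-9-]{2,63}$")

def refang(value):
    """Undo common defanging: hxxp -> http, (.) and [.] -> ."""
    value = re.sub(r"^hxxp", "http", value.strip(), flags=re.IGNORECASE)
    return value.replace("(.)", ".").replace("[.]", ".")

def extract_valid_host(value):
    """Return the host of a refanged URL, or None if it looks unusable."""
    match = re.match(r"^https?://([^/:?#]+)", refang(value), flags=re.IGNORECASE)
    if not match:
        return None
    host = match.group(1).lower()
    return host if HOST_RE.match(host) else None

def exact_host_pattern(host):
    """Build a regex that matches the host literally (dots escaped)."""
    return re.compile(re.escape(host), re.IGNORECASE)

rejected = extract_valid_host("hxxp://p")
cleaned = extract_valid_host("hxxps://miro(.)meyxier(.)fr")
```

Here the first entry is rejected, because a single-letter host would match almost anything, and the second yields a clean host name that can be used for exact matching or, once escaped, inside a regex rule.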
Final Word
TI is still young and, as in any other field, there is no single way to build your own program. Gather as much information as you can, but do not forget the limitations imposed by regulation if you deal with private information. Keep in mind how you plan to use the data; it will help determine what you keep and how.
When your TI program starts growing, share your indicators so that other security actors can benefit from them and enrich them in turn. For this purpose, you can use one of the dedicated formats (STIX, OpenIOC, CybOX) or even a dedicated platform like the well-known MISP (Malware Information Sharing Platform).
TI is a time-consuming field, but fully qualified IOCs are efficient, accurate and important to the community. This is a virtuous circle, and a necessary one to fight threat actors.
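To give an idea of what a shared indicator can look like, the sketch below builds a STIX 2.1 indicator for a domain name as a plain Python dictionary and serializes it to JSON. It uses only the standard library; the function name, the sample domain and the description are hypothetical, and production code would more likely rely on a dedicated library and validate the object against the specification.

```python
import json
import uuid
from datetime import datetime, timezone

def make_stix_indicator(domain, description):
    """Build a STIX 2.1 indicator for a domain name as a plain dict.

    The domain is assumed to be already validated (no quotes or
    backslashes), since it is embedded in the pattern string.
    """
    now = datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%S.000Z")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": "indicator--" + str(uuid.uuid4()),
        "created": now,
        "modified": now,
        "description": description,
        "pattern": "[domain-name:value = '{}']".format(domain),
        "pattern_type": "stix",
        "valid_from": now,
    }

indicator = make_stix_indicator("bad.example", "Phishing landing page (example)")
serialized = json.dumps(indicator, indent=2)
```

The description field is where the context discussed earlier belongs: what the indicator is, where it was seen and why it matters.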