Intelligence assessment is divided into several disciplines. For example, SIGINT (signals intelligence) is the collection of information from signals (mobile networks, Wi-Fi, radar, radio…). Another discipline is HUMINT (human intelligence), which covers information obtained from people, for example through conversation.
This newsletter takes a deep dive into another discipline: open source intelligence (OSINT).
The word “open” means public data. All data gathered during the OSINT process must be collected passively, without directly querying the people or servers concerned. A third party may collect this data actively and make it available on the internet, for example. Once a third party publishes the data, it becomes available to anyone.
On the one hand, this data can be indexed by search engines such as Google, Bing or Yahoo. Once indexed, the information is easy to reach with a simple search. On the other hand, data can be published on the internet without being indexed. Collecting it is then more complicated, but still possible for anyone who has the link or some form of access (the deep web).
Information is mainly collected on the internet. People who are unaware of security and data privacy publish a lot of data about their lives online. As explained previously, some data can be found through search engines, but other information can be found on social networks such as Facebook, Twitter, Instagram or LinkedIn. Before the massive adoption of social networks, people used blogs and forums to share data or ask questions about specific topics. Whether on social networks or blogs, people like to publish information about themselves, such as holiday locations or jobs, or to post questions about personal issues. Depending on your smartphone, metadata can also be saved in a picture when you take it. This metadata can include the time at which the photo was taken, a thumbnail or even GPS coordinates. These coordinates can allow an attacker to find out where you live, where you work and even where you go on holiday.
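To illustrate how little work it takes to turn photo metadata into a location, here is a minimal sketch. EXIF stores GPS latitude and longitude as degrees, minutes and seconds, together with a hemisphere letter (N, S, E or W). The function name dms_to_decimal and the sample values are assumptions made for this example; they do not come from a real photo.

```python
from typing import Tuple


def dms_to_decimal(dms: Tuple[float, float, float], ref: str) -> float:
    """Convert (degrees, minutes, seconds) plus a hemisphere letter to decimal degrees."""
    degrees, minutes, seconds = dms
    decimal = degrees + minutes / 60 + seconds / 3600
    # Southern latitudes and western longitudes are negative in decimal notation.
    if ref.upper() in ("S", "W"):
        decimal = -decimal
    return decimal


# Hypothetical values, as they could appear in the GPS tags of a picture.
latitude = dms_to_decimal((49, 36, 0), "N")
longitude = dms_to_decimal((6, 7, 30), "E")
print(f"{latitude:.4f}, {longitude:.4f}")
```

The resulting pair of decimal numbers can be pasted directly into any online map, which is why stripping metadata before publishing a picture is worth the effort.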
Another part of the data relates to companies. Companies expose services on the internet, buy domain names, publish documents, etc. Here too, it is possible to collect information without sending any request to the company's servers, always by going through a third party. A first example is using Google to search for PDF files hosted on the Excellium Services website. This can be done with a technique called Google Dorking, by searching site:excellium-services.com filetype:pdf, which tells Google: search the site excellium-services.com for all documents of type PDF. Information can also be collected from these documents, such as the internal username of the person who generated the PDF, or email addresses. This kind of data can also be present in Office documents and images. Another example is the Shodan website, which scans IP addresses worldwide and shows the open ports for each one. Shodan goes further with banner grabbing, which allows it to identify the type of service exposed behind each port. Domain names and hosts can be collected from TLS certificates or via websites such as dnsdumpster.com. The newly collected domains are then fed back into the whole process (finding data, users, emails, ports, services, etc.).
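The dork described above can be assembled programmatically when the same search has to be repeated for several domains or file types. The sketch below only builds the query string and the corresponding search URL; it does not send any request. The helper names build_dork and search_url are assumptions made for this example.

```python
from urllib.parse import urlencode


def build_dork(site: str, filetype: str) -> str:
    """Build a Google dork restricting results to one site and one file type."""
    return f"site:{site} filetype:{filetype}"


def search_url(query: str) -> str:
    """Return the Google search URL for a query, with the query URL-encoded."""
    return "https://www.google.com/search?" + urlencode({"q": query})


dork = build_dork("excellium-services.com", "pdf")
print(dork)
print(search_url(dork))
```

Changing the second argument to docx or xlsx, for example, extends the same search to Office documents.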
Once information is published on the internet, many bots and web crawlers keep a copy and offer a search engine over it. This is the job of the Wayback Machine (archive.org), for example. It keeps copies of websites and allows users to browse the history of a specific domain or web page. You should assume that all public information will stay public forever. For example, the last link in the references shows the Luxembourgish government website in 2000, as kept by the Wayback Machine. You can also try it with your company website or your personal one.
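Archived copies on the Wayback Machine are addressed with URLs of the form https://web.archive.org/web/<timestamp>/<original URL>, where the timestamp is written as YYYYMMDDhhmmss and the service redirects to the closest capture it holds. The following sketch builds such a URL without contacting the service. The function name wayback_url and the example.com address are placeholders chosen for this example.

```python
from datetime import datetime


def wayback_url(url: str, when: datetime) -> str:
    """Build the Wayback Machine address of the capture closest to a given date."""
    timestamp = when.strftime("%Y%m%d%H%M%S")
    return f"https://web.archive.org/web/{timestamp}/{url}"


# Placeholder site: replace it with your company website or your personal one.
print(wayback_url("http://example.com/", datetime(2000, 1, 1)))
```

Opening the printed address in a browser shows what the site looked like around that date, provided a capture exists.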
Other information can be retrieved from the deep web. For example, after a database leak, information can be shared on the deep web without the company's knowledge. Websites such as Pastebin.com allow visitors to paste data in text format and then share a link. Without this link, it is hard or impossible to see the pasted data.
“Intelligence” is the use of all this data to reach a goal. It can be used by a government to find terrorists, or by a company to find people of interest or information about a competitor. The next part focuses on the pentest point of view.
OSINT in the pentest process
When the attacker has enough information about the target and its employees, he can move on to the next step and use social engineering skills. This step is not part of OSINT, which is a purely passive process.
Social engineering requires contact with a person, in order to get more information about the company or a service, for example.
For the human part, before contacting the person, the attacker can create fake profiles on social networks and fake email addresses. These profiles can then be customized with the same hobbies as the victim. The attacker can also join groups such as a celebrity fan club or a classified ads Facebook group, or post on forums followed by the target. This approach allows the attacker to contact the target and talk with them about a common topic.
The next step is to contact the target or interact with systems, which leaves the scope of OSINT. While talking or writing with the target, the attacker can ask the victim about their job or colleagues in order to collect more information. He can also convince the target to open a document or click on a link.
For the technical part, the attacker needs to know IP addresses, open ports, usernames, the CMS in use, server versions and so on. This knowledge is used during the intrusion phase, when testing exploits or brute-force attacks against the target infrastructure.
These steps are not part of OSINT, but the OSINT process is a prerequisite for good weaponization.
The human part is more complex. It is possible to raise employees’ awareness of the impact of what they publish on the internet, but it is not possible to prevent them from publishing. One solution is to use a pseudonym and to avoid posting personal information about oneself on the internet.
Companies and individuals can run the test on themselves: start by searching Google or social networks, and see what kind of information can be found about them.