Open Source Intelligence (OSINT) is a large and extremely dangerous attack vector. It is often difficult to monitor, because the data is generally gleaned from third-party public services, from Google to Git, making it almost impossible to know whether it is even happening! This article looks at some of the tools used to scour source code repositories.


OSINT itself contains many different attack vectors; there are many different data sources that a diligent APT can use to research its target. Sensitive Code Exposure is the problem of adversaries combing public source code repositories for long-forgotten secrets in a targeted code base: secrets which may have been removed from the main branch but still exist in a historic commit.

Git remembers everything… so does the internet!

One of the most amazing features of Git is that it will remember all the states your code base has been in. This is great for giving your team the ability to rapidly roll back code changes and inject features into the main branch to release them. However, this is also one of the worst parts of it, as simple mistakes accidentally committed can last forever!
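To make the point concrete, here is a minimal sketch of why deleting a secret in a later commit does not help. It assumes the text being scanned looks like the output of `git log -p`; the sample history, the commit hashes and the `PASSWORD_PATTERN` regex are all made up for illustration, and real output would also carry author and date lines.

```python
import re

# Hypothetical pattern for illustration: matches lines such as password = "..."
PASSWORD_PATTERN = re.compile(r"password\s*=\s*['\"]([^'\"]+)['\"]", re.IGNORECASE)

# Made-up, simplified sample of what `git log -p` might print for two commits.
SAMPLE_LOG = """\
commit 2222222
    Remove hard-coded password

--- a/config.py
+++ b/config.py
-password = "hunter2"
+password = os.environ["DB_PASSWORD"]

commit 1111111
    Add database config

--- /dev/null
+++ b/config.py
+password = "hunter2"
"""


def find_secret_history(log_text):
    """Return (commit, change, secret) tuples for every patch line holding a secret."""
    findings = []
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith(("+++", "---")):
            continue  # file headers, not file content
        elif line.startswith(("+", "-")):
            match = PASSWORD_PATTERN.search(line)
            if match:
                change = "added" if line[0] == "+" else "removed"
                findings.append((commit, change, match.group(1)))
    return findings


if __name__ == "__main__":
    for commit, change, secret in find_secret_history(SAMPLE_LOG):
        print(f"{commit}: secret {change}: {secret}")
```

Note that the "clean-up" commit exposes the secret just as clearly as the commit that introduced it, because the removed line is part of the patch.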

Even more worrying, even if the secret is purged from the repository, do we know who saw it or took a copy? Did the Wayback Machine scoop a copy?

Anything that does get compromised should be rotated or revoked, even if the mistake was made long ago.

Tools to read Git for secret strings

There have been cases where people feel confident that, given the volume of source code changes that have occurred across the branches of a project, there is a good chance an attacker will never find these secrets. However, numerous projects now exist that allow APTs to greatly reduce the volume of data they need to go through. A few examples of these tools are:


Trufflehog is one of the easiest tools to use. It is highly customisable and comes with a range of default flags to search for that would provide the user with a filtered list of Keys, Certificates and Passwords across all commits. 

It can be installed from pip:

`pip install truffleHog`

And executed with:

`truffleHog --regex --entropy=False </path/to/directory/of/repo>`

# Note: You may wish to switch entropy to True to tune your results. Users will generally scan with both.
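The entropy option is easier to reason about with a small example. The sketch below computes Shannon entropy per character, which is the general idea behind entropy-based secret detection. The 4.5-bit threshold and the 20-character minimum are assumptions chosen for illustration and should not be read as truffleHog's exact settings.

```python
import math
from collections import Counter


def shannon_entropy(text):
    """Return the Shannon entropy of text in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(token, threshold=4.5, min_length=20):
    """Flag long tokens whose characters are spread too evenly to be ordinary text.

    Both threshold and min_length are illustrative assumptions.
    """
    return len(token) >= min_length and shannon_entropy(token) > threshold


if __name__ == "__main__":
    for token in ("a" * 40, "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"):
        print(token, round(shannon_entropy(token), 2), looks_random(token))
```

Random-looking keys score high while repetitive text and short words score low, which is why entropy scanning catches secrets that no regex anticipated but also produces more noise.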


You can even use the `--rules` flag to give the program a JSON file with signatures you may know exist within your code base. You can add it to your personal copy of truffleHog and have it scan those elements to make sure they are clean.
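A rules file can be generated and sanity-checked before it is handed to the scanner. The sketch below assumes the file is a flat JSON object mapping a label to a regular expression (worth confirming against the truffleHog documentation for your version). The two signatures and the helper names `build_rules_json` and `match_rules` are hypothetical.

```python
import json
import re

# Hypothetical signatures for illustration; replace with patterns from your own code base.
CUSTOM_RULES = {
    "Internal API token": r"acme_[0-9a-f]{32}",
    "Internal hostname": r"[a-z0-9-]+\.corp\.example\.com",
}


def build_rules_json(rules):
    """Check that every regex compiles, then serialise the rules as JSON text."""
    for label, pattern in rules.items():
        try:
            re.compile(pattern)
        except re.error as exc:
            raise ValueError(f"Rule {label!r} is not a valid regex: {exc}") from exc
    return json.dumps(rules, indent=2, sort_keys=True)


def match_rules(rules, text):
    """Return the sorted labels of every rule that matches the text."""
    return sorted(label for label, pattern in rules.items() if re.search(pattern, text))


if __name__ == "__main__":
    print(build_rules_json(CUSTOM_RULES))
    print(match_rules(CUSTOM_RULES, "host = db01.corp.example.com"))
```

The resulting text can be saved to a file and passed to the `--rules` flag; testing the patterns against known samples first avoids scanning an entire history with a rule that can never match.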


To go a stage further, if the results above point to some particularly careless developers, we might want to profile them and see whether their personal code profiles carry any data about the target organisation. This is where tools like Sherlock can be used to hunt down individuals across social media networks (including coding ones).

Another simple command set:

`sherlock $user1 $user2 $user3`


Sherlock is capable of searching more than one user at a time and generating a report for the assailant to peruse.

Attacking the CI/CD Bots

In more recent times, we are seeing more successful attacks against the CI/CD build chain. If your code base is hosted on a third party service, there is a strong chance that there is a robot user associated to facilitate the build. There may even be a chance that, using the tools above, an adversary may find credentials to be able to access this user. 

Tools like Gitoops are available to help us see the lateral movements possible from the compromise of a given account using Graph DB. With it, we are able to map out all the relations various elements have with each other – including secrets.

For testing, we can use Docker Compose to fire up what we need. From a cloned Gitoops repo, run:

`docker-compose -f docker-compose.yml up -d`

Once set up, the first step will be to pull in the data from the CI/CD system under inspection. For GitHub, your command might look something like:

`gitoops github --debug --organization $ORG --neo4j-password $NEO4J_PASSWORD --neo4j-uri="neo4j://localhost:7687" --token $GITHUB_TOKEN --ingestor default --ingestor secrets --session helloworld`

Where the ORG is the name of your own GitHub organisation or user and the GITHUB_TOKEN is the API token of a user.

The $NEO4J_PASSWORD may not be required, as by default the Dockerfile will not provide authentication on the database running locally, but can be set and provided if required.

Once the data has been harvested, we will need to enrich it with:

`gitoops enrich --debug --organization $ORG --session helloworld --neo4j-password $NEO4J_PASSWORD --neo4j-uri="neo4j://localhost:7687"`

This will start sifting through the data to find connections, storing everything in a Graph DB so that the relationships we care about are simpler to query. As an example:

`MATCH p=(:Repository)-->(:EnvironmentVariable) RETURN p`

The above statement is useful for hunting through repositories to check which environment variables are attached to them. For example, an AWS Access Key and Secret Key pair might be stored there to deploy the code, carrying untold power in the Cloud environment.
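Once the environment variables linked to a repository are known, triaging them is straightforward. The sketch below assumes the names and any recovered values have already been exported into a plain dictionary. The sample data and the `triage_env_vars` helper are hypothetical, and the key shown is the placeholder AWS uses in its own documentation.

```python
import re

# AWS access key IDs are commonly matched as "AKIA" plus 16 upper-case alphanumerics.
AWS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")
# Illustrative list of name fragments that hint at a credential.
SENSITIVE_NAME = re.compile(r"(SECRET|TOKEN|PASSWORD|ACCESS_KEY)", re.IGNORECASE)


def triage_env_vars(env_vars):
    """Map each suspicious variable name to the reason it was flagged."""
    flagged = {}
    for name, value in env_vars.items():
        if AWS_KEY_ID.search(value or ""):
            flagged[name] = "value looks like an AWS access key ID"
        elif SENSITIVE_NAME.search(name):
            flagged[name] = "name suggests a credential"
    return flagged


if __name__ == "__main__":
    sample = {
        "AWS_ACCESS_KEY_ID": "AKIAIOSFODNN7EXAMPLE",  # AWS documentation placeholder
        "DEPLOY_TOKEN": "",                           # value unknown, name is a hint
        "LOG_LEVEL": "debug",
    }
    for name, reason in triage_env_vars(sample).items():
        print(f"{name}: {reason}")
```

Checking names as well as values matters here, because a graph of CI/CD metadata will often reveal that a credential exists without revealing its value.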

Round up

Overall, protections against the majority of the above are straightforward to put in place. Adhering to the “Principle of Least Privilege” when designing our RBAC and ABAC will go a long way towards limiting exposure in the event of a breach.

On top of this, we can use tools such as Git Secrets to add a secrets scanning capability to our git environment, or add an additional stage within the CI/CD commit chain, to ensure no secrets get written to the repository.
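As a rough illustration of that extra stage, the sketch below checks the text of a staged diff (for example, the output of `git diff --cached`) and returns a non-zero status if an added line matches a blocked pattern. This is not how Git Secrets itself is implemented; the patterns and the `check_staged_diff` name are assumptions for this example.

```python
import re

# Illustrative patterns only; a real deployment would maintain a longer list.
BLOCKED_PATTERNS = [
    re.compile(r"AKIA[0-9A-Z]{16}"),
    re.compile(r"-----BEGIN (?:RSA |EC )?PRIVATE KEY-----"),
]


def check_staged_diff(diff_text):
    """Return 0 if the diff is clean, 1 if an added line matches a blocked pattern."""
    for line in diff_text.splitlines():
        # Only added lines matter; "+++" marks a file header in unified diffs.
        # (An added line whose content itself starts with "++" is skipped too.)
        if not line.startswith("+") or line.startswith("+++"):
            continue
        if any(pattern.search(line) for pattern in BLOCKED_PATTERNS):
            return 1
    return 0


if __name__ == "__main__":
    import sys

    # Example wiring: git diff --cached | python check_diff.py
    sys.exit(check_staged_diff(sys.stdin.read()))
```

Wired into a pre-commit hook or a pipeline stage, the non-zero status would stop the change before the secret ever reaches the repository history.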

Thankfully, the tools listed above can be used by defensive teams as well to give insight into these data sources so that we can remediate the problems ahead of time.

Give the Git Robbing security tools a try

There are many tools mentioned within this article and each one deserves a detailed play! Not every Cloud Environment will be the same, but with the aid of these tools we can scan a lot deeper to better understand where our weaknesses lie.


Paul Hardy is a Principal Systems Developer at Cloudreach with a passion for Offensive Security. Having worked with Cloudreach for almost a decade, he has built up expertise over a wide range of technologies, often finding new and creative ways to improve the security posture of Cloudreach’s customers.