Blog · 2026-06-24

EmeraldWhale's .git/.env credential scraping is still running, ~2 years on

Nobody's exploiting you. They're reading your .git/config.

For the last 30 days we sorted every request that reached the HoneyLabs sensor network by what the sender was after. Once you set aside the benign crawlers, the most common thing sent at our sensors was a grab for secrets. More source IPs probed for .env and .git/config files, the kind a careless deploy leaves exposed, than for any single vulnerability we tracked in the window.

3,050 distinct source IPs sent 246,941 of these requests over the month.

What they ask for

The targets are predictable, and they sit wherever a sloppy deploy leaves them:

.env* files: 1,698 source IPs. Laravel, Symfony, Node and Django apps drop database passwords, API tokens and mail credentials into these.
.git/config: 1,293 IPs. An exposed .git directory leaks the repository's remote URL, and from there an attacker can often clone the whole repo and read every secret committed into its history.
config.json, credentials* and .aws: 701 to 1,020 IPs each, reaching for application config and AWS credential files by name.

What the scanners ask for

A single hit on /.env is background noise; every public server sees it. The structure behind the .git/config requests is what stood out.

One host, one wordlist

Of the 1,293 IPs that requested .git/config, 896 did not stop at a single request. They walked a wordlist of deploy-path permutations, requesting the file under dozens of common subdirectories in one pass: /admin/.git/config, /api/.git/config, /backend/.git/config, /wordpress/.git/config, /v1/.git/config, and on down the list. The deepest single source asked for 131 variants.

A single source walking the .git/config wordlist

The user-agent changes on almost every line. Across the secret-scanning traffic we counted more than 2,000 distinct user-agent strings, from current Chrome and Firefox to ancient Symbian and OS/2 builds. More than 2,000 of those strings share a single HTTP client fingerprint, so the variety is cosmetic. The rotation exists to slip past rules that block on a fixed agent.

It is easy to see through. The requests send accept-charset: utf-8 and omit the Accept header that every real browser includes. These are scripts in browser costumes, not the browsers they name.
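The header check described above can be sketched in a few lines of Python. This is a minimal illustration, not the HoneyLabs detection logic: the function name looks_like_costumed_script and the plain-dict representation of headers are assumptions made for the example, and it tests only the two traits named here (an accept-charset: utf-8 header and no Accept header).

```python
def looks_like_costumed_script(headers: dict[str, str]) -> bool:
    """Return True when a request shows both header traits described above.

    `headers` is a hypothetical name -> value mapping taken from one request.
    """
    # HTTP header names are case-insensitive, so normalise before comparing.
    normalised = {name.lower(): value.strip().lower() for name, value in headers.items()}

    sends_charset = normalised.get("accept-charset") == "utf-8"
    omits_accept = "accept" not in normalised
    return sends_charset and omits_accept
```

Because the check ignores the user-agent entirely, rotating that string does nothing to evade it. Treat a match as one signal among several rather than proof, since other non-browser clients may also omit Accept.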

Where it runs

The scanning is cloud-hosted and heavily concentrated. Of the 3,050 source IPs, 2,280 (roughly three quarters) sat on Google Cloud. A Hong Kong provider, UCloud, was a distant second at 298, with Microsoft, DigitalOcean and Amazon in the low hundreds and a long tail of smaller networks behind them.

Hosting network of the scanning IPs

That matches the economics. Scanning the whole IPv4 space for one filename is cheap and finishes in hours, and it runs fine on trial or low-cost cloud instances that get discarded once they are blocked.

A steady hum, then a flood

Most days the secret-scanning held a low, constant rate of a few dozen sources and a few thousand requests. On June 14 it jumped to 1,370 source IPs and 104,312 requests in a single day. June 15 was still elevated at 742 sources before the volume fell back to baseline.

Daily unique source IPs scanning .env and .git/config

A burst shaped like that usually means a fresh fleet of scanning nodes was spun up, pointed at a target list, and torn down when the run finished.

The EmeraldWhale playbook, still running

None of this is new in kind. In October 2024, Sysdig documented an operation it named EmeraldWhale: a credential-harvesting campaign that scanned IP ranges for exposed .git/config and Laravel .env files, cloned the repositories those configs pointed to, and pulled cloud keys out of the source. By the time researchers found the operator's own storage bucket, it held more than 15,000 stolen cloud credentials and the configs for over 10,000 private repositories (BleepingComputer coverage).

The traffic in our sensors follows the same playbook: the same two file families, the same deploy-path wordlists, the same automated walk across address ranges. What we can add is that this style of operation is still running at scale close to two years later, that it now leans overwhelmingly on Google Cloud, and that it arrives in fleet-sized bursts like June 14.

The stolen material has a ready market. Sysdig found EmeraldWhale validating the tokens it captured and reselling the working cloud, email and messaging credentials into phishing and spam operations. A live cloud key is worth real money, and the gap between exposure and abuse is short, so any key that sat in a reachable .env should be treated as already burned.

What to do about it

The fix is unglamorous and it holds:

Stop serving dotfiles. Block any request for a path containing /.git/, /.env, /.aws or /.ssh at the reverse proxy or CDN and return a 404. One rule stops the whole class.
Keep .git and .env out of the web root. A deploy should never copy the repository's .git directory or a real .env onto a public host. Ship build artifacts, not the working tree.
Treat any exposed secret as already stolen. If a .env or a committed key was ever reachable, rotate it now and check the provider's logs for use. Assume the scan that found it also read it.
Alert on the walk. A single client requesting .git/config under many subdirectories within seconds is a high-confidence signal, and one detection rule on that pattern catches the tool described here.
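The first and last items above can be sketched in Python as a path filter and a small sliding-window detector. Everything named here is an assumption for illustration: should_block, GitConfigWalkDetector, the 10-second window and the threshold of five distinct paths are placeholders to tune against your own traffic, and in production the block rule would normally live in the proxy or CDN configuration rather than in application code.

```python
import re
from collections import defaultdict, deque

# Matches /.git/, /.env, /.aws or /.ssh anywhere in the request path.
DOTFILE_PATTERN = re.compile(r"/\.(?:git/|env|aws|ssh)")


def should_block(path: str) -> bool:
    """Return True for paths the proxy should answer with a 404."""
    return DOTFILE_PATTERN.search(path) is not None


class GitConfigWalkDetector:
    """Flag a client that requests .git/config under many directories quickly."""

    def __init__(self, window_seconds: float = 10.0, min_distinct_paths: int = 5):
        # Both defaults are illustrative assumptions, not measured values.
        self.window_seconds = window_seconds
        self.min_distinct_paths = min_distinct_paths
        self._recent = defaultdict(deque)  # client -> deque of (timestamp, path)

    def observe(self, client: str, path: str, timestamp: float) -> bool:
        """Record one request; return True once the client looks like a walker."""
        if not path.endswith("/.git/config"):
            return False
        events = self._recent[client]
        events.append((timestamp, path))
        # Drop requests that have aged out of the window.
        while events and timestamp - events[0][0] > self.window_seconds:
            events.popleft()
        distinct_paths = {seen_path for _, seen_path in events}
        return len(distinct_paths) >= self.min_distinct_paths
```

The detector counts distinct paths rather than raw requests, so a client retrying one URL does not trigger it, while a wordlist walk does. It expects timestamps in non-decreasing order per client and keeps state in memory, which is fine for a sketch but would need eviction of idle clients in a long-running service.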

You can check whether a given address shows up in this activity at honeylabs.net/lookup, and the indicators we extract from honeypot payloads are free to use at honeylabs.net/feeds.

Method: figures cover a rolling 30-day window of the HoneyLabs sensor network. Counts are distinct source IPs and request totals for paths matching the file families named above. Sensor addresses are redacted from all public data.
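For readers who want to run a similar tally on their own access logs, the counting described in the method note can be approximated as below. This is a sketch under assumptions: the regular expressions are our guess at reasonable matchers for the file families named in this post, not the patterns HoneyLabs uses, and the input is assumed to be an iterable of (source_ip, path) pairs already parsed from a log.

```python
import re
from collections import defaultdict

# Hypothetical matchers for the file families named in the post.
# Order matters: the first matching family wins, so /.aws/credentials
# is counted under ".aws" rather than "credentials*".
FAMILIES = [
    (".env*", re.compile(r"/\.env[^/]*$")),
    (".git/config", re.compile(r"/\.git/config$")),
    (".aws", re.compile(r"/\.aws(?:/|$)")),
    ("config.json", re.compile(r"/config\.json$")),
    ("credentials*", re.compile(r"/credentials[^/]*$")),
]


def tally(requests):
    """Return {family: (distinct_source_ips, request_total)} for matching paths."""
    ips = defaultdict(set)
    totals = defaultdict(int)
    for source_ip, path in requests:
        for family, pattern in FAMILIES:
            if pattern.search(path):
                ips[family].add(source_ip)
                totals[family] += 1
                break  # count each request under one family only
    return {family: (len(ips[family]), totals[family]) for family in totals}
```

Families with no matching requests are left out of the result. Paths are expected without a query string; strip it first if your logs include one.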