Right before I left my previous job, we published a lengthy post I had written about a survey of around 20 million public keys, which I had gathered from all corners of the Internet during the summer of 2024.
These included TLS certificates, SSH server keys, GitHub and GitLab user keys, and DNSKEYs.
A friend recently pointed me to some similar work that I hadn’t been aware of when I originally did the survey.
Thomas Stromberg, on LinkedIn, mentioned having gathered ~350k SSH user keys from GitHub users. This led me to the work of Filippo Valsorda, who built an SSH server, whoami.filippo.io, which performs a clever little trick based on having gathered a similar number of GitHub SSH user keys:
```
$ ssh whoami.filippo.io
+---------------------------------------------------------------------+
|                                                                     |
|  _o/ Hello Dan Lenski!                                              |
|                                                                     |
|                                                                     |
|  Did you know that ssh sends all your public keys to any server     |
|  it tries to authenticate to?                                       |
|                                                                     |
|  We matched them to the keys of your GitHub account,                |
|  @dlenski, which are available via the GraphQL API                  |
|  and at https://github.com/dlenski.keys                             |
|                                                                     |
|  -- Filippo (https://filippo.io)                                    |
|                                                                     |
|                                                                     |
|  P.S. The source of this server is at                               |
|  https://github.com/FiloSottile/whoami.filippo.io                   |
|                                                                     |
+---------------------------------------------------------------------+
```
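Conceptually, the trick only needs an index from public keys to GitHub usernames: as the banner says, the client offers its public keys to the server while trying to authenticate, so the server can record them and look them up. Here is a minimal Python sketch of that lookup, assuming the keys have already been collected in the one-key-per-line format served at URLs like https://github.com/dlenski.keys. The function names are hypothetical, and this is not the code of the actual server, whose source is linked in the banner.

```python
from collections import defaultdict
from typing import Optional

KeyId = tuple[str, str]  # (key type, base64 key blob)


def key_id(line: str) -> Optional[KeyId]:
    """Key type and base64 blob of an authorized_keys-style line, ignoring any comment."""
    parts = line.split()
    if len(parts) < 2:
        return None
    return parts[0], parts[1]


def build_index(keys_by_user: dict[str, str]) -> dict[KeyId, set[str]]:
    """Map each public key to the set of usernames that published it."""
    index: defaultdict[KeyId, set[str]] = defaultdict(set)
    for user, keys_text in keys_by_user.items():
        for line in keys_text.splitlines():
            kid = key_id(line)
            if kid is not None:
                index[kid].add(user)
    return dict(index)


def who_offered(index: dict[KeyId, set[str]], offered: list[str]) -> set[str]:
    """Usernames whose published keys include any key the SSH client offered."""
    matches: set[str] = set()
    for line in offered:
        kid = key_id(line)
        if kid is not None:
            matches |= index.get(kid, set())
    return matches


# Made-up placeholder "keys", only to show the call pattern.
index = build_index({"alice": "ssh-ed25519 AAAAexample1\nssh-rsa AAAAexample2",
                     "bob": "ssh-rsa AAAAexample2"})
print(sorted(who_offered(index, ["ssh-rsa AAAAexample2 bob@laptop"])))  # ['alice', 'bob']
```

Indexing on the key type plus the base64 blob, rather than on the whole line, ignores the trailing comment, which often differs between copies of the same key.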
My post about the survey of 20 million public keys included a section on “Leakage of Identifying Information.”
Here’s a key point: if you reuse the same SSH public key as a convenient way to log in to multiple SSH servers, or to push code to multiple Git forges, that reuse can be used to correlate your identity across those servers.
> …among these tens of thousands of cases, we also find a few where the associated usernames are completely different (e.g. `clark.kent` on GitHub and `sup3rman` on GitLab) and where the affected users might be surprised or alarmed to learn that it is possible to link these real-world identities.
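Finding keys reused across forges is essentially a join on key fingerprints. A minimal sketch, assuming you already have each forge’s keys as lists of authorized_keys-style lines per username (that input structure and the function names are hypothetical). The fingerprint is computed the way `ssh-keygen -l` shows it by default: unpadded base64 of the SHA-256 digest of the decoded key blob.

```python
import base64
import hashlib


def fingerprint(line: str) -> str:
    """SHA256 fingerprint of an authorized_keys-style line, as ssh-keygen -l prints it."""
    blob = base64.b64decode(line.split()[1])
    digest = hashlib.sha256(blob).digest()
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def fingerprint_index(keys_by_user: dict[str, list[str]]) -> dict[str, set[str]]:
    """Map each key fingerprint to the usernames that published that key on one forge."""
    index: dict[str, set[str]] = {}
    for user, lines in keys_by_user.items():
        for line in lines:
            index.setdefault(fingerprint(line), set()).add(user)
    return index


def linked_accounts(forge_a: dict[str, list[str]],
                    forge_b: dict[str, list[str]]) -> list[tuple[str, set[str], set[str]]]:
    """(fingerprint, users on forge A, users on forge B) for every key published on both."""
    a, b = fingerprint_index(forge_a), fingerprint_index(forge_b)
    return [(fp, a[fp], b[fp]) for fp in sorted(a.keys() & b.keys())]
```

Any returned tuple whose two username sets look unrelated, like `clark.kent` and `sup3rman`, is a candidate link between identities that their owner may have assumed were separate.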
Additionally, if you use an SSH key or a TLS key with unusual cryptographic parameters, I can use those parameters to determine the key’s provenance, and possibly infer something about your identity:
- Using an RSA key with an exponent of 35? You generated it with an old version of OpenSSH
- Using an RSA key with an exponent of 37? You generated it with an old version of PuTTY
- Running a TLS server with an EC key based on a Brainpool curve? You’re in Germany and probably in one of a few specific industries
- Running an SSH server that offers an Ed448 key? You’re running a specific brand of router
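For SSH keys, these parameters are easy to extract. The base64 field of a public key line decodes to a blob holding the key type, then e, then n, each as a length-prefixed string or mpint (RFC 4253, section 6.6). Below is a minimal standard-library sketch that flags the exponents 37 (PuTTYgen) and 35 (older OpenSSH), plus the `ssh-ed448` key type name defined in RFC 8709. The function names and hint strings are hypothetical, a match is only a clue about the generating tool rather than proof, and the Brainpool case is left out because it would require parsing X.509 certificates.

```python
import base64
from typing import Optional


def read_ssh_string(blob: bytes, offset: int) -> tuple[bytes, int]:
    """Read one length-prefixed field (uint32 length, then data); return it and the next offset."""
    length = int.from_bytes(blob[offset:offset + 4], "big")
    start, end = offset + 4, offset + 4 + length
    if end > len(blob):
        raise ValueError("truncated SSH key blob")
    return blob[start:end], end


def rsa_exponent(line: str) -> Optional[int]:
    """Public exponent e of an ssh-rsa public key line, or None for other key types."""
    blob = base64.b64decode(line.split()[1])
    key_type, offset = read_ssh_string(blob, 0)
    if key_type != b"ssh-rsa":
        return None
    e_bytes, _ = read_ssh_string(blob, offset)  # e is an mpint; n follows it
    return int.from_bytes(e_bytes, "big")       # positive mpint: big-endian, leading 0x00 is harmless


# Hypothetical hint strings; the common default e = 65537 yields no hint.
EXPONENT_HINTS = {35: "older OpenSSH", 37: "PuTTY"}


def provenance_hint(line: str) -> Optional[str]:
    """A rough guess at where an SSH public key came from, or None if nothing stands out."""
    if line.split()[0] == "ssh-ed448":  # key type name from RFC 8709
        return "Ed448 key (uncommon)"
    e = rsa_exponent(line)
    return None if e is None else EXPONENT_HINTS.get(e)
```

The same `read_ssh_string` helper works for the other SSH key types too, since every field in the blob uses the same length-prefixed layout.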