python-pgpdump, a PGP packet parser library
PGP file formats and data are not the simplest thing to wrap your head around, so pgpdump is a very handy tool to have available. Although similar to that of gpg --list-packets, pgpdump's output is a bit more verbose and descriptive.
pgpdump is a PGP packet visualizer which displays the packet format of OpenPGP (RFC 4880) and PGP version 2 (RFC 1991).
If you have never used it and want to see what it does, there is also an online version available. Here is its output when parsing a detached signature file:
$ pgpdump testfile.sig
Old: Signature Packet(tag 2)(70 bytes)
Ver 4 - new
Sig type - Signature of a binary document(0x00).
Pub alg - DSA Digital Signature Algorithm(pub 17)
Hash alg - SHA1(hash 2)
Hashed Sub: signature creation time(sub 2)(4 bytes)
Time - Wed Mar 7 15:42:52 CST 2012
Sub: issuer key ID(sub 16)(8 bytes)
Key ID - 0x5C2E46A0F53A76ED
Hash left 2 bytes - d4 1a
DSA r(159 bits) - ...
DSA s(160 bits) - ...
-> hash(DSA q bits)
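The first line of that output comes straight from the packet header. Under RFC 4880 section 4.2, the first octet of every packet has bit 7 set; bit 6 distinguishes new-format from old-format packets, and in the old format bits 5-2 hold the tag while bits 1-0 say how many octets of length follow. Here is a minimal sketch of decoding an old-format header. The function name is hypothetical (it is not pgpdump's code), and new-format and indeterminate-length packets are deliberately rejected rather than handled.

```python
def parse_old_header(data):
    """Return (tag, header_length, body_length) for an old-format packet header.

    Assumes `data` starts at a packet boundary; raises ValueError otherwise.
    """
    first = data[0]
    if not first & 0x80:
        raise ValueError("bit 7 of a packet tag octet must be set")
    if first & 0x40:
        raise ValueError("new-format packet; this sketch only handles old format")
    tag = (first >> 2) & 0x0F      # bits 5-2: packet tag
    length_type = first & 0x03     # bits 1-0: length type
    if length_type == 3:
        raise ValueError("indeterminate length is not handled here")
    # Length types 0, 1 and 2 use 1, 2 and 4 big-endian octets respectively.
    n = (1, 2, 4)[length_type]
    body_length = int.from_bytes(data[1:1 + n], "big")
    return tag, 1 + n, body_length


# 0x88 0x46 is how a 70-byte old-format signature packet begins.
print(parse_old_header(bytes([0x88, 0x46])))  # (2, 2, 70)
```

Tag 2 is the Signature Packet, which matches "Old: Signature Packet(tag 2)(70 bytes)" above.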
With package signing now a reality in Arch Linux, I wanted a way of displaying the signature data stored in the package databases on the website. I was a bit disappointed after searching for tools that could help me do this, and shelling out to gpg or another tool seemed like a bad way of going about it.
It turns out the PGP packet format is completely insane, which is probably why there isn't much in the way of library support or many binary implementations outside of GnuPG. Luckily, the pgpdump code is a lot more terse than GnuPG's, so I was able to use it as a good reference point for writing a PGP packet-parsing library in Python.
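Part of the difficulty is that a parser has to handle two header formats and, for new-format packets, four different ways of encoding a body length, including "partial" lengths that split one body into several chunks. Below is a minimal sketch of the new-format length rules from RFC 4880 section 4.2.2. The function name is hypothetical, and the code is not taken from pgpdump or python-pgpdump.

```python
def parse_new_length(data):
    """Decode an RFC 4880 new-format packet body length.

    Returns (length, octets_consumed, is_partial). When is_partial is True,
    only `length` bytes of the body follow before another length header.
    """
    first = data[0]
    if first < 192:            # one-octet length: 0-191
        return first, 1, False
    if first < 224:            # two-octet length: 192-8383
        return ((first - 192) << 8) + data[1] + 192, 2, False
    if first == 255:           # five-octet length: 0xFF then a 32-bit value
        return int.from_bytes(data[1:5], "big"), 5, False
    # 224-254: partial body length, always a power of two.
    return 1 << (first & 0x1F), 1, True


# Encodings from the examples in RFC 4880 section 4.2.3.
for encoded in (b"\x64", b"\xc5\xfb", b"\xff\x00\x01\x86\xa0", b"\xef"):
    print(encoded.hex(), parse_new_length(encoded))
```

A complete parser also has to loop over partial chunks until a non-partial length ends the body, which this sketch leaves to the caller.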
The two existing projects I found related to PGP in Python appear to be abandoned: OpenPGP, last updated in July 2005, and pgpmsg, which has no listed date and no source files.
Discovering there was nothing great out there, I present my first package on PyPI, python-pgpdump. The home for the code is on GitHub. It is designed so it can hopefully parse every type of PGP packet down the road, but the first release did only what I needed it to do: parse detached signature packet data. I've already had one contributor add public key packet parsing support, so that is a good sign that I didn't code things too stupidly. Since then, I have also added some parsers for other packets and public key algorithms.
The end result of this is a usable library to extract the details from the PGP signature data with no dependencies except the Python standard library. This allows us to show the PGP key ID of the person who created the signature on the Arch Linux website, and from there, match the key ID to a developer’s known key in the system.
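For a V4 key, RFC 4880 defines the key ID as the low-order 64 bits of the SHA-1 fingerprint, so matching a signature's key ID against known keys can be as simple as indexing fingerprints by their last 16 hex digits. Here is a minimal sketch of that idea. The fingerprint table, its owner name and the helpers `index_by_key_id` and `find_signer` are all hypothetical placeholders, not part of python-pgpdump or the Arch Linux website code. Keep in mind that a 64-bit key ID is not guaranteed to be unique, and V3 keys derive their key ID differently.

```python
# Placeholder data: this fingerprint and owner are made up for illustration;
# a real site would load fingerprints from its own database.
KNOWN_FINGERPRINTS = {
    "0000000000000000000000005C2E46A0F53A76ED": "example-developer",
}


def index_by_key_id(fingerprints):
    """Map each V4 key ID (last 16 hex digits of the fingerprint) to its owner."""
    return {fpr.upper()[-16:]: owner for fpr, owner in fingerprints.items()}


def find_signer(key_id, index):
    """Return the owner for a 16-hex-digit key ID, or None if it is unknown."""
    return index.get(key_id.upper())


index = index_by_key_id(KNOWN_FINGERPRINTS)
print(find_signer("5C2E46A0F53A76ED", index))  # example-developer
```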
In [1]: import base64, pgpdump
In [2]: data = base64.b64decode("iEYEABECAAYFAk824S0ACgkQXC5GoPU6du3rsgCeOXqjR0KNIIfdZNhLZtzvU5d7oc0AoJHaJRAgGv4r6kAKgsNjfMBttHwM")
In [3]: pgp_data = pgpdump.BinaryData(data)
In [4]: pgp_data.data
Out[4]: bytearray(b'\x88F\x04\x00\x11\x02\x00\x06\x05\x02O6\xe1-\x00\n\t\x10\\.F\xa0\xf5:v\xed\xeb\xb2\x00\x9e9z\xa3GB\x8d \x87\xddd\xd8Kf\xdc\xefS\x97{\xa1\xcd\x00\xa0\x91\xda%\x10 \x1a\xfe+\xea@\n\x82\xc3c|\xc0m\xb4|\x0c')
In [5]: packets = list(pgp_data.packets())
In [6]: packets[0]
Out[6]: <SignaturePacket: DSA Digital Signature Algorithm, SHA1, length 70>
In [7]: packets[0].key_id
Out[7]: '5C2E46A0F53A76ED'
In [8]: packets[0].datetime
Out[8]: datetime.datetime(2012, 2, 11, 21, 44, 13)
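There is no magic behind those two attributes. The minimal sketch below uses only the standard library and pulls the same values out of the raw bytes by walking the version 4 signature's subpacket areas (RFC 4880 section 5.2.3). It is not python-pgpdump's implementation. It assumes an old-format packet with a one-octet length, as in this example, and the helper names are hypothetical.

```python
import base64
import datetime
import struct

# The same base64 signature used in the IPython session above.
SIG_B64 = ("iEYEABECAAYFAk824S0ACgkQXC5GoPU6du3rsgCeOXqjR0K"
           "NIIfdZNhLZtzvU5d7oc0AoJHaJRAgGv4r6kAKgsNjfMBttHwM")


def iter_subpackets(area):
    """Yield (type, body) pairs from a v4 signature subpacket area."""
    i = 0
    while i < len(area):
        first = area[i]
        if first < 192:
            length, i = first, i + 1
        elif first < 255:
            length, i = ((first - 192) << 8) + area[i + 1] + 192, i + 2
        else:
            length, i = struct.unpack(">I", area[i + 1:i + 5])[0], i + 5
        # The length counts the type octet too; bit 7 of the type is the "critical" flag.
        yield area[i] & 0x7F, bytes(area[i + 1:i + length])
        i += length


def signature_details(packet):
    """Return (key_id, creation_time) from an old-format, one-octet-length v4 signature."""
    if packet[0] != 0x88:  # old format, tag 2, one-octet length
        raise ValueError("expected an old-format signature packet with a 1-octet length")
    body = packet[2:2 + packet[1]]
    if body[0] != 4:
        raise ValueError("only version 4 signatures are handled")
    # body[1:4] hold the signature type, public-key algorithm and hash algorithm.
    (hashed_len,) = struct.unpack(">H", body[4:6])
    hashed = body[6:6 + hashed_len]
    (unhashed_len,) = struct.unpack(">H", body[6 + hashed_len:8 + hashed_len])
    unhashed = body[8 + hashed_len:8 + hashed_len + unhashed_len]
    key_id = created = None
    for sub_type, sub_body in iter_subpackets(hashed + unhashed):
        if sub_type == 2:  # signature creation time, seconds since the epoch
            (ts,) = struct.unpack(">I", sub_body)
            created = datetime.datetime.fromtimestamp(ts, tz=datetime.timezone.utc)
        elif sub_type == 16:  # issuer key ID, 8 octets
            key_id = sub_body.hex().upper()
    return key_id, created


key_id, created = signature_details(base64.b64decode(SIG_B64))
print(key_id, created)  # 5C2E46A0F53A76ED 2012-02-11 21:44:13+00:00
```

Unlike the session above, this returns a timezone-aware UTC datetime. In this signature the issuer key ID sits in the unhashed area, which the signature does not cover, so a key ID found this way only tells you which key to check; it proves nothing until the signature is actually verified.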
I definitely welcome pull requests for any improvements. If you need to figure out what is going on, I would highly recommend looking at the pgpdump C source in addition to reading RFC 4880.