This was basically a shower thought: how could I store files online, in plain sight, for free? Because who doesn’t like a good ol’ game of hide and seek? But with files. On the internet.

07/09 Update: someone pointed out that I made a mistake with the meaning of the 4th byte of the chunk type. I’ve updated the table to reflect the proper meaning.

The Challenge

  • Hide files in plain sight
  • Allow them to be distributed via free public channels, e.g. Twitter, Reddit, Imgur.

Finding a format

I spent an evening reading up on different file formats. I considered all sorts of them, but none really tickled my fancy. Until I ran across PNGs. PNG files are very well structured, and soon you’ll realise why they’re perfect for storing a payload.

PNG files start with an 8 byte signature, 89 50 4E 47 0D 0A 1A 0A. The first byte is a non-ASCII character, and bytes 2 through 4 spell out PNG in ASCII. The remaining bytes are a DOS line ending (0D 0A), the DOS EOF character (1A), and a Unix line ending (0A).
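
If you want to check that signature yourself, here is a minimal sketch in Python 3 (the snippets in the rest of this post are Python 2). The names PNG_SIGNATURE and has_png_signature are made up for this illustration.

```python
# The 8-byte PNG signature, written out byte by byte.
PNG_SIGNATURE = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])


def has_png_signature(data):
    """Return True if the given bytes start with the PNG signature."""
    return data[:8] == PNG_SIGNATURE


# Bytes 2 through 4 are plain ASCII.
print(PNG_SIGNATURE[1:4].decode('ascii'))  # PNG
```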

What follows next are what are known as chunks. The PNG I’ll use in this example comes from the Wikipedia page on the PNG format and can be found here

For the image on the Wiki page, the chunks follow this format:

Chunk size (4 bytes)   Chunk type (4 bytes)   Chunk content (N bytes)   CRC (4 bytes)
13                     IHDR                   []                        9a768270
218087                 IDAT                   []                        e11d26bc
0                      IEND                   []                        ae426082

IHDR contains metadata related to the image, such as width and height. IDAT contains the actual image data, and IEND marks the end of the file. The chunk type naming follows a very clear convention:

            First letter     Second letter        Third letter   Fourth letter
Uppercase   Critical chunk   Standard chunk       Reserved       Not safe to copy
Lowercase   Non-critical     Non-standard chunk   n/a            Safe to copy

For example, IHDR means that:

  • I: it’s a critical chunk, i.e. the file can’t be rendered without it.
  • H: it’s an official chunk type that’s been standardized in the spec.
  • D: the reserved letter, which always needs to be uppercase.
  • R: unsafe to copy if other chunks have been edited.
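
Since the convention only depends on the case of each letter, it can be checked mechanically. A small sketch in Python 3; describe_chunk_type and the dictionary keys are hypothetical names chosen for this illustration, not something from the PNG spec or from the script in this post.

```python
def describe_chunk_type(chunk_type):
    """Decode the naming convention of a 4-letter chunk type (given as str)."""
    if len(chunk_type) != 4 or not (chunk_type.isascii() and chunk_type.isalpha()):
        raise ValueError('a chunk type must be exactly 4 ASCII letters')
    return {
        'critical': chunk_type[0].isupper(),      # first letter
        'standard': chunk_type[1].isupper(),      # second letter
        'reserved_ok': chunk_type[2].isupper(),   # third letter must be uppercase
        'safe_to_copy': chunk_type[3].islower(),  # fourth letter
    }


print(describe_chunk_type('IHDR'))
# {'critical': True, 'standard': True, 'reserved_ok': True, 'safe_to_copy': False}
```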

Getting down and dirty

First we have to come up with a chunk name. One of my coworkers calls everyone a little punk, and with chunk types needing to be 4 ASCII characters, punk is perfect. Following the table above on chunk type naming, I settled on puNk: non-critical, non-standard, reserved letter in uppercase, and safe to copy.

To make my life easier, I’m working with some helper functions called read_bytes, read_bytes_as_hex, read_bytes_as_ascii, and read_bytes_as_int. You can find a link to the complete source at the bottom of this post.
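
The real helpers live in the gist. To give a rough idea of what helpers with these names could look like, here is a standalone sketch in Python 3. It is an assumption about their behaviour, not the code from the gist, and it uses plain functions that take a file object instead of methods.

```python
import binascii
import io


def read_bytes(f, count):
    """Read exactly `count` bytes or fail loudly on a truncated file."""
    data = f.read(count)
    if len(data) != count:
        raise EOFError('expected %d bytes, got %d' % (count, len(data)))
    return data


def read_bytes_as_hex(f, count):
    """Read bytes and return them as a lowercase hex string."""
    return binascii.hexlify(read_bytes(f, count)).decode('ascii')


def read_bytes_as_ascii(f, count):
    """Read bytes and decode them as ASCII text."""
    return read_bytes(f, count).decode('ascii')


def read_bytes_as_int(f, count):
    """Read bytes and interpret them as a big-endian unsigned integer."""
    return int.from_bytes(read_bytes(f, count), 'big')


# Demo on an in-memory stream: a chunk size followed by a chunk type.
stream = io.BytesIO(b'\x00\x00\x00\x0dIHDR')
print(read_bytes_as_int(stream, 4), read_bytes_as_ascii(stream, 4))  # 13 IHDR
```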

Let’s open up our file:

self._file = open(input_file, 'rb+')

We have to open it in binary mode to make sure we won’t have any reading issues later on.

self._read_bytes(8)

This reads the first 8 bytes of the file. This is the byte signature that we’re not really interested in. What should come up next are chunks.

chunk_size = self._read_bytes_as_int(4)
print 'Chunk size:', chunk_size

chunk_type = self._read_bytes_as_ascii(4)
print 'Chunk type:', chunk_type

content = self._read_bytes(chunk_size)

crc = self._read_bytes_as_hex(4)
print 'CRC:', crc

Outputs

Chunk size: 13
Chunk type: IHDR
CRC: 9a768270

Perfect! Let’s loop through the entire file until we reach the EOF

Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 0
Chunk type: IEND
CRC: ae426082
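
A loop like that can be sketched as a generator. This is a standalone Python 3 illustration, not the code from the gist; iter_chunks is a hypothetical name, and the demo at the bottom feeds it the smallest possible input, a signature followed by an empty IEND chunk.

```python
import io
import struct

PNG_SIGNATURE = b'\x89PNG\r\n\x1a\n'


def iter_chunks(f):
    """Yield (chunk_type, content, crc_hex) for each chunk of an open PNG."""
    if f.read(8) != PNG_SIGNATURE:
        raise ValueError('not a PNG file')
    while True:
        header = f.read(8)
        if len(header) < 8:
            break  # ran out of data before a full chunk header
        # 4-byte big-endian size followed by the 4-byte chunk type.
        size, chunk_type = struct.unpack('!I4s', header)
        content = f.read(size)
        crc = f.read(4)
        yield chunk_type.decode('ascii'), content, crc.hex()
        if chunk_type == b'IEND':
            break  # IEND marks the end of the image


demo = PNG_SIGNATURE + b'\x00\x00\x00\x00IEND\xae\x42\x60\x82'
for chunk_type, content, crc in iter_chunks(io.BytesIO(demo)):
    print('Chunk size:', len(content))
    print('Chunk type:', chunk_type)
    print('CRC:', crc)
```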

Injecting the payload

I’m a lazy man, so let’s inject our puNk payload at the end, right before the IEND chunk.

if chunk_type == self._END_CHUNK_TYPE:  # IEND
  self._inject_punk_chunk()
  self._file.close()

Diving inside of inject_punk_chunk: first we need to move the cursor in the file back by 8 bytes. It’s 8 bytes because we have just read the 4 byte chunk size and the 4 byte chunk type of IEND, and we need to overwrite both.

self._rewind_bytes(8)

The CRC is a cyclic redundancy check over the chunk type and the content, not the length. So let’s create a new byte array so we can easily compute this CRC.

tmp_bytes = bytearray()
tmp_bytes.extend(bytearray(self._PUNK_CHUNK_TYPE))
tmp_bytes.extend(self._bytes_to_hide)

Now with this ready, we can start writing to the file:

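# Assumption: the chunk size is the length of the payload only.
chunk_size = len(self._bytes_to_hide)
self._file.write(bytearray(struct.pack('!I', chunk_size)))
self._file.write(bytearray(self._PUNK_CHUNK_TYPE))
self._file.write(self._bytes_to_hide)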

Notice I’m using pack here because we need to write the chunk size as a 4 byte integer, not as its decimal digits. The ! specifies big-endian (network) byte order.
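
To see what pack actually produces, here is the chunk size from the run further down (28208 bytes) packed both ways. A Python 3 sketch that uses the unsigned format code I; the little-endian line is only there for contrast.

```python
import struct

size = 28208

big_endian = struct.pack('!I', size)     # network order, what PNG expects
little_endian = struct.pack('<I', size)  # shown only for contrast

print(big_endian.hex())     # 00006e30
print(little_endian.hex())  # 306e0000

# Unpacking reverses the operation.
assert struct.unpack('!I', big_endian)[0] == size
```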

Now we have to write the CRC bytes. crc32 returns an integer, which needs to be written as 4 bytes, so again we use pack to write this to the file.

# Mask to an unsigned 32-bit value; Python 2's crc32 can return a negative number.
crc = binascii.crc32(tmp_bytes) & 0xffffffff
self._file.write(bytearray(struct.pack('!I', crc)))
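
A quick way to convince yourself that the CRC covers the type and the content but not the length: IEND has no content, so its CRC should be the CRC of the four letters alone. A Python 3 sketch, with chunk_crc as a hypothetical helper name:

```python
import binascii


def chunk_crc(chunk_type, content=b''):
    """CRC-32 over chunk type + content, as an unsigned 32-bit integer."""
    return binascii.crc32(chunk_type + content) & 0xffffffff


# IEND has no content, so only the type is fed to the CRC.
print('%08x' % chunk_crc(b'IEND'))  # ae426082, same as in the chunk listing
```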

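And last but not least, we write the IEND chunk again, including its CRC so the file still ends with a complete chunk:

self._file.write(bytearray(struct.pack('!I', 0)))
self._file.write(bytearray(self._END_CHUNK_TYPE))
# IEND has no content, so its CRC covers the chunk type only.
self._file.write(bytearray(struct.pack('!I', binascii.crc32(self._END_CHUNK_TYPE) & 0xffffffff)))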
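
The same steps can also be done on bytes in memory, which is easier to test than seeking around in a file. The sketch below is a Python 3 variation, not the script from this post: build_chunk and inject_before_iend are hypothetical names, and it assumes the PNG ends with a standard, empty IEND chunk (length, type and CRC).

```python
import binascii
import struct

# A complete, empty IEND chunk: length 0, type IEND, CRC ae426082.
IEND_CHUNK = b'\x00\x00\x00\x00IEND\xae\x42\x60\x82'


def build_chunk(chunk_type, content):
    """Assemble length + type + content + CRC for a single chunk."""
    crc = binascii.crc32(chunk_type + content) & 0xffffffff
    return (struct.pack('!I', len(content)) + chunk_type + content +
            struct.pack('!I', crc))


def inject_before_iend(png_bytes, payload, chunk_type=b'puNk'):
    """Return a copy of the PNG with a payload chunk placed before IEND."""
    if not png_bytes.endswith(IEND_CHUNK):
        raise ValueError('expected the data to end with an IEND chunk')
    body = png_bytes[:-len(IEND_CHUNK)]
    return body + build_chunk(chunk_type, payload) + IEND_CHUNK
```

The result is 12 bytes (length, type, CRC) plus the payload size larger than the input.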

Okay, that should be it! Let’s try to inject an image as payload. Because I like dead memes, we’ll use a Doge picture (doge.jpg) and inject it into the dice PNG from Wikipedia.

Run the script that loops through the chunks, and injects the payload at the end:

Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 0
Chunk type: IEND
CRC: ae426082
Hiding 27 kB ( 28208 bytes)
Injecting punk chunk
Punk chunk injected
Reached EOF

Looping through the chunks to see if the chunk got injected properly:

Chunk size: 13
Chunk type: IHDR
CRC: 9a768270
Chunk size: 218087
Chunk type: IDAT
CRC: e11d26bc
Chunk size: 28208
Chunk type: puNk
CRC: 8cccb594
Chunk size: 0
Chunk type: IEND
Reached EOF

Excellent! I opened the file and saw the dice. And no doge. Exactly what is expected.

Getting our file back

Now that we have a file with a payload, we need to get it back. Inside of our chunk parser, we already get the content. That’s great, because now all we need to do is check whether we encountered a puNk chunk, and if we did, write it to a file. We create the file like this: self._output = open(output_file, 'wb+'), and write to it like this:

if chunk_type == self._PUNK_CHUNK_TYPE:

    print "Found a punk chunk", len(content), "bytes. Writing to file"
    self._output.write(bytearray(content))
    self._output.close()
    self._file.close()

Outputs:

Chunk type: puNk
CRC: 8cccb594
Found a punk chunk 28208 bytes. Writing to file
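
For comparison, here is a self-contained version of the extraction that works on bytes instead of an open file. Again a Python 3 sketch with a hypothetical function name (extract_payload); it returns the content of the first puNk chunk it finds, or None, and it does not verify CRCs.

```python
import struct


def extract_payload(png_bytes, chunk_type=b'puNk'):
    """Walk the chunks and return the content of the first matching one."""
    offset = 8  # skip the 8-byte PNG signature
    while offset + 8 <= len(png_bytes):
        size, current_type = struct.unpack_from('!I4s', png_bytes, offset)
        start = offset + 8
        if current_type == chunk_type:
            return png_bytes[start:start + size]
        if current_type == b'IEND':
            break  # reached the end without finding a payload
        offset = start + size + 4  # skip the content and the 4-byte CRC
    return None
```

Writing the result to disk is then a matter of opening the output file in 'wb' mode and writing the returned bytes.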

Quick MD5 check to see if the files are equal:

md5 doge.jpg doge_from_punk.jpg
MD5 (doge.jpg) = 9023d02eefc75f4c6ce177795e620b29
MD5 (doge_from_punk.jpg) = 9023d02eefc75f4c6ce177795e620b29
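
If you don’t have an md5 command around, the same comparison can be done from Python with hashlib. A sketch in Python 3; md5_of_stream and files_match are hypothetical helpers.

```python
import hashlib
import io


def md5_of_stream(stream, block_size=65536):
    """Hash a binary stream in blocks so large files need not fit in memory."""
    digest = hashlib.md5()
    for block in iter(lambda: stream.read(block_size), b''):
        digest.update(block)
    return digest.hexdigest()


def files_match(path_a, path_b):
    """Return True if both files have the same MD5 digest."""
    with open(path_a, 'rb') as a, open(path_b, 'rb') as b:
        return md5_of_stream(a) == md5_of_stream(b)


# In-memory demo: the MD5 of empty input.
print(md5_of_stream(io.BytesIO(b'')))  # d41d8cd98f00b204e9800998ecf8427e
```

With the files from this post, the check would be files_match('doge.jpg', 'doge_from_punk.jpg').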

Sweet! We’ve just hidden an ancient meme inside of a picture of 3 dice.

Distributing it to Imgur

The goal of the project was to store these files in broad daylight without anyone suspecting a thing. Time to upload the file to Imgur. Here she is in all her glory:

Hidden underneath is a Doge meme… or is it?

Let’s find out:

> wget http://i.imgur.com/Qk5BP19.png

> md5 Qk5BP19.png png_out.png
MD5 (Qk5BP19.png) = ba56411b9753a9ff2dc4aa74d079e4c8
MD5 (png_out.png) = ba56411b9753a9ff2dc4aa74d079e4c8

For good measure, let’s extract the payload. I’ve written a Punk class by now:

punk = Punk()
punk.decode('Qk5BP19.png', 'doge_from_imgur.jpg')

And an MD5 hash check:

md5 doge.jpg doge_from_imgur.jpg
MD5 (doge.jpg) = 9023d02eefc75f4c6ce177795e620b29
MD5 (doge_from_imgur.jpg) = 9023d02eefc75f4c6ce177795e620b29

Taadaa!

We can now store any type of arbitrary data on other people’s servers, without them ever knowing about it.

All this code works, but it is a quickly written POC. You can no doubt optimize it and make it deal with larger file sizes. A single PNG chunk can only store up to 2 gigabytes (2^31 - 1 bytes), and most image hosts only allow you to upload a few megabytes.

For the future:

  • Come up with a format to distribute a file over multiple PNGs
  • Make it redundant, allow for uploading to multiple sources
  • Add GPG encryption options for an added layer of security

And last but not least, you can find the gist with all code here.

Example:

from punk import Punk

punk = Punk()

# First param is the file name, 2nd param is the bytes you want to inject.
# Read the payload in binary mode so the bytes are not altered.
punk.encode('png_out.png', open('doge.jpg', 'rb').read())

# First param is the file name, 2nd param is the output file name.
punk.decode('png_out.png', 'doge.jpg')

No external libraries needed. Because I’m awesome like that.

All code was written while listening to this album: