Compression & Decompression Of A Stream

Until recently I had not found a good Python module for compressing and decompressing data as streams; most tools required the data to be in a file, which has some obvious limitations. Then I saw a mention of pyLZMA roll by. It supports compression and decompression of streams using the Lempel–Ziv–Markov chain algorithm (LZMA). The module is licensed under LGPL-2.1; not MIT, but at least it is the "Lesser" GPL. I've taken it for a spin and it has successfully compressed and decompressed all the data I've thrown at it (remember to always checksum your data).

import pylzma, hashlib

# Calculate the SHA checksum for our input file
i = open('Brighton.jpg', 'rb')
h1 = hashlib.sha1()
while True:
    tmp = i.read(1024)
    if not tmp: break
    h1.update(tmp)  # feed each chunk into the running hash
h1 = h1.hexdigest()
print('Input SHA Checksum: {0}'.format(h1))
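
# Compress the input file (as a stream) to a file (as a stream)
i.seek(0)  # rewind: the checksum loop left the file pointer at EOF
o = open('compressed.lzma', 'wb')
s = pylzma.compressfile(i)
while True:
    tmp = s.read(1)
    if not tmp: break
    o.write(tmp)  # write each compressed piece as it is produced
o.close()
i.close()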

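# Decompress the file (as a stream) to a file (as a stream)
i = open('compressed.lzma', 'rb')
o = open('decompressed.raw', 'wb')
s = pylzma.decompressobj()
while True:
    tmp = i.read(1)
    if not tmp: break
    o.write(s.decompress(tmp))
o.write(s.flush())  # emit any data still buffered in the decompressor
o.close()
i.close()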

# Check the decompressed file
i = open('decompressed.raw', 'rb')
h2 = hashlib.sha1()
while True:
    tmp = i.read(1024)
    if not tmp: break
    h2.update(tmp)  # feed each chunk into the running hash
i.close()
h2 = h2.hexdigest()
print('Result SHA Checksum: {0}'.format(h2))
if h1 == h2: print('OK!')

Of course a JPEG file doesn't compress much, but that makes it an even better test case.
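
For comparison, here is a minimal sketch of the same checksum, compress, decompress, checksum round trip using the lzma module from the Python 3 standard library instead of pyLZMA. This is a swapped-in library, not the module discussed above: the helper names (sha1_of, compress_stream, decompress_stream) and the CHUNK size are my own assumptions, and the standard library writes the .xz container by default, so its output should not be assumed to be interchangeable with files written by pyLZMA.

```python
import hashlib
import io
import lzma

CHUNK = 1024  # bytes read per iteration; an arbitrary choice for this sketch


def sha1_of(stream):
    """Return the SHA-1 hex digest of a binary stream, read in chunks."""
    digest = hashlib.sha1()
    for block in iter(lambda: stream.read(CHUNK), b''):
        digest.update(block)
    return digest.hexdigest()


def compress_stream(src, dst):
    """Compress src into dst chunk by chunk, without loading it all at once."""
    compressor = lzma.LZMACompressor()
    for block in iter(lambda: src.read(CHUNK), b''):
        dst.write(compressor.compress(block))
    dst.write(compressor.flush())  # emit whatever is still buffered


def decompress_stream(src, dst):
    """Decompress src into dst chunk by chunk."""
    decompressor = lzma.LZMADecompressor()
    for block in iter(lambda: src.read(CHUNK), b''):
        dst.write(decompressor.decompress(block))


if __name__ == '__main__':
    # In-memory streams stand in for the files used in the script above.
    original = io.BytesIO(b'stream me ' * 5000)
    packed, unpacked = io.BytesIO(), io.BytesIO()
    before = sha1_of(original)
    original.seek(0)  # rewind after hashing, as in the pyLZMA script
    compress_stream(original, packed)
    packed.seek(0)
    decompress_stream(packed, unpacked)
    unpacked.seek(0)
    print('OK!' if sha1_of(unpacked) == before else 'MISMATCH')
```

Any object with read() and write() works here, so the same functions can be pointed at open files, sockets wrapped as files, or pipes. Note that on the compression side the final flush() is what completes the stream; leaving it out would produce a truncated file.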

1 comment:

  1. On the decompress, I think you need to add an "o.write(s.flush())" before the "s.close()" call, based on the usage information in these notes (see line 125):