Thursday, November 12, 2015

Visualizing the Calgary compression corpus

The Calgary corpus is a collection of text and binary files commonly used to benchmark and test lossless compression programs. It's now quite dated, but it still has some value because the corpus is so well known.

Anyhow, here are the files visualized using the same approached described in my previous blog post. The block size (each pixel) = 512 bytes.

paper1 and paper6 have some interesting shifts at the ends of each file, which corresponds to the bottom right section of the images. Turns out these are the appendixes, which have very different content vs. the rest of each paper's content.

bib:


book1:

book2:

 fourpics:

geo:

news:

obj1:

obj2:

 paper1:

paper2:

paper3:

 paper4:

paper5:

paper6:

pic:

progc:

progl:

progp:

trans:

twobooks:

twopics:

No comments:

Post a Comment