Sunday, November 29, 2015

The Key Missing API in Lossless Data Compressors

There's a key streaming API missing from every lossless codec I've seen. This is the next API going into lzham_codec_devel (what will be LZHAM v1.1). This API bridges the gap between the lossless and lossy worlds, enables some other interesting use cases, and should be easy to add to most designs.

For some background, here's the (previously) complete set of lossless compression library APIs:

Blocked:
CompressMemoryToMemory() - comp buffer in memory to another buffer
DecompressMemoryToMemory() - decomp buffer in memory to another buffer
GetCompressBound() - returns max possible comp size given size of data to compress

Streaming:
CreateCompressContext() - create new comp context
DestroyCompressContext() - destroy comp context
ResetCompressContext() - reset comp context, reusing allocated memory
CompressContinue() - compress some bytes from input to output buf
CompressFlush(bool end) - forcibly flush comp, generating output

CreateDecompressContext() - create new decomp context
DestroyDecompressContext() - destroy decomp context
ResetDecompressContext() - reset decomp context, reusing allocated memory
DecompressContinue() - decompress some bytes from input to output buf

The missing streaming API is:

double CompressQuery(comp_ctx *pCtx, const void *pBuf, size_t size)

This function efficiently computes the compressed size, in fractional bits (and/or integer bytes), of the specified buffer using the current compression context. Importantly, the current compression context (entropy coding state, sliding dictionary, statistical models, etc.) is not modified.

This API basically gives you an upper bound on how many compressed bits would be added to the output given a particular input. (It's an upper bound, not exact, because the flush imposes a hard artificial LZ phrase boundary on the output.)

This API can be inefficiently emulated to some degree on streaming compressors that support flushing, except you'll have to settle for only integer byte results, and put up with a full recompress before each query. CompressQuery() is superior because it can give you fractional bit results, and it doesn't need to fully update its statistical models or even fully entropy code the output (it just has to compute how many bits it would output, which codecs like LZMA/LZHAM can do today because they must compute accurate "bit prices" during near-optimal parsing).
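For illustration, here's roughly what that emulation looks like on top of zlib. deflateCopy() clones the live compressor's state, which at least avoids recompressing everything from scratch on each query; deflateCopy(), deflateBound() and Z_SYNC_FLUSH are real zlib APIs, while query_compressed_bytes() is just a made-up name. The result is in whole bytes and includes the sync flush's overhead, which is exactly the limitation described above:

#include <zlib.h>
#include <vector>

// Emulated query: clone the live deflate state, compress the candidate on the
// clone, count the flushed bytes, then throw the clone away. The live stream
// is never modified, but we pay for a full state copy on every query.
size_t query_compressed_bytes(z_stream *pLive, const void *pBuf, size_t size)
{
    z_stream probe;
    if (deflateCopy(&probe, pLive) != Z_OK)
        return 0; // treat 0 as "query failed"

    // Worst-case output bound, plus a little padding for the sync flush marker.
    std::vector<unsigned char> out(deflateBound(&probe, (uLong)size) + 64);
    probe.next_in = (Bytef *)pBuf;
    probe.avail_in = (uInt)size;
    probe.next_out = out.data();
    probe.avail_out = (uInt)out.size();

    uLong bytes_before = probe.total_out;
    deflate(&probe, Z_SYNC_FLUSH); // the flush forces all pending bits out so they can be counted

    size_t cost = (size_t)(probe.total_out - bytes_before);
    deflateEnd(&probe); // discard the clone; the live stream is untouched
    return cost;
}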

CompressQuery() should be implementable in any lossless compressor, not just LZ-based ones. Typically, lossless compression is viewed as a black box that runs after you've generated some data. With this API you can now intimately interact with the compression engine in order to choose the set of data that leads to higher compression.

Example ideas of what you can now do with this API:

1. Rate distortion optimized (RDO) DXTc/PVRTC/etc. compression (i.e. like crunch)

A typical DXTc block compressor evaluates hundreds to thousands of possible packed block candidates, many of them with very similar or virtually identical PSNRs. A simple RDO DXTc compressor would compute a list of candidate DXTc blocks for each input block, query the backend lossless compressor on each candidate block to determine how many bits it would add to the compressed output, then choose the encoding that strikes the best balance between coded bits and quality. The block compressor then codes (or "commits") this specific block to the compressed output stream by calling CompressContinue(), then continues to the next block and starts the process over again.
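Here's a minimal sketch of that per-block loop. comp_ctx, CompressQuery() and CompressContinue() are the hypothetical API from this post (CompressContinue() is simplified here; the real call also deals with an output buffer), and the candidate list and error metric are stand-ins for whatever the block compressor already computes:

#include <cstdint>
#include <vector>

struct comp_ctx; // hypothetical compression context from the API above
double CompressQuery(comp_ctx *pCtx, const void *pBuf, size_t size);
void CompressContinue(comp_ctx *pCtx, const void *pBuf, size_t size); // simplified

struct dxt_block { uint8_t m_bytes[8]; }; // one packed DXT1 block

// Choose the candidate minimizing the rate-distortion cost J = D + lambda * R,
// where the rate R comes straight from the lossless backend. distortion[i] is
// candidate i's error vs. the source pixels (MSE, etc.).
size_t pick_and_commit_block(comp_ctx *pCtx,
                             const std::vector<dxt_block> &candidates,
                             const std::vector<double> &distortion,
                             double lambda)
{
    size_t best_index = 0;
    double best_cost = 1e+30;
    for (size_t i = 0; i < candidates.size(); i++)
    {
        // Rate in fractional bits; the compressor's state is NOT modified here.
        double bits = CompressQuery(pCtx, candidates[i].m_bytes, sizeof(dxt_block));
        double cost = distortion[i] + lambda * bits;
        if (cost < best_cost) { best_cost = cost; best_index = i; }
    }
    // Only committing the winner actually advances the compressor's state.
    CompressContinue(pCtx, candidates[best_index].m_bytes, sizeof(dxt_block));
    return best_index;
}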

This is just local optimization. A more advanced version would use a dynamic programming approach that looks ahead multiple blocks (like LZMA's or LZHAM's parsers do) to build a graph, then choose the combination of blocks that best balances compressed bits vs. PSNR.

I have already done several promising experiments in this area on DXTc textures while writing crunch. Interestingly, this approach is compatible with virtually any block-based format.

2. Universal prediction engine
Honestly, this usage is pretty far out and speculative. Here's one possibility, in the context of a real-time or turn-based game:

Each frame, encode the position of a player character into a C-style, fixed-size POD data structure (let's call them records). Compress the raw record bytes by calling CompressContinue(). Simple enough.

Now here's where things get interesting. Let's say we want to determine the probability that the character will be at a particular position on the next frame. Evaluate each legal gamespace position the character could occupy next frame, and encode these candidate positions into records like usual. Now iterate through the candidate records and call CompressQuery() on each record's serialized struct.

The return value will be how many fractional bits are needed to encode each structure given the compressor's current context. The more bits needed to encode a record, the higher the record's entropy, and the less likely (or more "surprising") the position is to the compressor. More probable (less surprising) records will require fewer bits. 
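Here's a sketch using the same hypothetical API (the record layout is just an example). It goes one small step beyond the text above: since an ideal coder spends -log2(p) bits on an event of probability p, the queried bit costs can be turned back into normalized probabilities:

#include <cmath>
#include <cstdint>
#include <vector>

struct comp_ctx;
double CompressQuery(comp_ctx *pCtx, const void *pBuf, size_t size);

struct pos_record { int16_t m_x, m_y, m_z; }; // example fixed-size POD record

// bits = -log2(p) for an ideal coder, so p ~ 2^-bits up to normalization.
std::vector<double> predict_next_position(comp_ctx *pCtx,
                                          const std::vector<pos_record> &candidates)
{
    std::vector<double> probs(candidates.size());
    double sum = 0.0;
    for (size_t i = 0; i < candidates.size(); i++)
    {
        double bits = CompressQuery(pCtx, &candidates[i], sizeof(pos_record));
        probs[i] = std::exp2(-bits); // fewer bits = less surprising = more likely
        sum += probs[i];
    }
    for (size_t i = 0; i < probs.size(); i++)
        probs[i] /= sum; // normalize across the legal candidates
    return probs;
}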

Once the codec forms a decent model of the input records it should be able to predict the next position with (hopefully) reasonable certainty. This approach could be quite interesting given a sophisticated enough statistical modeling system and entropy coding backend. (Or not, I haven't tried it yet.)

3. bsdiff-style preprocessing for delta compression
bsdiff is an LZ-like approach for creating patch files. It consists of a command stream, a delta byte stream, and a literal byte stream, which can all be compressed separately (as bsdiff.exe does using bzip2). Importantly, there is no single "right" way of encoding a patch stream, i.e. there are many possible ways of generating fully valid patch command streams.

CompressQuery() can be called while composing the various streams in order to determine the set of commands/delta bytes/literal bytes that minimizes the resulting compressed patch file size.
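This is the same pick-the-cheapest pattern as in idea #1. The sketch below just compares whole candidate encodings against a single hypothetical context; a real implementation would presumably query the command, delta, and literal streams' contexts separately:

#include <cstdint>
#include <vector>

struct comp_ctx;
double CompressQuery(comp_ctx *pCtx, const void *pBuf, size_t size);

// Given several fully valid encodings of the same patch region (e.g. an
// approximate match plus delta bytes vs. a raw literal run), return the one
// the backend can actually code most cheaply. The caller then commits the
// winner with CompressContinue().
size_t cheapest_patch_encoding(comp_ctx *pCtx,
                               const std::vector<std::vector<uint8_t>> &encodings)
{
    size_t best_index = 0;
    double best_bits = 1e+30;
    for (size_t i = 0; i < encodings.size(); i++)
    {
        double bits = CompressQuery(pCtx, encodings[i].data(), encodings[i].size());
        if (bits < best_bits) { best_bits = bits; best_index = i; }
    }
    return best_index;
}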

4. Optimized PNG-like lossless image compression
The typical PNG compressor adaptively chooses the filter to use on each scanline by minimizing the sum of absolute differences. A better metric would be to call CompressQuery() for each filter to determine how many compressed bits would be output if that filter were selected, and choose the filter that results in the fewest bits.
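A sketch of that filter selection loop, where apply_filter() stands in for the five standard PNG filters (None, Sub, Up, Average and Paeth) and the query API is the hypothetical one above:

#include <cstdint>
#include <vector>

struct comp_ctx;
double CompressQuery(comp_ctx *pCtx, const void *pBuf, size_t size);

// Placeholder: applies PNG filter 'type' (0-4) to one scanline.
std::vector<uint8_t> apply_filter(int type, const uint8_t *pRow,
                                  const uint8_t *pPrevRow, size_t width_bytes);

int pick_scanline_filter(comp_ctx *pCtx, const uint8_t *pRow,
                         const uint8_t *pPrevRow, size_t width_bytes)
{
    int best_filter = 0;
    double best_bits = 1e+30;
    for (int f = 0; f < 5; f++)
    {
        std::vector<uint8_t> filtered = apply_filter(f, pRow, pPrevRow, width_bytes);
        filtered.insert(filtered.begin(), (uint8_t)f); // PNG prepends the filter type byte
        double bits = CompressQuery(pCtx, filtered.data(), filtered.size());
        if (bits < best_bits) { best_bits = bits; best_filter = f; }
    }
    return best_filter; // caller then filters the row and commits it via CompressContinue()
}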

5. Basically, any data that will be losslessly compressed and has multiple valid or usable encodings (mesh vertex data, curve-fitted animation data, VQ data, etc.) could benefit from tightly coupling the data generation process with the backend lossless compressor.
