Code Comments
> ..3 bytes .. The resulting file was 115 bytes.

File header information of 112 bytes.

> Next I took an uncompressible file (a zip file) of 6550 bytes and zipped it.
> The result was 6662.

File header information of 112 bytes.

> text file of 69 bytes. I zipped it. The result was 176 bytes.

176 - 112 -> 64. The 69 bytes were compressed to 64, probably using RLE.

With typical Cobol record data there are large runs of spaces and zero characters because the fields must allow for the largest expected data item. This type of data can be compressed much smaller than 'text' that has little white space.

Zip chooses the 'best' compression to use for a particular type of data. For small text files it may choose RLE; for large English text it may choose a dictionary-based mechanism that has a large table but very small resulting data.
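To make the point about runs of spaces and zeros concrete, here is a minimal Python sketch of byte-oriented RLE. It is not the scheme Zip or any COBOL runtime uses; the (count, value) pair format and the record layout (a 30-byte name field followed by a 9-digit amount) are assumptions made up for illustration.

```python
def rle_encode(data: bytes) -> bytes:
    """Encode data as (count, value) byte pairs; each run is capped at 255."""
    out = bytearray()
    i = 0
    while i < len(data):
        run = 1
        while i + run < len(data) and data[i + run] == data[i] and run < 255:
            run += 1
        out += bytes((run, data[i]))
        i += run
    return bytes(out)


def rle_decode(encoded: bytes) -> bytes:
    """Invert rle_encode by expanding every (count, value) pair."""
    out = bytearray()
    for k in range(0, len(encoded), 2):
        out += bytes([encoded[k + 1]]) * encoded[k]
    return bytes(out)


# Hypothetical fixed-width record: space-padded name, zero-padded amount.
record = b"SMITH".ljust(30) + b"42".rjust(9, b"0")
# Ordinary prose with almost no repeated adjacent characters.
text = b"With a 69-byte text file, I doubt you'd get much compression at all."

for label, sample in (("record", record), ("text", text)):
    print(label, len(sample), "->", len(rle_encode(sample)))
```

With this naive pair format the padded record shrinks, while the prose sample grows, because nearly every character becomes a two-byte pair.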
In article <1137783996.596533.80170@f14g2000cwb.googlegroups.com>, "Richard" <riplin@Azonic.co.nz> writes:

> 176 - 112 -> 64. The 69 bytes were compressed to 64, probably using
> RLE.

With a 69-byte "text" file, I doubt you'd get much compression, if any, from RLE; plain ASCII text (which I assume is what we're dealing with here) rarely has much in the way of runs (of whatever symbol length). That looks like the result of Deflate (probably straight LZ77) to me.

I just did a bit of testing of my own, and Info-Zip and WinZip both compressed a 69-byte text file to 64 bytes using Deflate (defF, defN, or defX, depending on the "fast", "normal", or "max" options).

Not that it matters...

> With typical Cobol record data there are large runs of spaces and zero
> characters because the fields must allow for the largest expected data
> item. This type of data can be compressed much smaller than 'text'
> that has little white space.

Right, and here RLE often works very well, which is why we include a handful of RLE compressors (for 8-bit and 16-bit symbols, and for smaller or larger inputs) with MF COBOL.

> Zip chooses the 'best' compression to use for a particular type of
> data. For small text files it may choose RLE, for large english text it
> may choose to use a dictionary based mechanism that has a large table but
> very small resulting data.

Actually, LZ77 doesn't require much table space. The table (or tables, since LZ77 will generate a new one when the compression ratio drops as it's processing the data stream) is a Huffman table that maps output codes to offsets in previous input data. That is, the compressed symbols in LZ77 output refer back to earlier data: "this next piece is the same as the 16-bit sequence we saw 124 bits ago". So most of the information in the table is actually stored in the user data itself. And the table itself is compressed using a fixed Huffman encoding.
-- 
Michael Wojcik michael.wojcik@microfocus.com

Sure we're tossing out fluff, but tell me, where does anyone deal in words with substance? -- Haruki Murakami (trans Alfred Birnbaum)
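The back-reference idea in the reply above can be sketched in a few lines of Python. This is a toy greedy LZ77 tokenizer, not Info-Zip's or WinZip's implementation: the function names, the window size, and the minimum match length are assumptions, and it omits the Huffman coding stage that Deflate adds on top. It works in bytes, which is also how Deflate measures match lengths and distances. The `raw_deflate` helper uses the standard `zlib` module with negative `wbits` to get a headerless Deflate stream for size comparisons.

```python
import zlib


def lz77_tokens(data: bytes, window: int = 4096, min_len: int = 3) -> list:
    """Toy greedy LZ77: emit literal byte values or (distance, length) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Search the sliding window for the longest match starting at i.
        for j in range(max(0, i - window), i):
            k = 0
            while i + k < len(data) and data[j + k] == data[i + k]:
                k += 1
            if k > best_len:
                best_len, best_dist = k, i - j
        if best_len >= min_len:
            tokens.append((best_dist, best_len))  # "same as what we saw best_dist bytes ago"
            i += best_len
        else:
            tokens.append(data[i])  # literal byte
            i += 1
    return tokens


def lz77_decode(tokens: list) -> bytes:
    """Rebuild the data; copies go byte by byte so overlapping matches work."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            dist, length = tok
            for _ in range(length):
                out.append(out[-dist])
        else:
            out.append(tok)
    return bytes(out)


def raw_deflate(data: bytes) -> bytes:
    """Headerless Deflate stream via zlib (negative wbits omits the zlib wrapper)."""
    comp = zlib.compressobj(9, zlib.DEFLATED, -15)
    return comp.compress(data) + comp.flush()


sample = b"abcabcabcabc"
print(lz77_tokens(sample))

padded = b"SMITH".ljust(200)  # hypothetical space-padded field
print(len(padded), "->", len(raw_deflate(padded)))
```

The decoder needs no separate dictionary: every (distance, length) pair points into output it has already produced, which is the sense in which most of the "table" lives in the data itself.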