The Computer Oracle

Why is a 7zipped file larger than the raw file?

Full question
https://superuser.com/questions/464315/w...

--

Content licensed under CC BY-SA
https://meta.stackexchange.com/help/lice...

--

Tags
#compression #zip #7zip




ACCEPTED ANSWER

Score 82


It comes down to a concept called entropy. See Wikipedia.

The basic idea is this: suppose there existed a lossless compression operation that could always make a file smaller. Then you could apply it to its own output over and over, and logic dictates that any file would eventually shrink to 0 bytes while still retaining all of its data. But this is absurd, because 0 bytes cannot convey any information at all; you can't simultaneously have no information and all information. So there cannot exist a compression algorithm that always makes its input smaller.
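
The same conclusion can be reached by counting instead of by repeated compression. A lossless compressor must give every input a distinct output, but there are 2^n bit strings of length n and only 2^n - 1 strings that are strictly shorter, so at least one n-bit input cannot shrink. The Python sketch below only does that arithmetic for small n; it illustrates the pigeonhole argument rather than compressing anything, and the function names are made up for this example.

```python
# Counting (pigeonhole) argument: a lossless compressor must map distinct
# inputs to distinct outputs, otherwise decompression could not tell them apart.

def count_strings_of_length(n: int) -> int:
    """Number of distinct bit strings of exactly n bits."""
    return 2 ** n


def count_strings_shorter_than(n: int) -> int:
    """Number of distinct bit strings of length 0, 1, ..., n - 1."""
    return sum(2 ** k for k in range(n))  # always equals 2**n - 1


def some_input_cannot_shrink(n: int) -> bool:
    """True if there are more n-bit inputs than strictly shorter outputs."""
    return count_strings_of_length(n) > count_strings_shorter_than(n)


for n in range(1, 9):
    print(n, count_strings_of_length(n), count_strings_shorter_than(n),
          some_input_cannot_shrink(n))
```

Since the check is true for every n, every lossless scheme leaves at least one input of each length that does not get smaller.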

Because of this, every compression program you ever use is going to increase the size of (or, at best, leave unchanged) some inputs. That is, for any compression algorithm you design or use, there will be certain inputs that come out smaller, and some that will not.
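
A quick way to see an input that comes out larger is to feed a compressor data with no redundancy in it. The sketch below uses Python's standard `zlib` module (DEFLATE, the method most Zip files use) on pseudo-random bytes as a stand-in for such data; the seed and the 100,000-byte size are arbitrary choices for the example.

```python
import random
import zlib

# Pseudo-random bytes stand in for data with no exploitable redundancy.
# A fixed seed keeps the example reproducible.
rng = random.Random(42)
data = rng.randbytes(100_000)  # Random.randbytes requires Python 3.9+

packed = zlib.compress(data, 9)  # level 9 = strongest DEFLATE setting

print("input bytes: ", len(data))
print("output bytes:", len(packed))
print("growth:      ", len(packed) - len(data))
```

On input like this the output should come out slightly larger than the input: DEFLATE falls back to storing the bytes, but it still adds its own block headers plus the zlib header and checksum.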

Already-compressed data is generally a terrible candidate for further compression, because most lossless compression algorithms are based on the same theoretical principles: once one of them has removed most of the redundancy, there is little left for the next one to exploit. It is possible to compress poorly-compressed data even further, but this is less effective than compressing the original data with the best available algorithm in the first place.
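
One rough way to see why compressed output is a poor candidate for recompression is to measure how evenly its bytes are spread out. The sketch below computes the order-0 Shannon entropy (bits per byte, from single-byte frequencies only) of some synthetic text before and after `zlib` compression, then compresses the result a second time; the word list and sizes are arbitrary assumptions. Compressed bytes typically land close to the 8 bits/byte maximum.

```python
import collections
import math
import random
import zlib


def byte_entropy(data: bytes) -> float:
    """Order-0 Shannon entropy of `data`, in bits per byte (0.0 to 8.0).

    Only single-byte frequencies are counted, so this is a rough indicator
    of redundancy, not a limit on what a real compressor can achieve.
    """
    if not data:
        return 0.0
    total = len(data)
    counts = collections.Counter(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical input: pseudo-random text drawn from a small word list.
rng = random.Random(0)
words = ["entropy", "compress", "archive", "header", "dictionary",
         "file", "data", "the", "a", "of"]
text = " ".join(rng.choice(words) for _ in range(50_000)).encode("ascii")

once = zlib.compress(text, 9)    # first pass: removes most of the redundancy
twice = zlib.compress(once, 9)   # second pass over already-compressed bytes

print(f"text:         {len(text):>7} bytes, {byte_entropy(text):.2f} bits/byte")
print(f"compressed:   {len(once):>7} bytes, {byte_entropy(once):.2f} bits/byte")
print(f"recompressed: {len(twice):>7} bytes")
```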

For example, if you had a 100 MB text file and compressed it using the regular Zip algorithm, it might get compressed down to 50 MB. If you then compressed the Zip file with LZMA2, you might get it down to 40 or 45 MB, because LZMA has a higher compression ratio than Zip for most compressible data. So it stands to reason that it can also compress Zip data, because Zip doesn't squeeze all of the redundancy out of it. But if you eliminate the Zip container entirely and compress the raw text with LZMA2, you may be able to get it even smaller, potentially something on the order of 30-35 MB (these are just made-up numbers to illustrate the concept).
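
The Zip-then-LZMA2 comparison can be tried with Python's standard library, with caveats: `zlib` (DEFLATE) stands in for "regular Zip", and `lzma.compress` (an .xz container holding an LZMA2 stream) stands in for 7-Zip's LZMA2. The sample text is synthetic and hypothetical, so the exact sizes say nothing about real files; the point is only to put the numbers side by side.

```python
import lzma
import random
import zlib

# Hypothetical sample input: pseudo-random "sentences" built from a small
# vocabulary, so the text is compressible but not trivially repetitive.
rng = random.Random(1)
vocab = ["the", "archive", "header", "entropy", "compress", "data",
         "file", "dictionary", "of", "a", "is", "larger"]
raw = " ".join(rng.choice(vocab) for _ in range(60_000)).encode("ascii")

deflated = zlib.compress(raw, 9)        # stand-in for "regular Zip" (DEFLATE)
lzma_of_zip = lzma.compress(deflated)   # LZMA2 (.xz) applied to the Zip output
lzma_of_raw = lzma.compress(raw)        # LZMA2 (.xz) applied to the original text

for label, blob in [("raw text", raw),
                    ("zlib (DEFLATE)", deflated),
                    ("LZMA2 of zlib output", lzma_of_zip),
                    ("LZMA2 of raw text", lzma_of_raw)]:
    print(f"{label:<22}{len(blob):>9} bytes")
```

How the last two lines compare depends on the input, so it is worth running this on your own data rather than trusting any particular figure.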

In the case of the binary you're trying to compress, the archive is larger because the 7-Zip file format has to create its own internal structure and pack the already-compressed executable's data into it. This includes things like a dictionary, a file header, and so on. This extra data is usually more than offset by the savings from compressing the data itself, but it appears that your executable is already compressed with some form of LZMA; otherwise, 7-Zip would likely shrink it, or increase its size only very slightly, rather than increasing it by 2 MB (which is a lot).
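
The container overhead can be isolated with Python's `lzma` module. The standard library has no .7z writer, so this sketch uses the .xz format as a stand-in (an assumption: .7z and .xz headers differ in detail). It compresses the same payload with identical LZMA2 settings once as a bare (raw) stream and once inside an .xz container, so the size difference is the container's headers, index, footer and checksum. It then compresses the .xz output a second time, mimicking LZMA applied to data that is already LZMA-compressed. The payload is a made-up placeholder.

```python
import lzma

# Placeholder payload; any bytes would do.
payload = b"Already-compressed or incompressible bytes would go here. " * 10

# Same LZMA2 settings for both, so the difference is the container itself.
filters = [{"id": lzma.FILTER_LZMA2, "preset": 6}]

raw_stream = lzma.compress(payload, format=lzma.FORMAT_RAW, filters=filters)
xz_file = lzma.compress(payload, format=lzma.FORMAT_XZ, filters=filters)

# Compressing LZMA output again: little redundancy left, container added again.
twice = lzma.compress(xz_file)

print("raw LZMA2 stream:", len(raw_stream), "bytes")
print("xz container:    ", len(xz_file), "bytes")
print("container cost:  ", len(xz_file) - len(raw_stream), "bytes")
print("xz of xz:        ", len(twice), "bytes")
```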




ANSWER 2

Score 6


If the original executable was already compressed (or contained heavily compressed or incompressible data), then compressing it will increase the size.




ANSWER 3

Score 2


Most compression algorithms use what's called a symbol table: basically, just pieces of the file that the algorithm uses as elements it CAN compress. This, of course, creates some overhead in the file, but it usually results in a much smaller file.

In already-compressed files, it still creates a set of symbols, but there is very little it can use to reduce the size. In your case, the symbol table for the already-compressed file is probably in the neighborhood of 2 MB, or probably more if it did manage to do some compressing.




ANSWER 4

Score 0


The idea behind compression:

The compression software creates a list of files and eliminates the duplicate content.

When compressing already-compressed files, you may get compressed files that are bigger than the originals.
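
The "eliminate duplicate content" idea can be sketched as a toy fixed-size-chunk deduplicator. This is a deliberate simplification (Zip and 7-Zip find much finer-grained repeated matches rather than fixed chunks), and the chunk size, the 4-byte index cost and the function names are hypothetical choices for illustration. Repetitive input shrinks, while random-looking input, standing in for an already-compressed file, gets bigger because every chunk is unique and the index still has to be stored.

```python
import random

CHUNK = 64        # hypothetical chunk size in bytes
INDEX_BYTES = 4   # hypothetical cost of storing one chunk index


def dedup(data: bytes, chunk: int = CHUNK) -> tuple[list[bytes], list[int]]:
    """Store each distinct fixed-size chunk once, plus the order to rebuild it."""
    store: list[bytes] = []          # unique chunks, in first-seen order
    index_of: dict[bytes, int] = {}  # chunk contents -> position in store
    order: list[int] = []            # which stored chunk goes at each position
    for start in range(0, len(data), chunk):
        piece = data[start:start + chunk]
        if piece not in index_of:
            index_of[piece] = len(store)
            store.append(piece)
        order.append(index_of[piece])
    return store, order


def rebuild(store: list[bytes], order: list[int]) -> bytes:
    """Reverse of dedup: concatenate the stored chunks in their original order."""
    return b"".join(store[i] for i in order)


def stored_size(store: list[bytes], order: list[int]) -> int:
    """Rough output size: unique chunk bytes plus one fixed-width index per chunk."""
    return sum(len(p) for p in store) + INDEX_BYTES * len(order)


repetitive = b"duplicate content " * 4000
noise = random.Random(7).randbytes(len(repetitive))  # stand-in for compressed data

for label, data in [("repetitive", repetitive), ("random-looking", noise)]:
    store, order = dedup(data)
    assert rebuild(store, order) == data
    print(f"{label:<15} in: {len(data):>6}  out: {stored_size(store, order):>6}")
```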