| This is a patched version of zlib modified to use |
| Pentium-optimized assembly code in the deflation algorithm. The files |
| changed/added by this patch are: |
| |
| README.586 |
| match.S |
| |
| The effectiveness of these modifications is a bit marginal, as the the |
| program's bottleneck seems to be mostly L1-cache contention, for which |
| there is no real way to work around without rewriting the basic |
| algorithm. The speedup on average is around 5-10% (which is generally |
| less than the amount of variance between subsequent executions). |
| However, when used at level 9 compression, the cache contention can |
| drop enough for the assembly version to achieve 10-20% speedup (and |
| sometimes more, depending on the amount of overall redundancy in the |
| files). Even here, though, cache contention can still be the limiting |
| factor, depending on the nature of the program using the zlib library. |
| This may also mean that better improvements will be seen on a Pentium |
| with MMX, which suffers much less from L1-cache contention, but I have |
| not yet verified this. |
| |
| Note that this code has been tailored for the Pentium in particular, |
| and will not perform well on the Pentium Pro (due to the use of a |
| partial register in the inner loop). |
| |
| If you are using an assembler other than GNU as, you will have to |
| translate match.S to use your assembler's syntax. (Have fun.) |
| |
| Brian Raiter |
| breadbox@muppetlabs.com |
| April, 1998 |
| |
| |
| Added for zlib 1.1.3: |
| |
| The patches come from |
| http://www.muppetlabs.com/~breadbox/software/assembly.html |
| |
| To compile zlib with this asm file, copy match.S to the zlib directory |
| then do: |
| |
| CFLAGS="-O3 -DASMV" ./configure |
| make OBJA=match.o |