Welcome to my zlib hacks page. Send comments, code patches or anything else related to these files to christop@charm.net. These files are unsupported by the zlib team, so please don't email them directly. If you are a developer, the Zlib-devel mailing list is a good resource.
11-09-2004
Cool! Gilles Vollant did a Windows x86_64 port of the inflate asm code in inffas86.c here. I don't have a Win64 platform yet.
Oct-21-2004
NO_GUNZIP was not being defined when NO_GZIP is used to build zlib. This version removes that ifdef from inffast.S, but it needs zlib version > 1.2.2 to work when NO_GZIP is defined.
The AMD64 comment was added to inffas86.c.
Dec-29-2003
I added AMD64 inflate asm support to the 1.2.1 release in inffas86.c. This version is also slightly quicker on x86 systems because it copies data with rep movsw, which moves 2 bytes at a time, instead of rep movsb, which moves single bytes. I've tested the AMD64 code on Fedora Core 1 plus the x86_64 updates from http://fedora.linux.duke.edu/fc1_x86_64, running on an Athlon 64 3000+ / Gigabyte GA-K8VT800M system with 1GB ram. The 64 bit version is about 4% faster than the 32 bit version when decompressing mozilla-source-1.3.tar.gz.
$ uname -a
Linux sic 2.4.22-1.2129.nptl #1 Tue Dec 2 00:09:06 CST 2003 x86_64 x86_64 x86_64 GNU/Linux
# build and time the 64 bit version
$ cp contrib/inflate86/inffas86.c .
$ make CFLAGS="-O3 -DUSE_MMAP -DASMINF -fomit-frame-pointer" OBJA=inffas86.o
$ time minigzip -d < mozilla-source-1.3.tar.gz > /dev/null
real 0m1.186s
user 0m1.140s
sys 0m0.040s
# build and time the 32 bit version
$ rm *.o
$ make CFLAGS="-O3 -m32 -DUSE_MMAP -DASMINF -fomit-frame-pointer" OBJA=inffas86.o
$ time minigzip -d < mozilla-source-1.3.tar.gz > /dev/null
real 0m1.247s
user 0m1.220s
sys 0m0.030s
Nov-23-2003
Zlib 1.2.1 is released here.
Oct-10-2003
I added zlib dll files based on Gilles Vollant's vc7 1.2.0.7 builds found at www.winimage.com here.
The icc7 zlib dll build is here.
The icc7 and vc7 zlib dll compiles are here.
The test data I used to compare the builds is here.
The output of 'runtest.sh' for each of the builds: icc7 w/ASM, icc7 wo/ASM, vc7 w/ASM, vc7 wo/ASM.
I ran these tests on a 1.4ghz Pentium M with 512MB ram.
Sep-14-2003
Updated version for release candidate 1.2.0.5.
The x86 at&t asm version: inffast.S and the inline C x86 asm version: inffas86.c.
$ sha1sum * > /tmp/sumout ; cd /tmp
$ cat sumout
c3154daab8d0f12ead06dffc4d0153655d1b305f  inffas86.c
abd5b7173b763fcb665644955f6eabcc2b056f64  inffast.S
$ gpg -a --detach-sign sumout
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQA/ZBCD3iAhXCETSNYRAphiAKC73MW3aYbmncSwUKZ93/rCIn7wMwCfUCKE
D72kU/aGharW93u8paao/3o=
=LN31
-----END PGP SIGNATURE-----
$ gpg --verify sumout.asc
gpg: Signature made Sun 14 Sep 2003 02:53:55 AM EDT using DSA key ID 211348D6
gpg: Good signature from "Chris Anderson"
Public key here.
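To check downloaded copies of the two files against the checksums above, the same sumout file can be fed back to sha1sum. This is only a sketch: it assumes GNU coreutils sha1sum, that inffas86.c, inffast.S and sumout all sit in the current directory, and the helper name verify_sums is made up for illustration.

```shell
# Hypothetical helper: verify every file listed in a sha1sum-format
# checksum file (default name "sumout", as in the transcript above).
verify_sums() {
    sumfile="${1:-sumout}"
    # -c re-reads the "hash  filename" lines and re-hashes each file;
    # the exit status is non-zero if any file is missing or differs.
    sha1sum -c "$sumfile"
}

# Usage, run from the directory holding the downloaded files:
#   verify_sums sumout && echo "checksums match"
```

This only shows that the files match sumout; trusting sumout itself still depends on the gpg signature check shown above.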
Mar-14-2003
Here are the inffast.c assembly versions for zlib-1.2.0, which is not widely available yet. The x86 at&t asm version: inffast.S and the inline C x86 asm version: inffas86.c.
The inline C asm version contains code for both Microsoft C and GNU C compilers. I've successfully compiled and tested it under gcc3.2, gcc2.96, icc6.0, and msvc6.0. The inffast.S version is still slightly faster on PIII platforms because it contains MMX code which I haven't put into the inline C version yet.
These files only affect decompression performance. Before using this code, I recommend evaluating the performance of the standard zlib, since a good optimizing compiler can boost zlib decompression significantly. Decompression on my P4/2ghz can reach 100 Mbytes/sec or more with the C version of inflate, so other bandwidth bottlenecks in a system (e.g. disk, network i/o) will limit how fast zlib can decompress.
To create a zlib with inffast.S or inffas86.c under Linux:
$ gunzip < zlib-1.2.0.tar.gz | tar xf -
$ cd zlib-1.2.0
$ CFLAGS="-O3 --omit-frame-pointer" configure
Checking for gcc...
Building static library libz.a version 1.2.0 with gcc.
Checking for unistd.h... Yes.
Checking whether to use vsnprintf() or snprintf()... using vsnprintf()
Checking for vsnprintf() in stdio.h... Yes.
Checking for return value of vsnprintf()... Yes.
Checking for errno.h... Yes.
Checking for mmap support... Yes
$ make
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o example.o example.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o adler32.o adler32.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o compress.o compress.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o crc32.o crc32.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o gzio.o gzio.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o uncompr.o uncompr.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o deflate.o deflate.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o trees.o trees.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o zutil.o zutil.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o inflate.o inflate.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o infback.o infback.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o inftrees.o inftrees.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o inffast.o inffast.c
ar rc libz.a adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o zutil.o inflate.o infback.o inftrees.o inffast.o
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o example example.o libz.a
gcc -O3 --omit-frame-pointer -DUSE_MMAP -c -o minigzip.o minigzip.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o minigzip minigzip.o libz.a
$ make test
hello world
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
*** zlib test OK ***
Copy inffast.S or inffas86.c to contrib/inflate86/ and
add these lines to the Makefile before the DO NOT DELETE line:
# x86 asm inflate_fast() version
inffast.o: contrib/inflate86/inffast.S
	gcc -c contrib/inflate86/inffast.S -o inffast.o
# DO NOT DELETE THIS LINE -- make depend depends on it.
Or add these lines for inffas86.c:
# x86 C inline asm inflate_fast() version
inffast.o: contrib/inflate86/inffas86.c
	$(CC) $(CFLAGS) -I. -c contrib/inflate86/inffas86.c -o inffast.o
# DO NOT DELETE THIS LINE -- make depend depends on it.
Note: the space preceding $(CC) and gcc on the third line is really a tab; make is picky about this.
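Because the tab is easy to lose when copying from a web page, one option is to let a script write the rule instead of editing by hand. The following is a sketch only, assuming a POSIX shell with awk and a Makefile that contains the DO NOT DELETE line; the helper name add_inffas86_rule and the temporary file name are illustrative and not part of zlib.

```shell
# Hypothetical helper: insert the inffas86.c rule into a Makefile
# just before the "# DO NOT DELETE" line, using a real tab character.
add_inffas86_rule() {
    makefile="${1:-Makefile}"
    tmp="$makefile.tmp.$$"
    awk '
        /^# DO NOT DELETE/ && !inserted {
            print "# x86 C inline asm inflate_fast() version"
            print "inffast.o: contrib/inflate86/inffas86.c"
            # \t is the tab that make requires before a command line
            print "\t$(CC) $(CFLAGS) -I. -c contrib/inflate86/inffas86.c -o inffast.o"
            inserted = 1
        }
        { print }
    ' "$makefile" > "$tmp" && mv "$tmp" "$makefile"
}

# Usage, from the top of the zlib source tree:
#   add_inffas86_rule Makefile
```

Running it twice would add the rule twice, so it is meant for a freshly configured tree.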
$ cp minigzip minigzip_std
$ (edit Makefile)
$ cp inffas86.c contrib/inflate86/
$ rm inffast.o
$ make
gcc -O3 --omit-frame-pointer -DUSE_MMAP -I. -c contrib/inflate86/inffas86.c -o inffast.o
ar rc libz.a adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o zutil.o inflate.o infback.o inftrees.o inffast.o
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o example example.o libz.a
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o minigzip minigzip.o libz.a
$ make test
hello world
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
*** zlib test OK ***
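As an extra check beyond make test, the output of the rebuilt minigzip can be compared byte for byte with the saved minigzip_std on a real archive. A minimal sketch, assuming both binaries are in the current directory and that mktemp and cmp are available; same_output is a made-up helper name and the archive name in the usage line is a placeholder.

```shell
# Hypothetical helper: feed the same input file to two commands and
# succeed only if their outputs are byte-for-byte identical.
same_output() {
    input="$1"; cmd_a="$2"; cmd_b="$3"
    out_a=$(mktemp) || return 1
    out_b=$(mktemp) || { rm -f "$out_a"; return 1; }
    # Word splitting of $cmd_a / $cmd_b is intentional so that
    # arguments such as -d can be passed in the same string.
    if $cmd_a < "$input" > "$out_a" &&
       $cmd_b < "$input" > "$out_b" &&
       cmp -s "$out_a" "$out_b"; then
        status=0
    else
        status=1
    fi
    rm -f "$out_a" "$out_b"
    return $status
}

# Usage (the archive name is only an example):
#   same_output some-archive.tar.gz "./minigzip -d" "./minigzip_std -d" && echo identical
```

Both decompressed copies are written to temporary files, so a large archive needs roughly twice its uncompressed size in free temp space.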
That's it! Try minigzip -d and minigzip_std -d on some files to see if all of
this was worth it:
$ ls -l ~/zips/mozilla-source-1.3.tar.gz
-rw------- 1 chris chris 41527472 Mar 14 02:38 /home/chris/zips/mozilla-source-1.3.tar.gz
$ ./minigzip -d < ~/zips/mozilla-source-1.3.tar.gz | wc -c
266434560
$ time cat < ~/zips/mozilla-source-1.3.tar.gz > /dev/null
real 0m0.044s
user 0m0.010s
sys 0m0.040s
$ time ./minigzip -d < ~/zips/mozilla-source-1.3.tar.gz > /dev/null
real 0m2.146s
user 0m2.090s
sys 0m0.030s
$ time ./minigzip_std -d < ~/zips/mozilla-source-1.3.tar.gz > /dev/null
real 0m2.795s
user 0m2.750s
sys 0m0.040s
Which ain't too shabby...
266434560 / 2.146 / 1024 = 121244 Kbytes/sec
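The throughput figure above, and the gain over the standard build, can be recomputed from the byte count and the two "real" times. This is a small sketch using awk; the helper names kb_per_sec and pct_less_time are invented for illustration.

```shell
# Hypothetical helpers reproducing the arithmetic above with awk.

# kb_per_sec BYTES SECONDS: throughput in Kbytes/sec, truncated.
kb_per_sec() {
    awk -v bytes="$1" -v secs="$2" 'BEGIN { printf "%d\n", bytes / secs / 1024 }'
}

# pct_less_time SLOW FAST: percent of wall time saved by the faster run.
pct_less_time() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", (slow - fast) / slow * 100 }'
}

kb_per_sec 266434560 2.146     # asm build (minigzip)
kb_per_sec 266434560 2.795     # standard build (minigzip_std)
pct_less_time 2.795 2.146      # time saved by the asm build
```

With the timings from this run, that prints 121244 and 93091 Kbytes/sec, and 23.2 percent less wall time for the asm build.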
$ gcc --version
2.96
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 15
model : 2
model name : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping : 4
cpu MHz : 2017.985
cache size : 512 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 2
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips : 4023.91