Welcome to my zlib hacks page. Send comments, code patches or anything else related to these files to christop@charm.net. These files are unsupported by the zlib team, so please don't email them directly. If you are a developer, the Zlib-devel mailing list is a good resource.

Nov-09-2004

Cool! Gilles Vollant did a Windows x86_64 port of the inflate asm code in inffas86.c here. I don't have a win64 platform yet.

Oct-21-2004

NO_GUNZIP was not being defined when zlib was built with NO_GZIP. This version removes that ifdef from inffast.S, but it needs a zlib version > 1.2.2 to work when NO_GZIP is defined.

The AMD64 comment was added to inffas86.c.

Dec-29-2003

I added AMD64 inflate asm support to the 1.2.1 release in inffas86.c. This version is also slightly quicker on x86 systems because it copies data with rep movsw, which moves 2 bytes at a time, instead of rep movsb, which moves single bytes. I've tested the AMD64 code on Fedora Core 1 plus the x86_64 updates from http://fedora.linux.duke.edu/fc1_x86_64, running on an Athlon 64 3000+ / Gigabyte GA-K8VT800M system with 1GB RAM. The 64 bit version is about 4% faster than the 32 bit version when decompressing mozilla-source-1.3.tar.gz.

$ uname -a
Linux sic 2.4.22-1.2129.nptl #1 Tue Dec 2 00:09:06 CST 2003 x86_64 x86_64 x86_64 GNU/Linux

# build and time the 64 bit version

$ cp contrib/inflate86/inffas86.c .
$ make CFLAGS="-O3 -DUSE_MMAP -DASMINF -fomit-frame-pointer" OBJA=inffas86.o
$ time minigzip -d < mozilla-source-1.3.tar.gz > /dev/null

real    0m1.186s
user    0m1.140s
sys     0m0.040s

# build and time the 32 bit version

$ rm *.o
$ make CFLAGS="-O3 -m32 -DUSE_MMAP -DASMINF -fomit-frame-pointer" OBJA=inffas86.o
$ time minigzip -d < mozilla-source-1.3.tar.gz > /dev/null

real    0m1.247s
user    0m1.220s
sys     0m0.030s
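
The relative difference between the two runs can be worked out from the real times above. This is a minimal sketch, assuming awk is available; speedup_pct is a made-up helper name for this illustration, not something shipped with zlib. For this single pair of runs it comes out close to 5%, in the same range as the figure quoted above.

```shell
# speedup_pct SLOW FAST
# Prints how much shorter FAST is than SLOW, as a percentage of SLOW.
# (Hypothetical helper for illustration only.)
speedup_pct() {
    awk -v slow="$1" -v fast="$2" \
        'BEGIN { printf "%.1f\n", (slow - fast) / slow * 100 }'
}

# real times from the 32 bit and 64 bit runs above
speedup_pct 1.247 1.186    # prints 4.9
```

A single pair of runs is noisy, so repeating each timing a few times before comparing is a safer basis for a figure like this.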

Nov-23-2003

Zlib 1.2.1 is released here.

Oct-10-2003

I added zlib dll files based on Gilles Vollant's vc7 1.2.0.7 builds found at www.winimage.com here.

The icc7 zlib dll build is here.

The icc7 and vc7 zlib dll compiles are here.

The test data I used to compare the builds is here.

The output of 'runtest.sh' for each of the builds: icc7 w/ASM, icc7 wo/ASM, vc7 w/ASM, vc7 wo/ASM.

I ran these tests on a 1.4GHz Pentium M with 512MB RAM.

Sep-14-2003

Updated version for release candidate 1.2.0.5.

The x86 AT&T asm version: inffast.S and the inline C x86 asm version: inffas86.c.

$ sha1sum * > /tmp/sumout ; cd /tmp

$ cat sumout
c3154daab8d0f12ead06dffc4d0153655d1b305f  inffas86.c
abd5b7173b763fcb665644955f6eabcc2b056f64  inffast.S

$ gpg -a --detach-sign sumout 
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.7 (GNU/Linux)

iD8DBQA/ZBCD3iAhXCETSNYRAphiAKC73MW3aYbmncSwUKZ93/rCIn7wMwCfUCKE
D72kU/aGharW93u8paao/3o=
=LN31
-----END PGP SIGNATURE-----

$ gpg --verify sumout.asc
gpg: Signature made Sun 14 Sep 2003 02:53:55 AM EDT using DSA key ID 211348D6
gpg: Good signature from "Chris Anderson "

Public key here.
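
Anyone downloading the files can check them against the list above. This is a minimal sketch, assuming the two files, sumout and sumout.asc all sit in the same directory and that GNU sha1sum is installed; verify_sums is a made-up helper name for this illustration.

```shell
# verify_sums SUMFILE
# Re-hash every file listed in SUMFILE (sha1sum output format) and
# compare against the recorded digests. Returns non-zero on any mismatch.
# (Hypothetical helper for illustration only.)
verify_sums() {
    ( cd "$(dirname "$1")" && sha1sum -c "$(basename "$1")" )
}

# Typical use: check the signature on the list first, then the files.
#   gpg --verify sumout.asc sumout && verify_sums sumout
```

Checking the signature first matters, because the checksums are only as trustworthy as the list they come from.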

Mar-14-2003

Here are the inffast.c assembly versions for zlib-1.2.0, which is not widely available yet. The x86 AT&T asm version: inffast.S and the inline C x86 asm version: inffas86.c.

The inline C asm version contains code for both Microsoft C and GNU C compilers. I've successfully compiled and tested it under gcc3.2, gcc2.96, icc6.0, and msvc6.0. The inffast.S version is still slightly faster on PIII platforms because it contains MMX code which I haven't put into the inline C version yet.

These files only affect the performance of decompression. Before using this code, I recommend evaluating the performance of the standard zlib. A good optimizing compiler can boost performance of zlib decompression significantly. Since decompression on my P4/2GHz can reach 100 Mbytes/sec or more with the C version of inflate, other bottlenecks in a system (e.g. disk or network i/o) are likely to limit throughput before zlib does.

To create a zlib with inffast.S or inffas86.c under Linux:

$ gunzip < zlib-1.2.0.tar.gz | tar xf -
$ cd zlib-1.2.0
$ CFLAGS="-O3 --omit-frame-pointer" configure
Checking for gcc...
Building static library libz.a version 1.2.0 with gcc.
Checking for unistd.h... Yes.
Checking whether to use vsnprintf() or snprintf()... using vsnprintf()
Checking for vsnprintf() in stdio.h... Yes.
Checking for return value of vsnprintf()... Yes.
Checking for errno.h...  Yes.
Checking for mmap support... Yes
$ make
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o example.o example.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o adler32.o adler32.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o compress.o compress.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o crc32.o crc32.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o gzio.o gzio.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o uncompr.o uncompr.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o deflate.o deflate.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o trees.o trees.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o zutil.o zutil.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o inflate.o inflate.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o infback.o infback.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o inftrees.o inftrees.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o inffast.o inffast.c
ar rc libz.a adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o zutil.o inflate.o infback.o inftrees.o inffast.o 
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o example example.o libz.a
gcc -O3 --omit-frame-pointer -DUSE_MMAP   -c -o minigzip.o minigzip.c
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o minigzip minigzip.o libz.a
$ make test
hello world
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
                *** zlib test OK ***
Copy inffast.S or inffas86.c to contrib/inflate86/ and add these lines to the Makefile before the DO NOT DELETE line:
# x86 asm inflate_fast() version
inffast.o: contrib/inflate86/inffast.S
	gcc -c contrib/inflate86/inffast.S -o inffast.o

# DO NOT DELETE THIS LINE -- make depend depends on it.
Or add these lines for inffas86.c:
# x86 C inline asm inflate_fast() version
inffast.o: contrib/inflate86/inffas86.c
	$(CC) $(CFLAGS) -I. -c contrib/inflate86/inffas86.c -o inffast.o

# DO NOT DELETE THIS LINE -- make depend depends on it.
Note: the whitespace preceding $(CC) and gcc on the third line must be a tab, not spaces. make is picky about this.
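
If an editor has converted the tab to spaces, a quick check can find the offending line before make complains. This is a minimal sketch, assuming a POSIX grep; space_indented_lines is a made-up helper name for this illustration. Not every space-indented line in a Makefile is an error, but the gcc or $(CC) command line in the rules above must not show up in its output.

```shell
# space_indented_lines FILE
# Prints (with line numbers) any line in FILE that starts with a space
# rather than a tab. Exit status is 0 if such lines exist, 1 if none.
# (Hypothetical helper for illustration only.)
space_indented_lines() {
    grep -n '^ ' "$1"
}

# Example: space_indented_lines Makefile
```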
$ cp minigzip minigzip_std
$ (edit Makefile)
$ cp inffas86.c contrib/inflate86/
$ rm inffast.o
$ make
gcc -O3 --omit-frame-pointer -DUSE_MMAP -I. -c contrib/inflate86/inffas86.c -o inffast.o
ar rc libz.a adler32.o compress.o crc32.o gzio.o uncompr.o deflate.o trees.o zutil.o inflate.o infback.o inftrees.o inffast.o 
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o example example.o libz.a
gcc -O3 --omit-frame-pointer -DUSE_MMAP -o minigzip minigzip.o libz.a
$ make test
hello world
uncompress(): hello, hello!
gzread(): hello, hello!
gzgets() after gzseek: hello!
inflate(): hello, hello!
large_inflate(): OK
after inflateSync(): hello, hello!
inflate with dictionary: hello, hello!
                *** zlib test OK ***
That's it! Try minigzip -d and minigzip_std -d on some files to see if all of this was worth it:
$ ls -l ~/zips/mozilla-source-1.3.tar.gz 
-rw-------    1 chris    chris    41527472 Mar 14 02:38 /home/chris/zips/mozilla-source-1.3.tar.gz
$ ./minigzip -d < ~/zips/mozilla-source-1.3.tar.gz | wc -c
266434560
$ time cat < ~/zips/mozilla-source-1.3.tar.gz > /dev/null

real    0m0.044s
user    0m0.010s
sys     0m0.040s
$ time ./minigzip -d < ~/zips/mozilla-source-1.3.tar.gz > /dev/null

real    0m2.146s
user    0m2.090s
sys     0m0.030s
$ time ./minigzip_std -d < ~/zips/mozilla-source-1.3.tar.gz > /dev/null

real    0m2.795s
user    0m2.750s
sys     0m0.040s
Which ain't too shabby... 266434560 / 2.146 / 1024 = 121244 Kbytes/sec
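
The same arithmetic can be applied to both builds for a side by side figure. This is a minimal sketch, assuming awk is available; kbytes_per_sec is a made-up helper name for this illustration, and the inputs are the byte count and real times shown above.

```shell
# kbytes_per_sec BYTES SECONDS
# Prints the decompressed output rate in Kbytes/sec (1 Kbyte = 1024 bytes),
# truncated to a whole number. (Hypothetical helper for illustration only.)
kbytes_per_sec() {
    awk -v bytes="$1" -v secs="$2" \
        'BEGIN { printf "%d\n", bytes / secs / 1024 }'
}

kbytes_per_sec 266434560 2.146    # asm build, prints 121244
kbytes_per_sec 266434560 2.795    # standard C build, prints 93091
```

On this file the asm build is roughly 30% faster than the standard build (2.795 / 2.146 is about 1.30).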
$ gcc --version
2.96
$ cat /proc/cpuinfo
processor       : 0
vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 2.00GHz
stepping        : 4
cpu MHz         : 2017.985
cache size      : 512 KB
fdiv_bug        : no
hlt_bug         : no
f00f_bug        : no
coma_bug        : no
fpu             : yes
fpu_exception   : yes
cpuid level     : 2
wp              : yes
flags           : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm
bogomips        : 4023.91