Comparing changes
base repository: intel/zlib
base: master
head repository: 12sidedtech/zlib
compare: master
- 18 commits
- 16 files changed
- 4 contributors
Commits on Dec 13, 2013
Add architecture detection in configure script.
This allows for per-architecture build tuning.
Commit 1af4192
Commits on Jan 17, 2014
Adds check for SSE2, SSE4.2, and the PCLMULQDQ instructions.
Commit d24da7c
Commit 99999a8
Add preprocessor define to tune Adler32 loop unrolling.
Excessive loop unrolling is detrimental to performance. This patch adds a preprocessor define, ADLER32_UNROLL_LESS, to reduce the unrolling factor from 16 to 8. Updates the configure script to set it by default on x86.
Commit fad00ea
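The effect of ADLER32_UNROLL_LESS can be illustrated with a simplified Adler-32 loop. This is only a sketch, not the code from the patch: the function name `adler32_sketch` and the `DO_BLOCK`/`UNROLL` macros are made up for illustration, and the sums are reduced after every block for simplicity, whereas zlib defers the modulo over much larger chunks.

```c
#include <stddef.h>
#include <stdint.h>

#define BASE 65521u /* largest prime below 65536 */

/* One Adler-32 step, then doubling macros in the style zlib uses. */
#define DO1(buf, i) { a += (buf)[i]; b += a; }
#define DO2(buf, i) DO1(buf, i) DO1(buf, (i) + 1)
#define DO4(buf, i) DO2(buf, i) DO2(buf, (i) + 2)
#define DO8(buf, i) DO4(buf, i) DO4(buf, (i) + 4)

/* The define from the commit message selects the smaller unroll factor. */
#ifdef ADLER32_UNROLL_LESS
#  define UNROLL 8
#  define DO_BLOCK(buf) DO8(buf, 0)
#else
#  define UNROLL 16
#  define DO_BLOCK(buf) DO8(buf, 0) DO8(buf, 8)
#endif

/* Hypothetical helper: Adler-32 of buf[0..len) starting from the initial value 1. */
uint32_t adler32_sketch(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;

    while (len >= UNROLL) {
        DO_BLOCK(buf)
        /* Reducing per block keeps the sums far below 2^32 (simplification). */
        a %= BASE;
        b %= BASE;
        buf += UNROLL;
        len -= UNROLL;
    }
    while (len--) { /* fewer than UNROLL bytes remain */
        a += *buf++;
        b += a;
    }
    a %= BASE;
    b %= BASE;
    return (b << 16) | a;
}
```

Building with `-DADLER32_UNROLL_LESS` changes only the size of the unrolled body; the checksum is the same either way.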
Commits on Jun 3, 2014
Tune longest_match implementation
Separates the byte-by-byte and short-by-short longest_match implementations into two separately tweakable versions and splits all of the longest match functions into a separate file. Split the end-chain and early-chain scans and provide likely/unlikely hints to improve branch prediction. Add an early termination condition for levels 5 and under to stop iterating the hash chain when the match length for the current entry is less than the current best match. Also adjust variable types and scopes to provide better optimization hints to the compiler.
Commit 2c27091
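The branch hints and the early termination condition described for longest_match can be sketched as follows. This is a heavily simplified illustration, not the patch itself: `longest_match_sketch`, its parameters, the `stop_early` flag (standing in for "level 5 and under"), and the use of 0xFFFF as the chain terminator are all assumptions made for the example.

```c
#include <stddef.h>

/* Branch hints; these fall back to plain expressions on non-GNU compilers. */
#if defined(__GNUC__)
#  define likely(x)   __builtin_expect(!!(x), 1)
#  define unlikely(x) __builtin_expect(!!(x), 0)
#else
#  define likely(x)   (x)
#  define unlikely(x) (x)
#endif

#define NIL 0xFFFFu /* chain terminator chosen for this sketch */

/* Length of the common prefix of a and b, capped at max. */
static size_t match_len(const unsigned char *a, const unsigned char *b, size_t max)
{
    size_t n = 0;
    while (n < max && a[n] == b[n])
        n++;
    return n;
}

/*
 * Walk the hash chain starting at `head` (prev[i] is the previous position
 * with the same hash) and return the best match length for position `cur`.
 * With stop_early set, the walk ends as soon as a candidate is shorter than
 * the best match found so far.
 */
size_t longest_match_sketch(const unsigned char *window, size_t cur, size_t max_len,
                            const unsigned short *prev, unsigned short head,
                            unsigned chain_limit, int stop_early, size_t *match_pos)
{
    size_t best = 0;
    unsigned short cand = head;

    while (cand != NIL && chain_limit-- > 0) {
        size_t len = match_len(window + cand, window + cur, max_len);

        if (unlikely(len > best)) { /* improvements are the rare case */
            best = len;
            *match_pos = cand;
            if (best == max_len)
                break;              /* cannot do better */
        } else if (stop_early && len < best) {
            break;                  /* early termination */
        }
        cand = prev[cand];
    }
    return best;
}
```

The early exit trades compression ratio for speed: a longer match further down the chain is never seen once a shorter candidate has been encountered.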
Add preprocessor define to tune crc32 unrolling.
Adds a preprocessor define, CRC32_UNROLL_LESS, to reduce the unrolling factor from 8 to 4 for the crc32 calculation. Updates the configure script to set it by default on x86.
Commit fd80ca4
Adds SSE2 optimized hash shifting to fill_window.
Uses SSE2 subtraction with saturation to shift the hash in 16B chunks. Renames the old fill_window implementation to fill_window_c(), and adds a new fill_window_sse() implementation in fill_window_sse.c. Moves UPDATE_HASH into deflate.h and changes the scope of read_buf from local to ZLIB_INTERNAL for sharing between the two implementations. Updates the configure script to check for SSE2 intrinsics and enables this optimization by default on x86. The runtime check for SSE2 support only occurs on 32-bit, as x86_64 requires SSE2. Adds an explicit rule in Makefile.in to build fill_window_sse.c with the -msse2 compiler flag, which is required for SSE2 intrinsics.
Commit 5640481
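The hash shifting step can be sketched as below. When the window slides by wsize, every stored position must drop by wsize, and positions that fall out of the window become 0; unsigned saturating subtraction does both at once. The function names are hypothetical, the 16-bit table layout is an assumption, and the real fill_window_sse() does considerably more than this.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: subtract wsize from each entry, clamping at 0. */
void slide_hash_scalar(uint16_t *table, size_t n, uint16_t wsize)
{
    size_t i;
    for (i = 0; i < n; i++)
        table[i] = (uint16_t)(table[i] >= wsize ? table[i] - wsize : 0);
}

/* Same result, eight 16-bit entries (16 bytes) per iteration where SSE2 is available. */
void slide_hash_sse2(uint16_t *table, size_t n, uint16_t wsize)
{
#if defined(__SSE2__)
    /* The cast only reinterprets the bit pattern; the lanes are treated as unsigned. */
    const __m128i delta = _mm_set1_epi16((short)wsize);
    size_t i = 0;

    for (; i + 8 <= n; i += 8) {
        __m128i v = _mm_loadu_si128((const __m128i *)(table + i));
        v = _mm_subs_epu16(v, delta); /* unsigned subtract with saturation */
        _mm_storeu_si128((__m128i *)(table + i), v);
    }
    slide_hash_scalar(table + i, n - i, wsize); /* leftover entries */
#else
    slide_hash_scalar(table, n, wsize);         /* no SSE2 at compile time */
#endif
}
```

Unaligned loads and stores are used here so the sketch works on any buffer; a real hash table would normally be aligned.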
add SSE4.2 optimized hash function
For systems supporting SSE4.2, use the crc32 instruction as a fast hash function. Also, provide a better fallback hash. For both new hash functions, we hash 4 bytes, instead of 3, for certain levels. This shortens the hash chains, and also improves the quality of each hash entry.
Commit d306c75
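A 4-byte hash along these lines might look like the sketch below. Everything specific here is an assumption: the name `hash4_sketch`, the 15-bit table size, and in particular the multiplicative fallback, which is a generic stand-in and not necessarily the fallback hash the patch introduces. The crc32 path is compiled only when the compiler targets SSE4.2, whereas the patch checks for support at run time.

```c
#include <stdint.h>
#include <string.h>
#if defined(__SSE4_2__)
#include <nmmintrin.h>
#endif

#define HASH_BITS 15 /* assumed hash table size of 2^15 entries */

/* Hash the four bytes at p into a HASH_BITS-wide table index. */
uint32_t hash4_sketch(const unsigned char *p)
{
    uint32_t v, h;

    memcpy(&v, p, sizeof v); /* unaligned-safe 4-byte load */
#if defined(__SSE4_2__)
    h = _mm_crc32_u32(0, v); /* hardware crc32 instruction used as a hash */
#else
    /* Assumed fallback: multiplicative hash, keeping the top bits. */
    h = (v * 2654435761u) >> (32 - HASH_BITS);
#endif
    return h & ((1u << HASH_BITS) - 1);
}
```

Hashing four bytes instead of three means positions land in the same chain only when a longer prefix agrees (apart from hash collisions), which is why the chains get shorter.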
add PCLMULQDQ optimized CRC folding
Rather than copy the input data from strm->next_in into the window and then compute the CRC, this patch combines these two steps into one. It performs an SSE memory copy, while folding the data down in the SSE registers. A final step is added, when we write the gzip trailer, to reduce the 4 SSE registers to 32 bits. Adds some extra padding bytes to the window to allow for SSE partial writes.
Commit 3684659
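The idea of fusing the copy with the checksum, and deferring the final reduction until the trailer is written, can be shown with a scalar sketch. This swaps in a plain bit-at-a-time CRC-32 for the PCLMULQDQ folding used by the patch, so it illustrates only the single-pass structure; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Copy len bytes from src to dst and update a running CRC-32 state in the
 * same pass. The state is the raw (not yet inverted) CRC; start it at
 * 0xFFFFFFFF and carry it from one call to the next.
 */
uint32_t crc32_copy_sketch(uint32_t state, unsigned char *dst,
                           const unsigned char *src, size_t len)
{
    size_t i;
    int k;

    for (i = 0; i < len; i++) {
        unsigned char byte = src[i];

        dst[i] = byte; /* the copy ... */
        state ^= byte; /* ... and the checksum update share one read */
        for (k = 0; k < 8; k++)
            state = (state >> 1) ^ (0xEDB88320u & (0u - (state & 1u)));
    }
    return state;
}

/* Final step, done once when the trailer is written. */
uint32_t crc32_finish_sketch(uint32_t state)
{
    return state ^ 0xFFFFFFFFu;
}
```

Because the state is carried between calls, input can arrive in arbitrary chunks and the finished value is the same as for a single pass over the whole buffer.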
Commits on Jul 26, 2014
deflate: add new deflate_quick strategy for level 1
The deflate_quick strategy is designed to provide maximum deflate performance. deflate_quick achieves this through:
- only checking the first hash match
- using a small inline SSE4.2-optimized longest_match
- forcing a window size of 8K, and using a precomputed dist/len table
- forcing the static Huffman tree and emitting codes immediately instead of tallying
This patch changes the scope of flush_pending, bi_windup, and static_ltree to ZLIB_INTERNAL and moves END_BLOCK, send_code, put_short, and send_bits to deflate.h. Updates the configure script to enable by default for x86. On systems without SSE4.2, fallback is to deflate_fast strategy.
Fixes #6
Fixes #8
Commit d948170
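"Forcing the static Huffman tree and emitting codes immediately" can be sketched with a small bit writer that uses the fixed literal codes from the DEFLATE specification (RFC 1951). The `bit_writer` type and all function names below are hypothetical, the caller is assumed to provide enough output space, and match (length/distance) codes are left out; the helpers in the patch (send_bits, send_code, static_ltree) differ in detail.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    unsigned char *out; /* output buffer, assumed large enough */
    size_t pos;         /* bytes written so far */
    uint32_t bitbuf;    /* pending bits, least significant bit first */
    int bitcnt;         /* number of pending bits (always < 8 between calls) */
} bit_writer;

/* Append nbits bits of value and write out any completed bytes. */
void send_bits_sketch(bit_writer *w, uint32_t value, int nbits)
{
    w->bitbuf |= value << w->bitcnt;
    w->bitcnt += nbits;
    while (w->bitcnt >= 8) {
        w->out[w->pos++] = (unsigned char)(w->bitbuf & 0xFF);
        w->bitbuf >>= 8;
        w->bitcnt -= 8;
    }
}

/* Huffman codes are stored most significant bit first, so reverse them. */
uint32_t reverse_bits_sketch(uint32_t code, int len)
{
    uint32_t r = 0;
    while (len-- > 0) {
        r = (r << 1) | (code & 1);
        code >>= 1;
    }
    return r;
}

/* Fixed codes: literals 0-143 use 8 bits from 0x30, 144-255 use 9 bits from 0x190. */
void emit_literal_static(bit_writer *w, unsigned lit)
{
    if (lit < 144)
        send_bits_sketch(w, reverse_bits_sketch(0x30 + lit, 8), 8);
    else
        send_bits_sketch(w, reverse_bits_sketch(0x190 + (lit - 144), 9), 9);
}

/* End-of-block (symbol 256) is seven zero bits in the fixed code. */
void emit_end_block_static(bit_writer *w)
{
    send_bits_sketch(w, 0, 7);
}

/* Pad the last partial byte with zero bits. */
void flush_bits_sketch(bit_writer *w)
{
    if (w->bitcnt > 0) {
        w->out[w->pos++] = (unsigned char)(w->bitbuf & 0xFF);
        w->bitbuf = 0;
        w->bitcnt = 0;
    }
}
```

Since the code for every symbol is known up front, nothing has to be tallied per block: a final block holding the single literal 'a' is just the header bits (BFINAL = 1, BTYPE = 01), one literal code, and the end-of-block code.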
deflate: add new deflate_medium strategy
From: Arjan van de Ven <arjan@linux.intel.com>
As the name suggests, the deflate_medium deflate strategy is designed to provide an intermediate strategy between deflate_fast and deflate_slow. After finding two adjacent matches, deflate_medium scans left from the second match in order to determine whether a better match can be formed.
Fixes #2
Commit 0a225b1
deflate: avoid use of uninitialized variable
(Note emit_match() doesn't currently use the value at all.) Fixes #4
Commit 86694e8
Commit 308be56
Commit ed145f4
Add crc_ forward declarations to deflate and add read_buf fwd dcl to fill_window_sse.
Commit e176b3c
Commits on May 16, 2016
Fix Partial Symbol Generation for QUICK deflate
When using deflate_quick() in a streaming fashion and the output buffer runs out of space while the input buffer still has data, deflate_quick() would emit partial symbols. Force the deflate_quick() loop to terminate for a flush before any further processing is done, returning to the main deflate() routine to do its thing.
Commit 06961a6
Commits on May 17, 2016
Add block_open state for deflate_quick
By storing whether or not a block has been opened (or terminated), the static trees used for the block and the end block markers can be emitted appropriately.
Commit 4316869
On deflation context creation, initialize the block_open state to 0 to ensure that no uninitialized values are used.
Commit d4cd963
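The role of the block_open flag in the last two commits can be pictured with a toy state machine. This is purely illustrative: `quick_state` and its functions are hypothetical, and the output is recorded as letters (H for a block header, L for a literal, E for an end-of-block marker) instead of real DEFLATE bits.

```c
#include <stddef.h>

typedef struct {
    int block_open; /* 0 = no block in progress, 1 = header already emitted */
    char log[64];   /* what would have been emitted, as letters */
    size_t n;
} quick_state;

/* Initialize the flag on context creation so it is never read uninitialized. */
void quick_init(quick_state *s)
{
    s->block_open = 0;
    s->n = 0;
    s->log[0] = '\0';
}

static void record(quick_state *s, char c)
{
    if (s->n + 1 < sizeof s->log) {
        s->log[s->n++] = c;
        s->log[s->n] = '\0';
    }
}

/* Emit the block header only if no block is currently open. */
void quick_emit_literal(quick_state *s)
{
    if (!s->block_open) {
        record(s, 'H');
        s->block_open = 1;
    }
    record(s, 'L');
}

/* Emit the end-of-block marker only if a block is actually open. */
void quick_end_block(quick_state *s)
{
    if (s->block_open) {
        record(s, 'E');
        s->block_open = 0;
    }
}
```

With the flag in the state, a call that returns early for a flush can resume later without emitting a second header, and a repeated end request does not emit a second end-of-block marker.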