This repository was archived by the owner on Mar 1, 2024. It is now read-only.

Comparing changes

base repository: intel/zlib (base: master)
head repository: 12sidedtech/zlib (compare: master)
  • 18 commits
  • 16 files changed
  • 4 contributors

Commits on Dec 13, 2013

  1. Add architecture detection in configure script.

    This allows for per-architecture build tuning.
    jtkukunas committed Dec 13, 2013
    commit 1af4192

Commits on Jan 17, 2014

  1. For x86, add CPUID check.

    Adds checks for the SSE2, SSE4.2, and PCLMULQDQ instructions.
    jtkukunas committed Jan 17, 2014
    commit d24da7c
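A hedged sketch of what such a runtime CPUID check can look like. The function name and layout here are illustrative, not the actual patch; on GCC/Clang x86 targets, `<cpuid.h>` provides `__get_cpuid`, and non-x86 builds fall through to a no-features answer.

```c
#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>                     /* GCC/Clang wrapper for the CPUID instruction */
#endif

/* Feature bits in CPUID leaf 1 (Intel SDM): EDX bit 26 = SSE2,
 * ECX bit 20 = SSE4.2, ECX bit 1 = PCLMULQDQ. */
#define EDX_SSE2      (1u << 26)
#define ECX_SSE42     (1u << 20)
#define ECX_PCLMULQDQ (1u << 1)

/* Returns 1 if CPUID was queried, 0 otherwise; feature flags come back as 0/1. */
static int x86_check_features(int *sse2, int *sse42, int *pclmulqdq)
{
    *sse2 = *sse42 = *pclmulqdq = 0;
#if defined(__i386__) || defined(__x86_64__)
    unsigned eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                      /* leaf 1 not supported */
    *sse2      = (edx & EDX_SSE2) != 0;
    *sse42     = (ecx & ECX_SSE42) != 0;
    *pclmulqdq = (ecx & ECX_PCLMULQDQ) != 0;
    return 1;
#else
    return 0;                          /* not an x86 build */
#endif
}
```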
  2. commit 99999a8
  3. Add preprocessor define to tune Adler32 loop unrolling.

    Excessive loop unrolling is detrimental to performance. This patch
    adds a preprocessor define, ADLER32_UNROLL_LESS, to reduce unrolling
    factor from 16 to 8.
    
    Updates the configure script to set it as the default on x86.
    jtkukunas committed Jan 17, 2014
    commit fad00ea
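A minimal sketch of the unroll-factor idea, not the actual zlib adler32.c: a preprocessor define selects how many byte updates run per chunk, so `ADLER32_UNROLL_LESS` drops the factor from 16 to 8.

```c
#include <stdint.h>
#include <stddef.h>

#define BASE 65521u  /* largest prime smaller than 65536 */

#ifdef ADLER32_UNROLL_LESS
#  define UNROLL 8   /* the commit found less unrolling faster on x86 */
#else
#  define UNROLL 16
#endif

static uint32_t adler32_sketch(uint32_t adler, const unsigned char *buf, size_t len)
{
    uint32_t sum2 = (adler >> 16) & 0xffff;
    adler &= 0xffff;

    while (len >= UNROLL) {
        size_t i;
        for (i = 0; i < UNROLL; i++) { /* a real unrolled loop repeats this inline */
            adler += buf[i];
            sum2 += adler;
        }
        buf += UNROLL;
        len -= UNROLL;
        adler %= BASE;                 /* zlib defers the modulo further; simplified */
        sum2 %= BASE;
    }
    while (len--) {                    /* tail bytes */
        adler += *buf++;
        sum2 += adler;
    }
    adler %= BASE;
    sum2 %= BASE;
    return (sum2 << 16) | adler;
}
```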

Commits on Jun 3, 2014

  1. Tune longest_match implementation

    Separates the byte-by-byte and short-by-short longest_match
    implementations into two separately tweakable versions and
    splits all of the longest match functions into a separate file.
    
    Split the end-chain and early-chain scans and provide likely/unlikely
    hints to improve branch prediction.
    
    Add an early termination condition for levels 5 and under to stop
    iterating the hash chain when the match length for the current
    entry is less than the current best match.
    
    Also adjust variable types and scopes to provide better optimization
    hints to the compiler.
    jtkukunas committed Jun 3, 2014
    commit 2c27091
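The likely/unlikely hints mentioned above are typically spelled with `__builtin_expect` on GCC/Clang, which nudges block layout so the predicted path stays on the fall-through side. A toy illustration (macro and function names are mine, not the patch's):

```c
#include <stddef.h>

#if defined(__GNUC__)
#  define LIKELY(x)   __builtin_expect(!!(x), 1)
#  define UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#  define LIKELY(x)   (x)
#  define UNLIKELY(x) (x)
#endif

/* Toy scan loop: most window positions fail the cheap first-byte check,
 * so the mismatch branch is hinted as the common case. */
static size_t count_candidates(const unsigned char *win, size_t n,
                               unsigned char target)
{
    size_t hits = 0, i;
    for (i = 0; i < n; i++) {
        if (LIKELY(win[i] != target))
            continue;              /* common case: mismatch, keep scanning */
        hits++;                    /* rare case: candidate worth a full compare */
    }
    return hits;
}
```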
  2. Add preprocessor define to tune crc32 unrolling.

    Adds a preprocessor define, CRC32_UNROLL_LESS, to reduce unrolling
    factor from 8 to 4 for the crc32 calculation.
    
    Updates the configure script to set it as the default on x86.
    jtkukunas committed Jun 3, 2014
    commit fd80ca4
  3. Adds SSE2 optimized hash shifting to fill_window.

    Uses SSE2 subtraction with saturation to shift the hash in
    16B chunks. Renames the old fill_window implementation to
    fill_window_c(), and adds a new fill_window_sse() implementation
    in fill_window_sse.c.
    
    Moves UPDATE_HASH into deflate.h and changes the scope of
    read_buf from local to ZLIB_INTERNAL for sharing between
    the two implementations.
    
    Updates the configure script to check for SSE2 intrinsics and enables
    this optimization by default on x86. The runtime check for SSE2 support
    only occurs on 32-bit, as x86_64 requires SSE2. Adds an explicit
    rule in Makefile.in to build fill_window_sse.c with the -msse2 compiler
    flag, which is required for SSE2 intrinsics.
    jtkukunas committed Jun 3, 2014
    commit 5640481
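The core observation: when the window slides by wsize, every hash-chain entry must be rebased to `pos >= wsize ? pos - wsize : 0`, which is exactly unsigned saturating subtraction; SSE2's `_mm_subs_epu16` does this for eight 16-bit entries at once. A portable scalar sketch of that operation (names illustrative):

```c
#include <stdint.h>
#include <stddef.h>

/* Rebase hash-chain entries after the window slides by wsize.
 * Entries that pointed before the slide saturate to 0 (zlib's NIL).
 * The SSE2 version performs the same computation 16 bytes at a time. */
static void shift_hash_chain(uint16_t *head, size_t n, uint16_t wsize)
{
    size_t i;
    for (i = 0; i < n; i++)
        head[i] = (uint16_t)(head[i] >= wsize ? head[i] - wsize : 0);
}
```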
  4. add SSE4.2 optimized hash function

    For systems supporting SSE4.2, use the crc32 instruction as a fast
    hash function. Also, provide a better fallback hash.
    
    For both new hash functions, we hash 4 bytes, instead of 3, for certain
    levels. This shortens the hash chains, and also improves the quality
    of each hash entry.
    jtkukunas committed Jun 3, 2014
    commit d306c75
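A hedged sketch of the two ideas in this commit: on SSE4.2 hardware the crc32 instruction hashes 4 input bytes in one step, and the fallback is a simple multiplicative hash, also over 4 bytes instead of zlib's classic 3. The constant and names here are illustrative, not the exact patch code.

```c
#include <stdint.h>
#include <string.h>

#define HASH_BITS 15   /* assumed table size for illustration */

#if defined(__SSE4_2__)
#include <nmmintrin.h>                 /* _mm_crc32_u32 */
static uint32_t hash4(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, 4);                  /* safe unaligned 4-byte load */
    return _mm_crc32_u32(0, v) & ((1u << HASH_BITS) - 1);
}
#else
static uint32_t hash4(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, 4);
    /* Knuth-style multiplicative hash over 4 bytes as the fallback */
    return (v * 2654435761u) >> (32 - HASH_BITS);
}
#endif
```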
  5. add PCLMULQDQ optimized CRC folding

    Rather than copy the input data from strm->next_in into the window and
    then compute the CRC, this patch combines these two steps into one. It
    performs a SSE memory copy, while folding the data down in the SSE
    registers. A final step is added, when we write the gzip trailer,
    to reduce the four SSE registers to 32 bits.
    
    Adds some extra padding bytes to the window to allow for SSE partial
    writes.
    jtkukunas committed Jun 3, 2014
    commit 3684659

Commits on Jul 26, 2014

  1. deflate: add new deflate_quick strategy for level 1

    The deflate_quick strategy is designed to provide maximum
    deflate performance.
    
    deflate_quick achieves this through:
        - only checking the first hash match
        - using a small inline SSE4.2-optimized longest_match
        - forcing a window size of 8K, and using a precomputed dist/len
          table
        - forcing the static Huffman tree and emitting codes immediately
          instead of tallying
    
    This patch changes the scope of flush_pending, bi_windup, and
    static_ltree to ZLIB_INTERNAL and moves END_BLOCK, send_code,
    put_short, and send_bits to deflate.h.
    
    Updates the configure script to enable it by default for x86. On systems
    without SSE4.2, the fallback is the deflate_fast strategy.
    
    Fixes #6
    Fixes #8
    jtkukunas committed Jul 26, 2014
    commit d948170
  2. deflate: add new deflate_medium strategy

    From: Arjan van de Ven <arjan@linux.intel.com>
    
    As the name suggests, the deflate_medium deflate strategy is designed
    to provide an intermediate strategy between deflate_fast and deflate_slow.
    After finding two adjacent matches, deflate_medium scans left from
    the second match in order to determine whether a better match can be
    formed.
    
    Fixes #2
    jtkukunas committed Jul 26, 2014
    commit 0a225b1
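A toy illustration of the scan-left idea, under my own naming and with the bookkeeping heavily simplified; it is not the deflate_medium code itself. Given a match B that starts where match A ends, check how far B's match also covers bytes to its left, which tells us whether length can be traded from A to B for a better pair.

```c
/* How far can match B be extended to the left? Walk left while the bytes
 * just before B still equal the bytes just before B's reference position,
 * stopping at the start of the preceding match A. */
static unsigned scan_left(const unsigned char *window, unsigned b_start,
                          unsigned b_match_start, unsigned a_start)
{
    unsigned shift = 0;
    while (b_start - shift > a_start && b_match_start > shift &&
           window[b_start - shift - 1] == window[b_match_start - shift - 1])
        shift++;
    return shift;   /* bytes by which B can grow leftward (shortening A) */
}
```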
  3. deflate: avoid use of uninitialized variable

    (Note emit_match() doesn't currently use the value at all.)
    
    Fixes #4
    Nathan Kidd authored and jtkukunas committed Jul 26, 2014
    commit 86694e8
  4. commit 308be56
  5. Add forward declarations for fill_window_sse and flush_pending to deflate_quick.c
    mp15 authored and jtkukunas committed Jul 26, 2014
    commit ed145f4
  6. Add crc_ forward declarations to deflate and a read_buf forward declaration to fill_window_sse
    mp15 authored and jtkukunas committed Jul 26, 2014
    commit e176b3c

Commits on May 16, 2016

  1. Fix Partial Symbol Generation for QUICK deflate

    When using deflate_quick() in a streaming fashion and the output buffer
    runs out of space while the input buffer still has data, deflate_quick()
    would emit partial symbols. Force the deflate_quick() loop to terminate
    for a flush before any further processing is done, returning to the main
    deflate() routine to do its thing.
    pvachon committed May 16, 2016
    commit 06961a6

Commits on May 17, 2016

  1. Add block_open state for deflate_quick

    By storing whether or not a block has been opened (or terminated), the
    static trees used for the block and the end block markers can be emitted
    appropriately.
    pvachon committed May 17, 2016
    commit 4316869
  2. Initialize block_open state

    On deflation context creation, initialize the block_open state to 0 to
    ensure that no uninitialized values are used.
    pvachon committed May 17, 2016
    commit d4cd963