You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues-all@impala.apache.org by "Michael Smith (Jira)" <ji...@apache.org> on 2022/07/28 17:08:00 UTC

[jira] [Created] (IMPALA-11458) Update to newer zlib/snappy/zstd

Michael Smith created IMPALA-11458:
--------------------------------------

             Summary: Update to newer zlib/snappy/zstd
                 Key: IMPALA-11458
                 URL: https://issues.apache.org/jira/browse/IMPALA-11458
             Project: IMPALA
          Issue Type: Improvement
          Components: Backend
    Affects Versions: Impala 4.1.0
            Reporter: Michael Smith


Update:
 * zlib 1.2.11 to 1.2.12
 * snappy 1.1.8 to 1.1.9
 * zstd 1.4.9 to 1.5.3

h3. zlib

Changes in 1.2.12 (27 Mar 2022)
- Cygwin does not have _wopen(), so do not create gzopen_w() there
- Permit a deflateParams() parameter change as soon as possible
- Limit hash table inserts after switch from stored deflate
- Fix bug when window full in deflate_stored()
- Fix CLEAR_HASH macro to be usable as a single statement
- Avoid a conversion error in gzseek when off_t type too small
- Have Makefile return non-zero error code on test failure
- Avoid some conversion warnings in gzread.c and gzwrite.c
- Update use of errno for newer Windows CE versions
- Small speedup to inflate [psumbera]
- Return an error if the gzputs string length can't fit in an int
- Add address checking in clang to -w option of configure
- Don't compute check value for raw inflate if asked to validate
- Handle case where inflateSync used when header never processed
- Avoid the use of ptrdiff_t
- Avoid an undefined behavior of memcpy() in gzappend()
- Avoid undefined behaviors of memcpy() in gz*printf()
- Avoid an undefined behavior of memcpy() in _tr_stored_block()
- Make the names in functions declarations identical to definitions
- Remove old assembler code in which bugs have manifested
- Fix deflateEnd() to not report an error at start of raw deflate
- Add legal disclaimer to README
- Emphasize the need to continue decompressing gzip members
- Correct the initialization requirements for deflateInit2()
- Fix a bug that can crash deflate on some input when using Z_FIXED
- Assure that the number of bits for deflatePrime() is valid
- Use a structure to make globals in enough.c evident
- Use a macro for the printf format of big_t in enough.c
- Clean up code style in enough.c, update version
- Use inline function instead of macro for index in enough.c
- Clarify that prefix codes are counted in enough.c
- Show all the codes for the maximum tables size in enough.c
- Add gznorm.c example, which normalizes gzip files
- Fix the zran.c example to work on a multiple-member gzip file
- Add tables for crc32_combine(), to speed it up by a factor of 200
- Add crc32_combine_gen() and crc32_combine_op() for fast combines
- Speed up software CRC-32 computation by a factor of 1.5 to 3
- Use atomic test and set, if available, for dynamic CRC tables
- Don't bother computing check value after successful inflateSync()
- Correct comment in crc32.c
- Add use of the ARMv8 crc32 instructions when requested
- Use ARM crc32 instructions if the ARM architecture has them
- Explicitly note that the 32-bit check values are 32 bits
- Avoid adding empty gzip member after gzflush with Z_FINISH
- Fix memory leak on error in gzlog.c
- Fix error in comment on the polynomial representation of a byte
- Clarify gz* function interfaces, referring to parameter names
- Change macro name in inflate.c to avoid collision in VxWorks
- Correct typo in blast.c
- Improve portability of contrib/minizip
- Fix indentation in minizip's zip.c
- Replace black/white with allow/block. (theresa-m)
- minizip warning fix if MAXU32 already defined. (gvollant)
- Fix unztell64() in minizip to work past 4GB. (Daniël Hörchner)
- Clean up minizip to reduce warnings for testing
- Add fallthrough comments for gcc
- Eliminate use of ULL constants
- Separate out address sanitizing from warnings in configure
- Remove destructive aspects of make distclean
- Check for cc masquerading as gcc or clang in configure
- Fix crc32.c to compile local functions only if used

h3. zstd

Need https://github.com/facebook/zstd/pull/3217 to update zlib.

v1.5.3 (June, 2022)
perf: 5-30% faster dictionary compression at levels 1-4 (#3086, #3114, #3152, @embg)
perf: 5-10% faster streaming compression at levels 1-2 (#3114, @embg)
perf: Remove branch in ZSTD_fast_noDict (#3129, @felixhandte)
perf: Add option to prefetch CDict tables (#3177, @embg)
perf: Minor compression ratio improvement (#2983, @Cyan4973)
perf: Minor speed improvement for Huffman decoding (#3013, @WojciechMula)
perf: Enable STATIC_BMI2 for gcc/clang (#3080, @TocarIP)
perf: Optimize ZSTD_row_getMatchMask for ARM levels 8-10 (#3139, #3160, @danlark1)
perf: aarch64 performance improvements (#3145, #3141, @JunHe77)
perf: Lazy parameters adaptation (#2974, @Cyan4973)
cli: Async write for decompression (#2975, @yoniko)
cli: Use buffered output (#2985, @yoniko)
cli: Change zstdless behavior to align with zless (#2909, @binhdvo)
cli: AsyncIO compression (#3021, #3022, @yoniko)
cli: Print zlib/lz4/lzma library versions in verbose version output (#3030, @terrelln)
cli: Fix for -r on empty directory (#3027, @brailovich)
cli: Fix required decompression memory usage reported by -vv + --long (#3042, @u1f35c)
cli: Fix infinite loop when empty input is passed to trainer (#3081, @terrelln)
cli: Implement more gzip compatibility (#3059, @dirkmueller)
cli: Keep original file if -c or --stdout is given (#3052, @dirkmueller)
bug: Fix for block-splitter (#3033, @Cyan4973)
bug: Fixes for Sequence Compression API (#3023, #3040, @Cyan4973)
bug: Fix leaking thread handles on Windows (#3147, @animalize)
bug: Fix timing issues with cmake/meson builds (#3166, #3167, #3170, @Cyan4973)
build: Allow user to select legacy level for cmake (#3050, @shadchin)
build: Enable legacy support by default in cmake (#3079, @niamster)
build: Meson improvements (#3039, #3122, @eli-schwartz)
build: Add aarch64 to supported architectures for zstd_trace (#3054, @ooosssososos)
doc: Split help in long and short version, cleanup formatting (#3094, @dirkmueller)
doc: Updated man page, providing more details for --train mode (#3112, @Cyan4973)
doc: Add decompressor errata document (#3092, @terrelln)
misc: Enable Intel CET (#2992, #2994, @hjl-tools)
misc: Streaming decompression can detect incorrect header ID sooner (#3175, @Cyan4973)

v1.5.2 (Jan, 2022)
perf: Regain Minimal memset()-ing During Reuse of Compression Contexts (@Cyan4973, #2969)
build: Build Zstd with `noexecstack` on All Architectures (@felixhandte, #2964)
doc: Clarify Licensing (@terrelln, #2981)

v1.5.1 (Dec, 2021)
perf: rebalanced compression levels, to better match the intended speed/level curve, by @senhuang42
perf: faster huffman decoder, using x64 assembly, by @terrelln
perf: slightly faster high speed modes (strategies fast & dfast), by @felixhandte
perf: improved binary size and faster compilation times, by @terrelln
perf: new row64 mode, used notably in level 12, by @senhuang42
perf: faster mid-level compression speed in presence of highly repetitive patterns, by @senhuang42
perf: minor compression ratio improvements for small data at high levels, by @cyan4973
perf: reduced stack usage (mostly useful for Linux Kernel), by @terrelln
perf: faster compression speed on incompressible data, by @bindhvo
perf: on-demand reduced ZSTD_DCtx state size, using build macro ZSTD_DECODER_INTERNAL_BUFFER, at a small cost of performance, by @bindhvo
build: allows hiding static symbols in the dynamic library, using build macro, by @skitt
build: support for m68k (Motorola 68000's), by @cyan4973
build: improved AIX support, by @Helflym
build: improved meson unofficial build, by @eli-schwartz
cli : custom memory limit when training dictionary (#2925), by @embg
cli : report advanced parameters information when compressing in very verbose mode (``-vv`), by @Svetlitski-FB

v1.5.0  (May 11, 2021)
api: Various functions promoted from experimental to stable API: (#2579-2581, @senhuang42)
  `ZSTD_defaultCLevel()`
  `ZSTD_getDictID_fromCDict()`
api: Several experimental functions have been deprecated and will emit a compiler warning (#2582, @senhuang42)
  `ZSTD_compress_advanced()`
  `ZSTD_compress_usingCDict_advanced()`
  `ZSTD_compressBegin_advanced()`
  `ZSTD_compressBegin_usingCDict_advanced()`
  `ZSTD_initCStream_srcSize()`
  `ZSTD_initCStream_usingDict()`
  `ZSTD_initCStream_usingCDict()`
  `ZSTD_initCStream_advanced()`
  `ZSTD_initCStream_usingCDict_advanced()`
  `ZSTD_resetCStream()`
api: ZSTDMT_NBWORKERS_MAX reduced to 64 for 32-bit environments (@Cyan4973)
perf: Significant speed improvements for middle compression levels (#2494, @senhuang42 @terrelln)
perf: Block splitter to improve compression ratio, enabled by default for high compression levels (#2447, @senhuang42)
perf: Decompression loop refactor, speed improvements on `clang` and for `--long` modes (#2614 #2630, @Cyan4973)
perf: Reduced stack usage during compression and decompression entropy stage (#2522 #2524, @terrelln)
bug: Improve setting permissions of created files (#2525, @felixhandte)
bug: Fix large dictionary non-determinism (#2607, @terrelln)
bug: Fix non-determinism test failures on Linux i686 (#2606, @terrelln)
bug: Fix various dedicated dictionary search bugs (#2540 #2586, @senhuang42 @felixhandte)
bug: Ensure `ZSTD_estimateCCtxSize*() `monotonically increases with compression level (#2538, @senhuang42)
bug: Fix --patch-from mode parameter bound bug with small files (#2637, @occivink)
bug: Fix UBSAN error in decompression (#2625, @terrelln)
bug: Fix superblock compression divide by zero bug (#2592, @senhuang42)
bug: Make the number of physical CPU cores detection more robust (#2517, @PaulBone)
doc: Improve `zdict.h` dictionary training API documentation (#2622, @terrelln)
doc: Note that public `ZSTD_free*()` functions accept NULL pointers (#2521, @animalize)
doc: Add style guide docs for open source contributors (#2626, @Cyan4973)
tests: Better regression test coverage for different dictionary modes (#2559, @senhuang42)
tests: Better test coverage of index reduction (#2603, @terrelln)
tests: OSS-Fuzz coverage for seekable format (#2617, @senhuang42)
tests: Test coverage for ZSTD threadpool API (#2604, @senhuang42)
build: Dynamic library built multithreaded by default (#2584, @senhuang42)
build: Move  `zstd_errors.h`  and  `zdict.h`  to  `lib/`  root (#2597, @terrelln)
build: Allow `ZSTDMT_JOBSIZE_MIN` to be configured at compile-time, reduce default to 512KB (#2611, @Cyan4973)
build: Single file library build script moved to `build/` directory (#2618, @felixhandte)
build: `ZBUFF_*()` is no longer built by default (#2583, @senhuang42)
build: Fixed Meson build (#2548, @SupervisedThinking @kloczek)
build: Fix excessive compiler warnings with clang-cl and CMake (#2600, @nickhutchinson)
build: Detect presence of `md5` on Darwin (#2609, @felixhandte)
build: Avoid SIGBUS on armv6 (#2633, @bmwiedmann)
cli: `--progress` flag added to always display progress bar (#2595, @senhuang42)
cli: Allow reading from block devices with `--force` (#2613, @felixhandte)
cli: Fix CLI filesize display bug (#2550, @Cyan4973)
cli: Fix windows CLI `--filelist` end-of-line bug (#2620, @Cyan4973)
contrib: Various fixes for linux kernel patch (#2539, @terrelln)
contrib: Seekable format - Decompression hanging edge case fix (#2516, @senhuang42)
contrib: Seekable format - New seek table-only API  (#2113 #2518, @mdittmer @Cyan4973)
contrib: Seekable format - Fix seek table descriptor check when loading (#2534, @foxeng)
contrib: Seekable format - Decompression fix for large offsets, (#2594, @azat)
misc: Automatically published release tarballs available on Github (#2535, @felixhandte)

h3. snappy

* Performance improvements.
* Google Test and Google Benchmark are now bundled in third_party/.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscribe@impala.apache.org
For additional commands, e-mail: issues-all-help@impala.apache.org