Posted to dev@orc.apache.org by Gopal Vijayaraghavan <go...@apache.org> on 2019/02/07 17:23:04 UTC
Re: How to make ORC use libz.so instead of libzip.so
> We are conducting a project involving replacing (Linux) system's
> libz.so with our own hardware based implementation, but this requires us to
> replace libzip.so with our own so that small zip processing doesn't go through
> hardware, as hardware actually cannot process these requests correctly due to
> structural differences between hardware and software implementations of the
> deflate algorithm.
You're hitting a limitation of JDK 8 & below.
https://bugs.openjdk.java.net/browse/JDK-8079759 -> https://bugs.openjdk.java.net/browse/JDK-8031767 -> https://bugs.openjdk.java.net/browse/JDK-8176343
I've got a similar TODO sitting on my backburner, waiting for hardware access to test.
POWER9 NX 842 is my target for optimizing this & all the kernel bits for this are already shipped in Linux.
ORC is actually a bit hard-tuned for x86_64 Zlib performance - the different columns use different levels & strategies, which worked well on libzip.
hive.exec.orc.encoding.strategy & hive.exec.orc.compression.strategy are set to SPEED to allow for standard Zlib to be good enough.
With hardware acceleration, the COMPRESSION mode for both might not produce a performance hit (& might in fact be faster, since smaller blocks need less bandwidth both ways over the bus).
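To make the "levels & strategies" point concrete, here is a minimal sketch using java.util.zip.Deflater, the JDK class that sits on top of libzip. It is not ORC's codec code: the class name, the sample data and the two level/strategy pairings are assumptions picked for illustration, not the values ORC actually uses per column.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflaterSettingsSketch {

    // Hypothetical sample input: repetitive text standing in for one column's bytes.
    static byte[] sample() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 2000; i++) {
            sb.append("status=OK;region=us-east;\n");
        }
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Compress with an explicit zlib level and strategy (zlib-wrapped stream).
    static byte[] compress(byte[] data, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        deflater.setStrategy(strategy);
        deflater.setInput(data);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflate back to the original bytes; fails loudly on a short or corrupt stream.
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] result = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength && !inflater.finished()) {
                int n = inflater.inflate(result, off, originalLength - off);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // no more progress possible
                }
                off += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
        if (off != originalLength) {
            throw new IllegalStateException("short stream: got " + off + " bytes");
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] data = sample();
        // Illustrative pairings only: a fast setting and a smaller-output setting.
        byte[] fast = compress(data, Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY);
        byte[] small = compress(data, Deflater.BEST_COMPRESSION, Deflater.FILTERED);
        System.out.println("input bytes:            " + data.length);
        System.out.println("BEST_SPEED/DEFAULT:     " + fast.length);
        System.out.println("BEST_COMPRESSION/FILT.: " + small.length);
        System.out.println("round-trip ok: "
                + (Arrays.equals(data, decompress(fast, data.length))
                && Arrays.equals(data, decompress(small, data.length))));
    }
}
```

Timing the two compress calls on the software path & again with a hardware-backed libz would be one way to check whether the heavier setting really stops costing anything.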
Cheers,
Gopal