Posted to dev@orc.apache.org by Gopal Vijayaraghavan <go...@apache.org> on 2019/02/07 17:23:04 UTC

Re: How to make ORC use libz.so instead of libzip.so

    
>    We are conducting a project involving replacing (Linux) system's
>    libz.so with our own hardware based implementation, but this requires us to 
>    replace libzip.so with our own so that small zip processing doesn't go through 
>    hardware, as hardware actually cannot process these requests correctly due to 
>    structural differences between hardware and software implementations of the 
>    deflate algorithm. 

You're hitting a JDK8 & below limitation.

https://bugs.openjdk.java.net/browse/JDK-8079759 -> https://bugs.openjdk.java.net/browse/JDK-8031767 -> https://bugs.openjdk.java.net/browse/JDK-8176343

I've got a similar TODO sitting on my backburner, waiting for hardware access to test.

My target for optimizing this is POWER9 NX 842, & all the kernel bits for it have already shipped in Linux.

ORC is actually a bit hard-tuned for x86_64 Zlib performance - the different columns use different levels & strategies, which worked well on libzip.
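To make "levels & strategies" concrete, here is a minimal sketch using java.util.zip.Deflater, the JDK class whose native code lives in libzip. The class name DeflateTuning, the sample data and the level/strategy pairings are illustrative assumptions, not ORC's actual per-column settings.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

public class DeflateTuning {
    // Compress input with an explicit zlib level and strategy.
    static byte[] compress(byte[] input, int level, int strategy) {
        Deflater deflater = new Deflater(level);
        try {
            deflater.setStrategy(strategy);
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            while (!deflater.finished()) {
                int n = deflater.deflate(buf);
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release the native zlib state
        }
    }

    // Decompress into a buffer of known size (the caller tracks the original length).
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(compressed);
            byte[] out = new byte[originalLength];
            int off = 0;
            while (off < out.length) {
                int n = inflater.inflate(out, off, out.length - off);
                if (n == 0) {
                    break; // stream finished or no more input
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException("corrupt deflate stream", e);
        } finally {
            inflater.end();
        }
    }

    // Hypothetical, highly repetitive sample data: "abc" repeated 1000 times.
    static byte[] sampleData() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) {
            sb.append("abc");
        }
        return sb.toString().getBytes(StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        byte[] data = sampleData();
        byte[] fast = compress(data, Deflater.BEST_SPEED, Deflater.DEFAULT_STRATEGY);
        byte[] small = compress(data, Deflater.BEST_COMPRESSION, Deflater.DEFAULT_STRATEGY);
        byte[] huffman = compress(data, Deflater.BEST_SPEED, Deflater.HUFFMAN_ONLY);
        System.out.println("input bytes:            " + data.length);
        System.out.println("BEST_SPEED/DEFAULT:     " + fast.length);
        System.out.println("BEST_COMPRESSION/DEF.:  " + small.length);
        System.out.println("BEST_SPEED/HUFFMAN_ONLY:" + huffman.length);
        System.out.println("round trip ok: "
                + Arrays.equals(data, decompress(small, data.length)));
    }
}
```

The same (level, strategy) pair can behave very differently once the deflate implementation underneath changes, which is why tuning done against one zlib does not automatically carry over to another.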

hive.exec.orc.encoding.strategy & hive.exec.orc.compression.strategy both default to SPEED, so that standard Zlib is good enough.

With hardware acceleration, the COMPRESSION mode for both might no longer produce a performance hit (& in fact, might be faster due to lower bandwidth for blocks both ways over the bus).
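If you want to try that, a small sketch of flipping both properties to COMPRESSION for a Hive session follows. The class OrcStrategySettings and its helpers are hypothetical names used only for illustration; only the two property keys and the SPEED / COMPRESSION values come from Hive itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OrcStrategySettings {
    // The two values both properties accept.
    enum Strategy { SPEED, COMPRESSION }

    // Build overrides for the two properties, in a stable order.
    static Map<String, String> overrides(Strategy strategy) {
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put("hive.exec.orc.encoding.strategy", strategy.name());
        conf.put("hive.exec.orc.compression.strategy", strategy.name());
        return conf;
    }

    // Render the overrides as Hive "SET key=value;" statements.
    static String toSetStatements(Map<String, String> conf) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : conf.entrySet()) {
            sb.append("SET ").append(e.getKey()).append('=')
              .append(e.getValue()).append(";\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Print the statements to paste into a Hive session before writing ORC data.
        System.out.print(toSetStatements(overrides(Strategy.COMPRESSION)));
    }
}
```

Whether this actually comes out faster depends on the accelerator, so it is worth measuring against the SPEED defaults on the same data.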

Cheers,
Gopal