Posted to dev@orc.apache.org by Gopal Vijayaraghavan <go...@apache.org> on 2016/11/04 05:07:55 UTC
[discuss] Fixing DecimalColumnVector cache misses
Hi,
(x-posted for discussion)
Hive's storage-api + ORC vector readers have a cache miss built in for the case of Decimal readers.
With LLAP, two distinct cache misses are basically dragging Decimal performance down.
DecimalColumnVector -> HiveDecimalWritable -> HiveDecimal(BigInteger) -> new BigDecimal()
The writable is entirely overhead and so are the BigInteger -> BigDecimal conversions, particularly since the HiveDecimal type is not boxed unlike a "long".
Modifying the writable involves a fresh allocation of a HiveDecimal, which makes the object reference a rather unsightly cache miss (this is TPC-H Q1).
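To make the allocation pattern concrete, here is a minimal sketch using only java.math. ImmutableDecimal and DecimalHolder are hypothetical stand-ins for HiveDecimal and HiveDecimalWritable, not the real classes, and the real code paths differ in detail. It only illustrates why every update of a holder leaves a fresh object behind the reference.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

class DecimalChainSketch {
    // Hypothetical stand-in for an immutable decimal value; not the real HiveDecimal.
    static final class ImmutableDecimal {
        final BigDecimal value;
        ImmutableDecimal(BigInteger unscaled, int scale) {
            // BigInteger -> BigDecimal conversion: one more object per value.
            this.value = new BigDecimal(unscaled, scale);
        }
    }

    // Hypothetical stand-in for a mutable writable that wraps the immutable value.
    static final class DecimalHolder {
        ImmutableDecimal current;
        void set(long unscaled, int scale) {
            // The wrapped value cannot be updated in place, so every set()
            // swaps in a freshly allocated object.
            current = new ImmutableDecimal(BigInteger.valueOf(unscaled), scale);
        }
    }

    // Fills a vector of holders and sums it; each read follows
    // vector[i] -> holder -> decimal -> BigDecimal.
    static BigDecimal fillAndSum(long[] unscaledValues, int scale) {
        DecimalHolder[] vector = new DecimalHolder[unscaledValues.length];
        BigDecimal total = BigDecimal.ZERO;
        for (int i = 0; i < vector.length; i++) {
            vector[i] = new DecimalHolder();
            vector[i].set(unscaledValues[i], scale);
            total = total.add(vector[i].current.value);
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(fillAndSum(new long[] {12345L, 250L, -100L}, 2));
    }
}
```

Each row touches three separate heap objects before any arithmetic happens, which is the pointer chasing described above.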
Changing this in hive/storage-api will produce a chicken-and-egg scenario between hive/storage-api -> orc -> hive/ql/exec/vectorization, across projects.
I'm conflicted on how to change DecimalColumnVector in one shot without breaking things (and, if possible, remove BigInteger allocations in the read-path as an optimization).
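As one possible shape for a BigInteger-free read path, here is a rough sketch, not a proposal for the actual API. It assumes every value in the column fits in a 64-bit unscaled long (roughly precision <= 18) and shares a single scale. LongBackedDecimalVectorSketch and its methods are hypothetical names and do not exist in storage-api.

```java
import java.math.BigDecimal;

class LongBackedDecimalVectorSketch {
    // Unscaled values stored flat, so a scan walks one contiguous array.
    final long[] unscaled;
    // Assumption: one scale for the whole column.
    final int scale;

    LongBackedDecimalVectorSketch(int size, int scale) {
        this.unscaled = new long[size];
        this.scale = scale;
    }

    // Overwrites a row in place; no object is allocated.
    void set(int row, long unscaledValue) {
        unscaled[row] = unscaledValue;
    }

    // Sums the column on primitives; throws ArithmeticException on overflow.
    long sumUnscaled() {
        long total = 0L;
        for (long v : unscaled) {
            total = Math.addExact(total, v);
        }
        return total;
    }

    // Materializes a BigDecimal only when a caller asks for one.
    BigDecimal toBigDecimal(long unscaledValue) {
        return BigDecimal.valueOf(unscaledValue, scale);
    }

    public static void main(String[] args) {
        LongBackedDecimalVectorSketch vec = new LongBackedDecimalVectorSketch(3, 2);
        vec.set(0, 12345L);
        vec.set(1, 250L);
        vec.set(2, -100L);
        System.out.println(vec.toBigDecimal(vec.sumUnscaled()));
    }
}
```

Wider decimals would still need a fallback representation, which this sketch does not attempt to cover.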
Suggestions/discuss?
Cheers,
Gopal