Posted to issues@hive.apache.org by "Slim Bouguerra (Jira)" <ji...@apache.org> on 2020/01/08 22:49:00 UTC

[jira] [Commented] (HIVE-22583) LLAP cache always misses with non-vectorized serde readers such as OpenCSV

    [ https://issues.apache.org/jira/browse/HIVE-22583?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17011048#comment-17011048 ] 

Slim Bouguerra commented on HIVE-22583:
---------------------------------------

[~szita] I think that might be the same thing. In fact, the Tez counters depend on the HDFS counters, which are tied to the file format; the format can change, and so the byte counts can change.
Think of it this way: the bytes read or missed by the cache are relative to the ORC file format.
As I said, I think for now we can avoid this test case, which can be flaky, and work on a query that can run against the cache only; that's more robust IMO.

> LLAP cache always misses with non-vectorized serde readers such as OpenCSV
> --------------------------------------------------------------------------
>
>                 Key: HIVE-22583
>                 URL: https://issues.apache.org/jira/browse/HIVE-22583
>             Project: Hive
>          Issue Type: Bug
>          Components: llap
>            Reporter: Ádám Szita
>            Assignee: Ádám Szita
>            Priority: Major
>         Attachments: HIVE-22583.0.patch, HIVE-22583.1.patch, HIVE-22583.2.patch
>
>
> Although the LLAP cache stores the data of tables that do not use the LazySimple serde after the first read, the stored data is never used by subsequent queries, causing a full cache miss and re-read each time.
> The problem is rooted in SerdeEncodedDataReader#cacheFileData not creating an entry for the root/struct column of the table. The only case where this is taken care of is when a vectorized reader is used _(e.g. LazySimpleSerde's LazySimpleDeserializeRead)_, where SerdeEncodedDataReader#processAsyncCacheData handles it.
> This can be reproduced either by using a custom serde such as OpenCSV, or by using LazySimpleSerde with _hive.llap.io.encode.vector.serde.enabled_ turned off.
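
To make the failure mode described above concrete, here is a minimal toy model in Java. It is not Hive code: the class, its methods, and the root column index of 0 are all hypothetical, and it assumes that a cache lookup only counts as a hit when an entry for the root/struct column exists alongside the data columns. Under that assumption, a write path that skips the root entry yields a miss on every later read, even though the column data was stored.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these names exist in Hive.
final class RootColumnCacheSketch {
    // Assumed index of the root/struct column (hypothetical).
    static final int ROOT_COLUMN = 0;

    private final Map<Integer, byte[]> entries = new HashMap<>();

    // Models a write path that stores data columns but no root entry,
    // analogous to the behaviour the issue attributes to cacheFileData.
    void cacheWithoutRoot(List<Integer> columns) {
        for (int column : columns) {
            entries.put(column, new byte[] {1});
        }
    }

    // Models a write path that also registers the root/struct column,
    // analogous to what the issue says the vectorized path does.
    void cacheWithRoot(List<Integer> columns) {
        entries.put(ROOT_COLUMN, new byte[0]);
        cacheWithoutRoot(columns);
    }

    // Assumption: a read is a hit only if the root entry and every
    // requested column are present.
    boolean isHit(List<Integer> columns) {
        if (!entries.containsKey(ROOT_COLUMN)) {
            return false;
        }
        for (int column : columns) {
            if (!entries.containsKey(column)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> columns = Arrays.asList(1, 2);

        RootColumnCacheSketch withoutRoot = new RootColumnCacheSketch();
        withoutRoot.cacheWithoutRoot(columns);
        System.out.println("without root entry, hit = " + withoutRoot.isHit(columns));

        RootColumnCacheSketch withRoot = new RootColumnCacheSketch();
        withRoot.cacheWithRoot(columns);
        System.out.println("with root entry, hit = " + withRoot.isHit(columns));
    }
}
```

In this sketch the first lookup reports a miss and the second a hit, although both caches hold the same column data. The real cache structures in LLAP are more involved; the sketch only illustrates why a missing root entry can turn every read into a full miss.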



--
This message was sent by Atlassian Jira
(v8.3.4#803005)