Posted to dev@hive.apache.org by Attila Magyar <am...@hortonworks.com> on 2019/09/09 15:27:50 UTC
Review Request 71456: select count gives incorrect result after loading data from text file
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/71456/
-----------------------------------------------------------
Review request for hive, Ashutosh Chauhan, Jesús Camacho Rodríguez, and Slim Bouguerra.
Bugs: HIVE-22055
https://issues.apache.org/jira/browse/HIVE-22055
Repository: hive-git
Description
-------
This happens when tez.grouping.min-size is set to a small value (for example 1), so that the split size calculated from the file size is used. That split size changes as the table grows, so each select may run with a different split size.
load 90 records from f1
select count(1) gives back 90
load 90 records from f2
select count(1) gives back 172 // expected 180, 8 records missing
When the second select runs, the split size is larger, and SerDeLowLevelCacheImpl is already populated with stripes from the first select (at which time the split size was smaller).
There is a problem with how LineRecordReader works together with the cache. If a larger split is requested and an overlapping smaller one is already in the cache, SerDeEncodedDataReader will try to extend the existing split by reading the difference between the large and the small split. It starts reading right after the point where the last stripe physically ends, but LineRecordReader always skips the first row unless it is at the beginning of the file. This line-skipping behaviour is not taken into account at one point, and that is why some rows are missing.
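To illustrate the effect described above, here is a small toy model in Python. It is not Hive or Hadoop code: read_split and its skip_first_line flag are made up for this sketch, and the only behaviour it mimics is the rule that a reader not starting at offset 0 discards its first line. The "fixed" variant is just one hypothetical way to avoid the loss and is not necessarily what the patch does.

```python
def read_split(data, start, end, skip_first_line=True):
    """Toy line reader for the byte range [start, end) of `data`.

    Returns (rows, pos), where pos is the offset just past the last row read.
    """
    pos = start
    if skip_first_line and start != 0:
        # Not at the beginning of the file: discard everything up to and
        # including the next newline, assuming the previous split owns it.
        nl = data.find(b"\n", pos)
        pos = len(data) if nl == -1 else nl + 1
    rows = []
    # A row belongs to this split if it starts at or before `end`.
    while pos <= end and pos < len(data):
        nl = data.find(b"\n", pos)
        nxt = len(data) if nl == -1 else nl + 1
        rows.append(data[pos:nxt].rstrip(b"\n"))
        pos = nxt
    return rows, pos


data = b"".join(b"row%d\n" % i for i in range(10))  # 10 rows, 5 bytes each

# First select: small split [0, 12). The reader finishes the row it started,
# so the cached data physically ends at offset 15, not 12.
cached_rows, cached_end = read_split(data, 0, 12)

# Second select: larger split [0, 50). Extend the cached part by reading
# only the difference, starting right where the cached data ends.
naive_rows, _ = read_split(data, cached_end, 50)
naive = cached_rows + naive_rows

# Hypothetical correction: cached_end is already a row boundary, so do not skip.
fixed_rows, _ = read_split(data, cached_end, 50, skip_first_line=False)
fixed = cached_rows + fixed_rows

print(len(naive), len(fixed))
```

In this model the naive extension returns 9 of the 10 rows: row3, which starts exactly where the cached data ends, is discarded by the skip rule even though no earlier read ever returned it.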
Diffs
-----
itests/src/test/resources/testconfiguration.properties 98280c52fe9
llap-server/src/java/org/apache/hadoop/hive/llap/io/encoded/SerDeEncodedDataReader.java 462b25fa234
ql/src/test/queries/clientpositive/mm_loaddata_split_change.q PRE-CREATION
ql/src/test/results/clientpositive/llap/mm_loaddata_split_change.q.out PRE-CREATION
Diff: https://reviews.apache.org/r/71456/diff/1/
Testing
-------
Tested with the new q test (mm_loaddata_split_change.q).
Thanks,
Attila Magyar