Posted to dev@drill.apache.org by Jason Altekruse <al...@gmail.com> on 2014/05/23 18:22:11 UTC

Review Request 21868: Drill 827

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/21868/
-----------------------------------------------------------

Review request for drill.


Repository: drill-git


Description
-------

The Parquet reader was previously reading too far into an RLE stream. I now save each value the same way I save each definition level, so if the read loop exits because we realize that all of the var-length values in a record will not fit in the current batch, the last value read is still available for insertion into the next batch. Previously that value was lost, and another value was always read at the start of the next loop, which caused the reader to try to read more values out of the stream than it contained.
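A minimal Java sketch of the "save the last value" pattern described above might help. It keeps a value that was read but did not fit as pending and writes it first into the next batch. The class `PendingValueBatcher`, its `nextBatch` method, and the model of a batch as a byte capacity are all hypothetical and do not reflect the actual `VarLenBinaryReader` code. The patch also applies the same idea to definition levels, which this sketch omits.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/**
 * Hypothetical sketch, not Drill's code: fills fixed-capacity batches with
 * variable-length values from a one-pass stream. A value that was read but
 * did not fit is saved and written first into the next batch, instead of
 * being dropped and replaced by a fresh read from the stream.
 */
class PendingValueBatcher {
  private final Iterator<byte[]> stream; // one-pass source, e.g. decoded page values
  private byte[] pending;                // read from the stream but not yet written

  PendingValueBatcher(Iterator<byte[]> stream) {
    this.stream = stream;
  }

  /** Returns the next batch of at most `capacity` bytes; empty once the stream is exhausted. */
  List<byte[]> nextBatch(int capacity) {
    List<byte[]> batch = new ArrayList<>();
    int used = 0;
    while (true) {
      // Reuse the saved value first; only read from the stream when nothing is pending.
      if (pending == null) {
        if (!stream.hasNext()) {
          break;
        }
        pending = stream.next();
      }
      // If it does not fit, stop and keep it pending for the next batch.
      // An oversized value is still emitted alone in an empty batch so progress is made.
      if (used + pending.length > capacity && !batch.isEmpty()) {
        break;
      }
      batch.add(pending);
      used += pending.length;
      pending = null;
    }
    return batch;
  }
}
```

With values of 3, 3, 3, 3 and 5 bytes and a capacity of 7, the batches hold two, two and one value. A loop that discarded the overflowing value and read a fresh one at the start of each batch would drop the third and fifth values here, so its read position would run ahead of the number of values actually written.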


Diffs
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/PageReadStatus.java e4081d9 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordReader.java 0996620 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/VarLenBinaryReader.java 4efcdaf 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java dec4b15 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetResultListener.java a533117 

Diff: https://reviews.apache.org/r/21868/diff/


Testing
-------

Added a test for the file generated by Steven.


Thanks,

Jason Altekruse


Re: Review Request 21868: Drill 827

Posted by Jason Altekruse <al...@gmail.com>.

(Updated May 23, 2014, 4:27 p.m.)


Review request for drill.


Changes
-------

Marked the new test with @Ignore, as it relies on a binary file that is not checked into git.


Repository: drill-git



Diffs (updated)
-----

  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/PageReadStatus.java e4081d9 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/ParquetRecordReader.java 0996620 
  exec/java-exec/src/main/java/org/apache/drill/exec/store/parquet/VarLenBinaryReader.java 4efcdaf 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetRecordReaderTest.java dec4b15 
  exec/java-exec/src/test/java/org/apache/drill/exec/store/parquet/ParquetResultListener.java a533117 

Diff: https://reviews.apache.org/r/21868/diff/



Thanks,

Jason Altekruse