You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by Sergey Shelukhin <se...@hortonworks.com> on 2018/05/09 19:12:25 UTC

Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/
-----------------------------------------------------------

Review request for hive and Prasanth_J.


Repository: hive-git


Description
-------

see jira


Diffs
-----

  llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 


Diff: https://reviews.apache.org/r/67039/diff/1/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202800
-----------------------------------------------------------


Ship it!




Ship It!

- Prasanth_J


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> -----------------------------------------------------------
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by Sergey Shelukhin <se...@hortonworks.com>.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/
-----------------------------------------------------------

(Updated May 9, 2018, 8:55 p.m.)


Review request for hive and Prasanth_J.


Repository: hive-git


Description
-------

see jira


Diffs (updated)
-----

  llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
  ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 


Diff: https://reviews.apache.org/r/67039/diff/2/

Changes: https://reviews.apache.org/r/67039/diff/1-2/


Testing
-------


Thanks,

Sergey Shelukhin


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by j....@gmail.com.

> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > <https://reviews.apache.org/r/67039/diff/1/?file=2018739#file2018739line1089>
> >
> >     Positions are packed variable in protobuf. 
> >     Also there is stream suppression. 
> >     Same column in a stripe may have a stream suppressed (say all non-nulls values, isPresent gets suppressed) then its positions will not be recorded. Same column in another stripe might have all streams. So all this does is if the stream does not exist don't move on to next position.
> >     
> >     [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> >     [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)
> 
> Sergey Shelukhin wrote:
>     I'm not sure what you mean... I'm moving this code outside of the 0-length check, and moving into utility method.
>     All calls of the same method did the exact same thing before.
> 
> Prasanth_J wrote:
>     https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L248-L256 
>     If a stream is suppressed, its positions are removed.
>     
>     So I am not sure if it is safe to advance the positions without checking if the stream exists/available or not.

Actually we remove the positions only for isPresent streams. I think it should be ok if we advance positions first before seek for other streams.


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
-----------------------------------------------------------


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> -----------------------------------------------------------
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by Sergey Shelukhin <se...@hortonworks.com>.

> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 280 (patched)
> > <https://reviews.apache.org/r/67039/diff/1/?file=2018739#file2018739line283>
> >
> >     the reader seeks mostly delegates to data stream seek and length stream seek (RLE seeks does more than just position seek - it has to seek within bytes or runs)

This patch only adjusts for InStream seeks


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > <https://reviews.apache.org/r/67039/diff/1/?file=2018739#file2018739line1089>
> >
> >     Positions are packed variable in protobuf. 
> >     Also there is stream suppression. 
> >     Same column in a stripe may have a stream suppressed (say all non-nulls values, isPresent gets suppressed) then its positions will not be recorded. Same column in another stripe might have all streams. So all this does is if the stream does not exist don't move on to next position.
> >     
> >     [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> >     [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)

I'm not sure what you mean... I'm moving this code outside of the 0-length check, and moving into utility method.
All calls of the same method did the exact same thing before.


> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1894 (patched)
> > <https://reviews.apache.org/r/67039/diff/1/?file=2018739#file2018739line1931>
> >
> >     where is this skipSeek() implementation? I don't see it in the patch or master.

Looks like it is in ORC on master. Changed to a static method. I edited pre-ORC-split impl when originally fixing the issue.


- Sergey


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
-----------------------------------------------------------


On May 9, 2018, 7:12 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> -----------------------------------------------------------
> 
> (Updated May 9, 2018, 7:12 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by j....@gmail.com.

> On May 9, 2018, 8:43 p.m., Prasanth_J wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
> > Lines 1076 (patched)
> > <https://reviews.apache.org/r/67039/diff/1/?file=2018739#file2018739line1089>
> >
> >     Positions are packed variable in protobuf. 
> >     Also there is stream suppression. 
> >     Same column in a stripe may have a stream suppressed (say all non-nulls values, isPresent gets suppressed) then its positions will not be recorded. Same column in another stripe might have all streams. So all this does is if the stream does not exist don't move on to next position.
> >     
> >     [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
> >     [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)
> 
> Sergey Shelukhin wrote:
>     I'm not sure what you mean... I'm moving this code outside of the 0-length check, and moving into utility method.
>     All calls of the same method did the exact same thing before.

https://github.com/apache/orc/blob/master/java/core/src/java/org/apache/orc/impl/writer/TreeWriterBase.java#L248-L256 
If a stream is suppressed, its positions are removed.

So I am not sure if it is safe to advance the positions without checking if the stream exists/available or not.


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
-----------------------------------------------------------


On May 9, 2018, 8:55 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> -----------------------------------------------------------
> 
> (Updated May 9, 2018, 8:55 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/2/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>


Re: Review Request 67039: HIVE-19479 encoded stream seek is incorrect for 0-length RGs in LLAP IO

Posted by j....@gmail.com.
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/67039/#review202793
-----------------------------------------------------------




ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 280 (patched)
<https://reviews.apache.org/r/67039/#comment284787>

    the reader seeks mostly delegates to data stream seek and length stream seek (RLE seeks does more than just position seek - it has to seek within bytes or runs)



ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 1076 (patched)
<https://reviews.apache.org/r/67039/#comment284788>

    Positions are packed variable in protobuf. 
    Also there is stream suppression. 
    Same column in a stripe may have a stream suppressed (say all non-nulls values, isPresent gets suppressed) then its positions will not be recorded. Same column in another stripe might have all streams. So all this does is if the stream does not exist don't move on to next position.
    
    [[0,0,0],[0,0],[0,0]] -> positions for isPresent, Data, Length..
    [[0,0],[0,0]] -> positions for Data, Length.. (isPresent suppressed)



ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java
Lines 1894 (patched)
<https://reviews.apache.org/r/67039/#comment284789>

    where is this skipSeek() implementation? I don't see it in the patch or master.


- Prasanth_J


On May 9, 2018, 7:12 p.m., Sergey Shelukhin wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/67039/
> -----------------------------------------------------------
> 
> (Updated May 9, 2018, 7:12 p.m.)
> 
> 
> Review request for hive and Prasanth_J.
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> see jira
> 
> 
> Diffs
> -----
> 
>   llap-server/src/java/org/apache/hadoop/hive/llap/io/decode/OrcEncodedDataConsumer.java fc0c66a888 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedReaderImpl.java 1d7eceb1ef 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/encoded/EncodedTreeReaderFactory.java 42532f9a0e 
> 
> 
> Diff: https://reviews.apache.org/r/67039/diff/1/
> 
> 
> Testing
> -------
> 
> 
> Thanks,
> 
> Sergey Shelukhin
> 
>