You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by j....@gmail.com on 2013/10/01 03:12:19 UTC

Re: Review Request 14243: HIVE-5325: Implement statistics providing ORC writer and reader interfaces


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java, line 360
> > <https://reviews.apache.org/r/14243/diff/2/?file=356668#file356668line360>
> >
> >     Can you add a comment when this if condition will be true and when it will be false. It isn't obvious.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1794
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1794>
> >
> >     Log a message here, saying unknown category.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1806
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1806>
> >
> >     What is more useful is how much size these objects will take when loaded back in memory (e.g while doing map-joins). What you have is how much memory they take up while ORC writer has buffered them in memory. So, what we want is numVals * sizeof(boolean)

Makes sense. Updated to use JavaDataModel. 


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java, line 1868
> > <https://reviews.apache.org/r/14243/diff/2/?file=356670#file356670line1868>
> >
> >     Log a msg here too.

Done.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java, line 23
> > <https://reviews.apache.org/r/14243/diff/2/?file=356671#file356671line23>
> >
> >     Class JavaDataModel has almost identical info. Consider using that.

Removed StatsUtils as it is no longer required.


> On Sept. 30, 2013, 4:16 p.m., Ashutosh Chauhan wrote:
> > ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java, line 264
> > <https://reviews.apache.org/r/14243/diff/2/?file=356674#file356674line264>
> >
> >     You have removed this test. Is this getting covered via new one?

I wrote that test case earlier to make sure old ORC format (v0.11 format) can be read without issues. For that new orc file written in old format was added to test resources  which will be read by a unit test. The write part of the unit test case is no longer required since we directly have old format orc file in resources. Hence I removed it. 


- Prasanth_J


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/14243/#review26490
-----------------------------------------------------------


On Sept. 24, 2013, 10:18 p.m., Prasanth_J wrote:
> 
> -----------------------------------------------------------
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/14243/
> -----------------------------------------------------------
> 
> (Updated Sept. 24, 2013, 10:18 p.m.)
> 
> 
> Review request for hive, Ashutosh Chauhan and Owen O'Malley.
> 
> 
> Bugs: HIVE-5325
>     https://issues.apache.org/jira/browse/HIVE-5325
> 
> 
> Repository: hive-git
> 
> 
> Description
> -------
> 
> HIVE-5324 adds new interfaces that can be implemented by ORC reader/writer to provide statistics. Writer provided statistics is used to update table/partition level statistics in metastore. Reader provided statistics can be used for reducer estimation, CBO etc. in the absence of metastore statistics.
> 
> 
> Diffs
> -----
> 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/BinaryColumnStatistics.java PRE-CREATION 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ColumnStatisticsImpl.java 6268617 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/OrcOutputFormat.java c80fb02 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/ReaderImpl.java c454f32 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/StringColumnStatistics.java 72e779a 
>   ql/src/java/org/apache/hadoop/hive/ql/io/orc/WriterImpl.java 44961ce 
>   ql/src/java/org/apache/hadoop/hive/ql/stats/StatsUtils.java PRE-CREATION 
>   ql/src/protobuf/org/apache/hadoop/hive/ql/io/orc/orc_proto.proto edbf822 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestInputOutputFormat.java 34b2305 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcFile.java e6569f4 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcNullOptimization.java b93db84 
>   ql/src/test/org/apache/hadoop/hive/ql/io/orc/TestOrcSerDeStats.java PRE-CREATION 
>   ql/src/test/resources/orc-file-dump-dictionary-threshold.out 003c132 
>   ql/src/test/resources/orc-file-dump.out fac5326 
> 
> Diff: https://reviews.apache.org/r/14243/diff/
> 
> 
> Testing
> -------
> 
> ORC related unit and qfile tests are passing.
> 
> 
> Thanks,
> 
> Prasanth_J
> 
>