Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/09/05 02:21:24 UTC

[jira] [Updated] (HIVE-7990) With fetch column stats disabled number of elements in grouping set is not taken into account

     [ https://issues.apache.org/jira/browse/HIVE-7990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Mostafa Mokhtar updated HIVE-7990:
----------------------------------
    Component/s:     (was: File Formats)
                 Statistics

> With fetch column stats disabled number of elements in grouping set is not taken into account
> ---------------------------------------------------------------------------------------------
>
>                 Key: HIVE-7990
>                 URL: https://issues.apache.org/jira/browse/HIVE-7990
>             Project: Hive
>          Issue Type: Bug
>          Components: Statistics
>    Affects Versions: 0.14.0
>         Environment: Loading into orc
>            Reporter: Mostafa Mokhtar
>            Assignee: Prasanth J
>              Labels: performance
>             Fix For: 0.14.0
>
>
> When loading into an unpartitioned ORC table, the WriterImpl$StructTreeWriter.write method is synchronized.
> When hive.optimize.sort.dynamic.partition is enabled, the current thread is the only writer, so the synchronization is not needed.
> Also, checking memory on every row is overkill; this could be done every 1K rows or so.
> {code}
>   public void addRow(Object row) throws IOException {
>     synchronized (this) {
>       treeWriter.write(row);
>       rowsInStripe += 1;
>       if (buildIndex) {
>         rowsInIndex += 1;
>         if (rowsInIndex >= rowIndexStride) {
>           createRowIndexEntry();
>         }
>       }
>     }
>     memoryManager.addedRow();
>   }
> {code}
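> This can improve ORC load performance by roughly 7%, as the profile below suggests (MemoryManager.addedRow() accounts for about 7.5% of samples):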
> {code}
> Stack Trace	Sample Count	Percentage(%)
> WriterImpl.addRow(Object)	5,852	65.782
>    WriterImpl$StructTreeWriter.write(Object)	5,163	58.037
>    MemoryManager.addedRow()	666	7.487
>       MemoryManager.notifyWriters()	648	7.284
>          WriterImpl.checkMemory(double)	645	7.25
>             WriterImpl.flushStripe()	643	7.228
>                WriterImpl$StructTreeWriter.writeStripe(OrcProto$StripeFooter$Builder, int)	584	6.565
> {code}
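
The suggestion to check memory every 1K rows rather than on every row could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual ORC MemoryManager code or the fix that was committed: the class name BatchedMemoryCheck, the rowsBetweenChecks parameter, and the Runnable callback (standing in for the expensive notifyWriters() path) are all hypothetical. It also assumes a single writer thread, as in the hive.optimize.sort.dynamic.partition case described above, so the counter is deliberately unsynchronized.

```java
// Hypothetical sketch: NOT the real ORC MemoryManager API.
// Runs an expensive memory check once every N rows instead of on every row.
final class BatchedMemoryCheck {
    private final int rowsBetweenChecks;  // e.g. 1000, per the "per 1K rows" suggestion
    private final Runnable memoryCheck;   // stand-in for the costly notifyWriters() path
    private int rowsSinceLastCheck = 0;   // unsynchronized: assumes a single writer thread

    BatchedMemoryCheck(int rowsBetweenChecks, Runnable memoryCheck) {
        if (rowsBetweenChecks <= 0) {
            throw new IllegalArgumentException("rowsBetweenChecks must be positive");
        }
        this.rowsBetweenChecks = rowsBetweenChecks;
        this.memoryCheck = memoryCheck;
    }

    // Called once per row; only triggers the check when the batch is full.
    void addedRow() {
        rowsSinceLastCheck++;
        if (rowsSinceLastCheck >= rowsBetweenChecks) {
            rowsSinceLastCheck = 0;
            memoryCheck.run();
        }
    }

    public static void main(String[] args) {
        int[] checks = {0};
        BatchedMemoryCheck manager = new BatchedMemoryCheck(1000, () -> checks[0]++);
        for (int i = 0; i < 2500; i++) {
            manager.addedRow();
        }
        // 2500 rows with a stride of 1000 trigger checks at rows 1000 and 2000.
        System.out.println("memory checks: " + checks[0]); // prints "memory checks: 2"
    }
}
```

The trade-off is that a writer may overshoot its memory budget by up to rowsBetweenChecks rows before the next check runs, so the stride would need to be chosen with typical row sizes in mind.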



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)