Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/08/15 21:01:19 UTC

[jira] [Created] (HIVE-7741) Don't synchronize WriterImpl.addRow() when dynamic.partition is enabled

Mostafa Mokhtar created HIVE-7741:
-------------------------------------

             Summary: Don't synchronize WriterImpl.addRow() when dynamic.partition is enabled
                 Key: HIVE-7741
                 URL: https://issues.apache.org/jira/browse/HIVE-7741
             Project: Hive
          Issue Type: Bug
          Components: File Formats
    Affects Versions: 0.13.1
         Environment: Loading into orc
            Reporter: Mostafa Mokhtar
            Assignee: Prasanth J
             Fix For: 0.14.0


When loading into an unpartitioned ORC table, the call to WriterImpl$StructTreeWriter.write in WriterImpl.addRow() is synchronized.

When hive.optimize.sort.dynamic.partition is enabled, the current thread is the only writer, so the synchronization is not needed.

Also, checking memory on every row is overkill; this could be done once every 1K rows or so.

{code}
  public void addRow(Object row) throws IOException {
    synchronized (this) {
      treeWriter.write(row);
      rowsInStripe += 1;
      if (buildIndex) {
        rowsInIndex += 1;

        if (rowsInIndex >= rowIndexStride) {
          createRowIndexEntry();
        }
      }
    }
    memoryManager.addedRow();
  }
{code}

This can improve ORC load performance by 7%.

{code}
Stack Trace	Sample Count	Percentage(%)
WriterImpl.addRow(Object)	5,852	65.782
   WriterImpl$StructTreeWriter.write(Object)	5,163	58.037
   MemoryManager.addedRow()	666	7.487
      MemoryManager.notifyWriters()	648	7.284
         WriterImpl.checkMemory(double)	645	7.25
            WriterImpl.flushStripe()	643	7.228
               WriterImpl$StructTreeWriter.writeStripe(OrcProto$StripeFooter$Builder, int)	584	6.565
{code}
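
The two suggestions above (skip the lock when there is a single writer, and check memory once per batch of rows instead of per row) could be sketched roughly as follows. This is only an illustration, not the actual WriterImpl or a proposed patch: SketchWriter, CountingMemoryManager, the singleWriter flag and rowsBetweenMemoryChecks are hypothetical names introduced here, and the real treeWriter.write(row) call is replaced by a plain counter. Whether it is actually safe to skip the lock depends on no other thread ever touching the writer, which this sketch simply assumes when singleWriter is true.

```java
/** Hypothetical stand-in for the ORC writer; NOT the real Hive WriterImpl. */
class SketchWriter {

  /** Hypothetical stand-in for MemoryManager; only counts how often it is asked to check. */
  static class CountingMemoryManager {
    int checks = 0;

    void checkMemory() {
      checks++;
    }
  }

  private final CountingMemoryManager memoryManager;
  private final int rowsBetweenMemoryChecks; // e.g. 1000, per the "1K rows" suggestion
  private final boolean singleWriter;        // assumption: caller guarantees one writer thread
  private long rowsInStripe = 0;
  private int rowsSinceMemoryCheck = 0;

  SketchWriter(CountingMemoryManager memoryManager,
               int rowsBetweenMemoryChecks,
               boolean singleWriter) {
    this.memoryManager = memoryManager;
    this.rowsBetweenMemoryChecks = rowsBetweenMemoryChecks;
    this.singleWriter = singleWriter;
  }

  /** Updates the counters; returns true when a memory check is due. */
  private boolean writeRow(Object row) {
    // A real writer would call treeWriter.write(row) here.
    rowsInStripe += 1;
    rowsSinceMemoryCheck += 1;
    if (rowsSinceMemoryCheck >= rowsBetweenMemoryChecks) {
      rowsSinceMemoryCheck = 0;
      return true;
    }
    return false;
  }

  public void addRow(Object row) {
    boolean checkDue;
    if (singleWriter) {
      // Single writer thread assumed: no lock taken.
      checkDue = writeRow(row);
    } else {
      // Several threads may write: keep the original locking.
      synchronized (this) {
        checkDue = writeRow(row);
      }
    }
    // As in the original addRow(), the memory manager is called outside the lock,
    // but only once per batch of rows rather than once per row.
    if (checkDue) {
      memoryManager.checkMemory();
    }
  }

  public long getRowsInStripe() {
    return rowsInStripe;
  }
}
```

With rowsBetweenMemoryChecks set to 1000, writing 2500 rows triggers the memory check twice instead of 2500 times. The batch counter is updated in the same place as rowsInStripe so that it stays protected by the lock in the multi-writer case.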






