Posted to dev@hive.apache.org by "Mostafa Mokhtar (JIRA)" <ji...@apache.org> on 2014/09/29 20:52:34 UTC
[jira] [Updated] (HIVE-8291) ACID : Reading from partitioned
bucketed tables has high overhead, 50% of time is spent in
OrcInputFormat.getReader
[ https://issues.apache.org/jira/browse/HIVE-8291?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Mostafa Mokhtar updated HIVE-8291:
----------------------------------
Summary: ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader (was: Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader)
> ACID : Reading from partitioned bucketed tables has high overhead, 50% of time is spent in OrcInputFormat.getReader
> -------------------------------------------------------------------------------------------------------------------
>
> Key: HIVE-8291
> URL: https://issues.apache.org/jira/browse/HIVE-8291
> Project: Hive
> Issue Type: Bug
> Components: Query Processor
> Affects Versions: 0.14.0
> Environment: cn105
> Reporter: Mostafa Mokhtar
> Assignee: Owen O'Malley
> Fix For: 0.14.0
>
>
> Reading from bucketed partitioned tables has significantly higher overhead than reading from non-bucketed, non-partitioned files.
> 50% of the time is spent in these two lines of code in OrcInputFormat.getReader():
> {code}
> String txnString = conf.get(ValidTxnList.VALID_TXNS_KEY,
>     Long.MAX_VALUE + ":");
> ValidTxnList validTxnList = new ValidTxnListImpl(txnString);
> {code}
> {code}
> Stack Trace | Sample Count | Percentage (%)
> hive.ql.exec.tez.MapRecordSource.pushRecord() | 2,981 | 87.215
> org.apache.tez.mapreduce.lib.MRReaderMapred.next() | 2,002 | 58.572
> mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.next(Object, Object) | 2,002 | 58.572
> mapred.split.TezGroupedSplitsInputFormat$TezGroupedSplitsRecordReader.initNextRecordReader() | 1,984 | 58.046
> hive.ql.io.HiveInputFormat.getRecordReader(InputSplit, JobConf, Reporter) | 1,983 | 58.016
> hive.ql.io.orc.OrcInputFormat.getRecordReader(InputSplit, JobConf, Reporter) | 1,891 | 55.325
> hive.ql.io.orc.OrcInputFormat.getReader(InputSplit, AcidInputFormat$Options) | 1,723 | 50.41
> hive.common.ValidTxnListImpl.<init>(String) | 934 | 27.326
> conf.Configuration.get(String, String) | 621 | 18.169
> {code}
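The profile above suggests that the transaction list string is fetched and re-parsed for every record reader. One possible mitigation is to parse the string once and reuse the result for as long as the string is unchanged. The sketch below is illustrative only: `ParsedTxnList` and `TxnListCache` are hypothetical stand-ins, not Hive's `ValidTxnListImpl` and not necessarily the fix applied for this issue, and the `highWatermark:txn:txn` format is a simplifying assumption.

```java
import java.util.Arrays;

public class TxnListCacheSketch {

    /** Hypothetical stand-in for a parsed transaction list; not Hive's ValidTxnListImpl. */
    static final class ParsedTxnList {
        final long highWatermark;
        final long[] exceptions; // kept sorted so lookups can use binary search

        ParsedTxnList(String value) {
            // Assumed format for this sketch: "<highWatermark>:<txn>:<txn>..."
            String[] parts = value.split(":");
            highWatermark = Long.parseLong(parts[0]);
            exceptions = new long[parts.length - 1];
            for (int i = 1; i < parts.length; i++) {
                exceptions[i - 1] = Long.parseLong(parts[i]);
            }
            Arrays.sort(exceptions);
        }

        boolean isTxnValid(long txnId) {
            return txnId <= highWatermark && Arrays.binarySearch(exceptions, txnId) < 0;
        }
    }

    /** Remembers the last parsed value so repeated calls with the same string skip parsing. */
    static final class TxnListCache {
        private String lastValue;
        private ParsedTxnList lastParsed;
        int parseCount = 0; // exposed only to make the effect visible in this sketch

        synchronized ParsedTxnList get(String value) {
            if (!value.equals(lastValue)) {
                lastParsed = new ParsedTxnList(value);
                lastValue = value;
                parseCount++;
            }
            return lastParsed;
        }
    }

    public static void main(String[] args) {
        TxnListCache cache = new TxnListCache();
        String txnString = "100:7:42"; // made-up value for illustration
        for (int split = 0; split < 1000; split++) {
            cache.get(txnString); // one call per record reader
        }
        System.out.println("parses: " + cache.parseCount);
        System.out.println("txn 42 valid: " + cache.get(txnString).isTxnValid(42));
    }
}
```

This only removes the repeated parsing; the cost attributed to Configuration.get in the profile would remain unless the string itself is also read once and passed along.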
> Another 20% of the profile is spent in MapOperator.cleanUpInputFileChangedOp():
> 5% of the CPU in
> {code}
> Path onepath = normalizePath(onefile);
> {code}
> and 15% of the CPU in
> {code}
> onepath.toUri().relativize(fpath.toUri()).equals(fpath.toUri());
> {code}
> From the profiler:
> {code}
> Stack Trace | Sample Count | Percentage (%)
> org.apache.hadoop.hive.ql.exec.tez.MapRecordSource.processRow(Object) | 978 | 28.613
> org.apache.hadoop.hive.ql.exec.vector.VectorMapOperator.process(Writable) | 978 | 28.613
> org.apache.hadoop.hive.ql.exec.Operator.cleanUpInputFileChanged() | 866 | 25.336
> org.apache.hadoop.hive.ql.exec.MapOperator.cleanUpInputFileChangedOp() | 866 | 25.336
> java.net.URI.relativize(URI) | 655 | 19.163
> java.net.URI.relativize(URI, URI) | 655 | 19.163
> java.net.URI.normalize(String) | 517 | 15.126
> java.net.URI.needsNormalization(String) | 372 | 10.884
> java.lang.String.charAt(int) | 235 | 6.875
> java.net.URI.equal(String, String) | 27 | 0.79
> java.lang.StringBuilder.toString() | 1 | 0.029
> java.lang.StringBuilder.<init>() | 1 | 0.029
> java.lang.StringBuilder.append(String) | 1 | 0.029
> org.apache.hadoop.hive.ql.exec.MapOperator.normalizePath(String) | 167 | 4.886
> org.apache.hadoop.fs.Path.<init>(String) | 162 | 4.74
> org.apache.hadoop.fs.Path.initialize(String, String, String, String) | 162 | 4.74
> org.apache.hadoop.fs.Path.normalizePath(String, String) | 97 | 2.838
> org.apache.commons.lang.StringUtils.replace(String, String, String) | 97 | 2.838
> org.apache.commons.lang.StringUtils.replace(String, String, String, int) | 97 | 2.838
> java.lang.String.indexOf(String, int) | 97 | 2.838
> java.net.URI.<init>(String, String, String, String, String) | 65 | 1.902
> {code}
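For readers unfamiliar with the `relativize(...).equals(...)` idiom quoted above: `java.net.URI.relativize` returns its argument unchanged when the receiver is not a prefix of it, so the expression is true exactly when the file is not under the directory. The sketch below demonstrates that behaviour using only `java.net.URI` and shows one way the per-call work might be reduced, by converting the candidate directories to URIs once up front. The helper names, host name, and paths are made up for illustration; this is not the MapOperator code.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RelativizeCheckSketch {

    /** Same shape as the expression in the report: true when file is NOT under dir. */
    static boolean isOutside(URI dir, URI file) {
        // relativize() hands back `file` itself when dir is not a prefix of it.
        return dir.relativize(file).equals(file);
    }

    /** Hypothetical helper: convert the candidate directories to URIs once, up front. */
    static List<URI> precompute(List<String> dirs) {
        List<URI> uris = new ArrayList<>();
        for (String d : dirs) {
            uris.add(URI.create(d));
        }
        return uris;
    }

    /** Returns the first precomputed directory that contains the file, or null if none does. */
    static URI findContainingDir(List<URI> dirs, URI file) {
        for (URI dir : dirs) {
            if (!isOutside(dir, file)) {
                return dir;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // Made-up partition directories and file name.
        List<URI> dirs = precompute(Arrays.asList(
                "hdfs://nn:8020/warehouse/t/p=1",
                "hdfs://nn:8020/warehouse/t/p=2"));
        URI file = URI.create("hdfs://nn:8020/warehouse/t/p=2/bucket_00000");

        System.out.println("outside p=1: " + isOutside(dirs.get(0), file));
        System.out.println("outside p=2: " + isOutside(dirs.get(1), file));
        System.out.println("containing dir: " + findContainingDir(dirs, file));
    }
}
```

Precomputing the URIs avoids rebuilding them on every file change, but each `relativize` call still normalizes both paths, so the loop remains linear in the number of candidate directories.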