Posted to dev@hive.apache.org by "Grisha Trubetskoy (JIRA)" <ji...@apache.org> on 2011/08/25 16:06:28 UTC
[jira] [Created] (HIVE-2408) Perpetually degrading performance in checkPaths
Perpetually degrading performance in checkPaths
-----------------------------------------------
Key: HIVE-2408
URL: https://issues.apache.org/jira/browse/HIVE-2408
Project: Hive
Issue Type: Bug
Components: HBase Handler
Affects Versions: 0.7.1, 0.8.0
Reporter: Grisha Trubetskoy
In ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java, checkPaths() appends a copy_N suffix when the destination file already exists, incrementing N until it finds an unused name. The problem is that the exists() check is quite expensive in HDFS. Every new load has to probe all the existing copies in turn, so once you have hundreds of files to go through, this becomes a serious bottleneck.
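The following minimal Python simulation shows why the slowdown is "perpetual". It is not Hive's actual code. `FakeFs` and `copy_n_name` are hypothetical names, and the `<name>_copy_<N>` format is an assumption about the naming pattern. The simulation counts the exists() calls made each time a file with the same name is loaded into a directory again.

```python
class FakeFs:
    """Hypothetical in-memory stand-in for HDFS that counts exists() calls,
    since exists() is the expensive operation the report is about."""

    def __init__(self):
        self.files = set()
        self.exists_calls = 0

    def exists(self, path):
        self.exists_calls += 1
        return path in self.files

    def create(self, path):
        self.files.add(path)


def copy_n_name(fs, name):
    """Probe name, name_copy_1, name_copy_2, ... until one is free (assumed format)."""
    candidate = name
    counter = 1
    while fs.exists(candidate):
        candidate = f"{name}_copy_{counter}"
        counter += 1
    return candidate


fs = FakeFs()
costs = []  # exists() calls spent on each successive load of the same file
for _ in range(5):
    before = fs.exists_calls
    dest = copy_n_name(fs, "part-00000")
    fs.create(dest)
    costs.append(fs.exists_calls - before)

print(costs)            # [1, 2, 3, 4, 5]
print(fs.exists_calls)  # 15
```

The k-th load of the same name costs k exists() calls, so k loads cost k(k+1)/2 calls in total. The per-load cost grows with every copy already in the directory instead of staying constant.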
A better solution would be to put a timestamp in the file name and then apply the copy_N scheme on top of it.
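Below is one possible reading of this proposal, sketched in Python. The load timestamp goes into the name, and copy_N is used only when that timestamped name is already taken. The report doesn't say where the timestamp goes or what resolution it has. The `<name>_<millis>` format, `timestamped_name`, and `FakeFs` are therefore all hypothetical. The timestamp is passed in explicitly so the example is deterministic.

```python
class FakeFs:
    """Hypothetical in-memory stand-in for HDFS that counts exists() calls."""

    def __init__(self):
        self.files = set()
        self.exists_calls = 0

    def exists(self, path):
        self.exists_calls += 1
        return path in self.files

    def create(self, path):
        self.files.add(path)


def timestamped_name(fs, name, timestamp_ms):
    """Tag the name with a load timestamp (assumed format), then fall back
    to copy_N only if that timestamped name already exists."""
    base = f"{name}_{timestamp_ms}"
    candidate = base
    counter = 1
    while fs.exists(candidate):
        candidate = f"{base}_copy_{counter}"
        counter += 1
    return candidate


fs = FakeFs()
# Five loads at distinct (made-up) timestamps, then one clash in the same millisecond.
for ts in [1000, 2000, 3000, 4000, 5000, 5000]:
    fs.create(timestamped_name(fs, "part-00000", ts))

print(fs.exists_calls)  # 7
print(sorted(fs.files))
```

With distinct timestamps, each load costs a single exists() call no matter how many files the directory already holds. The copy_N loop only runs on a same-millisecond clash. In real code the timestamp would come from the clock, for example `time.time_ns() // 1_000_000` in Python or `System.currentTimeMillis()` in Java.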
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HIVE-2408) Perpetually degrading performance in checkPaths
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-2408:
-----------------------------
Component/s: Query Processor (was: HBase Handler)