Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/02/10 20:04:32 UTC
[jira] Created: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed
Optimize CombineHiveFileInputFormat execution speed
---------------------------------------------------
Key: HIVE-1149
URL: https://issues.apache.org/jira/browse/HIVE-1149
Project: Hadoop Hive
Issue Type: Bug
Reporter: Zheng Shao
When there are many files and many pools, CombineHiveFileInputFormat is slow.
One of the culprits is the "new URI" call in the following function: it runs once per entry in pathToPartitionInfo on every lookup. We should try to get rid of it.
{code}
protected static PartitionDesc getPartitionDescFromPath(
    Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
  // The format of the keys in pathToPartitionInfo sometimes contains a port
  // and sometimes doesn't, so we just compare paths.
  for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
      .entrySet()) {
    try {
      if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
        return entry.getValue();
      }
    } catch (URISyntaxException e2) {
    }
  }
  throw new IOException("cannot find dir = " + dir.toString()
      + " in partToPartitionInfo!");
}
{code}
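One possible direction, as a rough sketch rather than a proposed patch: parse each key of pathToPartitionInfo once, build a second map keyed by the path component only, and answer each lookup with a hash probe. The class name PathOnlyIndex and the method lookup are hypothetical, the type parameter V stands in for PartitionDesc, and java.net.URI stands in for the result of dir.toUri(), so that the sketch compiles without Hive or Hadoop on the classpath. It also assumes the map does not change while splits are being computed.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical helper: indexes partition info by URI path only, so a lookup
 * does not construct a URI for every entry of the original map.
 * V is a stand-in for PartitionDesc.
 */
final class PathOnlyIndex<V> {
    private final Map<String, V> byPath = new HashMap<>();

    PathOnlyIndex(Map<String, V> pathToPartitionInfo) {
        for (Map.Entry<String, V> entry : pathToPartitionInfo.entrySet()) {
            try {
                // Each key is parsed exactly once, here, instead of on every lookup.
                String path = new URI(entry.getKey()).getPath();
                if (path != null) {
                    // Keep the first entry seen for a given path, as the
                    // original loop returns on its first match.
                    byPath.putIfAbsent(path, entry.getValue());
                }
            } catch (URISyntaxException e) {
                // Keys that are not valid URIs are skipped, as in the original loop.
            }
        }
    }

    /** Returns the value whose key has the same path as dir, or null if none. */
    V lookup(URI dir) {
        return byPath.get(dir.getPath());
    }
}
```

With such an index built once per job, getPartitionDescFromPath could call lookup(dir.toUri()) and throw the existing IOException when the result is null. Keys that differ only in scheme, host, or port still resolve to the same entry, which is the behavior the comment in the current code asks for.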
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Zheng Shao updated HIVE-1149:
-----------------------------
Priority: Minor (was: Major)
Issue Type: Improvement (was: Bug)
> Optimize CombineHiveFileInputFormat execution speed
> ---------------------------------------------------
>
> Key: HIVE-1149
> URL: https://issues.apache.org/jira/browse/HIVE-1149
> Project: Hadoop Hive
> Issue Type: Improvement
> Reporter: Zheng Shao
> Priority: Minor