Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2010/02/10 20:04:32 UTC

[jira] Created: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed

Optimize CombineHiveFileInputFormat execution speed
---------------------------------------------------

                 Key: HIVE-1149
                 URL: https://issues.apache.org/jira/browse/HIVE-1149
             Project: Hadoop Hive
          Issue Type: Bug
            Reporter: Zheng Shao


When there are a lot of files and a lot of pools, CombineHiveFileInputFormat is pretty slow.
One of the culprits is the "new URI" call in the following function, which re-parses every key in pathToPartitionInfo on each lookup. We should try to get rid of it.

{code}
  protected static PartitionDesc getPartitionDescFromPath(
      Map<String, PartitionDesc> pathToPartitionInfo, Path dir) throws IOException {
    // The format of the keys in pathToPartitionInfo sometimes contains a port
    // and sometimes doesn't, so we just compare paths.
    for (Map.Entry<String, PartitionDesc> entry : pathToPartitionInfo
        .entrySet()) {
      try {
        if (new URI(entry.getKey()).getPath().equals(dir.toUri().getPath())) {
          return entry.getValue();
        }
      } catch (URISyntaxException e2) {
      }
    }
    throw new IOException("cannot find dir = " + dir.toString()
        + " in partToPartitionInfo!");
  }
{code}
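
One possible way to avoid the repeated "new URI" call, sketched here only as an illustration and not as the change actually made for this issue: parse each key once, build a map keyed by the path component, and answer later lookups from that map. The class name PathOnlyIndex and its lookup method are hypothetical; the type parameter V stands in for PartitionDesc, and a plain String stands in for dir.toUri().getPath(), so the sketch compiles without Hive or Hadoop on the classpath. It assumes Java 8 or later (Map.putIfAbsent).

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not part of Hive): indexes entries by the path part of their key. */
final class PathOnlyIndex<V> {
  // Path component of each key (scheme, host and port stripped) -> value.
  private final Map<String, V> byPath = new HashMap<>();

  /** Parses every key once, up front, instead of once per lookup. */
  PathOnlyIndex(Map<String, V> pathToInfo) {
    for (Map.Entry<String, V> entry : pathToInfo.entrySet()) {
      try {
        String path = new URI(entry.getKey()).getPath();
        if (path != null) {
          // If two keys share a path, keep the first one seen,
          // mirroring the first-match behaviour of the original loop.
          byPath.putIfAbsent(path, entry.getValue());
        }
      } catch (URISyntaxException e) {
        // As in the original loop, keys that are not valid URIs are skipped.
      }
    }
  }

  /** Returns the value whose key has this path component, or null if there is none. */
  V lookup(String dirPath) {
    return byPath.get(dirPath);
  }
}
```

With such an index built once per pathToPartitionInfo map, getPartitionDescFromPath could call lookup(dir.toUri().getPath()) and throw the same IOException when the result is null. Each lookup then costs one hash probe rather than one URI parse per map entry.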


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (HIVE-1149) Optimize CombineHiveFileInputFormat execution speed

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zheng Shao updated HIVE-1149:
-----------------------------

      Priority: Minor  (was: Major)
    Issue Type: Improvement  (was: Bug)

> Optimize CombineHiveFileInputFormat execution speed
> ---------------------------------------------------
>
>                 Key: HIVE-1149
>                 URL: https://issues.apache.org/jira/browse/HIVE-1149
>             Project: Hadoop Hive
>          Issue Type: Improvement
>            Reporter: Zheng Shao
>            Priority: Minor
