Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2011/03/08 02:17:59 UTC

[jira] Created: (HIVE-2030) isEmptyPath() to use ContentSummary cache

isEmptyPath() to use ContentSummary cache
-----------------------------------------

                 Key: HIVE-2030
                 URL: https://issues.apache.org/jira/browse/HIVE-2030
             Project: Hive
          Issue Type: Improvement
            Reporter: Siying Dong
            Assignee: Siying Dong
            Priority: Minor


addInputPaths() calls isEmptyPath() for every input path, and each of those calls currently makes a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we should be able to avoid some namenode calls and reduce latency when a query reads multiple partitions.
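
The idea described above can be sketched roughly as follows. This is only an illustration, not the code in the attached patches: the class, the Summary stand-in, the lookup function, the example paths, and the "no bytes and no files" emptiness rule are all assumptions made so that the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: these names are illustrative and are not taken from the HIVE-2030 patch.
final class EmptyPathCheckSketch {

    /** Minimal stand-in for a content summary: total bytes and file count under a path. */
    static final class Summary {
        final long length;
        final long fileCount;

        Summary(long length, long fileCount) {
            this.length = length;
            this.fileCount = fileCount;
        }

        /** Assumed emptiness rule: no bytes and no files. */
        boolean isEmpty() {
            return length == 0 && fileCount == 0;
        }
    }

    private final Map<String, Summary> summaryCache = new HashMap<>();
    private final Function<String, Summary> namenodeLookup; // stand-in for the remote call
    private int namenodeCalls = 0;

    EmptyPathCheckSketch(Function<String, Summary> namenodeLookup) {
        this.namenodeLookup = namenodeLookup;
    }

    /** Records a summary that an earlier step already fetched. */
    void cacheSummary(String path, Summary summary) {
        summaryCache.put(path, summary);
    }

    /** Uses the cached summary when present; otherwise falls back to the remote lookup. */
    boolean isEmptyPath(String path) {
        Summary cached = summaryCache.get(path);
        if (cached != null) {
            return cached.isEmpty(); // cache hit: no namenode call
        }
        namenodeCalls++; // cache miss: same cost as before the change
        return namenodeLookup.apply(path).isEmpty();
    }

    int namenodeCalls() {
        return namenodeCalls;
    }

    public static void main(String[] args) {
        EmptyPathCheckSketch check =
            new EmptyPathCheckSketch(path -> new Summary(0, 0)); // pretend uncached paths are empty
        check.cacheSummary("/warehouse/t/ds=1", new Summary(1024, 3));

        System.out.println(check.isEmptyPath("/warehouse/t/ds=1")); // served from the cache
        System.out.println(check.isEmptyPath("/warehouse/t/ds=2")); // falls back to the lookup
        System.out.println(check.namenodeCalls());
    }
}
```

With many partitions, every path whose summary is already cached skips the remote call, while an uncached path costs the same as before.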

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005234#comment-13005234 ] 

He Yongqiang commented on HIVE-2030:
------------------------------------

siying, can you update the patch?

[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2030:
------------------------------

    Status: Patch Available  (was: Open)

[jira] [Updated] (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2030:
---------------------------------

      Component/s: Query Processor
    Fix Version/s: 0.8.0

[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005358#comment-13005358 ] 

He Yongqiang commented on HIVE-2030:
------------------------------------

running tests with the new patch

[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004317#comment-13004317 ] 

He Yongqiang commented on HIVE-2030:
------------------------------------

okay, will test and commit.

[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2030:
------------------------------

    Attachment: HIVE-2030.2.patch

If an exception is thrown, we don't populate the cache. This makes sure the cache never holds a wrong value.
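
A minimal sketch of that rule, under stated assumptions: the class, the SummaryFetcher interface, and the choice to rethrow the exception are all hypothetical and are not taken from HIVE-2030.2.patch. The only point illustrated is that the cache write happens after the lookup has succeeded.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names and error handling are assumptions, not the patch's code.
final class SummaryCacheSketch {

    /** Stand-in for a namenode-backed lookup that may fail. */
    interface SummaryFetcher {
        long fetchLength(String path) throws IOException;
    }

    private final Map<String, Long> cache = new HashMap<>();

    /** Returns the cached length, or null if nothing is cached for this path. */
    Long getCached(String path) {
        return cache.get(path);
    }

    /** Fetches the length and caches it only if the fetch succeeded. */
    long fetchAndCache(String path, SummaryFetcher fetcher) throws IOException {
        long length = fetcher.fetchLength(path); // if this throws, the put below never runs
        cache.put(path, length);
        return length;
    }

    public static void main(String[] args) {
        SummaryCacheSketch sketch = new SummaryCacheSketch();
        try {
            sketch.fetchAndCache("/warehouse/t/ds=1", p -> {
                throw new IOException("namenode unavailable");
            });
        } catch (IOException e) {
            // The failed lookup left no entry behind.
            System.out.println("cached after failure: " + sketch.getCached("/warehouse/t/ds=1"));
        }
    }
}
```

Because a failed lookup leaves no entry, a later call for the same path sees a cache miss and goes back to the original, uncached code path.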

[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004260#comment-13004260 ] 

Siying Dong commented on HIVE-2030:
-----------------------------------

Yongqiang, I don't quite understand your comment. If there is a cache miss, we call the original method. We never make things worse.

[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang updated HIVE-2030:
-------------------------------

    Status: Open  (was: Patch Available)

[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004253#comment-13004253 ] 

He Yongqiang commented on HIVE-2030:
------------------------------------

The ContentSummary cache is not guaranteed to be populated. Even if it is, this information does not seem to be passed to the child process. (So the cache is non-empty only when executing in local mode.)

[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2030:
------------------------------

    Attachment: HIVE-2030.3.patch

[jira] Resolved: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

He Yongqiang resolved HIVE-2030.
--------------------------------

    Resolution: Fixed

Committed. Thanks, Siying!

[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2030:
------------------------------

    Attachment: HIVE-2030.1.patch
