Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2011/03/08 02:17:59 UTC
[jira] Created: (HIVE-2030) isEmptyPath() to use ContentSummary cache
isEmptyPath() to use ContentSummary cache
-----------------------------------------
Key: HIVE-2030
URL: https://issues.apache.org/jira/browse/HIVE-2030
Project: Hive
Issue Type: Improvement
Reporter: Siying Dong
Assignee: Siying Dong
Priority: Minor
addInputPaths() calls isEmptyPath() for every input path, and currently every such call is a DFS namenode call. By making isEmptyPath() use the cached ContentSummary, we should be able to avoid some namenode calls and reduce latency when a query reads multiple partitions.
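The idea can be sketched roughly as follows. This is a minimal illustration, not the code from the attached patches: CountingDfs, InputPathPlanner, and the path-to-file-count map are hypothetical stand-ins for the DFS client and the ContentSummary cache, and "empty" is assumed to mean a file count of zero.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a DFS client; each lookup costs one namenode call. */
final class CountingDfs {
    private final Map<String, Integer> filesPerDir = new HashMap<>();
    int namenodeCalls = 0;

    void putDir(String path, int fileCount) {
        filesPerDir.put(path, fileCount);
    }

    int countFiles(String path) {
        namenodeCalls++; // simulated round trip to the namenode
        return filesPerDir.getOrDefault(path, 0);
    }
}

/** Hypothetical planner fragment; the cache maps a path to its known file count. */
final class InputPathPlanner {
    private final CountingDfs dfs;
    private final Map<String, Integer> summaryCache;

    InputPathPlanner(CountingDfs dfs, Map<String, Integer> summaryCache) {
        this.dfs = dfs;
        this.summaryCache = summaryCache;
    }

    boolean isEmptyPath(String path) {
        Integer cachedCount = summaryCache.get(path);
        if (cachedCount != null) {
            return cachedCount == 0; // answered from the cache, no namenode call
        }
        return dfs.countFiles(path) == 0; // cache miss: ask the namenode
    }

    /** Returns the non-empty paths, checking each input path once. */
    List<String> addInputPaths(List<String> paths) {
        List<String> nonEmpty = new ArrayList<>();
        for (String p : paths) {
            if (!isEmptyPath(p)) {
                nonEmpty.add(p);
            }
        }
        return nonEmpty;
    }
}
```

In this sketch, if two of three partition paths already have cached counts, addInputPaths() issues one simulated namenode call instead of three.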
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005234#comment-13005234 ]
He Yongqiang commented on HIVE-2030:
------------------------------------
siying, can you update the patch?
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-2030:
------------------------------
Status: Patch Available (was: Open)
[jira] [Updated] (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-2030:
---------------------------------
Component/s: Query Processor
Fix Version/s: 0.8.0
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13005358#comment-13005358 ]
He Yongqiang commented on HIVE-2030:
------------------------------------
running tests with the new patch
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004317#comment-13004317 ]
He Yongqiang commented on HIVE-2030:
------------------------------------
okay, will test and commit.
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-2030:
------------------------------
Attachment: HIVE-2030.2.patch
If an exception is thrown, we don't populate the cache. This ensures the cache never holds a wrong value.
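A minimal sketch of that rule, assuming a hypothetical SummarySource interface in place of the real namenode lookup and a plain map in place of the ContentSummary cache (neither name comes from the patch):

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical source of summaries; stands in for a namenode lookup that may fail. */
interface SummarySource {
    long fetchFileCount(String path) throws IOException;
}

final class SafeSummaryCache {
    private final Map<String, Long> cache = new HashMap<>();
    private final SummarySource source;

    SafeSummaryCache(SummarySource source) {
        this.source = source;
    }

    /** Fetches and caches the file count; on failure nothing is stored. */
    long load(String path) throws IOException {
        long count = source.fetchFileCount(path); // may throw before the cache is touched
        cache.put(path, count);                   // reached only when the fetch succeeded
        return count;
    }

    boolean isCached(String path) {
        return cache.containsKey(path);
    }
}
```

Because the put happens only after the fetch returns, a failed lookup leaves no entry behind, and a later check for that path falls through to a fresh lookup rather than reading a placeholder value.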
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004260#comment-13004260 ]
Siying Dong commented on HIVE-2030:
-----------------------------------
Yongqiang, I don't quite understand your comment. If there is a cache miss, we call the original method. We never make things worse.
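The fallback argument can be illustrated with a small wrapper. This is only a sketch under assumptions: CachedEmptyCheck and the path-to-boolean map are hypothetical, and the "original method" is modelled as a Predicate rather than the real namenode-backed check.

```java
import java.util.Map;
import java.util.function.Predicate;

final class CachedEmptyCheck {
    private CachedEmptyCheck() {}

    /**
     * Wraps the original (uncached) emptiness check. A hit is answered from
     * the map; a miss delegates to the original check unchanged.
     */
    static Predicate<String> withCache(Map<String, Boolean> knownEmpty,
                                       Predicate<String> original) {
        return path -> {
            Boolean cached = knownEmpty.get(path);
            return cached != null ? cached : original.test(path);
        };
    }
}
```

With an empty map the wrapper calls the original check exactly once per path, so it behaves the same as before; each cached entry only removes a call.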
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang updated HIVE-2030:
-------------------------------
Status: Open (was: Patch Available)
[jira] Commented: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13004253#comment-13004253 ]
He Yongqiang commented on HIVE-2030:
------------------------------------
The ContentSummary is not guaranteed to be populated. Even if it is, it seems this information is not passed to the child process. (So this is not empty only when executing in local mode.)
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-2030:
------------------------------
Attachment: HIVE-2030.3.patch
[jira] Resolved: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "He Yongqiang (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
He Yongqiang resolved HIVE-2030.
--------------------------------
Resolution: Fixed
committed. thanks siying!
[jira] Updated: (HIVE-2030) isEmptyPath() to use ContentSummary cache
Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-2030?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Siying Dong updated HIVE-2030:
------------------------------
Attachment: HIVE-2030.1.patch