
Posted to issues@hive.apache.org by "Gopal V (JIRA)" <ji...@apache.org> on 2015/09/19 04:00:08 UTC

[jira] [Commented] (HIVE-11882) Fetch optimizer should stop source files traversal once it exceeds the hive.fetch.task.conversion.threshold

    [ https://issues.apache.org/jira/browse/HIVE-11882?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14876828#comment-14876828 ] 

Gopal V commented on HIVE-11882:
--------------------------------

[~yalovyyi]: Yes, this would totally suck for S3, because the list operations are expensive. 

But I thought this was already implemented in InputEstimator; maybe we're not reaching the implementation that checks the remaining parameter.

{code}
   * @param remaining Early exit condition. If it has positive value, further estimation
   *                  can be canceled on the point of exceeding it. In this case,
   *                  return any bigger length value than this (Long.MAX_VALUE, for example).
   */
{code}
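A minimal sketch of an estimator that honors the remaining contract quoted above. It assumes the estimate is a running sum of file lengths drawn lazily from a listing. EarlyExitEstimate and estimate() are hypothetical names, not Hive's InputEstimator API, and a plain Iterator<Long> stands in for a real directory listing:

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical illustration (not Hive code) of the `remaining` early-exit
// contract quoted from InputEstimator. An Iterator<Long> of file lengths
// stands in for a lazily listed directory, where each next() may be costly.
public final class EarlyExitEstimate {

  // Sums file lengths. If `remaining` is positive and the running total
  // exceeds it, stop consuming the listing and return Long.MAX_VALUE,
  // which the contract allows ("any bigger length value than this").
  static long estimate(Iterator<Long> fileLengths, long remaining) {
    long total = 0;
    while (fileLengths.hasNext()) {
      total += fileLengths.next();
      if (remaining > 0 && total > remaining) {
        return Long.MAX_VALUE;
      }
    }
    return total;
  }

  public static void main(String[] args) {
    List<Long> lengths = List.of(400L, 400L, 400L, 400L);

    Iterator<Long> listing = lengths.iterator();
    // The limit of 1000 is exceeded at the third file (1200 > 1000).
    System.out.println(estimate(listing, 1000)); // 9223372036854775807
    System.out.println(listing.hasNext());       // true: the fourth file is never read

    // A non-positive `remaining` disables the early exit.
    System.out.println(estimate(lengths.iterator(), 0)); // 1600
  }
}
```

The point of returning Long.MAX_VALUE instead of the partial sum is that the caller only needs to know the limit was crossed, so the traversal can stop without touching the rest of the listing.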

> Fetch optimizer should stop source files traversal once it exceeds the hive.fetch.task.conversion.threshold
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: HIVE-11882
>                 URL: https://issues.apache.org/jira/browse/HIVE-11882
>             Project: Hive
>          Issue Type: Improvement
>          Components: Physical Optimizer
>    Affects Versions: 1.0.0
>            Reporter: Illya Yalovyy
>
> Hive 1.0's fetch optimizer tries to optimize queries of the form "select <C> from <T> where <F> limit <L>" into a fetch task (see the hive.fetch.task.conversion property). This optimization gets the lengths of all the files in the specified partition and compares their total against a threshold value to decide whether it should use a fetch task (see the hive.fetch.task.conversion.threshold property). One of the main problems with this optimization is that the fetch optimizer doesn't seem to stop getting file lengths once the total exceeds hive.fetch.task.conversion.threshold. This works fine on HDFS, but can cause significant performance degradation on other supported file systems.
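To make the cost difference described in the issue concrete, here is a hypothetical model (not Hive's optimizer code) that counts per-file length lookups with and without an early exit. It assumes conversion is allowed when the total length does not exceed the threshold, and that each lookup stands in for a potentially expensive file-system call; FetchConversionCheck, convertEager and convertEarlyExit are illustrative names only:

```java
import java.util.function.IntToLongFunction;

// Hypothetical model of the fetch-conversion decision described in HIVE-11882.
// `lengthOf` stands in for a per-file length lookup that may be expensive
// (e.g. a remote object-store call); `calls` counts how many lookups were made.
public final class FetchConversionCheck {

  static int calls = 0;

  // Eager version: looks up every file length, then compares with the threshold.
  static boolean convertEager(int fileCount, IntToLongFunction lengthOf, long threshold) {
    long total = 0;
    for (int i = 0; i < fileCount; i++) {
      total += lengthOf.applyAsLong(i);
    }
    return total <= threshold;
  }

  // Early-exit version: stops looking up lengths as soon as the threshold is exceeded.
  static boolean convertEarlyExit(int fileCount, IntToLongFunction lengthOf, long threshold) {
    long total = 0;
    for (int i = 0; i < fileCount; i++) {
      total += lengthOf.applyAsLong(i);
      if (total > threshold) {
        return false; // no need to inspect the remaining files
      }
    }
    return true;
  }

  public static void main(String[] args) {
    // Every file is 100 bytes; each call increments the lookup counter.
    IntToLongFunction lengthOf = i -> { calls++; return 100L; };

    calls = 0;
    boolean eager = convertEager(10_000, lengthOf, 1_000);
    System.out.println(eager + " after " + calls + " lookups"); // false after 10000 lookups

    calls = 0;
    boolean early = convertEarlyExit(10_000, lengthOf, 1_000);
    System.out.println(early + " after " + calls + " lookups"); // false after 11 lookups
  }
}
```

Both versions reach the same decision; only the number of lookups differs, which is exactly what matters on a file system where each listing or length call is slow.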



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)