Posted to dev@hive.apache.org by "Siying Dong (JIRA)" <ji...@apache.org> on 2011/03/22 02:06:05 UTC

[jira] [Created] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
-------------------------------------------------------------------------------

                 Key: HIVE-2068
                 URL: https://issues.apache.org/jira/browse/HIVE-2068
             Project: Hive
          Issue Type: Improvement
            Reporter: Siying Dong
            Assignee: Siying Dong


Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" starts a MapReduce job whose input is the whole table or partition. The latency can be huge if the table or partition is big. We could reduce the number of input files to speed up such queries.
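
To illustrate the idea of reducing the number of input files, here is a minimal Java sketch. It is not the code from the attached patches: the class LimitInputPruner, the InputFile type, and the avgRowBytes and safetyFactor parameters are all hypothetical names introduced for this example. It assumes that row counts can be estimated from file sizes and picks only as many files as are estimated to cover the LIMIT.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not part of Hive): choose a subset of input files for a LIMIT query. */
public class LimitInputPruner {

    /** Hypothetical description of one input file: its path and size in bytes. */
    public static final class InputFile {
        final String path;
        final long sizeBytes;

        public InputFile(String path, long sizeBytes) {
            this.path = path;
            this.sizeBytes = sizeBytes;
        }
    }

    /**
     * Returns a prefix of the given files whose combined size is estimated to
     * hold at least limit rows.
     *
     * avgRowBytes is an assumed average row size, and safetyFactor
     * over-provisions the estimate; both are illustrative assumptions.
     */
    public static List<InputFile> selectFiles(
            List<InputFile> files, long limit, long avgRowBytes, int safetyFactor) {
        long targetBytes = limit * avgRowBytes * safetyFactor;
        List<InputFile> chosen = new ArrayList<>();
        long totalBytes = 0;
        for (InputFile f : files) {
            if (totalBytes >= targetBytes) {
                break; // enough data is estimated to satisfy the LIMIT
            }
            chosen.add(f);
            totalBytes += f.sizeBytes;
        }
        return chosen;
    }
}
```

Because the estimate can be wrong, a scheme like this needs a fallback: if the chosen files yield fewer rows than the LIMIT, the query has to be rerun against the full input.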

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016518#comment-13016518 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Can you update the review-board entry?

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020386#comment-13020386 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

FetchTask: return false if number of rows found.
Otherwise, it looks good.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.3.patch

The previous patch missed a file.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.4.patch

Addressing Namit's comments.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016597#comment-13016597 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Siying, I don't see the new changes.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017092#comment-13017092 ] 

jiraposter@reviews.apache.org commented on HIVE-2068:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/540/
-----------------------------------------------------------

Review request for hive and namit jain.


Summary
-------

For HIVE-2068


This addresses bug HIVE-2068.
    https://issues.apache.org/jira/browse/HIVE-2068


Diffs
-----

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1086466 
  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1086466 
  trunk/conf/hive-default.xml 1086466 
  trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1086466 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1086466 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1086466 
  trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1086466 

Diff: https://reviews.apache.org/r/540/diff


Testing
-------

Added a test to the test suite.


Thanks,

Siying



> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13017897#comment-13017897 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Can you regenerate the patch?
I am getting a lot of conflicts.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]
          Status: Resolved  (was: Patch Available)

Committed. Thanks, Siying.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.6.patch

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-2068:
---------------------------------

      Component/s: Query Processor
    Fix Version/s: 0.8.0

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>          Components: Query Processor
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>             Fix For: 0.8.0
>
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Open  (was: Patch Available)

Found a problem with the most recently modified piece of code.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013774#comment-13013774 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Can you add a review board entry?

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13013874#comment-13013874 ] 

Siying Dong commented on HIVE-2068:
-----------------------------------

https://reviews.apache.org/r/540/diff/

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016535#comment-13016535 ] 

Siying Dong commented on HIVE-2068:
-----------------------------------

review-board updated.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.2.patch

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.


[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020762#comment-13020762 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Can you rerun the tests?
I am getting some failures in global_limit.q.


> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.5.patch

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

Deleted the latest patch. The fetchTask return part is actually OK.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "jiraposter@reviews.apache.org (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020390#comment-13020390 ] 

jiraposter@reviews.apache.org commented on HIVE-2068:
-----------------------------------------------------


-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/540/
-----------------------------------------------------------

(Updated 2011-04-15 18:37:21.441402)


Review request for hive and namit jain.


Changes
-------

Fixed a small logic bug.


Summary
-------

For HIVE-2068


This addresses bug HIVE-2068.
    https://issues.apache.org/jira/browse/HIVE-2068


Diffs (updated)
-----

  trunk/cli/src/java/org/apache/hadoop/hive/cli/CliDriver.java 1091258 
  trunk/common/src/java/org/apache/hadoop/hive/conf/HiveConf.java 1091258 
  trunk/conf/hive-default.xml 1091258 
  trunk/hwi/src/java/org/apache/hadoop/hive/hwi/HWISessionItem.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/CommandNeedRetryException.java PRE-CREATION 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Context.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/Driver.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/FetchTask.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/LimitOperator.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/exec/Task.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/GenMapRedUtils.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/optimizer/SamplePruner.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/DDLSemanticAnalyzer.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/ParseContext.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/QBParseInfo.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/parse/SemanticAnalyzer.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/FetchWork.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/plan/LimitDesc.java 1091258 
  trunk/ql/src/java/org/apache/hadoop/hive/ql/processors/CommandProcessor.java 1091258 
  trunk/ql/src/test/org/apache/hadoop/hive/ql/QTestUtil.java 1091258 
  trunk/ql/src/test/queries/clientpositive/global_limit.q PRE-CREATION 
  trunk/ql/src/test/results/clientpositive/global_limit.q.out PRE-CREATION 
  trunk/service/src/java/org/apache/hadoop/hive/service/HiveServer.java 1091258 

Diff: https://reviews.apache.org/r/540/diff


Testing
-------

Added a test to the test suite.


Thanks,

Siying



> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment:     (was: HIVE-2068.6.patch)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

Fixed the issue. I think what Namit means is that the function should always return true (no more rows).

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Status: Patch Available  (was: Open)

It looks like a simple "... limit ..." query depends on the order in which the files are listed, which is not deterministic. I modified the test case to always put in 3 identical files so that the results will be deterministic.
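
To illustrate the point above, here is a minimal sketch, not taken from the patch or from global_limit.q. It assumes that "the 3 same files" means three files with identical contents; the file names, rows, and the helper first_rows are hypothetical. It shows that the first N rows change with the listing order when the files differ, and do not change when the files are identical.

```python
from itertools import chain, islice


def first_rows(files, listing_order, limit):
    """Return the first `limit` rows when files are read in `listing_order`."""
    rows = chain.from_iterable(files[name] for name in listing_order)
    return list(islice(rows, limit))


# Hypothetical file contents, made up for illustration only.
distinct = {"f1": ["a", "b"], "f2": ["c", "d"], "f3": ["e", "f"]}
identical = {"f1": ["a", "b"], "f2": ["a", "b"], "f3": ["a", "b"]}

# Two possible orders in which the file system might list the files.
order_a = ["f1", "f2", "f3"]
order_b = ["f3", "f1", "f2"]

# Distinct files: the result depends on the listing order.
print(first_rows(distinct, order_a, 3))  # ['a', 'b', 'c']
print(first_rows(distinct, order_b, 3))  # ['e', 'f', 'a']

# Identical files: the result is the same for either order.
print(first_rows(identical, order_a, 3) == first_rows(identical, order_b, 3))  # True
```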

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.1.patch

Features are mostly finished and I did some manual tests.
I'm still running all the tests. I'm also thinking of how to add tests to cover the Driver changes with retry.
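
For readers following along, the general idea of reading fewer input files and retrying can be sketched roughly as below. This is only an illustration under assumptions and is not the logic in the patch: the function run_limit_query, the sample_size parameter, and the in-memory "files" dictionary are all hypothetical stand-ins.

```python
from itertools import chain, islice


def run_limit_query(files, limit, sample_size):
    """Read a sample of the input files first; retry on all files if rows fall short.

    Returns (rows, retried). `files` maps a file name to its list of rows.
    """
    def scan(names):
        # Stop reading as soon as `limit` rows have been collected.
        rows = chain.from_iterable(files[name] for name in names)
        return list(islice(rows, limit))

    names = sorted(files)          # fixed order, to keep the sketch deterministic
    sample = names[:sample_size]   # hypothetical way of picking fewer input files
    rows = scan(sample)
    if len(rows) < limit and len(sample) < len(names):
        # The sample did not hold enough rows: retry with the whole input.
        return scan(names), True
    return rows, False


files = {"f1": ["a"], "f2": ["b", "c"], "f3": ["d"]}
print(run_limit_query(files, 1, 1))  # (['a'], False)
print(run_limit_query(files, 3, 1))  # (['a', 'b', 'c'], True)
```

The retry path is what makes the shortcut safe in this sketch: a sample that is too small costs one extra pass but never returns fewer rows than the full input could provide.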

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016030#comment-13016030 ] 

Namit Jain commented on HIVE-2068:
----------------------------------

Comments are on the review board.

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Siying Dong updated HIVE-2068:
------------------------------

    Attachment: HIVE-2068.6.patch

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch, HIVE-2068.6.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Commented] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Siying Dong (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13016604#comment-13016604 ] 

Siying Dong commented on HIVE-2068:
-----------------------------------

Namit, can't you see that trunk/conf/hive-default.xml is already included in the diff on the review board?

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Patch Available  (was: Open)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.

[jira] [Updated] (HIVE-2068) Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-2068:
-----------------------------

    Status: Open  (was: Patch Available)

> Speed up query "select xx,xx from xxx LIMIT xxx" if no filtering or aggregation
> -------------------------------------------------------------------------------
>
>                 Key: HIVE-2068
>                 URL: https://issues.apache.org/jira/browse/HIVE-2068
>             Project: Hive
>          Issue Type: Improvement
>            Reporter: Siying Dong
>            Assignee: Siying Dong
>         Attachments: HIVE-2068.1.patch, HIVE-2068.2.patch, HIVE-2068.3.patch, HIVE-2068.4.patch, HIVE-2068.5.patch
>
>
> Currently, "select xx,xx from xxx where ...(only partition conditions) LIMIT xxx" will start a MapReduce job with input to be the whole table or partition. The latency can be huge if the table or partition is big. We could reduce number of input files to speed up the queries.
