Posted to common-dev@hadoop.apache.org by "Ashish Thusoo (JIRA)" <ji...@apache.org> on 2008/09/13 00:23:44 UTC

[jira] Commented: (HADOOP-4086) Add limit to Hive QL

    [ https://issues.apache.org/jira/browse/HADOOP-4086?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12630696#action_12630696 ] 

Ashish Thusoo commented on HADOOP-4086:
---------------------------------------

A few thoughts on this... 

The idea here is to implement a restricted version of the LIMIT clause as it appears in MySQL. I am not planning to implement the offset part of it. Basically, I want to support

SELECT .... LIMIT N 

where N is the maximum number of rows to be returned from the query block (note that this applies only to the SELECTs in the query block).

While generating the plan for the query, once the plan for the query block has been generated, I can add the plan fragment

LimitMap -> ReduceSink -> LimitReduce

to it.

So for example if the plan of the query block is something like...

opX1 -> opX2 .... -> ReduceSink -> reduce op -> opY1 -> opY2 ...

This would look like

opX1 -> opX2 ... -> ReduceSink -> reduce op -> opY1 -> opY2 ... -> LimitMapOp -> ReduceSink -> LimitReduceOp

This should also work seamlessly with plans that do not have a ReduceSink, i.e., plans that look like

opX1 -> opX2 ... -> opXn

will look like

opX1 -> opX2 ... -> opXn -> LimitMap -> ReduceSink -> LimitReduce
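A minimal sketch of the plan rewrite, assuming a query block's plan can be modeled as a linear chain of operator names. This is a simplification: Hive's real plan is a tree of operator objects, and the function name add_limit and the constant LIMIT_FRAGMENT are hypothetical. It shows that the same append step covers both plan shapes above.

```python
from typing import List

# Hypothetical model of the fragment proposed in the comment; real Hive
# operator classes may be named and wired differently.
LIMIT_FRAGMENT = ["LimitMap", "ReduceSink", "LimitReduce"]


def add_limit(plan: List[str]) -> List[str]:
    """Append the limit fragment after the last operator of a query block.

    The rewrite does not inspect whether the chain already contains a
    ReduceSink: the fragment always goes at the tail, so plans with and
    without a reduce phase are handled by the same code path.
    """
    if not plan:
        raise ValueError("query block plan must contain at least one operator")
    # Return a new list so the original plan is left untouched.
    return plan + LIMIT_FRAGMENT


with_reduce = ["opX1", "opX2", "ReduceSink", "reduceOp", "opY1", "opY2"]
map_only = ["opX1", "opX2", "opXn"]
print(" -> ".join(add_limit(with_reduce)))
print(" -> ".join(add_limit(map_only)))
```

Appending at the tail keeps the rewrite independent of whatever the query block itself compiled to, at the cost of always adding an extra map/reduce stage.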

Suppose we are computing LIMIT N: the LimitMap will pass through at most N rows from each mapper, and the LimitReduce will return N rows out of the ones it receives from the mappers. We have to run this map/reduce job with a single reducer.
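The two-stage limit can be simulated in a few lines, assuming each mapper's input split is a list of rows and that the single reducer simply receives every mapper's output. The function names (limit_map, limit_reduce, run_limit_job) are hypothetical. In a real cluster the order in which the reducer receives rows from different mappers is not fixed; this sketch concatenates them in split order, which is one reason the issue makes no guarantee about which N rows come back.

```python
from itertools import islice
from typing import Iterable, List, Sequence, TypeVar

Row = TypeVar("Row")


def limit_map(rows: Iterable[Row], n: int) -> List[Row]:
    """Map side: forward at most n rows from one mapper's input split."""
    return list(islice(rows, n))


def limit_reduce(received: Iterable[Row], n: int) -> List[Row]:
    """Reduce side: the single reducer keeps the first n rows it receives."""
    return list(islice(received, n))


def run_limit_job(splits: Sequence[Sequence[Row]], n: int) -> List[Row]:
    """Simulate LimitMap -> ReduceSink -> LimitReduce with one reducer.

    Each mapper contributes at most n rows, so the reducer sees at most
    n * len(splits) rows in total, however large the input is.
    """
    if n < 0:
        raise ValueError("LIMIT must be non-negative")
    # With one reducer, every mapper's output is shuffled to the same place.
    shuffled = []
    for split in splits:
        shuffled.extend(limit_map(split, n))
    return limit_reduce(shuffled, n)


splits = [[1, 2, 3, 4], [5, 6], [7, 8, 9]]
print(run_limit_job(splits, 2))
print(run_limit_job(splits, 10))
```

With N = 10 the job returns all 9 rows, since there are fewer rows than the limit. The per-mapper cut is what keeps the single reducer from becoming a bottleneck.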

Thoughts?



> Add limit to Hive QL
> --------------------
>
>                 Key: HADOOP-4086
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4086
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: contrib/hive
>            Reporter: Ashish Thusoo
>            Assignee: Ashish Thusoo
>
> Add a limit feature to the Hive Query language,
> so you can do the following:
> SELECT * FROM T LIMIT 10;
> and this would return just 10 rows.
> No guarantees are made on which 10 rows are returned by the query.
