Posted to dev@hive.apache.org by "John Sichi (JIRA)" <ji...@apache.org> on 2010/04/13 23:31:49 UTC

[jira] Created: (HIVE-1304) add row_sequence UDF

add row_sequence UDF
--------------------

                 Key: HIVE-1304
                 URL: https://issues.apache.org/jira/browse/HIVE-1304
             Project: Hadoop Hive
          Issue Type: New Feature
          Components: Metastore
    Affects Versions: 0.6.0
            Reporter: John Sichi
            Assignee: John Sichi
             Fix For: 0.6.0


This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.

I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.

The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
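The ordering caveat can be made concrete with a small sketch. It assumes the function keeps a per-instance counter that it increments on every call. That is an assumption about the mechanism: this plain-Java sketch uses no Hive APIs and is not the attached patch. Under that assumption, the number a row receives depends only on the order in which rows reach that instance, not on the row's content. The names `RowSequenceSketch`, `Counter`, and `number` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java model of a stateful row-numbering function.
// It illustrates the idea behind row_sequence(); it is NOT the committed Hive code.
class RowSequenceSketch {

    // One instance per task: each call returns the next number, starting from 1.
    static final class Counter {
        private long next = 1L;

        long evaluate() {
            return next++;
        }
    }

    // Numbers rows in whatever order this instance happens to receive them.
    static List<String> number(List<String> rowsInArrivalOrder) {
        Counter counter = new Counter();
        List<String> out = new ArrayList<>();
        for (String row : rowsInArrivalOrder) {
            out.add(counter.evaluate() + ":" + row);
        }
        return out;
    }

    public static void main(String[] args) {
        // Same rows, two arrival orders: the numbers follow arrival, not content.
        System.out.println(number(List.of("a", "b", "c")));
        System.out.println(number(List.of("c", "a", "b")));
    }
}
```

Running `main` prints `[1:a, 2:b, 3:c]` and then `[1:c, 2:a, 3:b]`: the same row can receive a different number whenever the engine changes the order in which rows arrive.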


-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] Commented: (HIVE-1304) add row_sequence UDF

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881501#action_12881501 ] 

Namit Jain commented on HIVE-1304:
----------------------------------

+1

will commit if the tests pass

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Status: Patch Available  (was: Open)

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Status: Open  (was: Patch Available)

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Attachment: HIVE-1304.2.patch

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

        Fix Version/s: 0.7.0
    Affects Version/s: 0.6.0
                           (was: 0.7.0)

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Status: Patch Available  (was: Open)

New patch addresses Namit's comments.


> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Commented: (HIVE-1304) add row_sequence UDF

Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856638#action_12856638 ] 

Edward Capriolo commented on HIVE-1304:
---------------------------------------

Can we work on https://issues.apache.org/jira/browse/HIVE-1265 first?

As of now, committing UDFs is a major pain. Two or three times already I have had to regenerate UDFs because someone else touched the FunctionRegistry. I have had two or three waves of UDFs I want to commit (ReflectionUDF, MathUDF, EncryptionUDF), but something else sneaks in and I have to regenerate. It is a major pain.



> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.


[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain updated HIVE-1304:
-----------------------------

          Status: Resolved  (was: Patch Available)
    Hadoop Flags: [Reviewed]
      Resolution: Fixed

Committed. Thanks, John.

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Status: Open  (was: Patch Available)

Oops, need to move test to contrib too.


> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Attachment: HIVE-1304.3.patch

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Commented: (HIVE-1304) add row_sequence UDF

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881299#action_12881299 ] 

Namit Jain commented on HIVE-1304:
----------------------------------

Can you explicitly set the number of reducers to 1, to ensure the same results? It doesn't matter today, but it could be an issue with miniMr etc.
Also, do you want to keep it in contrib, since this is not guaranteed (multiple mappers may give the same results, etc.)?
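The point about multiple mappers can be illustrated the same way. The sketch below assumes each task holds its own counter; the names `PerTaskSequenceSketch`, `numberPerTask`, and `numberInOneTask` are hypothetical, and no Hive APIs are involved. Numbering splits independently restarts at 1 in every task and produces duplicates. Routing all rows through one task, as a single reducer would, yields unique numbers, although which row gets which number still depends on arrival order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration: each task (e.g. a mapper) owns its own counter,
// so sequence numbers restart at 1 per task and collide across tasks.
class PerTaskSequenceSketch {

    // Numbers every split independently, as separate tasks would.
    static List<Long> numberPerTask(List<List<String>> splits) {
        List<Long> ids = new ArrayList<>();
        for (List<String> split : splits) {
            long next = 1L; // fresh state for each task
            for (int i = 0; i < split.size(); i++) {
                ids.add(next++);
            }
        }
        return ids;
    }

    // Funnels all rows through one task (like a single reducer), so one counter
    // sees every row: numbers are unique, but their order is still unspecified.
    static List<Long> numberInOneTask(List<List<String>> splits) {
        List<Long> ids = new ArrayList<>();
        long next = 1L;
        for (List<String> split : splits) {
            for (int i = 0; i < split.size(); i++) {
                ids.add(next++);
            }
        }
        return ids;
    }

    public static void main(String[] args) {
        List<List<String>> splits = List.of(List.of("a", "b"), List.of("c", "d", "e"));
        System.out.println(numberPerTask(splits));   // duplicates across tasks
        System.out.println(numberInOneTask(splits)); // unique within the single task
    }
}
```

With splits `[a, b]` and `[c, d, e]`, the per-task version yields `[1, 2, 1, 2, 3]` and the single-task version `[1, 2, 3, 4, 5]`.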

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Component/s: Query Processor
                     (was: Metastore)

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Status: Patch Available  (was: Open)

New patch with test moved to contrib, and DESCRIBE and EXPLAIN thrown in for good measure.


> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Affects Version/s: 0.7.0
                           (was: 0.6.0)

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Query Processor
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.7.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.



[jira] Updated: (HIVE-1304) add row_sequence UDF

Posted by "John Sichi (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

John Sichi updated HIVE-1304:
-----------------------------

    Attachment: HIVE-1304.1.patch

> add row_sequence UDF
> --------------------
>
>                 Key: HIVE-1304
>                 URL: https://issues.apache.org/jira/browse/HIVE-1304
>             Project: Hadoop Hive
>          Issue Type: New Feature
>          Components: Metastore
>    Affects Versions: 0.6.0
>            Reporter: John Sichi
>            Assignee: John Sichi
>             Fix For: 0.6.0
>
>         Attachments: HIVE-1304.1.patch
>
>
> This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
> I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
> The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
