Posted to dev@hive.apache.org by "John Sichi (JIRA)" <ji...@apache.org> on 2010/04/13 23:31:49 UTC
[jira] Created: (HIVE-1304) add row_sequence UDF
add row_sequence UDF
--------------------
Key: HIVE-1304
URL: https://issues.apache.org/jira/browse/HIVE-1304
Project: Hadoop Hive
Issue Type: New Feature
Components: Metastore
Affects Versions: 0.6.0
Reporter: John Sichi
Assignee: John Sichi
Fix For: 0.6.0
This is a poor man's answer to the standard analytic function row_number(); it assigns a sequence of numbers to rows, starting from 1.
I'm calling it row_sequence() to distinguish it from the real analytic function, so that once we add support for those, there won't be any conflict with the existing UDF.
The problem with this UDF approach is that there are no guarantees about ordering in SQL processing internals, so use with caution.
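To make the ordering caveat concrete, here is a minimal Python sketch. It is only a simulation and not the code from the attached patches: the `RowSequence` class and `number_rows` helper are hypothetical names, and the assumption is that the UDF is nothing more than a counter that increments once per row it is handed. Under that assumption, the number a row receives depends entirely on the order in which rows happen to arrive.

```python
class RowSequence:
    """Hypothetical stand-in for a stateful row_sequence() UDF instance."""

    def __init__(self):
        # The counter starts at 0 so the first row evaluated gets 1.
        self.count = 0

    def evaluate(self):
        # Called once per row; returns the next number in the sequence.
        self.count += 1
        return self.count


def number_rows(rows):
    """Pair each row with a sequence number, in the order the rows arrive."""
    seq = RowSequence()
    return [(seq.evaluate(), row) for row in rows]


rows = ["apple", "banana", "cherry"]

# Same rows, two different arrival orders.
forward = number_rows(rows)
backward = number_rows(list(reversed(rows)))

print(forward)   # numbering when rows arrive in the original order
print(backward)  # numbering when the same rows arrive reversed
```

The sequence always runs 1, 2, 3, but "apple" is row 1 in one run and row 3 in the other, which is why the result should not be relied on unless the arrival order is pinned down some other way.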
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: https://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] Commented: (HIVE-1304) add row_sequence UDF
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881501#action_12881501 ]
Namit Jain commented on HIVE-1304:
----------------------------------
+1
will commit if the tests pass
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Status: Patch Available (was: Open)
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Status: Open (was: Patch Available)
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Attachment: HIVE-1304.2.patch
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Fix Version/s: 0.7.0
Affects Version/s: 0.6.0
(was: 0.7.0)
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Status: Patch Available (was: Open)
New patch addresses Namit's comments.
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
[jira] Commented: (HIVE-1304) add row_sequence UDF
Posted by "Edward Capriolo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12856638#action_12856638 ]
Edward Capriolo commented on HIVE-1304:
---------------------------------------
Can we work on https://issues.apache.org/jira/browse/HIVE-1265 first?
As of now, committing UDFs is a major pain. Two or three times already I have had to regenerate my UDFs because someone else touched the FunctionRegistry. I have had two or three waves of UDFs I want to commit (ReflectionUDF, MathUDF, EncryptionUDF), but something else sneaks in and I have to regenerate.
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Namit Jain updated HIVE-1304:
-----------------------------
Status: Resolved (was: Patch Available)
Hadoop Flags: [Reviewed]
Resolution: Fixed
Committed. Thanks John
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Status: Open (was: Patch Available)
Oops, need to move test to contrib too.
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Attachment: HIVE-1304.3.patch
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
[jira] Commented: (HIVE-1304) add row_sequence UDF
Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12881299#action_12881299 ]
Namit Jain commented on HIVE-1304:
----------------------------------
Can you explicitly set the number of reducers to 1, to ensure the same results? It doesn't matter today, but it should be an issue with miniMr etc.
Also, do you want to keep it in contrib, since this is not guaranteed? Multiple mappers may give the same results, etc.
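The concern about multiple mappers can be sketched in a few lines of Python. This is a simulation under stated assumptions, not Hive code: `run_task` is a hypothetical helper that models one task (mapper or reducer) holding its own private counter, and the split of rows across tasks is made up for illustration.

```python
from itertools import count


def run_task(rows):
    """Model one task: it has its own counter, starting from 1."""
    counter = count(1)
    return [(next(counter), row) for row in rows]


rows = ["a", "b", "c", "d", "e"]

# Two tasks, each seeing only its own split of the input (assumed split).
splits = [rows[:3], rows[3:]]
multi = [pair for split in splits for pair in run_task(split)]

# One task seeing every row, as when the number of reducers is forced to 1.
single = run_task(rows)

multi_numbers = [n for n, _ in multi]
single_numbers = [n for n, _ in single]

print(multi_numbers)   # numbers repeat across the two tasks
print(single_numbers)  # one unbroken sequence
```

With two tasks the numbers 1 and 2 are each handed out twice, because neither counter knows about the other. Funnelling every row through a single task removes the duplicates, which is the effect of pinning the reducer count to 1 in the test.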
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Component/s: Query Processor
(was: Metastore)
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Status: Patch Available (was: Open)
New patch with test moved to contrib, and DESCRIBE and EXPLAIN thrown in for good measure.
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch, HIVE-1304.2.patch, HIVE-1304.3.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Affects Version/s: 0.7.0
(was: 0.6.0)
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Query Processor
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.7.0
>
> Attachments: HIVE-1304.1.patch
[jira] Updated: (HIVE-1304) add row_sequence UDF
Posted by "John Sichi (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
John Sichi updated HIVE-1304:
-----------------------------
Attachment: HIVE-1304.1.patch
> add row_sequence UDF
> --------------------
>
> Key: HIVE-1304
> URL: https://issues.apache.org/jira/browse/HIVE-1304
> Project: Hadoop Hive
> Issue Type: New Feature
> Components: Metastore
> Affects Versions: 0.6.0
> Reporter: John Sichi
> Assignee: John Sichi
> Fix For: 0.6.0
>
> Attachments: HIVE-1304.1.patch