You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@hive.apache.org by "David Phillips (JIRA)" <ji...@apache.org> on 2008/12/16 01:09:44 UTC
[jira] Created: (HIVE-179) SUBSTR function should work like other
databases
SUBSTR function should work like other databases
------------------------------------------------
Key: HIVE-179
URL: https://issues.apache.org/jira/browse/HIVE-179
Project: Hadoop Hive
Issue Type: Bug
Components: Query Processor
Reporter: David Phillips
Assignee: David Phillips
Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
Oracle treats position 0 the same as 1. Perhaps we should too?
{noformat}
SUBSTR('ABCDEFG',3,4): CDEF
SUBSTR('ABCDEFG',-5,4): CDEF
SUBSTR('ABCDEFG',3): CDEFG
SUBSTR('ABCDEFG',-5): CDEFG
SUBSTR('ABC',1,1): A
MySQL:
SUBSTR('ABC',0,1): <empty>
SUBSTR('ABC',0,2): <empty>
SUBSTR('ABC',1,0): <empty>
SUBSTR('ABC',1,-1): <empty>
Oracle:
SUBSTR('ABC',0,1): A
SUBSTR('ABC',0,2): AB
SUBSTR('ABC',1,0): <null>
SUBSTR('ABC',1,-1): <null>
{noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Carl Steinbach updated HIVE-179:
--------------------------------
Fix Version/s: 0.3.0
(was: 0.6.0)
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Fix For: 0.3.0
>
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-179) SUBSTR function should work like other
databases
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Phillips updated HIVE-179:
--------------------------------
Attachment: hive-substr.patch
This patch is tentative. All tests pass except these two due to HIVE-215:
ant -Dtestcase=TestCliDriver -Dqfile=groupby2_map.q test
ant -Dtestcase=TestCliDriver -Dqfile=groupby3_map.q test
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662602#action_12662602 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
The first map/reduce job produces longs where as the second map/reduce job expects a double.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo updated HIVE-179:
-------------------------------
Priority: Critical (was: Major)
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657633#action_12657633 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
I think we should change it and make it ANSI compliant. It is still early days and I think we can ask the users to change their scripts. Not being SQL compliant will present more problems in the future.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662606#action_12662606 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
The actual error is happening between the map and reduce boundary of the second map reduce job. The ddl for the dynamic serde that is used to serialize data at this boundary is double, long where as the output generated is long, double as seen in the plan.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658611#action_12658611 ]
David Phillips commented on HIVE-179:
-------------------------------------
MySQL and Oracle both return NULL if any of the parameters are NULL.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667547#action_12667547 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
Running tests on this. Will commit once it is done. Sorry, this one fell off my radar...
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662395#action_12662395 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
The patch looks correct and rather harmless. I tried the change that you suggested to groupby2_map.q on a clean branch and it worked find. If you are getting a cast exception it is likely related to the change of signatures but even that seems harmless. Trying out this patch in my environment now.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662598#action_12662598 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
This seems to be exposing a latent bug in plan generation. Still investigating this, but here is an explanation. If you do
diff -y ./ql/src/test/results/clientpositive/groupby2_map.q.out ./ql/src/test/results/clientpositive/.svn/text-base/groupby2_map.q.out.svn-base | less
you will see the plan changes that this patch makes. An excerpt is as follows:
--------------------------------
STAGE PLANS: STAGE PLANS:
Stage: Stage-1 Stage: Stage-1
Map Reduce Map Reduce
Alias -> Map Operator Tree: Alias -> Map Operator Tree:
src src
Group By Operator Group By Operator
aggregations: aggregations:
expr: sum(UDFToDouble(substr(value, 5))) | expr: count(DISTINCT substr(value, 4))
expr: count(DISTINCT substr(value, 5)) | expr: sum(UDFToDouble(substr(value, 4)))
keys: keys:
expr: substr(key, 1, 1) | expr: substr(key, 0, 1)
type: string type: string
expr: substr(value, 5) | expr: substr(value, 4)
type: string type: string
mode: hash mode: hash
Reduce Output Operator Reduce Output Operator
key expressions: key expressions:
expr: 0 expr: 0
type: string type: string
expr: 1 expr: 1
type: string type: string
sort order: ++ sort order: ++
Map-reduce partition columns: Map-reduce partition columns:
expr: 0 expr: 0
type: string type: string
expr: 1 expr: 1
type: string type: string
tag: -1 tag: -1
value expressions: value expressions:
expr: 2 expr: 2
type: double <
expr: 3 <
type: bigint type: bigint
> expr: 3
> type: double
Reduce Operator Tree: Reduce Operator Tree:
Group By Operator Group By Operator
aggregations: aggregations:
expr: count(DISTINCT KEY.1) expr: count(DISTINCT KEY.1)
expr: sum(VALUE.0) | expr: sum(VALUE.1)
keys: keys:
expr: KEY.0 expr: KEY.0
----------------------------
So the point to note here is that the count and the sum expressions are the top most group by operator so
in the last group by operator in the excerpt I think expr: count(DISTINCT KEY.1) should have been expr: count(DISTINCT KEY0)
Still debugging this one..
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ashish Thusoo updated HIVE-179:
-------------------------------
Resolution: Fixed
Fix Version/s: 0.2.0
Hadoop Flags: [Incompatible change, Reviewed] (was: [Incompatible change])
Status: Resolved (was: Patch Available)
committed. Thanks David!! Sorry for not doing this before...
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Fix For: 0.2.0
>
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657639#action_12657639 ]
Zheng Shao commented on HIVE-179:
---------------------------------
Another thing to test is what does MYSQL returns when the string, or index, or len is NULL.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657619#action_12657619 ]
David Phillips commented on HIVE-179:
-------------------------------------
I note this will break ~27 tests that use {{substr(src.value,4)}} or similar. Is this going to be a problem for existing user code?
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657638#action_12657638 ]
Zheng Shao commented on HIVE-179:
---------------------------------
Totally agree. Let's make it ANSI compliant!
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (HIVE-179) SUBSTR function should work like other
databases
Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662600#action_12662600 ]
Ashish Thusoo commented on HIVE-179:
------------------------------------
On second thoughts, the plan seems to be ok. Since the keys have not swapped (only the values have swapped), count(DISTINCT KEY.1) seems to be correct.
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (HIVE-179) SUBSTR function should work like other
databases
Posted by "David Phillips (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
David Phillips updated HIVE-179:
--------------------------------
Release Note: SUBSTR position now starts at 1, not 0. A negative position counts backwards from the end of the string, starting at -1. Position 0 is treated as position 1.
Status: Patch Available (was: Open)
> SUBSTR function should work like other databases
> ------------------------------------------------
>
> Key: HIVE-179
> URL: https://issues.apache.org/jira/browse/HIVE-179
> Project: Hadoop Hive
> Issue Type: Bug
> Components: Query Processor
> Reporter: David Phillips
> Assignee: David Phillips
> Priority: Critical
> Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0. Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null). MySQL and PostgreSQL return empty strings, never null. PostgreSQL errors for negative lengths. I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1. Perhaps we should too?
> {noformat}
> SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
> SUBSTR('ABCDEFG',3): CDEFG
> SUBSTR('ABCDEFG',-5): CDEFG
> SUBSTR('ABC',1,1): A
> MySQL:
> SUBSTR('ABC',0,1): <empty>
> SUBSTR('ABC',0,2): <empty>
> SUBSTR('ABC',1,0): <empty>
> SUBSTR('ABC',1,-1): <empty>
> Oracle:
> SUBSTR('ABC',0,1): A
> SUBSTR('ABC',0,2): AB
> SUBSTR('ABC',1,0): <null>
> SUBSTR('ABC',1,-1): <null>
> {noformat}
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.