You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@hive.apache.org by "David Phillips (JIRA)" <ji...@apache.org> on 2008/12/16 01:09:44 UTC

[jira] Created: (HIVE-179) SUBSTR function should work like other databases

SUBSTR function should work like other databases
------------------------------------------------

                 Key: HIVE-179
                 URL: https://issues.apache.org/jira/browse/HIVE-179
             Project: Hadoop Hive
          Issue Type: Bug
          Components: Query Processor
            Reporter: David Phillips
            Assignee: David Phillips


Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.

Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.

Oracle treats position 0 the same as 1.  Perhaps we should too?

{noformat}
 SUBSTR('ABCDEFG',3,4): CDEF
SUBSTR('ABCDEFG',-5,4): CDEF
   SUBSTR('ABCDEFG',3): CDEFG
  SUBSTR('ABCDEFG',-5): CDEFG
     SUBSTR('ABC',1,1): A

MySQL:
     SUBSTR('ABC',0,1): <empty>
     SUBSTR('ABC',0,2): <empty>
     SUBSTR('ABC',1,0): <empty>
    SUBSTR('ABC',1,-1): <empty>

Oracle:
     SUBSTR('ABC',0,1): A
     SUBSTR('ABC',0,2): AB
     SUBSTR('ABC',1,0): <null>
    SUBSTR('ABC',1,-1): <null>
{noformat}


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-179) SUBSTR function should work like other databases

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-179:
--------------------------------

    Fix Version/s: 0.3.0
                       (was: 0.6.0)

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>             Fix For: 0.3.0
>
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-179) SUBSTR function should work like other databases

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-179:
--------------------------------

    Attachment: hive-substr.patch

This patch is tentative.  All tests pass except these two due to HIVE-215:

ant -Dtestcase=TestCliDriver -Dqfile=groupby2_map.q test
ant -Dtestcase=TestCliDriver -Dqfile=groupby3_map.q test



> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662602#action_12662602 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

The first map/reduce job produces longs where as the second map/reduce job expects a double.


> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-179:
-------------------------------

    Priority: Critical  (was: Major)

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657633#action_12657633 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

I think we should change it and make it ANSI compliant. It is still early days and I think we can ask the users to change their scripts. Not being SQL compliant will present more problems in the future. 

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662606#action_12662606 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

The actual error is happening between the map and reduce boundary of the second map reduce job. The ddl for the dynamic serde that is used to serialize data at this boundary is double, long where as the output generated is long, double as seen in the plan.


> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12658611#action_12658611 ] 

David Phillips commented on HIVE-179:
-------------------------------------

MySQL and Oracle both return NULL if any of the parameters are NULL.

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667547#action_12667547 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

Running tests on this. Will commit once it is done. Sorry, this one fell off my radar...

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662395#action_12662395 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

The patch looks correct and rather harmless. I tried the change that you suggested to groupby2_map.q on a clean branch and it worked find. If you are getting a cast exception it is likely related to the change of signatures but even that seems harmless. Trying out this patch in my environment now.

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662598#action_12662598 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

This seems to be exposing a latent bug in plan generation. Still investigating this, but here is an explanation. If you do 

diff -y ./ql/src/test/results/clientpositive/groupby2_map.q.out ./ql/src/test/results/clientpositive/.svn/text-base/groupby2_map.q.out.svn-base | less 

you will see the plan changes that this patch makes. An excerpt is as follows: 
-------------------------------- 
STAGE PLANS: STAGE PLANS: 
  Stage: Stage-1 Stage: Stage-1 
    Map Reduce Map Reduce 
      Alias -> Map Operator Tree: Alias -> Map Operator Tree: 
        src src 
            Group By Operator Group By Operator 
              aggregations: aggregations: 
                    expr: sum(UDFToDouble(substr(value, 5))) | expr: count(DISTINCT substr(value, 4)) 
                    expr: count(DISTINCT substr(value, 5)) | expr: sum(UDFToDouble(substr(value, 4))) 
              keys: keys: 
                    expr: substr(key, 1, 1) | expr: substr(key, 0, 1) 
                    type: string type: string 
                    expr: substr(value, 5) | expr: substr(value, 4) 
                    type: string type: string 
              mode: hash mode: hash 
              Reduce Output Operator Reduce Output Operator 
                key expressions: key expressions: 
                      expr: 0 expr: 0 
                      type: string type: string 
                      expr: 1 expr: 1 
                      type: string type: string 
                sort order: ++ sort order: ++ 
                Map-reduce partition columns: Map-reduce partition columns: 
                      expr: 0 expr: 0 
                      type: string type: string 
                      expr: 1 expr: 1 
                      type: string type: string 
                tag: -1 tag: -1 
                value expressions: value expressions: 
                      expr: 2 expr: 2 
                      type: double < 
                      expr: 3 < 
                      type: bigint type: bigint 
                                                              > expr: 3 
                                                              > type: double 
      Reduce Operator Tree: Reduce Operator Tree: 
        Group By Operator Group By Operator 
          aggregations: aggregations: 
                expr: count(DISTINCT KEY.1) expr: count(DISTINCT KEY.1) 
                expr: sum(VALUE.0) | expr: sum(VALUE.1) 
          keys: keys: 
                expr: KEY.0 expr: KEY.0 
---------------------------- 
So the point to note here is that the count and the sum expressions are the top most group by operator so 
in the last group by operator in the excerpt I think expr: count(DISTINCT KEY.1) should have been expr: count(DISTINCT KEY0) 

Still debugging this one..

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-179:
-------------------------------

       Resolution: Fixed
    Fix Version/s: 0.2.0
     Hadoop Flags: [Incompatible change, Reviewed]  (was: [Incompatible change])
           Status: Resolved  (was: Patch Available)

committed. Thanks David!! Sorry for not doing this before...

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>             Fix For: 0.2.0
>
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657639#action_12657639 ] 

Zheng Shao commented on HIVE-179:
---------------------------------

Another thing to test is what does MYSQL returns when the string, or index, or len is NULL.

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657619#action_12657619 ] 

David Phillips commented on HIVE-179:
-------------------------------------

I note this will break ~27 tests that use {{substr(src.value,4)}} or similar.  Is this going to be a problem for existing user code?

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Zheng Shao (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12657638#action_12657638 ] 

Zheng Shao commented on HIVE-179:
---------------------------------

Totally agree. Let's make it ANSI compliant!


> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Commented: (HIVE-179) SUBSTR function should work like other databases

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.

    [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12662600#action_12662600 ] 

Ashish Thusoo commented on HIVE-179:
------------------------------------

On second thoughts, the plan seems to be ok. Since the keys have not swapped (only the values have swapped), count(DISTINCT KEY.1) seems to be correct.


> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

[jira] Updated: (HIVE-179) SUBSTR function should work like other databases

Posted by "David Phillips (JIRA)" <ji...@apache.org>.

     [ https://issues.apache.org/jira/browse/HIVE-179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Phillips updated HIVE-179:
--------------------------------

    Release Note: SUBSTR position now starts at 1, not 0.  A negative position counts backwards from the end of the string, starting at -1.  Position 0 is treated as position 1.
          Status: Patch Available  (was: Open)

> SUBSTR function should work like other databases
> ------------------------------------------------
>
>                 Key: HIVE-179
>                 URL: https://issues.apache.org/jira/browse/HIVE-179
>             Project: Hadoop Hive
>          Issue Type: Bug
>          Components: Query Processor
>            Reporter: David Phillips
>            Assignee: David Phillips
>            Priority: Critical
>         Attachments: hive-substr.patch
>
>
> Positions start at 1, not 0.  Negative positions start at the end of the string and count backwards.
> Oracle returns null for lengths less than 1 or non-existent substrings (any empty strings are null).  MySQL and PostgreSQL return empty strings, never null.  PostgreSQL errors for negative lengths.  I suggest we follow the MySQL behavior.
> Oracle treats position 0 the same as 1.  Perhaps we should too?
> {noformat}
>  SUBSTR('ABCDEFG',3,4): CDEF
> SUBSTR('ABCDEFG',-5,4): CDEF
>    SUBSTR('ABCDEFG',3): CDEFG
>   SUBSTR('ABCDEFG',-5): CDEFG
>      SUBSTR('ABC',1,1): A
> MySQL:
>      SUBSTR('ABC',0,1): <empty>
>      SUBSTR('ABC',0,2): <empty>
>      SUBSTR('ABC',1,0): <empty>
>     SUBSTR('ABC',1,-1): <empty>
> Oracle:
>      SUBSTR('ABC',0,1): A
>      SUBSTR('ABC',0,2): AB
>      SUBSTR('ABC',1,0): <null>
>     SUBSTR('ABC',1,-1): <null>
> {noformat}

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.