Posted to dev@hive.apache.org by "Zheng Shao (JIRA)" <ji...@apache.org> on 2009/06/04 10:27:08 UTC

[jira] Created: (HIVE-541) Implement UDFs: INSTR and LOCATE

Implement UDFs: INSTR and LOCATE
--------------------------------

                 Key: HIVE-541
                 URL: https://issues.apache.org/jira/browse/HIVE-541
             Project: Hadoop Hive
          Issue Type: New Feature
    Affects Versions: 0.4.0
            Reporter: Zheng Shao
            Assignee: Zheng Shao


http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_instr
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_locate

These functions can be directly implemented with Text (instead of String). This will make the test of whether one string contains another string much faster.
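
For reference, the semantics described on the two MySQL pages linked above can be sketched in plain Java as follows. This is only an illustration of the expected results using String, not the Text-based implementation proposed here; the class and method names are hypothetical, NULL handling is omitted, and returning 0 for a start position below 1 is an assumption.

```java
// Sketch of the MySQL semantics only; not the Hive patch.
// Class and method names are hypothetical, and NULL handling is left out.
final class InstrLocateSketch {

    // INSTR(str, substr): 1-based position of the first occurrence, 0 if absent.
    static int instr(String str, String substr) {
        return str.indexOf(substr) + 1;
    }

    // LOCATE(substr, str): same result as INSTR with the arguments swapped.
    static int locate(String substr, String str) {
        return locate(substr, str, 1);
    }

    // LOCATE(substr, str, pos): the search starts at 1-based position pos.
    // Assumption: a start position below 1 yields 0.
    static int locate(String substr, String str, int pos) {
        if (pos < 1) {
            return 0;
        }
        return str.indexOf(substr, pos - 1) + 1;
    }

    public static void main(String[] args) {
        System.out.println(instr("foobarbar", "bar"));     // 4
        System.out.println(locate("bar", "foobarbar"));    // 4
        System.out.println(locate("bar", "foobarbar", 5)); // 7
        System.out.println(instr("xbar", "foobar"));       // 0
    }
}
```

The only difference between the two functions in this sketch is the argument order and the optional start position of LOCATE.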


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731858#action_12731858 ] 

Min Zhou commented on HIVE-541:
-------------------------------

All test cases passed on my side. How about yours?




[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731764#action_12731764 ] 

Min Zhou commented on HIVE-541:
-------------------------------

Hmm, that may be a good approach. I will try it soon.




[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731363#action_12731363 ] 

Min Zhou commented on HIVE-541:
-------------------------------

Text.find(String) would not be faster: Text encodes the String argument internally, which costs about the same as Text.toString() decoding a Text.




[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-541:
--------------------------

    Attachment: HIVE-541.2.patch

Added GenericUDFUtils.findText(), which avoids string encoding and decoding and should therefore execute faster.
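
One consequence of searching encoded bytes directly is that a match is naturally located by byte offset, while INSTR and LOCATE report character positions. The thread does not say how the patch handles this, so the following is only a sketch under the assumption that the text is well-formed UTF-8 and that the search returns a 0-based byte offset (or -1). The class and method names are hypothetical. It converts the offset without decoding to a String by counting the bytes that are not UTF-8 continuation bytes.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not taken from the patch.
final class Utf8PositionSketch {

    // Counts the characters (code points) that start before byteOffset in a
    // well-formed UTF-8 buffer. Continuation bytes have the bit pattern
    // 10xxxxxx, so every byte that does not match it starts a new character.
    static int charsBefore(byte[] utf8, int byteOffset) {
        int count = 0;
        for (int i = 0; i < byteOffset; i++) {
            if ((utf8[i] & 0xC0) != 0x80) {
                count++;
            }
        }
        return count;
    }

    // Turns a 0-based byte offset (or -1 for "not found") into a 1-based
    // character position, with 0 meaning "not found".
    static int toOneBasedPosition(byte[] utf8, int byteOffset) {
        return byteOffset < 0 ? 0 : charsBefore(utf8, byteOffset) + 1;
    }

    public static void main(String[] args) {
        // "\u00e9t\u00e9 chaud": each \u00e9 takes two bytes in UTF-8,
        // so "chaud" starts at byte offset 6 but is the 5th character.
        byte[] text = "\u00e9t\u00e9 chaud".getBytes(StandardCharsets.UTF_8);
        System.out.println(toOneBasedPosition(text, 6));  // 5
        System.out.println(toOneBasedPosition(text, -1)); // 0
    }
}
```

For pure ASCII input the byte offset and the character index coincide, so the conversion only matters for multi-byte text.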




[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Yuntao Jia (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731582#action_12731582 ] 

Yuntao Jia commented on HIVE-541:
---------------------------------

The patch now uses "String.indexOf(String)" to find the position of a text inside of another text. What about writing our own function like 

int find(Text text, Text subtext);

It would no longer require converting the Texts to Strings. Would that be even faster?
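
A minimal sketch of what such a function could look like, written against plain byte arrays so that it runs without Hadoop on the classpath. The class name and the four-argument signature are assumptions for illustration. With Text, the arrays and lengths would come from getBytes() and getLength(), since only the first getLength() bytes of the backing array are valid.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for "int find(Text text, Text subtext)".
final class ByteFindSketch {

    // Returns the 0-based byte offset of the first occurrence of sub within
    // the first textLen bytes of text, or -1 if there is none.
    // Only the first subLen bytes of sub are considered.
    static int find(byte[] text, int textLen, byte[] sub, int subLen) {
        for (int i = 0; i + subLen <= textLen; i++) {
            int j = 0;
            while (j < subLen && text[i + j] == sub[j]) {
                j++;
            }
            if (j == subLen) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        byte[] text = "foobarbar".getBytes(StandardCharsets.US_ASCII);
        byte[] sub = "bar".getBytes(StandardCharsets.US_ASCII);
        System.out.println(find(text, text.length, sub, sub.length)); // 3
    }
}
```

This is a naive quadratic scan chosen for brevity. It compares the encoded bytes as they are, so neither argument is converted to a String.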






[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Ashish Thusoo (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ashish Thusoo updated HIVE-541:
-------------------------------

    Assignee: Min Zhou  (was: Zheng Shao)

Assigning to Min as he has submitted the patch.




[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Min Zhou (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Min Zhou updated HIVE-541:
--------------------------

    Attachment: HIVE-541.1.patch

patch






[jira] Resolved: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Namit Jain resolved HIVE-541.
-----------------------------

      Resolution: Fixed
    Hadoop Flags: [Reviewed]

Committed. Thanks, Min.




[jira] Commented: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Namit Jain (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12731812#action_12731812 ] 

Namit Jain commented on HIVE-541:
---------------------------------

+1

The changes look good. Will merge if the tests pass.




[jira] Updated: (HIVE-541) Implement UDFs: INSTR and LOCATE

Posted by "Carl Steinbach (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/HIVE-541?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Carl Steinbach updated HIVE-541:
--------------------------------

        Fix Version/s: 0.4.0
    Affects Version/s:     (was: 0.4.0)
          Component/s: UDF

