You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Alexey Zotov <az...@griddynamics.com> on 2013/12/26 12:36:28 UTC

concat_ws() UDF

Hello Guys,

historically concat_ws() UDF was added in the scope of
https://issues.apache.org/jira/browse/HIVE-682 ticket. I have a simple
question about its implementation. According to above ticket's description
it should have behavior like MySQL implementation (
http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws).
MySQL documentation says:
"CONCAT_WS()<http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws>
does
not skip empty strings. However, it does skip any NULL values after the
separator argument.". I have performed a simple test:

> create table test (col1 string);
> select concat_ws(',', '10', NULL, '4', '', 'a') from test limit 1;


as a result I have got:

> 10,4,,a

which looks good for me.

But in tests (
https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out
:
78 line) I see the following lines:

POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>        concat_ws('**', dest1.c1, *NULL*, dest1.c3) FROM dest1
> POSTHOOK: type: QUERY
> POSTHOOK: Input: default@dest1
> #### A masked pattern was here ####
> POSTHOOK: Lineage: dest1.c1 SIMPLE []
> POSTHOOK: Lineage: dest1.c2 SIMPLE []
> POSTHOOK: Lineage: dest1.c3 SIMPLE []
> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309

which looks like NULLs are not skipped.

What have I missed?

Thanks, Alexey.

Re: concat_ws() UDF

Posted by Alexey Zotov <az...@griddynamics.com>.
Ouhcc.. It was a very stupid question from my side :( Thanks a lot Ritesh!


On Sat, Dec 28, 2013 at 12:32 PM, Ritesh Agrawal <ra...@netflix.com>wrote:

> The first string in concat_ws is used as a separator. Since the third
> value in the last concat_ws is null, it skipped that and simply
> concactenated second and fourth value by **. Hence you got  abc****8675309
>
> Ritesh
>
>
> On Fri, Dec 27, 2013 at 12:13 AM, Alexey Zotov <az...@griddynamics.com>wrote:
>
>> Sorry, but in my previous email there was some misleading due to text
>> formatting.
>>
>> So, there are tests for concat_ws() UDF:
>> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out.
>> One of tests (78 line) contains the following query:
>>
>>> SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1
>>
>> and expects the following results:
>>
>>> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309
>>
>>
>> I'm confused by the result of the last concat_ws. Why it returns abc****
>> 8675309 instead of abc8675309? It looks like NULL VALUE has not been
>> skipped.
>>
>> Thanks.
>>
>>
>> On Thu, Dec 26, 2013 at 3:36 PM, Alexey Zotov <az...@griddynamics.com>wrote:
>>
>>> Hello Guys,
>>>
>>> historically concat_ws() UDF was added in the scope of
>>> https://issues.apache.org/jira/browse/HIVE-682 ticket. I have a simple
>>> question about its implementation. According to above ticket's description
>>> it should have behavior like MySQL implementation (
>>> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws).
>>> MySQL documentation says: "CONCAT_WS()<http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws> does
>>> not skip empty strings. However, it does skip any NULL values after the
>>> separator argument.". I have performed a simple test:
>>>
>>>> create table test (col1 string);
>>>> select concat_ws(',', '10', NULL, '4', '', 'a') from test limit 1;
>>>
>>>
>>> as a result I have got:
>>>
>>>> 10,4,,a
>>>
>>> which looks good for me.
>>>
>>> But in tests (
>>> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out :
>>> 78 line) I see the following lines:
>>>
>>>
>>>>
>>>>
>>>> POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>>>>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>>>>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>>>>        concat_ws('**', dest1.c1, *NULL*, dest1.c3) FROM dest1
>>>>
>>>>
>>>>
>>>> POSTHOOK: type: QUERY
>>>> POSTHOOK: Input: default@dest1
>>>> #### A masked pattern was here ####
>>>> POSTHOOK: Lineage: dest1.c1 SIMPLE []
>>>> POSTHOOK: Lineage: dest1.c2 SIMPLE []
>>>> POSTHOOK: Lineage: dest1.c3 SIMPLE []
>>>>
>>>>
>>>>
>>>> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309
>>>
>>> which looks like NULLs are not skipped.
>>>
>>> What have I missed?
>>>
>>> Thanks, Alexey.
>>>
>>
>>
>>
>> --
>>
>> Best regards
>>
>> Zotov Alexey
>> Grid Dynamics
>> Skype: azotcsit
>>
>
>


-- 

Best regards

Zotov Alexey
Grid Dynamics
Skype: azotcsit

Re: concat_ws() UDF

Posted by Ritesh Agrawal <ra...@netflix.com>.
The first string in concat_ws is used as a separator. Since the third value
in the last concat_ws is null, it skipped that and simply concactenated
second and fourth value by **. Hence you got  abc****8675309

Ritesh


On Fri, Dec 27, 2013 at 12:13 AM, Alexey Zotov <az...@griddynamics.com>wrote:

> Sorry, but in my previous email there was some misleading due to text
> formatting.
>
> So, there are tests for concat_ws() UDF:
> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out.
> One of tests (78 line) contains the following query:
>
>> SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1
>
> and expects the following results:
>
>> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309
>
>
> I'm confused by the result of the last concat_ws. Why it returns abc****
> 8675309 instead of abc8675309? It looks like NULL VALUE has not been
> skipped.
>
> Thanks.
>
>
> On Thu, Dec 26, 2013 at 3:36 PM, Alexey Zotov <az...@griddynamics.com>wrote:
>
>> Hello Guys,
>>
>> historically concat_ws() UDF was added in the scope of
>> https://issues.apache.org/jira/browse/HIVE-682 ticket. I have a simple
>> question about its implementation. According to above ticket's description
>> it should have behavior like MySQL implementation (
>> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws).
>> MySQL documentation says: "CONCAT_WS()<http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws> does
>> not skip empty strings. However, it does skip any NULL values after the
>> separator argument.". I have performed a simple test:
>>
>>> create table test (col1 string);
>>> select concat_ws(',', '10', NULL, '4', '', 'a') from test limit 1;
>>
>>
>> as a result I have got:
>>
>>> 10,4,,a
>>
>> which looks good for me.
>>
>> But in tests (
>> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out :
>> 78 line) I see the following lines:
>>
>>
>>>
>>> POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>>>        concat_ws('**', dest1.c1, *NULL*, dest1.c3) FROM dest1
>>>
>>>
>>> POSTHOOK: type: QUERY
>>> POSTHOOK: Input: default@dest1
>>> #### A masked pattern was here ####
>>> POSTHOOK: Lineage: dest1.c1 SIMPLE []
>>> POSTHOOK: Lineage: dest1.c2 SIMPLE []
>>> POSTHOOK: Lineage: dest1.c3 SIMPLE []
>>>
>>>
>>> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309
>>
>> which looks like NULLs are not skipped.
>>
>> What have I missed?
>>
>> Thanks, Alexey.
>>
>
>
>
> --
>
> Best regards
>
> Zotov Alexey
> Grid Dynamics
> Skype: azotcsit
>

Re: concat_ws() UDF

Posted by Alexey Zotov <az...@griddynamics.com>.
Sorry, but in my previous email there was some misleading due to text
formatting.

So, there are tests for concat_ws() UDF:
https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out.
One of tests (78 line) contains the following query:

> SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>        concat_ws('**', dest1.c1, NULL, dest1.c3) FROM dest1

and expects the following results:

> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309


I'm confused by the result of the last concat_ws. Why it returns abc****
8675309 instead of abc8675309? It looks like NULL VALUE has not been
skipped.

Thanks.


On Thu, Dec 26, 2013 at 3:36 PM, Alexey Zotov <az...@griddynamics.com>wrote:

> Hello Guys,
>
> historically concat_ws() UDF was added in the scope of
> https://issues.apache.org/jira/browse/HIVE-682 ticket. I have a simple
> question about its implementation. According to above ticket's description
> it should have behavior like MySQL implementation (
> http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws).
> MySQL documentation says: "CONCAT_WS()<http://dev.mysql.com/doc/refman/5.0/en/string-functions.html#function_concat-ws> does
> not skip empty strings. However, it does skip any NULL values after the
> separator argument.". I have performed a simple test:
>
>> create table test (col1 string);
>> select concat_ws(',', '10', NULL, '4', '', 'a') from test limit 1;
>
>
> as a result I have got:
>
>> 10,4,,a
>
> which looks good for me.
>
> But in tests (
> https://github.com/apache/hive/blob/trunk/ql/src/test/results/clientpositive/udf_concat_ws.q.out :
> 78 line) I see the following lines:
>
>
>> POSTHOOK: query: SELECT concat_ws(dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws(',', dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws(NULL, dest1.c1, dest1.c2, dest1.c3),
>>        concat_ws('**', dest1.c1, *NULL*, dest1.c3) FROM dest1
>>
>> POSTHOOK: type: QUERY
>> POSTHOOK: Input: default@dest1
>> #### A masked pattern was here ####
>> POSTHOOK: Lineage: dest1.c1 SIMPLE []
>> POSTHOOK: Lineage: dest1.c2 SIMPLE []
>> POSTHOOK: Lineage: dest1.c3 SIMPLE []
>>
>> xyzabc8675309        abc,xyz,8675309        NULL        abc****8675309
>
> which looks like NULLs are not skipped.
>
> What have I missed?
>
> Thanks, Alexey.
>



-- 

Best regards

Zotov Alexey
Grid Dynamics
Skype: azotcsit