You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2015/03/27 08:47:19 UTC
Discrepancy in String matching between Teradata and HIVE
Hi All,
I am getting into Hive and learning hive. I have customer table in teradata
, used sqoop to extract complete table in hive which worked fine.
See below customer table both in Teradata and HIVE.
*In Teradata :*
select TOP 4 id,name,'"'||status||'"' from customer;
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
2817987 Customer#002817987 "COMPLETE "
2817984 Customer#002817984 "BUILDING "
*In HIVE :*
select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
2817987 Customer#002817987 "COMPLETE "
2817984 Customer#002817984 "BUILDING "
When I tried to fetch records from table customer with column matching
which is of String type. I am getting different result for same query in
different environment.
See below query results..
*In Teradata :*
select TOP 2 id,name,'"'||status||'"' from customer WHERE status =
'BUILDING';
3172460 Customer#003172460 "BUILDING "
3017726 Customer#003017726 "BUILDING "
*In HIVE :*
select id,name,CONCAT ('"' , status , '"') from customer WHERE
status = 'BUILDING' LIMIT 2;
***<<No Result>>***
It seems that teradata is doing trimming short of thing before actually
comparing stating values. But Hive is matching strings as it is.
Not sure, It is expected behaviour or bug or can be raised as enhancement.
I see below possible solution:
- Convert into like operator expression with wildcard character before
and after
Looking forward for your response on this. How can it be handled/achieved
in hive.
Regards
Sanjiv Singh
Mob : +091 9990-447-339
Re: Discrepancy in String matching between Teradata and HIVE
Posted by Raunak Jhawar <ra...@gmail.com>.
Here you are trying to compare two strings ' BUILDING ' (this value has
extra white spaces) and 'BUILDING'. These strings are not same since one
has bunch of extra white-spaces and other doesn't. Moreover, length of both
strings is not same.
This would return 1: select if('ABC' == rtrim('ABC '), 1,0);
This would return 0: select if('ABC' == 'ABC ', 1,0);
--
Thanks,
Raunak Jhawar
On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:
> Hi,
>
> I can use rtrim function, i.e:
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2;
>
> But question raised what standard in string comparison Hive uses?
> According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
> <http://support.microsoft.com/en-us/kb/316626> for an article about it.
>
> Regards
> Sanjiv Singh
> Mob : +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Hive does not manipulate data by its own, if your processing logic needs
>> the trimming of spaces then you can provide that in query.
>>
>>
>>
>> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
>> wrote:
>>
>>>
>>> Hi All,
>>>
>>> I am getting into Hive and learning hive. I have customer table in
>>> teradata , used sqoop to extract complete table in hive which worked fine.
>>>
>>> See below customer table both in Teradata and HIVE.
>>>
>>> *In Teradata :*
>>>
>>> select TOP 4 id,name,'"'||status||'"' from customer;
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>> 2817987 Customer#002817987 "COMPLETE "
>>> 2817984 Customer#002817984 "BUILDING "
>>>
>>> *In HIVE :*
>>>
>>> select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>> 2817987 Customer#002817987 "COMPLETE "
>>> 2817984 Customer#002817984 "BUILDING "
>>>
>>> When I tried to fetch records from table customer with column matching
>>> which is of String type. I am getting different result for same query in
>>> different environment.
>>>
>>> See below query results..
>>>
>>> *In Teradata :*
>>>
>>> select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>>
>>> *In HIVE :*
>>>
>>> select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>>
>>> ***<<No Result>>***
>>>
>>> It seems that teradata is doing trimming short of thing before actually
>>> comparing stating values. But Hive is matching strings as it is.
>>>
>>> Not sure, It is expected behaviour or bug or can be raised as
>>> enhancement.
>>>
>>> I see below possible solution:
>>>
>>> - Convert into like operator expression with wildcard character
>>> before and after
>>>
>>> Looking forward for your response on this. How can it be
>>> handled/achieved in hive.
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob : +091 9990-447-339
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
Re: Discrepancy in String matching between Teradata and HIVE
Posted by Nitin Pawar <ni...@gmail.com>.
Hive is only PRO SQL compliance,
In hive the string comparisons work just like they would work in java
so in hive
"BUILDING" = "BUILDING"
"BUILDING " != "BUILDING" (extra space added)
On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:
> Hi,
>
> I can use rtrim function, i.e:
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2;
>
> But question raised what standard in string comparison Hive uses?
> According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
> <http://support.microsoft.com/en-us/kb/316626> for an article about it.
>
> Regards
> Sanjiv Singh
> Mob : +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Hive does not manipulate data by its own, if your processing logic needs
>> the trimming of spaces then you can provide that in query.
>>
>>
>>
>> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
>> wrote:
>>
>>>
>>> Hi All,
>>>
>>> I am getting into Hive and learning hive. I have customer table in
>>> teradata , used sqoop to extract complete table in hive which worked fine.
>>>
>>> See below customer table both in Teradata and HIVE.
>>>
>>> *In Teradata :*
>>>
>>> select TOP 4 id,name,'"'||status||'"' from customer;
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>> 2817987 Customer#002817987 "COMPLETE "
>>> 2817984 Customer#002817984 "BUILDING "
>>>
>>> *In HIVE :*
>>>
>>> select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>> 2817987 Customer#002817987 "COMPLETE "
>>> 2817984 Customer#002817984 "BUILDING "
>>>
>>> When I tried to fetch records from table customer with column matching
>>> which is of String type. I am getting different result for same query in
>>> different environment.
>>>
>>> See below query results..
>>>
>>> *In Teradata :*
>>>
>>> select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>>
>>> 3172460 Customer#003172460 "BUILDING "
>>> 3017726 Customer#003017726 "BUILDING "
>>>
>>> *In HIVE :*
>>>
>>> select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>>
>>> ***<<No Result>>***
>>>
>>> It seems that teradata is doing trimming short of thing before actually
>>> comparing stating values. But Hive is matching strings as it is.
>>>
>>> Not sure, It is expected behaviour or bug or can be raised as
>>> enhancement.
>>>
>>> I see below possible solution:
>>>
>>> - Convert into like operator expression with wildcard character
>>> before and after
>>>
>>> Looking forward for your response on this. How can it be
>>> handled/achieved in hive.
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob : +091 9990-447-339
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>
--
Nitin Pawar
Re: Discrepancy in String matching between Teradata and HIVE
Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Hi,
I can use rtrim function, i.e:
select id,name,CONCAT ('"' , status , '"') from customer WHERE
rtrim(status) = 'BUILDING' LIMIT 2;
But question raised what standard in string comparison Hive uses? According
to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
<http://support.microsoft.com/en-us/kb/316626> for an article about it.
Regards
Sanjiv Singh
Mob : +091 9990-447-339
On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
wrote:
> Hive does not manipulate data by its own, if your processing logic needs
> the trimming of spaces then you can provide that in query.
>
>
>
> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
> wrote:
>
>>
>> Hi All,
>>
>> I am getting into Hive and learning hive. I have customer table in
>> teradata , used sqoop to extract complete table in hive which worked fine.
>>
>> See below customer table both in Teradata and HIVE.
>>
>> *In Teradata :*
>>
>> select TOP 4 id,name,'"'||status||'"' from customer;
>>
>> 3172460 Customer#003172460 "BUILDING "
>> 3017726 Customer#003017726 "BUILDING "
>> 2817987 Customer#002817987 "COMPLETE "
>> 2817984 Customer#002817984 "BUILDING "
>>
>> *In HIVE :*
>>
>> select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>
>> 3172460 Customer#003172460 "BUILDING "
>> 3017726 Customer#003017726 "BUILDING "
>> 2817987 Customer#002817987 "COMPLETE "
>> 2817984 Customer#002817984 "BUILDING "
>>
>> When I tried to fetch records from table customer with column matching
>> which is of String type. I am getting different result for same query in
>> different environment.
>>
>> See below query results..
>>
>> *In Teradata :*
>>
>> select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>
>> 3172460 Customer#003172460 "BUILDING "
>> 3017726 Customer#003017726 "BUILDING "
>>
>> *In HIVE :*
>>
>> select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>
>> ***<<No Result>>***
>>
>> It seems that teradata is doing trimming short of thing before actually
>> comparing stating values. But Hive is matching strings as it is.
>>
>> Not sure, It is expected behaviour or bug or can be raised as enhancement.
>>
>> I see below possible solution:
>>
>> - Convert into like operator expression with wildcard character
>> before and after
>>
>> Looking forward for your response on this. How can it be handled/achieved
>> in hive.
>>
>> Regards
>> Sanjiv Singh
>> Mob : +091 9990-447-339
>>
>
>
>
> --
> Nitin Pawar
>
Re: Discrepancy in String matching between Teradata and HIVE
Posted by Nitin Pawar <ni...@gmail.com>.
Hive does not manipulate data by its own, if your processing logic needs
the trimming of spaces then you can provide that in query.
On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:
>
> Hi All,
>
> I am getting into Hive and learning hive. I have customer table in
> teradata , used sqoop to extract complete table in hive which worked fine.
>
> See below customer table both in Teradata and HIVE.
>
> *In Teradata :*
>
> select TOP 4 id,name,'"'||status||'"' from customer;
>
> 3172460 Customer#003172460 "BUILDING "
> 3017726 Customer#003017726 "BUILDING "
> 2817987 Customer#002817987 "COMPLETE "
> 2817984 Customer#002817984 "BUILDING "
>
> *In HIVE :*
>
> select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>
> 3172460 Customer#003172460 "BUILDING "
> 3017726 Customer#003017726 "BUILDING "
> 2817987 Customer#002817987 "COMPLETE "
> 2817984 Customer#002817984 "BUILDING "
>
> When I tried to fetch records from table customer with column matching
> which is of String type. I am getting different result for same query in
> different environment.
>
> See below query results..
>
> *In Teradata :*
>
> select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>
> 3172460 Customer#003172460 "BUILDING "
> 3017726 Customer#003017726 "BUILDING "
>
> *In HIVE :*
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>
> ***<<No Result>>***
>
> It seems that teradata is doing trimming short of thing before actually
> comparing stating values. But Hive is matching strings as it is.
>
> Not sure, It is expected behaviour or bug or can be raised as enhancement.
>
> I see below possible solution:
>
> - Convert into like operator expression with wildcard character before
> and after
>
> Looking forward for your response on this. How can it be handled/achieved
> in hive.
>
> Regards
> Sanjiv Singh
> Mob : +091 9990-447-339
>
--
Nitin Pawar