You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2015/03/27 08:47:19 UTC

Discrepancy in String matching between Teradata and HIVE

Hi All,

I am getting into Hive and learning hive. I have customer table in teradata
, used sqoop to extract complete table in hive which worked fine.

See below customer table both in Teradata and HIVE.

*In Teradata :*

    select TOP 4 id,name,'"'||status||'"' from customer;

    3172460     Customer#003172460  "BUILDING  "
    3017726     Customer#003017726  "BUILDING  "
    2817987     Customer#002817987  "COMPLETE  "
    2817984     Customer#002817984  "BUILDING  "

*In HIVE :*

    select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;

    3172460     Customer#003172460  "BUILDING  "
    3017726     Customer#003017726  "BUILDING  "
    2817987     Customer#002817987  "COMPLETE  "
    2817984     Customer#002817984  "BUILDING  "

When I tried to fetch records from table customer with column matching
which is of String type. I am getting different result for same query in
different environment.

See below query results..

*In Teradata :*

    select TOP 2 id,name,'"'||status||'"' from customer WHERE status =
'BUILDING';

    3172460     Customer#003172460  "BUILDING  "
    3017726     Customer#003017726  "BUILDING  "

*In HIVE :*

    select id,name,CONCAT ('"' , status , '"') from customer WHERE
status = 'BUILDING' LIMIT 2;

    ***<<No Result>>***

It seems that teradata is doing trimming short of thing before actually
comparing stating values. But Hive is matching strings as it is.

Not sure, It is expected behaviour or bug or can be raised as enhancement.

I see below possible solution:

   - Convert into like operator expression with wildcard character before
   and after

Looking forward for your response on this. How can it be handled/achieved
in hive.

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

Re: Discrepancy in String matching between Teradata and HIVE

Posted by Raunak Jhawar <ra...@gmail.com>.
Here you are trying to compare two strings '   BUILDING   ' (this value has
extra white spaces) and 'BUILDING'. These strings are not same since one
has bunch of extra white-spaces and other doesn't. Moreover, length of both
strings is not same.

This would return 1: select if('ABC' == rtrim('ABC  '), 1,0);
This would return 0: select if('ABC' == 'ABC  ', 1,0);
--
Thanks,
Raunak Jhawar







On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> Hi,
>
> I can use rtrim function, i.e:
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2;
>
> But question raised what standard in string comparison Hive uses?
> According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
> <http://support.microsoft.com/en-us/kb/316626> for an article about it.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Hive does not manipulate data by its own, if your processing logic needs
>> the trimming of spaces then you can provide that in query.
>>
>>
>>
>> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
>> wrote:
>>
>>>
>>>   Hi All,
>>>
>>> I am getting into Hive and learning hive. I have customer table in
>>> teradata , used sqoop to extract complete table in hive which worked fine.
>>>
>>> See below customer table both in Teradata and HIVE.
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 4 id,name,'"'||status||'"' from customer;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> When I tried to fetch records from table customer with column matching
>>> which is of String type. I am getting different result for same query in
>>> different environment.
>>>
>>> See below query results..
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>>
>>>     ***<<No Result>>***
>>>
>>> It seems that teradata is doing trimming short of thing before actually
>>> comparing stating values. But Hive is matching strings as it is.
>>>
>>> Not sure, It is expected behaviour or bug or can be raised as
>>> enhancement.
>>>
>>> I see below possible solution:
>>>
>>>    - Convert into like operator expression with wildcard character
>>>    before and after
>>>
>>> Looking forward for your response on this. How can it be
>>> handled/achieved in hive.
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>

Re: Discrepancy in String matching between Teradata and HIVE

Posted by Nitin Pawar <ni...@gmail.com>.
Hive is only PRO SQL compliance,

In hive the string comparisons work just like they would work in java

so in hive

"BUILDING" = "BUILDING"
"BUILDING " != "BUILDING" (extra space added)

On Fri, Mar 27, 2015 at 2:11 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

> Hi,
>
> I can use rtrim function, i.e:
>
> select id,name,CONCAT ('"' , status , '"') from customer WHERE rtrim(status) = 'BUILDING' LIMIT 2;
>
> But question raised what standard in string comparison Hive uses?
> According to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
> <http://support.microsoft.com/en-us/kb/316626> for an article about it.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>
> On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
> wrote:
>
>> Hive does not manipulate data by its own, if your processing logic needs
>> the trimming of spaces then you can provide that in query.
>>
>>
>>
>> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
>> wrote:
>>
>>>
>>>   Hi All,
>>>
>>> I am getting into Hive and learning hive. I have customer table in
>>> teradata , used sqoop to extract complete table in hive which worked fine.
>>>
>>> See below customer table both in Teradata and HIVE.
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 4 id,name,'"'||status||'"' from customer;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>     2817987     Customer#002817987  "COMPLETE  "
>>>     2817984     Customer#002817984  "BUILDING  "
>>>
>>> When I tried to fetch records from table customer with column matching
>>> which is of String type. I am getting different result for same query in
>>> different environment.
>>>
>>> See below query results..
>>>
>>> *In Teradata :*
>>>
>>>     select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>>
>>>     3172460     Customer#003172460  "BUILDING  "
>>>     3017726     Customer#003017726  "BUILDING  "
>>>
>>> *In HIVE :*
>>>
>>>     select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>>
>>>     ***<<No Result>>***
>>>
>>> It seems that teradata is doing trimming short of thing before actually
>>> comparing stating values. But Hive is matching strings as it is.
>>>
>>> Not sure, It is expected behaviour or bug or can be raised as
>>> enhancement.
>>>
>>> I see below possible solution:
>>>
>>>    - Convert into like operator expression with wildcard character
>>>    before and after
>>>
>>> Looking forward for your response on this. How can it be
>>> handled/achieved in hive.
>>>
>>> Regards
>>> Sanjiv Singh
>>> Mob :  +091 9990-447-339
>>>
>>
>>
>>
>> --
>> Nitin Pawar
>>
>
>


-- 
Nitin Pawar

Re: Discrepancy in String matching between Teradata and HIVE

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Hi,

I can use rtrim function, i.e:

select id,name,CONCAT ('"' , status , '"') from customer WHERE
rtrim(status) = 'BUILDING' LIMIT 2;

But question raised what standard in string comparison Hive uses? According
to ANSI/ISO SQL-92 'BUILDING' == 'BUILDING ', Here is a link
<http://support.microsoft.com/en-us/kb/316626> for an article about it.

Regards
Sanjiv Singh
Mob :  +091 9990-447-339

On Fri, Mar 27, 2015 at 1:41 PM, Nitin Pawar <ni...@gmail.com>
wrote:

> Hive does not manipulate data by its own, if your processing logic needs
> the trimming of spaces then you can provide that in query.
>
>
>
> On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
> wrote:
>
>>
>>   Hi All,
>>
>> I am getting into Hive and learning hive. I have customer table in
>> teradata , used sqoop to extract complete table in hive which worked fine.
>>
>> See below customer table both in Teradata and HIVE.
>>
>> *In Teradata :*
>>
>>     select TOP 4 id,name,'"'||status||'"' from customer;
>>
>>     3172460     Customer#003172460  "BUILDING  "
>>     3017726     Customer#003017726  "BUILDING  "
>>     2817987     Customer#002817987  "COMPLETE  "
>>     2817984     Customer#002817984  "BUILDING  "
>>
>> *In HIVE :*
>>
>>     select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>>
>>     3172460     Customer#003172460  "BUILDING  "
>>     3017726     Customer#003017726  "BUILDING  "
>>     2817987     Customer#002817987  "COMPLETE  "
>>     2817984     Customer#002817984  "BUILDING  "
>>
>> When I tried to fetch records from table customer with column matching
>> which is of String type. I am getting different result for same query in
>> different environment.
>>
>> See below query results..
>>
>> *In Teradata :*
>>
>>     select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>>
>>     3172460     Customer#003172460  "BUILDING  "
>>     3017726     Customer#003017726  "BUILDING  "
>>
>> *In HIVE :*
>>
>>     select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>>
>>     ***<<No Result>>***
>>
>> It seems that teradata is doing trimming short of thing before actually
>> comparing stating values. But Hive is matching strings as it is.
>>
>> Not sure, It is expected behaviour or bug or can be raised as enhancement.
>>
>> I see below possible solution:
>>
>>    - Convert into like operator expression with wildcard character
>>    before and after
>>
>> Looking forward for your response on this. How can it be handled/achieved
>> in hive.
>>
>> Regards
>> Sanjiv Singh
>> Mob :  +091 9990-447-339
>>
>
>
>
> --
> Nitin Pawar
>

Re: Discrepancy in String matching between Teradata and HIVE

Posted by Nitin Pawar <ni...@gmail.com>.
Hive does not manipulate data by its own, if your processing logic needs
the trimming of spaces then you can provide that in query.



On Fri, Mar 27, 2015 at 1:17 PM, @Sanjiv Singh <sa...@gmail.com>
wrote:

>
>   Hi All,
>
> I am getting into Hive and learning hive. I have customer table in
> teradata , used sqoop to extract complete table in hive which worked fine.
>
> See below customer table both in Teradata and HIVE.
>
> *In Teradata :*
>
>     select TOP 4 id,name,'"'||status||'"' from customer;
>
>     3172460     Customer#003172460  "BUILDING  "
>     3017726     Customer#003017726  "BUILDING  "
>     2817987     Customer#002817987  "COMPLETE  "
>     2817984     Customer#002817984  "BUILDING  "
>
> *In HIVE :*
>
>     select id,name,CONCAT ('"' , status , '"') from customer LIMIT 4;
>
>     3172460     Customer#003172460  "BUILDING  "
>     3017726     Customer#003017726  "BUILDING  "
>     2817987     Customer#002817987  "COMPLETE  "
>     2817984     Customer#002817984  "BUILDING  "
>
> When I tried to fetch records from table customer with column matching
> which is of String type. I am getting different result for same query in
> different environment.
>
> See below query results..
>
> *In Teradata :*
>
>     select TOP 2 id,name,'"'||status||'"' from customer WHERE status = 'BUILDING';
>
>     3172460     Customer#003172460  "BUILDING  "
>     3017726     Customer#003017726  "BUILDING  "
>
> *In HIVE :*
>
>     select id,name,CONCAT ('"' , status , '"') from customer WHERE status = 'BUILDING' LIMIT 2;
>
>     ***<<No Result>>***
>
> It seems that teradata is doing trimming short of thing before actually
> comparing stating values. But Hive is matching strings as it is.
>
> Not sure, It is expected behaviour or bug or can be raised as enhancement.
>
> I see below possible solution:
>
>    - Convert into like operator expression with wildcard character before
>    and after
>
> Looking forward for your response on this. How can it be handled/achieved
> in hive.
>
> Regards
> Sanjiv Singh
> Mob :  +091 9990-447-339
>



-- 
Nitin Pawar