You are viewing a plain text version of this content. The canonical link for it is here.

Posted to user@hive.apache.org by Sai Sai <sa...@yahoo.in> on 2013/05/24 12:59:47 UTC

Re: Difference between like %A% and %a%


Just wondering about this, please let me know if you have any suggestions why we r getting these results:

This  query does not return any data:

Query1:hive (test)> select full_name from states where abbreviation like '%a%';


But this query returns data successfully:

Query2:hive (test)> select full_name from states where abbreviation like '%A%';

Result of Query 1:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201305240156_0012, Tracking URL = http://ubuntu:50030/jobdetails.jsp?jobid=job_201305240156_0012
Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305240156_0012
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-05-24 03:51:04,939 Stage-1 map = 0%,  reduce = 0%
2013-05-24 03:51:10,970 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:11,983 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:12,988 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:13,995 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:15,004 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:16,013 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.46 sec
2013-05-24 03:51:17,020 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.46 sec
MapReduce Total cumulative CPU time: 460 msec
Ended Job = job_201305240156_0012
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.46 sec   HDFS Read: 848 HDFS Write: 0 SUCCESS
Total MapReduce CPU Time Spent: 460 msec
OK
full_name
Time taken: 19.558 seconds

But this query returns data successfully:

hive (test)> select full_name from states where abbreviation like '%A%';

Result of Query2:


Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201305240156_0011, Tracking URL = http://ubuntu:50030/jobdetails.jsp?jobid=job_201305240156_0011
Kill Command = /home/satish/work/hadoop-1.0.4/libexec/../bin/hadoop job  -kill job_201305240156_0011
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-05-24 03:50:32,163 Stage-1 map = 0%,  reduce = 0%
2013-05-24 03:50:38,193 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:39,196 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:40,199 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:41,206 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:42,210 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:43,221 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.47 sec
2013-05-24 03:50:44,227 Stage-1 map = 100%,  reduce = 100%, Cumulative CPU 0.47 sec
MapReduce Total cumulative CPU time: 470 msec
Ended Job = job_201305240156_0011
MapReduce Jobs Launched: 
Job 0: Map: 1   Cumulative CPU: 0.47 sec   HDFS Read: 848 HDFS Write: 115 SUCCESS
Total MapReduce CPU Time Spent: 470 msec
OK
full_name
Alabama
Alaska
Arizona
Arkansas
California
Georgia
Iowa
Louisiana
Massachusetts  
Pennsylvania
Virginia
Washington
Time taken: 20.551 seconds

Thanks
Sai

Re: Where to find the external table file in HDFS

Posted by Sai Sai <sa...@yahoo.in>.

I have created an external table states and loaded it from a file under /tmp/states.txt

Then in the url: 

http://localhost.localdomain:50070/dfshealth.jsp

I have looked to see if this file states table exists and do not see it.
Just wondering if it is saved in hdfs or not.

How many days will the files exist under /tmp folder.
Thanks
Sai

Re: Where can we see the results of Select * from states

Posted by Jov <am...@amutu.com>.

you can write data into filesystem from query using "INSERT OVERWRITE
[LOCAL] DIRECTORY directory1 SELECT ... FROM ..."

more detail:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DML#LanguageManualDML-Writingdataintofilesystemfromqueries


2013/5/24 Sai Sai <sa...@yahoo.in>

> I have created an external table called states under a database called
> test,
> Then loaded the table successfully;
> The i have tried:
>
> Select * from states;
>
> It successfully executes MR and displays the results in the console but
> wondering where to look in hdfs to see these results.
>
> I have looked under all the dirs in filesystem for the below url but
> cannot see the results part file.
>
> http://localhost.localdomain:50070/dfshealth.jsp
>
> Also if i would like the results to save to a specific file from a query
> how to do it?
>
> For Ex:
>     Select * from states > myStates.txt ;
> Is there something like this.
> Thanks
> Sai
>
>
>


-- 
Jov
blog: http:amutu.com/blog <http://amutu.com/blog>

Re: Where can we see the results of Select * from states

Posted by Sai Sai <sa...@yahoo.in>.

I have created an external table called states under a database called test,
Then loaded the table successfully;
The i have tried:

Select * from states;

It successfully executes MR and displays the results in the console but wondering where to look in hdfs to see these results.

I have looked under all the dirs in filesystem for the below url but cannot see the results part file.

http://localhost.localdomain:50070/dfshealth.jsp


Also if i would like the results to save to a specific file from a query how to do it?

For Ex: 
    Select * from states > myStates.txt ;
Is there something like this.
Thanks
Sai

Re: Difference between like %A% and %a%

Posted by Edward Capriolo <ed...@gmail.com>.

It is not really a bug, as must as it is the way hive is designed.

https://issues.apache.org/jira/browse/HIVE-4070#comment-13666362

So there already is a 'like' and an 'rlike', mlike is a good idea. It seems
like an easier UDF (low hanging fruit) type issue anyone could tackle.


On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <de...@gmail.com> wrote:

> Hortonworks has announced plans to make Hive more SQL compliant. I suspect
> bugs like this will be addressed sooner or later. It will be necessary to
> handle backwards compatibility, but that could be handled with a hive
> property that enables one or the other behaviors.
>
> On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:
>
>> I have mentioned this before, and I think this a big miss by the Hive
>> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
>> case sensitive. Thus when you have new users moving over to Hive, if they
>> see a command like "like" they will assume similarity (like many other SQL
>> like qualities) and thus false negatives may ensue.  Even though it's
>> different by default (I am ok with this ... I guess, my personal preference
>> is that it matches the defaults on other systems, and outside of that
>> (which I am, in in the end fine with, just grumbly :) ) give us the ability
>> to set that behavior in the hive-site.xml.  That way when an org realizes
>> that it is different, and their users are all getting false negatives, they
>> can just update the hive-site and fix the problem rather than have to
>> include it in training that may or may not work.  I've added this comment
>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278  for
>> fun. :)
>>
>> Please? :)
>>
>>
>>
>>
>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>>
>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>> state name. You got the correct answer.
>>>
>>>
>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>>
>>>> But it should get more results for this:
>>>>
>>>> %a%
>>>>
>>>> than for
>>>>
>>>> %A%
>>>>
>>>> Please let me know if i am missing something.
>>>> Thanks
>>>> Sai
>>>>
>>>>
>>>>    ------------------------------
>>>>  *From:* Jov <am...@amutu.com>
>>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>
>>>>
>>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>>
>>>> abbreviation l
>>>>
>>>>
>>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>>> with '%a%'.
>>>>
>>>>
>>>> --
>>>> Jov
>>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dean Wampler, Ph.D.
>>> @deanwampler
>>> http://polyglotprogramming.com
>>>
>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com

Re: Difference between like %A% and %a%

Posted by Anthony Urso <an...@cs.ucla.edu>.

Postgres/Vertica and their ilk have ILIKE which is a case-insensitive
version of LIKE, in addition to the case-sensitive LIKE. Works well having
both.

Cheers,
Anthony


On Fri, May 24, 2013 at 8:58 AM, Edward Capriolo <ed...@gmail.com>wrote:

> It is not as simple of a problem as you think. Mysql has the same problem
> just most everyone uses a default charset and comparator.
>
> http://www.bluebox.net/about/blog/2009/07/mysql_encoding/
>
> You do you account for foreign characters like the a~ etc. is that > then
> A and less then <
>
>
> On Fri, May 24, 2013 at 11:41 AM, Dean Wampler <de...@gmail.com>wrote:
>
>> If backwards compatibility wasn't an issue, the hive code that implements
>> LIKE could be changed to convert the fields and LIKE strings to lower case
>> before comparing ;) Of course, there is overhead doing that.
>>
>> On Fri, May 24, 2013 at 9:50 AM, Edward Capriolo <ed...@gmail.com>wrote:
>>
>>> Also I am thinking that the rlike is based on regex and can be told to
>>> do case insensitive matching.
>>>
>>>
>>> On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <de...@gmail.com>wrote:
>>>
>>>> Hortonworks has announced plans to make Hive more SQL compliant. I
>>>> suspect bugs like this will be addressed sooner or later. It will be
>>>> necessary to handle backwards compatibility, but that could be handled with
>>>> a hive property that enables one or the other behaviors.
>>>>
>>>> On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:
>>>>
>>>>> I have mentioned this before, and I think this a big miss by the Hive
>>>>> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
>>>>> case sensitive. Thus when you have new users moving over to Hive, if they
>>>>> see a command like "like" they will assume similarity (like many other SQL
>>>>> like qualities) and thus false negatives may ensue.  Even though it's
>>>>> different by default (I am ok with this ... I guess, my personal preference
>>>>> is that it matches the defaults on other systems, and outside of that
>>>>> (which I am, in in the end fine with, just grumbly :) ) give us the ability
>>>>> to set that behavior in the hive-site.xml.  That way when an org realizes
>>>>> that it is different, and their users are all getting false negatives, they
>>>>> can just update the hive-site and fix the problem rather than have to
>>>>> include it in training that may or may not work.  I've added this comment
>>>>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for fun. :)
>>>>>
>>>>> Please? :)
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>>>>>
>>>>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>>>>> state name. You got the correct answer.
>>>>>>
>>>>>>
>>>>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>>>>>
>>>>>>> But it should get more results for this:
>>>>>>>
>>>>>>> %a%
>>>>>>>
>>>>>>> than for
>>>>>>>
>>>>>>> %A%
>>>>>>>
>>>>>>> Please let me know if i am missing something.
>>>>>>> Thanks
>>>>>>> Sai
>>>>>>>
>>>>>>>
>>>>>>>    ------------------------------
>>>>>>>  *From:* Jov <am...@amutu.com>
>>>>>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>>>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>>>>
>>>>>>>
>>>>>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>>>>>
>>>>>>> abbreviation l
>>>>>>>
>>>>>>>
>>>>>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>>>>>> with '%a%'.
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jov
>>>>>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Dean Wampler, Ph.D.
>>>>>> @deanwampler
>>>>>> http://polyglotprogramming.com
>>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dean Wampler, Ph.D.
>>>> @deanwampler
>>>> http://polyglotprogramming.com
>>>
>>>
>>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> @deanwampler
>> http://polyglotprogramming.com
>
>
>

Re: Difference between like %A% and %a%

Posted by Edward Capriolo <ed...@gmail.com>.

It is not as simple of a problem as you think. Mysql has the same problem
just most everyone uses a default charset and comparator.

http://www.bluebox.net/about/blog/2009/07/mysql_encoding/

You do you account for foreign characters like the a~ etc. is that > then A
and less then <


On Fri, May 24, 2013 at 11:41 AM, Dean Wampler <de...@gmail.com>wrote:

> If backwards compatibility wasn't an issue, the hive code that implements
> LIKE could be changed to convert the fields and LIKE strings to lower case
> before comparing ;) Of course, there is overhead doing that.
>
> On Fri, May 24, 2013 at 9:50 AM, Edward Capriolo <ed...@gmail.com>wrote:
>
>> Also I am thinking that the rlike is based on regex and can be told to do
>> case insensitive matching.
>>
>>
>> On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <de...@gmail.com>wrote:
>>
>>> Hortonworks has announced plans to make Hive more SQL compliant. I
>>> suspect bugs like this will be addressed sooner or later. It will be
>>> necessary to handle backwards compatibility, but that could be handled with
>>> a hive property that enables one or the other behaviors.
>>>
>>> On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:
>>>
>>>> I have mentioned this before, and I think this a big miss by the Hive
>>>> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
>>>> case sensitive. Thus when you have new users moving over to Hive, if they
>>>> see a command like "like" they will assume similarity (like many other SQL
>>>> like qualities) and thus false negatives may ensue.  Even though it's
>>>> different by default (I am ok with this ... I guess, my personal preference
>>>> is that it matches the defaults on other systems, and outside of that
>>>> (which I am, in in the end fine with, just grumbly :) ) give us the ability
>>>> to set that behavior in the hive-site.xml.  That way when an org realizes
>>>> that it is different, and their users are all getting false negatives, they
>>>> can just update the hive-site and fix the problem rather than have to
>>>> include it in training that may or may not work.  I've added this comment
>>>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for fun. :)
>>>>
>>>> Please? :)
>>>>
>>>>
>>>>
>>>>
>>>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>>>>
>>>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>>>> state name. You got the correct answer.
>>>>>
>>>>>
>>>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>>>>
>>>>>> But it should get more results for this:
>>>>>>
>>>>>> %a%
>>>>>>
>>>>>> than for
>>>>>>
>>>>>> %A%
>>>>>>
>>>>>> Please let me know if i am missing something.
>>>>>> Thanks
>>>>>> Sai
>>>>>>
>>>>>>
>>>>>>    ------------------------------
>>>>>>  *From:* Jov <am...@amutu.com>
>>>>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>>>
>>>>>>
>>>>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>>>>
>>>>>> abbreviation l
>>>>>>
>>>>>>
>>>>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>>>>> with '%a%'.
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Jov
>>>>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Dean Wampler, Ph.D.
>>>>> @deanwampler
>>>>> http://polyglotprogramming.com
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dean Wampler, Ph.D.
>>> @deanwampler
>>> http://polyglotprogramming.com
>>
>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com

Re: Difference between like %A% and %a%

Posted by Dean Wampler <de...@gmail.com>.

If backwards compatibility wasn't an issue, the hive code that implements
LIKE could be changed to convert the fields and LIKE strings to lower case
before comparing ;) Of course, there is overhead doing that.

On Fri, May 24, 2013 at 9:50 AM, Edward Capriolo <ed...@gmail.com>wrote:

> Also I am thinking that the rlike is based on regex and can be told to do
> case insensitive matching.
>
>
> On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <de...@gmail.com>wrote:
>
>> Hortonworks has announced plans to make Hive more SQL compliant. I
>> suspect bugs like this will be addressed sooner or later. It will be
>> necessary to handle backwards compatibility, but that could be handled with
>> a hive property that enables one or the other behaviors.
>>
>> On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:
>>
>>> I have mentioned this before, and I think this a big miss by the Hive
>>> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
>>> case sensitive. Thus when you have new users moving over to Hive, if they
>>> see a command like "like" they will assume similarity (like many other SQL
>>> like qualities) and thus false negatives may ensue.  Even though it's
>>> different by default (I am ok with this ... I guess, my personal preference
>>> is that it matches the defaults on other systems, and outside of that
>>> (which I am, in in the end fine with, just grumbly :) ) give us the ability
>>> to set that behavior in the hive-site.xml.  That way when an org realizes
>>> that it is different, and their users are all getting false negatives, they
>>> can just update the hive-site and fix the problem rather than have to
>>> include it in training that may or may not work.  I've added this comment
>>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278 for fun. :)
>>>
>>> Please? :)
>>>
>>>
>>>
>>>
>>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>>>
>>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>>> state name. You got the correct answer.
>>>>
>>>>
>>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>>>
>>>>> But it should get more results for this:
>>>>>
>>>>> %a%
>>>>>
>>>>> than for
>>>>>
>>>>> %A%
>>>>>
>>>>> Please let me know if i am missing something.
>>>>> Thanks
>>>>> Sai
>>>>>
>>>>>
>>>>>    ------------------------------
>>>>>  *From:* Jov <am...@amutu.com>
>>>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>>
>>>>>
>>>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>>>
>>>>> abbreviation l
>>>>>
>>>>>
>>>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>>>> with '%a%'.
>>>>>
>>>>>
>>>>> --
>>>>> Jov
>>>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>> --
>>>> Dean Wampler, Ph.D.
>>>> @deanwampler
>>>> http://polyglotprogramming.com
>>>>
>>>
>>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> @deanwampler
>> http://polyglotprogramming.com
>
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com

Re: Difference between like %A% and %a%

Posted by Edward Capriolo <ed...@gmail.com>.

Also I am thinking that the rlike is based on regex and can be told to do
case insensitive matching.


On Fri, May 24, 2013 at 9:16 AM, Dean Wampler <de...@gmail.com> wrote:

> Hortonworks has announced plans to make Hive more SQL compliant. I suspect
> bugs like this will be addressed sooner or later. It will be necessary to
> handle backwards compatibility, but that could be handled with a hive
> property that enables one or the other behaviors.
>
> On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:
>
>> I have mentioned this before, and I think this a big miss by the Hive
>> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
>> case sensitive. Thus when you have new users moving over to Hive, if they
>> see a command like "like" they will assume similarity (like many other SQL
>> like qualities) and thus false negatives may ensue.  Even though it's
>> different by default (I am ok with this ... I guess, my personal preference
>> is that it matches the defaults on other systems, and outside of that
>> (which I am, in in the end fine with, just grumbly :) ) give us the ability
>> to set that behavior in the hive-site.xml.  That way when an org realizes
>> that it is different, and their users are all getting false negatives, they
>> can just update the hive-site and fix the problem rather than have to
>> include it in training that may or may not work.  I've added this comment
>> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278  for
>> fun. :)
>>
>> Please? :)
>>
>>
>>
>>
>> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>>
>>> Your where clause looks at the abbreviation, requiring 'A', not the
>>> state name. You got the correct answer.
>>>
>>>
>>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>>
>>>> But it should get more results for this:
>>>>
>>>> %a%
>>>>
>>>> than for
>>>>
>>>> %A%
>>>>
>>>> Please let me know if i am missing something.
>>>> Thanks
>>>> Sai
>>>>
>>>>
>>>>    ------------------------------
>>>>  *From:* Jov <am...@amutu.com>
>>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>>> *Subject:* Re: Difference between like %A% and %a%
>>>>
>>>>
>>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>>
>>>> abbreviation l
>>>>
>>>>
>>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>>> with '%a%'.
>>>>
>>>>
>>>> --
>>>> Jov
>>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> Dean Wampler, Ph.D.
>>> @deanwampler
>>> http://polyglotprogramming.com
>>>
>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com

Re: Difference between like %A% and %a%

Posted by Dean Wampler <de...@gmail.com>.

Hortonworks has announced plans to make Hive more SQL compliant. I suspect
bugs like this will be addressed sooner or later. It will be necessary to
handle backwards compatibility, but that could be handled with a hive
property that enables one or the other behaviors.

On Fri, May 24, 2013 at 8:07 AM, John Omernik <jo...@omernik.com> wrote:

> I have mentioned this before, and I think this a big miss by the Hive
> team.  Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not
> case sensitive. Thus when you have new users moving over to Hive, if they
> see a command like "like" they will assume similarity (like many other SQL
> like qualities) and thus false negatives may ensue.  Even though it's
> different by default (I am ok with this ... I guess, my personal preference
> is that it matches the defaults on other systems, and outside of that
> (which I am, in in the end fine with, just grumbly :) ) give us the ability
> to set that behavior in the hive-site.xml.  That way when an org realizes
> that it is different, and their users are all getting false negatives, they
> can just update the hive-site and fix the problem rather than have to
> include it in training that may or may not work.  I've added this comment
> to https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278  for
> fun. :)
>
> Please? :)
>
>
>
>
> On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com>wrote:
>
>> Your where clause looks at the abbreviation, requiring 'A', not the state
>> name. You got the correct answer.
>>
>>
>> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>>
>>> But it should get more results for this:
>>>
>>> %a%
>>>
>>> than for
>>>
>>> %A%
>>>
>>> Please let me know if i am missing something.
>>> Thanks
>>> Sai
>>>
>>>
>>>    ------------------------------
>>>  *From:* Jov <am...@amutu.com>
>>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>>> *Sent:* Friday, 24 May 2013 4:39 PM
>>> *Subject:* Re: Difference between like %A% and %a%
>>>
>>>
>>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>>
>>> abbreviation l
>>>
>>>
>>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal
>>> with '%a%'.
>>>
>>>
>>> --
>>> Jov
>>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>>
>>>
>>>
>>
>>
>> --
>> Dean Wampler, Ph.D.
>> @deanwampler
>> http://polyglotprogramming.com
>>
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com

Re: Difference between like %A% and %a%

Posted by John Omernik <jo...@omernik.com>.

I have mentioned this before, and I think this a big miss by the Hive team.
 Like, by default in many SQL RDBMS (like MSSQL or MYSQL)  is not case
sensitive. Thus when you have new users moving over to Hive, if they see a
command like "like" they will assume similarity (like many other SQL like
qualities) and thus false negatives may ensue.  Even though it's different
by default (I am ok with this ... I guess, my personal preference is that
it matches the defaults on other systems, and outside of that (which I am,
in in the end fine with, just grumbly :) ) give us the ability to set
that behavior in the hive-site.xml.  That way when an org realizes that it
is different, and their users are all getting false negatives, they can
just update the hive-site and fix the problem rather than have to include
it in training that may or may not work.  I've added this comment to
https://issues.apache.org/jira/browse/HIVE-4070#comment-13666278  for fun.
:)

Please? :)

On Fri, May 24, 2013 at 7:53 AM, Dean Wampler <de...@gmail.com> wrote:

> Your where clause looks at the abbreviation, requiring 'A', not the state
> name. You got the correct answer.
>
>
> On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:
>
>> But it should get more results for this:
>>
>> %a%
>>
>> than for
>>
>> %A%
>>
>> Please let me know if i am missing something.
>> Thanks
>> Sai
>>
>>
>>    ------------------------------
>>  *From:* Jov <am...@amutu.com>
>> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
>> *Sent:* Friday, 24 May 2013 4:39 PM
>> *Subject:* Re: Difference between like %A% and %a%
>>
>>
>> 2013/5/24 Sai Sai <sa...@yahoo.in>
>>
>> abbreviation l
>>
>>
>> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal with
>> '%a%'.
>>
>>
>> --
>> Jov
>> blog: http:amutu.com/blog <http://amutu.com/blog>
>>
>>
>>
>
>
> --
> Dean Wampler, Ph.D.
> @deanwampler
> http://polyglotprogramming.com
>

Re: Difference between like %A% and %a%

Posted by Dean Wampler <de...@gmail.com>.

Your where clause looks at the abbreviation, requiring 'A', not the state
name. You got the correct answer.


On Fri, May 24, 2013 at 6:21 AM, Sai Sai <sa...@yahoo.in> wrote:

> But it should get more results for this:
>
> %a%
>
> than for
>
> %A%
>
> Please let me know if i am missing something.
> Thanks
> Sai
>
>
>   ------------------------------
>  *From:* Jov <am...@amutu.com>
> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
> *Sent:* Friday, 24 May 2013 4:39 PM
> *Subject:* Re: Difference between like %A% and %a%
>
>
> 2013/5/24 Sai Sai <sa...@yahoo.in>
>
> abbreviation l
>
>
> unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal with
> '%a%'.
>
>
> --
> Jov
> blog: http:amutu.com/blog <http://amutu.com/blog>
>
>
>


-- 
Dean Wampler, Ph.D.
@deanwampler
http://polyglotprogramming.com

Re: How to look at the metadata of the tables we have created.

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.

I have an earlier version ER diagram of Hive….There are not too many tables and u can navigate to them and poke around easily

For example I created a hive locations alias for myself (and I think this is already implemented in 10.0) where u can get the HDFS location for each partition

select t.TBL_NAME, p.PART_NAME, s.LOCATION from PARTITIONS p, SDS s, TBLS t where t.TBL_ID=p.TBL_ID and p.SD_ID=s.SD_ID




From: Stephen Sprague <sp...@gmail.com>>
Reply-To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>
Date: Friday, May 24, 2013 3:19 PM
To: "user@hive.apache.org<ma...@hive.apache.org>" <us...@hive.apache.org>>, Sai Sai <sa...@yahoo.in>>
Subject: Re: How to look at the metadata of the tables we have created.

yes. there's this high faluntin' concept called the 'metastore'.  should you google it in conjunction with 'hive'  the answer shall be before you.


On Fri, May 24, 2013 at 4:30 AM, Sai Sai <sa...@yahoo.in>> wrote:
Is it possible to look at the metadata of the databases/tables/views we have created in hive.
Is there some thing like sysobjects in hive.
Thanks
Sai


CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: How to look at the metadata of the tables we have created.

Posted by Stephen Sprague <sp...@gmail.com>.

yes. there's this high faluntin' concept called the 'metastore'.  should
you google it in conjunction with 'hive'  the answer shall be before you.

On Fri, May 24, 2013 at 4:30 AM, Sai Sai <sa...@yahoo.in> wrote:

> Is it possible to look at the metadata of the databases/tables/views we
> have created in hive.
> Is there some thing like sysobjects in hive.
> Thanks
> Sai
>

Re: Partitioning confusion

Posted by Nitin Pawar <ni...@gmail.com>.

if you have a 250GB file, then how did it become 2.5TB ?

if you can not write a mapreduce job to process to write your data into
specific partitions, then the other way around would be load the entire
data into a temporary table and then load data into partitioned tabled and
then drop the temporary table.


approach would be
1) Create a temporary table
CREATE TABLE temp_employees (name STRING, salary FLOAT, subordinates
ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING,
city:STRING, state:STRING, zip:INT, country:STRING> );

2) Load data into this table
LOAD DATA LOCAL INPATH
'/home/satish/data/employees/input/employees-country.txt'
INTO TABLE temp_employees;

3) Create partitoned table
CREATE TABLE employees (name STRING, salary FLOAT, subordinates
ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING,
city:STRING, state:STRING, zip:INT, country:STRING> ) PARTITIONED BY
(country STRING, state STRING);

2) Load data from temporary data into partitioned tables
   2a) if there are already partitions existing then be careful giving a
blanket query or it will overwrite data into exisiting partitions

insert overwrite employees partition(country='USA',state='IL') select *
from temp_employees where country="USA" and state ="IL";
this you will need to do for all of them


   2b) if your partitioned table is empty currently then you can just
enable dynamic partitioning and hive will take care of adding correct data
into individual partitions
set hive.exec.dynamic.partition.mode=nonstrict;
set hive.exec.dynamic.partition=true;

insert overwrite table employees select * from employees;

3) drop the temporary table

by doing above approach you will have your data being replicated 2x times
of replication factor for sometime but then once you drop temporary table
it will be normal HDFS replication factor size data.

Its been sometime I have worked on hive so others may have better
approaches as well, so wait for someone to correct me before going further
:)


On Mon, May 27, 2013 at 2:13 PM, Sai Sai <sa...@yahoo.in> wrote:

> Nitin
> I am still confused, from the below data that  i have given should the
> file which sits in the folder Country=USA and state=IL have only the rows
> where Country=USA and state=IL or will it have rows of other countries also.
> The reason i ask is because if we have a 250GB file and would like to
> create 10 partitions that would end up in 2.5 TB * 3 = 7.5TB. Is this
> expected.
> Thanks
> S
>
>   ------------------------------
>  *From:* Nitin Pawar <ni...@gmail.com>
> *To:* user@hive.apache.org; Sai Sai <sa...@yahoo.in>
> *Sent:* Monday, 27 May 2013 2:08 PM
> *Subject:* Re: Partitioning confusion
>
> when you specify the load data query with specific partition, it will put
> the entire data into that partition.
>
>
>
> On Mon, May 27, 2013 at 1:08 PM, Sai Sai <sa...@yahoo.in> wrote:
>
>
> After creating a partition for a country (USA) and state (IL) and when we
> go to the the hdfs site to look at the partition in the browser we r seeing
>  all the records for all the countries and states rather than just for the
> partition created for US and IL given below, is this correct behavior:
> ********************
> Here is my commands:
> ********************
>
> CREATE TABLE employees (name STRING, salary FLOAT, subordinates
> ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING,
> city:STRING, state:STRING, zip:INT, country:STRING> ) PARTITIONED BY
> (country STRING, state STRING);
>
> LOAD DATA LOCAL INPATH
> '/home/satish/data/employees/input/employees-country.txt' INTO TABLE
> employees PARTITION (country='USA',state='IL');
>
> ********************
> Here is my original data file, where i have a few countries data such as
> USA, INDIA, UK, AUS:
> ********************
>
> John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State
> Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
> Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100
> Ontario St.ChicagoIL60601USA
> Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago
> Ave.Oak ParkIL60700USA
> Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure
> Dr.ObscuriaIL60100USA
> Boss Man200000.0John DoeFred FinanceFederal Taxes.3State
> Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
> Fred Finance150000.0Stacy AccountantFederal Taxes.3State
> Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
> Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main
> St.NapervilleIL60563USA
> John Doe 2100000.0Mary SmithTodd JonesFederal Taxes.2State
> Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
> Mary Smith 280000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100
> Ontario St.ChicagoIL60601INDIA
> Todd Jones 270000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago
> Ave.Oak ParkIL60700AUSTRALIA
> Bill King 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure
> Dr.ObscuriaIL60100AUSTRALIA
> Boss Man2 200000.0John DoeFred FinanceFederal Taxes.3State
> Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
> Fred Finance 2150000.0Stacy AccountantFederal Taxes.3State
> Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
> Stacy Accountant 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Main
> St.NapervilleIL60563UK
> ********************
> Now when i navigate to:
> Contents of directory
> /user/hive/warehouse/db1.db/employees/country=USA/state=IL
> ********************
> I see all the records and was wondering if it should have only USA & IL
> records.
> Please help.
>
>
>
>
> --
> Nitin Pawar
>
>
>


-- 
Nitin Pawar

Re: Partitioning confusion

Posted by Sai Sai <sa...@yahoo.in>.

Nitin
I am still confused, from the below data that  i have given should the file which sits in the folder Country=USA and state=IL have only the rows where Country=USA and state=IL or will it have rows of other countries also.
The reason i ask is because if we have a 250GB file and would like to create 10 partitions that would end up in 2.5 TB * 3 = 7.5TB. Is this expected.
Thanks
S


________________________________
 From: Nitin Pawar <ni...@gmail.com>
To: user@hive.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Monday, 27 May 2013 2:08 PM
Subject: Re: Partitioning confusion
 


when you specify the load data query with specific partition, it will put the entire data into that partition. 




On Mon, May 27, 2013 at 1:08 PM, Sai Sai <sa...@yahoo.in> wrote:


>
>After creating a partition for a country (USA) and state (IL) and when we go to the the hdfs site to look at the partition in the browser we r seeing  all the records for all the countries and states rather than just for the partition created for US and IL given below, is this correct behavior:
>********************
>Here is my commands:
>********************
>
>
>
>CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING> ) PARTITIONED BY (country STRING, state STRING);
>
>
>LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA',state='IL');
>
>
>********************
>
>Here is my original data file, where i have a few countries data such as USA, INDIA, UK, AUS:
>********************
>
>
>
>John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
>Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
>Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
>Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
>Boss Man200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
>Fred Finance150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
>Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
>John Doe 2100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
>Mary Smith 280000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
>Todd Jones 270000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
>Bill King 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
>Boss Man2 200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
>Fred Finance 2150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
>Stacy Accountant 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563UK
>********************
>
>Now when i navigate to:
>Contents of directory /user/hive/warehouse/db1.db/employees/country=USA/state=IL
>
>********************
>
>I see all the records and was wondering if it should have only USA & IL records.
>Please help.


-- 
Nitin Pawar

Re: Partitioning confusion

Posted by Nitin Pawar <ni...@gmail.com>.

when you specify the load data query with specific partition, it will put
the entire data into that partition.



On Mon, May 27, 2013 at 1:08 PM, Sai Sai <sa...@yahoo.in> wrote:

>
> After creating a partition for a country (USA) and state (IL) and when we
> go to the the hdfs site to look at the partition in the browser we r seeing
>  all the records for all the countries and states rather than just for the
> partition created for US and IL given below, is this correct behavior:
> ********************
> Here is my commands:
> ********************
>
> CREATE TABLE employees (name STRING, salary FLOAT, subordinates
> ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING,
> city:STRING, state:STRING, zip:INT, country:STRING> ) PARTITIONED BY
> (country STRING, state STRING);
>
> LOAD DATA LOCAL INPATH
> '/home/satish/data/employees/input/employees-country.txt' INTO TABLE
> employees PARTITION (country='USA',state='IL');
>
> ********************
> Here is my original data file, where i have a few countries data such as
> USA, INDIA, UK, AUS:
> ********************
>
> John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State
> Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
> Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100
> Ontario St.ChicagoIL60601USA
> Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago
> Ave.Oak ParkIL60700USA
> Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure
> Dr.ObscuriaIL60100USA
> Boss Man200000.0John DoeFred FinanceFederal Taxes.3State
> Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
> Fred Finance150000.0Stacy AccountantFederal Taxes.3State
> Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
> Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main
> St.NapervilleIL60563USA
> John Doe 2100000.0Mary SmithTodd JonesFederal Taxes.2State
> Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
> Mary Smith 280000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100
> Ontario St.ChicagoIL60601INDIA
> Todd Jones 270000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago
> Ave.Oak ParkIL60700AUSTRALIA
> Bill King 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure
> Dr.ObscuriaIL60100AUSTRALIA
> Boss Man2 200000.0John DoeFred FinanceFederal Taxes.3State
> Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
> Fred Finance 2150000.0Stacy AccountantFederal Taxes.3State
> Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
> Stacy Accountant 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Main
> St.NapervilleIL60563UK
> ********************
> Now when i navigate to:
> Contents of directory
> /user/hive/warehouse/db1.db/employees/country=USA/state=IL
> ********************
> I see all the records and was wondering if it should have only USA & IL
> records.
> Please help.
>



-- 
Nitin Pawar

Re:Partitioning confusion

Posted by Sai Sai <sa...@yahoo.in>.


After creating a partition for a country (USA) and state (IL) and when we go to the the hdfs site to look at the partition in the browser we r seeing  all the records for all the countries and states rather than just for the partition created for US and IL given below, is this correct behavior:
********************
Here is my commands:
********************


CREATE TABLE employees (name STRING, salary FLOAT, subordinates ARRAY<STRING>, deductions MAP<STRING, FLOAT>, address STRUCT<street:STRING, city:STRING, state:STRING, zip:INT, country:STRING> ) PARTITIONED BY (country STRING, state STRING);

LOAD DATA LOCAL INPATH '/home/satish/data/employees/input/employees-country.txt' INTO TABLE employees PARTITION (country='USA',state='IL');

********************

Here is my original data file, where i have a few countries data such as USA, INDIA, UK, AUS:
********************


John Doe100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600USA
Mary Smith80000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601USA
Todd Jones70000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700USA
Bill King60000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100USA
Boss Man200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500USA
Fred Finance150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500USA
Stacy Accountant60000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563USA
John Doe 2100000.0Mary SmithTodd JonesFederal Taxes.2State Taxes.05Insurance.11 Michigan Ave.ChicagoIL60600INDIA
Mary Smith 280000.0Bill KingFederal Taxes.2State Taxes.05Insurance.1100 Ontario St.ChicagoIL60601INDIA
Todd Jones 270000.0Federal Taxes.15State Taxes.03Insurance.1200 Chicago Ave.Oak ParkIL60700AUSTRALIA
Bill King 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Obscure Dr.ObscuriaIL60100AUSTRALIA
Boss Man2 200000.0John DoeFred FinanceFederal Taxes.3State Taxes.07Insurance.051 Pretentious Drive.ChicagoIL60500UK
Fred Finance 2150000.0Stacy AccountantFederal Taxes.3State Taxes.07Insurance.052 Pretentious Drive.ChicagoIL60500UK
Stacy Accountant 260000.0Federal Taxes.15State Taxes.03Insurance.1300 Main St.NapervilleIL60563UK
********************

Now when i navigate to:
Contents of directory /user/hive/warehouse/db1.db/employees/country=USA/state=IL

********************

I see all the records and was wondering if it should have only USA & IL records.
Please help.

Re: Issue with Json tuple lateral view

Posted by Sai Sai <sa...@yahoo.in>.

Thanks Navis


________________________________
 From: Navis류승우 <na...@nexr.com>
To: user@hive.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Monday, 27 May 2013 12:15 PM
Subject: Re: Issue with Json tuple lateral view
 

Removing last ',' in second row would make result as you expected.

I can't tell it's bug or not.

2013/5/27 Sai Sai <sa...@yahoo.in>:
>
> *************************
> Here is the json-data that i load:
> *************************
>
> { "blogID" : "FJY26J1333", "date" : "2012-04-01", "name" : "vpxnksu",
> "comment" : "good stuff", "contact" : { "email" : "vpxnksu@gmail.com",
> "website" : "vpxnksu.wordpress.com" } }
> { "blogID" : "VSAUMDFGSD", "date" : "2012-04-01", "name" : "yhftrcx",
> "comment" : "another comment",}
>
> *************************
> Here is the hive commands :
> *************************
>
> CREATE  EXTERNAL  TABLE json_comments(value STRING) LOCATION
> '/user/json-comments';
>
> LOAD DATA LOCAL INPATH '/home/satish/data/inputSai/json-comments.txt'
> OVERWRITE INTO TABLE json_comments;
>
> SELECT b.blogID, c.email FROM json_comments a LATERAL VIEW
> json_tuple(a.value, 'blogID', 'contact') b AS blogID, contact LATERAL VIEW
> json_tuple(b.contact, 'email', 'website') c AS email, website;
>
> *************************
> Here r the results of  map reduce:
> *************************
>
> blogid email
> FJY26J1333 vpxnksu@gmail.com
> NULL NULL
>
> *************************
> My question is why the 2nd row is coming up as Null values, i was expecting
> the results to be like this:
> *************************
>
> blogid email
> FJY26J1333 vpxnksu@gmail.com
> VSAUMDFGSD NULL
>
> Any input is appreciated in understanding this.
> Thanks
> S

Re: Issue with Json tuple lateral view

Posted by Navis류승우 <na...@nexr.com>.

Removing last ',' in second row would make result as you expected.

I can't tell it's bug or not.

2013/5/27 Sai Sai <sa...@yahoo.in>:
>
> *************************
> Here is the json-data that i load:
> *************************
>
> { "blogID" : "FJY26J1333", "date" : "2012-04-01", "name" : "vpxnksu",
> "comment" : "good stuff", "contact" : { "email" : "vpxnksu@gmail.com",
> "website" : "vpxnksu.wordpress.com" } }
> { "blogID" : "VSAUMDFGSD", "date" : "2012-04-01", "name" : "yhftrcx",
> "comment" : "another comment",}
>
> *************************
> Here is the hive commands :
> *************************
>
> CREATE  EXTERNAL  TABLE json_comments(value STRING) LOCATION
> '/user/json-comments';
>
> LOAD DATA LOCAL INPATH '/home/satish/data/inputSai/json-comments.txt'
> OVERWRITE INTO TABLE json_comments;
>
> SELECT b.blogID, c.email FROM json_comments a LATERAL VIEW
> json_tuple(a.value, 'blogID', 'contact') b AS blogID, contact LATERAL VIEW
> json_tuple(b.contact, 'email', 'website') c AS email, website;
>
> *************************
> Here r the results of  map reduce:
> *************************
>
> blogid email
> FJY26J1333 vpxnksu@gmail.com
> NULL NULL
>
> *************************
> My question is why the 2nd row is coming up as Null values, i was expecting
> the results to be like this:
> *************************
>
> blogid email
> FJY26J1333 vpxnksu@gmail.com
> VSAUMDFGSD NULL
>
> Any input is appreciated in understanding this.
> Thanks
> S

Re: Issue with Json tuple lateral view

Posted by Sai Sai <sa...@yahoo.in>.


*************************
Here is the json-data that i load:
*************************


{ "blogID" : "FJY26J1333", "date" : "2012-04-01", "name" : "vpxnksu", "comment" : "good stuff", "contact" : { "email" : "vpxnksu@gmail.com", "website" : "vpxnksu.wordpress.com" } }
{ "blogID" : "VSAUMDFGSD", "date" : "2012-04-01", "name" : "yhftrcx", "comment" : "another comment",}

*************************

Here is the hive commands :
*************************


CREATE  EXTERNAL  TABLE json_comments(value STRING) LOCATION  '/user/json-comments';

LOAD DATA LOCAL INPATH '/home/satish/data/inputSai/json-comments.txt' OVERWRITE INTO TABLE json_comments;

SELECT b.blogID, c.email FROM json_comments a LATERAL VIEW json_tuple(a.value, 'blogID', 'contact') b AS blogID, contact LATERAL VIEW json_tuple(b.contact, 'email', 'website') c AS email, website;


*************************

Here r the results of  map reduce:
*************************


blogidemail

FJY26J1333vpxnksu@gmail.com
NULLNULL

*************************

My question is why the 2nd row is coming up as Null values, i was expecting the results to be like this:
*************************


blogidemail
FJY26J1333vpxnksu@gmail.com
VSAUMDFGSDNULL

Any input is appreciated in understanding this.
Thanks
S

Re: How to look at the metadata of the tables we have created.

Posted by Sai Sai <sa...@yahoo.in>.

Is it possible to look at the metadata of the databases/tables/views we have created in hive.
Is there some thing like sysobjects in hive.
Thanks
Sai

Re: Difference between like %A% and %a%

Posted by Sai Sai <sa...@yahoo.in>.

But it should get more results for this:

%a%

than for

%A%

Please let me know if i am missing something.
Thanks
Sai



________________________________
 From: Jov <am...@amutu.com>
To: user@hive.apache.org; Sai Sai <sa...@yahoo.in> 
Sent: Friday, 24 May 2013 4:39 PM
Subject: Re: Difference between like %A% and %a%
 




2013/5/24 Sai Sai <sa...@yahoo.in>

abbreviation l
unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal with '%a%'.


-- 
Jov

blog: http:amutu.com/blog

Re: Difference between like %A% and %a%

Posted by Jov <am...@amutu.com>.

2013/5/24 Sai Sai <sa...@yahoo.in>

> abbreviation l


unlike MySQL, string in Hive is case sensitive，so '%A%' is not equal with
'%a%'.


-- 
Jov
blog: http:amutu.com/blog <http://amutu.com/blog>