You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Soham Sardar <so...@gmail.com> on 2012/06/26 10:53:12 UTC

date datatype in hive

I have a native data type in mysql and i just imported it into hive
and the data type of the column has now become string ..
Now i would like to know if there is any native data type in hive and
What are the pros and cons of using string type in hive rather than
(int)(thats what i expect ) type

And with the string type can i just run any type of queries like
print the names of the person which has the maximum number of users
between 1 PM and 3 PM

and the table should have just name int(5) and login time (datetime )
{as per the mysql syntax}

Re: date datatype in hive

Posted by "Gesli, Nicole" <Ni...@memorylane.com>.
2 problems:
1) I don't see any "inner" join in Hive. If you just use join it will work
as direct join.
2) Give your count(1) an alias and use that alias in the order by clause.

Something like this:

hive> select users_info.country,count(1) cnt from (select userid from
users_audit
    > where SUBSTR(users_audit.logtime,1,17)>='1971-06-27 13:00' and
SUBSTR(users_audit.logtime,1,17)<='2010-06-27 14:00')
    > users_audit join users_info on
(users_info.id=users_audit.userid)
    > group by users_info.country order by cnt desc;

I hope this helps.

-Nicole


On 6/27/12 11:08 PM, "Soham Sardar" <so...@gmail.com> wrote:

>Hey nicole ,
>your query is working fine in mysql but when it comes to hive then it
>just fails as it is
>
>hive> select users_info.country,count(1) from (select userid from
>users_audit
>    > where SUBSTR(users_audit.logtime,1,17)>='1971-06-27 13:00' and
>SUBSTR(users_audit.logtime,1,17)<='2010-06-27 14:00')
>    > users_audit inner  join users_info on
>(users_info.id=users_audit.userid)
>    > group by users_info.country order by count(1) desc;
>FAILED: Hive Internal Error: java.lang.NullPointerException(null)
>java.lang.NullPointerException
>	at 
>org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNod
>eGenericFuncDesc.java:214)
>	at 
>org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.
>getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:684)
>	at 
>org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.
>process(TypeCheckProcFactory.java:805)
>	at 
>org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDi
>spatcher.java:89)
>	at 
>org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalk
>er.java:88)
>	at 
>org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.j
>ava:125)
>	at 
>org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraph
>Walker.java:102)
>	at 
>org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheck
>ProcFactory.java:161)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticA
>nalyzer.java:7506)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticA
>nalyzer.java:7464)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(Semanti
>cAnalyzer.java:4562)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnaly
>zer.java:5936)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.
>java:6524)
>	at 
>org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticA
>nalyzer.java:7282)
>	at 
>org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticA
>nalyzer.java:243)
>	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
>	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
>	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
>	at 
>org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
>	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
>	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
>	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
>	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
>	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>	at 
>sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:
>39)
>	at 
>sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorIm
>pl.java:25)
>	at java.lang.reflect.Method.invoke(Method.java:597)
>	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
>
>can u just tell me why this is happening :)
>
>
>
>On Wed, Jun 27, 2012 at 11:32 PM, Gesli, Nicole
><Ni...@memorylane.com> wrote:
>> Soham, I think you need something like this:
>>
>> SELECT ui.country, COUNT(1)
>> FROM ( SELECT DISTINCT userid
>>        FROM   users_audit
>>        WHERE  SUBSTR(ua.logtime, 1, 17) >= '2012-06-27 13:00'
>>        AND    SUBSTR(ua.logtime, 1, 17) <= '2012-06-27 14:00'
>>      ) ua
>>        JOIN
>>        users_info ui ON (ui.id = ua.userid)
>> GROUP BY
>>        ui.country;
>>
>> -Nicole
>>
>> From: Nitin Pawar <ni...@gmail.com>
>> Reply-To: <us...@hive.apache.org>
>> Date: Wed, 27 Jun 2012 15:22:23 +0530
>>
>> To: <us...@hive.apache.org>
>> Subject: Re: date datatype in hive
>>
>> soham,
>>
>> the reason u r getting the error is there may be multiple names
>>associated
>> with one country
>> so when you do  a group by there is no unique output to generate
>>
>> so group by country, name will give you unique results
>>
>> same is the case with select *
>>
>> in case you want to sort the output you may try with order by country
>> instead of group by
>> if you want to omit duplicates you can use distinct
>>
>> so try something like
>>
>> select count(*), country from users_info group by country
>> select count(*) from users_info group by country
>> select distinct name, country from users_info order by country
>> select name, county from users_info group by country, name
>>
>> all above should work unless i made a typo error :)
>>
>>
>>
>> On Wed, Jun 27, 2012 at 3:11 PM, Soham Sardar
>><so...@gmail.com>
>> wrote:
>>>
>>> Thanks Bejoy and Nitin for replyin
>>> well now i got that thing ryt but then i have serious issues
>>>
>>> in hive querying like some of such queries are
>>>
>>> 1)  select * from users_info group by country;
>>>
>>> and this is giving me only the countries grouped alphabetically and
>>> not any other information
>>>
>>> 2) select name,country from users_info group by country;
>>>
>>> and this gives me an error
>>> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>>> BY key 'name'
>>>
>>> can someone help me out with this :)
>>>
>>>
>>>
>>>
>>> On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <ni...@gmail.com>
>>> wrote:
>>> > soham,
>>> >
>>> > in your query
>>> > hive> select name from users_info  group by country;
>>> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>>> > BY key 'name'
>>> >
>>> > when you run group by it needs to be present in the select columns as
>>> > well
>>> > thats what Bejoy suggested
>>> >
>>> > On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar
>>> > <so...@gmail.com>
>>> > wrote:
>>> >>
>>> >> Hey bejoy thats the problem i am not able to run the group by query
>>>in
>>> >> hive i dunno whether i m making a mistake or some thing
>>> >> see my previoius reply to this same thread i put up the same issue
>>>...
>>> >>
>>> >>
>>> >> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com>
>>>wrote:
>>> >> > Hi Soham
>>> >> >
>>> >> > Rewrite your query with the columns in Group By included in
>>>Select as
>>> >> > well. Something like
>>> >> >
>>> >> > select country,name from users_info  group by country;
>>> >> >
>>> >> > Regards
>>> >> > Bejoy KS
>>> >> >
>>> >> > Sent from handheld, please excuse typos.
>>> >> >
>>> >> > -----Original Message-----
>>> >> > From: Soham Sardar <so...@gmail.com>
>>> >> > Date: Wed, 27 Jun 2012 11:57:23
>>> >> > To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
>>> >> > Reply-To: user@hive.apache.org
>>> >> > Subject: Re: date datatype in hive
>>> >> >
>>> >> > And btw does group by works in hive because the same wuery i am
>>> >> > running in mysql and its working fine but its failing in hive
>>> >> >
>>> >> > select name from users_info  group by country;
>>> >> >
>>> >> > in mysql its working but whn i try to run it in hive its telling
>>> >> >
>>> >> > hive> select name from users_info  group by country;
>>> >> > FAILED: Error in semantic analysis: Line 1:7 Expression not in
>>>GROUP
>>> >> > BY key 'name'
>>> >> >
>>> >> > I wanna know why is it failing
>>> >> >
>>> >> >
>>> >> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
>>> >> > <so...@gmail.com> wrote:
>>> >> >> See Bejoy and Everyone ,
>>> >> >> I have two tables
>>> >> >> one users_info and one users_audit
>>> >> >> in hive ..
>>> >> >>
>>> >> >> hive> desc users_audit;
>>> >> >> OK
>>> >> >> id      int
>>> >> >> userid  int
>>> >> >> logtime string
>>> >> >>
>>> >> >> hive> desc users_info;
>>> >> >> OK
>>> >> >> id      int
>>> >> >> name    string
>>> >> >> age     int
>>> >> >> country string
>>> >> >> gender  string
>>> >> >> bday    string
>>> >> >>
>>> >> >> now i have given the description of the tables in hive
>>> >> >>
>>> >> >> my goal is to find:
>>> >> >> the maximum number of users loging - in at between a certain time
>>> >> >> frame say 1PM to 2PM- belonging to a specific country
>>> >> >>
>>> >> >> for example
>>> >> >> in between 1PM to 2PM there are 10000 users loging into and 5500
>>>are
>>> >> >> from africa , then i need to print the names of all the users who
>>> >> >> are
>>> >> >> logged in between the time stamp and who are from africa ..
>>> >> >> the logtime in the users_audit table gives the login time and the
>>> >> >> names and country are from the users_info table .
>>> >> >>
>>> >> >> Can someone help me out with this . I am a new user in hive so
>>>would
>>> >> >> like some ones help !!!!!!
>>> >> >> Thanks in advance!!  :)
>>> >> >>
>>> >> >>
>>> >> >>
>>> >> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com>
>>> >> >> wrote:
>>> >> >>>
>>> >> >>>
>>> >> >>> Hi Soham
>>> >> >>>
>>> >> >>> Hive Supports pretty much all the primitive data types including
>>> >> >>> INT.
>>> >> >>> For a detaild list please refer
>>> >> >>>
>>> >> >>>
>>> >> >>> 
>>>https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-Primi
>>>tiveTypes
>>> >> >>>
>>> >> >>> The only draw back as in common is when you have the data type
>>>as
>>> >> >>> String you cannot use it directly on Mathematical functions.
>>> >> >>>
>>> >> >>> Your requirement can easily be satisfied with a few date
>>>functions
>>> >> >>> on
>>> >> >>> String data itself.
>>> >> >>>
>>> >> >>> Regards
>>> >> >>> Bejoy KS
>>> >> >>>
>>> >> >>>
>>> >> >>> ________________________________
>>> >> >>> From: Soham Sardar <so...@gmail.com>
>>> >> >>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>>> >> >>> Sent: Tuesday, June 26, 2012 2:23 PM
>>> >> >>> Subject: date datatype in hive
>>> >> >>>
>>> >> >>> I have a native data type in mysql and i just imported it into
>>>hive
>>> >> >>> and the data type of the column has now become string ..
>>> >> >>> Now i would like to know if there is any native data type in
>>>hive
>>> >> >>> and
>>> >> >>> What are the pros and cons of using string type in hive rather
>>>than
>>> >> >>> (int)(thats what i expect ) type
>>> >> >>>
>>> >> >>> And with the string type can i just run any type of queries like
>>> >> >>> print the names of the person which has the maximum number of
>>>users
>>> >> >>> between 1 PM and 3 PM
>>> >> >>>
>>> >> >>> and the table should have just name int(5) and login time
>>>(datetime
>>> >> >>> )
>>> >> >>> {as per the mysql syntax}
>>> >
>>> >
>>> >
>>> >
>>> > --
>>> > Nitin Pawar
>>> >
>>
>>
>>
>>
>> --
>> Nitin Pawar
>>


Re: date datatype in hive

Posted by Soham Sardar <so...@gmail.com>.
Hey nicole ,
your query is working fine in mysql but when it comes to hive then it
just fails as it is

hive> select users_info.country,count(1) from (select userid from
users_audit
    > where SUBSTR(users_audit.logtime,1,17)>='1971-06-27 13:00' and
SUBSTR(users_audit.logtime,1,17)<='2010-06-27 14:00')
    > users_audit inner  join users_info on
(users_info.id=users_audit.userid)
    > group by users_info.country order by count(1) desc;
FAILED: Hive Internal Error: java.lang.NullPointerException(null)
java.lang.NullPointerException
	at org.apache.hadoop.hive.ql.plan.ExprNodeGenericFuncDesc.newInstance(ExprNodeGenericFuncDesc.java:214)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.getXpathOrFuncExprNodeDesc(TypeCheckProcFactory.java:684)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory$DefaultExprProcessor.process(TypeCheckProcFactory.java:805)
	at org.apache.hadoop.hive.ql.lib.DefaultRuleDispatcher.dispatch(DefaultRuleDispatcher.java:89)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.dispatch(DefaultGraphWalker.java:88)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.walk(DefaultGraphWalker.java:125)
	at org.apache.hadoop.hive.ql.lib.DefaultGraphWalker.startWalking(DefaultGraphWalker.java:102)
	at org.apache.hadoop.hive.ql.parse.TypeCheckProcFactory.genExprNode(TypeCheckProcFactory.java:161)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7506)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:7464)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genReduceSinkPlan(SemanticAnalyzer.java:4562)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genBodyPlan(SemanticAnalyzer.java:5936)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genPlan(SemanticAnalyzer.java:6524)
	at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.analyzeInternal(SemanticAnalyzer.java:7282)
	at org.apache.hadoop.hive.ql.parse.BaseSemanticAnalyzer.analyze(BaseSemanticAnalyzer.java:243)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:430)
	at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:337)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:889)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:255)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:212)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:671)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:554)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)

can u just tell me why this is happening :)



On Wed, Jun 27, 2012 at 11:32 PM, Gesli, Nicole
<Ni...@memorylane.com> wrote:
> Soham, I think you need something like this:
>
> SELECT ui.country, COUNT(1)
> FROM ( SELECT DISTINCT userid
>        FROM   users_audit
>        WHERE  SUBSTR(ua.logtime, 1, 17) >= '2012-06-27 13:00'
>        AND    SUBSTR(ua.logtime, 1, 17) <= '2012-06-27 14:00'
>      ) ua
>        JOIN
>        users_info ui ON (ui.id = ua.userid)
> GROUP BY
>        ui.country;
>
> -Nicole
>
> From: Nitin Pawar <ni...@gmail.com>
> Reply-To: <us...@hive.apache.org>
> Date: Wed, 27 Jun 2012 15:22:23 +0530
>
> To: <us...@hive.apache.org>
> Subject: Re: date datatype in hive
>
> soham,
>
> the reason u r getting the error is there may be multiple names associated
> with one country
> so when you do  a group by there is no unique output to generate
>
> so group by country, name will give you unique results
>
> same is the case with select *
>
> in case you want to sort the output you may try with order by country
> instead of group by
> if you want to omit duplicates you can use distinct
>
> so try something like
>
> select count(*), country from users_info group by country
> select count(*) from users_info group by country
> select distinct name, country from users_info order by country
> select name, county from users_info group by country, name
>
> all above should work unless i made a typo error :)
>
>
>
> On Wed, Jun 27, 2012 at 3:11 PM, Soham Sardar <so...@gmail.com>
> wrote:
>>
>> Thanks Bejoy and Nitin for replyin
>> well now i got that thing ryt but then i have serious issues
>>
>> in hive querying like some of such queries are
>>
>> 1)  select * from users_info group by country;
>>
>> and this is giving me only the countries grouped alphabetically and
>> not any other information
>>
>> 2) select name,country from users_info group by country;
>>
>> and this gives me an error
>> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>> BY key 'name'
>>
>> can someone help me out with this :)
>>
>>
>>
>>
>> On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <ni...@gmail.com>
>> wrote:
>> > soham,
>> >
>> > in your query
>> > hive> select name from users_info  group by country;
>> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>> > BY key 'name'
>> >
>> > when you run group by it needs to be present in the select columns as
>> > well
>> > thats what Bejoy suggested
>> >
>> > On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar
>> > <so...@gmail.com>
>> > wrote:
>> >>
>> >> Hey bejoy thats the problem i am not able to run the group by query in
>> >> hive i dunno whether i m making a mistake or some thing
>> >> see my previoius reply to this same thread i put up the same issue ...
>> >>
>> >>
>> >> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com> wrote:
>> >> > Hi Soham
>> >> >
>> >> > Rewrite your query with the columns in Group By included in Select as
>> >> > well. Something like
>> >> >
>> >> > select country,name from users_info  group by country;
>> >> >
>> >> > Regards
>> >> > Bejoy KS
>> >> >
>> >> > Sent from handheld, please excuse typos.
>> >> >
>> >> > -----Original Message-----
>> >> > From: Soham Sardar <so...@gmail.com>
>> >> > Date: Wed, 27 Jun 2012 11:57:23
>> >> > To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
>> >> > Reply-To: user@hive.apache.org
>> >> > Subject: Re: date datatype in hive
>> >> >
>> >> > And btw does group by works in hive because the same wuery i am
>> >> > running in mysql and its working fine but its failing in hive
>> >> >
>> >> > select name from users_info  group by country;
>> >> >
>> >> > in mysql its working but whn i try to run it in hive its telling
>> >> >
>> >> > hive> select name from users_info  group by country;
>> >> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>> >> > BY key 'name'
>> >> >
>> >> > I wanna know why is it failing
>> >> >
>> >> >
>> >> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
>> >> > <so...@gmail.com> wrote:
>> >> >> See Bejoy and Everyone ,
>> >> >> I have two tables
>> >> >> one users_info and one users_audit
>> >> >> in hive ..
>> >> >>
>> >> >> hive> desc users_audit;
>> >> >> OK
>> >> >> id      int
>> >> >> userid  int
>> >> >> logtime string
>> >> >>
>> >> >> hive> desc users_info;
>> >> >> OK
>> >> >> id      int
>> >> >> name    string
>> >> >> age     int
>> >> >> country string
>> >> >> gender  string
>> >> >> bday    string
>> >> >>
>> >> >> now i have given the description of the tables in hive
>> >> >>
>> >> >> my goal is to find:
>> >> >> the maximum number of users loging - in at between a certain time
>> >> >> frame say 1PM to 2PM- belonging to a specific country
>> >> >>
>> >> >> for example
>> >> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are
>> >> >> from africa , then i need to print the names of all the users who
>> >> >> are
>> >> >> logged in between the time stamp and who are from africa ..
>> >> >> the logtime in the users_audit table gives the login time and the
>> >> >> names and country are from the users_info table .
>> >> >>
>> >> >> Can someone help me out with this . I am a new user in hive so would
>> >> >> like some ones help !!!!!!
>> >> >> Thanks in advance!!  :)
>> >> >>
>> >> >>
>> >> >>
>> >> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com>
>> >> >> wrote:
>> >> >>>
>> >> >>>
>> >> >>> Hi Soham
>> >> >>>
>> >> >>> Hive Supports pretty much all the primitive data types including
>> >> >>> INT.
>> >> >>> For a detaild list please refer
>> >> >>>
>> >> >>>
>> >> >>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>> >> >>>
>> >> >>> The only draw back as in common is when you have the data type as
>> >> >>> String you cannot use it directly on Mathematical functions.
>> >> >>>
>> >> >>> Your requirement can easily be satisfied with a few date functions
>> >> >>> on
>> >> >>> String data itself.
>> >> >>>
>> >> >>> Regards
>> >> >>> Bejoy KS
>> >> >>>
>> >> >>>
>> >> >>> ________________________________
>> >> >>> From: Soham Sardar <so...@gmail.com>
>> >> >>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>> >> >>> Sent: Tuesday, June 26, 2012 2:23 PM
>> >> >>> Subject: date datatype in hive
>> >> >>>
>> >> >>> I have a native data type in mysql and i just imported it into hive
>> >> >>> and the data type of the column has now become string ..
>> >> >>> Now i would like to know if there is any native data type in hive
>> >> >>> and
>> >> >>> What are the pros and cons of using string type in hive rather than
>> >> >>> (int)(thats what i expect ) type
>> >> >>>
>> >> >>> And with the string type can i just run any type of queries like
>> >> >>> print the names of the person which has the maximum number of users
>> >> >>> between 1 PM and 3 PM
>> >> >>>
>> >> >>> and the table should have just name int(5) and login time (datetime
>> >> >>> )
>> >> >>> {as per the mysql syntax}
>> >
>> >
>> >
>> >
>> > --
>> > Nitin Pawar
>> >
>
>
>
>
> --
> Nitin Pawar
>

Re: date datatype in hive

Posted by "Gesli, Nicole" <Ni...@memorylane.com>.
Soham, I think you need something like this:

SELECT ui.country, COUNT(1)
FROM ( SELECT DISTINCT userid
       FROM   users_audit
       WHERE  SUBSTR(ua.logtime, 1, 17) >= '2012-06-27 13:00'
       AND    SUBSTR(ua.logtime, 1, 17) <= '2012-06-27 14:00'
     ) ua
       JOIN
       users_info ui ON (ui.id = ua.userid)
GROUP BY
       ui.country;

-Nicole

From: Nitin Pawar <ni...@gmail.com>>
Reply-To: <us...@hive.apache.org>>
Date: Wed, 27 Jun 2012 15:22:23 +0530
To: <us...@hive.apache.org>>
Subject: Re: date datatype in hive

soham,

the reason u r getting the error is there may be multiple names associated with one country
so when you do  a group by there is no unique output to generate

so group by country, name will give you unique results

same is the case with select *

in case you want to sort the output you may try with order by country instead of group by
if you want to omit duplicates you can use distinct

so try something like

select count(*), country from users_info group by country
select count(*) from users_info group by country
select distinct name, country from users_info order by country
select name, county from users_info group by country, name

all above should work unless i made a typo error :)



On Wed, Jun 27, 2012 at 3:11 PM, Soham Sardar <so...@gmail.com>> wrote:
Thanks Bejoy and Nitin for replyin
well now i got that thing ryt but then i have serious issues

in hive querying like some of such queries are

1)  select * from users_info group by country;

and this is giving me only the countries grouped alphabetically and
not any other information

2) select name,country from users_info group by country;

and this gives me an error
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
BY key 'name'

can someone help me out with this :)




On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <ni...@gmail.com>> wrote:
> soham,
>
> in your query
> hive> select name from users_info  group by country;
> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> BY key 'name'
>
> when you run group by it needs to be present in the select columns as well
> thats what Bejoy suggested
>
> On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar <so...@gmail.com>>
> wrote:
>>
>> Hey bejoy thats the problem i am not able to run the group by query in
>> hive i dunno whether i m making a mistake or some thing
>> see my previoius reply to this same thread i put up the same issue ...
>>
>>
>> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com>> wrote:
>> > Hi Soham
>> >
>> > Rewrite your query with the columns in Group By included in Select as
>> > well. Something like
>> >
>> > select country,name from users_info  group by country;
>> >
>> > Regards
>> > Bejoy KS
>> >
>> > Sent from handheld, please excuse typos.
>> >
>> > -----Original Message-----
>> > From: Soham Sardar <so...@gmail.com>>
>> > Date: Wed, 27 Jun 2012 11:57:23
>> > To: <us...@hive.apache.org>>; Bejoy Ks<be...@yahoo.com>>
>> > Reply-To: user@hive.apache.org<ma...@hive.apache.org>
>> > Subject: Re: date datatype in hive
>> >
>> > And btw does group by works in hive because the same wuery i am
>> > running in mysql and its working fine but its failing in hive
>> >
>> > select name from users_info  group by country;
>> >
>> > in mysql its working but whn i try to run it in hive its telling
>> >
>> > hive> select name from users_info  group by country;
>> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>> > BY key 'name'
>> >
>> > I wanna know why is it failing
>> >
>> >
>> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
>> > <so...@gmail.com>> wrote:
>> >> See Bejoy and Everyone ,
>> >> I have two tables
>> >> one users_info and one users_audit
>> >> in hive ..
>> >>
>> >> hive> desc users_audit;
>> >> OK
>> >> id      int
>> >> userid  int
>> >> logtime string
>> >>
>> >> hive> desc users_info;
>> >> OK
>> >> id      int
>> >> name    string
>> >> age     int
>> >> country string
>> >> gender  string
>> >> bday    string
>> >>
>> >> now i have given the description of the tables in hive
>> >>
>> >> my goal is to find:
>> >> the maximum number of users loging - in at between a certain time
>> >> frame say 1PM to 2PM- belonging to a specific country
>> >>
>> >> for example
>> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are
>> >> from africa , then i need to print the names of all the users who are
>> >> logged in between the time stamp and who are from africa ..
>> >> the logtime in the users_audit table gives the login time and the
>> >> names and country are from the users_info table .
>> >>
>> >> Can someone help me out with this . I am a new user in hive so would
>> >> like some ones help !!!!!!
>> >> Thanks in advance!!  :)
>> >>
>> >>
>> >>
>> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com>> wrote:
>> >>>
>> >>>
>> >>> Hi Soham
>> >>>
>> >>> Hive Supports pretty much all the primitive data types including INT.
>> >>> For a detaild list please refer
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>> >>>
>> >>> The only draw back as in common is when you have the data type as
>> >>> String you cannot use it directly on Mathematical functions.
>> >>>
>> >>> Your requirement can easily be satisfied with a few date functions on
>> >>> String data itself.
>> >>>
>> >>> Regards
>> >>> Bejoy KS
>> >>>
>> >>>
>> >>> ________________________________
>> >>> From: Soham Sardar <so...@gmail.com>>
>> >>> To: user@hive.apache.org<ma...@hive.apache.org>; user <us...@sqoop.apache.org>>
>> >>> Sent: Tuesday, June 26, 2012 2:23 PM
>> >>> Subject: date datatype in hive
>> >>>
>> >>> I have a native data type in mysql and i just imported it into hive
>> >>> and the data type of the column has now become string ..
>> >>> Now i would like to know if there is any native data type in hive and
>> >>> What are the pros and cons of using string type in hive rather than
>> >>> (int)(thats what i expect ) type
>> >>>
>> >>> And with the string type can i just run any type of queries like
>> >>> print the names of the person which has the maximum number of users
>> >>> between 1 PM and 3 PM
>> >>>
>> >>> and the table should have just name int(5) and login time (datetime )
>> >>> {as per the mysql syntax}
>
>
>
>
> --
> Nitin Pawar
>



--
Nitin Pawar


Re: date datatype in hive

Posted by Nitin Pawar <ni...@gmail.com>.
soham,

the reason u r getting the error is there may be multiple names associated
with one country
so when you do  a group by there is no unique output to generate

so group by country, name will give you unique results

same is the case with select *

in case you want to sort the output you may try with order by country
instead of group by
if you want to omit duplicates you can use distinct

so try something like

select count(*), country from users_info group by country
select count(*) from users_info group by country
select distinct name, country from users_info order by country
select name, county from users_info group by country, name

all above should work unless i made a typo error :)



On Wed, Jun 27, 2012 at 3:11 PM, Soham Sardar <so...@gmail.com>wrote:

> Thanks Bejoy and Nitin for replyin
> well now i got that thing ryt but then i have serious issues
>
> in hive querying like some of such queries are
>
> 1)  select * from users_info group by country;
>
> and this is giving me only the countries grouped alphabetically and
> not any other information
>
> 2) select name,country from users_info group by country;
>
> and this gives me an error
> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> BY key 'name'
>
> can someone help me out with this :)
>
>
>
>
> On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <ni...@gmail.com>
> wrote:
> > soham,
> >
> > in your query
> > hive> select name from users_info  group by country;
> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> > BY key 'name'
> >
> > when you run group by it needs to be present in the select columns as
> well
> > thats what Bejoy suggested
> >
> > On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar <
> sohamsardartech@gmail.com>
> > wrote:
> >>
> >> Hey bejoy thats the problem i am not able to run the group by query in
> >> hive i dunno whether i m making a mistake or some thing
> >> see my previoius reply to this same thread i put up the same issue ...
> >>
> >>
> >> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com> wrote:
> >> > Hi Soham
> >> >
> >> > Rewrite your query with the columns in Group By included in Select as
> >> > well. Something like
> >> >
> >> > select country,name from users_info  group by country;
> >> >
> >> > Regards
> >> > Bejoy KS
> >> >
> >> > Sent from handheld, please excuse typos.
> >> >
> >> > -----Original Message-----
> >> > From: Soham Sardar <so...@gmail.com>
> >> > Date: Wed, 27 Jun 2012 11:57:23
> >> > To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
> >> > Reply-To: user@hive.apache.org
> >> > Subject: Re: date datatype in hive
> >> >
> >> > And btw does group by works in hive because the same wuery i am
> >> > running in mysql and its working fine but its failing in hive
> >> >
> >> > select name from users_info  group by country;
> >> >
> >> > in mysql its working but whn i try to run it in hive its telling
> >> >
> >> > hive> select name from users_info  group by country;
> >> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> >> > BY key 'name'
> >> >
> >> > I wanna know why is it failing
> >> >
> >> >
> >> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
> >> > <so...@gmail.com> wrote:
> >> >> See Bejoy and Everyone ,
> >> >> I have two tables
> >> >> one users_info and one users_audit
> >> >> in hive ..
> >> >>
> >> >> hive> desc users_audit;
> >> >> OK
> >> >> id      int
> >> >> userid  int
> >> >> logtime string
> >> >>
> >> >> hive> desc users_info;
> >> >> OK
> >> >> id      int
> >> >> name    string
> >> >> age     int
> >> >> country string
> >> >> gender  string
> >> >> bday    string
> >> >>
> >> >> now i have given the description of the tables in hive
> >> >>
> >> >> my goal is to find:
> >> >> the maximum number of users loging - in at between a certain time
> >> >> frame say 1PM to 2PM- belonging to a specific country
> >> >>
> >> >> for example
> >> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are
> >> >> from africa , then i need to print the names of all the users who are
> >> >> logged in between the time stamp and who are from africa ..
> >> >> the logtime in the users_audit table gives the login time and the
> >> >> names and country are from the users_info table .
> >> >>
> >> >> Can someone help me out with this . I am a new user in hive so would
> >> >> like some ones help !!!!!!
> >> >> Thanks in advance!!  :)
> >> >>
> >> >>
> >> >>
> >> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com>
> wrote:
> >> >>>
> >> >>>
> >> >>> Hi Soham
> >> >>>
> >> >>> Hive Supports pretty much all the primitive data types including
> INT.
> >> >>> For a detaild list please refer
> >> >>>
> >> >>>
> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
> >> >>>
> >> >>> The only draw back as in common is when you have the data type as
> >> >>> String you cannot use it directly on Mathematical functions.
> >> >>>
> >> >>> Your requirement can easily be satisfied with a few date functions
> on
> >> >>> String data itself.
> >> >>>
> >> >>> Regards
> >> >>> Bejoy KS
> >> >>>
> >> >>>
> >> >>> ________________________________
> >> >>> From: Soham Sardar <so...@gmail.com>
> >> >>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
> >> >>> Sent: Tuesday, June 26, 2012 2:23 PM
> >> >>> Subject: date datatype in hive
> >> >>>
> >> >>> I have a native data type in mysql and i just imported it into hive
> >> >>> and the data type of the column has now become string ..
> >> >>> Now i would like to know if there is any native data type in hive
> and
> >> >>> What are the pros and cons of using string type in hive rather than
> >> >>> (int)(thats what i expect ) type
> >> >>>
> >> >>> And with the string type can i just run any type of queries like
> >> >>> print the names of the person which has the maximum number of users
> >> >>> between 1 PM and 3 PM
> >> >>>
> >> >>> and the table should have just name int(5) and login time (datetime
> )
> >> >>> {as per the mysql syntax}
> >
> >
> >
> >
> > --
> > Nitin Pawar
> >
>



-- 
Nitin Pawar

Re: date datatype in hive

Posted by Soham Sardar <so...@gmail.com>.
Thanks Bejoy and Nitin for replyin
well now i got that thing ryt but then i have serious issues

in hive querying like some of such queries are

1)  select * from users_info group by country;

and this is giving me only the countries grouped alphabetically and
not any other information

2) select name,country from users_info group by country;

and this gives me an error
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
BY key 'name'

can someone help me out with this :)




On Wed, Jun 27, 2012 at 1:39 PM, Nitin Pawar <ni...@gmail.com> wrote:
> soham,
>
> in your query
> hive> select name from users_info  group by country;
> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> BY key 'name'
>
> when you run group by it needs to be present in the select columns as well
> thats what Bejoy suggested
>
> On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar <so...@gmail.com>
> wrote:
>>
>> Hey bejoy thats the problem i am not able to run the group by query in
>> hive i dunno whether i m making a mistake or some thing
>> see my previoius reply to this same thread i put up the same issue ...
>>
>>
>> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com> wrote:
>> > Hi Soham
>> >
>> > Rewrite your query with the columns in Group By included in Select as
>> > well. Something like
>> >
>> > select country,name from users_info  group by country;
>> >
>> > Regards
>> > Bejoy KS
>> >
>> > Sent from handheld, please excuse typos.
>> >
>> > -----Original Message-----
>> > From: Soham Sardar <so...@gmail.com>
>> > Date: Wed, 27 Jun 2012 11:57:23
>> > To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
>> > Reply-To: user@hive.apache.org
>> > Subject: Re: date datatype in hive
>> >
>> > And btw does group by works in hive because the same wuery i am
>> > running in mysql and its working fine but its failing in hive
>> >
>> > select name from users_info  group by country;
>> >
>> > in mysql its working but whn i try to run it in hive its telling
>> >
>> > hive> select name from users_info  group by country;
>> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
>> > BY key 'name'
>> >
>> > I wanna know why is it failing
>> >
>> >
>> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
>> > <so...@gmail.com> wrote:
>> >> See Bejoy and Everyone ,
>> >> I have two tables
>> >> one users_info and one users_audit
>> >> in hive ..
>> >>
>> >> hive> desc users_audit;
>> >> OK
>> >> id      int
>> >> userid  int
>> >> logtime string
>> >>
>> >> hive> desc users_info;
>> >> OK
>> >> id      int
>> >> name    string
>> >> age     int
>> >> country string
>> >> gender  string
>> >> bday    string
>> >>
>> >> now i have given the description of the tables in hive
>> >>
>> >> my goal is to find:
>> >> the maximum number of users loging - in at between a certain time
>> >> frame say 1PM to 2PM- belonging to a specific country
>> >>
>> >> for example
>> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are
>> >> from africa , then i need to print the names of all the users who are
>> >> logged in between the time stamp and who are from africa ..
>> >> the logtime in the users_audit table gives the login time and the
>> >> names and country are from the users_info table .
>> >>
>> >> Can someone help me out with this . I am a new user in hive so would
>> >> like some ones help !!!!!!
>> >> Thanks in advance!!  :)
>> >>
>> >>
>> >>
>> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
>> >>>
>> >>>
>> >>> Hi Soham
>> >>>
>> >>> Hive Supports pretty much all the primitive data types including INT.
>> >>> For a detaild list please refer
>> >>>
>> >>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>> >>>
>> >>> The only draw back as in common is when you have the data type as
>> >>> String you cannot use it directly on Mathematical functions.
>> >>>
>> >>> Your requirement can easily be satisfied with a few date functions on
>> >>> String data itself.
>> >>>
>> >>> Regards
>> >>> Bejoy KS
>> >>>
>> >>>
>> >>> ________________________________
>> >>> From: Soham Sardar <so...@gmail.com>
>> >>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>> >>> Sent: Tuesday, June 26, 2012 2:23 PM
>> >>> Subject: date datatype in hive
>> >>>
>> >>> I have a native data type in mysql and i just imported it into hive
>> >>> and the data type of the column has now become string ..
>> >>> Now i would like to know if there is any native data type in hive and
>> >>> What are the pros and cons of using string type in hive rather than
>> >>> (int)(thats what i expect ) type
>> >>>
>> >>> And with the string type can i just run any type of queries like
>> >>> print the names of the person which has the maximum number of users
>> >>> between 1 PM and 3 PM
>> >>>
>> >>> and the table should have just name int(5) and login time (datetime )
>> >>> {as per the mysql syntax}
>
>
>
>
> --
> Nitin Pawar
>

Re: date datatype in hive

Posted by Nitin Pawar <ni...@gmail.com>.
soham,

in your query
hive> select name from users_info  group by country;
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
BY key 'name'

when you run group by it needs to be present in the select columns as well
thats what Bejoy suggested

On Wed, Jun 27, 2012 at 12:05 PM, Soham Sardar <so...@gmail.com>wrote:

> Hey bejoy thats the problem i am not able to run the group by query in
> hive i dunno whether i m making a mistake or some thing
> see my previoius reply to this same thread i put up the same issue ...
>
>
> On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com> wrote:
> > Hi Soham
> >
> > Rewrite your query with the columns in Group By included in Select as
> well. Something like
> >
> > select country,name from users_info  group by country;
> >
> > Regards
> > Bejoy KS
> >
> > Sent from handheld, please excuse typos.
> >
> > -----Original Message-----
> > From: Soham Sardar <so...@gmail.com>
> > Date: Wed, 27 Jun 2012 11:57:23
> > To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
> > Reply-To: user@hive.apache.org
> > Subject: Re: date datatype in hive
> >
> > And btw does group by works in hive because the same wuery i am
> > running in mysql and its working fine but its failing in hive
> >
> > select name from users_info  group by country;
> >
> > in mysql its working but whn i try to run it in hive its telling
> >
> > hive> select name from users_info  group by country;
> > FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> > BY key 'name'
> >
> > I wanna know why is it failing
> >
> >
> > On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
> > <so...@gmail.com> wrote:
> >> See Bejoy and Everyone ,
> >> I have two tables
> >> one users_info and one users_audit
> >> in hive ..
> >>
> >> hive> desc users_audit;
> >> OK
> >> id      int
> >> userid  int
> >> logtime string
> >>
> >> hive> desc users_info;
> >> OK
> >> id      int
> >> name    string
> >> age     int
> >> country string
> >> gender  string
> >> bday    string
> >>
> >> now i have given the description of the tables in hive
> >>
> >> my goal is to find:
> >> the maximum number of users loging - in at between a certain time
> >> frame say 1PM to 2PM- belonging to a specific country
> >>
> >> for example
> >> in between 1PM to 2PM there are 10000 users loging into and 5500 are
> >> from africa , then i need to print the names of all the users who are
> >> logged in between the time stamp and who are from africa ..
> >> the logtime in the users_audit table gives the login time and the
> >> names and country are from the users_info table .
> >>
> >> Can someone help me out with this . I am a new user in hive so would
> >> like some ones help !!!!!!
> >> Thanks in advance!!  :)
> >>
> >>
> >>
> >> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
> >>>
> >>>
> >>> Hi Soham
> >>>
> >>> Hive Supports pretty much all the primitive data types including INT.
> For a detaild list please refer
> >>>
> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
> >>>
> >>> The only draw back as in common is when you have the data type as
> String you cannot use it directly on Mathematical functions.
> >>>
> >>> Your requirement can easily be satisfied with a few date functions on
> String data itself.
> >>>
> >>> Regards
> >>> Bejoy KS
> >>>
> >>>
> >>> ________________________________
> >>> From: Soham Sardar <so...@gmail.com>
> >>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
> >>> Sent: Tuesday, June 26, 2012 2:23 PM
> >>> Subject: date datatype in hive
> >>>
> >>> I have a native data type in mysql and i just imported it into hive
> >>> and the data type of the column has now become string ..
> >>> Now i would like to know if there is any native data type in hive and
> >>> What are the pros and cons of using string type in hive rather than
> >>> (int)(thats what i expect ) type
> >>>
> >>> And with the string type can i just run any type of queries like
> >>> print the names of the person which has the maximum number of users
> >>> between 1 PM and 3 PM
> >>>
> >>> and the table should have just name int(5) and login time (datetime )
> >>> {as per the mysql syntax}
>



-- 
Nitin Pawar

Re: date datatype in hive

Posted by Soham Sardar <so...@gmail.com>.
Hey bejoy thats the problem i am not able to run the group by query in
hive i dunno whether i m making a mistake or some thing
see my previoius reply to this same thread i put up the same issue ...


On Wed, Jun 27, 2012 at 12:02 PM, Bejoy KS <be...@yahoo.com> wrote:
> Hi Soham
>
> Rewrite your query with the columns in Group By included in Select as well. Something like
>
> select country,name from users_info  group by country;
>
> Regards
> Bejoy KS
>
> Sent from handheld, please excuse typos.
>
> -----Original Message-----
> From: Soham Sardar <so...@gmail.com>
> Date: Wed, 27 Jun 2012 11:57:23
> To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
> Reply-To: user@hive.apache.org
> Subject: Re: date datatype in hive
>
> And btw does group by works in hive because the same wuery i am
> running in mysql and its working fine but its failing in hive
>
> select name from users_info  group by country;
>
> in mysql its working but whn i try to run it in hive its telling
>
> hive> select name from users_info  group by country;
> FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
> BY key 'name'
>
> I wanna know why is it failing
>
>
> On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
> <so...@gmail.com> wrote:
>> See Bejoy and Everyone ,
>> I have two tables
>> one users_info and one users_audit
>> in hive ..
>>
>> hive> desc users_audit;
>> OK
>> id      int
>> userid  int
>> logtime string
>>
>> hive> desc users_info;
>> OK
>> id      int
>> name    string
>> age     int
>> country string
>> gender  string
>> bday    string
>>
>> now i have given the description of the tables in hive
>>
>> my goal is to find:
>> the maximum number of users loging - in at between a certain time
>> frame say 1PM to 2PM- belonging to a specific country
>>
>> for example
>> in between 1PM to 2PM there are 10000 users loging into and 5500 are
>> from africa , then i need to print the names of all the users who are
>> logged in between the time stamp and who are from africa ..
>> the logtime in the users_audit table gives the login time and the
>> names and country are from the users_info table .
>>
>> Can someone help me out with this . I am a new user in hive so would
>> like some ones help !!!!!!
>> Thanks in advance!!  :)
>>
>>
>>
>> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
>>>
>>>
>>> Hi Soham
>>>
>>> Hive Supports pretty much all the primitive data types including INT. For a detaild list please refer
>>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>>>
>>> The only draw back as in common is when you have the data type as String you cannot use it directly on Mathematical functions.
>>>
>>> Your requirement can easily be satisfied with a few date functions on String data itself.
>>>
>>> Regards
>>> Bejoy KS
>>>
>>>
>>> ________________________________
>>> From: Soham Sardar <so...@gmail.com>
>>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>>> Sent: Tuesday, June 26, 2012 2:23 PM
>>> Subject: date datatype in hive
>>>
>>> I have a native data type in mysql and i just imported it into hive
>>> and the data type of the column has now become string ..
>>> Now i would like to know if there is any native data type in hive and
>>> What are the pros and cons of using string type in hive rather than
>>> (int)(thats what i expect ) type
>>>
>>> And with the string type can i just run any type of queries like
>>> print the names of the person which has the maximum number of users
>>> between 1 PM and 3 PM
>>>
>>> and the table should have just name int(5) and login time (datetime )
>>> {as per the mysql syntax}

Re: date datatype in hive

Posted by Bejoy KS <be...@yahoo.com>.
Hi Soham

Rewrite your query with the columns in Group By included in Select as well. Something like

select country,name from users_info  group by country;

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Soham Sardar <so...@gmail.com>
Date: Wed, 27 Jun 2012 11:57:23 
To: <us...@hive.apache.org>; Bejoy Ks<be...@yahoo.com>
Reply-To: user@hive.apache.org
Subject: Re: date datatype in hive

And btw does group by works in hive because the same wuery i am
running in mysql and its working fine but its failing in hive

select name from users_info  group by country;

in mysql its working but whn i try to run it in hive its telling

hive> select name from users_info  group by country;
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
BY key 'name'

I wanna know why is it failing


On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
<so...@gmail.com> wrote:
> See Bejoy and Everyone ,
> I have two tables
> one users_info and one users_audit
> in hive ..
>
> hive> desc users_audit;
> OK
> id      int
> userid  int
> logtime string
>
> hive> desc users_info;
> OK
> id      int
> name    string
> age     int
> country string
> gender  string
> bday    string
>
> now i have given the description of the tables in hive
>
> my goal is to find:
> the maximum number of users loging - in at between a certain time
> frame say 1PM to 2PM- belonging to a specific country
>
> for example
> in between 1PM to 2PM there are 10000 users loging into and 5500 are
> from africa , then i need to print the names of all the users who are
> logged in between the time stamp and who are from africa ..
> the logtime in the users_audit table gives the login time and the
> names and country are from the users_info table .
>
> Can someone help me out with this . I am a new user in hive so would
> like some ones help !!!!!!
> Thanks in advance!!  :)
>
>
>
> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
>>
>>
>> Hi Soham
>>
>> Hive Supports pretty much all the primitive data types including INT. For a detaild list please refer
>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>>
>> The only draw back as in common is when you have the data type as String you cannot use it directly on Mathematical functions.
>>
>> Your requirement can easily be satisfied with a few date functions on String data itself.
>>
>> Regards
>> Bejoy KS
>>
>>
>> ________________________________
>> From: Soham Sardar <so...@gmail.com>
>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>> Sent: Tuesday, June 26, 2012 2:23 PM
>> Subject: date datatype in hive
>>
>> I have a native data type in mysql and i just imported it into hive
>> and the data type of the column has now become string ..
>> Now i would like to know if there is any native data type in hive and
>> What are the pros and cons of using string type in hive rather than
>> (int)(thats what i expect ) type
>>
>> And with the string type can i just run any type of queries like
>> print the names of the person which has the maximum number of users
>> between 1 PM and 3 PM
>>
>> and the table should have just name int(5) and login time (datetime )
>> {as per the mysql syntax}

Re: date datatype in hive

Posted by Soham Sardar <so...@gmail.com>.
And btw does group by works in hive because the same wuery i am
running in mysql and its working fine but its failing in hive

select name from users_info  group by country;

in mysql its working but whn i try to run it in hive its telling

hive> select name from users_info  group by country;
FAILED: Error in semantic analysis: Line 1:7 Expression not in GROUP
BY key 'name'

I wanna know why is it failing


On Wed, Jun 27, 2012 at 10:19 AM, Soham Sardar
<so...@gmail.com> wrote:
> See Bejoy and Everyone ,
> I have two tables
> one users_info and one users_audit
> in hive ..
>
> hive> desc users_audit;
> OK
> id      int
> userid  int
> logtime string
>
> hive> desc users_info;
> OK
> id      int
> name    string
> age     int
> country string
> gender  string
> bday    string
>
> now i have given the description of the tables in hive
>
> my goal is to find:
> the maximum number of users loging - in at between a certain time
> frame say 1PM to 2PM- belonging to a specific country
>
> for example
> in between 1PM to 2PM there are 10000 users loging into and 5500 are
> from africa , then i need to print the names of all the users who are
> logged in between the time stamp and who are from africa ..
> the logtime in the users_audit table gives the login time and the
> names and country are from the users_info table .
>
> Can someone help me out with this . I am a new user in hive so would
> like some ones help !!!!!!
> Thanks in advance!!  :)
>
>
>
> On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
>>
>>
>> Hi Soham
>>
>> Hive Supports pretty much all the primitive data types including INT. For a detaild list please refer
>> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>>
>> The only draw back as in common is when you have the data type as String you cannot use it directly on Mathematical functions.
>>
>> Your requirement can easily be satisfied with a few date functions on String data itself.
>>
>> Regards
>> Bejoy KS
>>
>>
>> ________________________________
>> From: Soham Sardar <so...@gmail.com>
>> To: user@hive.apache.org; user <us...@sqoop.apache.org>
>> Sent: Tuesday, June 26, 2012 2:23 PM
>> Subject: date datatype in hive
>>
>> I have a native data type in mysql and i just imported it into hive
>> and the data type of the column has now become string ..
>> Now i would like to know if there is any native data type in hive and
>> What are the pros and cons of using string type in hive rather than
>> (int)(thats what i expect ) type
>>
>> And with the string type can i just run any type of queries like
>> print the names of the person which has the maximum number of users
>> between 1 PM and 3 PM
>>
>> and the table should have just name int(5) and login time (datetime )
>> {as per the mysql syntax}

Re: date datatype in hive

Posted by Soham Sardar <so...@gmail.com>.
See Bejoy and Everyone ,
I have two tables
one users_info and one users_audit
in hive ..

hive> desc users_audit;
OK
id	int	
userid	int	
logtime	string	

hive> desc users_info;
OK
id	int	
name	string	
age	int	
country	string	
gender	string	
bday	string	

now i have given the description of the tables in hive

my goal is to find:
the maximum number of users loging - in at between a certain time
frame say 1PM to 2PM- belonging to a specific country

for example
in between 1PM to 2PM there are 10000 users loging into and 5500 are
from africa , then i need to print the names of all the users who are
logged in between the time stamp and who are from africa ..
the logtime in the users_audit table gives the login time and the
names and country are from the users_info table .

Can someone help me out with this . I am a new user in hive so would
like some ones help !!!!!!
Thanks in advance!!  :)



On Tue, Jun 26, 2012 at 6:53 PM, Bejoy Ks <be...@yahoo.com> wrote:
>
>
> Hi Soham
>
> Hive Supports pretty much all the primitive data types including INT. For a detaild list please refer
> https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes
>
> The only draw back as in common is when you have the data type as String you cannot use it directly on Mathematical functions.
>
> Your requirement can easily be satisfied with a few date functions on String data itself.
>
> Regards
> Bejoy KS
>
>
> ________________________________
> From: Soham Sardar <so...@gmail.com>
> To: user@hive.apache.org; user <us...@sqoop.apache.org>
> Sent: Tuesday, June 26, 2012 2:23 PM
> Subject: date datatype in hive
>
> I have a native data type in mysql and i just imported it into hive
> and the data type of the column has now become string ..
> Now i would like to know if there is any native data type in hive and
> What are the pros and cons of using string type in hive rather than
> (int)(thats what i expect ) type
>
> And with the string type can i just run any type of queries like
> print the names of the person which has the maximum number of users
> between 1 PM and 3 PM
>
> and the table should have just name int(5) and login time (datetime )
> {as per the mysql syntax}

Re: date datatype in hive

Posted by Bejoy Ks <be...@yahoo.com>.

Hi Soham

Hive Supports pretty much all the primitive data types including INT. For a detaild list please refer
https://cwiki.apache.org/confluence/display/Hive/Tutorial#Tutorial-PrimitiveTypes

The only draw back as in common is when you have the data type as String you cannot use it directly on Mathematical functions.

Your requirement can easily be satisfied with a few date functions on String data itself.

Regards
Bejoy KS


________________________________
From: Soham Sardar <so...@gmail.com>
To: user@hive.apache.org; user <us...@sqoop.apache.org> 
Sent: Tuesday, June 26, 2012 2:23 PM
Subject: date datatype in hive

I have a native data type in mysql and i just imported it into hive
and the data type of the column has now become string ..
Now i would like to know if there is any native data type in hive and
What are the pros and cons of using string type in hive rather than
(int)(thats what i expect ) type

And with the string type can i just run any type of queries like
print the names of the person which has the maximum number of users
between 1 PM and 3 PM

and the table should have just name int(5) and login time (datetime )
{as per the mysql syntax}