Posted to user@hive.apache.org by Jov <zh...@gmail.com> on 2011/02/21 16:39:34 UTC

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

On 2011-2-21 at 10:54 PM, "Bejoy Ks" <be...@yahoo.com> wrote:
>
> Hi Experts,
>      I'm using Hive for a few projects and I have found it a great tool in
> Hadoop for processing structured data end to end. Unfortunately I'm facing a
> few challenges, as follows.
>
> Availability of databases/schemas in Hive
> I have multiple projects running in Hive, each with a fairly large number of
> tables. With this many tables all together, things are looking a bit messed
> up. Is there any option of creating databases/schemas in Hive so that I can
> maintain the tables in separate databases/schemas corresponding to each
> project?

It seems the recent version already supports database DDL, so you can use
CREATE DATABASE.
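
A minimal sketch, assuming a Hive release with database support (the database
and table names here are only illustrative):

    -- one database per project keeps the tables separated
    CREATE DATABASE IF NOT EXISTS project_a;
    USE project_a;
    CREATE TABLE orders (order_id INT, order_ts STRING);

    -- tables in other databases can still be reached with a qualified name
    SELECT COUNT(*) FROM project_a.orders;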

> Using INTERVAL
>     I need to replicate a job running in a Teradata EDW in Hive, and I'm
> facing a challenge here: I'm not able to identify a Hive equivalent of
> INTERVAL in Teradata. Here is the snippet where I'm facing the issue:
>  *** where r1.seq_id = r4.seq_id and r4.mc_datetime >= (r1.rc_datetime +
> INTERVAL '05' HOUR)
> In this query, how do I replicate the last part in Hive, i.e. (r1.rc_datetime
> + INTERVAL '05' HOUR), which adds 5 hours to the timestamp rc_datetime?
> *The where condition is part of a very large query involving multiple
> table joins.

Hive does not have a date or timestamp data type; such values are stored as
strings. You can write your own UDF to implement a similar function.
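
If rc_datetime and mc_datetime are strings in Hive's default
'yyyy-MM-dd HH:mm:ss' format (an assumption here), the built-in
unix_timestamp() function is one alternative to a custom UDF; a rough sketch:

    -- compare in epoch seconds; 5 hours = 5 * 60 * 60 seconds
    *** where r1.seq_id = r4.seq_id
      and unix_timestamp(r4.mc_datetime) >=
          unix_timestamp(r1.rc_datetime) + 5 * 60 * 60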

>
> Using IN
>     How do we replicate the SQL IN operator in Hive?
> i.e. *** where R1.seq_id = r4.seq_id and r1.PROCCESS_PHASE IN (
> 'Production', 'Stage', 'QA', 'Development')
> The last part of the query is where I'm facing the challenge:
> r1.PROCCESS_PHASE IN ('Production', 'Stage', 'QA', 'Development')
> *The where condition is part of a very large query involving multiple
> table joins.

You can use OR, e.g.

'x IN (1, 2)' can be rewritten as 'x = 1 OR x = 2'.
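
Applied to the predicate above, the rewrite would look roughly like this
(a sketch only, keeping the original column names):

    *** where R1.seq_id = r4.seq_id
      and (r1.PROCCESS_PHASE = 'Production'
           or r1.PROCCESS_PHASE = 'Stage'
           or r1.PROCCESS_PHASE = 'QA'
           or r1.PROCCESS_PHASE = 'Development')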

> Please advise.
>
> Regards
> Bejoy KS

Re: Database/Schema , INTERVAL and SQL IN usages in Hive

Posted by Bejoy Ks <be...@yahoo.com>.
Yes Ajo, time was the concern. I just got a chance to go through the in-memory
join implementation in Hive, and it is great.
Sorry for the confusion :)





Re: Database/Schema , INTERVAL and SQL IN usages in Hive

Posted by Ajo Fod <aj...@gmail.com>.
Better in what sense? If it is time you are concerned about, there are
in-memory (map-side) joins.
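
A sketch of what that can look like in HiveQL, assuming the comparison values
live in a small lookup table (the table and column names are illustrative):

    -- the MAPJOIN hint asks Hive to load the small table into memory and
    -- perform the join on the map side, avoiding a shuffle for the join
    SELECT /*+ MAPJOIN(p) */ r1.*
    FROM records r1
    JOIN valid_phases p ON (r1.proccess_phase = p.phase);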

-Ajo


Re: Database/Schema , INTERVAL and SQL IN usages in Hive

Posted by Bejoy Ks <be...@yahoo.com>.
Ajo,
    If we have a good number of elements in the comparison set, then going for a
table would be beneficial. But in the case of a few elements, say 5, won't
multiple '=' comparisons be better?

Regards 
Bejoy KS





Re: Database/Schema , INTERVAL and SQL IN usages in Hive

Posted by Ajo Fod <aj...@gmail.com>.
On using SQL IN ... what would happen if you created a short table with the
entries in the IN clause and used an "inner join"?
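
A rough sketch of that idea, with illustrative table and column names:

    -- a small lookup table holding the values from the IN list,
    -- populated once (for example with LOAD DATA)
    CREATE TABLE valid_phases (phase STRING);

    -- the IN filter then becomes an inner join
    SELECT r1.*
    FROM records r1
    JOIN valid_phases p ON (r1.proccess_phase = p.phase);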

-Ajo


Re: Extract Create Table statement from Hive

Posted by Jay Ramadorai <jr...@tripadvisor.com>.
Thank you, Ed. It works like a charm after I remove the Hive2rdbms references. I've uploaded the jar to the JIRA for those who want to use it.



Re: Extract Create Table statement from Hive

Posted by Edward Capriolo <ed...@gmail.com>.
That is just an artifact from the code I ripped it out of. From a
quick scan you should be able to remove those references. Contact me
off list if you are having issues.

Re: Extract Create Table statement from Hive

Posted by Jay Ramadorai <jr...@tripadvisor.com>.
Thank you, Ed. Trying it now, but I need the following package to build HiveUtil:
com.media6.hive2rdbms.common.Hive2RdbmsConf;
Can you point me to where I can get it from?



Re: Extract Create Table statement from Hive

Posted by Edward Capriolo <ed...@gmail.com>.
Jay,

I uploaded my "show create table" program to
https://issues.apache.org/jira/browse/HIVE-967.

Caveats: This is a program you need to build against the Hive lib files.
Caveats: There are edge cases where it will not produce the correct
CREATE TABLE statement (the HBase storage handler, for example).
Caveats: It does actually work, though!

Enjoy

Re: Extract Create Table statement from Hive

Posted by Edward Capriolo <ed...@gmail.com>.
I do have some code that can generate the DDL. It is a separate program
that talks to the metastore (outside of the Hive CLI). I will throw
the code up on the JIRA tomorrow. However, the plan for the JIRA is to
incorporate this in Hive as a SHOW CREATE TABLE statement.
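
In the meantime, DESCRIBE EXTENDED dumps most of the information the DDL has
to reproduce (columns, location, SerDe, input and output formats), although
not as a ready-to-run statement; a small example with an illustrative table
name:

    DESCRIBE EXTENDED page_views;
    -- for a partitioned table, a specific partition can be inspected too
    DESCRIBE EXTENDED page_views PARTITION (dt='2011-02-21');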

Extract Create Table statement from Hive

Posted by Jay Ramadorai <jr...@tripadvisor.com>.
Does anyone have a way of generating the CREATE TABLE statement for a table that is in Hive? I see a JIRA for this, https://issues.apache.org/jira/browse/HIVE-967, and it appears that Ed Capriolo might have a solution for it. Ed, are you able to share this solution?

My goal is to copy a bunch of Hive tables from one cluster to another. What I'd like to do is generate the DDL on one cluster and run it on the other, and then use DistCp with lots of mappers to parallel-copy the table data between the clusters. FYI, my metadata is in Derby. I could back up and restore the metadata directories, but I don't want a copy of all the tables from the source database, only a subset.

I tried Sqooping the table out of Hive, but it runs into:
11/02/21 18:23:50 ERROR manager.SqlManager: Error executing statement: java.sql.SQLException: Method not supported
11/02/21 18:23:50 ERROR sqoop.Sqoop: Got exception running Sqoop: java.lang.NullPointerException
java.lang.NullPointerException
	at com.cloudera.sqoop.hive.TableDefWriter.getCreateTableStmt(TableDefWriter.java:133)

Apparently the Hive Thrift JDBC driver doesn't support extracting table metadata.

Please feel free to suggest alternative approaches. 

thanks
Jay


Re: Database/Schema , INTERVAL and SQL IN usages in Hive

Posted by Bejoy Ks <be...@yahoo.com>.
Thanks, Jov, for the quick response.

Could you please let me know which is the latest stable version of Hive? Also,
how would you find out your Hive version from the command line?

Regarding SQL IN, I'm also currently using multiple '=' comparisons in my jobs,
but I still wanted to know whether there would be some better option for this.



Regards
Bejoy KS





