You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Sean Colgan <Se...@nasdaqomx.com> on 2013/12/21 00:01:17 UTC

Automatic creation of Hive output table based on SELECT statement

Hi all,
I'm looking for a way to automatically generate an output table schema based on the input and query.  Basically, a given input schema and SELECT statement produce a deterministic result schema.  And Hive will complain if a given output table does not have a matching schema.  Like so:

FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'test_out': Table insclause-0 has 1 columns, but query has 3 columns.

I am developing a process that will accept input data with an arbitrary known schema, and an HQL SELECT statement to query that data.  I'm looking for a way to automatically generate the correct output table schema.  I'm starting to poke through the code in SemanticAnalyzer.java (which generates the above error string) to see if I can use something in there.  But I certainly welcome any thoughts or suggestions on this topic.

Thanks!
Sean

--

sean.colgan@nasdaqomx.com<ma...@nasdaqomx.com>
www.nasdaqomx.com/finqloud<http://www.nasdaqomx.com/finqloud>

Re: Automatic creation of Hive output table based on SELECT statement

Posted by bharath vissapragada <bh...@gmail.com>.
I had the same problem sometime back and I found a dirty workaround that
actually works.

1) create view foo as <select statement> # this automatically creates foo
view with select output schema, this can contain joins, sel conditions,
group/count operators etc.
2) create table bar like foo # creates bar with schema like foo
3) drop view foo # you don't need the view anymore, so jus drop it

I just used these steps to automate selects in a script. Maybe you can find
it useful too though it doesn't make any sense.

 As already said, I just used it for automating a few select jobs on my
test cluster, if you intend to use it in production, I would suggest to
look into the code of how "view" creation works as it has this logic and


On Mon, Dec 23, 2013 at 10:12 PM, Sean Colgan <Se...@nasdaqomx.com>wrote:

> Very interesting.  At least for the time being, I think this may be our
> way forward.  Too bad the create-as-select statement wont work for
> external tables.  Because that would really be our solution.
>
> Thanks for the help! :)
>
> Sean
>
> *Sean Colgan*
> NASDAQ OMX® | FinQloud
>
> Direct: +1 720 889 5169
> Mobile: +1 720 775 7390
> sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
> www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
>
>
>
> On Fri 20 Dec 2013 04:54:15 PM MST, Stephen Sprague wrote:
> > its a not via an API but what if you experimented with "create table
> > XXX as select ..... where 1=0" followed by "desc XXX".
> >
> > Maybe something like that can tide you over until a more programmatic
> > way comes to light?
> >
> >
> > On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan
> > <Sean.Colgan@nasdaqomx.com <ma...@nasdaqomx.com>> wrote:
> >
> >     Hi all,
> >     I'm looking for a way to automatically generate an output table
> >     schema based on the input and query.  Basically, a given input
> >     schema and SELECT statement produce a deterministic result
> >     schema.  And Hive will complain if a given output table does not
> >     have a matching schema.  Like so:
> >
> >     FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert
> >     into target table because column number/types are different
> >     'test_out': Table insclause-0 has 1 columns, but query has 3 columns.
> >
> >     I am developing a process that will accept input data with an
> >     arbitrary known schema, and an HQL SELECT statement to query that
> >     data.  I'm looking for a way to automatically generate the correct
> >     output table schema.  I'm starting to poke through the code in
> >     SemanticAnalyzer.java (which generates the above error string) to
> >     see if I can use something in there.  But I certainly welcome any
> >     thoughts or suggestions on this topic.
> >
> >     Thanks!
> >     Sean
> >
> >     --
> >
> >     sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
> >     www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
> >
> >

Re: Automatic creation of Hive output table based on SELECT statement

Posted by Sean Colgan <Se...@nasdaqomx.com>.
Very interesting.  At least for the time being, I think this may be our 
way forward.  Too bad the create-as-select statement wont work for 
external tables.  Because that would really be our solution.

Thanks for the help! :)

Sean

*Sean Colgan*
NASDAQ OMX® | FinQloud

Direct: +1 720 889 5169
Mobile: +1 720 775 7390
sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>



On Fri 20 Dec 2013 04:54:15 PM MST, Stephen Sprague wrote:
> its a not via an API but what if you experimented with "create table
> XXX as select ..... where 1=0" followed by "desc XXX".
>
> Maybe something like that can tide you over until a more programmatic
> way comes to light?
>
>
> On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan
> <Sean.Colgan@nasdaqomx.com <ma...@nasdaqomx.com>> wrote:
>
>     Hi all,
>     I'm looking for a way to automatically generate an output table
>     schema based on the input and query.  Basically, a given input
>     schema and SELECT statement produce a deterministic result
>     schema.  And Hive will complain if a given output table does not
>     have a matching schema.  Like so:
>
>     FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert
>     into target table because column number/types are different
>     'test_out': Table insclause-0 has 1 columns, but query has 3 columns.
>
>     I am developing a process that will accept input data with an
>     arbitrary known schema, and an HQL SELECT statement to query that
>     data.  I'm looking for a way to automatically generate the correct
>     output table schema.  I'm starting to poke through the code in
>     SemanticAnalyzer.java (which generates the above error string) to
>     see if I can use something in there.  But I certainly welcome any
>     thoughts or suggestions on this topic.
>
>     Thanks!
>     Sean
>
>     --
>
>     sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
>     www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
>
>

Re: Automatic creation of Hive output table based on SELECT statement

Posted by Stephen Sprague <sp...@gmail.com>.
its a not via an API but what if you experimented with "create table XXX as
select ..... where 1=0" followed by "desc XXX".

Maybe something like that can tide you over until a more programmatic way
comes to light?


On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan <Se...@nasdaqomx.com>wrote:

>  Hi all,
> I'm looking for a way to automatically generate an output table schema
> based on the input and query.  Basically, a given input schema and SELECT
> statement produce a deterministic result schema.  And Hive will complain if
> a given output table does not have a matching schema.  Like so:
>
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into
> target table because column number/types are different 'test_out': Table
> insclause-0 has 1 columns, but query has 3 columns.
>
> I am developing a process that will accept input data with an arbitrary
> known schema, and an HQL SELECT statement to query that data.  I'm looking
> for a way to automatically generate the correct output table schema.  I'm
> starting to poke through the code in SemanticAnalyzer.java (which generates
> the above error string) to see if I can use something in there.  But I
> certainly welcome any thoughts or suggestions on this topic.
>
> Thanks!
> Sean
>
> --
>
> sean.colgan@nasdaqomx.com
> www.nasdaqomx.com/finqloud
>