You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Sean Colgan <Se...@nasdaqomx.com> on 2013/12/21 00:01:17 UTC
Automatic creation of Hive output table based on SELECT statement
Hi all,
I'm looking for a way to automatically generate an output table schema based on the input and query. Basically, a given input schema and SELECT statement produce a deterministic result schema. And Hive will complain if a given output table does not have a matching schema. Like so:
FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into target table because column number/types are different 'test_out': Table insclause-0 has 1 columns, but query has 3 columns.
I am developing a process that will accept input data with an arbitrary known schema, and an HQL SELECT statement to query that data. I'm looking for a way to automatically generate the correct output table schema. I'm starting to poke through the code in SemanticAnalyzer.java (which generates the above error string) to see if I can use something in there. But I certainly welcome any thoughts or suggestions on this topic.
Thanks!
Sean
--
sean.colgan@nasdaqomx.com<ma...@nasdaqomx.com>
www.nasdaqomx.com/finqloud<http://www.nasdaqomx.com/finqloud>
Re: Automatic creation of Hive output table based on SELECT statement
Posted by bharath vissapragada <bh...@gmail.com>.
I had the same problem sometime back and I found a dirty workaround that
actually works.
1) create view foo as <select statement> # this automatically creates foo
view with select output schema, this can contain joins, sel conditions,
group/count operators etc.
2) create table bar like foo # creates bar with schema like foo
3) drop view foo # you don't need the view anymore, so jus drop it
I just used these steps to automate selects in a script. Maybe you can find
it useful too though it doesn't make any sense.
As already said, I just used it for automating a few select jobs on my
test cluster, if you intend to use it in production, I would suggest to
look into the code of how "view" creation works as it has this logic and
On Mon, Dec 23, 2013 at 10:12 PM, Sean Colgan <Se...@nasdaqomx.com>wrote:
> Very interesting. At least for the time being, I think this may be our
> way forward. Too bad the create-as-select statement wont work for
> external tables. Because that would really be our solution.
>
> Thanks for the help! :)
>
> Sean
>
> *Sean Colgan*
> NASDAQ OMX® | FinQloud
>
> Direct: +1 720 889 5169
> Mobile: +1 720 775 7390
> sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
> www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
>
>
>
> On Fri 20 Dec 2013 04:54:15 PM MST, Stephen Sprague wrote:
> > its a not via an API but what if you experimented with "create table
> > XXX as select ..... where 1=0" followed by "desc XXX".
> >
> > Maybe something like that can tide you over until a more programmatic
> > way comes to light?
> >
> >
> > On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan
> > <Sean.Colgan@nasdaqomx.com <ma...@nasdaqomx.com>> wrote:
> >
> > Hi all,
> > I'm looking for a way to automatically generate an output table
> > schema based on the input and query. Basically, a given input
> > schema and SELECT statement produce a deterministic result
> > schema. And Hive will complain if a given output table does not
> > have a matching schema. Like so:
> >
> > FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert
> > into target table because column number/types are different
> > 'test_out': Table insclause-0 has 1 columns, but query has 3 columns.
> >
> > I am developing a process that will accept input data with an
> > arbitrary known schema, and an HQL SELECT statement to query that
> > data. I'm looking for a way to automatically generate the correct
> > output table schema. I'm starting to poke through the code in
> > SemanticAnalyzer.java (which generates the above error string) to
> > see if I can use something in there. But I certainly welcome any
> > thoughts or suggestions on this topic.
> >
> > Thanks!
> > Sean
> >
> > --
> >
> > sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
> > www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
> >
> >
Re: Automatic creation of Hive output table based on SELECT
statement
Posted by Sean Colgan <Se...@nasdaqomx.com>.
Very interesting. At least for the time being, I think this may be our
way forward. Too bad the create-as-select statement wont work for
external tables. Because that would really be our solution.
Thanks for the help! :)
Sean
*Sean Colgan*
NASDAQ OMX® | FinQloud
Direct: +1 720 889 5169
Mobile: +1 720 775 7390
sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
On Fri 20 Dec 2013 04:54:15 PM MST, Stephen Sprague wrote:
> its a not via an API but what if you experimented with "create table
> XXX as select ..... where 1=0" followed by "desc XXX".
>
> Maybe something like that can tide you over until a more programmatic
> way comes to light?
>
>
> On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan
> <Sean.Colgan@nasdaqomx.com <ma...@nasdaqomx.com>> wrote:
>
> Hi all,
> I'm looking for a way to automatically generate an output table
> schema based on the input and query. Basically, a given input
> schema and SELECT statement produce a deterministic result
> schema. And Hive will complain if a given output table does not
> have a matching schema. Like so:
>
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert
> into target table because column number/types are different
> 'test_out': Table insclause-0 has 1 columns, but query has 3 columns.
>
> I am developing a process that will accept input data with an
> arbitrary known schema, and an HQL SELECT statement to query that
> data. I'm looking for a way to automatically generate the correct
> output table schema. I'm starting to poke through the code in
> SemanticAnalyzer.java (which generates the above error string) to
> see if I can use something in there. But I certainly welcome any
> thoughts or suggestions on this topic.
>
> Thanks!
> Sean
>
> --
>
> sean.colgan@nasdaqomx.com <ma...@nasdaqomx.com>
> www.nasdaqomx.com/finqloud <http://www.nasdaqomx.com/finqloud>
>
>
Re: Automatic creation of Hive output table based on SELECT statement
Posted by Stephen Sprague <sp...@gmail.com>.
its a not via an API but what if you experimented with "create table XXX as
select ..... where 1=0" followed by "desc XXX".
Maybe something like that can tide you over until a more programmatic way
comes to light?
On Fri, Dec 20, 2013 at 3:01 PM, Sean Colgan <Se...@nasdaqomx.com>wrote:
> Hi all,
> I'm looking for a way to automatically generate an output table schema
> based on the input and query. Basically, a given input schema and SELECT
> statement produce a deterministic result schema. And Hive will complain if
> a given output table does not have a matching schema. Like so:
>
> FAILED: SemanticException [Error 10044]: Line 3:23 Cannot insert into
> target table because column number/types are different 'test_out': Table
> insclause-0 has 1 columns, but query has 3 columns.
>
> I am developing a process that will accept input data with an arbitrary
> known schema, and an HQL SELECT statement to query that data. I'm looking
> for a way to automatically generate the correct output table schema. I'm
> starting to poke through the code in SemanticAnalyzer.java (which generates
> the above error string) to see if I can use something in there. But I
> certainly welcome any thoughts or suggestions on this topic.
>
> Thanks!
> Sean
>
> --
>
> sean.colgan@nasdaqomx.com
> www.nasdaqomx.com/finqloud
>