You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@hive.apache.org by Johannes Stamminger <jo...@airbus.com> on 2016/08/03 11:53:50 UTC

Create table from orc file

Hi,


is it possible to write data to an orc file(s) using the hive-orc api and to 
use such by hive (create a table from it)?


Regards
This email (including any attachments) may contain confidential and/or privileged information or information otherwise protected from disclosure. If you are not the intended recipient, please notify the sender immediately, do not copy this message or any attachments and do not use it for any purpose or disclose its content to any person, but delete this message and any attachments from your system. Astrium and Airbus Group companies disclaim any and all liability if this email transmission was virus corrupted, altered or falsified.
---------------------------------------------------------
Airbus DS GmbH 
Vorsitzender des Aufsichtsrates: Bernhard Gerwert 
Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr. Michael Menking, Dr. Johannes von Thadden 
Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB Nr. 107 647 
Ust. Ident. Nr. /VAT reg. no. DE167015356

Re: Create table from orc file

Posted by Marcin Tustin <mt...@handybook.com>.
Yes. Create an external table whose location contains only the orc file(s)
you want to include in the table.

On Wed, Aug 3, 2016 at 7:53 AM, Johannes Stamminger <
johannes.stamminger@airbus.com> wrote:

> Hi,
>
>
> is it possible to write data to an orc file(s) using the hive-orc api and
> to
> use such by hive (create a table from it)?
>
>
> Regards
> This email (including any attachments) may contain confidential and/or
> privileged information or information otherwise protected from disclosure.
> If you are not the intended recipient, please notify the sender
> immediately, do not copy this message or any attachments and do not use it
> for any purpose or disclose its content to any person, but delete this
> message and any attachments from your system. Astrium and Airbus Group
> companies disclaim any and all liability if this email transmission was
> virus corrupted, altered or falsified.
> ---------------------------------------------------------
> Airbus DS GmbH
> Vorsitzender des Aufsichtsrates: Bernhard Gerwert
> Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr.
> Michael Menking, Dr. Johannes von Thadden
> Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB
> Nr. 107 647
> Ust. Ident. Nr. /VAT reg. no. DE167015356

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led 
by Fidelity


Re: Create table from orc file

Posted by Johannes Stamminger <jo...@airbus.com>.
Some progress: I could eliminate the error reported in a): the data file needs 
to be named 000000_0 and must be placed in a the directory denoted by the 
location given at table creation. This is what the error message is about? ;-)


Now the situation for a) is the same as for b):

Trying to fetch data by

select * from CFA1_Fan_Speed_DMC limit 1;

results in 

Error: java.io.IOException: java.io.IOException: ORC does not support type 
conversion from file type timestamp (1) to reader type 
struct<normalizedTime:timestamp,RAW:bigint,ENG:float> (1) (state=,code=0)



But if I create a comparable table within hive, things do work:

create table x(y struct<a:timestamp,b:bigint,c:float>);

insert into x select named_struct('a', current_timestamp, 'b', bigint(42), 
'c', float(4.2)) from dummy limit 1;

select * from x;
+-------------------------------------------------+--+
|                       x.y                       |
+-------------------------------------------------+--+
| {"a":"2016-08-04 14:49:01.636","b":42,"c":4.2}  |
+-------------------------------------------------+--+



The tables look similar:

describe CFA1_Fan_Speed_DMC;
+-----------+--------------------------------------------------------
+----------+--+
| col_name  |                       data_type                        | comment  
|
+-----------+--------------------------------------------------------
+----------+--+
| record    | struct<normalizedTime:timestamp,RAW:bigint,ENG:float>  |          
|
+-----------+--------------------------------------------------------
+----------+--+

describe x;
+-----------+---------------------------------------+----------+--+
| col_name  |               data_type               | comment  |
+-----------+---------------------------------------+----------+--+
| y         | struct<a:timestamp,b:bigint,c:float>  |          |
+-----------+---------------------------------------+----------+--+



So does anybody have an idea what might be wrong with my external table 
access? Anything that I could give a try?

This email (including any attachments) may contain confidential and/or privileged information or information otherwise protected from disclosure. If you are not the intended recipient, please notify the sender immediately, do not copy this message or any attachments and do not use it for any purpose or disclose its content to any person, but delete this message and any attachments from your system. Astrium and Airbus Group companies disclaim any and all liability if this email transmission was virus corrupted, altered or falsified.
---------------------------------------------------------
Airbus DS GmbH 
Vorsitzender des Aufsichtsrates: Bernhard Gerwert 
Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr. Michael Menking, Dr. Johannes von Thadden 
Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB Nr. 107 647 
Ust. Ident. Nr. /VAT reg. no. DE167015356

Re: Create table from orc file

Posted by Marcin Tustin <mt...@handybook.com>.
Correct you need to specify the columns. If you created the file I assume
you have a record of them.

Someone more familiar with the hive code will have to comment on the
exceptions.

On Wednesday, August 3, 2016, Johannes Stamminger <
johannes.stamminger@airbus.com> wrote:

> But doing so I assume it does not detect the columns on it's own, I have to
> specify such manually - or am I wrong? The orc file I finally want to work
> with contains ~28000 columns (513MB size, ~500000 rows, 3 structs with 2 of
> them containing ~14000 fields each) ...
>
> The hive documentation for the create table statement shows the columns
> part
> being optional. In fact it seems required, at least I found no way to avoid
> it.
>
>
> For testing purposes I started with a smaller one and found two ways of
> bringing the data to hive. Unfortunately I actually fail on accessing it:
>
>
> a) create external table:
>
> Succeeding statement:
>
> create external table if not exists CFA1_Fan_Speed_DMC(record
> struct<normalizedTime:bigint,RAW:bigint,ENG:float>) stored as ORC location
> '...';
>
> with the location having specified containig my existing orc file named
> exactly like the table, CFA1_Fan_Speed_DMC.
>
> But every selection for data results in:
>
> Error: java.io.IOException: java.lang.RuntimeException: Char length 256
> out of
> allowed range [1, 255] (state=,code=0)
>
> Tried with:
>  - select * from CFA1_Fan_Speed_DMC;
>  - select record from CFA1_Fan_Speed_DMC;
>  - select record.normalizedTime from CFA1_Fan_Speed_DMC;
>
>
> b) create table and load from file
>
> Succeeding statements:
>
> create table cfa1(record
> struct<normalizedTime:bigint,RAW:bigint,ENG:float>)
> stored as orc;
>
> load data inpath '.../CFA1_Fan_Speed_DMC' into table cfa1;
>
> Same statements for querying as above (of course using the different table
> name) still fail, but now with:
>
> Error: java.io.IOException: java.io.IOException: ORC does not support type
> conversion from file type bigint (1) to reader type
> struct<normalizedTime:bigint,RAW:bigint,ENG:float> (1) (state=,code=0)
>
>
>
> So what is wrong with the above?
>
>
> I should mention, that I created the orc files having used using the latest
> orc-core lib (1.1.2). That seems not to be the same implementation for orc
> files access as being used in hive.
>
>
> Thanks for all hints!
>
>
>
> Am Mittwoch, 3. August 2016, 08:45:45 CEST schrieb Marcin Tustin:
> > Yes. Create an external table whose location contains only the orc
> file(s)
> > you want to include in the table.
> >
> > On Wed, Aug 3, 2016 at 7:53 AM, Johannes Stamminger <
> >
> > johannes.stamminger@airbus.com <javascript:;>> wrote:
> > > Hi,
> > >
> > >
> > > is it possible to write data to an orc file(s) using the hive-orc api
> and
> > > to
> > > use such by hive (create a table from it)?
> > >
> > >
> > > Regards
> > > This email (including any attachments) may contain confidential and/or
> > > privileged information or information otherwise protected from
> disclosure.
> > > If you are not the intended recipient, please notify the sender
> > > immediately, do not copy this message or any attachments and do not
> use it
> > > for any purpose or disclose its content to any person, but delete this
> > > message and any attachments from your system. Astrium and Airbus Group
> > > companies disclaim any and all liability if this email transmission was
> > > virus corrupted, altered or falsified.
> > > ---------------------------------------------------------
> > > Airbus DS GmbH
> > > Vorsitzender des Aufsichtsrates: Bernhard Gerwert
> > > Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr.
> > > Michael Menking, Dr. Johannes von Thadden
> > > Sitz der Gesellschaft: München - Registergericht: Amtsgericht München,
> HRB
> > > Nr. 107 647
> > > Ust. Ident. Nr. /VAT reg. no. DE167015356
>
>
> --
>    Johannes.Stamminger@Airbus.com      [2FE783D0 http://wwwkeys.PGP.net]
> ------ ----<--{(@ ------------------              AIRBUS Defence & Space
> Koenigsberger Str. 17, 28857 Barrien     Ground SW Eng. & Del. (TSOTC 6)
> +49 4242 169582 (Tel + FAX)                 Airbus Allee 1, 28199 Bremen
> +49 174 7731593 (Mobile)             +49 421 539 4152 (Tel) / 4378 (FAX)
>
> This email (including any attachments) may contain confidential and/or
> privileged information or information otherwise protected from disclosure.
> If you are not the intended recipient, please notify the sender
> immediately, do not copy this message or any attachments and do not use it
> for any purpose or disclose its content to any person, but delete this
> message and any attachments from your system. Astrium and Airbus Group
> companies disclaim any and all liability if this email transmission was
> virus corrupted, altered or falsified.
> ---------------------------------------------------------
> Airbus DS GmbH
> Vorsitzender des Aufsichtsrates: Bernhard Gerwert
> Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr.
> Michael Menking, Dr. Johannes von Thadden
> Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB
> Nr. 107 647
> Ust. Ident. Nr. /VAT reg. no. DE167015356
>
>

-- 
Want to work at Handy? Check out our culture deck and open roles 
<http://www.handy.com/careers>
Latest news <http://www.handy.com/press> at Handy
Handy just raised $50m 
<http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/> led 
by Fidelity


Re: Create table from orc file

Posted by Johannes Stamminger <jo...@airbus.com>.
But doing so I assume it does not detect the columns on it's own, I have to 
specify such manually - or am I wrong? The orc file I finally want to work 
with contains ~28000 columns (513MB size, ~500000 rows, 3 structs with 2 of 
them containing ~14000 fields each) ...

The hive documentation for the create table statement shows the columns part 
being optional. In fact it seems required, at least I found no way to avoid 
it.


For testing purposes I started with a smaller one and found two ways of 
bringing the data to hive. Unfortunately I actually fail on accessing it:


a) create external table:

Succeeding statement:

create external table if not exists CFA1_Fan_Speed_DMC(record 
struct<normalizedTime:bigint,RAW:bigint,ENG:float>) stored as ORC location 
'...';

with the location having specified containig my existing orc file named 
exactly like the table, CFA1_Fan_Speed_DMC.

But every selection for data results in:

Error: java.io.IOException: java.lang.RuntimeException: Char length 256 out of 
allowed range [1, 255] (state=,code=0)

Tried with:
 - select * from CFA1_Fan_Speed_DMC;
 - select record from CFA1_Fan_Speed_DMC;
 - select record.normalizedTime from CFA1_Fan_Speed_DMC;


b) create table and load from file

Succeeding statements:

create table cfa1(record struct<normalizedTime:bigint,RAW:bigint,ENG:float>) 
stored as orc;

load data inpath '.../CFA1_Fan_Speed_DMC' into table cfa1;

Same statements for querying as above (of course using the different table 
name) still fail, but now with:

Error: java.io.IOException: java.io.IOException: ORC does not support type 
conversion from file type bigint (1) to reader type 
struct<normalizedTime:bigint,RAW:bigint,ENG:float> (1) (state=,code=0)



So what is wrong with the above?


I should mention, that I created the orc files having used using the latest 
orc-core lib (1.1.2). That seems not to be the same implementation for orc 
files access as being used in hive.


Thanks for all hints!



Am Mittwoch, 3. August 2016, 08:45:45 CEST schrieb Marcin Tustin:
> Yes. Create an external table whose location contains only the orc file(s)
> you want to include in the table.
> 
> On Wed, Aug 3, 2016 at 7:53 AM, Johannes Stamminger <
> 
> johannes.stamminger@airbus.com> wrote:
> > Hi,
> > 
> > 
> > is it possible to write data to an orc file(s) using the hive-orc api and
> > to
> > use such by hive (create a table from it)?
> > 
> > 
> > Regards
> > This email (including any attachments) may contain confidential and/or
> > privileged information or information otherwise protected from disclosure.
> > If you are not the intended recipient, please notify the sender
> > immediately, do not copy this message or any attachments and do not use it
> > for any purpose or disclose its content to any person, but delete this
> > message and any attachments from your system. Astrium and Airbus Group
> > companies disclaim any and all liability if this email transmission was
> > virus corrupted, altered or falsified.
> > ---------------------------------------------------------
> > Airbus DS GmbH
> > Vorsitzender des Aufsichtsrates: Bernhard Gerwert
> > Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr.
> > Michael Menking, Dr. Johannes von Thadden
> > Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB
> > Nr. 107 647
> > Ust. Ident. Nr. /VAT reg. no. DE167015356


-- 
   Johannes.Stamminger@Airbus.com      [2FE783D0 http://wwwkeys.PGP.net]
------ ----<--{(@ ------------------              AIRBUS Defence & Space
Koenigsberger Str. 17, 28857 Barrien     Ground SW Eng. & Del. (TSOTC 6)
+49 4242 169582 (Tel + FAX)                 Airbus Allee 1, 28199 Bremen
+49 174 7731593 (Mobile)             +49 421 539 4152 (Tel) / 4378 (FAX)

This email (including any attachments) may contain confidential and/or privileged information or information otherwise protected from disclosure. If you are not the intended recipient, please notify the sender immediately, do not copy this message or any attachments and do not use it for any purpose or disclose its content to any person, but delete this message and any attachments from your system. Astrium and Airbus Group companies disclaim any and all liability if this email transmission was virus corrupted, altered or falsified.
---------------------------------------------------------
Airbus DS GmbH 
Vorsitzender des Aufsichtsrates: Bernhard Gerwert 
Geschäftsführung: Evert Dudok (Vorsitzender), Dr. Lars Immisch, Dr. Michael Menking, Dr. Johannes von Thadden 
Sitz der Gesellschaft: München - Registergericht: Amtsgericht München, HRB Nr. 107 647 
Ust. Ident. Nr. /VAT reg. no. DE167015356