Posted to user@hive.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2016/06/21 23:45:01 UTC

loading in ORC from big compressed file

Hi,

I have a big compressed data file, my_table.dat.gz (approx. 100 GB).

# load staging table STAGE_my_table from file my_table.dat.gz

HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.gz' OVERWRITE INTO
TABLE STAGE_my_table;

# insert into ORC table "my_table"

HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
....
INFO  : Map 1: 0(+1)/1  Reducer 2: 0/1
....


The insert into the ORC table has been running for 5-6 hours now. It seems
everything is running sequentially, with a single mapper reading the whole file.

Please suggest how I can speed up the ORC table load.
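
For context, a minimal sketch of the two table definitions a setup like this
assumes (the column list is hypothetical; only the storage formats matter here):

CREATE TABLE STAGE_my_table (
  id      BIGINT,    -- hypothetical columns, for illustration only
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

CREATE TABLE my_table (
  id      BIGINT,
  payload STRING
)
STORED AS ORC;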




Regards
Sanjiv Singh
Mob :  +091 9990-447-339

Re: loading in ORC from big compressed file

Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi

Are you using map-reduce as the execution engine?

What version of Hive are you on?
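
For reference, a quick way to check both, from Beeline or the Hive CLI:

HIVE>> SET hive.execution.engine;    (prints the current engine: mr, tez or spark)

and from the shell:

$ hive --version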

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com




Re: loading in ORC from big compressed file

Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Thanks Marcin, that worked. I uncompressed the file and then loaded it into the
Hive table.

Now it is quick, just a few minutes.
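
For anyone hitting the same wall, a rough sketch of that workflow, assuming the
.gz is available on the local filesystem and the same paths as above:

$ gunzip my_table.dat.gz                      # produces my_table.dat; plain text is splittable
$ hdfs dfs -put my_table.dat /var/lib/txt/

HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat' OVERWRITE INTO TABLE STAGE_my_table;
HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;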




Regards
Sanjiv Singh
Mob :  +091 9990-447-339


Re: loading in ORC from big compressed file

Posted by Jörn Franke <jo...@gmail.com>.

Marcin is correct: either split the gzip file up into smaller files of at least
one HDFS block each, or use bzip2, which is block-compressed and therefore splittable.
What is the original format of the table?
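
A minimal sketch of the bzip2 route, assuming the original .gz is on the local
filesystem:

$ zcat my_table.dat.gz | bzip2 > my_table.dat.bz2   # bzip2 is splittable in Hadoop
$ hdfs dfs -put my_table.dat.bz2 /var/lib/txt/

HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.bz2' OVERWRITE INTO TABLE STAGE_my_table;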


Re: loading in ORC from big compressed file

Posted by Marcin Tustin <mt...@handybook.com>.
This is because a gzip file is not splittable at all. Try creating the ORC table
from an uncompressed file, or better still, split the file up and put the pieces
in a directory in HDFS/S3/wherever.
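
A rough sketch of the split-into-a-directory approach (chunk size and paths are
illustrative):

$ zcat my_table.dat.gz | split -C 256M - my_table.part.   # keep line boundaries, ~256 MB per chunk
$ hdfs dfs -mkdir -p /var/lib/txt/my_table_parts
$ hdfs dfs -put my_table.part.* /var/lib/txt/my_table_parts/

HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table_parts' OVERWRITE INTO TABLE STAGE_my_table;

Pointing LOAD DATA at the directory moves every file in it into the table, so the
subsequent INSERT ... SELECT can run with many mappers.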

