Posted to user@hive.apache.org by "@Sanjiv Singh" <sa...@gmail.com> on 2016/06/21 23:45:01 UTC
loading in ORC from big compressed file
Hi,

I have a big compressed data file my_table.dat.gz (approx. size 100 GB).

# load staging table STAGE_my_table from file my_table.dat.gz
HIVE>> LOAD DATA INPATH '/var/lib/txt/my_table.dat.gz' OVERWRITE INTO
TABLE STAGE_my_table ;

# insert into ORC table "my_table"
HIVE>> INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
....
INFO : Map 1: 0(+1)/1 Reducer 2: 0/1
....
Insertion into the ORC table has been running for 5-6 hours now. It seems
everything is sequential, with a single mapper reading the complete file.
Any suggestions? Please help me improve the ORC table load.
Regards
Sanjiv Singh
Mob : +091 9990-447-339
Re: loading in ORC from big compressed file
Posted by Mich Talebzadeh <mi...@gmail.com>.
Hi,

Are you using map-reduce as the execution engine? What version of Hive are you on?

HTH
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com
Re: loading in ORC from big compressed file
Posted by "@Sanjiv Singh" <sa...@gmail.com>.
Thanks Marcin, it worked. I uncompressed the file and then loaded it into the
Hive table.
Now it's quick, just a few minutes.
Regards
Sanjiv Singh
Mob : +091 9990-447-339
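A rough sketch of this decompress-then-load approach; file names follow the thread, and a tiny stand-in file replaces the real 100 GB dump:

```shell
# Stand-in for the real 100 GB dump
printf '1,alice\n2,bob\n' > my_table.dat
gzip -f my_table.dat                         # produces my_table.dat.gz
# Decompress without touching the original .gz
gunzip -c my_table.dat.gz > my_table.dat
# With the uncompressed file staged on HDFS, the original Hive statements
# apply unchanged:
#   hdfs dfs -put my_table.dat /var/lib/txt/
#   LOAD DATA INPATH '/var/lib/txt/my_table.dat' OVERWRITE INTO TABLE STAGE_my_table;
#   INSERT INTO TABLE my_table SELECT * FROM STAGE_my_table;
```

The uncompressed text is splittable, so the INSERT...SELECT can run with many mappers instead of one.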
Re: loading in ORC from big compressed file
Posted by Jörn Franke <jo...@gmail.com>.
Marcin is correct: either split the gzip file into smaller files of at least one HDFS block each, or use bzip2 with block compression.
What is the original format of the table?
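The bzip2 route mentioned here could be sketched as follows (tiny stand-in data; on a real 100 GB file this pipe is CPU-heavy, and a parallel compressor such as pbzip2 may be worth considering):

```shell
# Stand-in for the real gzip dump
printf '1,alice\n2,bob\n' | gzip > my_table.dat.gz
# Recompress as bzip2: Hadoop's text input formats can split a .bz2 file at
# compression-block boundaries, so many mappers can read one file; a gzip
# file is a single unsplittable stream.
gunzip -c my_table.dat.gz | bzip2 -c > my_table.dat.bz2
#   hdfs dfs -put my_table.dat.bz2 /var/lib/txt/   # then LOAD DATA as before
```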
Re: loading in ORC from big compressed file
Posted by Marcin Tustin <mt...@handybook.com>.
This is because a GZ file is not splittable at all. Basically, try creating
this table from an uncompressed file, or, even better, split up the file and
put the pieces in a directory in HDFS/S3/wherever.
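Splitting the file into a directory of chunks, as suggested, might look roughly like this (sizes and paths are illustrative; real chunks should be at least one HDFS block, e.g. 128-256 MB):

```shell
# Stand-in for the uncompressed 100 GB dump
printf '1,alice\n2,bob\n3,carol\n' > my_table.dat
mkdir -p my_table_parts
# Split on line boundaries; a real run would use something like: split -C 256m
split -l 1 my_table.dat my_table_parts/part_
# Each file under the directory can get its own mapper once loaded:
#   hdfs dfs -put my_table_parts /var/lib/txt/
#   LOAD DATA INPATH '/var/lib/txt/my_table_parts' OVERWRITE INTO TABLE STAGE_my_table;
```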
--
Want to work at Handy? Check out our culture deck and open roles:
http://www.handy.com/careers
Latest news at Handy: http://www.handy.com/press
Handy just raised $50m led by Fidelity:
http://venturebeat.com/2015/11/02/on-demand-home-service-handy-raises-50m-in-round-led-by-fidelity/