Posted to user@hive.apache.org by "Amatucci, Mario, Vodafone Group" <ma...@vodafone.com> on 2016/07/19 11:03:50 UTC

RE: hive external table on gzip

Hi, I have huge gzip files on HDFS and I'd like to create an external table on top of them.
Any code example? Cheers
Ps
I cannot use snappy or lzo for some constraints

--
Kind regards
Mario Amatucci
CG TB PS GDC PRAGUE THINK BIG


Re: hive external table on gzip

Posted by Mich Talebzadeh <mi...@gmail.com>.
pretty simple

--1 Move the gz file or files into HDFS. Multiple files can sit in one staging
directory: hdfs dfs -copyFromLocal <local_dir>/*.gz
hdfs://rhes564:9000/data/stg/
--2 Create an external table over that directory. Just one will do: CREATE
EXTERNAL TABLE stg_t2 ... STORED AS TEXTFILE ... LOCATION '/data/stg/'
--3 Create the internal Hive table: CREATE TABLE t2 ( ... ) STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY")
--4 Insert the data from the external table into the Hive table: INSERT INTO
TABLE t2 SELECT ... FROM stg_t2
--5 Remove the gz files once processed, if needed: hdfs dfs -rm
hdfs://rhes564:9000/data/stg/*.gz
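The five steps above can be sketched end to end in HiveQL. The table names stg_t2 and t2 come from the thread, but the columns, delimiter, and paths are illustrative assumptions, not from the original post. Note that Hive decompresses gzip text files transparently on read, and that "SNAPPY" here is ORC's internal compression for the target table, which is separate from the gzip constraint on the input files:

```sql
-- 2: external table over the staging directory holding the .gz files
--    (columns and delimiter are hypothetical; adjust to the real data)
CREATE EXTERNAL TABLE stg_t2 (
  id      INT,
  payload STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/stg/';

-- 3: managed ORC table; SNAPPY is ORC-internal output compression,
--    unrelated to the gzip input
CREATE TABLE t2 (
  id      INT,
  payload STRING
)
STORED AS ORC
TBLPROPERTIES ("orc.compress"="SNAPPY");

-- 4: copy the data across, converting text+gzip to ORC
INSERT INTO TABLE t2 SELECT id, payload FROM stg_t2;
```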

HTH

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com


Disclaimer: Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.



Re: hive external table on gzip

Posted by Jörn Franke <jo...@gmail.com>.
Gzip is handled transparently by Hive (by the formats that ship with Hive; for a custom format it depends on that format's implementation). What format is the table (CSV? JSON?)? Depending on that, you simply choose the corresponding SerDe and it transparently does the decompression. Keep in mind that gzip is not splittable, which means a single gzip file cannot be decompressed in parallel. Try bzip2 instead to enable parallel decompression, or split the large file into several smaller files (each at minimum the size of an HDFS block).
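The re-chunking idea can be sketched locally before uploading to HDFS: decompress the big gzip, split it into line-delimited chunks, and recompress each chunk so Hive can assign one mapper per file. File names and the chunk size are illustrative; in practice you would pick a chunk size near the HDFS block size and copy the results back with hdfs dfs -copyFromLocal:

```shell
# Stand-in for the real "huge" gzip file (1000 lines of sample data)
seq 1 1000 | gzip > big.gz

# Decompress, split into 250-line chunks (chunk_aa, chunk_ab, ...),
# then gzip each chunk individually
zcat big.gz | split -l 250 - chunk_
gzip chunk_*

# The data is now spread over 4 independently decompressible files:
ls chunk_*.gz
```

Splitting by lines (rather than bytes) keeps each record intact, which matters for a text-format Hive table.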
