Posted to user@hive.apache.org by "Manish.Bhoge" <Ma...@target.com> on 2012/09/26 09:17:32 UTC

zip file or tar file consumption

Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file consumption

Posted by Manish <ma...@rocketmail.com>.
> I am getting the error below when loading the zip file:
> 
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
> > True Manish.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Thanks Savant. I believe this will hold good for .zip file also.
> > 
> >  
> > 
> > Thank You,
> > 
> > Manish.
> > 
> >  
> > 
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> > 
> >  
> > 
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> > 
> >  
> > 
> > After this you can use regular load command to load these files, for
> > example
> > 
> >  
> > 
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> > 
> >  
> > 
> > hope this helps
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Richin,
> > 
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> > 
> > Thank You,
> > Manish. 
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:<ri...@nokia.com> 
> > 
> > 
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> > 
> > 
> > To:<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org 
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> > 
> >  
> > 
> > Yeah, seems like you can’t do that; see:
> > http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
> > 
> > But you can always compress your files in gzip format and they
> > should be good to go.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
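
Chuck's objection can be turned into a quick pre-load check: look at what the file actually is before pointing a TEXTFILE table at it. The following is a minimal sketch, not something from the thread; the function name `sniff_container` is hypothetical. It relies only on the well-known leading bytes of the two formats (`PK\x03\x04` or `PK\x05\x06` for zip, `\x1f\x8b` for gzip).

```python
# Leading "magic" bytes of the container formats discussed in the thread.
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06")  # normal archive, empty archive
GZIP_MAGIC = b"\x1f\x8b"


def sniff_container(path):
    """Return 'zip', 'gzip', or 'unknown' based on the first bytes of the file.

    Hypothetical helper for illustration; it does not validate the rest
    of the file, only the signature at the start.
    """
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head in ZIP_MAGICS:
        return "zip"
    return "unknown"
```

A file reported as `zip` is an archive container whose bytes are not newline-delimited records, which is the situation Chuck describes; a file reported as `gzip` is a single compressed stream.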
> > 
> >  
> > 
> > Chuck Connell
> > 
> > Nuance R&D Data Team
> > 
> > Burlington, MA
> > 
> >  
> > 
> >  
> > 
> > 
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Manish,
> > 
> >  
> > 
> > If you have your zip file at location -  /home/manish/zipfile, you
> > can just point your external table to that location like
> > 
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION '/home/manish/zipfile';
> > 
> >  
> > 
> > OR
> > 
> >  
> > 
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> > 
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> > 
> >  
> > 
> > Hope this helps.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Savant,
> > 
> > Got it. But I still need to understand how to load a zip. Can I
> > use a zip file directly in an external table? Can you please help with the
> > load statement?
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> > 
> > 
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> > 
> > 
> > To:user@hive.apache.org<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org
> > 
> > 
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > Another solution would be
> > 
> >  
> > 
> > Using a shell script, do the following:
> > 
> > 1.      unzip the txt files,
> > 
> > 2.      merge those 50 (or N) text files, one by one, into
> > one text file,
> > 
> > 3.      then zip/tar that bigger text file,
> > 
> > 4.      then that big zip/tar file can be uploaded into Hive.
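
The steps above can be sketched in Python instead of shell. This is only an illustration under stated assumptions: the function name `zip_to_gzip` and the `.txt` filter are hypothetical, and the final step writes gzip rather than zip/tar, following Richin's remark elsewhere in the thread that gzip-compressed files "should be good to go" for Hive.

```python
import gzip
import zipfile


def zip_to_gzip(zip_path, gz_path, suffix=".txt"):
    """Merge every member ending in `suffix` into one gzip-compressed text file.

    Returns the number of members merged. Hypothetical helper, not part
    of Hive or of the original thread.
    """
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.endswith(suffix):
                continue  # skip anything that is not a data file
            last = b""
            with archive.open(name) as member:
                # Stream in 1 MiB chunks so large members are not held in memory.
                while chunk := member.read(1 << 20):
                    out.write(chunk)
                    last = chunk[-1:]
            # Keep records from adjacent files apart: ensure a trailing newline.
            if last and last != b"\n":
                out.write(b"\n")
            merged += 1
    return merged
```

The resulting `.gz` file could then be placed in HDFS and loaded with a `LOAD DATA INPATH` statement like the ones quoted in this thread. Note that a single gzip file is not splittable, so one very large output is read by a single mapper.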
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly contain newline characters. So I doubt this is
> > possible.
> > 
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> > 
> > Chuck
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> > 
> > 
> > Hivers,
> > 
> >  
> > 
> > I want to understand whether it is possible to use zip/tar
> > files directly in Hive. All the files have a similar schema
> > (structure).  Say 50 *.txt files are zipped into a single zip file:
> > can we load data directly from this zip file, or do we need to
> > unzip first?
> > 
> >  
> > 
> > Thanks & Regards
> > 
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M)
> > Ext: 5691 VOIP: 22165 | “Excellence is not a skill, It is an
> > attitude.” MySite
> > 
> >  
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> 
> 
> 



RE: zip file or tar file cosumption

Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file 
> 
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
> > True Manish.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Thanks Savant. I believe this will hold good for .zip file also.
> > 
> >  
> > 
> > Thank You,
> > 
> > Manish.
> > 
> >  
> > 
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> > 
> >  
> > 
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> > 
> >  
> > 
> > After this you can use regular load command to load these files, for
> > example
> > 
> >  
> > 
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> > 
> >  
> > 
> > hope this helps
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Richin,
> > 
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> > 
> > Thank You,
> > Manish. 
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:<ri...@nokia.com> 
> > 
> > 
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> > 
> > 
> > To:<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org 
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> > 
> >  
> > 
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> > 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com
> > %3E
> > 
> > But you can always compress your files in gzip format and they
> > should be good to go.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> > 
> >  
> > 
> > Chuck Connell
> > 
> > Nuance R&D Data Team
> > 
> > Burlington, MA
> > 
> >  
> > 
> >  
> > 
> > 
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Manish,
> > 
> >  
> > 
> > If you have your zip file at location -  /home/manish/zipfile, you
> > can just point your external table to that location like
> > 
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION ‘/home/manish/zipfile’;
> > 
> >  
> > 
> > OR
> > 
> >  
> > 
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> > 
> > LOAD DATA INPATH ‘/home/manish/zipfile’ INTO TABLE manish_test;
> > 
> >  
> > 
> > Hope this helps.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Savant,
> > 
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> > 
> > 
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> > 
> > 
> > To:user@hive.apache.org<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org
> > 
> > 
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > Another solution would be
> > 
> >  
> > 
> > Using shell script do following
> > 
> > 1.      unzip txt files, 
> > 
> > 2.      one by one merge those 50 (or N number of) text files into
> > one text file,
> > 
> > 3.      then the zip/tar that bigger text file,
> > 
> > 4.      then that big zip/tar file can be uploaded into hive.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly newline characters. So I doubt this is
> > possible.
> > 
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> > 
> > Chuck
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> > 
> > 
> > Hivers,
> > 
> >  
> > 
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure).  Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> > 
> >  
> > 
> > Thanks & Regards
> > 
> > Manish Bhoge | Technical Architect ¤TargetDW/BI|( +919379850010 (M)
> > Ext: 5691 VOIP: 22165 |! “Excellence is not a skill, It is an
> > attitude.” MySite
> > 
> >  
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> 
> 
> 



RE: zip file or tar file cosumption

Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file 
> 
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
> > True Manish.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Thanks Savant. I believe this will hold good for .zip file also.
> > 
> >  
> > 
> > Thank You,
> > 
> > Manish.
> > 
> >  
> > 
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> > 
> >  
> > 
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> > 
> >  
> > 
> > After this you can use regular load command to load these files, for
> > example
> > 
> >  
> > 
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> > 
> >  
> > 
> > hope this helps
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Richin,
> > 
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> > 
> > Thank You,
> > Manish. 
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:<ri...@nokia.com> 
> > 
> > 
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> > 
> > 
> > To:<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org 
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> > 
> >  
> > 
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> > 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com
> > %3E
> > 
> > But you can always compress your files in gzip format and they
> > should be good to go.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> > 
> >  
> > 
> > Chuck Connell
> > 
> > Nuance R&D Data Team
> > 
> > Burlington, MA
> > 
> >  
> > 
> >  
> > 
> > 
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Manish,
> > 
> >  
> > 
> > If you have your zip file at location -  /home/manish/zipfile, you
> > can just point your external table to that location like
> > 
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION ‘/home/manish/zipfile’;
> > 
> >  
> > 
> > OR
> > 
> >  
> > 
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> > 
> > LOAD DATA INPATH ‘/home/manish/zipfile’ INTO TABLE manish_test;
> > 
> >  
> > 
> > Hope this helps.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Savant,
> > 
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> > 
> > 
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> > 
> > 
> > To:user@hive.apache.org<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org
> > 
> > 
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > Another solution would be
> > 
> >  
> > 
> > Using shell script do following
> > 
> > 1.      unzip txt files, 
> > 
> > 2.      one by one merge those 50 (or N number of) text files into
> > one text file,
> > 
> > 3.      then the zip/tar that bigger text file,
> > 
> > 4.      then that big zip/tar file can be uploaded into hive.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly newline characters. So I doubt this is
> > possible.
> > 
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> > 
> > Chuck
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> > 
> > 
> > Hivers,
> > 
> >  
> > 
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure).  Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> > 
> >  
> > 
> > Thanks & Regards
> > 
> > Manish Bhoge | Technical Architect ¤TargetDW/BI|( +919379850010 (M)
> > Ext: 5691 VOIP: 22165 |! “Excellence is not a skill, It is an
> > attitude.” MySite
> > 
> >  
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> 
> 
> 



RE: zip file or tar file cosumption

Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file 
> 
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
> > True Manish.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Thanks Savant. I believe this will hold good for .zip file also.
> > 
> >  
> > 
> > Thank You,
> > 
> > Manish.
> > 
> >  
> > 
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> > 
> >  
> > 
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> > 
> >  
> > 
> > After this you can use regular load command to load these files, for
> > example
> > 
> >  
> > 
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> > 
> >  
> > 
> > hope this helps
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Richin,
> > 
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> > 
> > Thank You,
> > Manish. 
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:<ri...@nokia.com> 
> > 
> > 
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> > 
> > 
> > To:<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org 
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> > 
> >  
> > 
> > Yeah, seems like you can’t do that; see:
> > http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
> > 
> > But you can always compress your files in gzip format and they
> > should be good to go.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> > 
> >  
> > 
> > Chuck Connell
> > 
> > Nuance R&D Data Team
> > 
> > Burlington, MA
> > 
> >  
> > 
> >  
> > 
> > 
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Manish,
> > 
> >  
> > 
> > If you have your zip file at location -  /home/manish/zipfile, you
> > can just point your external table to that location like
> > 
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION '/home/manish/zipfile';
> > 
> >  
> > 
> > OR
> > 
> >  
> > 
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> > 
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> > 
> >  
> > 
> > Hope this helps.
> > 
> >  
> > 
> > Richin
> > 
> >  
> > 
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > Hi Savant,
> > 
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> > 
> > Sent from my BlackBerry, pls excuse typo
> > 
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> > 
> > 
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> > 
> > 
> > To:user@hive.apache.org<us...@hive.apache.org>
> > 
> > 
> > ReplyTo:user@hive.apache.org
> > 
> > 
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> > 
> > 
> > Subject:RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > 
> > Another solution would be
> > 
> >  
> > 
> > Using shell script do following
> > 
> > 1.      unzip txt files, 
> > 
> > 2.      one by one merge those 50 (or N number of) text files into
> > one text file,
> > 
> > 3.      then zip/tar that bigger text file,
> > 
> > 4.      then that big zip/tar file can be uploaded into hive.
> > 
> >  
> > 
> > Keshav C Savant 
> > 
> > 
> >  
> > 
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> > 
> > 
> >  
> > 
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly contain newline characters. So I doubt this
> > is possible.
> > 
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> > 
> > Chuck
> > 
> > 
> >                                   
> > ____________________________________________________________________
> > 
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> > 
> > 
> > Hivers,
> > 
> >  
> > 
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure).  Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> > 
> >  
> > 
> > Thanks & Regards
> > 
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M)
> > Ext: 5691 VOIP: 22165 | “Excellence is not a skill, It is an
> > attitude.”
> > 
> >  
> > 
> > 
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> > 
> > 
> > 
> 
> 
> 



RE: zip file or tar file cosumption

Posted by Bejoy KS <be...@outlook.com>.
Definitely, Raja, but it looks like the one for zip has been blocked for some time now:

https://issues.apache.org/jira/browse/MAPREDUCE-210

Regards
Bejoy KS

> Date: Sun, 30 Sep 2012 12:41:29 -0700
> Subject: Re: zip file or tar file cosumption
> From: thiruvathuru@gmail.com
> To: user@hive.apache.org
> 
> we can write custom codecs
> 
> On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <be...@outlook.com> wrote:
> > Yes Manish, Zip is not supported in hadoop. You may have to use gzip
> > instead.
> >
> > Regards
> > Bejoy KS
> >
> >
> > ________________________________
> > Subject: RE: zip file or tar file cosumption
> > From: manishbhoge@rocketmail.com
> > To: user@hive.apache.org
> > CC: Chuck.Connell@nuance.com
> > Date: Sun, 30 Sep 2012 20:35:35 +0530
> >
> > Thanks Bejoy. I have zip file there is sense to convert into gzip again.
> >
> > Chuck, I got what you are trying to say. So I need to process it outside
> > HDFS and bring the text file into HDFS.
> >
> >
> > On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
> >
> > Hi Manish
> >
> > Gzip works well if you have the compression codec available in
> > 'io.compression.codecs'. The Gzip codec is present by default.
> >
> > I don't think untarring would be done by MapReduce jobs. So tar files may
> > not work with Hive; you need to untar the files outside Hadoop/Hive as a
> > prerequisite.
> >
> >
> >
> > Regards
> >
> > Bejoy KS
> >
> >
> > ________________________________
> >
> > To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> > Subject: Re: zip file or tar file cosumption
> > From: manishbhoge@rocketmail.com
> > Date: Sun, 30 Sep 2012 12:32:15 +0000
> >
> > What about .gz OR tar file. Does this unzip require at HDFS and load into
> > hive? How you resolve it.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> > ________________________________
> >
> > From: "Connell, Chuck" <Ch...@nuance.com>
> >
> > Date: Sun, 30 Sep 2012 12:24:37 +0000
> >
> > To: user@hive.apache.org<us...@hive.apache.org>; Savant,
> > Keshav<Ke...@fisglobal.com>
> >
> > ReplyTo: user@hive.apache.org
> >
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> > I have seen that error when I try to overwrite an existing file.
> >
> > But, more importantly, Hive cannot understand ZIP files. There was a long
> > thread about this just a few days ago. Your table def says "stored as
> > textfile" but you are not giving it a text file.
> >
> > Chuck
> >
> >
> > ________________________________
> >
> > From: Manish [manishbhoge@rocketmail.com]
> > Sent: Sunday, September 30, 2012 7:38 AM
> > To: Savant, Keshav
> > Cc: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > I am getting below error when loading zip file
> >
> > Driver returned: 9.  Errors: Hive history
> > file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> > Loading data to table default.pageview_zip
> > Failed with exception Error moving:
> > hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into:
> > /user/manish/input/zip
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.MoveTask
> >
> > My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip'
> > OVERWRITE INTO TABLE `pageview_zip`
> >
> > Table definition:
> > CREATE external TABLE pageview_zip
> > (
> > C_0 STRING,
> > C_1 STRING,
> > C_7 MAP<STRING,STRING>,
> > C_8 STRING,
> > C_13 MAP<STRING,STRING>,
> > C_21 STRING
> > )
> > COMMENT 'Page View'
> > ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY
> > ';' MAP KEYS TERMINATED BY '='
> > STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> >
> > Thank You,
> > Manish
> >
> >
> >
> > On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
> >
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip file also.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish the table that has been created for zipped text files should be
> > defined as sequence file, for example
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED
> > FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use regular load command to load these files, for example
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes this is what I wanted to understand how to load zip file to Hive
> > table. Now, I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> > ________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files or any
> > compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
> >
> > But you can always compress your files in gzip format and they should be
> > good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How could
> > this possibly work with a zip/tar file that can contain ASCII 10 characters
> > at random locations, and certainly does not have ASCII 10 at the end of each
> > data record?
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location -  /home/manish/zipfile, you can just
> > point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE
> > LOCATION '/home/manish/zipfile';
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location you can
> > load this zip file into your table as
> >
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand that how to load zip? Can I directly
> > use zip file in external table. can u pls help to get the load statement.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> > ________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using shell script do following
> >
> > 1.      unzip txt files,
> >
> > 2.      one by one merge those 50 (or N number of) text files into one text
> > file,
> >
> > 3.      then zip/tar that bigger text file,
> >
> > 4.      then that big zip/tar file can be uploaded into hive.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator. A ZIP
> > file will certainly contain newline characters. So I doubt this is possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is always
> > a record separator" problem, because we ran into it for another type of
> > compressed file.
> >
> > Chuck
> >
> > ________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand that would it be possible to utilize zip/tar files
> > directly into Hive. All the files has similar schema (structure).  Say 50
> > *.txt files are zipped into a single zip file can we load data directly from
> > this zip file OR should we need to unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext:
> > 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.”
> >
> >
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i) delete the
> > message and all copies; (ii) do not disclose, distribute or use the message
> > in any manner; and (iii) notify the sender immediately. In addition, please
> > be aware that any message addressed to our domain is subject to archiving
> > and review by persons other than the intended recipient. Thank you.
> >
> >
> >
> >
> >
> >
> >
> 
> 
> 
> -- 
> 
> Raja Thiruvathuru
 		 	   		  

Re: zip file or tar file cosumption

Posted by Raja Thiruvathuru <th...@gmail.com>.
We can write custom codecs.

On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <be...@outlook.com> wrote:
> Yes Manish, Zip is not supported in hadoop. You may have to use gzip
> instead.
>
> Regards
> Bejoy KS
>
>
> ________________________________
> Subject: RE: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> To: user@hive.apache.org
> CC: Chuck.Connell@nuance.com
> Date: Sun, 30 Sep 2012 20:35:35 +0530
>
> Thanks Bejoy. I have zip file there is sense to convert into gzip again.
>
> Chuck, I got what you are trying to say. So I need to process it outside
> HDFS and bring the text file into HDFS.
>
>
> On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
>
> Hi Manish
>
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
>
> I don't think untarring would be done by MapReduce jobs. So tar files may
> not work with Hive; you need to untar the files outside Hadoop/Hive as a
> prerequisite.
>
>
>
> Regards
>
> Bejoy KS
>
>
> ________________________________
>
> To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
>
> What about .gz OR tar file. Does this unzip require at HDFS and load into
> hive? How you resolve it.
>
> Sent from my BlackBerry, pls excuse typo
>
> ________________________________
>
> From: "Connell, Chuck" <Ch...@nuance.com>
>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
>
> To: user@hive.apache.org<us...@hive.apache.org>; Savant,
> Keshav<Ke...@fisglobal.com>
>
> ReplyTo: user@hive.apache.org
>
> Subject: RE: zip file or tar file cosumption
>
>
>
> I have seen that error when I try to overwrite an existing file.
>
> But, more importantly, Hive cannot understand ZIP files. There was a long
> thread about this just a few days ago. Your table def says "stored as
> textfile" but you are not giving it a text file.
>
> Chuck
>
>
> ________________________________
>
> From: Manish [manishbhoge@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> I am getting below error when loading zip file
>
> Driver returned: 9.  Errors: Hive history
> file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving:
> hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into:
> /user/manish/input/zip
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip'
> OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY
> ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> True Manish.
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> Sent: Thursday, September 27, 2012 4:26 PM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Thanks Savant. I believe this will hold good for .zip file also.
>
>
>
> Thank You,
>
> Manish.
>
>
>
> From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> Sent: Thursday, September 27, 2012 10:19 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Manish the table that has been created for zipped text files should be
> defined as sequence file, for example
>
>
>
> CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' stored as sequencefile;
>
>
>
> After this you can use regular load command to load these files, for example
>
>
>
> load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
>
>
>
> hope this helps
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:43 PM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Richin,
>
> Thanks! Yes this is what I wanted to understand how to load zip file to Hive
> table. Now, I'll try this option.
>
> Thank You,
> Manish.
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:<ri...@nokia.com>
>
>
> Date:Wed, 26 Sep 2012 14:51:39 +0000
>
>
> To:<us...@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> You are right Chuck. I thought his question was how to use zip files or any
> compressed files in Hive tables.
>
>
>
> Yeah, seems like you can’t do that
> see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>
> But you can always compress your files in gzip format and they should be
> good to go.
>
>
>
> Richin
>
>
>
> From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 10:44 AM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> But TEXTFILE in Hive always has newline as the record delimiter. How could
> this possibly work with a zip/tar file that can contain ASCII 10 characters
> at random locations, and certainly does not have ASCII 10 at the end of each
> data record?
>
>
>
> Chuck Connell
>
> Nuance R&D Data Team
>
> Burlington, MA
>
>
>
>
>
>
> From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> Sent: Wednesday, September 26, 2012 10:14 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Hi Manish,
>
>
>
> If you have your zip file at location -  /home/manish/zipfile, you can just
> point your external table to that location like
>
> CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE
> LOCATION '/home/manish/zipfile';
>
>
>
> OR
>
>
>
> If you already have external table pointing to a certain location you can
> load this zip file into your table as
>
> LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
>
>
>
> Hope this helps.
>
>
>
> Richin
>
>
>
> From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:13 AM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Savant,
>
> Got it. But I still need to understand that how to load zip? Can I directly
> use zip file in external table. can u pls help to get the load statement.
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:"Savant, Keshav" <Ke...@fisglobal.com>
>
>
> Date:Wed, 26 Sep 2012 12:25:38 +0000
>
>
> To:user@hive.apache.org<us...@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> Chuck.Connell@nuance.com<Ch...@nuance.com>
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> Another solution would be
>
>
>
> Using shell script do following
>
> 1.      unzip txt files,
>
> 2.      one by one merge those 50 (or N number of) text files into one text
> file,
>
> 3.      then zip/tar that bigger text file,
>
> 4.      then that big zip/tar file can be uploaded into hive.
>
>
>
> Keshav C Savant
>
>
>
>
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 4:04 PM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> This could be a problem. Hive uses newline as the record separator. A ZIP
> file will certainly contain newline characters. So I doubt this is possible.
>
> BUT, I would like to hear from anyone who has solved the "newline is always
> a record separator" problem, because we ran into it for another type of
> compressed file.
>
> Chuck
>
> ________________________________
>
> From: Manish.Bhoge [Manish.Bhoge@target.com]
> Sent: Wednesday, September 26, 2012 3:17 AM
> To: user@hive.apache.org
> Subject: zip file or tar file cosumption
>
>
> Hivers,
>
>
>
> I want to understand that would it be possible to utilize zip/tar files
> directly into Hive. All the files has similar schema (structure).  Say 50
> *.txt files are zipped into a single zip file can we load data directly from
> this zip file OR should we need to unzip first?
>
>
>
> Thanks & Regards
>
> Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext:
> 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.”
>
>
>
>
> _____________
> The information contained in this message is proprietary and/or
> confidential. If you are not the intended recipient, please: (i) delete the
> message and all copies; (ii) do not disclose, distribute or use the message
> in any manner; and (iii) notify the sender immediately. In addition, please
> be aware that any message addressed to our domain is subject to archiving
> and review by persons other than the intended recipient. Thank you.
>
>
>
>
>
>
>



-- 

Raja Thiruvathuru

RE: zip file or tar file cosumption

Posted by Bejoy KS <be...@outlook.com>.
Yes Manish, Zip is not supported in Hadoop. You may have to use gzip instead.

Regards
Bejoy KS
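
For anyone following the "use gzip instead" route, here is a minimal sketch of the conversion, assuming the archive holds plain delimited .txt files as in the original question. The function name zip_to_gzip, the suffix filter and the paths are hypothetical. It uses only the Python standard library (zipfile, gzip) and runs outside Hadoop.

```python
import gzip
import zipfile


def zip_to_gzip(zip_path, gz_path, suffix=".txt"):
    """Concatenate every member ending in `suffix` into one gzip file.

    Returns the number of members written. All names here are illustrative.
    """
    written = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.endswith(suffix):
                continue  # skip directories and non-text members
            data = archive.read(name)
            out.write(data)
            # Make sure the last record of each member ends with a newline,
            # otherwise it would run into the first record of the next member.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            written += 1
    return written
```

The resulting .gz file can then be copied into HDFS and loaded into a TEXTFILE table with the usual LOAD DATA statement. Keep in mind that a gzip file is not splittable, so one large file is read by a single mapper.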

Subject: RE: zip file or tar file cosumption
From: manishbhoge@rocketmail.com
To: user@hive.apache.org
CC: Chuck.Connell@nuance.com
Date: Sun, 30 Sep 2012 20:35:35 +0530




  
  


Thanks Bejoy. I have zip file there is sense to convert into gzip again.



Chuck, I got what you are trying to say. So I need to process it outside HDFS and bring the text file into HDFS.





On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote: 

    Hi Manish

    

    Gzip works well if you have the compression codec available in 'io.compression.codecs'. The Gzip codec is present by default.

    

    I don't think untarring would be done by MapReduce jobs. So tar files may not work with Hive; you need to untar the files outside Hadoop/Hive as a prerequisite.
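
The "untar outside Hadoop first" step could look roughly like the sketch below. It assumes a tar (optionally gzip-compressed) archive of .txt files. The function name untar_text_members and the output directory are hypothetical, and only the standard tarfile module is used.

```python
import os
import tarfile


def untar_text_members(tar_path, out_dir, suffix=".txt"):
    """Write each regular file member ending in `suffix` into out_dir.

    Returns the list of paths written. Names are illustrative only.
    """
    os.makedirs(out_dir, exist_ok=True)
    extracted = []
    # Mode "r:*" lets tarfile detect plain, gzip or bzip2 archives by itself.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive:
            if not member.isfile() or not member.name.endswith(suffix):
                continue
            # Keep only the base name so a member path cannot escape out_dir.
            target = os.path.join(out_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with source, open(target, "wb") as out:
                out.write(source.read())
            extracted.append(target)
    return extracted
```

The extracted text files can then be put into HDFS as they are, or gzipped individually first.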


    

    



    Regards


    Bejoy KS

    



    



    



    To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com

    Subject: Re: zip file or tar file cosumption

    From: manishbhoge@rocketmail.com

    Date: Sun, 30 Sep 2012 12:32:15 +0000

    

    What about .gz OR tar file. Does this unzip require at HDFS and load into hive? How you resolve it.

    



    Sent from my BlackBerry, pls excuse typo


    




    From: "Connell, Chuck" <Ch...@nuance.com>


    Date: Sun, 30 Sep 2012 12:24:37 +0000


    To: user@hive.apache.org<us...@hive.apache.org>; Savant, Keshav<Ke...@fisglobal.com>


    ReplyTo: user@hive.apache.org


    Subject: RE: zip file or tar file cosumption


    

    



    I have seen that error when I try to overwrite an existing file. 

    

    But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.
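
To make the "you are not giving it a text file" point concrete, here is a small sketch with made-up data (the rows and the member name part.txt are hypothetical). It builds a zip in memory and compares what a newline-based reader would see in the raw bytes with the rows the archive really holds.

```python
import io
import zipfile

# Build a small in-memory zip holding one delimited text file (made-up data).
rows = ["%d,value-%d" % (i, i) for i in range(1000)]
text = "\n".join(rows) + "\n"
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("part.txt", text)
raw = buffer.getvalue()

# What a newline-based text reader would be handed: arbitrary binary chunks.
raw_chunks = raw.split(b"\n")

# What the data actually contains once the archive is decoded.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    decoded_rows = archive.read("part.txt").decode("ascii").splitlines()

print(raw[:4])            # zip header signature bytes, not row data
print(len(raw_chunks))    # number of accidental "records" in the raw bytes
print(len(decoded_rows))  # number of real rows
```

A TEXTFILE table pointed at the raw archive is in the first situation: it splits compressed bytes on whatever newline bytes happen to occur in them.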

    

    Chuck

    

    



    




    From: Manish [manishbhoge@rocketmail.com]

    Sent: Sunday, September 30, 2012 7:38 AM

    To: Savant, Keshav

    Cc: user@hive.apache.org

    Subject: RE: zip file or tar file cosumption

    

    



    



    

    I am getting below error when loading zip file 
Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`

Table definition: 
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'

Thank You,
Manish

    

    


RE: zip file or tar file cosumption

Posted by Manish <ma...@rocketmail.com>.
Thanks Bejoy. I have a zip file, so there is sense in converting it to gzip.

Chuck, I got what you are trying to say. So I need to process it outside
HDFS and bring the text file into HDFS.
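
A minimal sketch of that preprocessing step, assuming it runs on a local machine before the upload. The function name `zip_to_gzip_text` and the file names in the usage note are illustrative and do not come from this thread. It concatenates every `.txt` member of a zip archive into a single gzip file, which is the format the replies above say a TEXTFILE table can read.

```python
import gzip
import zipfile


def zip_to_gzip_text(zip_path, out_path, suffix=".txt"):
    """Concatenate every member ending in `suffix` into one gzip file.

    Returns the number of archive members written.
    """
    count = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(out_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.endswith(suffix):
                continue  # skip directories and non-text members
            data = archive.read(name)
            out.write(data)
            # Keep records from adjacent files on separate lines.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            count += 1
    return count
```

For example, `zip_to_gzip_text("11sep12.zip", "11sep12.txt.gz")` would produce a file that could then be copied to HDFS with `hadoop fs -put` and loaded with a regular `LOAD DATA INPATH` statement. Each member is read fully into memory, so very large members may call for a streaming copy instead.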


On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote: 
> Hi Manish
> 
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
> 
> I don't think untarring would be done by map reduce jobs. So tar files
> may not work with Hive; you need to untar the files outside Hadoop/Hive
> as a prerequisite.
> 
> 
> 
> Regards
> Bejoy KS
> 
> 
> 
> 
> 
> ______________________________________________________________________
> To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
> 
> What about .gz or tar files? Do they need to be unpacked before being
> put on HDFS and loaded into Hive? How do you resolve this?
> 
> 
> Sent from my BlackBerry, pls excuse typo
> 
> 
> ______________________________________________________________________
> 
> From: "Connell, Chuck" <Ch...@nuance.com>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
> To: user@hive.apache.org<us...@hive.apache.org>; Savant,
> Keshav<Ke...@fisglobal.com>
> ReplyTo: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
> I have seen that error when I try to overwrite an existing file. 
> 
> But, more importantly, Hive cannot understand ZIP files. There was a
> long thread about this just a few days ago. Your table def says
> "stored as textfile" but you are not giving it a text file.
> 
> Chuck
> 
> 
> 
> 
> ______________________________________________________________________
> 
> From: Manish [manishbhoge@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
> 
> 
> I am getting below error when loading zip file 
> 
> Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
> 
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
> 
> Table definition: 
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> 
> Thank You,
> Manish
> 
> 
> 
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 
> 
>         True Manish.
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
>         Sent: Thursday, September 27, 2012 4:26 PM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Thanks Savant. I believe this will hold good for .zip file
>         also.
>         
>          
>         
>         Thank You,
>         
>         Manish.
>         
>          
>         
>         From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
>         Sent: Thursday, September 27, 2012 10:19 AM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Manish the table that has been created for zipped text files
>         should be defined as sequence file, for example
>         
>          
>         
>         CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
>         DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
>         
>          
>         
>         After this you can use regular load command to load these
>         files, for example
>         
>          
>         
>         load data local inpath 'path-to-csv-file.gz' into table
>         my_table_zip;
>         
>          
>         
>         hope this helps
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
>         Sent: Wednesday, September 26, 2012 9:43 PM
>         To: user@hive.apache.org
>         Subject: Re: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Richin,
>         
>         Thanks! Yes this is what I wanted to understand how to load
>         zip file to Hive table. Now, I'll try this option.
>         
>         Thank You,
>         Manish. 
>         
>         Sent from my BlackBerry, pls excuse typo
>         
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From:<ri...@nokia.com> 
>         
>         
>         Date:Wed, 26 Sep 2012 14:51:39 +0000
>         
>         
>         To:<us...@hive.apache.org>
>         
>         
>         ReplyTo:user@hive.apache.org 
>         
>         
>         Subject:RE: zip file or tar file cosumption
>         
>         
>          
>         
>         
>         You are right Chuck. I thought his question was how to use zip
>         files or any compressed files in Hive tables.
>         
>          
>         
>         Yeah, seems like you can’t do that
>         see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>         
>         But you can always compress your files in gzip format and they
>         should be good to go.
>         
>          
>         
>         Richin
>         
>          
>         
>         From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
>         Sent: Wednesday, September 26, 2012 10:44 AM
>         To: user@hive.apache.org
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         But TEXTFILE in Hive always has newline as the record
>         delimiter. How could this possibly work with a zip/tar file
>         that can contain ASCII 10 characters at random locations, and
>         certainly does not have ASCII 10 at the end of each data
>         record?
>         
>          
>         
>         Chuck Connell
>         
>         Nuance R&D Data Team
>         
>         Burlington, MA
>         
>          
>         
>          
>         
>         
>         From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
>         Sent: Wednesday, September 26, 2012 10:14 AM
>         To: user@hive.apache.org; manishbhoge@rocketmail.com
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Manish,
>         
>          
>         
>         If you have your zip file at location -  /home/manish/zipfile,
>         you can just point your external table to that location like
>         
>         CREATE EXTERNAL TABLE manish_test (field1 string, field2
>         string) ROW FORMAT DELIMITED FIELDS TERMINATED BY
>         <your_column_delimiter> STORED AS TEXTFILE LOCATION
>         '/home/manish/zipfile';
>         
>          
>         
>         OR
>         
>          
>         
>         If you already have external table pointing to a certain
>         location you can load this zip file into your table as
>         
>         LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE
>         manish_test;
>         
>          
>         
>         Hope this helps.
>         
>          
>         
>         Richin
>         
>          
>         
>         From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
>         Sent: Wednesday, September 26, 2012 9:13 AM
>         To: user@hive.apache.org
>         Subject: Re: zip file or tar file cosumption
>         
>         
>          
>         
>         Hi Savant,
>         
>         Got it. But I still need to understand that how to load zip?
>         Can I directly use zip file in external table. can u pls help
>         to get the load statement.
>         
>         Sent from my BlackBerry, pls excuse typo
>         
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From:"Savant, Keshav" <Ke...@fisglobal.com>
>         
>         
>         Date:Wed, 26 Sep 2012 12:25:38 +0000
>         
>         
>         To:user@hive.apache.org<us...@hive.apache.org>
>         
>         
>         ReplyTo:user@hive.apache.org
>         
>         
>         Cc:Manish.Bhoge@target.com<Ma...@target.com>;
>         Chuck.Connell@nuance.com<Ch...@nuance.com>
>         
>         
>         Subject:RE: zip file or tar file cosumption
>         
>         
>          
>         
>         
>         Another solution would be
>         
>          
>         
>         Using shell script do following
>         
>         1.      unzip txt files, 
>         
>         2.      one by one merge those 50 (or N number of) text files
>         into one text file,
>         
>         3.      then zip/tar that bigger text file,
>         
>         4.      then that big zip/tar file can be uploaded into hive.
>         
>          
>         
>         Keshav C Savant 
>         
>         
>          
>         
>         From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
>         Sent: Wednesday, September 26, 2012 4:04 PM
>         To: user@hive.apache.org
>         Subject: RE: zip file or tar file cosumption
>         
>         
>          
>         
>         This could be a problem. Hive uses newline as the record
>         separator. A ZIP file will certainly contain newline
>         characters. So I doubt this is possible.
>         
>         BUT, I would like to hear from anyone who has solved the
>         "newline is always a record separator" problem, because we ran
>         into it for another type of compressed file.
>         
>         Chuck
>         
>         
>                                        
>         ______________________________________________________________
>         
>         From: Manish.Bhoge [Manish.Bhoge@target.com]
>         Sent: Wednesday, September 26, 2012 3:17 AM
>         To: user@hive.apache.org
>         Subject: zip file or tar file cosumption
>         
>         
>         Hivers,
>         
>          
>         
>         I want to understand whether it is possible to use zip/tar
>         files directly in Hive. All the files have a similar schema
>         (structure). Say 50 *.txt files are zipped into a single zip
>         file: can we load data directly from this zip file, or do we
>         need to unzip it first?
>         
>          
>         
>         Thanks & Regards
>         
>         Manish Bhoge | Technical Architect | Target DW/BI |
>         +919379850010 (M) Ext: 5691 VOIP: 22165 |
>         “Excellence is not a skill, It is an attitude.” | MySite
>         
>          
>         
>         
>         _____________
>         The information contained in this message is proprietary
>         and/or confidential. If you are not the intended recipient,
>         please: (i) delete the message and all copies; (ii) do not
>         disclose, distribute or use the message in any manner; and
>         (iii) notify the sender immediately. In addition, please be
>         aware that any message addressed to our domain is subject to
>         archiving and review by persons other than the intended
>         recipient. Thank you.
>         
>         
>         
> 
> 
> 
> 


RE: zip file or tar file cosumption

Posted by Bejoy KS <be...@outlook.com>.
Hi Manish,

Gzip works well if you have the compression codec available in 'io.compression.codecs'. The Gzip codec is present by default.

I don't think untarring would be done by map reduce jobs. So tar files may not work with Hive; you need to untar the files outside Hadoop/Hive as a prerequisite.

Regards,
Bejoy KS
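
A rough sketch of the untar prerequisite, assuming it is done on a local machine before anything is copied to HDFS. The helper name `tar_members_to_gzip` is hypothetical. It writes each regular file inside a tar archive as its own gzip file, so the gzip codec mentioned above can take over from there.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_members_to_gzip(tar_path, out_dir):
    """Write each regular file inside a tar archive as its own .gz file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # The default read mode also handles compressed archives such as .tar.gz.
    with tarfile.open(tar_path) as archive:
        for member in archive:
            if not member.isfile():
                continue  # skip directories, links and other special entries
            # Only the base name is kept, so directory structure is flattened.
            target = out_dir / (Path(member.name).name + ".gz")
            with archive.extractfile(member) as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            written.append(target)
    return written
```

Because member paths are flattened to their base names, two members with the same file name in different directories would overwrite each other; a real script would need to handle that case.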


Re: zip file or tar file cosumption

Posted by Manish Bhoge <ma...@rocketmail.com>.
What about .gz or tar files? Do they need to be unpacked before being put on HDFS and loaded into Hive? How do you resolve this?


Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: "Connell, Chuck" <Ch...@nuance.com>
Date: Sun, 30 Sep 2012 12:24:37 
To: user@hive.apache.org<us...@hive.apache.org>; Savant, Keshav<Ke...@fisglobal.com>
Reply-To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

I have seen that error when I try to overwrite an existing file.

But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.

Chuck
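
One way to catch this before running LOAD DATA is to look at the first bytes of the file. The sketch below is only an illustration: `sniff_container` is a hypothetical helper, and it relies on the well-known signatures of the gzip, zip and tar formats rather than on anything Hive provides.

```python
def sniff_container(path):
    """Guess the container format from the leading bytes of a file."""
    with open(path, "rb") as handle:
        head = handle.read(262)
    if head.startswith(b"\x1f\x8b"):
        return "gzip"  # readable by a TEXTFILE table via the gzip codec
    if head.startswith(b"PK\x03\x04") or head.startswith(b"PK\x05\x06"):
        return "zip"  # must be unpacked before loading
    if head[257:262] == b"ustar":
        return "tar"  # must be unpacked before loading
    return "other"  # possibly plain text; not verified here
```

A result of "other" does not prove the file is delimited text; it only rules out the three archive formats checked.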


________________________________
From: Manish [manishbhoge@rocketmail.com]
Sent: Sunday, September 30, 2012 7:38 AM
To: Savant, Keshav
Cc: user@hive.apache.org
Subject: RE: zip file or tar file cosumption


I am getting below error when loading zip file

Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`

Table definition:
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'

Thank You,
Manish



On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
True Manish.



Keshav C Savant




From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
Sent: Thursday, September 27, 2012 4:26 PM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption




Thanks, Savant. I believe this will hold for .zip files as well.



Thank You,

Manish.



From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption




Manish, the table created for zipped text files should be defined as a sequence file, for example:

CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;

After this you can use the regular load command to load these files, for example:

load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

Hope this helps.



Keshav C Savant




From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption




Hi Richin,

Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. I'll try this option now.

Thank You,
Manish.

Sent from my BlackBerry, pls excuse typo


________________________________
From: <ri...@nokia.com>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <us...@hive.apache.org>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

You are right, Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, it seems you can’t do that; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E

But you can always compress your files in gzip format and they should be good to go.



Richin
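
As a rough illustration of the gzip route mentioned above (not something posted in this thread), the Python sketch below compresses a plain text file to <name>.gz. The helper name gzip_text_file and the sample file name are hypothetical. Hadoop's text input normally picks the decompression codec from the file extension, so the sketch keeps the .gz suffix; whether a given cluster honours it depends on its codec configuration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_text_file(src: Path) -> Path:
    """Write '<src>.gz' next to src and return the new path; src itself is left in place."""
    dst = src.with_name(src.name + ".gz")
    with open(src, "rb") as fin, gzip.open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)  # stream the bytes so large files are not held in memory
    return dst


if __name__ == "__main__":
    # Throwaway demo file; the name and contents are illustrative only.
    workdir = Path(tempfile.mkdtemp())
    sample = workdir / "pageview.txt"
    sample.write_text("a b c\nd e f\n")
    print(gzip_text_file(sample))
```

The resulting .gz file holds exactly one newline-delimited text stream, which is the difference from a ZIP or tar archive that wraps several members plus its own headers.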



From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption




But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?



Chuck Connell

Nuance R&D Data Team

Burlington, MA
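
The point about newlines can be checked outside Hive. The following Python sketch is an illustration under stated assumptions, not code from this thread: it builds a small ZIP archive in memory (the member name pageview.txt and the row contents are made up), then compares what a newline-delimited reader would see in the raw archive bytes with what it sees after the member is extracted.

```python
import io
import zipfile

# Made-up records, loosely shaped like space-delimited rows with key=value maps.
records = [f"row{i} k=v;k2=v2 end" for i in range(1000)]
text = ("\n".join(records) + "\n").encode("ascii")

# Build a ZIP archive in memory with a single text member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("pageview.txt", text)
raw = buf.getvalue()

# What a newline-delimited reader gets if it is handed the archive bytes as-is.
naive_rows = raw.split(b"\n")

# What it gets if the member is extracted first.
with zipfile.ZipFile(io.BytesIO(raw)) as zf:
    real_rows = zf.read("pageview.txt").splitlines()

print(len(naive_rows), "chunks when splitting the raw archive on byte 10")
print(len(real_rows), "records after extraction")
print("first raw chunk starts with:", naive_rows[0][:4])
```

The first raw chunk begins with the ZIP local-file-header signature rather than a data row, and any byte 10 that shows up inside the compressed stream has no relation to record boundaries.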






From:richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption




Hi Manish,

If you have your zip file at location /home/manish/zipfile, you can just point your external table to that location, like:

CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have an external table pointing to a certain location, you can load this zip file into your table as:

LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.



Richin



From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption




Hi Savant,

Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in an external table? Could you please help with the load statement?

Sent from my BlackBerry, pls excuse typo


________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption

Another solution would be to use a shell script to do the following:

1.      unzip the txt files,
2.      merge those 50 (or N) text files, one by one, into a single text file,
3.      then zip/tar that bigger text file,
4.      then upload that big zip/tar file into Hive.



Keshav C Savant
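
The unzip-and-merge steps listed above can be sketched roughly as follows. This is an illustration in Python rather than the shell script the message has in mind, and it swaps one thing: the merged text is written as gzip instead of being zipped or tarred again, following the gzip suggestion made elsewhere in this thread. The function name zip_to_single_gzip and the member names in the demo are hypothetical.

```python
import gzip
import io
import zipfile


def zip_to_single_gzip(zip_bytes: bytes) -> bytes:
    """Concatenate every *.txt member of a ZIP archive into one gzip-compressed blob."""
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf, \
            gzip.GzipFile(fileobj=out, mode="wb") as gz:
        for name in sorted(zf.namelist()):
            if not name.endswith(".txt"):
                continue  # ignore anything that is not one of the text files
            data = zf.read(name)
            gz.write(data)
            if data and not data.endswith(b"\n"):
                # Keep the last record of one file from fusing with the first record of the next.
                gz.write(b"\n")
    return out.getvalue()


if __name__ == "__main__":
    # Tiny in-memory archive standing in for the 50 zipped text files.
    demo = io.BytesIO()
    with zipfile.ZipFile(demo, "w") as zf:
        zf.writestr("part1.txt", "a b\n")
        zf.writestr("part2.txt", "c d")
    print(gzip.decompress(zip_to_single_gzip(demo.getvalue())))
```

Working on bytes keeps the sketch self-contained; for real data the same loop would read the archive from disk and write the result to a .gz file before it is copied to HDFS.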




From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption




This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck

________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption


Hivers,



I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?



Thanks & Regards

Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.” MySite<http://mysites.target.com/personal/z063783>







RE: zip file or tar file cosumption

Posted by "Connell, Chuck" <Ch...@nuance.com>.
I have seen that error when I try to overwrite an existing file.

But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.

Chuck
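
One way to confirm what kind of file is being handed to a TEXTFILE table is to look at its leading bytes before loading it. The Python sketch below is an illustration only, not part of the original reply; the function name sniff_container and the sample member name are made up for the example.

```python
import gzip
import io
import zipfile


def sniff_container(data: bytes) -> str:
    """Classify a byte string as 'zip', 'gzip', or 'other' from its structure."""
    if data[:2] == b"\x1f\x8b":  # gzip magic number
        return "gzip"
    if zipfile.is_zipfile(io.BytesIO(data)):  # checks for a ZIP end-of-central-directory record
        return "zip"
    return "other"


# Three small in-memory samples: a ZIP archive, a gzip stream, and plain text.
plain = b"C_0 C_1\n"
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("sample.txt", plain)

for label, blob in [("zip sample", archive.getvalue()),
                    ("gzip sample", gzip.compress(plain)),
                    ("text sample", plain)]:
    print(label, "->", sniff_container(blob))
```

A file that comes back as 'zip' is an archive with its own headers and directory, so it would need to be extracted (and optionally re-compressed as gzip) before a newline-delimited reader can make sense of it.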





RE: zip file or tar file cosumption

Posted by Manish <ma...@rocketmail.com>.
I am getting the error below when loading a zip file:


Driver returned: 9.  Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask

My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`

Table definition: 
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '=' 
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'

Thank You,
Manish



On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote: 

> True Manish.
> 
>  
> 
> Keshav C Savant 
> 
> 
>  
> 
> From: Manish.Bhoge [mailto:Manish.Bhoge@target.com] 
> Sent: Thursday, September 27, 2012 4:26 PM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
> 
> 
>  
> 
> Thanks Savant. I believe this will hold good for .zip file also.
> 
>  
> 
> Thank You,
> 
> Manish.
> 
>  
> 
> From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com] 
> Sent: Thursday, September 27, 2012 10:19 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
> 
> 
>  
> 
> Manish the table that has been created for zipped text files should be
> defined as sequence file, for example
> 
>  
> 
> CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> 
>  
> 
> After this you can use regular load command to load these files, for
> example
> 
>  
> 
> load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
> 
>  
> 
> hope this helps
> 
>  
> 
> Keshav C Savant 
> 
> 
>  
> 
> From: Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> Sent: Wednesday, September 26, 2012 9:43 PM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
> 
> 
>  
> 
> Hi Richin,
> 
> Thanks! Yes this is what I wanted to understand how to load zip file
> to Hive table. Now, I'll try this option.
> 
> Thank You,
> Manish. 
> 
> Sent from my BlackBerry, pls excuse typo
> 
> 
> 
>                                    
> ______________________________________________________________________
> 
> From:<ri...@nokia.com> 
> 
> 
> Date:Wed, 26 Sep 2012 14:51:39 +0000
> 
> 
> To:<us...@hive.apache.org>
> 
> 
> ReplyTo:user@hive.apache.org 
> 
> 
> Subject:RE: zip file or tar file cosumption
> 
> 
>  
> 
> 
> You are right Chuck. I thought his question was how to use zip files
> or any compressed files in Hive tables.
> 
>  
> 
> Yeah, seems like you can’t do that
> see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%
> 3E
> 
> But you can always compress your files in gzip format and they should
> be good to go.
> 
>  
> 
> Richin
> 
>  
> 
> From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> Sent: Wednesday, September 26, 2012 10:44 AM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
>  
> 
> But TEXTFILE in Hive always has newline as the record delimiter. How
> could this possibly work with a zip/tar file that can contain ASCII 10
> characters at random locations, and certainly does not have ASCII 10
> at the end of each data record?
> 
>  
> 
> Chuck Connell
> 
> Nuance R&D Data Team
> 
> Burlington, MA
> 
>  
> 
>  
> 
> 
> From:richin.jain@nokia.com [mailto:richin.jain@nokia.com] 
> Sent: Wednesday, September 26, 2012 10:14 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
> 
> 
>  
> 
> Hi Manish,
> 
>  
> 
> If you have your zip file at location -  /home/manish/zipfile, you can
> just point your external table to that location like
> 
> CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> AS TEXTFILE LOCATION ‘/home/manish/zipfile’;
> 
>  
> 
> OR
> 
>  
> 
> If you already have external table pointing to a certain location you
> can load this zip file into your table as
> 
> LOAD DATA INPATH ‘/home/manish/zipfile’ INTO TABLE manish_test;
> 
>  
> 
> Hope this helps.
> 
>  
> 
> Richin
> 
>  
> 
> From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com] 
> Sent: Wednesday, September 26, 2012 9:13 AM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
> 
> 
>  
> 
> Hi Savant,
> 
> Got it. But I still need to understand that how to load zip? Can I
> directly use zip file in external table. can u pls help to get the
> load statement.
> 
> Sent from my BlackBerry, pls excuse typo
> 
> 
> 
>                                    
> ______________________________________________________________________
> 
> From:"Savant, Keshav" <Ke...@fisglobal.com>
> 
> 
> Date:Wed, 26 Sep 2012 12:25:38 +0000
> 
> 
> To:user@hive.apache.org<us...@hive.apache.org>
> 
> 
> ReplyTo:user@hive.apache.org
> 
> 
> Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> Chuck.Connell@nuance.com<Ch...@nuance.com>
> 
> 
> Subject:RE: zip file or tar file cosumption
> 
> 
>  
> 
> 
> Another solution would be
> 
>  
> 
> Using shell script do following
> 
> 1.      unzip txt files, 
> 
> 2.      one by one merge those 50 (or N number of) text files into one
> text file,
> 
> 3.      then the zip/tar that bigger text file,
> 
> 4.      then that big zip/tar file can be uploaded into hive.
> 
>  
> 
> Keshav C Savant 
> 
> 
>  
> 
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com] 
> Sent: Wednesday, September 26, 2012 4:04 PM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
> 
> 
>  
> 
> This could be a problem. Hive uses newline as the record separator. A
> ZIP file will certainly newline characters. So I doubt this is
> possible.
> 
> BUT, I would like to hear from anyone who has solved the "newline is
> always a record separator" problem, because we ran into it for another
> type of compressed file.
> 
> Chuck
> 
> 
>                                    
> ______________________________________________________________________
> 
> From: Manish.Bhoge [Manish.Bhoge@target.com]
> Sent: Wednesday, September 26, 2012 3:17 AM
> To: user@hive.apache.org
> Subject: zip file or tar file cosumption
> 
> 
> Hivers,
> 
>  
> 
> I want to understand that would it be possible to utilize zip/tar
> files directly into Hive. All the files has similar schema
> (structure).  Say 50 *.txt files are zipped into a single zip file can
> we load data directly from this zip file OR should we need to unzip
> first?
> 
>  
> 
> Thanks & Regards
> 
> Manish Bhoge | Technical Architect * Target DW/BI | * +919379850010 (M)
> Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an
> attitude." MySite
> 
>  
> 
> 
> _____________
> The information contained in this message is proprietary and/or
> confidential. If you are not the intended recipient, please: (i)
> delete the message and all copies; (ii) do not disclose, distribute or
> use the message in any manner; and (iii) notify the sender
> immediately. In addition, please be aware that any message addressed
> to our domain is subject to archiving and review by persons other than
> the intended recipient. Thank you.
> 
> 
> 




RE: zip file or tar file cosumption

Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
True Manish.

Keshav C Savant

From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
Sent: Thursday, September 27, 2012 4:26 PM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption

Thanks Savant. I believe this will hold good for .zip file also.

Thank You,
Manish.

From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Manish the table that has been created for zipped text files should be defined as sequence file, for example

CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;

After this you can use regular load command to load these files, for example

load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

hope this helps

Keshav C Savant

From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Richin,

Thanks! Yes this is what I wanted to understand how to load zip file to Hive table. Now, I'll try this option.

Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
________________________________
From: <ri...@nokia.com>>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, seems like you can't do that see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.

Richin

From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>>; Chuck.Connell@nuance.com<Ch...@nuance.com>>
Subject: RE: zip file or tar file cosumption

Another solution would be

Using shell script do following

1.       unzip txt files,

2.       one by one merge those 50 (or N number of) text files into one text file,

3.       then the zip/tar that bigger text file,

4.       then that big zip/tar file can be uploaded into hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]<mailto:[mailto:Chuck.Connell@nuance.com]>
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand that would it be possible to utilize zip/tar files directly into Hive. All the files has similar schema (structure).  Say 50 *.txt files are zipped into a single zip file can we load data directly from this zip file OR should we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file cosumption

Posted by "Manish.Bhoge" <Ma...@target.com>.
Thanks, Savant. I believe this will hold good for .zip files as well.

Thank You,
Manish.

From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption

Manish the table that has been created for zipped text files should be defined as sequence file, for example

CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;

After this you can use regular load command to load these files, for example

load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

hope this helps

Keshav C Savant

From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption

Hi Richin,

Thanks! Yes this is what I wanted to understand how to load zip file to Hive table. Now, I'll try this option.

Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
________________________________
From: <ri...@nokia.com>>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, seems like you can't do that see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.

Richin

From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>>; Chuck.Connell@nuance.com<Ch...@nuance.com>>
Subject: RE: zip file or tar file cosumption

Another solution would be

Using shell script do following

1.       unzip txt files,

2.       one by one merge those 50 (or N number of) text files into one text file,

3.       then the zip/tar that bigger text file,

4.       then that big zip/tar file can be uploaded into hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]<mailto:[mailto:Chuck.Connell@nuance.com]>
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand that would it be possible to utilize zip/tar files directly into Hive. All the files has similar schema (structure).  Say 50 *.txt files are zipped into a single zip file can we load data directly from this zip file OR should we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file cosumption

Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
Manish, the table created for zipped text files should be defined as a sequence file, for example:

CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;

After this you can use regular load command to load these files, for example

load data local inpath 'path-to-csv-file.gz' into table my_table_zip;

hope this helps

Keshav C Savant

From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption

Hi Richin,

Thanks! Yes this is what I wanted to understand how to load zip file to Hive table. Now, I'll try this option.

Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
________________________________
From: <ri...@nokia.com>>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, seems like you can't do that see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.

Richin

From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>>; Chuck.Connell@nuance.com<Ch...@nuance.com>>
Subject: RE: zip file or tar file cosumption

Another solution would be

Using shell script do following

1.       unzip txt files,

2.       one by one merge those 50 (or N number of) text files into one text file,

3.       then the zip/tar that bigger text file,

4.       then that big zip/tar file can be uploaded into hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]<mailto:[mailto:Chuck.Connell@nuance.com]>
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand that would it be possible to utilize zip/tar files directly into Hive. All the files has similar schema (structure).  Say 50 *.txt files are zipped into a single zip file can we load data directly from this zip file OR should we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


Re: zip file or tar file cosumption

Posted by Manish Bhoge <ma...@rocketmail.com>.
Hi Richin,

Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. I'll try this option now.

Thank You,
Manish. 


Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: <ri...@nokia.com>
Date: Wed, 26 Sep 2012 14:51:39 
To: <us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, seems like you can't do that see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.

Richin

From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>>; Chuck.Connell@nuance.com<Ch...@nuance.com>>
Subject: RE: zip file or tar file cosumption

Another solution would be

Using shell script do following

1.       unzip txt files,

2.       one by one merge those 50 (or N number of) text files into one text file,

3.       then the zip/tar that bigger text file,

4.       then that big zip/tar file can be uploaded into hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]<mailto:[mailto:Chuck.Connell@nuance.com]>
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand that would it be possible to utilize zip/tar files directly into Hive. All the files has similar schema (structure).  Say 50 *.txt files are zipped into a single zip file can we load data directly from this zip file OR should we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>



RE: zip file or tar file cosumption

Posted by ri...@nokia.com.
You are right, Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.

Yeah, it seems you can't do that with zip; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.
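
A minimal sketch of that gzip route, assuming Python 3 is available on the machine that stages the files. The function name zip_to_gzip and the file names in the usage note are hypothetical. It unpacks every .txt member of a zip archive and rewrites them as one gzip stream, which a TEXTFILE table can read because Hadoop selects the gzip codec from the .gz extension.

```python
import gzip
import zipfile


def zip_to_gzip(zip_path, gz_path):
    """Concatenate every .txt member of a zip archive into one gzip file.

    Hypothetical helper: a newline is appended after any member that does
    not end with one, so the last record of one file is not glued to the
    first record of the next.
    """
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.endswith(".txt"):
                continue  # skip directory entries and non-text members
            with archive.open(name) as member:
                last = b"\n"
                # Stream in 1 MiB chunks so large members do not need to fit in memory.
                for chunk in iter(lambda: member.read(1 << 20), b""):
                    out.write(chunk)
                    last = chunk[-1:]
                if last != b"\n":
                    out.write(b"\n")
    return gz_path
```

For example, zip_to_gzip("11sep12.zip", "11sep12.txt.gz") would produce a file that can be copied to HDFS and loaded with a regular LOAD DATA statement into a TEXTFILE table. Keep in mind that a gzip file is not splittable, so each .gz file is read by a single mapper.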

Richin

From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>>; Chuck.Connell@nuance.com<Ch...@nuance.com>>
Subject: RE: zip file or tar file cosumption

Another solution would be

Using shell script do following

1.       unzip txt files,

2.       one by one merge those 50 (or N number of) text files into one text file,

3.       then the zip/tar that bigger text file,

4.       then that big zip/tar file can be uploaded into hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]<mailto:[mailto:Chuck.Connell@nuance.com]>
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand that would it be possible to utilize zip/tar files directly into Hive. All the files has similar schema (structure).  Say 50 *.txt files are zipped into a single zip file can we load data directly from this zip file OR should we need to unzip first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file cosumption

Posted by "Connell, Chuck" <Ch...@nuance.com>.
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?
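
The point can be illustrated outside Hive with a small sketch, assuming the archive bytes reach a newline-based reader undecoded. The records, the member name part-0.txt, and the function names are made up for the illustration; only Python's standard zipfile module is used.

```python
import io
import zipfile


def make_zip(records):
    """Build an in-memory zip holding one text member, one record per line."""
    payload = b"\n".join(records) + b"\n"
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("part-0.txt", payload)
    return buf.getvalue()


def split_like_textfile(raw):
    """Split raw bytes on ASCII 10, as a newline-delimited reader would."""
    return raw.split(b"\n")


def read_with_zip_reader(raw):
    """Unpack the archive first, then split the member into records."""
    with zipfile.ZipFile(io.BytesIO(raw)) as archive:
        return archive.read("part-0.txt").splitlines()


records = [b"a b c", b"d e f", b"g h i"]
raw = make_zip(records)

# The archive starts with the zip signature, not with the first record,
# so a split on newlines cannot reproduce the original rows.
print(split_like_textfile(raw) == records)   # False
print(read_with_zip_reader(raw) == records)  # True
```

Only a reader that understands the container format recovers the rows, which is why the archive has to be unpacked, or recompressed with a codec Hadoop recognizes, before the data is loaded.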

Chuck Connell
Nuance R&D Data Team
Burlington, MA


From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption

Hi Manish,

If you have your zip file at location -  /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand that how to load zip? Can I directly use zip file in external table. can u pls help to get the load statement.
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>
Reply-To: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption

Another solution would be to use a shell script to do the following:

1.       unzip the txt files,

2.       merge those 50 (or N) text files, one by one, into a single text file,

3.       then zip/tar that bigger text file,

4.       then upload that big zip/tar file into Hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file cosumption

Posted by ri...@nokia.com.
Hi Manish,

If your zip file is at /home/manish/zipfile, you can point your external table at that location, like this:
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';

OR

If you already have an external table pointing to a certain location, you can load the zip file into that table with:
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;

Hope this helps.

Richin

From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption

Hi Savant,

Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in an external table? Could you please help me with the load statement?
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>
Reply-To: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption

Another solution would be to use a shell script to do the following:

1.       unzip the txt files,

2.       merge those 50 (or N) text files, one by one, into a single text file,

3.       then zip/tar that bigger text file,

4.       then upload that big zip/tar file into Hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


Re: zip file or tar file cosumption

Posted by Manish Bhoge <ma...@rocketmail.com>.
Hi Savant,

Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in an external table? Could you please help me with the load statement?
Sent from my BlackBerry, pls excuse typo

-----Original Message-----
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 
To: user@hive.apache.org<us...@hive.apache.org>
Reply-To: user@hive.apache.org
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption

Another solution would be to use a shell script to do the following:

1.       unzip the txt files,

2.       merge those 50 (or N) text files, one by one, into a single text file,

3.       then zip/tar that bigger text file,

4.       then upload that big zip/tar file into Hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck

________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>



RE: zip file or tar file cosumption

Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
Another solution would be to use a shell script to do the following:

1.       unzip the txt files,

2.       merge those 50 (or N) text files, one by one, into a single text file,

3.       then zip/tar that bigger text file,

4.       then upload that big zip/tar file into Hive.

Keshav C Savant

From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption

This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck

________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?

Thanks & Regards
Manish Bhoge | Technical Architect  * Target DW/BI| * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>


RE: zip file or tar file cosumption

Posted by "Connell, Chuck" <Ch...@nuance.com>.
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.

BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.

Chuck


________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org
Subject: zip file or tar file cosumption

Hivers,

I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?

Thanks & Regards
Manish Bhoge | Technical Architect  • Target DW/BI| • +919379850010 (M) Ext: 5691 VOIP: 22165 | • “Excellence is not a skill, It is an attitude.” MySite<http://mysites.target.com/personal/z063783>