Posted to user@hive.apache.org by "Manish.Bhoge" <Ma...@target.com> on 2012/09/26 09:17:32 UTC
zip file or tar file consumption
Hivers,
I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have the same schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip first?
Thanks & Regards
Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165 | "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>
RE: zip file or tar file consumption
Posted by Manish <ma...@rocketmail.com>.
> I am getting the error below when loading the zip file:
>
>
> Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
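
To make the delimiters in the table definition above concrete, here is a rough Python emulation of how one text line would be split into those six columns. It is only a sketch and not Hive's actual SerDe: the sample line, `parse_map`, and `parse_row` are hypothetical names introduced for illustration.

```python
# Column layout copied from the CREATE TABLE above.
# True marks the MAP<STRING,STRING> columns.
COLUMNS = [("C_0", False), ("C_1", False), ("C_7", True),
           ("C_8", False), ("C_13", True), ("C_21", False)]


def parse_map(text, item_sep=";", key_sep="="):
    """Split 'k1=v1;k2=v2' into a dict, mirroring
    COLLECTION ITEMS TERMINATED BY ';' and MAP KEYS TERMINATED BY '='."""
    result = {}
    for item in text.split(item_sep):
        if not item:
            continue
        key, _, value = item.partition(key_sep)
        result[key] = value
    return result


def parse_row(line, field_sep=" "):
    """Split one line on the field delimiter (a single space here)."""
    fields = line.rstrip("\n").split(field_sep)
    row = {}
    for (name, is_map), raw in zip(COLUMNS, fields):
        row[name] = parse_map(raw) if is_map else raw
    return row


# Hypothetical sample line, invented for this sketch.
sample = "u1 home ref=google;lang=en GET ua=firefox 200"
row = parse_row(sample)
print(row)
```

This only works when the file Hive reads is plain, line-oriented text, which is the crux of the discussion quoted below.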
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip files as well.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish, the table created for zipped text files should be defined
> > as a sequence file, for example:
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use the regular load command to load these
> > files, for example:
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes, this is what I wanted to understand: how to load a zip
> > file into a Hive table. I'll try this option now.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can't do that, see:
> > http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
> >
> > But you can always compress your files in gzip format and they
> > should be good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
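
Chuck's point can be checked with a small experiment. The sketch below (Python standard library only, with made-up sample records) builds a zip archive in memory and compares what a newline-splitting reader would see with the records actually stored in it. It says nothing about Hive internals; it only illustrates that the archive bytes are not the line-oriented text a TEXTFILE reader expects.

```python
import io
import zipfile

# Hypothetical sample data: 1000 newline-terminated CSV records.
records = b"".join(b"row%d,value%d\n" % (i, i) for i in range(1000))

# Build a zip archive in memory with a single member, data.txt.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.txt", records)
raw = buffer.getvalue()

# What a line-oriented reader would see if handed the archive as-is.
naive_lines = raw.split(b"\n")
original_lines = records.split(b"\n")

print("archive starts with:", raw[:4])
print("chunks seen by a newline splitter:", len(naive_lines))
print("records actually stored:", len(original_lines) - 1)

# Reading through the zip format recovers the records intact.
with zipfile.ZipFile(io.BytesIO(raw)) as archive:
    recovered = archive.read("data.txt")
```

The chunks produced by splitting the raw archive on newlines bear no relation to the stored records, while reading the member through the zip format returns them unchanged.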
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location - /home/manish/zipfile, you
> > can just point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION '/home/manish/zipfile';
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> >
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand how to load the zip. Can I
> > use a zip file directly in an external table? Could you please help
> > with the load statement?
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using a shell script, do the following:
> >
> > 1. unzip the txt files,
> >
> > 2. one by one, merge those 50 (or N number of) text files into
> > one text file,
> >
> > 3. then zip/tar that bigger text file,
> >
> >
> > 4. then that big zip/tar file can be uploaded into hive.
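
The steps above can be sketched as follows. Two deliberate departures from the suggestion: the sketch is in Python rather than shell, and the final step writes gzip rather than zip/tar, following the gzip advice given elsewhere in this thread. `zip_to_gzip` and all file names are hypothetical; each member is read fully into memory, so this is only suitable for modest file sizes.

```python
import gzip
import os
import tempfile
import zipfile


def zip_to_gzip(zip_path, gz_path):
    """Concatenate every .txt member of a zip archive into one gzip file."""
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        for name in sorted(archive.namelist()):
            if not name.endswith(".txt"):
                continue
            data = archive.read(name)
            out.write(data)
            # Keep records of adjacent files apart when a file
            # does not end with a newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")


# Small demonstration with made-up data in a temporary directory.
workdir = tempfile.mkdtemp()
zip_path = os.path.join(workdir, "sample.zip")
gz_path = os.path.join(workdir, "merged.txt.gz")

with zipfile.ZipFile(zip_path, "w") as archive:
    archive.writestr("a.txt", "1,one\n2,two\n")
    archive.writestr("b.txt", "3,three")

zip_to_gzip(zip_path, gz_path)

with gzip.open(gz_path, "rt") as merged:
    lines = merged.read().splitlines()
print(lines)
```

The resulting .gz file holds plain newline-delimited records once decompressed, which is the property the thread identifies as necessary for loading.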
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly contain newline characters. So I doubt
> > this is possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> >
> > Chuck
> >
> >
> >
> > ____________________________________________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand whether it is possible to use zip/tar files
> > directly in Hive. All the files have the same schema (structure).
> > Say 50 *.txt files are zipped into a single zip file: can we load
> > data directly from this zip file, or do we need to unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M)
> > Ext: 5691 VOIP: 22165 | "Excellence is not a skill, It is an
> > attitude." MySite
> >
> >
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
>
>
>
RE: zip file or tar file cosumption
Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file
>
>
> Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip file also.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use regular load command to load these files, for
> > example
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> > 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com
> > %3E
> >
> > But you can always compress your files in gzip format and they
> > should be good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location - /home/manish/zipfile, you
> > can just point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION ‘/home/manish/zipfile’;
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> >
> > LOAD DATA INPATH ‘/home/manish/zipfile’ INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using shell script do following
> >
> > 1. unzip txt files,
> >
> > 2. one by one merge those 50 (or N number of) text files into
> > one text file,
> >
> > 3. then the zip/tar that bigger text file,
> >
> > 4. then that big zip/tar file can be uploaded into hive.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly newline characters. So I doubt this is
> > possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> >
> > Chuck
> >
> >
> >
> > ____________________________________________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure). Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect ¤TargetDW/BI|( +919379850010 (M)
> > Ext: 5691 VOIP: 22165 |! “Excellence is not a skill, It is an
> > attitude.” MySite
> >
> >
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
>
>
>
RE: zip file or tar file cosumption
Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file
>
>
> Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip file also.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use regular load command to load these files, for
> > example
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> > 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com
> > %3E
> >
> > But you can always compress your files in gzip format and they
> > should be good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location - /home/manish/zipfile, you
> > can just point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION ‘/home/manish/zipfile’;
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> >
> > LOAD DATA INPATH ‘/home/manish/zipfile’ INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using shell script do following
> >
> > 1. unzip txt files,
> >
> > 2. one by one merge those 50 (or N number of) text files into
> > one text file,
> >
> > 3. then the zip/tar that bigger text file,
> >
> > 4. then that big zip/tar file can be uploaded into hive.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly newline characters. So I doubt this is
> > possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> >
> > Chuck
> >
> >
> >
> > ____________________________________________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure). Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect ¤TargetDW/BI|( +919379850010 (M)
> > Ext: 5691 VOIP: 22165 |! “Excellence is not a skill, It is an
> > attitude.” MySite
> >
> >
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
>
>
>
RE: zip file or tar file cosumption
Posted by Manish <ma...@rocketmail.com>.
> I am getting below error when loading zip file
>
>
> Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
> FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip file also.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish the table that has been created for zipped text files should
> > be defined as sequence file, for example
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use regular load command to load these files, for
> > example
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table
> > my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes this is what I wanted to understand how to load zip file
> > to Hive table. Now, I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files
> > or any compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%
> > 3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com
> > %3E
> >
> > But you can always compress your files in gzip format and they
> > should be good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How
> > could this possibly work with a zip/tar file that can contain ASCII
> > 10 characters at random locations, and certainly does not have ASCII
> > 10 at the end of each data record?
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location - /home/manish/zipfile, you
> > can just point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW
> > FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED
> > AS TEXTFILE LOCATION '/home/manish/zipfile';
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location
> > you can load this zip file into your table as
> >
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand that how to load zip? Can I
> > directly use zip file in external table. can u pls help to get the
> > load statement.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> >
> >
> > ____________________________________________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using shell script do following
> >
> > 1. unzip txt files,
> >
> > 2. one by one merge those 50 (or N number of) text files into
> > one text file,
> >
> > 3. then the zip/tar that bigger text file,
> >
> > 4. then that big zip/tar file can be uploaded into hive.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator.
> > A ZIP file will certainly contain newline characters. So I doubt this
> > is possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is
> > always a record separator" problem, because we ran into it for
> > another type of compressed file.
> >
> > Chuck
> >
> >
> >
> > ____________________________________________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand that would it be possible to utilize zip/tar
> > files directly into Hive. All the files has similar schema
> > (structure). Say 50 *.txt files are zipped into a single zip file
> > can we load data directly from this zip file OR should we need to
> > unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M)
> > Ext: 5691 VOIP: 22165 | “Excellence is not a skill, It is an
> > attitude.” MySite
> >
> >
> >
> >
> > _____________
> > The information contained in this message is proprietary and/or
> > confidential. If you are not the intended recipient, please: (i)
> > delete the message and all copies; (ii) do not disclose, distribute
> > or use the message in any manner; and (iii) notify the sender
> > immediately. In addition, please be aware that any message addressed
> > to our domain is subject to archiving and review by persons other
> > than the intended recipient. Thank you.
> >
> >
> >
>
>
>
RE: zip file or tar file cosumption
Posted by Bejoy KS <be...@outlook.com>.
Definitely Raja, but looks like the one for zip is blocked for some time now
https://issues.apache.org/jira/browse/MAPREDUCE-210
Regards
Bejoy KS
> Date: Sun, 30 Sep 2012 12:41:29 -0700
> Subject: Re: zip file or tar file cosumption
> From: thiruvathuru@gmail.com
> To: user@hive.apache.org
>
> we can write custom codecs
>
> On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <be...@outlook.com> wrote:
> > Yes Manish, Zip is not supported in hadoop. You may have to use gzip
> > instead.
> >
> > Regards
> > Bejoy KS
> >
> >
> > ________________________________
> > Subject: RE: zip file or tar file cosumption
> > From: manishbhoge@rocketmail.com
> > To: user@hive.apache.org
> > CC: Chuck.Connell@nuance.com
> > Date: Sun, 30 Sep 2012 20:35:35 +0530
> >
> > Thanks Bejoy. I have zip file there is sense to convert into gzip again.
> >
> > Chuck, I got what you are trying to say. So I need to process it outside
> > HDFS and bring the text file into HDFS.
> >
> >
> > On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
> >
> > Hi Manish
> >
> > Gzip works well if you have the compression codec available in
> > 'io.compression.codecs'. The Gzip codec is present by default.
> >
> > I don't think untarring would be done by MapReduce jobs. So tar files may
> > not work with Hive; you need to untar the files outside of Hadoop/Hive as a
> > prerequisite.
> >
> >
> >
> > Regards
> >
> > Bejoy KS
> >
> >
> > ________________________________
> >
> > To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> > Subject: Re: zip file or tar file cosumption
> > From: manishbhoge@rocketmail.com
> > Date: Sun, 30 Sep 2012 12:32:15 +0000
> >
> > What about a .gz or tar file? Does it need to be unzipped in HDFS before
> > loading into Hive? How do you resolve it?
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> > ________________________________
> >
> > From: "Connell, Chuck" <Ch...@nuance.com>
> >
> > Date: Sun, 30 Sep 2012 12:24:37 +0000
> >
> > To: user@hive.apache.org<us...@hive.apache.org>; Savant,
> > Keshav<Ke...@fisglobal.com>
> >
> > ReplyTo: user@hive.apache.org
> >
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> > I have seen that error when I try to overwrite an existing file.
> >
> > But, more importantly, Hive cannot understand ZIP files. There was a long
> > thread about this just a few days ago. Your table def says "stored as
> > textfile" but you are not giving it a text file.
> >
> > Chuck
> >
> >
> > ________________________________
> >
> > From: Manish [manishbhoge@rocketmail.com]
> > Sent: Sunday, September 30, 2012 7:38 AM
> > To: Savant, Keshav
> > Cc: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > I am getting below error when loading zip file
> >
> > Driver returned: 9. Errors: Hive history
> > file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> > Loading data to table default.pageview_zip
> > Failed with exception Error moving:
> > hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into:
> > /user/manish/input/zip
> > FAILED: Execution Error, return code 1 from
> > org.apache.hadoop.hive.ql.exec.MoveTask
> >
> > My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip'
> > OVERWRITE INTO TABLE `pageview_zip`
> >
> > Table definition:
> > CREATE external TABLE pageview_zip
> > (
> > C_0 STRING,
> > C_1 STRING,
> > C_7 MAP<STRING,STRING>,
> > C_8 STRING,
> > C_13 MAP<STRING,STRING>,
> > C_21 STRING
> > )
> > COMMENT 'Page View'
> > ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY
> > ';' MAP KEYS TERMINATED BY '='
> > STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
> >
> > Thank You,
> > Manish
> >
> >
> >
> > On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
> >
> > True Manish.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> > Sent: Thursday, September 27, 2012 4:26 PM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Thanks Savant. I believe this will hold good for .zip file also.
> >
> >
> >
> > Thank You,
> >
> > Manish.
> >
> >
> >
> > From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> > Sent: Thursday, September 27, 2012 10:19 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Manish the table that has been created for zipped text files should be
> > defined as sequence file, for example
> >
> >
> >
> > CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED
> > FIELDS TERMINATED BY ',' stored as sequencefile;
> >
> >
> >
> > After this you can use regular load command to load these files, for example
> >
> >
> >
> > load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
> >
> >
> >
> > hope this helps
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:43 PM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Richin,
> >
> > Thanks! Yes this is what I wanted to understand how to load zip file to Hive
> > table. Now, I'll try this option.
> >
> > Thank You,
> > Manish.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> > ________________________________
> >
> > From:<ri...@nokia.com>
> >
> >
> > Date:Wed, 26 Sep 2012 14:51:39 +0000
> >
> >
> > To:<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > You are right Chuck. I thought his question was how to use zip files or any
> > compressed files in Hive tables.
> >
> >
> >
> > Yeah, seems like you can’t do that
> > see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
> >
> > But you can always compress your files in gzip format and they should be
> > good to go.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 10:44 AM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > But TEXTFILE in Hive always has newline as the record delimiter. How could
> > this possibly work with a zip/tar file that can contain ASCII 10 characters
> > at random locations, and certainly does not have ASCII 10 at the end of each
> > data record?
> >
> >
> >
> > Chuck Connell
> >
> > Nuance R&D Data Team
> >
> > Burlington, MA
> >
> >
> >
> >
> >
> >
> > From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> > Sent: Wednesday, September 26, 2012 10:14 AM
> > To: user@hive.apache.org; manishbhoge@rocketmail.com
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Manish,
> >
> >
> >
> > If you have your zip file at location - /home/manish/zipfile, you can just
> > point your external table to that location like
> >
> > CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT
> > DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE
> > LOCATION '/home/manish/zipfile';
> >
> >
> >
> > OR
> >
> >
> >
> > If you already have external table pointing to a certain location you can
> > load this zip file into your table as
> >
> > LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
> >
> >
> >
> > Hope this helps.
> >
> >
> >
> > Richin
> >
> >
> >
> > From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> > Sent: Wednesday, September 26, 2012 9:13 AM
> > To: user@hive.apache.org
> > Subject: Re: zip file or tar file cosumption
> >
> >
> >
> >
> > Hi Savant,
> >
> > Got it. But I still need to understand that how to load zip? Can I directly
> > use zip file in external table. can u pls help to get the load statement.
> >
> > Sent from my BlackBerry, pls excuse typo
> >
> >
> > ________________________________
> >
> > From:"Savant, Keshav" <Ke...@fisglobal.com>
> >
> >
> > Date:Wed, 26 Sep 2012 12:25:38 +0000
> >
> >
> > To:user@hive.apache.org<us...@hive.apache.org>
> >
> >
> > ReplyTo:user@hive.apache.org
> >
> >
> > Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> > Chuck.Connell@nuance.com<Ch...@nuance.com>
> >
> >
> > Subject:RE: zip file or tar file cosumption
> >
> >
> >
> >
> >
> > Another solution would be
> >
> >
> >
> > Using shell script do following
> >
> > 1. unzip txt files,
> >
> > 2. one by one merge those 50 (or N number of) text files into one text
> > file,
> >
> > 3. then the zip/tar that bigger text file,
> >
> > 4. then that big zip/tar file can be uploaded into hive.
> >
> >
> >
> > Keshav C Savant
> >
> >
> >
> >
> > From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> > Sent: Wednesday, September 26, 2012 4:04 PM
> > To: user@hive.apache.org
> > Subject: RE: zip file or tar file cosumption
> >
> >
> >
> >
> > This could be a problem. Hive uses newline as the record separator. A ZIP
> > file will certainly contain newline characters. So I doubt this is possible.
> >
> > BUT, I would like to hear from anyone who has solved the "newline is always
> > a record separator" problem, because we ran into it for another type of
> > compressed file.
> >
> > Chuck
> >
> > ________________________________
> >
> > From: Manish.Bhoge [Manish.Bhoge@target.com]
> > Sent: Wednesday, September 26, 2012 3:17 AM
> > To: user@hive.apache.org
> > Subject: zip file or tar file cosumption
> >
> >
> > Hivers,
> >
> >
> >
> > I want to understand that would it be possible to utilize zip/tar files
> > directly into Hive. All the files has similar schema (structure). Say 50
> > *.txt files are zipped into a single zip file can we load data directly from
> > this zip file OR should we need to unzip first?
> >
> >
> >
> > Thanks & Regards
> >
> > Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext:
> > 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.” MySite
> >
> >
> >
> >
> >
> >
> >
> >
> >
>
>
>
> --
>
> Raja Thiruvathuru
Re: zip file or tar file cosumption
Posted by Raja Thiruvathuru <th...@gmail.com>.
we can write custom codecs
On Sun, Sep 30, 2012 at 11:47 AM, Bejoy KS <be...@outlook.com> wrote:
> Yes Manish, Zip is not supported in hadoop. You may have to use gzip
> instead.
>
> Regards
> Bejoy KS
>
>
> ________________________________
> Subject: RE: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> To: user@hive.apache.org
> CC: Chuck.Connell@nuance.com
> Date: Sun, 30 Sep 2012 20:35:35 +0530
>
> Thanks Bejoy. I have zip file there is sense to convert into gzip again.
>
> Chuck, I got what you are trying to say. So I need to process it outside
> HDFS and bring the text file into HDFS.
>
>
> On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
>
> Hi Manish
>
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
>
> I don't think untarring would be done by MapReduce jobs. So tar files may
> not work with Hive; you need to untar the files outside of Hadoop/Hive as a
> prerequisite.
>
>
>
> Regards
>
> Bejoy KS
>
>
> ________________________________
>
> To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
> Subject: Re: zip file or tar file cosumption
> From: manishbhoge@rocketmail.com
> Date: Sun, 30 Sep 2012 12:32:15 +0000
>
> What about a .gz or tar file? Does it need to be unzipped in HDFS before
> loading into Hive? How do you resolve it?
>
> Sent from my BlackBerry, pls excuse typo
>
> ________________________________
>
> From: "Connell, Chuck" <Ch...@nuance.com>
>
> Date: Sun, 30 Sep 2012 12:24:37 +0000
>
> To: user@hive.apache.org<us...@hive.apache.org>; Savant,
> Keshav<Ke...@fisglobal.com>
>
> ReplyTo: user@hive.apache.org
>
> Subject: RE: zip file or tar file cosumption
>
>
>
> I have seen that error when I try to overwrite an existing file.
>
> But, more importantly, Hive cannot understand ZIP files. There was a long
> thread about this just a few days ago. Your table def says "stored as
> textfile" but you are not giving it a text file.
>
> Chuck
>
>
> ________________________________
>
> From: Manish [manishbhoge@rocketmail.com]
> Sent: Sunday, September 30, 2012 7:38 AM
> To: Savant, Keshav
> Cc: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> I am getting below error when loading zip file
>
> Driver returned: 9. Errors: Hive history
> file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
> Loading data to table default.pageview_zip
> Failed with exception Error moving:
> hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into:
> /user/manish/input/zip
> FAILED: Execution Error, return code 1 from
> org.apache.hadoop.hive.ql.exec.MoveTask
>
> My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip'
> OVERWRITE INTO TABLE `pageview_zip`
>
> Table definition:
> CREATE external TABLE pageview_zip
> (
> C_0 STRING,
> C_1 STRING,
> C_7 MAP<STRING,STRING>,
> C_8 STRING,
> C_13 MAP<STRING,STRING>,
> C_21 STRING
> )
> COMMENT 'Page View'
> ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY
> ';' MAP KEYS TERMINATED BY '='
> STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
>
> Thank You,
> Manish
>
>
>
> On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
>
> True Manish.
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
> Sent: Thursday, September 27, 2012 4:26 PM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Thanks Savant. I believe this will hold good for .zip file also.
>
>
>
> Thank You,
>
> Manish.
>
>
>
> From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
> Sent: Thursday, September 27, 2012 10:19 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Manish the table that has been created for zipped text files should be
> defined as sequence file, for example
>
>
>
> CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED
> FIELDS TERMINATED BY ',' stored as sequencefile;
>
>
>
> After this you can use regular load command to load these files, for example
>
>
>
> load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
>
>
>
> hope this helps
>
>
>
> Keshav C Savant
>
>
>
>
> From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:43 PM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Richin,
>
> Thanks! Yes this is what I wanted to understand how to load zip file to Hive
> table. Now, I'll try this option.
>
> Thank You,
> Manish.
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:<ri...@nokia.com>
>
>
> Date:Wed, 26 Sep 2012 14:51:39 +0000
>
>
> To:<us...@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> You are right Chuck. I thought his question was how to use zip files or any
> compressed files in Hive tables.
>
>
>
> Yeah, seems like you can’t do that
> see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
>
> But you can always compress your files in gzip format and they should be
> good to go.
>
>
>
> Richin
>
>
>
> From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 10:44 AM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> But TEXTFILE in Hive always has newline as the record delimiter. How could
> this possibly work with a zip/tar file that can contain ASCII 10 characters
> at random locations, and certainly does not have ASCII 10 at the end of each
> data record?
>
>
>
> Chuck Connell
>
> Nuance R&D Data Team
>
> Burlington, MA
>
>
>
>
>
>
> From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
> Sent: Wednesday, September 26, 2012 10:14 AM
> To: user@hive.apache.org; manishbhoge@rocketmail.com
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> Hi Manish,
>
>
>
> If you have your zip file at location - /home/manish/zipfile, you can just
> point your external table to that location like
>
> CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT
> DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE
> LOCATION '/home/manish/zipfile';
>
>
>
> OR
>
>
>
> If you already have external table pointing to a certain location you can
> load this zip file into your table as
>
> LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
>
>
>
> Hope this helps.
>
>
>
> Richin
>
>
>
> From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
> Sent: Wednesday, September 26, 2012 9:13 AM
> To: user@hive.apache.org
> Subject: Re: zip file or tar file cosumption
>
>
>
>
> Hi Savant,
>
> Got it. But I still need to understand that how to load zip? Can I directly
> use zip file in external table. can u pls help to get the load statement.
>
> Sent from my BlackBerry, pls excuse typo
>
>
> ________________________________
>
> From:"Savant, Keshav" <Ke...@fisglobal.com>
>
>
> Date:Wed, 26 Sep 2012 12:25:38 +0000
>
>
> To:user@hive.apache.org<us...@hive.apache.org>
>
>
> ReplyTo:user@hive.apache.org
>
>
> Cc:Manish.Bhoge@target.com<Ma...@target.com>;
> Chuck.Connell@nuance.com<Ch...@nuance.com>
>
>
> Subject:RE: zip file or tar file cosumption
>
>
>
>
>
> Another solution would be
>
>
>
> Using shell script do following
>
> 1. unzip txt files,
>
> 2. one by one merge those 50 (or N number of) text files into one text
> file,
>
> 3. then the zip/tar that bigger text file,
>
> 4. then that big zip/tar file can be uploaded into hive.
>
>
>
> Keshav C Savant
>
>
>
>
> From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
> Sent: Wednesday, September 26, 2012 4:04 PM
> To: user@hive.apache.org
> Subject: RE: zip file or tar file cosumption
>
>
>
>
> This could be a problem. Hive uses newline as the record separator. A ZIP
> file will certainly contain newline characters. So I doubt this is possible.
>
> BUT, I would like to hear from anyone who has solved the "newline is always
> a record separator" problem, because we ran into it for another type of
> compressed file.
>
> Chuck
>
> ________________________________
>
> From: Manish.Bhoge [Manish.Bhoge@target.com]
> Sent: Wednesday, September 26, 2012 3:17 AM
> To: user@hive.apache.org
> Subject: zip file or tar file cosumption
>
>
> Hivers,
>
>
>
> I want to understand that would it be possible to utilize zip/tar files
> directly into Hive. All the files has similar schema (structure). Say 50
> *.txt files are zipped into a single zip file can we load data directly from
> this zip file OR should we need to unzip first?
>
>
>
> Thanks & Regards
>
> Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext:
> 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.” MySite
>
>
>
>
>
>
>
>
>
--
Raja Thiruvathuru
RE: zip file or tar file cosumption
Posted by Bejoy KS <be...@outlook.com>.
Yes Manish, Zip is not supported in hadoop. You may have to use gzip instead.
Regards
Bejoy KS
Subject: RE: zip file or tar file cosumption
From: manishbhoge@rocketmail.com
To: user@hive.apache.org
CC: Chuck.Connell@nuance.com
Date: Sun, 30 Sep 2012 20:35:35 +0530
Thanks Bejoy. I have zip file there is sense to convert into gzip again.
Chuck, I got what you are trying to say. So I need to process it outside HDFS and bring the text file into HDFS.
On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
Hi Manish
Gzip works well if you have the compression codec available in 'io.compression.codecs'. The Gzip codec is present by default.
I don't think untarring would be done by MapReduce jobs. So tar files may not work with Hive; you need to untar the files outside of Hadoop/Hive as a prerequisite.
Regards
Bejoy KS
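[Editor's sketch] The "untar outside Hadoop first" step described above could look roughly like the following. This is only an illustration using the Python standard library, not something proposed in the thread: the function name tar_to_gzip_files and all paths are hypothetical, and it assumes the archive members are plain text files with distinct base names.

```python
import gzip
import shutil
import tarfile
from pathlib import Path


def tar_to_gzip_files(tar_path, out_dir):
    """Extract each regular file from a tar archive and write it as <name>.gz.

    Returns the list of .gz paths written. Directory structure inside the
    archive is flattened: only each member's base name is kept, so members
    sharing a base name would overwrite one another.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    # "r:*" lets tarfile detect plain, gzip- or bzip2-compressed tar archives.
    with tarfile.open(tar_path, "r:*") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links, devices
            source = archive.extractfile(member)
            target = out_dir / (Path(member.name).name + ".gz")
            with source, gzip.open(target, "wb") as sink:
                shutil.copyfileobj(source, sink)
            written.append(target)
    return written
```

The resulting .gz files are ordinary gzip-compressed text, which is the form the thread says Hive can read through the Gzip codec; copying them into HDFS and loading them is left to the usual tools.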
To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
Subject: Re: zip file or tar file cosumption
From: manishbhoge@rocketmail.com
Date: Sun, 30 Sep 2012 12:32:15 +0000
What about a .gz or tar file? Does it need to be unzipped in HDFS before loading into Hive? How do you resolve it?
Sent from my BlackBerry, pls excuse typo
From: "Connell, Chuck" <Ch...@nuance.com>
Date: Sun, 30 Sep 2012 12:24:37 +0000
To: user@hive.apache.org<us...@hive.apache.org>; Savant, Keshav<Ke...@fisglobal.com>
ReplyTo: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
I have seen that error when I try to overwrite an existing file.
But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.
Chuck
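[Editor's sketch] Chuck's point is that the table expects text while the file is really an archive. One way to catch that before issuing a LOAD is to look at the file's leading bytes. The sketch below is an assumption-laden illustration in Python: sniff_container is a hypothetical helper, and it only recognises the common zip and gzip signatures plus whatever the standard tarfile module accepts as a tar archive.

```python
import tarfile

ZIP_MAGIC = (b"PK\x03\x04", b"PK\x05\x06")  # local file header / empty archive
GZIP_MAGIC = b"\x1f\x8b"


def sniff_container(path):
    """Return 'gzip', 'zip', 'tar' or 'other' based on the file's contents."""
    with open(path, "rb") as handle:
        head = handle.read(4)
    if head.startswith(GZIP_MAGIC):
        return "gzip"
    if head in ZIP_MAGIC:
        return "zip"
    if tarfile.is_tarfile(path):
        return "tar"
    return "other"
```

Following the thread's conclusions, a result of 'zip' or 'tar' would mean the file needs unpacking outside Hadoop first, while 'gzip' or 'other' (plain text) could go to a TEXTFILE table as-is. Note that a .tar.gz reports as 'gzip' here, because only the outer layer is inspected.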
From: Manish [manishbhoge@rocketmail.com]
Sent: Sunday, September 30, 2012 7:38 AM
To: Savant, Keshav
Cc: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
I am getting below error when loading zip file
Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
Table definition:
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
Thank You,
Manish
On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
True Manish.
Keshav C Savant
From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
Sent: Thursday, September 27, 2012 4:26 PM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption
Thanks Savant. I believe this will hold good for .zip file also.
Thank You,
Manish.
From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption
Manish the table that has been created for zipped text files should be defined as sequence file, for example
CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
After this you can use regular load command to load these files, for example
load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
hope this helps
Keshav C Savant
From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption
Hi Richin,
Thanks! Yes this is what I wanted to understand how to load zip file to Hive table. Now, I'll try this option.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
From:<ri...@nokia.com>
Date:Wed, 26 Sep 2012 14:51:39 +0000
To:<us...@hive.apache.org>
ReplyTo:user@hive.apache.org
Subject:RE: zip file or tar file cosumption
You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.
Yeah, seems like you can’t do that see:http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.
Richin
From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?
Chuck Connell
Nuance R&D Data Team
Burlington, MA
From:richin.jain@nokia.com [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption
Hi Manish,
If you have your zip file at location - /home/manish/zipfile, you can just point your external table to that location like
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
OR
If you already have external table pointing to a certain location you can load this zip file into your table as
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
Hope this helps.
Richin
From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption
Hi Savant,
Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in an external table? Can you please help me with the load statement?
Sent from my BlackBerry, pls excuse typo
From:"Savant, Keshav" <Ke...@fisglobal.com>
Date:Wed, 26 Sep 2012 12:25:38 +0000
To:user@hive.apache.org<us...@hive.apache.org>
ReplyTo:user@hive.apache.org
Cc:Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject:RE: zip file or tar file cosumption
Another solution would be
Use a shell script to do the following:
1. unzip the txt files,
2. merge those 50 (or N) text files one by one into a single text file,
3. then zip/tar that bigger text file,
4. then upload that big zip/tar file into Hive.
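The unzip-and-merge steps above can be sketched in Python rather than shell. This is only an illustration under stated assumptions: the function name zip_to_gzip and the ".txt" filter are hypothetical, and gzip is used for the recompression step because other replies in this thread report that gzip is the format Hive reads.

```python
import gzip
import zipfile


def zip_to_gzip(zip_path, gz_path, suffix=".txt"):
    """Concatenate every *suffix* member of a zip archive into one gzip file.

    Returns the number of members that were merged.
    """
    merged = 0
    with zipfile.ZipFile(zip_path) as archive, gzip.open(gz_path, "wb") as out:
        # Sort the names so the merge order is deterministic.
        for name in sorted(archive.namelist()):
            if not name.endswith(suffix):
                continue
            data = archive.read(name)
            out.write(data)
            # Make sure the last record of each file ends with a newline,
            # otherwise it would run into the first record of the next file.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            merged += 1
    return merged
```

The resulting .gz file could then be loaded with a statement of the same shape as the "load data local inpath" example quoted in this thread; whether that works depends on the table's storage format and the codecs configured on the cluster.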
Keshav C Savant
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.
BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.
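A small Python illustration of the point about record separators, offered as a sketch rather than a description of Hive internals. It builds a zip archive and a gzip stream of the same three records in memory and compares what a newline split sees in each case. The sample records are made up for the example.

```python
import gzip
import io
import zipfile

records = [b"1 a", b"2 b", b"3 c"]
text = b"\n".join(records) + b"\n"

# Build a zip archive and a gzip stream of the same text, both in memory.
zip_buffer = io.BytesIO()
with zipfile.ZipFile(zip_buffer, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("data.txt", text)
zip_bytes = zip_buffer.getvalue()
gzip_bytes = gzip.compress(text)

# Splitting the raw archive on newlines does not give back the records:
# the bytes are zip headers plus deflate output, not delimited text.
raw_split = zip_bytes.split(b"\n")

# Running the gzip stream through a decompressor first does.
decoded_split = gzip.decompress(gzip_bytes).splitlines()

print(raw_split == records)      # False
print(decoded_split == records)  # True
```

The gzip case only works because something decompresses the stream before lines are split; reading the compressed bytes as text would fail in the same way as the zip case.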
Chuck
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org
Subject: zip file or tar file cosumption
Hivers,
I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip first?
Thanks & Regards
Manish Bhoge | Technical Architect | Target DW/BI | +919379850010 (M) Ext: 5691 VOIP: 22165 | “Excellence is not a skill, It is an attitude.” MySite
_____________
The information contained in this message is proprietary and/or confidential. If you are not the intended recipient, please: (i) delete the message and all copies; (ii) do not disclose, distribute or use the message in any manner; and (iii) notify the sender immediately. In addition, please be aware that any message addressed to our domain is subject to archiving and review by persons other than the intended recipient. Thank you.
RE: zip file or tar file cosumption
Posted by Manish <ma...@rocketmail.com>.
Thanks Bejoy. I have zip file there is sense to convert into gzip again.
Chuck, I got what you are trying to say. So I need to process it outside
HDFS and bring the text file into HDFS.
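The same "process it outside HDFS" idea applies to tar archives, which are also raised in this thread. A minimal Python sketch follows, assuming the archive holds plain .txt members; the function name extract_text_members and the suffix filter are hypothetical.

```python
import os
import tarfile


def extract_text_members(tar_path, dest_dir, suffix=".txt"):
    """Extract regular files ending in *suffix* from a tar archive.

    Returns the sorted list of paths written under dest_dir.
    """
    written = []
    # The default read mode also handles compressed archives such as .tar.gz.
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not (member.isfile() and member.name.endswith(suffix)):
                continue
            # Flatten the member path so nothing is written outside dest_dir.
            target = os.path.join(dest_dir, os.path.basename(member.name))
            source = archive.extractfile(member)
            with source, open(target, "wb") as out:
                out.write(source.read())
            written.append(target)
    return sorted(written)
```

The extracted files are ordinary text and could then be copied into HDFS, for example with "hadoop fs -put", before being loaded into a TEXTFILE table.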
On Sun, 2012-09-30 at 18:21 +0530, Bejoy KS wrote:
> Hi Manish
>
> Gzip works well if you have the compression codec available in
> 'io.compression.codecs'. The Gzip codec is present by default.
>
> I don't think untarring would be done by MapReduce jobs. So tar files
> may not work with Hive; you need to untar the files outside Hadoop/Hive
> as a prerequisite.
>
>
>
> Regards
> Bejoy KS
RE: zip file or tar file cosumption
Posted by Bejoy KS <be...@outlook.com>.
Hi Manish
Gzip works well if you have the compression codec available in 'io.compression.codecs'. The Gzip codec is present by default.
I don't think untarring would be done by MapReduce jobs. So tar files may not work with Hive; you need to untar the files outside Hadoop/Hive as a prerequisite.
Regards
Bejoy KS
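One way to check the codec setting mentioned above is to read it out of a Hadoop site file. The sketch below is illustrative only: it assumes the property is spelled io.compression.codecs, as in Hadoop's configuration, and that the file uses the usual <property><name>/<value> layout. The helper name configured_codecs and the sample XML are made up for the example.

```python
import xml.etree.ElementTree as ET

GZIP_CODEC = "org.apache.hadoop.io.compress.GzipCodec"


def configured_codecs(site_xml):
    """Return the codec class names listed under io.compression.codecs."""
    root = ET.fromstring(site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "io.compression.codecs":
            value = prop.findtext("value", "")
            return [c.strip() for c in value.split(",") if c.strip()]
    # Property not set in this file; the cluster's defaults would apply.
    return []


sample = """<configuration>
  <property>
    <name>io.compression.codecs</name>
    <value>org.apache.hadoop.io.compress.GzipCodec,
           org.apache.hadoop.io.compress.DefaultCodec</value>
  </property>
</configuration>"""

print(GZIP_CODEC in configured_codecs(sample))  # True
```

An empty result does not mean gzip is unavailable; it only means the property is not overridden in the file that was read.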
To: user@hive.apache.org; Keshav.C.Savant@fisglobal.com
Subject: Re: zip file or tar file cosumption
From: manishbhoge@rocketmail.com
Date: Sun, 30 Sep 2012 12:32:15 +0000
What about a .gz or tar file? Does it need to be unzipped in HDFS and then loaded into Hive? How do you resolve it?
Re: zip file or tar file cosumption
Posted by Manish Bhoge <ma...@rocketmail.com>.
What about a .gz or tar file? Does it need to be unzipped in HDFS and then loaded into Hive? How do you resolve it?
Sent from my BlackBerry, pls excuse typo
-----Original Message-----
From: "Connell, Chuck" <Ch...@nuance.com>
Date: Sun, 30 Sep 2012 12:24:37
To: user@hive.apache.org<us...@hive.apache.org>; Savant, Keshav<Ke...@fisglobal.com>
Reply-To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
I have seen that error when I try to overwrite an existing file.
But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.
Chuck
RE: zip file or tar file cosumption
Posted by "Connell, Chuck" <Ch...@nuance.com>.
I have seen that error when I try to overwrite an existing file.
But, more importantly, Hive cannot understand ZIP files. There was a long thread about this just a few days ago. Your table def says "stored as textfile" but you are not giving it a text file.
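A quick way to catch this mismatch before running LOAD DATA is to look at the file's leading bytes. The Python sketch below checks for the standard zip local-file-header signature and the gzip magic number; the function names are hypothetical and the check is a guess at the container format, not a validation of the contents.

```python
def sniff_format(first_bytes):
    """Guess a file's container format from its leading bytes."""
    if first_bytes.startswith(b"PK\x03\x04"):
        return "zip"
    if first_bytes.startswith(b"\x1f\x8b"):
        return "gzip"
    return "unknown"


def sniff_file(path):
    # Four bytes are enough for both signatures checked above.
    with open(path, "rb") as handle:
        return sniff_format(handle.read(4))
```

A file that comes back as "zip" would need to be unpacked first, while "unknown" covers plain text as well as every other format this sketch does not recognise.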
Chuck
________________________________
From: Manish [manishbhoge@rocketmail.com]
Sent: Sunday, September 30, 2012 7:38 AM
To: Savant, Keshav
Cc: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
I am getting below error when loading zip file
Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
Table definition:
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
Thank You,
Manish
On Thu, 2012-09-27 at 11:11 +0000, Savant, Keshav wrote:
True Manish.
Keshav C Savant
From: Manish.Bhoge [mailto:Manish.Bhoge@target.com]
Sent: Thursday, September 27, 2012 4:26 PM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption
Thanks Savant. I believe this will hold good for .zip file also.
Thank You,
Manish.
From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption
Manish, the table created for zipped text files should be defined as a sequence file, for example:
CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
After this you can use the regular load command to load these files, for example:
load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
Hope this helps.
Keshav C Savant
From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption
Hi Richin,
Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. Now I'll try this option.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
________________________________
From: <ri...@nokia.com>
Date: Wed, 26 Sep 2012 14:51:39 +0000
To: <us...@hive.apache.org>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption
You are right Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.
Yeah, it seems you can't do that; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.
Richin
From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?
Chuck Connell
Nuance R&D Data Team
Burlington, MA
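The newline argument above can be checked with a small experiment. This is only an illustrative Python sketch with made-up sample records, not a description of how Hive reads files: it packs newline-terminated text into an in-memory ZIP archive and then splits both the text and the archive bytes on ASCII 10, the way a line-oriented reader would.

```python
import io
import zipfile


def split_on_newline(data: bytes) -> list:
    """Split bytes on ASCII 10, the way a line-oriented reader finds records."""
    return data.split(b"\n")


# Hypothetical sample: 2000 short space-delimited records, one per line.
records = [f"user{i} page{i % 7} ref=home;lang=en".encode() for i in range(2000)]
text = b"\n".join(records) + b"\n"

# Pack the text into an in-memory ZIP archive with a single member.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("pageview.txt", text)
archive = buf.getvalue()

plain_lines = split_on_newline(text)[:-1]  # drop the empty piece after the final newline
zip_lines = split_on_newline(archive)

print(plain_lines == records)    # True: in plain text, newline ends each record
print(zip_lines == plain_lines)  # False: the archive bytes are not the records
print(len(plain_lines), len(zip_lines))
```

Any ASCII 10 bytes that turn up inside the archive are part of the compressed stream and the ZIP headers, so the pieces between them bear no relation to the original rows.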
From:richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption
Hi Manish,
If your zip file is at /home/manish/zipfile, you can just point your external table to that location, like this:
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
OR
If you already have an external table pointing to a certain location, you can load this zip file into your table with:
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
Hope this helps.
Richin
From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: Re: zip file or tar file cosumption
Hi Savant,
Got it. But I still need to understand how to load a zip file. Can I use a zip file directly in an external table? Can you please help me with the load statement?
Sent from my BlackBerry, pls excuse typo
________________________________
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38 +0000
To: user@hive.apache.org<us...@hive.apache.org>
ReplyTo: user@hive.apache.org<ma...@hive.apache.org>
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption
Another solution would be to use a shell script to do the following:
1. unzip the txt files,
2. merge those 50 (or N) text files, one by one, into one text file,
3. then zip/tar that bigger text file,
4. then upload that big zip/tar file into Hive.
Keshav C Savant
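The unzip-and-merge steps above can be sketched in a few lines. Note that this sketch deliberately swaps the final zip/tar step for gzip, since gzip is the format others in this thread suggest instead. It is written in Python rather than shell, the function name zip_to_gzip is hypothetical, and the demo data is made up.

```python
import gzip
import io
import zipfile


def zip_to_gzip(zip_src, gz_dst) -> int:
    """Concatenate every .txt member of a ZIP archive into one gzip file.

    zip_src and gz_dst may be file paths or binary file objects.
    Returns the number of members written.
    """
    count = 0
    with zipfile.ZipFile(zip_src) as zf, gzip.open(gz_dst, "wb") as out:
        for name in sorted(zf.namelist()):
            if not name.endswith(".txt"):
                continue  # ignore anything that is not one of the text files
            data = zf.read(name)
            out.write(data)
            # Keep the last record of one file from running into the next file.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            count += 1
    return count


# Demo with in-memory buffers so the sketch runs without touching disk.
src = io.BytesIO()
with zipfile.ZipFile(src, "w") as zf:
    zf.writestr("part2.txt", "c d")    # no trailing newline on purpose
    zf.writestr("part1.txt", "a b\n")
src.seek(0)

dst = io.BytesIO()
print(zip_to_gzip(src, dst))           # 2
print(gzip.decompress(dst.getvalue())) # b'a b\nc d\n'
```

With real files, a call such as zip_to_gzip("11sep12.zip", "11sep12.txt.gz") would produce a single gzip file; those file names are only examples. Whether a given Hive version then reads the .gz directly from a TEXTFILE table is worth verifying on a small sample first.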
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: RE: zip file or tar file cosumption
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.
BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.
Chuck
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org<ma...@hive.apache.org>
Subject: zip file or tar file cosumption
Hivers,
I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip it first?
Thanks & Regards
Manish Bhoge | Technical Architect * Target DW/BI | * +919379850010 (M) Ext: 5691 VOIP: 22165 | * "Excellence is not a skill, It is an attitude." MySite<http://mysites.target.com/personal/z063783>
RE: zip file or tar file cosumption
Posted by Manish <ma...@rocketmail.com>.
I am getting the error below when loading the zip file:
Driver returned: 9. Errors: Hive history file=/tmp/hue/hive_job_log_hue_201209300434_1768401171.txt
Loading data to table default.pageview_zip
Failed with exception Error moving: hdfs://localhost:54310/user/manish/input/zip/11sep12.zip into: /user/manish/input/zip
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.MoveTask
My load statement is: LOAD DATA INPATH '/user/manish/input/11sep12.zip' OVERWRITE INTO TABLE `pageview_zip`
Table definition:
CREATE external TABLE pageview_zip
(
C_0 STRING,
C_1 STRING,
C_7 MAP<STRING,STRING>,
C_8 STRING,
C_13 MAP<STRING,STRING>,
C_21 STRING
)
COMMENT 'Page View'
ROW FORMAT DELIMITED FIELDS TERMINATED BY ' ' COLLECTION ITEMS TERMINATED BY ';' MAP KEYS TERMINATED BY '='
STORED AS TEXTFILE LOCATION '/user/manish/input/zip'
Thank You,
Manish
RE: zip file or tar file cosumption
Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
True Manish.
Keshav C Savant
RE: zip file or tar file cosumption
Posted by "Manish.Bhoge" <Ma...@target.com>.
Thanks Savant. I believe this will hold good for .zip file also.
Thank You,
Manish.
From: Savant, Keshav [mailto:Keshav.C.Savant@fisglobal.com]
Sent: Thursday, September 27, 2012 10:19 AM
To: user@hive.apache.org; manishbhoge@rocketmail.com
Subject: RE: zip file or tar file cosumption
Manish, the table created for zipped text files should be defined as a sequence file, for example:
CREATE TABLE my_table_zip(col1 STRING,col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' stored as sequencefile;
After this you can use the regular load command to load these files, for example:
load data local inpath 'path-to-csv-file.gz' into table my_table_zip;
Hope this helps.
Keshav C Savant
RE: zip file or tar file cosumption
Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
Manish, the table created for the zipped text files should be defined as a sequence file, for example:
CREATE TABLE my_table_zip (col1 STRING, col2 STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS SEQUENCEFILE;
After this you can use the regular load command to load these files, for example:
LOAD DATA LOCAL INPATH 'path-to-csv-file.gz' INTO TABLE my_table_zip;
Hope this helps.
Keshav C Savant
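One caution on the suggestion above: LOAD DATA only moves files into the table location and does not convert them, so a gzip text file placed in a SEQUENCEFILE table may not be readable. A commonly documented pattern is to load the .gz into a TEXTFILE staging table first and then, optionally, copy the rows into a SEQUENCEFILE table. The sketch below only writes such a HiveQL script to disk. write_gz_load_script and the table names my_table_gz_raw and my_table_seq are hypothetical, and the two SET properties are quoted from memory of the Hive compressed-storage documentation, so they should be verified against your Hive version.

```shell
#!/bin/sh
# write_gz_load_script OUT: write an illustrative HiveQL script to OUT.
# Nothing is sent to Hive here; table names are placeholders.
write_gz_load_script() {
    out=$1
    cat > "$out" <<'EOF'
-- Staging table: plain TEXTFILE, loaded straight from the gzip file.
CREATE TABLE my_table_gz_raw (col1 STRING, col2 STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
  STORED AS TEXTFILE;

LOAD DATA LOCAL INPATH 'path-to-csv-file.gz' INTO TABLE my_table_gz_raw;

-- Optional: copy the rows into a block-compressed SEQUENCEFILE table.
CREATE TABLE my_table_seq (col1 STRING, col2 STRING)
  STORED AS SEQUENCEFILE;

SET hive.exec.compress.output=true;
SET io.seqfile.compression.type=BLOCK;
INSERT OVERWRITE TABLE my_table_seq SELECT * FROM my_table_gz_raw;
EOF
}

# To run the generated script (requires a Hive installation):
#   write_gz_load_script load_gz.hql && hive -f load_gz.hql
```

The staging step keeps the load cheap, and the INSERT is where the data is actually rewritten into the sequence-file format.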
From: Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:43 PM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption
Hi Richin,
Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. I'll try this option now.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
Re: zip file or tar file cosumption
Posted by Manish Bhoge <ma...@rocketmail.com>.
Hi Richin,
Thanks! Yes, this is what I wanted to understand: how to load a zip file into a Hive table. I'll try this option now.
Thank You,
Manish.
Sent from my BlackBerry, pls excuse typo
-----Original Message-----
From: <ri...@nokia.com>
Date: Wed, 26 Sep 2012 14:51:39
To: <us...@hive.apache.org>
Reply-To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
You are right, Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.
Yeah, it seems you can't do that with zip; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.
Richin
RE: zip file or tar file cosumption
Posted by ri...@nokia.com.
You are right, Chuck. I thought his question was how to use zip files or any compressed files in Hive tables.
Yeah, it seems you can't do that with zip; see: http://mail-archives.apache.org/mod_mbox/hive-user/201203.mbox/%3CCAENxBwxkF--3PzCkpz1HX21=Gb9YVASr2JL0U3yUL2tfGu010Q@mail.gmail.com%3E
But you can always compress your files in gzip format and they should be good to go.
Richin
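The gzip route can be sketched as follows. This is a minimal illustration, assuming the text files are already extracted into a local directory. gzip_each is a hypothetical helper name, and the HDFS path in the trailing comment is a placeholder rather than a path taken from this thread.

```shell
#!/bin/sh
# gzip_each SRC_DIR DEST_DIR: write one .gz per *.txt file in SRC_DIR into
# DEST_DIR, leaving the originals untouched. Hypothetical helper.
gzip_each() {
    src_dir=$1
    dest_dir=$2
    mkdir -p "$dest_dir"
    for f in "$src_dir"/*.txt; do
        [ -e "$f" ] || continue          # skip if the glob matched nothing
        gzip -c "$f" > "$dest_dir/$(basename "$f").gz"
    done
}

# Afterwards the compressed files could be copied into the table directory
# (requires a Hadoop client; the target path is a placeholder):
#   hadoop fs -put staging/*.gz /path/to/table/location/
```

Keeping one .gz per input file, rather than one large archive, matters because a gzip file is not splittable: each file is read by a single mapper, so many medium-sized files tend to parallelize better than one huge one.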
From: ext Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 10:44 AM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?
Chuck Connell
Nuance R&D Data Team
Burlington, MA
RE: zip file or tar file cosumption
Posted by "Connell, Chuck" <Ch...@nuance.com>.
But TEXTFILE in Hive always has newline as the record delimiter. How could this possibly work with a zip/tar file that can contain ASCII 10 characters at random locations, and certainly does not have ASCII 10 at the end of each data record?
Chuck Connell
Nuance R&D Data Team
Burlington, MA
From: richin.jain@nokia.com<ma...@nokia.com> [mailto:richin.jain@nokia.com]
Sent: Wednesday, September 26, 2012 10:14 AM
To: user@hive.apache.org<ma...@hive.apache.org>; manishbhoge@rocketmail.com<ma...@rocketmail.com>
Subject: RE: zip file or tar file cosumption
Hi Manish,
If your zip file is at /home/manish/zipfile, you can point your external table to that location, like:
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
OR
If you already have an external table pointing to a certain location, you can load this zip file into your table with:
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
Hope this helps.
Richin
RE: zip file or tar file cosumption
Posted by ri...@nokia.com.
Hi Manish,
If your zip file is at /home/manish/zipfile, you can point your external table to that location, like:
CREATE EXTERNAL TABLE manish_test (field1 string, field2 string) ROW FORMAT DELIMITED FIELDS TERMINATED BY <your_column_delimiter> STORED AS TEXTFILE LOCATION '/home/manish/zipfile';
OR
If you already have an external table pointing to a certain location, you can load this zip file into your table with:
LOAD DATA INPATH '/home/manish/zipfile' INTO TABLE manish_test;
Hope this helps.
Richin
From: ext Manish Bhoge [mailto:manishbhoge@rocketmail.com]
Sent: Wednesday, September 26, 2012 9:13 AM
To: user@hive.apache.org
Subject: Re: zip file or tar file cosumption
Hi Savant,
Got it. But I still need to understand how to load the zip file. Can I use a zip file directly in an external table? Could you please help with the load statement?
Sent from my BlackBerry, pls excuse typo
Re: zip file or tar file cosumption
Posted by Manish Bhoge <ma...@rocketmail.com>.
Hi Savant,
Got it. But I still need to understand how to load the zip file. Can I use a zip file directly in an external table? Could you please help with the load statement?
Sent from my BlackBerry, pls excuse typo
-----Original Message-----
From: "Savant, Keshav" <Ke...@fisglobal.com>
Date: Wed, 26 Sep 2012 12:25:38
To: user@hive.apache.org<us...@hive.apache.org>
Reply-To: user@hive.apache.org
Cc: Manish.Bhoge@target.com<Ma...@target.com>; Chuck.Connell@nuance.com<Ch...@nuance.com>
Subject: RE: zip file or tar file cosumption
Another solution would be to use a shell script to do the following:
1. unzip the txt files,
2. merge those 50 (or N) text files, one by one, into a single text file,
3. then zip/tar that bigger text file,
4. then upload that big zip/tar file into Hive.
Keshav C Savant
RE: zip file or tar file cosumption
Posted by "Savant, Keshav" <Ke...@fisglobal.com>.
Another solution would be to use a shell script to do the following:
1. unzip the txt files,
2. merge those 50 (or N) text files, one by one, into a single text file,
3. then zip/tar that bigger text file,
4. then upload that big zip/tar file into Hive.
Keshav C Savant
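The steps above can be sketched in shell as follows. Note one deliberate change: step 3 uses gzip rather than zip/tar, since gzip is the format later reported in this thread to work with Hive. merge_and_gzip is a hypothetical helper name, the archive and table names in the comments are placeholders, and the unzip and hive commands are left as comments because they need tools that may not be installed.

```shell
#!/bin/sh
# Step 1 (only if the files are still zipped; requires the unzip tool):
#   unzip -o archive.zip -d extracted/

# merge_and_gzip SRC_DIR OUT_GZ: steps 2 and 3 in one pass.
merge_and_gzip() {
    src_dir=$1   # directory holding the extracted *.txt files
    out_gz=$2    # gzip file to create, e.g. merged.txt.gz

    # "awk 1" prints every line followed by a newline, so a file that lacks
    # a trailing newline does not get glued to the first line of the next.
    awk 1 "$src_dir"/*.txt | gzip -c > "$out_gz"
}

# Step 4 (assumes the Hive table already exists; the name is a placeholder):
#   hive -e "LOAD DATA LOCAL INPATH 'merged.txt.gz' INTO TABLE my_table;"
```

For example, merge_and_gzip extracted merged.txt.gz produces a single compressed file whose decompressed content is the concatenation of all the text files in glob order.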
From: Connell, Chuck [mailto:Chuck.Connell@nuance.com]
Sent: Wednesday, September 26, 2012 4:04 PM
To: user@hive.apache.org
Subject: RE: zip file or tar file cosumption
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.
BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.
Chuck
RE: zip file or tar file cosumption
Posted by "Connell, Chuck" <Ch...@nuance.com>.
This could be a problem. Hive uses newline as the record separator. A ZIP file will certainly contain newline characters. So I doubt this is possible.
BUT, I would like to hear from anyone who has solved the "newline is always a record separator" problem, because we ran into it for another type of compressed file.
Chuck
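The concern above can be illustrated with a small shell sketch that counts newline bytes (ASCII 10) in a text file, in the raw bytes of its gzip-compressed copy, and in the stream after decompression. The helper names count_newline_bytes and compare_records are hypothetical. The sketch only shows that record boundaries exist in the decompressed stream, not in the compressed bytes, which is why a reader has to decompress first. Hadoop's text input does this for codecs it recognizes by file extension, such as gzip, but that behaviour should be checked for your version.

```shell
#!/bin/sh
# count_newline_bytes FILE: number of 0x0A bytes in FILE, whatever its format.
count_newline_bytes() {
    LC_ALL=C tr -cd '\n' < "$1" | wc -c | tr -d ' '
}

# compare_records FILE: gzip FILE into a temporary copy and print the
# newline-byte count of the plain text, of the raw compressed bytes, and of
# the stream obtained by decompressing again.
compare_records() {
    plain=$1
    packed=$(mktemp)
    gzip -c "$plain" > "$packed"
    echo "plain text:        $(count_newline_bytes "$plain")"
    echo "raw gzip bytes:    $(count_newline_bytes "$packed")"
    echo "after gzip -dc:    $(gzip -dc "$packed" | LC_ALL=C tr -cd '\n' | wc -c | tr -d ' ')"
    rm -f "$packed"
}
```

Running compare_records on any delimited text file should show the first and last counts agreeing with the number of records, while the middle count depends on the data and bears no relation to it.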
________________________________
From: Manish.Bhoge [Manish.Bhoge@target.com]
Sent: Wednesday, September 26, 2012 3:17 AM
To: user@hive.apache.org
Subject: zip file or tar file cosumption
Hivers,
I want to understand whether it is possible to use zip/tar files directly in Hive. All the files have a similar schema (structure). Say 50 *.txt files are zipped into a single zip file: can we load data directly from this zip file, or do we need to unzip first?
Thanks & Regards
Manish Bhoge | Technical Architect • Target DW/BI| • +919379850010 (M) Ext: 5691 VOIP: 22165 | • “Excellence is not a skill, It is an attitude.” MySite<http://mysites.target.com/personal/z063783>