Posted to user@hive.apache.org by md...@orange.com on 2012/01/19 10:31:43 UTC

LZO and Hive table

Hi all,
I am planning to compress my Hive tables in LZO and I have a few questions:


1)      Is there any point in compressing both SequenceFile and TextFile formats ?



2)      Before an INSERT command I set up the following variables :
SET hive.exec.compress.output=true
SET io.seqfile.compression.type=BLOCK
SET mapred.output.compress=true
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec

Is that enough to ensure LZO compression or do I need to specify something else when I create my tables ?


3)      Do I need to use the LZOIndexer externally or will Hive do it for me automatically ?


4)      Do I need to set up something else to make sure that Hive will use the LZO index to split my tables for read operations ?


Thanks for your help.

Cheers,
Michael




_________________________________________________________________________________________________________________________

Ce message et ses pieces jointes peuvent contenir des informations confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages electroniques etant susceptibles d'alteration,
France Telecom - Orange decline toute responsabilite si ce message a ete altere, deforme ou falsifie. Merci

This message and its attachments may contain confidential or privileged information that may be protected by law;
they should not be distributed, used or copied without authorization.
If you have received this email in error, please notify the sender and delete this message and its attachments.
As emails may be altered, France Telecom - Orange shall not be liable if this message was modified, changed or falsified.
Thank you.


Re: LZO and Hive table

Posted by Bejoy Ks <be...@yahoo.com>.
Hi Michael
      Please find some pointers inline


1)      Is there any point in compressing both SequenceFile and TextFile formats ?
      [Bejoy]  For TextFile, you should definitely compress if you plan to store a large volume of data. You can compress SequenceFiles with LZO as well.

 
2)      Before an INSERT command I set up the following variables :
SET hive.exec.compress.output=true
SET io.seqfile.compression.type=BLOCK
SET mapred.output.compress=true
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec
 
Is that enough to ensure LZO compression or do I need to specify something else when I create my tables ? 


[Bejoy]  If you are loading data into your table with INSERT OVERWRITE/INTO, which involves an MR job, these settings are enough. If your data is already in HDFS in compressed format and you want to load it into a Hive table, you need to use the following in the DDL:
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
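
For illustration, a complete table definition using that clause might look like the sketch below. The table name lzo_logs, its single column and the LOCATION path are hypothetical placeholders, not taken from the original setup; only the two format classes come from the clause quoted above.

```sql
-- Hypothetical external table over LZO-compressed text files already in HDFS.
CREATE EXTERNAL TABLE lzo_logs (
  line STRING  -- hypothetical single-column layout
)
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"
          OUTPUTFORMAT "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
LOCATION '/data/lzo_logs';  -- hypothetical HDFS directory holding the .lzo files
```

This assumes the hadoop-lzo jar and its native libraries are available to Hive and to the cluster nodes.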

 
3)      Do I need to use the LZOIndexer externally or will Hive do it for me automatically ? 

[Bejoy]  AFAIK, if you want to LOAD some LZO-compressed files and need them LZO-indexed, you have to do the indexing externally. It is just a couple of lines of code.
// Run the LZO indexer on files in HDFS
LzoIndexer indexer = new LzoIndexer(fs.getConf());
indexer.index(filePath);

Note: once you have indexed your LZO files, reloading the partition/table with new data (for example, a new run of the data population job) can leave you with a corrupted index. If a .index file already exists at that location, the LZO indexer silently skips it, leaving the old index in place for the new data. This causes a corrupted LZO exception when you run Hive queries on those tables. If you are using LZO indexing, remove the index files (or remove and recreate the directory) before rerunning the data load job.
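
A minimal sketch of that cleanup from the Hive CLI, assuming the table's files live under the hypothetical directory /data/lzo_logs and that the index files carry the .lzo.index suffix that hadoop-lzo normally writes next to each .lzo file. The table names lzo_logs and raw_logs are hypothetical as well. The dfs command is the Hive CLI's pass-through to the Hadoop file system shell.

```sql
-- Remove stale LZO index files before reloading (hypothetical path).
-- Note: this step reports an error if no index files exist yet.
dfs -rm /data/lzo_logs/*.lzo.index;

-- Reload the data with LZO output compression; raw_logs is a hypothetical source table.
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
INSERT OVERWRITE TABLE lzo_logs SELECT line FROM raw_logs;

-- Afterwards, run the LZO indexer externally on the newly written .lzo files.
```

Deleting the old index first means the indexer has nothing to skip, so the index it writes matches the new data.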

 
4)      Do I need to set up something else to make sure that Hive will use the LZO index to split my tables for read operations ?
[Bejoy]  You don't need to set up anything else specifically once the files are LZO-indexed.

Hope it helps!...

Regards
Bejoy.K.S

