Posted to user@hive.apache.org by Sanjay Subramanian <Sa...@wizecommerce.com> on 2013/05/22 01:30:09 UTC

Snappy with Hive

Hi guys

QUESTION 1
I have an MR job that writes Snappy-compressed output files.
My table definition is as follows:
CREATE EXTERNAL TABLE IF NOT EXISTS outpdir_header_hive_only(hbase_pk STRING,header_servername_donotquery STRING,header_date_donotquery STRING, header_id STRING, header_hbpk STRING,header_channelId INT,header_searchAnnotation STRING,header_continuedSearchFlag INT,header_prodLow INT,header_prodTotal INT,header_sort INT,header_view INT,header_adNodes INT,header_spellingSuggestion STRING,header_queryType INT,header_nodeId INT,header_pinpointPtitleId INT,header_firedSearchRules STRING,header_rbAbsentSellers INT,header_shuffled INT,header_searchSessionId STRING,header_normalizationFlag STRING,header_relatedItemResultCount INT,header_unrankedSelectedPtitleIds INT,header_normKeyword STRING,header_kplEntry INT,header_isSaved STRING,header_rawProfileScore DOUBLE,header_normalizedProfileScore INT,header_scorerInfo STRING,header_contextNode INT,header_fbId STRING,norm_stem_keyword STRING, attrs_origNodeId INT,attrs_mfrId INT,attrs_sellerId INT,attrs_otherAttrs STRING,attrs_ptitleId INT,cached_date STRING,cached_recordId STRING,cached_visitorId STRING,cached_visit_id STRING,cached_appStyle STRING,cached_publisherId INT,cached_IP STRING,cached_source STRING,cached_refkw STRING,cached_pixeled INT,cached_searchRefineAttrImps STRING,cached_pageType STRING,cached_zipCode STRING,cached_zipType STRING,cached_perpage INT) PARTITIONED BY (header_date STRING, header_servername STRING) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'

Do I have to specify an INPUTFORMAT directive for the Hive table to read Snappy-compressed files?
For example, for LZO it is:
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"


QUESTION 2
For Hive scripts that read Snappy files and write Snappy output to Hive tables, are the following settings enough?
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Thanks

sanjay

CONFIDENTIALITY NOTICE
======================
This email message and any attachments are for the exclusive use of the intended recipient(s) and may contain confidential and privileged information. Any unauthorized review, use, disclosure or distribution is prohibited. If you are not the intended recipient, please contact the sender by reply email and destroy all copies of the original message along with any attachments, from your computer system. If you are the intended recipient, please be advised that the content of this message is subject to access, review and disclosure by the sender's Email System Administrator.

Re: Snappy with Hive

Posted by Sanjay Subramanian <Sa...@wizecommerce.com>.
Thanks Bejoy. I tracked down the issue: there was an earlier table (with an LZO definition) that I had not dropped and recreated, so feeding it Snappy input was causing the problems.
Regards
sanjay
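
The fix described above, dropping a stale table definition and recreating it without the LZO input format, might look like the following sketch. It assumes the table is EXTERNAL (so DROP TABLE removes only metadata, not the data files), abbreviates the column list to two columns for readability, and uses hypothetical partition values and a hypothetical HDFS path; none of those specifics come from the thread.

```sql
-- Remove the stale definition that still carried the LZO input format.
-- For an EXTERNAL table this drops metadata only; the data files stay in place.
DROP TABLE IF EXISTS outpdir_header_hive_only;

-- Recreate as a plain text table. The column list is abbreviated here;
-- a real table would use the full list from the original definition.
CREATE EXTERNAL TABLE IF NOT EXISTS outpdir_header_hive_only (
  hbase_pk STRING,
  header_id STRING
)
PARTITIONED BY (header_date STRING, header_servername STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Partition metadata is dropped with the old table and must be registered again.
-- The partition values and location below are hypothetical.
ALTER TABLE outpdir_header_hive_only
  ADD IF NOT EXISTS PARTITION (header_date = '2013-05-21', header_servername = 'server01')
  LOCATION '/path/to/snappy/output/2013-05-21/server01';
```

Because CREATE EXTERNAL TABLE IF NOT EXISTS silently keeps an existing definition, rerunning only the CREATE statement would not have replaced the old LZO-based table, which is consistent with the behaviour reported here.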

From: bejoy_ks@yahoo.com
Reply-To: user@hive.apache.org, bejoy_ks@yahoo.com
Date: Thursday, May 23, 2013 7:31 AM
To: user@hive.apache.org
Subject: Re: Snappy with Hive

Hi

Please find responses below.

Do I have to specify an INPUTFORMAT directive for the Hive table to read Snappy-compressed files?
For example, for LZO it is:
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"

Bejoy: No custom input format is required. Add the Snappy codec to io.compression.codecs.

QUESTION 2
For Hive scripts that read Snappy files and write Snappy output to Hive tables, are the following settings enough?
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Bejoy: That should be fine. If you run into any issues, add mapred.output.compress=true as well.
Regards
Bejoy KS

Sent from remote device, Please excuse typos

Re: Snappy with Hive

Posted by be...@yahoo.com.
Hi

Please find responses below.

Do I have to specify an INPUTFORMAT directive for the Hive table to read Snappy-compressed files?
For example, for LZO it is:
STORED AS INPUTFORMAT "com.hadoop.mapred.DeprecatedLzoTextInputFormat"

Bejoy: No custom input format is required. Add the Snappy codec to io.compression.codecs.
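
As a rough illustration of the setting mentioned above: io.compression.codecs is a comma-separated list of codec class names, usually configured cluster-wide in core-site.xml. The sketch below sets it for a single Hive session instead, which is an assumption about the environment (it relies on Hive passing the property through to the job configuration and on the Snappy native libraries being installed on the cluster). The codecs listed alongside SnappyCodec are examples, not the poster's actual list.

```sql
-- Session-level override (an assumption); the cluster-wide equivalent is the
-- same property in core-site.xml.
SET io.compression.codecs=org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec,org.apache.hadoop.io.compress.SnappyCodec;

-- With the codec registered, a text table needs no custom INPUTFORMAT:
-- the codec is chosen from the file extension (.snappy for SnappyCodec).
SELECT COUNT(*) FROM outpdir_header_hive_only;
```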

QUESTION 2
For Hive scripts that read Snappy files and write Snappy output to Hive tables, are the following settings enough?
SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

Bejoy: That should be fine. If you run into any issues, add mapred.output.compress=true as well.
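
Put together, the settings from the question plus the extra one suggested here could be used as in the sketch below. The SET lines come from the thread; the target table name, the selected columns, and the partition filter value are hypothetical and only show where the settings take effect.

```sql
SET hive.exec.compress.output=true;
SET mapred.output.compress=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
-- The compression type (RECORD or BLOCK) is a SequenceFile setting;
-- it is not expected to change plain text output.
SET mapred.output.compression.type=BLOCK;

-- Hypothetical target table; files written by this statement should be
-- Snappy-compressed if the settings above are honoured.
INSERT OVERWRITE TABLE snappy_output_table
SELECT hbase_pk, header_id
FROM outpdir_header_hive_only
WHERE header_date = '2013-05-21';
```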

Regards 
Bejoy KS

Sent from remote device, Please excuse typos
