Posted to user@avro.apache.org by nir_zamir <ni...@gmail.com> on 2013/04/25 16:12:44 UTC

Avro with Snappy compression on Hive

Hi,

I have a Hive table created with the Avro Serde.

When I add some data to it using Snappy compression, it still looks
compressed with deflate (the file starts with
'Obj...avro.codec.deflate.avro.Schema', whereas for raw data compressed with
Snappy, the Snappy codec is specified at the beginning of the file).

Anything I'm doing wrong?

Here's what I do:

CREATE TABLE p2c_comp_avro
  ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
  STORED as INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
  OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
  TBLPROPERTIES (
    'avro.schema.url'='file:///home/cloudera/bigdata/path_to_conversions_raw.avsc');

SET hive.exec.compress.output=true;
SET mapred.output.compression.codec=org.apache.hadoop.io.compress.SnappyCodec;
SET mapred.output.compression.type=BLOCK;

INSERT OVERWRITE TABLE p2c_comp_avro SELECT * FROM p2c;


Thanks!



--
View this message in context: http://apache-avro.679487.n3.nabble.com/Avro-with-Snappy-compression-on-Hive-tp4027079.html
Sent from the Avro - Users mailing list archive at Nabble.com.

Re: Avro with Snappy compression on Hive

Posted by Martin Kleppmann <ma...@rapportive.com>.

http://svn.apache.org/viewvc/avro/trunk/lang/java/mapred/src/main/java/org/apache/avro/mapred/AvroJob.java?view=markup#l45 :)


On 25 April 2013 07:57, nir_zamir <ni...@gmail.com> wrote:

> Thanks Martin, that worked!
>
> I'd be happy to know how you guessed it.

Re: Avro with Snappy compression on Hive

Posted by nir_zamir <ni...@gmail.com>.

Thanks Martin, that worked!

I'd be happy to know how you guessed it.

Re: Avro with Snappy compression on Hive

Posted by Martin Kleppmann <ma...@rapportive.com>.

I've never used Avro output with Hive, but just as a guess, try this:

SET avro.output.codec=snappy;

The mapred.output.compression.codec and mapred.output.compression.type
options are probably redundant.
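
To confirm which codec actually ended up in an output file, it may be
easier to read the Avro container header than to eyeball the first bytes.
The following is a minimal sketch in Python (standard library only), not
part of Avro or Hive. It assumes the file has been copied to the local
filesystem, and the function names (read_long, read_bytes,
read_header_metadata, avro_codec) are hypothetical helpers written for this
example. It relies on the header layout described in the Avro
specification: the magic bytes 'Obj' followed by 1, then a metadata map in
which the 'avro.codec' key names the codec.

```python
import sys

MAGIC = b"Obj\x01"  # first four bytes of every Avro object container file


def read_long(stream):
    """Decode one Avro long (zigzag-encoded, variable-length integer)."""
    shift = 0
    accum = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of file inside a long")
        accum |= (byte[0] & 0x7F) << shift
        if not byte[0] & 0x80:  # high bit clear marks the last byte
            break
        shift += 7
    return (accum >> 1) ^ -(accum & 1)  # undo the zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string (Avro 'bytes' and 'string')."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of file inside a byte string")
    return data


def read_header_metadata(stream):
    """Return the header metadata map as {str: bytes}."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count terminates the map
            break
        if count < 0:  # negative count: a block byte size follows
            count = -count
            read_long(stream)  # the byte size is not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def avro_codec(stream):
    """Return the codec name; a missing 'avro.codec' entry means 'null'."""
    meta = read_header_metadata(stream)
    return meta.get("avro.codec", b"null").decode("utf-8")


if __name__ == "__main__":
    # Usage: python avro_codec.py <local-avro-file>
    with open(sys.argv[1], "rb") as f:
        print(avro_codec(f))
```

Run against a file written after the SET avro.output.codec=snappy change,
this should report the codec recorded in that file's header, which is the
same information as the 'avro.codec' string visible at the start of the
file.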

