Posted to hdfs-user@hadoop.apache.org by SP <sa...@gmail.com> on 2015/07/30 00:34:24 UTC

Need command to compress the files

Hi All,

I am comparing the compression ratios of different codecs.

I have these files in Avro format. How can I compress them using Snappy or
gzip?

-rw-r--r--   3 hdfs supergroup 3080866838 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00000.avro
-rw-r--r--   3 hdfs supergroup 3021258762 2015-07-29 18:15 /tmp/fact_splitby_date_id/part-m-00001.avro
-rw-r--r--   3 hdfs supergroup 3164101762 2015-07-29 18:17 /tmp/fact_splitby_date_id/part-m-00002.avro
-rw-r--r--   3 hdfs supergroup 3251578205 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00003.avro




Thanks
Sp

Re: Need command to compress the files

Posted by Hadoop User <kj...@gmail.com>.
I already have the data in HDFS. I want to test compression ratio with gzip and snappy.

Thanks 
Sajid

> On Jul 29, 2015, at 5:37 PM, Ron Gonzalez <zl...@yahoo.com> wrote:
> 
> I think you can pick the compression algorithm when using sqoop - either deflate or snappy when specifying the --compress option.
> Is that what you were asking?
> 
> Thanks,
> Ron
> 
>> On 07/29/2015 03:40 PM, Ted Yu wrote:
>> You can use the following command to see options for gzip:
>> gzip -h
>> 
>> For snappy, see:
>> https://github.com/kubo/snzip
>> https://code.google.com/p/snappy/issues/detail?id=34
>> 
>> FYI
>> 
>>> On Wed, Jul 29, 2015 at 3:34 PM, SP <sa...@gmail.com> wrote:
>>> Hi All,
>>> 
>>> I am working on comparing different compression ratios. 
>>> 
>>> I have these files in AVRO format. How can I compress them using snappy or gzip.
>>> 
>>> -rw-r--r--   3 hdfs supergroup 3080866838 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00000.avro
>>> -rw-r--r--   3 hdfs supergroup 3021258762 2015-07-29 18:15 /tmp/fact_splitby_date_id/part-m-00001.avro
>>> -rw-r--r--   3 hdfs supergroup 3164101762 2015-07-29 18:17 /tmp/fact_splitby_date_id/part-m-00002.avro
>>> -rw-r--r--   3 hdfs supergroup 3251578205 2015-07-29 18:16 /tmp/fact_splitby_date_id/part-m-00003.avro
>>> 
>>> 
>>> 
>>> 
>>> Thanks
>>> Sp

Re: Need command to compress the files

Posted by Ron Gonzalez <zl...@yahoo.com>.
I think you can pick the compression algorithm when using sqoop - either 
deflate or snappy when specifying the --compress option.
Is that what you were asking?

Thanks,
Ron

On 07/29/2015 03:40 PM, Ted Yu wrote:
> You can use the following command to see options for gzip:
> gzip -h
>
> For snappy, see:
> https://github.com/kubo/snzip
> https://code.google.com/p/snappy/issues/detail?id=34
>
> FYI
>
> On Wed, Jul 29, 2015 at 3:34 PM, SP <sa...@gmail.com> wrote:
>
>     Hi All,
>
>     I am working on comparing different compression ratios.
>
>     I have these files in AVRO format. How can I compress them using
>     snappy or gzip.
>
>     -rw-r--r--   3 hdfs supergroup 3080866838 2015-07-29 18:16
>     /tmp/fact_splitby_date_id/part-m-00000.avro
>     -rw-r--r--   3 hdfs supergroup 3021258762 2015-07-29 18:15
>     /tmp/fact_splitby_date_id/part-m-00001.avro
>     -rw-r--r--   3 hdfs supergroup 3164101762 2015-07-29 18:17
>     /tmp/fact_splitby_date_id/part-m-00002.avro
>     -rw-r--r--   3 hdfs supergroup 3251578205 2015-07-29 18:16
>     /tmp/fact_splitby_date_id/part-m-00003.avro
>
>
>
>
>     Thanks
>     Sp
>
>


Re: Need command to compress the files

Posted by Ted Yu <yu...@gmail.com>.
You can use the following command to see options for gzip:
gzip -h

For snappy, see:
https://github.com/kubo/snzip
https://code.google.com/p/snappy/issues/detail?id=34

FYI
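
Since the goal is to compare ratios, one way to measure the gzip ratio of a
part file after copying it out of HDFS (for example with hdfs dfs -get) is
sketched below in Python. This is not from the thread: the function name
gzip_ratio is hypothetical, the file is assumed to be on local disk, and
Snappy is left out because it is not in the Python standard library. It also
gzips the whole Avro container file, so the result may differ from what
Avro's own block-level codecs would give.

```python
import sys
import zlib


def gzip_ratio(path, level=6, chunk_size=1 << 20):
    """Stream a local file through gzip and return
    (original_bytes, compressed_bytes, ratio) without writing any output."""
    # wbits=31 asks zlib for the gzip container (header and trailer included).
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    original = 0
    compressed = 0
    with open(path, "rb") as src:
        # Read in chunks so multi-gigabyte files do not have to fit in memory.
        for chunk in iter(lambda: src.read(chunk_size), b""):
            original += len(chunk)
            compressed += len(compressor.compress(chunk))
    # flush() emits whatever is still buffered plus the gzip trailer.
    compressed += len(compressor.flush())
    return original, compressed, original / compressed


if __name__ == "__main__":
    # Paths are whatever local copies are passed on the command line.
    for path in sys.argv[1:]:
        size, gz_size, ratio = gzip_ratio(path)
        print(f"{path}: {size} -> {gz_size} bytes (ratio {ratio:.2f})")
```

Only the byte counts are kept, so no compressed copy is written to disk; a
different level (1 to 9) can be passed to see how it affects the ratio.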

On Wed, Jul 29, 2015 at 3:34 PM, SP <sa...@gmail.com> wrote:

> Hi All,
>
> I am working on comparing different compression ratios.
>
> I have these files in AVRO format. How can I compress them using snappy or
> gzip.
>
> -rw-r--r--   3 hdfs supergroup 3080866838 2015-07-29 18:16
> /tmp/fact_splitby_date_id/part-m-00000.avro
> -rw-r--r--   3 hdfs supergroup 3021258762 2015-07-29 18:15
> /tmp/fact_splitby_date_id/part-m-00001.avro
> -rw-r--r--   3 hdfs supergroup 3164101762 2015-07-29 18:17
> /tmp/fact_splitby_date_id/part-m-00002.avro
> -rw-r--r--   3 hdfs supergroup 3251578205 2015-07-29 18:16
> /tmp/fact_splitby_date_id/part-m-00003.avro
>
>
>
>
> Thanks
> Sp
>
