Posted to dev@avro.apache.org by "Niels Basjes (JIRA)" <ji...@apache.org> on 2017/05/15 12:43:04 UTC
[jira] [Commented] (AVRO-1862) AvroOutputFormat saves compressed avro files without respecting codec's default extension
[ https://issues.apache.org/jira/browse/AVRO-1862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16010427#comment-16010427 ]
Niels Basjes commented on AVRO-1862:
------------------------------------
If I create a tar archive and then compress it with gzip I get a name like {{example.tar.gz}}.
If I gunzip that file I actually get a {{example.tar}} which is a tar archive.
Avro files are Avro files.
You cannot 'unavro' an {{example.gz.avro}} file and get an {{example.gz}} file.
The other way around is also wrong: a name like {{example.avro.gz}} would lead to the expectation that it is a gzipped file and that ungzipping it yields an {{example.avro}}.
Based on the explanation, I interpret this change as a workaround for a scripting need to avoid certain situations.
An alternative solution for the described use case: run Camus in a script that, after the task finishes, simply renames the output file {{example.avro}} to {{example.camus.avro}}.
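The rename step could be sketched roughly as follows. This is only an illustration under assumptions: the class {{RenameAvroOutput}} and its methods are hypothetical names, the {{camus}} tag is taken from the example above, and the code assumes the output sits on a local filesystem (output on HDFS would need the Hadoop filesystem API instead).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of Avro or Camus.
public class RenameAvroOutput {

    // Builds the target name by inserting the tag before the ".avro" extension,
    // e.g. example.avro + "camus" -> example.camus.avro.
    static Path withTag(Path file, String tag) {
        String name = file.getFileName().toString();
        String ext = ".avro";
        if (!name.endsWith(ext)) {
            throw new IllegalArgumentException("Not an .avro file: " + name);
        }
        String base = name.substring(0, name.length() - ext.length());
        return file.resolveSibling(base + "." + tag + ext);
    }

    // Renames the file in place; fails if the target already exists.
    static Path rename(Path file, String tag) throws IOException {
        return Files.move(file, withTag(file, tag));
    }

    public static void main(String[] args) throws IOException {
        // Assumed usage: java RenameAvroOutput <file.avro> <tag>
        if (args.length != 2) {
            System.err.println("usage: RenameAvroOutput <file.avro> <tag>");
            return;
        }
        System.out.println(rename(Paths.get(args[0]), args[1]));
    }
}
```

The file stays a plain {{.avro}} file throughout; only the part of the name before the extension changes, so readers that select on the {{.avro}} suffix are unaffected.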
I see this as a problem that does not belong in the Avro code base.
So, based on what I see here, I think this should not be committed.
> AvroOutputFormat saves compressed avro files without respecting codec's default extension
> -----------------------------------------------------------------------------------------
>
> Key: AVRO-1862
> URL: https://issues.apache.org/jira/browse/AVRO-1862
> Project: Avro
> Issue Type: Improvement
> Components: java
> Affects Versions: 1.8.1
> Reporter: Piotr Wikieł
> Priority: Minor
> Labels: patch
> Fix For: 1.8.3
>
> Attachments: AVRO-1862-1.patch, AVRO-1862.patch
>
>
> A common pattern in naming compressed files is to give them an extension derived from the compression codec, for example: {{.gz}}, {{.zip}}, {{.bz2}}.
> {{AvroOutputFormat}} currently does not respect this convention.
> I've adapted some code from Hadoop's {{TextOutputFormat}} in a backward-compatible manner, adding the following {{JobConf}} property:
> {{avro.mapred.output.extension.from-codec}} ({{boolean}}, default: {{false}}) - when set to {{true}}, the extension will be changed according to the above rule.
> EDIT: Please take a look at the first comment for an update. When the above property is set to {{true}}, the file extension will be of the form {{.gz.avro}} or {{.snappy.avro}}.
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)