Posted to common-issues@hadoop.apache.org by "Ruslan Dautkhanov (Jira)" <ji...@apache.org> on 2020/08/27 03:54:00 UTC
[jira] [Created] (HADOOP-17231) empty getDefaultExtension() is ignored
Ruslan Dautkhanov created HADOOP-17231:
------------------------------------------
Summary: empty getDefaultExtension() is ignored
Key: HADOOP-17231
URL: https://issues.apache.org/jira/browse/HADOOP-17231
Project: Hadoop Common
Issue Type: Bug
Affects Versions: 3.1.3, 3.2.0
Reporter: Ruslan Dautkhanov
Use case: the source files are gzip-compressed but have no file extension.
I attempted to auto-decompress them with a custom codec:
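{code:scala}
package com.my.codec.test

import org.apache.hadoop.io.compress.GzipCodec

// Gzip codec that reports an empty default extension, with the intent of
// matching files that have no extension at all.
class GZCodec extends GzipCodec {
  override def getDefaultExtension(): String = ""
}
{code}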
(note the empty getDefaultExtension), and then setting *io.compression.codecs* to com.my.codec.test.GZCodec has no effect.
Similar tests with one-character extensions matching the last character of the file names do work, so only the empty-string getDefaultExtension case is broken.
I guess the issue is somewhere in [https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java#L109]
but it's not obvious.
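One possible explanation can be modeled without Hadoop. The sketch below assumes the factory keeps codecs in a sorted map keyed by the reversed extension and, on lookup, checks only the greatest key that sorts below the reversed file name. That is an assumption about the lookup, not a copy of the Hadoop source, and SuffixRegistry and the codec names are hypothetical. Under that assumption, an empty key is shadowed whenever any other registered key sorts between it and the reversed file name.

```scala
import java.util.TreeMap

// Simplified, hypothetical model of an extension-to-codec registry.
// Keys are reversed extensions, so a file-name suffix becomes a key prefix.
final class SuffixRegistry {
  private val codecs = new TreeMap[String, String]()

  def register(extension: String, codecName: String): Unit =
    codecs.put(extension.reverse, codecName)

  // Only the greatest key sorting strictly below the reversed file name is tried.
  def lookup(fileName: String): Option[String] = {
    val reversed = fileName.reverse
    val below = codecs.headMap(reversed)
    if (below.isEmpty) None
    else {
      val candidate = below.lastKey()
      if (reversed.startsWith(candidate)) Some(codecs.get(candidate)) else None
    }
  }
}

object SuffixRegistryDemo {
  def main(args: Array[String]): Unit = {
    val registry = new SuffixRegistry
    registry.register(".bz2", "bzip2")   // stored under key "2zb."
    registry.register("", "custom-gzip") // stored under key ""

    println(registry.lookup("archive.bz2")) // Some(bzip2)
    // "data" reversed is "atad"; the nearest lower key is "2zb.", which is
    // not a prefix of "atad", so the empty key is never considered.
    println(registry.lookup("data"))        // None
  }
}
```

In this model a one-character extension such as "a" still resolves for "data", because its key "a" becomes the nearest lower key, which is consistent with the behavior described above.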
Folks have built some workarounds using custom readers, for example,
# [https://daynebatten.com/2015/11/override-hadoop-compression-codec-file-extension/]
# [https://stackoverflow.com/questions/52011697/how-to-read-a-compressed-gzip-file-without-extension-in-spark?rq=1]
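As an illustration of what an extension-independent reader could do, here is a minimal sketch that detects gzip by its two magic bytes (0x1f 0x8b) instead of by file name. It is not taken from the linked posts and uses only the JDK; GzipSniffer is a hypothetical name, and wiring it into a Hadoop or Spark input format is left out.

```scala
import java.io.{BufferedInputStream, InputStream}
import java.util.zip.GZIPInputStream

object GzipSniffer {
  // Returns a decompressing stream when the input starts with the gzip
  // magic bytes, otherwise returns the (buffered) input unchanged.
  def open(raw: InputStream): InputStream = {
    val in = new BufferedInputStream(raw)
    in.mark(2)            // remember the start so the peeked bytes can be re-read
    val b0 = in.read()
    val b1 = in.read()
    in.reset()
    if (b0 == 0x1f && b1 == 0x8b) new GZIPInputStream(in) else in
  }
}
```

Content sniffing like this sidesteps the extension lookup entirely, at the cost of reading the first bytes of every file.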
Hopefully it would be an easy fix to support empty getDefaultExtension?