Posted to common-issues@hadoop.apache.org by "Ruslan Dautkhanov (Jira)" <ji...@apache.org> on 2020/08/27 03:54:00 UTC

[jira] [Created] (HADOOP-17231) empty getDefaultExtension() is ignored

Ruslan Dautkhanov created HADOOP-17231:
------------------------------------------

             Summary: empty getDefaultExtension() is ignored
                 Key: HADOOP-17231
                 URL: https://issues.apache.org/jira/browse/HADOOP-17231
             Project: Hadoop Common
          Issue Type: Bug
    Affects Versions: 3.1.3, 3.2.0
            Reporter: Ruslan Dautkhanov


Use case - source files are gz-compressed but have no extensions.

I tried to auto-decompress them with the following custom codec:
{code:scala}
package com.my.codec.test

import org.apache.hadoop.io.compress.GzipCodec

class GZCodec extends GzipCodec {
  override def getDefaultExtension(): String = ""
}
{code}
Note the empty getDefaultExtension(). Setting *io.compression.codecs* to com.my.codec.test.GZCodec then has no effect.

Similar tests with a one-character extension, matching the last character of the file names, do work. So only the empty-string getDefaultExtension() case is broken.

I suspect the issue is somewhere in [https://github.com/c9n/hadoop/blob/master/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/CompressionCodecFactory.java#L109], but the cause is not obvious.
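
To make that guess concrete, here is a minimal, Hadoop-free sketch of a reversed-suffix lookup in Java. It assumes the factory keys codecs by their reversed default extension in a sorted map and checks only the greatest key below the reversed file name. That is my reading of the linked CompressionCodecFactory source and may not match every Hadoop version. The class SuffixLookupSketch, its methods, and the extension lists are hypothetical and only illustrate how an empty key could be shadowed by another registered suffix.

```java
import java.util.SortedMap;
import java.util.TreeMap;

/** Simplified model of a reversed-suffix codec lookup (hypothetical, not Hadoop code). */
class SuffixLookupSketch {

    // Reverse a string so that file-name suffixes become prefixes.
    static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // Register extensions keyed by their reversed form; the value is just a label.
    static SortedMap<String, String> register(String... extensions) {
        SortedMap<String, String> codecs = new TreeMap<>();
        for (String ext : extensions) {
            codecs.put(reverse(ext), ext.isEmpty() ? "<empty>" : ext);
        }
        return codecs;
    }

    // Return the label of the matching extension, or null when nothing matches.
    // Assumption: only the greatest key strictly below the reversed name is tested.
    static String lookup(SortedMap<String, String> codecs, String fileName) {
        String reversed = reverse(fileName);
        SortedMap<String, String> head = codecs.headMap(reversed);
        if (head.isEmpty()) {
            return null;
        }
        String candidate = head.lastKey();
        return reversed.startsWith(candidate) ? codecs.get(candidate) : null;
    }

    public static void main(String[] args) {
        // Empty extension registered next to two other suffixes.
        SortedMap<String, String> withEmpty = register("", ".bz2", ".gz");
        // "atad" sorts after "2zb.", so "2zb." is the only candidate and it does not match.
        System.out.println(lookup(withEmpty, "data"));      // prints null
        System.out.println(lookup(withEmpty, "data.gz"));   // prints .gz

        // A one-character extension equal to the last character of the name.
        SortedMap<String, String> withOneChar = register("a", ".bz2", ".gz");
        System.out.println(lookup(withOneChar, "data"));    // prints a
    }
}
```

If the real lookup behaves like this model, an empty extension would match only file names whose reversed form sorts before every other registered suffix. That would be consistent with one-character extensions working while the empty one does not, but it is a hypothesis, not a confirmed diagnosis.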

Folks have built some workarounds using custom readers, for example, 
 # [https://daynebatten.com/2015/11/override-hadoop-compression-codec-file-extension/]
 # [https://stackoverflow.com/questions/52011697/how-to-read-a-compressed-gzip-file-without-extension-in-spark?rq=1] 

Hopefully it would be an easy fix to support empty getDefaultExtension? 

 


