Posted to common-issues@hadoop.apache.org by "Muddy Dixon (JIRA)" <ji...@apache.org> on 2012/09/20 07:14:07 UTC

[jira] [Commented] (HADOOP-8449) hadoop fs -text fails with compressed sequence files with the codec file extension

    [ https://issues.apache.org/jira/browse/HADOOP-8449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13459352#comment-13459352 ] 

Muddy Dixon commented on HADOOP-8449:
-------------------------------------

Hi

We found that the order of the {{switch}} statement and the codec guard block was changed in

{code}private InputStream forMagic(Path p, FileSystem srcFs) throws IOException{code}

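Because of this change, the stream handed to {code}codec.createInputStream(i){code} is different when a codec exists: in cdh3u5, {{i.readShort()}} has already read the first two bytes, so the codec starts decoding at offset 2 instead of offset 0.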

h4. cdh3u3
{code}
private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
    FSDataInputStream i = srcFs.open(p);

    // check codecs
    CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
    CompressionCodec codec = cf.getCodec(p);
    if (codec != null) {
      return codec.createInputStream(i);
    }

    switch(i.readShort()) {
      // cases
    }
    // ... (rest of method elided)
  }
{code}
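With the cdh3u3 ordering, the codec is chosen from the file name before any bytes are inspected. That is the failure this issue describes: a compressed SequenceFile whose name ends in {{.gz}} starts with the bytes {{SEQ}}, not a gzip header. A minimal sketch, assuming {{java.util.zip.GZIPInputStream}} as a stand-in for Hadoop's gzip codec (it validates the header in its constructor; Hadoop's codec may only fail on the first read) and a hypothetical {{openByExtension}} helper in place of {{cf.getCodec(p)}}:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.GZIPInputStream;

class ExtensionFirstOpen {
    // Hypothetical stand-in for cf.getCodec(p) + codec.createInputStream(i):
    // the decision is made from the file name alone, as in cdh3u3.
    static InputStream openByExtension(InputStream in, String fileName) throws IOException {
        if (fileName.endsWith(".gz")) {
            // GZIPInputStream reads and validates the gzip header right here.
            return new GZIPInputStream(in);
        }
        return in;
    }

    // Returns true if a SequenceFile-style header is rejected by the gzip decoder.
    static boolean rejectsSequenceFileHeader(String fileName) {
        // 'S','E','Q' magic followed by a version byte (6, chosen for illustration).
        byte[] seqHeader = {'S', 'E', 'Q', 6};
        try {
            openByExtension(new ByteArrayInputStream(seqHeader), fileName);
            return false;
        } catch (IOException e) {
            return true; // a java.util.zip.ZipException: the header is not gzip
        }
    }
}
{code}

A file named {{part-00000.gz}} is rejected even though it is a perfectly valid SequenceFile, while the same bytes under a name without a codec extension pass through untouched.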


h4. cdh3u5
{code}
private InputStream forMagic(Path p, FileSystem srcFs) throws IOException {
    FSDataInputStream i = srcFs.open(p);

    switch(i.readShort()) { // <=== the stream position (pointer) advances by 2 bytes here!
      // cases
      default: {
        // Check the type of compression instead, depending on Codec class's
        // own detection methods, based on the provided path.
        CompressionCodecFactory cf = new CompressionCodecFactory(getConf());
        CompressionCodec codec = cf.getCodec(p);
        if (codec != null) {
          return codec.createInputStream(i);
        }
        break;
      }
    }

    // File is non-compressed, or not a file container we know.
    i.seek(0);
    return i;
  }
{code}
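In cdh3u5, by the time the {{default}} branch calls {{codec.createInputStream(i)}}, {{readShort()}} has consumed two bytes, so the codec never sees the start of the file; rewinding with {{i.seek(0)}} before that call would avoid it. A minimal, Hadoop-free sketch, assuming a zlib-framed codec (such as the {{.deflate}} DefaultCodec) and modelling {{i.seek(0)}} with {{ByteArrayInputStream.reset()}}; the class and method names are hypothetical:

{code:java}
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

class MagicReadThenCodec {
    // Produces zlib-framed data (2-byte zlib header + deflate blocks).
    static byte[] zlibCompress(String text) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (DeflaterOutputStream out = new DeflaterOutputStream(bos)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return bos.toByteArray();
    }

    // Mimics the cdh3u5 default branch: read the 2-byte magic, then hand the
    // same stream to the codec. rewind == true models i.seek(0) before
    // codec.createInputStream(i).
    static String readAfterMagic(byte[] data, boolean rewind) throws IOException {
        ByteArrayInputStream raw = new ByteArrayInputStream(data); // mark defaults to offset 0
        DataInputStream i = new DataInputStream(raw);
        i.readShort(); // consumes 2 bytes, like FSDataInputStream.readShort()
        if (rewind) {
            raw.reset(); // plays the role of i.seek(0)
        }
        // Without the rewind, the zlib header is gone and decoding fails.
        try (InputStream codecIn = new InflaterInputStream(raw)) {
            return new String(codecIn.readAllBytes(), StandardCharsets.UTF_8);
        }
    }
}
{code}

With {{rewind}} set, the text round-trips; without it, the inflater sees raw deflate data where it expects a zlib header and throws a {{ZipException}}.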
                
> hadoop fs -text fails with compressed sequence files with the codec file extension
> ----------------------------------------------------------------------------------
>
>                 Key: HADOOP-8449
>                 URL: https://issues.apache.org/jira/browse/HADOOP-8449
>             Project: Hadoop Common
>          Issue Type: Bug
>    Affects Versions: 1.0.3, 2.0.0-alpha
>            Reporter: Joey Echeverria
>            Assignee: Harsh J
>            Priority: Minor
>             Fix For: 2.0.2-alpha
>
>         Attachments: HADOOP-8449.patch, HADOOP-8449.patch
>
>
> When the -text command is run on a file whose name ends in the default extension for a codec (e.g. .snappy or .gz) but which is actually a compressed sequence file, the command fails.
> The issue is that it assumes that if the name matches the extension, the file is a plain compressed file. It might be more helpful to check whether it is a sequence file first, and only then check the file extension.
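The ordering suggested in the description (look for the SequenceFile magic first, fall back to the extension second) can be sketched as follows. This is a minimal illustration, not the committed patch: {{MagicFirstDetector}}, its {{Kind}} enum and the hard-coded extension list are hypothetical stand-ins for FsShell's logic and {{CompressionCodecFactory.getCodec(p)}}, and the caller's stream is rewound with {{mark}}/{{reset}} so the header peek consumes nothing.

{code:java}
import java.io.BufferedInputStream;
import java.io.IOException;

class MagicFirstDetector {
    enum Kind { SEQUENCE_FILE, CODEC_BY_EXTENSION, PLAIN }

    // Hypothetical extension list standing in for CompressionCodecFactory.getCodec(p).
    private static final String[] CODEC_EXTENSIONS = {".gz", ".deflate", ".bz2", ".snappy"};

    // Peeks at the header first, then falls back to the file name.
    // The stream is left at offset 0 so the caller can pass it to whichever reader applies.
    static Kind detect(BufferedInputStream in, String fileName) throws IOException {
        in.mark(3);                      // remember the start of the stream
        byte[] magic = in.readNBytes(3); // fewer than 3 bytes for very short files
        in.reset();                      // undo the peek, like i.seek(0)
        if (magic.length == 3 && magic[0] == 'S' && magic[1] == 'E' && magic[2] == 'Q') {
            return Kind.SEQUENCE_FILE;   // the container check wins, whatever the extension
        }
        for (String ext : CODEC_EXTENSIONS) {
            if (fileName.endsWith(ext)) {
                return Kind.CODEC_BY_EXTENSION;
            }
        }
        return Kind.PLAIN;
    }
}
{code}

Because the content check runs before the name check, a SequenceFile named with a codec extension is still recognised as a SequenceFile, and the rewind guarantees that neither branch starts reading mid-header.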

--
This message is automatically generated by JIRA.