Posted to dev@crunch.apache.org by "Alex Kozlov (JIRA)" <ji...@apache.org> on 2013/12/10 00:52:07 UTC

[jira] [Created] (CRUNCH-310) There should be a way to specify projection schema for Parquet files

Alex Kozlov created CRUNCH-310:
----------------------------------

             Summary: There should be a way to specify projection schema for Parquet files
                 Key: CRUNCH-310
                 URL: https://issues.apache.org/jira/browse/CRUNCH-310
             Project: Crunch
          Issue Type: Improvement
          Components: IO
            Reporter: Alex Kozlov
            Priority: Critical


Currently the projection schema is always derived from the ptype's full Avro schema:

{code}
  private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
    return FormatBundle.forInput(AvroParquetInputFormat.class)
        .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
        // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
        // doesn't work with CombineFileInputFormat
        .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
  }
{code}

Sometimes a user wants to read only a subset of columns. There needs to be a mechanism to supply the desired projection schema explicitly instead of always using the ptype's schema.
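
One possible shape for such a mechanism is an overload that accepts an optional projection schema and falls back to the ptype schema when none is given. The sketch below is illustrative only and is not Crunch code: ProjectionSketch and getBundleSettings are hypothetical names, a plain Map stands in for FormatBundle, schemas are passed as JSON strings, and the two configuration key strings are assumed values for AvroReadSupport.AVRO_REQUESTED_PROJECTION and RuntimeParameters.DISABLE_COMBINE_FILE.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for the bundle-building logic; not part of Crunch.
final class ProjectionSketch {
    // Assumed string values of the real constants; check them against the libraries.
    static final String AVRO_REQUESTED_PROJECTION = "parquet.avro.projection";
    static final String DISABLE_COMBINE_FILE = "crunch.disable.combine.file";

    private ProjectionSketch() {}

    // Existing behavior: the projection is the full ptype schema.
    static Map<String, String> getBundleSettings(String ptypeSchemaJson) {
        return getBundleSettings(ptypeSchemaJson, null);
    }

    // Proposed behavior: use the caller-supplied projection schema when present,
    // otherwise fall back to the ptype schema.
    static Map<String, String> getBundleSettings(String ptypeSchemaJson,
                                                 String projectionSchemaJson) {
        if (ptypeSchemaJson == null) {
            throw new IllegalArgumentException("ptype schema must not be null");
        }
        String projection =
            (projectionSchemaJson != null) ? projectionSchemaJson : ptypeSchemaJson;

        Map<String, String> settings = new LinkedHashMap<>();
        settings.put(AVRO_REQUESTED_PROJECTION, projection);
        // Kept from the original snippet: ParquetRecordReader expects
        // ParquetInputSplits, so combined file splits stay disabled.
        settings.put(DISABLE_COMBINE_FILE, "true");
        return settings;
    }
}
```

With this shape, existing callers that pass only the ptype schema would see no change, while a caller that wants fewer columns would pass a second schema listing just those fields.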



--
This message was sent by Atlassian JIRA
(v6.1.4#6159)