Posted to dev@crunch.apache.org by "Alex Kozlov (JIRA)" <ji...@apache.org> on 2013/12/10 00:52:07 UTC
[jira] [Created] (CRUNCH-310) There should be a way to specify projection schema for Parquet files
Alex Kozlov created CRUNCH-310:
----------------------------------
Summary: There should be a way to specify projection schema for Parquet files
Key: CRUNCH-310
URL: https://issues.apache.org/jira/browse/CRUNCH-310
Project: Crunch
Issue Type: Improvement
Components: IO
Reporter: Alex Kozlov
Priority: Critical
Currently the projection schema is set based on the ptype:
{code}
private static <S> FormatBundle<AvroParquetInputFormat> getBundle(AvroType<S> ptype) {
  return FormatBundle.forInput(AvroParquetInputFormat.class)
      .set(AvroReadSupport.AVRO_REQUESTED_PROJECTION, ptype.getSchema().toString())
      // ParquetRecordReader expects ParquetInputSplits, not FileSplits, so it
      // doesn't work with CombineFileInputFormat
      .set(RuntimeParameters.DISABLE_COMBINE_FILE, "true");
}
{code}
Sometimes a user wants to read only a subset of the columns. We need a mechanism to supply the desired projection schema instead of always deriving it from the ptype.
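One possible shape for such a mechanism is sketched below. This is only an illustration, not Crunch code: the class name, the two-argument overload, and the configuration keys are all hypothetical, and a plain Map stands in for FormatBundle so that the example runs without Crunch or Parquet on the classpath. The idea is that the caller may pass a narrower schema, and the ptype schema is used only as a fallback.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: a Map stands in for Crunch's FormatBundle, and schemas
// are passed as JSON strings rather than Avro Schema objects.
public class ProjectionBundleSketch {
    // Placeholder keys standing in for AvroReadSupport.AVRO_REQUESTED_PROJECTION
    // and RuntimeParameters.DISABLE_COMBINE_FILE; these are not the real values.
    public static final String REQUESTED_PROJECTION = "sketch.requested.projection";
    public static final String DISABLE_COMBINE_FILE = "sketch.disable.combine.file";

    // Current behavior: the projection is always the full schema of the ptype.
    public static Map<String, String> getBundle(String ptypeSchemaJson) {
        return getBundle(ptypeSchemaJson, null);
    }

    // Proposed (hypothetical) overload: use the caller's projection schema when
    // one is given, otherwise fall back to the ptype schema.
    public static Map<String, String> getBundle(String ptypeSchemaJson,
                                                String projectionSchemaJson) {
        String projection =
            projectionSchemaJson != null ? projectionSchemaJson : ptypeSchemaJson;
        Map<String, String> conf = new LinkedHashMap<>();
        conf.put(REQUESTED_PROJECTION, projection);
        conf.put(DISABLE_COMBINE_FILE, "true");
        return conf;
    }

    public static void main(String[] args) {
        // Made-up example schemas: the full record has two fields, the
        // projection keeps only "id".
        String full = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"},"
            + "{\"name\":\"bio\",\"type\":\"string\"}]}";
        String idOnly = "{\"type\":\"record\",\"name\":\"User\",\"fields\":["
            + "{\"name\":\"id\",\"type\":\"long\"}]}";

        System.out.println(getBundle(full).get(REQUESTED_PROJECTION));
        System.out.println(getBundle(full, idOnly).get(REQUESTED_PROJECTION));
    }
}
```

Keeping the one-argument form as a thin wrapper means existing callers would see no change in behavior; only callers that pass a projection would read fewer columns.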
--
This message was sent by Atlassian JIRA
(v6.1.4#6159)