Posted to user@pig.apache.org by John Smith <le...@gmail.com> on 2016/01/12 12:16:14 UTC
Pig - outputSchema - create schema for tuple
I'm trying to define an output schema that is a tuple containing
two other tuples, i.e. `stats:tuple(c:tuple(),d:tuple())`.
The code below doesn't work as intended. Instead, it produces
the structure:
stats:tuple(b:tuple(c:tuple(),d:tuple()))
Below is the output produced by describe.
sourceData: {com.mortardata.pig.dataspliter_36: (stats: ((name:
chararray,customerId: chararray,VIN: chararray,birth_date:
chararray,fuel_mileage: chararray,fuel_consumption: chararray),(name:
chararray,customerId: chararray,VIN: chararray,birth_date:
chararray,fuel_mileage: chararray,fuel_consumption: chararray)))}
Is it possible to create the structure below instead? That means I need to
remove tuple b from the previous example.
grunt> describe sourceData;
sourceData: {t: (s: (name: chararray,customerId: chararray,VIN:
chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption:
chararray),n: (name: chararray,customerId: chararray,VIN:
chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption:
chararray))}
The code below doesn't work as expected.
public Schema outputSchema(Schema input) {
    // First inner tuple
    Schema sensTuple = new Schema();
    sensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
    sensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

    // Second inner tuple
    Schema nonSensTuple = new Schema();
    nonSensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
    nonSensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

    // Parent schema holding both inner tuples (no aliases given)
    Schema parentTuple = new Schema();
    parentTuple.add(new Schema.FieldSchema(null, sensTuple, DataType.TUPLE));
    parentTuple.add(new Schema.FieldSchema(null, nonSensTuple, DataType.TUPLE));

    // "stats" tuple wrapping the parent schema
    Schema outputSchema = new Schema();
    outputSchema.add(new Schema.FieldSchema("stats", parentTuple, DataType.TUPLE));

    // Outer tuple named after the UDF, wrapping "stats" once more
    return new Schema(new Schema.FieldSchema(
            getSchemaName(this.getClass().getName().toLowerCase(), input),
            outputSchema, DataType.TUPLE));
}
Thank you.
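
One way to see where the extra level comes from is to count the tuple wrappers: each named tuple field that wraps a schema adds one pair of parentheses in the describe output. The sketch below is a toy model only. The `Field` class, `describe`, `posted`, and `desired` are hypothetical names and are not Pig's API, and the field list is shortened to two entries for brevity. It mirrors the posted nesting (UDF tuple, then "stats", then two unnamed tuples) next to the desired nesting (one tuple `t` holding `s` and `n`).

```java
import java.util.ArrayList;
import java.util.List;

public class NestingSketch {
    // Toy stand-in for a schema field; NOT Pig's Schema.FieldSchema.
    static final class Field {
        final String alias;  // may be null, as in the posted code
        final String type;   // "chararray" or "tuple"
        final List<Field> children = new ArrayList<>();

        Field(String alias, String type) {
            this.alias = alias;
            this.type = type;
        }

        Field add(Field child) {
            children.add(child);
            return this;
        }
    }

    // Render a field roughly the way describe prints it.
    static String describe(Field f) {
        String prefix = (f.alias == null) ? "" : f.alias + ": ";
        if (!f.type.equals("tuple")) {
            return prefix + f.type;
        }
        StringBuilder sb = new StringBuilder(prefix).append("(");
        for (int i = 0; i < f.children.size(); i++) {
            if (i > 0) {
                sb.append(",");
            }
            sb.append(describe(f.children.get(i)));
        }
        return sb.append(")").toString();
    }

    // Inner tuple with a shortened field list (assumption: two fields are enough to show the shape).
    static Field inner(String alias) {
        return new Field(alias, "tuple")
                .add(new Field("name", "chararray"))
                .add(new Field("VIN", "chararray"));
    }

    // Nesting built by the posted code: udf tuple -> "stats" tuple -> two unnamed tuples.
    static Field posted() {
        Field stats = new Field("stats", "tuple").add(inner(null)).add(inner(null));
        return new Field("udf", "tuple").add(stats);
    }

    // Desired nesting: one tuple "t" that directly holds the named tuples "s" and "n".
    static Field desired() {
        return new Field("t", "tuple").add(inner("s")).add(inner("n"));
    }

    public static void main(String[] args) {
        System.out.println(describe(posted()));
        System.out.println(describe(desired()));
    }
}
```

In Pig terms this would presumably correspond to giving the two inner FieldSchema entries the aliases "s" and "n" instead of null, and passing parentTuple directly to the returned FieldSchema rather than wrapping it in the "stats" schema first. That mapping is an assumption and has not been checked against a Pig build.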