Posted to user@pig.apache.org by John Smith <le...@gmail.com> on 2016/01/12 12:16:14 UTC

Pig - outputSchema - create schema for tuple

I'm trying to define an output schema that is a tuple containing two other
tuples, i.e. `stats:tuple(c:tuple(),d:tuple())`.

The code below doesn't work as intended. Instead, it produces a structure
like this:

    stats:tuple(b:tuple(c:tuple(),d:tuple()))

Below is the output produced by `describe`:

    sourceData: {com.mortardata.pig.dataspliter_36: (stats: (
        (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),
        (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray)))}


Is it possible to create the structure below instead? In other words, I need
to remove tuple b from the previous example.

    grunt> describe sourceData;
    sourceData: {t: (
        s: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray),
        n: (name: chararray,customerId: chararray,VIN: chararray,birth_date: chararray,fuel_mileage: chararray,fuel_consumption: chararray))}


This is the code that doesn't work as expected:


    @Override
    public Schema outputSchema(Schema input) {
        try {
            // First inner tuple
            Schema sensTuple = new Schema();
            sensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
            sensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
            sensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
            sensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
            sensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
            sensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

            // Second inner tuple (same columns)
            Schema nonSensTuple = new Schema();
            nonSensTuple.add(new Schema.FieldSchema("name", DataType.CHARARRAY));
            nonSensTuple.add(new Schema.FieldSchema("customerId", DataType.CHARARRAY));
            nonSensTuple.add(new Schema.FieldSchema("VIN", DataType.CHARARRAY));
            nonSensTuple.add(new Schema.FieldSchema("birth_date", DataType.CHARARRAY));
            nonSensTuple.add(new Schema.FieldSchema("fuel_mileage", DataType.CHARARRAY));
            nonSensTuple.add(new Schema.FieldSchema("fuel_consumption", DataType.CHARARRAY));

            // Parent tuple holding the two (unnamed) inner tuples
            Schema parentTuple = new Schema();
            parentTuple.add(new Schema.FieldSchema(null, sensTuple, DataType.TUPLE));
            parentTuple.add(new Schema.FieldSchema(null, nonSensTuple, DataType.TUPLE));

            // Wraps the parent tuple in a tuple field named "stats"
            Schema outputSchema = new Schema();
            outputSchema.add(new Schema.FieldSchema("stats", parentTuple, DataType.TUPLE));

            // Wraps "stats" again, in a tuple named by getSchemaName(...)
            return new Schema(new Schema.FieldSchema(
                    getSchemaName(this.getClass().getName().toLowerCase(), input),
                    outputSchema, DataType.TUPLE));
        } catch (FrontendException e) {
            // The FieldSchema(String, Schema, byte) constructor throws
            // org.apache.pig.impl.logicalLayer.FrontendException
            throw new RuntimeException(e);
        }
    }




Thank you.