Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/03/28 18:03:01 UTC
[Pig Wiki] Update of "PigStreamingFunctionalSpec" by OlgaN
Dear Wiki user,
You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.
The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec
------------------------------------------------------------------------------
S = stream A through `stream.pl`;
}}}
- In the example above, `DefaultSerializer` takes tuples out of A and converts them into tab-delimited lines that are passed to `stream.pl`. If A were the result of a grouping operation, the `DefaultSerializer` would also flatten the data. The output of streaming is processed by `DefaultDeserializer` one line at a time and split on tabs.
+ In the example above, the default serializer (!PigStorage) takes tuples out of A and converts them into tab-delimited lines that are passed to `stream.pl`. The output of streaming is processed by the default deserializer (!PigStorage) one line at a time and split on tabs.
The user will be able to provide an alternative delimiter to the default (de)serializer via the `define` command:
{{{
- define X `stream.pl` input(stdin using DefaultSerializer('^A')) output (stdout using DefaultDeserializer('^A'));
+ define X `stream.pl` input(stdin using PigStorage('^A')) output (stdout using PigStorage('^A'));
S = stream A through X;
}}}
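The line format described above can be illustrated with a minimal Java sketch. It is not Pig code: `DelimitedLineSketch` and its methods are hypothetical names, tuples are modelled as lists of strings, and `'^A'` is assumed to denote the Ctrl-A character (code 1).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Pig: mimics the delimited line format only.
final class DelimitedLineSketch {
    private DelimitedLineSketch() {}

    /** Joins the fields of one tuple into a single line using the given delimiter. */
    static String serialize(List<String> fields, char delimiter) {
        return String.join(String.valueOf(delimiter), fields);
    }

    /** Splits one line of streaming output back into fields, keeping empty fields. */
    static List<String> deserialize(String line, char delimiter) {
        List<String> fields = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < line.length(); i++) {
            if (line.charAt(i) == delimiter) {
                fields.add(line.substring(start, i));
                start = i + 1;
            }
        }
        fields.add(line.substring(start)); // last field (possibly empty)
        return fields;
    }

    public static void main(String[] args) {
        List<String> tuple = Arrays.asList("alice", "25", "3.5");

        // Default behaviour: tab-delimited line in, tab-split fields out.
        String tabLine = serialize(tuple, '\t');
        System.out.println(deserialize(tabLine, '\t'));

        // Alternative delimiter, assuming '^A' means Ctrl-A (code 1).
        char ctrlA = (char) 1;
        String ctrlALine = serialize(tuple, ctrlA);
        System.out.println(deserialize(ctrlALine, ctrlA));
    }
}
```

The manual scan in `deserialize` is a deliberate choice in this sketch: `String.split` drops trailing empty strings by default, which would silently lose empty fields at the end of a line.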
@@ -209, +209 @@
S = stream A through X;
}}}
- The following serializers/deserializers will be part of the Pig distribution:
+ In addition to !PigStorage, the following serializers/deserializers will be part of the Pig distribution:
- 1. !DefaultSerializer, !DefaultDeserializer as described above (This is going to be PigStorage)
+ 1. !BinarySerializer, !BinaryDeserializer - treat the entire file as a byte stream, with no formatting or interpretation.
2. !PythonSerializer, !PythonDeserializer
- 3. !BinarySerializer, !BinaryDeserializer - treat the entire file as a byte stream, with no formatting or interpretation.
Each deserializer will implement the `LoadFunc` interface. Each serializer will implement the `StoreFunc` interface. The `StoreFunc` interface will be extended with a `void flatten() throws OperationNotSupportedException;` method that indicates that the data needs to be flattened before it is serialized. A class can choose not to support this functionality and throw an exception instead.
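
The optional `flatten()` contract described above could look roughly like the following Java sketch. Everything except the `flatten()` signature is an assumption made for illustration: `SketchStoreFunc` is a reduced stand-in for `StoreFunc` (whose other methods are not shown here), the exception class is declared locally because the spec does not say which package it comes from, and nested tuples are modelled as nested lists that are expanded in place. Bags, which would flatten into multiple rows, are not modelled.

```java
import java.util.ArrayList;
import java.util.List;

// Local stand-in for the exception named in the spec (package unknown).
class OperationNotSupportedException extends Exception {
    OperationNotSupportedException(String message) {
        super(message);
    }
}

// Hypothetical, reduced stand-in for StoreFunc; only flatten() comes from the spec.
interface SketchStoreFunc {
    String serialize(List<Object> tuple);

    void flatten() throws OperationNotSupportedException;
}

// A serializer that supports flattening: nested lists are expanded in place.
class FlatteningTextSerializer implements SketchStoreFunc {
    private boolean flattenRequested = false;

    @Override
    public void flatten() {
        flattenRequested = true;
    }

    @Override
    public String serialize(List<Object> tuple) {
        List<String> out = new ArrayList<>();
        collect(tuple, out);
        return String.join("\t", out);
    }

    private void collect(List<?> fields, List<String> out) {
        for (Object field : fields) {
            if (flattenRequested && field instanceof List) {
                collect((List<?>) field, out); // expand the nested tuple
            } else {
                out.add(String.valueOf(field));
            }
        }
    }
}

// A serializer that opts out of flattening, as the spec allows.
class RawSerializer implements SketchStoreFunc {
    @Override
    public void flatten() throws OperationNotSupportedException {
        throw new OperationNotSupportedException("flatten is not supported by RawSerializer");
    }

    @Override
    public String serialize(List<Object> tuple) {
        return String.valueOf(tuple);
    }
}

class FlattenSketchDemo {
    public static void main(String[] args) {
        List<Object> tuple = new ArrayList<>();
        tuple.add("k");
        tuple.add(List.of("a", "b"));

        SketchStoreFunc[] serializers = { new FlatteningTextSerializer(), new RawSerializer() };
        for (SketchStoreFunc s : serializers) {
            try {
                s.flatten(); // ask for flattening before serializing
            } catch (OperationNotSupportedException e) {
                System.out.println("fallback: " + e.getMessage());
            }
            System.out.println(s.serialize(tuple));
        }
    }
}
```

In this sketch the caller decides what to do when a serializer refuses to flatten; the spec leaves that policy open.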