Posted to commits@pig.apache.org by Apache Wiki <wi...@apache.org> on 2008/03/28 18:03:01 UTC

[Pig Wiki] Update of "PigStreamingFunctionalSpec" by OlgaN

Dear Wiki user,

You have subscribed to a wiki page or wiki category on "Pig Wiki" for change notification.

The following page has been changed by OlgaN:
http://wiki.apache.org/pig/PigStreamingFunctionalSpec

------------------------------------------------------------------------------
  S = stream A through `stream.pl`;
  }}}
  
- In the example above, `DefaultSerializer` is used; it takes tuples out of A and converts them into tab-delimited lines that are passed to `stream.pl`. If A were the result of a grouping operation, the `DefaultSerializer` would also flatten the data. The output of streaming is processed by `DefaultDeserializer` one line at a time and split on tabs. 
+ In the example above, the default serializer (!PigStorage) is used; it takes tuples out of A and converts them into tab-delimited lines that are passed to `stream.pl`. The output of streaming is processed by the default deserializer (!PigStorage) one line at a time and split on tabs. 
  
  The user would be able to provide an alternative delimiter to the default (de)serializer via the `define` command:
  
  {{{
- define X `stream.pl` input(stdin using DefaultSerializer('^A')) output (stdout using DefaultDeserializer('^A'));
+ define X `stream.pl` input(stdin using PigStorage('^A')) output (stdout using PigStorage('^A'));
  S = stream A through X;
  }}}
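
As a rough illustration of what a delimiter-based (de)serializer does to a single tuple, here is a minimal Java sketch. `DelimitedLineCodec` and its methods are hypothetical names, not part of Pig, and the sketch assumes that `'^A'` in the example above stands for the Ctrl-A control character.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Pig: mimics what a delimiter-based
// (de)serializer does to one tuple and one line of streaming output.
final class DelimitedLineCodec {
    // Assumption: '^A' in the spec stands for the Ctrl-A control character.
    static final char CTRL_A = '\001';

    private DelimitedLineCodec() {}

    // Serialize: join the fields of one tuple into a single delimited line.
    static String serialize(List<String> fields, char delimiter) {
        return String.join(String.valueOf(delimiter), fields);
    }

    // Deserialize: split one line of streaming output back into fields.
    // The -1 limit keeps trailing empty fields instead of dropping them.
    static List<String> deserialize(String line, char delimiter) {
        String regex = Pattern.quote(String.valueOf(delimiter));
        return Arrays.asList(line.split(regex, -1));
    }
}
```

Calling `DelimitedLineCodec.serialize(fields, '\t')` corresponds to the default tab-delimited behavior, while passing `DelimitedLineCodec.CTRL_A` corresponds to the alternative delimiter in the `define` example. The sketch ignores escaping of delimiters that occur inside field values.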
  
@@ -209, +209 @@

  S = stream A through X;
  }}}
  
- The following serializers/deserializers will be part of the Pig distribution:
+ In addition to !PigStorage, the following serializers/deserializers will be part of the Pig distribution:
  
-  1. !DefaultSerializer, !DefaultDeserializer as described above (This is going to be PigStorage)
+  1. !BinarySerializer, !BinaryDeserializer - treats the entire file as a byte stream - no formatting or interpretation.
   2. !PythonSerializer, !PythonDeserializer 
-  3. !BinarySerializer, !BinaryDeserializer - treats the entire file as a byte stream - no formatting or interpretation.
  
  Each deserializer will implement the `LoadFunc` interface. Each serializer will implement the `StoreFunc` interface. The `StoreFunc` interface will be extended with a `void flatten() throws OperationNotSupportedException;` method that indicates that the data needs to be flattened before it is serialized. A class can choose not to support this functionality and throw an exception.
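
The opt-out behavior of `flatten()` could look roughly like the following Java sketch. `FlattenableStore`, `LineStore`, `RawByteStore` and `FlattenDemo` are hypothetical names introduced only for illustration; the real `StoreFunc` interface has other methods that are omitted here, and the sketch assumes the exception is `javax.naming.OperationNotSupportedException`, since the spec does not name a package.

```java
import javax.naming.OperationNotSupportedException;

// Hypothetical, simplified stand-in for the extended StoreFunc interface;
// only the proposed flatten() method is modeled.
interface FlattenableStore {
    // Signals that the data must be flattened before it is serialized.
    void flatten() throws OperationNotSupportedException;
}

// A serializer that supports flattening simply records the request.
class LineStore implements FlattenableStore {
    private boolean flattenRequested = false;

    @Override
    public void flatten() {
        flattenRequested = true;
    }

    boolean isFlattenRequested() {
        return flattenRequested;
    }
}

// A serializer that opts out, e.g. one that passes bytes through untouched.
class RawByteStore implements FlattenableStore {
    @Override
    public void flatten() throws OperationNotSupportedException {
        throw new OperationNotSupportedException("flatten is not supported");
    }
}

final class FlattenDemo {
    private FlattenDemo() {}

    // Caller-side check: returns true if the serializer accepted flatten().
    static boolean requestFlatten(FlattenableStore store) {
        try {
            store.flatten();
            return true;
        } catch (OperationNotSupportedException e) {
            return false;
        }
    }
}
```

Using a checked exception means every caller of `flatten()` has to decide what to do when a serializer declines, which is one plausible reason for putting it in the method signature.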