Posted to issues@flink.apache.org by "Chesnay Schepler (JIRA)" <ji...@apache.org> on 2014/07/16 10:18:04 UTC

[jira] [Comment Edited] (FLINK-671) Python interface for new API (Map/Reduce)

    [ https://issues.apache.org/jira/browse/FLINK-671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14063272#comment-14063272 ] 

Chesnay Schepler edited comment on FLINK-671 at 7/16/14 8:16 AM:
-----------------------------------------------------------------

binary data over streams.

writing a double from Java would look like this:
{code:java}
// ByteBuffer writes big-endian by default, which matches ">d" on the Python side
byte[] buffer = new byte[8];
ByteBuffer.wrap(buffer).putDouble((Double) value);
outStream.write(buffer);
{code}
and here's how Python reads it:
{code}
# receive exactly 8 bytes and decode them as a big-endian double
raw_double = self._connection.receive(8)
return struct.unpack(">d", raw_double)[0]
{code}
raw_double is just a string of 8 raw bytes, which struct.unpack then interprets as a big-endian double.
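
To make the byte layout concrete, here is a minimal, self-contained round trip in Python. It only illustrates the ">d" format used above; the helper names encode_double and decode_double are made up for this sketch and are not part of the actual code.

```python
import struct


def encode_double(value):
    """Pack a float into 8 big-endian bytes (same layout as ByteBuffer.putDouble by default)."""
    return struct.pack(">d", value)


def decode_double(raw):
    """Interpret 8 big-endian bytes as a Python float."""
    return struct.unpack(">d", raw)[0]


raw = encode_double(1.5)
print(len(raw))            # 8
print(decode_double(raw))  # 1.5
```

Since both sides agree on big-endian IEEE 754 doubles, no parsing step is needed beyond the unpack call.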

for a given tuple, the write process looks like this:
* write the meta byte, containing the size of the tuple (0 if not a tuple) and an isLast flag (useful for iterators)
* for each field:
** write the type byte (this step could be removed actually, once the code is really stable)
** write the binary data (created using ByteBuffers on the Java side and struct.pack on the Python side)
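
The steps above could be sketched in Python roughly as follows. This is only an illustration under assumptions: the comment does not say how the meta byte is laid out or which type codes are used, so the bit layout (high bit = isLast, low 7 bits = tuple size), the type codes 1 and 2, and the names write_record and read_record are all hypothetical. The sketch also assumes non-empty tuples with at most 127 int or float fields.

```python
import io
import struct

# Hypothetical type codes and wire formats; the real ones are not given in the comment.
FORMATS = {
    1: ">i",  # 4-byte signed int
    2: ">d",  # 8-byte double
}
TYPE_CODES = {int: 1, float: 2}
IS_LAST = 0x80  # assumed: high bit of the meta byte carries the isLast flag


def write_record(stream, value, is_last=False):
    """Write the meta byte, then a type byte and a binary payload for each field."""
    is_tuple = isinstance(value, tuple)
    fields = value if is_tuple else (value,)
    size = len(fields) if is_tuple else 0  # 0 means "not a tuple"
    stream.write(struct.pack(">B", size | (IS_LAST if is_last else 0)))
    for field in fields:
        code = TYPE_CODES[type(field)]
        stream.write(struct.pack(">B", code))
        stream.write(struct.pack(FORMATS[code], field))


def read_record(stream):
    """Read one record back; returns (value, is_last)."""
    meta = struct.unpack(">B", stream.read(1))[0]
    size, is_last = meta & 0x7F, bool(meta & IS_LAST)
    fields = []
    for _ in range(size or 1):  # a non-tuple value is a single field
        code = struct.unpack(">B", stream.read(1))[0]
        fmt = FORMATS[code]
        fields.append(struct.unpack(fmt, stream.read(struct.calcsize(fmt)))[0])
    return (tuple(fields) if size else fields[0]), is_last


buf = io.BytesIO()
write_record(buf, (7, 2.5))
write_record(buf, 42, is_last=True)
buf.seek(0)
print(read_record(buf))  # ((7, 2.5), False)
print(read_record(buf))  # (42, True)
```

If the type byte were dropped later, the reader would need to know the field types up front, which is presumably why it stays until the code is stable.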

just for completeness' sake, here's what the previous process looked like:
for a given tuple:
* convert the tuple into the user-defined (Java)ProtoTuple format (by adding every field manually)
* convert this ProtoTuple to a string (built-in method; this string is the language-agnostic part)
* write the size of the string (using yet another protocol format, this one with a fixed size)
* write the string to the stream

and on the receiving side:
* read the size (and parse and convert it)
* read the string
* parse the (Python)ProtoTuple from the string (this takes a long time!)
* convert the ProtoTuple to a normal tuple (manually)
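
For comparison, the shape of that older length-prefixed text exchange can be sketched like this. It is not the actual ProtoTuple code: json stands in for the ProtoTuple string format, a plain 4-byte integer stands in for the fixed-size size message, and write_message / read_message are hypothetical names.

```python
import io
import json
import struct


def write_message(stream, fields):
    """Serialize the tuple to text, then write a fixed-size length prefix and the text."""
    text = json.dumps(list(fields)).encode("utf-8")  # stand-in for the ProtoTuple string
    stream.write(struct.pack(">i", len(text)))       # assumed 4-byte size prefix
    stream.write(text)


def read_message(stream):
    """Read the size, read that many bytes, parse the text, rebuild a normal tuple."""
    size = struct.unpack(">i", stream.read(4))[0]
    text = stream.read(size).decode("utf-8")
    return tuple(json.loads(text))  # text parsing: the step the comment flags as slow


buf = io.BytesIO()
write_message(buf, (7, 2.5))
buf.seek(0)
print(read_message(buf))  # (7, 2.5)
```

Every value goes through a text representation and a parser here, whereas the binary approach writes and reads fixed-size fields directly.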

PS: wow, jira has no code formatter for python.



> Python interface for new API (Map/Reduce)
> -----------------------------------------
>
>                 Key: FLINK-671
>                 URL: https://issues.apache.org/jira/browse/FLINK-671
>             Project: Flink
>          Issue Type: Improvement
>          Components: Python API
>            Reporter: Chesnay Schepler
>            Assignee: Chesnay Schepler
>              Labels: github-import
>             Fix For: pre-apache
>
>         Attachments: pull-request-671-9139035883911146960.patch
>
>
> ([#615|https://github.com/stratosphere/stratosphere/issues/615] | [FLINK-615|https://issues.apache.org/jira/browse/FLINK-615])
> ---------------- Imported from GitHub ----------------
> Url: https://github.com/stratosphere/stratosphere/pull/671
> Created by: [zentol|https://github.com/zentol]
> Labels: enhancement, java api, 
> Milestone: Release 0.6 (unplanned)
> Created at: Wed Apr 09 20:52:06 CEST 2014
> State: open



--
This message was sent by Atlassian JIRA
(v6.2#6252)