You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@flink.apache.org by Shannon Quinn <sq...@gatech.edu> on 2016/03/11 18:15:43 UTC

zipWithIndex in Python API

Hi all,

I'm interested in getting involved the Python API development. The first 
use-case I've encountered in my work is that of zipWithIndex, so I 
started looking into how to go about implementing that. It looks like 
the core of it involves being able to uniquely identify what worker 
you're currently in between distributed calls; the Scala end has 
getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime 
context is more or less limited to the broadcast variables.

Happy to hear any hints as to how I should get started with this. Thanks.

Regards,
Shannon

Re: zipWithIndex in Python API

Posted by Shannon Quinn <sq...@gatech.edu>.
I'm a Python guru; if it doesn't have a Python API, I'll likely help 
make one :)

Work is bad this week but I'm planning to get started on this next week!

Shannon

On 3/14/16 5:37 AM, Robert Metzger wrote:
> Hi Shannon,
>
> I'm happy to see some community engagement on our Python APIs!
>
> On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <ch...@apache.org>
> wrote:
>
>> The subtaskIndex is not currently exposed to the python operator.
>>
>> Fortunately this can be changed very easily:
>> On the java side, within PythonStreamer.startPython() the python process
>> is started and several parameters are transferred (L.129++) using
>> stdin/-out.
>> These parameters are received on the python side in Environment.execute()
>> (L.168++).
>>
>> So the transfer is rather straight-forward, after that you only have to
>> modify the operator.configure() method to
>> also take a subtaskIndex argument, modify the RuntimeContext constructor,
>> add a getIndexOfThisSubtask() method and you're set.
>>
>> Feel free to open a JIRA for this.
>>
>>
>> On 11.03.2016 18:15, Shannon Quinn wrote:
>>
>>> Hi all,
>>>
>>> I'm interested in getting involved the Python API development. The first
>>> use-case I've encountered in my work is that of zipWithIndex, so I started
>>> looking into how to go about implementing that. It looks like the core of
>>> it involves being able to uniquely identify what worker you're currently in
>>> between distributed calls; the Scala end has
>>> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context
>>> is more or less limited to the broadcast variables.
>>>
>>> Happy to hear any hints as to how I should get started with this. Thanks.
>>>
>>> Regards,
>>> Shannon
>>>
>>>


Re: zipWithIndex in Python API

Posted by Robert Metzger <rm...@apache.org>.
Hi Shannon,

I'm happy to see some community engagement on our Python APIs!

On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <ch...@apache.org>
wrote:

> The subtaskIndex is not currently exposed to the python operator.
>
> Fortunately this can be changed very easily:
> On the java side, within PythonStreamer.startPython() the python process
> is started and several parameters are transferred (L.129++) using
> stdin/-out.
> These parameters are received on the python side in Environment.execute()
> (L.168++).
>
> So the transfer is rather straight-forward, after that you only have to
> modify the operator.configure() method to
> also take a subtaskIndex argument, modify the RuntimeContext constructor,
> add a getIndexOfThisSubtask() method and you're set.
>
> Feel free to open a JIRA for this.
>
>
> On 11.03.2016 18:15, Shannon Quinn wrote:
>
>> Hi all,
>>
>> I'm interested in getting involved the Python API development. The first
>> use-case I've encountered in my work is that of zipWithIndex, so I started
>> looking into how to go about implementing that. It looks like the core of
>> it involves being able to uniquely identify what worker you're currently in
>> between distributed calls; the Scala end has
>> getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime context
>> is more or less limited to the broadcast variables.
>>
>> Happy to hear any hints as to how I should get started with this. Thanks.
>>
>> Regards,
>> Shannon
>>
>>
>

Re: zipWithIndex in Python API

Posted by Chesnay Schepler <ch...@apache.org>.
The subtaskIndex is not currently exposed to the python operator.

Fortunately this can be changed very easily:
On the java side, within PythonStreamer.startPython() the python process 
is started and several parameters are transferred (L.129++) using 
stdin/-out.
These parameters are received on the python side in 
Environment.execute() (L.168++).

So the transfer is rather straight-forward, after that you only have to 
modify the operator.configure() method to
also take a subtaskIndex argument, modify the RuntimeContext 
constructor, add a getIndexOfThisSubtask() method and you're set.

Feel free to open a JIRA for this.

On 11.03.2016 18:15, Shannon Quinn wrote:
> Hi all,
>
> I'm interested in getting involved the Python API development. The 
> first use-case I've encountered in my work is that of zipWithIndex, so 
> I started looking into how to go about implementing that. It looks 
> like the core of it involves being able to uniquely identify what 
> worker you're currently in between distributed calls; the Scala end 
> has getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime 
> context is more or less limited to the broadcast variables.
>
> Happy to hear any hints as to how I should get started with this. Thanks.
>
> Regards,
> Shannon
>