Posted to dev@flink.apache.org by Shannon Quinn <sq...@gatech.edu> on 2016/03/11 18:15:43 UTC
zipWithIndex in Python API
Hi all,
I'm interested in getting involved in Python API development. The first
use case I've encountered in my work is zipWithIndex, so I started
looking into how to implement it. It looks like the core of it involves
being able to uniquely identify which worker you're currently on between
distributed calls; the Scala side has
getRuntimeContext().getIndexOfThisSubtask(), but Python's runtime
context is more or less limited to the broadcast variables.
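To make concrete why the subtask index matters here, the following is a minimal sketch of one common way to build zipWithIndex: count the elements in each partition, turn the counts into per-partition starting offsets, then number elements locally. It is plain Python that simulates partitions as lists; the function names are hypothetical, nothing in it calls Flink, and it is not meant to describe how Flink itself implements the operation.

```python
def partition_offsets(counts):
    """Exclusive prefix sum: offsets[i] is the global index of the
    first element held by partition (subtask) i."""
    offsets, running = [], 0
    for count in counts:
        offsets.append(running)
        running += count
    return offsets


def zip_with_index(partitions):
    """Simulated zipWithIndex over a list of partitions.

    Each inner list stands in for the elements seen by one subtask;
    the position in the outer list plays the role of the subtask index.
    """
    counts = [len(part) for part in partitions]   # pass 1: count per subtask
    offsets = partition_offsets(counts)           # shared with every subtask
    return [
        [(offsets[subtask] + i, value) for i, value in enumerate(part)]
        for subtask, part in enumerate(partitions)  # pass 2: local numbering
    ]
```

In a distributed setting, each worker would need its own subtask index to pick the right entry out of the offsets, which is the piece missing from the Python runtime context.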
Happy to hear any hints as to how I should get started with this. Thanks.
Regards,
Shannon
Re: zipWithIndex in Python API
Posted by Shannon Quinn <sq...@gatech.edu>.
I'm a Python guru; if it doesn't have a Python API, I'll likely help
make one :)
Work is busy this week, but I'm planning to get started on this next week!
Shannon
On 3/14/16 5:37 AM, Robert Metzger wrote:
> Hi Shannon,
>
> I'm happy to see some community engagement on our Python APIs!
Re: zipWithIndex in Python API
Posted by Robert Metzger <rm...@apache.org>.
Hi Shannon,
I'm happy to see some community engagement on our Python APIs!
On Fri, Mar 11, 2016 at 6:32 PM, Chesnay Schepler <ch...@apache.org>
wrote:
> The subtaskIndex is not currently exposed to the Python operator.
>
> Fortunately, this can be changed very easily:
> On the Java side, PythonStreamer.startPython() starts the Python process
> and transfers several parameters (L.129++) over stdin/stdout.
> These parameters are received on the Python side in Environment.execute()
> (L.168++).
>
> So the transfer is rather straightforward. After that, you only have to
> modify the operator.configure() method to also take a subtaskIndex
> argument, modify the RuntimeContext constructor, and add a
> getIndexOfThisSubtask() method, and you're set.
>
> Feel free to open a JIRA for this.
Re: zipWithIndex in Python API
Posted by Chesnay Schepler <ch...@apache.org>.
The subtaskIndex is not currently exposed to the Python operator.
Fortunately, this can be changed very easily:
On the Java side, PythonStreamer.startPython() starts the Python process
and transfers several parameters (L.129++) over stdin/stdout.
These parameters are received on the Python side in
Environment.execute() (L.168++).
So the transfer is rather straightforward. After that, you only have to
modify the operator.configure() method to also take a subtaskIndex
argument, modify the RuntimeContext constructor, and add a
getIndexOfThisSubtask() method, and you're set.
Feel free to open a JIRA for this.
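The Python-side changes described above might look roughly like the sketch below. This is an illustration only, not the actual Flink source: the class bodies, the snake_case method names, and the read_subtask_index helper are assumptions, and reading a single text line from a stream is just a stand-in for whatever format the real parameter transfer uses.

```python
import io


class RuntimeContext(object):
    """Hypothetical runtime context that also carries the subtask index."""

    def __init__(self, broadcast_variables, subtask_index):
        self._broadcast_variables = broadcast_variables
        self._subtask_index = subtask_index

    def get_broadcast_variable(self, name):
        return self._broadcast_variables[name]

    def get_index_of_this_subtask(self):
        # Counterpart of getIndexOfThisSubtask() on the Java/Scala side.
        return self._subtask_index


class Operator(object):
    """Hypothetical operator whose configure() takes the extra argument."""

    def configure(self, broadcast_variables, subtask_index):
        self.context = RuntimeContext(broadcast_variables, subtask_index)


def read_subtask_index(stream):
    """Assumed wire format: the index arrives as one line of text."""
    return int(stream.readline().strip())


if __name__ == "__main__":
    # Simulate the parent process having written "3" to the child's stdin.
    fake_stdin = io.StringIO("3\n")
    op = Operator()
    op.configure({}, read_subtask_index(fake_stdin))
    print(op.context.get_index_of_this_subtask())
```

Passing the index through configure() keeps the operator unaware of how the value was transferred, so only the receiving code would need to change if the transfer format does.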