Posted to user@flink.apache.org by Attila Bernáth <be...@gmail.com> on 2014/12/03 15:49:25 UTC

partition identifier

Dear Developers,

Datasets are partitioned between machines. I wonder if there is a way
to get some identifier of a partition. I see that the class
HashPartition has a getPartitionNumber method, but I don't see how I
could use this.
(For example, I would like to see the partition identifier in a
MapFunction, or in a MapPartitionFunction).

Attila

Re: partition identifier

Posted by Attila Bernáth <be...@gmail.com>.
I think I have found it: it must be
getRuntimeContext().getIndexOfThisSubtask();
Attila
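
For reference, a minimal sketch of how that call might be used inside a rich function. It assumes the Flink Java API is on the classpath and that RichMapFunction lives in org.apache.flink.api.common.functions (the package used by the 1.x releases; the location in other versions may differ). The class name TagWithSubtaskIndex is made up for illustration. The returned value is the zero-based index of the parallel subtask running this function instance, which is what this thread calls the partition number.

```java
import org.apache.flink.api.common.functions.RichMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;

// Hypothetical example class: pairs every record with the index of the
// parallel subtask (the "partition number" of this thread) that processed it.
public class TagWithSubtaskIndex
        extends RichMapFunction<String, Tuple2<Integer, String>> {

    @Override
    public Tuple2<Integer, String> map(String value) throws Exception {
        // getRuntimeContext() is only available on the "rich" function variants.
        int subtaskIndex = getRuntimeContext().getIndexOfThisSubtask();
        return new Tuple2<>(subtaskIndex, value);
    }
}
```

Given a DataSet<String> named input (an assumption), this would be applied as input.map(new TagWithSubtaskIndex()). The same getRuntimeContext() call should also be available in the rich variant of MapPartitionFunction.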

Re: partition identifier

Posted by Attila Bernáth <be...@gmail.com>.
I am trying to write some code that is cleverer than the optimizer.
The idea is that in Spargel you often want to send the same message to
many other graph nodes. These target nodes are partitioned across the
machines of your cluster, so it would make sense to send the message
to each target machine only once and let that machine distribute it to
the nodes it holds.

Attila
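
The grouping step of this idea can be sketched in plain Java, without any Flink or Spargel API. This is only an illustration under stated assumptions: partitionOf is a hypothetical stand-in that assigns a vertex to a partition by taking its ID modulo the parallelism, which is not necessarily how Flink assigns keys to subtasks, and all class and method names are invented here. The sketch groups the target vertex IDs by partition, so that one message per partition could be sent instead of one per target.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TargetGrouping {

    // Hypothetical stand-in for the real partitioner (an assumption,
    // not Flink's actual assignment): vertex ID modulo the parallelism.
    public static int partitionOf(long vertexId, int parallelism) {
        return (int) Math.floorMod(vertexId, (long) parallelism);
    }

    // Groups target vertex IDs by the partition that is assumed to hold them.
    public static Map<Integer, List<Long>> groupByPartition(List<Long> targets, int parallelism) {
        Map<Integer, List<Long>> groups = new TreeMap<>();
        for (long target : targets) {
            int partition = partitionOf(target, parallelism);
            groups.computeIfAbsent(partition, k -> new ArrayList<>()).add(target);
        }
        return groups;
    }

    public static void main(String[] args) {
        List<Long> targets = Arrays.asList(1L, 2L, 5L, 6L, 9L);
        // Five targets fall into two partitions, so two messages would suffice.
        System.out.println(groupByPartition(targets, 4)); // prints {1=[1, 5, 9], 2=[2, 6]}
    }
}
```

On the receiving side, each partition would then fan the single message out to the local vertices in its list; that part depends on the messaging API and is not shown.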

Re: partition identifier

Posted by Aljoscha Krettek <al...@apache.org>.
RuntimeContext.getIndexOfThisSubtask()

What do you want to use this partition number for, if I may ask?

Cheers,
Aljoscha

Re: partition identifier

Posted by Attila Bernáth <be...@gmail.com>.
Thank you, Stephan.
How do I access the partition number from the RuntimeContext?

Attila

Re: partition identifier

Posted by Stephan Ewen <se...@apache.org>.
Hey!

Here is a brief description of how to use rich functions:
http://flink.incubator.apache.org/docs/0.7-incubating/programming_guide.html#passing-functions-to-flink

Greetings,
Stephan


Re: partition identifier

Posted by Stephan Ewen <se...@apache.org>.
Hi!

You can always use the "rich" version of the function, for example the
"RichMapFunction". Inside that function, you can call
"getRuntimeContext()", which gives you access to many things, among them
the partition number.

Stephan

