Posted to dev@beam.apache.org by "devinduan (段丁瑞)" <de...@tencent.com> on 2018/09/11 13:46:17 UTC

How to implement repartition.

Hi all:
    I recently started studying Beam on the Spark runner.
    I want to implement a repartition method similar to Spark's rdd.repartition(), but I can't find a way to do it.
    Could anyone help me?
    Thanks in advance.
devin.

Re: How to implement repartition.(Internet mail)

Posted by Robert Bradshaw <ro...@google.com>.
Reshuffle is deprecated as a way of requiring stable input (that is
currently being added as a separate transform), but it is perfectly fine for
just "reshuffling." There is currently no way to set the number of
partitions, though. How important is that?
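When a fixed number of partitions matters, a common user-level workaround is to give each element a random key in the range [0, N), group by that key, and then drop the keys again. The sketch below models that pipeline shape in plain Python without any Beam dependency, assuming N buckets are an acceptable stand-in for N partitions. `repartition_into` and `flatten` are hypothetical helpers, and the dictionary stands in for the shuffle a GroupByKey would perform. In a real pipeline this only bounds the number of distinct keys; how many workers actually process them is still up to the runner.

```python
import random
from collections import defaultdict


def repartition_into(elements, num_partitions, seed=None):
    """Hypothetical helper: model of 'random key in [0, N) -> GroupByKey -> drop keys'.

    Returns a dict mapping each bucket key to the list of elements assigned to it.
    At most num_partitions distinct keys are ever produced.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    rng = random.Random(seed)
    buckets = defaultdict(list)  # stands in for the shuffle performed by GroupByKey
    for element in elements:
        key = rng.randrange(num_partitions)  # like a DoFn emitting KV(randomKey, element)
        buckets[key].append(element)
    return dict(buckets)


def flatten(buckets):
    """Hypothetical helper: drop the keys again, like Values + Flatten after the GroupByKey."""
    return [element for group in buckets.values() for element in group]


if __name__ == "__main__":
    groups = repartition_into(range(10), num_partitions=3, seed=42)
    print(sorted(groups))           # at most three bucket keys, drawn from 0, 1, 2
    print(sorted(flatten(groups)))  # the same ten elements, redistributed
```

Random keys give bucket sizes that are only roughly equal, and a small input may leave some buckets empty.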

On Wed, Sep 12, 2018 at 6:46 AM devinduan(段丁瑞) <de...@tencent.com>
wrote:

>     Thanks for your reply.
>     But the *Reshuffle* class has no parameters to set.
>
>     Looking at the Reshuffle code, the constructor is private, and the
> code comment says "For internal use only; no backwards compatibility
> guarantees".
>     What I mean is: I want to set the RDD partition number, like
> *rdd.repartition(3)*, or something similar to Flink's
> *DataStream.setParallelism(3)*.
>     Could you help me?
> devin.

Re: How to implement repartition.(Internet mail)

Posted by "devinduan (段丁瑞)" <de...@tencent.com>.
    Thanks for your reply.
    But the Reshuffle class has no parameters to set.
    Looking at the Reshuffle code, the constructor is private, and the code comment says "For internal use only; no backwards compatibility guarantees".
    What I mean is: I want to set the RDD partition number, like rdd.repartition(3), or something similar to Flink's DataStream.setParallelism(3).
    Could you help me?
devin.



From: Robert Bradshaw <ro...@google.com>
Date: 2018-09-11 21:50
To: dev@beam.apache.org
Subject: Re: How to implement repartition.(Internet mail)
Does Reshuffle do what you want?


Re: How to implement repartition.

Posted by Robert Bradshaw <ro...@google.com>.
Does Reshuffle do what you want?
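Conceptually, a reshuffle assigns each element an arbitrary key, passes the keyed elements through a group-by-key, and emits the values again: the collection's contents are unchanged, but the shuffle breaks fusion with the surrounding steps so the runner can redistribute the work. The plain-Python sketch below models only those semantics; `reshuffle` is a hypothetical name and not Beam's `Reshuffle` implementation. The caller never picks a bucket count, which matches the limitation raised in the thread.

```python
import random
from collections import Counter, defaultdict


def reshuffle(elements, seed=None):
    """Hypothetical model of a reshuffle: random key, group by key, emit values.

    The key space (32-bit integers here) is an internal detail, so the caller
    cannot choose how many partitions the data ends up in.
    """
    rng = random.Random(seed)
    grouped = defaultdict(list)  # stands in for the shuffle boundary
    for element in elements:
        grouped[rng.getrandbits(32)].append(element)
    # Emit values in key order; downstream code must not rely on any ordering.
    return [element for key in sorted(grouped) for element in grouped[key]]


if __name__ == "__main__":
    data = ["a", "b", "c", "a"]
    out = reshuffle(data, seed=7)
    print(Counter(out) == Counter(data))  # True: same elements, duplicates kept
```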
