You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@beam.apache.org by "Truebody, Kyle" <Tr...@DNB.com> on 2019/11/07 13:51:47 UTC

Sharding in apache beam

Hi all,

Is it possible to update a Write IO's shard count once the job has already started.  I am asking because I would like to be able to increase or decrease the shard count base on some Stream producers put or get metrics.
I read in the java docs that if method  withNumShards() has argument of zero then beam will decide the value, does anyone know if that value decided by beam will increase or decrease base on demand.

Thanks in advance

Kyle

Re: Sharding in apache beam

Posted by Eugene Kirpichov <jk...@google.com>.
I think this might help you:

(in FileIO.write())
    public Write<DestinationT, UserT> withSharding(
        PTransform<PCollection<UserT>, PCollectionView<Integer>> sharding) {

So as long as you can write such a transform, you can control the sharding
"dynamically".

On Thu, Nov 7, 2019 at 11:19 AM Luke Cwik <lc...@google.com> wrote:

> By not specifying an explicit number of shards the runner has the
> capability to choose the number of shards dynamically but I'm not aware of
> being able to tell the runner dynamically how many shards it should use.
>
> I know that Dataflow will scale up/down the number of shards based upon
> the amount of data in the pipeline and the current throughput.
>
> On Thu, Nov 7, 2019 at 5:52 AM Truebody, Kyle <Tr...@dnb.com> wrote:
>
>> Hi all,
>>
>>
>>
>> Is it possible to update a Write IO’s shard count once the job has
>> already started.  I am asking because I would like to be able to increase
>> or decrease the shard count base on some Stream producers put or get
>> metrics.
>>
>> I read in the java docs that if method  *withNumShards() * has argument
>> of zero then beam will decide the value*, *does anyone know if that
>> value decided by beam will increase or decrease base on demand.
>>
>>
>>
>> Thanks in advance
>>
>>
>>
>> Kyle
>>
>

Re: Sharding in apache beam

Posted by Luke Cwik <lc...@google.com>.
By not specifying an explicit number of shards the runner has the
capability to choose the number of shards dynamically but I'm not aware of
being able to tell the runner dynamically how many shards it should use.

I know that Dataflow will scale up/down the number of shards based upon the
amount of data in the pipeline and the current throughput.

On Thu, Nov 7, 2019 at 5:52 AM Truebody, Kyle <Tr...@dnb.com> wrote:

> Hi all,
>
>
>
> Is it possible to update a Write IO’s shard count once the job has
> already started.  I am asking because I would like to be able to increase
> or decrease the shard count base on some Stream producers put or get
> metrics.
>
> I read in the java docs that if method  *withNumShards() * has argument
> of zero then beam will decide the value*, *does anyone know if that value
> decided by beam will increase or decrease base on demand.
>
>
>
> Thanks in advance
>
>
>
> Kyle
>