Posted to user@beam.apache.org by Sachin Mittal <sj...@gmail.com> on 2022/08/31 08:04:08 UTC

Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

Hello folks,
We are consuming from a kinesis stream in our beam application.
So far we were using sdk.io.kinesis.KinesisIO to read from Kinesis.
What I understand is that this library does not use Enhanced Fan-Out
Consumer provided by AWS.

Recently I saw the library sdk.io.aws2.kinesis.KinesisIO and I wanted to
understand the purpose of this new KinesisIO for reading from and writing
to Kinesis.

Does this new library use Enhanced Fan-Out Consumer ?

When and how should we decide if we want to migrate to new library or
continue to use the current one ?

Thanks
Sachin

Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

Posted by Pavel Solomin <p....@gmail.com>.
Just to add: If you want to try out a pre-release version, you can use
SNAPSHOT artifacts. In Maven pom.xml it would look like this:

    <repositories>
        <repository>
            <id>apache.snapshots</id>
            <name>Apache Development Snapshot Repository</name>
            <url>https://repository.apache.org/content/repositories/snapshots/</url>
            <releases>
                <enabled>false</enabled>
            </releases>
            <snapshots>
                <enabled>true</enabled>
            </snapshots>
        </repository>
    </repositories>

...

        <dependency>
            <groupId>org.apache.beam</groupId>
            <artifactId>beam-sdks-java-io-amazon-web-services2</artifactId>
            <version>2.48.0-SNAPSHOT</version>
        </dependency>

        < other beam sdk dependencies >

Best Regards,
Pavel Solomin

Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
<https://www.linkedin.com/in/pavelsolomin>





On Mon, 22 May 2023 at 13:07, Moritz Mack <mm...@talend.com> wrote:

> Hi Sachin,
>
> Enhanced Fanout for Kinesis will be released with Beam 2.48.0. See the I/O
> section:
> https://github.com/apache/beam/blob/master/CHANGES.md#2480---unreleased
>
> Best,
> Moritz

Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

Posted by Moritz Mack <mm...@talend.com>.
Hi Sachin,

Enhanced Fanout for Kinesis will be released with Beam 2.48.0. See the I/O section:
https://github.com/apache/beam/blob/master/CHANGES.md#2480---unreleased

Best,
Moritz

On 22.05.23, 14:00, "Sachin Mittal" <sj...@gmail.com> wrote:

Hi,
Can anyone tell me if enhanced fanout consumer for kinesis is released or not.
If I see this PR
https://github.com/apache/beam/pull/23540 looks like this feature is merged.
I also see this issue now closed
https://github.com/apache/beam/issues/19967

However I am not able to find the mention of the same in release notes:
https://github.com/apache/beam/releases/tag/v2.47.0

Any knowledge in this would be appreciated.

Thanks
Sachin



Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

Posted by Sachin Mittal <sj...@gmail.com>.
Hi,
Can anyone tell me whether the enhanced fan-out consumer for Kinesis has
been released?
Judging by this PR, the feature looks merged:
https://github.com/apache/beam/pull/23540
I also see that this issue is now closed:
https://github.com/apache/beam/issues/19967

However, I cannot find any mention of it in the release notes:
https://github.com/apache/beam/releases/tag/v2.47.0

Any information on this would be appreciated.

Thanks
Sachin


On Mon, Sep 5, 2022 at 12:35 PM Moritz Mack <mm...@talend.com> wrote:

> Hi Sachin,
>
> I’d recommend migrating to the new AWS 2 IOs in
> beam-sdks-java-io-amazon-web-services2 (using Amazon’s Java SDK v2) rather
> soon.
> The previous ones (beam-sdks-java-io-amazon-web-services and
> beam-sdks-java-io-kinesis) are both deprecated and not actively maintained
> anymore.
>
> Please have a look at these notes in the changelog:
> https://github.com/apache/beam/blob/master/CHANGES.md#2410---2022-08-23
> https://github.com/apache/beam/blob/master/CHANGES.md#2380---2022-04-20
>
> Currently, neither one supports enhanced fan-out consumers yet.  It’s
> certainly something we’d like to support for the new v2 modules, but I
> personally didn’t have time to start looking into it so far.
> https://github.com/apache/beam/issues/19967
>
> Don’t hesitate to reach out in case you are facing any issues when
> migrating!
>
> Kind regards,
> Moritz

Re: Difference between sdk.io.aws2.kinesis.KinesisIO vs sdk.io.kinesis.KinesisIO

Posted by Moritz Mack <mm...@talend.com>.
Hi Sachin,

I’d recommend migrating to the new AWS 2 IOs in beam-sdks-java-io-amazon-web-services2 (using Amazon’s Java SDK v2) rather soon.
The previous ones (beam-sdks-java-io-amazon-web-services and beam-sdks-java-io-kinesis) are both deprecated and not actively maintained anymore.

Please have a look at these notes in the changelog:
https://github.com/apache/beam/blob/master/CHANGES.md#2410---2022-08-23
https://github.com/apache/beam/blob/master/CHANGES.md#2380---2022-04-20

Currently, neither one supports enhanced fan-out consumers yet.  It’s certainly something we’d like to support for the new v2 modules, but I personally didn’t have time to start looking into it so far.
https://github.com/apache/beam/issues/19967

Don’t hesitate to reach out in case you are facing any issues when migrating!

Kind regards,
Moritz



On 31.08.22, 10:05, "Sachin Mittal" <sj...@gmail.com> wrote:

Hello folks,
We are consuming from a kinesis stream in our beam application.
So far we were using sdk.io.kinesis.KinesisIO to read from kinesis.
What I understand is that this library does not use Enhanced Fan-Out Consumer provided by AWS.

Recently I saw this library sdk.io.aws2.kinesis.KinesisIO and I wanted to understand what is the purpose of this new KinesisIO to read from and write to kinesis ?

Does this new library use Enhanced Fan-Out Consumer ?

When and how should we decide if we want to migrate to new library or continue to use the current one ?

Thanks
Sachin



Re: Can we shutdown a pipeline based on some condition

Posted by Sachin Mittal <sj...@gmail.com>.
Hi Pavel,
I have tried the setting you suggested, but it does not seem to work.
It looks like this setting bounds the unbounded stream until the specified
time, then closes the stream, and only after that are the records passed
downstream.

In my case withMaxReadTime = 3 minutes

Logs are like this:
------------------------------------------------------------------------
13:06:38.267 [direct-runner-worker] INFO a.p.f.b.c.k.ShardReadersPool - Starting to read test-batch stream from [shardId-000000000000] shards
13:09:39.002 [direct-runner-worker] INFO a.p.f.b.c.k.ShardReadersPool - Closing shard iterators pool
13:09:42.632 [pool-16-thread-1] INFO a.p.f.b.c.k.ShardReadersPool - Kinesis Shard read loop has finished


As you can see, it reads all the records and then closes the reader and
only when the read loop is finished, the downstream operators process these
records further.

So if downstream I again write to the same Kinesis stream, those records
are never read, as the read loop is already closed.

I guess I may still need the unbounded stream so it continues to listen to
any fresh records added to the stream, but I would like the beam
application to shutdown after a specified time.

Would something like this work:

pipeline.run().waitUntilFinish(Duration.standardMinutes(3));

This would run the pipeline for 3 minutes and then shut it down, by which
time it has computed all the derived data.
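
One caveat with the timed wait: as far as I understand Beam's
PipelineResult, waitUntilFinish(Duration) only bounds how long the caller
blocks, so a streaming job that is still running when the timeout expires
may keep running until cancel() is called. A minimal sketch of that
wait-then-cancel pattern follows. RunningJob is a hypothetical stand-in
interface that mirrors the two PipelineResult methods so that the snippet
compiles without Beam on the classpath, and FakeJob exists only for the
demonstration. The real Beam method takes a Joda-Time Duration and returns
a State (null on timeout) rather than a boolean.

```java
import java.time.Duration;

public class TimedShutdown {

    /** Hypothetical stand-in for two methods of Beam's PipelineResult. */
    public interface RunningJob {
        /** Blocks for at most timeout; returns true if the job reached a terminal state. */
        boolean waitUntilFinish(Duration timeout);

        /** Asks the runner to stop the job. */
        void cancel();
    }

    /**
     * Waits for at most maxRunTime and cancels the job if it is still running.
     * Returns true if a cancel was requested.
     */
    public static boolean runFor(RunningJob job, Duration maxRunTime) {
        boolean finished = job.waitUntilFinish(maxRunTime);
        if (!finished) {
            // The timeout only ended the wait; the job itself must be stopped explicitly.
            job.cancel();
            return true;
        }
        return false;
    }

    /** Demo-only job that never finishes on its own, like an unbounded read. */
    public static final class FakeJob implements RunningJob {
        public Duration requestedTimeout;
        public boolean cancelled = false;

        @Override
        public boolean waitUntilFinish(Duration timeout) {
            requestedTimeout = timeout; // a real job would block here
            return false;               // still running when the wait ends
        }

        @Override
        public void cancel() {
            cancelled = true;
        }
    }

    public static void main(String[] args) {
        FakeJob job = new FakeJob();
        boolean cancelRequested = runFor(job, Duration.ofMinutes(3));
        System.out.println("cancel requested: " + cancelRequested);
    }
}
```

With the real API the same shape would be a call to waitUntilFinish with a
3 minute duration, followed by cancel() when the returned state is not
terminal.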


Thanks

Sachin



On Thu, May 4, 2023 at 11:26 PM Pavel Solomin <p....@gmail.com> wrote:

> Hello!
>
> In case of KinesisIO there is a param which you can set - withMaxReadTime
> I think many IOs implement it. It does not check if there is still some
> outstanding data, simply finishes the pipeline when this time is over.
>
> Best Regards,
> Pavel Solomin
>
> Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
> <https://www.linkedin.com/in/pavelsolomin>

Re: Can we shutdown a pipeline based on some condition

Posted by Pavel Solomin <p....@gmail.com>.
Hello!

In the case of KinesisIO there is a parameter you can set: withMaxReadTime.
I think many IOs implement it. It does not check whether there is still some
outstanding data; it simply finishes the pipeline when this time is over.
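
Because withMaxReadTime is a fixed cutoff, the idle-based stop asked about
in this thread needs its own bookkeeping. Below is a minimal sketch of the
timing logic only, in plain Java. IdleWatchdog and its methods are
hypothetical names, not part of Beam or KinesisIO, and how the driver
program learns that a record arrived (for example by polling pipeline
metrics) is an assumption that is left out here.

```java
import java.time.Duration;
import java.time.Instant;

/** Tracks the arrival time of the most recent record and reports idleness. */
public class IdleWatchdog {
    private final Duration maxIdle;
    private Instant lastSeen;

    public IdleWatchdog(Duration maxIdle, Instant start) {
        this.maxIdle = maxIdle;
        this.lastSeen = start;
    }

    /** Call whenever a record is observed; older timestamps are ignored. */
    public synchronized void recordSeen(Instant when) {
        if (when.isAfter(lastSeen)) {
            lastSeen = when;
        }
    }

    /** True once no record has been seen for longer than maxIdle. */
    public synchronized boolean isIdle(Instant now) {
        return Duration.between(lastSeen, now).compareTo(maxIdle) > 0;
    }

    public static void main(String[] args) {
        Instant t0 = Instant.parse("2023-05-04T18:20:00Z");
        IdleWatchdog watchdog = new IdleWatchdog(Duration.ofMinutes(5), t0);

        // Last record arrives two minutes after start.
        watchdog.recordSeen(t0.plus(Duration.ofMinutes(2)));

        // Four minutes of silence: below the five minute limit, prints false.
        System.out.println(watchdog.isIdle(t0.plus(Duration.ofMinutes(6))));
        // Six minutes of silence: above the limit, prints true.
        System.out.println(watchdog.isIdle(t0.plus(Duration.ofMinutes(8))));
    }
}
```

The driver would poll isIdle on a schedule and cancel the running pipeline
once it returns true.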

Best Regards,
Pavel Solomin

Tel: +351 962 950 692 | Skype: pavel_solomin | Linkedin
<https://www.linkedin.com/in/pavelsolomin>





On Thu, 4 May 2023 at 18:20, Sachin Mittal <sj...@gmail.com> wrote:

> Hi,
> I am kind of building a batch/streaming hybrid beam application.
> Data is fed into a kinesis stream and the beam pipeline is run.
>
> I want to stop the pipeline if no new data is fed into the stream for a
> certain period of time, say 5 minutes.
>
> Is there a way of achieving this ?
>
> Right now I only see something like this:
>
> pipeline.run().waitUntilFinish();
>
> Instead of waiting until finish can we have some conditional finish in any
> way ?
>
> Thanks
> Sachin
>

Can we shutdown a pipeline based on some condition

Posted by Sachin Mittal <sj...@gmail.com>.
Hi,
I am kind of building a batch/streaming hybrid beam application.
Data is fed into a kinesis stream and the beam pipeline is run.

I want to stop the pipeline if no new data is fed into the stream for a
certain period of time, say 5 minutes.

Is there a way of achieving this ?

Right now I only see something like this:

pipeline.run().waitUntilFinish();

Instead of waiting until the pipeline finishes, can we have some kind of
conditional finish?

Thanks
Sachin