Posted to user@beam.apache.org by Ramya Prasad via user <us...@beam.apache.org> on 2023/12/22 16:28:00 UTC

[Question] WaitOn for Reading Step

Hello,

I am a developer trying to use Apache Beam, and I am running into an issue
where my Wait.on step is not working as expected. I want my pipeline to read
all the data from an S3 bucket using ParquetIO before moving on to the rest
of the steps in my pipeline. However, my DAG shows that even though there is
a collect step after all the data has been read in, my pipeline still reads
from S3 in the subsequent steps. It appears that the Wait.on is not actually
taking effect. Is it even possible to wait on a read step? This is what my
code looks like:

PCollection<GenericRecord> records = pipeline.apply(
        "Read parquet file in as Generic Records",
        ParquetIO.read(finalSchema).from(beamReadPath).withConfiguration(configuration));
PCollection<GenericRecord> recordsWaited = records
        .apply("Waiting on Read Parquet File", Wait.on(records))
        .setCoder(AvroCoder.of(GenericRecord.class, finalSchema));
// {Processing of rest of data subsequently}



Any help would be greatly appreciated, thanks!

Sincerely,
Ramya

______________________________________________________________________



The information contained in this e-mail may be confidential and/or proprietary to Capital One and/or its affiliates and may only be used solely in performance of work or services for Capital One. The information transmitted herewith is intended only for use by the individual or entity to which it is addressed. If the reader of this message is not the intended recipient, you are hereby notified that any review, retransmission, dissemination, distribution, copying or other use of, or taking of any action in reliance upon this information is strictly prohibited. If you have received this communication in error, please contact the sender and delete the material from your computer.




Re: [Question] WaitOn for Reading Step

Posted by XQ Hu via user <us...@beam.apache.org>.
Searching the Beam code base turns up plenty of places that use Wait.on;
you could check that code for some insights.
If that doesn't work, it would be best to create a small test case that
reproduces the problem and open a GitHub issue.
Sorry I can't help more with this.
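For reference, the pattern shown in the Wait javadoc applies Wait.on(signal) to a main input, where signal is usually a different PCollection whose completion should gate the main input. Below is a minimal sketch of that shape, written as the kind of small, self-contained reproduction suggested above. It assumes beam-sdks-java-core, beam-runners-direct-java, and the hamcrest/JUnit test dependencies that PAssert uses are on the classpath; the class name, step names, and in-memory Create inputs are hypothetical stand-ins for the real ParquetIO read and downstream steps.

```java
import org.apache.beam.sdk.Pipeline;
import org.apache.beam.sdk.PipelineResult;
import org.apache.beam.sdk.options.PipelineOptionsFactory;
import org.apache.beam.sdk.testing.PAssert;
import org.apache.beam.sdk.transforms.Create;
import org.apache.beam.sdk.transforms.MapElements;
import org.apache.beam.sdk.transforms.Wait;
import org.apache.beam.sdk.values.PCollection;
import org.apache.beam.sdk.values.TypeDescriptors;

public class WaitOnSketch {

  /** Builds a small bounded pipeline, runs it, and returns its terminal state. */
  public static PipelineResult.State run() {
    // No runner is set, so Beam uses its default (the Direct runner, if on the classpath).
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Hypothetical first stage standing in for the step that must finish first
    // (for example a read, or a write whose results gate later work).
    PCollection<String> signal =
        p.apply("FirstStage", Create.of("x", "y"))
            .apply("FirstStageWork",
                MapElements.into(TypeDescriptors.strings()).via((String s) -> s + "-done"));

    // Hypothetical main input that should only be processed after the signal is done.
    PCollection<String> mainInput = p.apply("MainInput", Create.of("a", "b", "c"));

    // Wait.on(signal) passes mainInput's elements through unchanged, but holds each
    // window until the corresponding window of the signal is complete. Everything here
    // is bounded and in the global window, so the Wait step emits nothing until
    // signal has been fully computed.
    PCollection<String> gated = mainInput.apply("WaitOnFirstStage", Wait.on(signal));

    // Downstream steps are applied to the gated collection, not to mainInput directly.
    PCollection<String> processed =
        gated.apply("Downstream",
            MapElements.into(TypeDescriptors.strings()).via((String s) -> s.toUpperCase()));

    // PAssert makes the run fail if the gated output is not exactly these elements.
    PAssert.that(processed).containsInAnyOrder("A", "B", "C");

    return p.run().waitUntilFinish();
  }

  public static void main(String[] args) {
    System.out.println("Pipeline finished with state: " + run());
  }
}
```

Since Wait.on only delays processing, the downstream step sees the same elements it would without the wait; the PAssert check confirms that nothing is lost or altered by the gating.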
