You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by "vishwesh0409 (via GitHub)" <gi...@apache.org> on 2023/07/19 07:21:16 UTC

[GitHub] [beam] vishwesh0409 opened a new issue, #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

vishwesh0409 opened a new issue, #27165:
URL: https://github.com/apache/beam/issues/27165

   ### What happened? 
   
   "Error message from worker: generic::unknown: org.apache.beam.sdk.util.UserCodeException: java.lang.NullPointerException: Cannot invoke "org.apache.beam.sdk.io.kinesis.ShardReadersPool.getLatestRecordTimestamp()" because "this.shardReadersPool" is null
   	org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:39)
   	org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$DoFnInvoker.invokeSplitRestriction(Unknown Source)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForSplitRestriction(FnApiDoFnRunner.java:887)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:313)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:245)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1788)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForPairWithRestriction(FnApiDoFnRunner.java:824)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:313)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:245)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.outputTo(FnApiDoFnRunner.java:1788)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.access$3000(FnApiDoFnRunner.java:142)
   	org.apache.beam.fn.harness.FnApiDoFnRunner$NonWindowObservingProcessBundleContext.output(FnApiDoFnRunner.java:2506)
   	org.apache.beam.sdk.io.Read$OutputSingleSource.processElement(Read.java:1034)
   	org.apache.beam.sdk.io.Read$OutputSingleSource$DoFnInvoker.invokeProcessElement(Unknown Source)
   	org.apache.beam.fn.harness.FnApiDoFnRunner.processElementForParDo(FnApiDoFnRunner.java:799)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:313)
   	org.apache.beam.fn.harness.data.PCollectionConsumerRegistry$MetricTrackingFnDataReceiver.accept(PCollectionConsumerRegistry.java:245)
   	org.apache.beam.fn.harness.BeamFnDataReadRunner.forwardElementToConsumer(BeamFnDataReadRunner.java:213)
   	org.apache.beam.sdk.fn.data.BeamFnDataInboundObserver.multiplexElements(BeamFnDataInboundObserver.java:158)
   	org.apache.beam.fn.harness.control.ProcessBundleHandler.processBundle(ProcessBundleHandler.java:537)
   	org.apache.beam.fn.harness.control.BeamFnControlClient.delegateOnInstructionRequestType(BeamFnControlClient.java:151)
   	org.apache.beam.fn.harness.control.BeamFnControlClient$InboundObserver.lambda$onNext$0(BeamFnControlClient.java:116)
   	java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539)
   	java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
   	org.apache.beam.sdk.util.UnboundedScheduledExecutorService$ScheduledFutureTask.run(UnboundedScheduledExecutorService.java:163)
   	java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
   	java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
   	java.base/java.lang.Thread.run(Thread.java:833)
   Caused by: java.lang.NullPointerException: Cannot invoke "org.apache.beam.sdk.io.kinesis.ShardReadersPool.getLatestRecordTimestamp()" because "this.shardReadersPool" is null
   	org.apache.beam.sdk.io.kinesis.KinesisReader.getSplitBacklogBytes(KinesisReader.java:172)
   	org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$UnboundedSourceAsSDFRestrictionTracker.getProgress(Read.java:999)
   	org.apache.beam.sdk.transforms.reflect.ByteBuddyDoFnInvokerFactory$DefaultGetSize.invokeGetSize(ByteBuddyDoFnInvokerFactory.java:432)
   	org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn$DoFnInvoker.invokeGetSize(Unknown Source)
   	org.apache.beam.fn.harness.FnApiDoFnRunner$SizedRestrictionNonWindowObservingProcessBundleContext.output(FnApiDoFnRunner.java:2415)
   	org.apache.beam.sdk.io.Read$UnboundedSourceAsSDFWrapperFn.splitRestriction(Read.java:540)
   
   ### Issue Priority
   
   Priority: 3 (minor)
   
   ### Issue Components
   
   - [X] Component: Python SDK
   - [ ] Component: Java SDK
   - [ ] Component: Go SDK
   - [ ] Component: Typescript SDK
   - [X] Component: IO connector
   - [ ] Component: Beam examples
   - [ ] Component: Beam playground
   - [ ] Component: Beam katas
   - [ ] Component: Website
   - [ ] Component: Spark Runner
   - [ ] Component: Flink Runner
   - [ ] Component: Samza Runner
   - [ ] Component: Twister2 Runner
   - [ ] Component: Hazelcast Jet Runner
   - [X] Component: Google Cloud Dataflow Runner


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1662850580

   2.50 release manager here.
   This issue is currently tagged for the 2.50.0 release, which cuts in a week on August 9th.
   
   Please complete work and get it into the main branch in that time, or move this issue to the 2.51 Milestone: https://github.com/apache/beam/milestone/15


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1711330613

   @Abacn  @lostluck but then if an issue is open and neither who opened it nor who is participating in the discussion has the knowledge to fix it will the issue be forgotten or someone in the community will pick it up?
   Just trying to understand if there will be any resolution to this or if it will just stick around 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner [beam]

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1893534616

   @mxm it should affect all runners since the issue is at source code level, the core issue I tried to explain in more details here https://github.com/apache/beam/issues/27165#issuecomment-1644183789


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1644183789

   ok I am gonna structure my thought a little bit better here :)
   
   So there are 2 `KinesisReader.java` around, one v1 and one v2, and as far as I can tell `KinesisTransformRegistrar.java` used to map the Python interface to the source Java package is relying on the v1 one. 
   
   https://github.com/apache/beam/pull/26953 fixed the issues in the `KinesisReader.java` v2 but not in the v1 and as a test I myself cloned master and applied the same fix on the v1 files and after that, the job runs without error.
   
   I do encounter another issue though which is then when running on my laptop using the `DirectRunner` nothing happens when it comes to reading data even if I have data in the source stream and I am using the option `InitialPositionInStream.TRIM_HORIZON`, so the job runs but no data are read.
   
   I also tried to make the KinesisTransformRegistrar.java` point to v2 as the job does get deprecation warnings as v1 seems to be deprecated but my understanding of the project is not enough to make it work right now (plus my Java is a bit rusty 😓 ).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] Abacn commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "Abacn (via GitHub)" <gi...@apache.org>.
Abacn commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1663177003

   this was due to issue get reopened and the milestone not removed


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner [beam]

Posted by "nahplay (via GitHub)" <gi...@apache.org>.
nahplay commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1902256154

   The same issue on my side.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1644072646

   also @vishwesh0409, if I implement that, fix myself in my copy of mater, and then rerun the job then it does not fail anymore however when running on `DirectRunner` on my laptop no data are read when using `InitialPositionInStream.TRIM_HORIZON` and after verifying there are data in the Kinesis Stream, am I doing something wrong there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] vishwesh0409 closed issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "vishwesh0409 (via GitHub)" <gi...@apache.org>.
vishwesh0409 closed issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner
URL: https://github.com/apache/beam/issues/27165


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1640122349

   Hello @vishwesh0409, what was the fix that was applied for this ? Tried to look for it but did not find it 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] vishwesh0409 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "vishwesh0409 (via GitHub)" <gi...@apache.org>.
vishwesh0409 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1641547883

   > Hello @vishwesh0409, what was the fix that was applied for this ? Tried to look for it but did not find it 
   
   I'll reopen this. I thought it was fixed by #26953 but after updating the beam version to 2.49.0, I'm getting the same problem. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1641933062

   > > Hello @vishwesh0409, what was the fix that was applied for this ? Tried to look for it but did not find it
   > 
   > I'll reopen this. I thought it was fixed by #26953 but after updating the beam version to 2.49.0, I'm getting the same problem.
   
   Yep, also tested this in the current master branch status and getting the same error when running a test job locally so def an issue IMO


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] lostluck commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "lostluck (via GitHub)" <gi...@apache.org>.
lostluck commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1711953973

   As an open source project, the primary means of fixes is via community and user involvement. As a rule, no one can expect someone else to fix an issue just because it's filed.
   
   The story changes somewhat if judged to be  a P1 or P0, a regression from the previous release version, or if it's affecting a large Dataflow customer, at which point it's likely that someone from the Dataflow team will drive it yo resolution. That does require the customer to have a GCP support ticket in place as well.
   
   P3s are typically not release blockers, so it's more likely this remains in the backlog until someone motivated to fix it it comes along. Beam welcomes contributions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner [beam]

Posted by "mmxgn (via GitHub)" <gi...@apache.org>.
mmxgn commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1831859827

   This issue seems to also affect FlinkRunner.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] benvit92 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "benvit92 (via GitHub)" <gi...@apache.org>.
benvit92 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1647906029

   Basically, there is a need to develop a new `KinesisTransformRegistrar.java` based on the `aws2` kinesis package instead of the current one and expose a new python transform URN which would be `v2`, hopefully that should enable the python flow to work as expected with the latest modules 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] vishwesh0409 commented on issue #27165: [Bug]: When using beam.io.kinesis.ReadDataFromKinesis, a java error is raised in DataFlowRunner

Posted by "vishwesh0409 (via GitHub)" <gi...@apache.org>.
vishwesh0409 commented on issue #27165:
URL: https://github.com/apache/beam/issues/27165#issuecomment-1645065867

   > 
   
   Looks like the KinesisReader modules are not properly fixed yet. I observed the same deprication warnings when I tried to run my code on Google Cloud. It should be a simple enough fix tho.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org