Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 01:18:12 UTC
[GitHub] [beam] damccorm opened a new issue, #21690: Exception from ElasticSearch Write Module in DataFlow
damccorm opened a new issue, #21690:
URL: https://github.com/apache/beam/issues/21690
I am seeing the following exception being thrown from the ElasticSearch Write module when running my streaming pipeline from Google DataFlow. I am using Apache Beam 2.39.0. I hope someone will be able to help me or tell me if this is a bug. Thanks a lot.
```
2022-06-02T02:21:07.023945713Z Error message from worker: java.lang.ClassCastException: class org.apache.beam.sdk.transforms.windowing.GlobalWindow cannot be cast to class org.apache.beam.sdk.transforms.windowing.IntervalWindow (org.apache.beam.sdk.transforms.windowing.GlobalWindow and org.apache.beam.sdk.transforms.windowing.IntervalWindow are in unnamed module of loader 'app')
org.apache.beam.sdk.transforms.windowing.IntervalWindow$IntervalWindowCoder.registerByteSizeObserver(IntervalWindow.java:142)
org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:209)
org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:59)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:640)
org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:558)
org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:403)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:128)
org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:67)
org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:285)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:251)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:85)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnFinishBundleArgumentProvider$Context.output(SimpleDoFnRunner.java:309)
org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnFinishBundleArgumentProvider$Context.output(SimpleDoFnRunner.java:304)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn$FinishBundleContextAdapter.output(ElasticsearchIO.java:2415)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushAndOutputResults(ElasticsearchIO.java:2422)
org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.finishBundle(ElasticsearchIO.java:2382)
```
Imported from Jira [BEAM-14551](https://issues.apache.org/jira/browse/BEAM-14551). Original Jira may contain additional context.
Reported by: Declan.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] aaltay commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155952571
+1 to reverting and not blocking the release.
--
[GitHub] [beam] aaltay commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155831062
Could we move this to 2.41.0? Does this need to be a blocker for 2.40.0?
--
[GitHub] [beam] kennknowles closed issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
kennknowles closed issue #21690: Exception from ElasticSearch Write Module in DataFlow
URL: https://github.com/apache/beam/issues/21690
--
[GitHub] [beam] kennknowles commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155276525
Nice! Thanks for figuring it out.
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156690551
That’s a good point. I’ve honestly debated whether or not an IO ought to preserve windows at all given that responses from a Sink could be viewed as effectively a new Source. But I suppose preserving windows could be useful for joining data sent to a sink and data returned from that sink via CoGroupByKey.
Back to your original idea of just using the windowing function, I’ll give that a try. Thanks!
--
[GitHub] [beam] kennknowles commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155277141
Does #17112 completely solve it? Can it be closed?
--
[GitHub] [beam] steveniemitz commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
steveniemitz commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156598013
Do we actually need to even bother preserving the "extra stuff" on the windowing configuration? Given that the output is already passed through a buffer, does a user care about triggering/lateness at all at that point? Maybe it's good enough to just restore the window function, which can be done w/o needing `setWindowingStrategyInternal`.
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157031723
I don’t know what all the decision criteria are for which is preferred. The proposed fix definitely corrects the error observed in Dataflow. I believe the fix is merely correcting a misunderstanding on my part that should have been included originally with #17112.
Reverting will reintroduce a potential data correctness bug. But I fully concede if the fix is deemed to simply not have had enough time for review and further testing.
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156593434
Thanks @steveniemitz, that very well could be the issue! I definitely need some guidance. I had seen in other places in the codebase that `Flatten` was used to explicitly force the creation of a new PCollection. My original idea was to write a test so that I could play around with `setWindowingStrategyInternal` vs. `Window.into` vs. `Flatten`/`PCollection.createPrimitiveOutputInternal`, but I wasn't able to reproduce this issue in testing.
Any thoughts on alternative methods of reapplying the same windowing without using `setWindowingStrategyInternal`?
--
[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156955964
ok for 2.40.0 what is the preferred course of action?
- Revert - https://github.com/apache/beam/pull/21884
or
- What seems to be a fix - https://github.com/apache/beam/pull/21895
Are we confident https://github.com/apache/beam/pull/21895 is a fix?
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155894953
The ElasticsearchIO connector is completely broken in 2.39.0 as a result of this issue. I would say it should be a blocker to either fix this or revert https://github.com/apache/beam/pull/17112 and try again to fix the issue that it was meant to patch.
--
[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157000492
I think I will proceed with Revert if I can't get a confirmation that the fix will be the preferred route : /
--
[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157042645
ok I'll cut the branch without either and we can resolve in the coming days. Thanks @egalpin
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157069718
Thanks @pabloem, apologies for the extra work that entails as release manager 🙏
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1152742725
I was able to repro on Dataflow today. I would guess this issue is widespread for any pipeline with windowing and Sinking data to ES. This is a regression introduced by https://github.com/apache/beam/pull/17112
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155289750
@kennknowles Sorry, my comment may have been unclear. I don't know the approach to fix this issue yet; rather, I believe this issue was introduced by https://github.com/apache/beam/pull/17112. I could actually use some guidance from anyone with in-depth knowledge of windowing.
So far I'm working with the assumption that applying global windowing[1] to an input with non-global windowing results in a PCollection with mismatched WindowCoder[2]? I'm not able to reproduce this issue with a test yet, including a test that applies FixedWindows before running the `Write` transform.
[1] https://github.com/apache/beam/pull/17112/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2275
[2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L571-L575
--
[GitHub] [beam] steveniemitz commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
steveniemitz commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156548416
Is the issue that expand() uses [setWindowingStrategyInternal](https://github.com/apache/beam/blob/8bf289cb9af4bc44d93ae43205ac372fa79268ea/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2287)? This doesn't create a new PCollection, so in theory it "propagates back" to the producer (your batching DoFn) and ends up with a mismatched coder. (I think at least?)
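The mismatch suspected here can be illustrated with a toy model. This is only a sketch: `ToyWindow`, `ToyGlobalWindow`, `ToyIntervalWindow`, `ToyIntervalWindowCoder` and `CoderMismatchDemo` are hypothetical stand-ins invented for illustration, not Beam classes. It assumes the failure mode in the stack trace is a window coder chosen for interval windows being handed an element that still sits in the global window.

```java
// Toy stand-ins for illustration only; these are NOT Beam's classes.
interface ToyWindow {}

final class ToyGlobalWindow implements ToyWindow {}

final class ToyIntervalWindow implements ToyWindow {
  final long startMillis;
  final long endMillis;

  ToyIntervalWindow(long startMillis, long endMillis) {
    this.startMillis = startMillis;
    this.endMillis = endMillis;
  }
}

// A coder that assumes every window it sees is an interval window.
final class ToyIntervalWindowCoder {
  long byteSize(ToyWindow window) {
    // Throws ClassCastException when the element actually carries a global window.
    ToyIntervalWindow interval = (ToyIntervalWindow) window;
    // Two longs: start and end.
    return 2L * Long.BYTES;
  }
}

final class CoderMismatchDemo {
  /** Returns "ok" if the interval coder could size the window, else "ClassCastException". */
  static String sizeWithIntervalCoder(ToyWindow window) {
    try {
      new ToyIntervalWindowCoder().byteSize(window);
      return "ok";
    } catch (ClassCastException e) {
      return "ClassCastException";
    }
  }

  public static void main(String[] args) {
    // The element's window matches what the coder expects.
    System.out.println(sizeWithIntervalCoder(new ToyIntervalWindow(0L, 60_000L)));
    // The element is still in the global window, but the coder expects intervals.
    System.out.println(sizeWithIntervalCoder(new ToyGlobalWindow()));
  }
}
```

In this model the coder is selected from the collection's declared windowing while the element keeps whatever window it was emitted in, which is the situation the comment above hypothesizes.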
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155305796
I'll attach some original context as well:
ES Write was found to be outputting data to incorrect windows (introduced by [#15381](https://github.com/apache/beam/pull/15381/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2363)): https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
And pseudo-code proposal of a potential fix, which I very well may have bastardized: https://lists.apache.org/thread/fnrlw7rkcg4of5xyc7s43mwz00dgqf72 (attempted to introduce the idea [here](https://github.com/apache/beam/pull/17112/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2272-R2287))
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156747294
Just confirmed that @steveniemitz's suggestion of using the WindowFn + Window.into rather than `setWindowingStrategyInternal` corrects the error described here. I must admit I don't fully grasp _why_ yet (following up on @steveniemitz's comment RE `setWindowingStrategyInternal` not creating a new PCollection for my own understanding).
Opened a PR at https://github.com/apache/beam/pull/21895 as an alternative to reverting.
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1149068930
I'm not sure how to reach the original ticket reporter via github username, but I'd like to get more details about how the pipeline was being run (i.e. which runner? x-lang? portable?) and what input windowing was being used. Thanks!
--
[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157070068
aw no worries! I appreciate your help with this fix.
--
[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow
Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157623959
Ok so I did a little more digging for the sake of my own understanding and I think I have a much better grasp of what `setWindowingStrategyInternal` is used for. It seems to be intended simply for cases where the programmer knows that the data contained in a new PCollection has the same WindowingStrategy as a prior PCollection, i.e. the data type and window type are known to be the same in the new PCollection as they were in a previously known PCollection. For an example, see Window.into[1].
`setWindowingStrategyInternal` is not meant to be used to _actively restore_ i.e. re-apply a WindowFn, rather it's used to indicate that a previously applied WindowFn is already suitable.
I feel I have better clarity on why re-applying (i.e. `Window.into(<priorWindowFn>)`) resolves this issue. I would like to recommend that the fix in https://github.com/apache/beam/pull/21895 be the preferred path forward.
[1] https://github.com/apache/beam/blob/29f30b3605824a47cbc68229f63af44a34cf0885/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java#L404-L408
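The distinction drawn above, declaring a windowing versus re-applying a window function, can be sketched with a toy model. Everything here (`ToyElement`, `ToyPCollection`, `ReapplyDemo`, the string window labels) is hypothetical and invented for illustration; it is not Beam's API and only loosely mirrors the behavior described in this comment. `declareWindowing` changes only what the collection claims about its elements, while `reapply` actually assigns each element to a new window.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongFunction;

// Toy model only: none of these types are Beam classes.
final class ToyElement {
  final long timestampMillis;
  final String window; // label of the window the element currently sits in

  ToyElement(long timestampMillis, String window) {
    this.timestampMillis = timestampMillis;
    this.window = window;
  }
}

final class ToyPCollection {
  final List<ToyElement> elements;
  final String declaredWindowing; // what downstream stages are told to expect

  ToyPCollection(List<ToyElement> elements, String declaredWindowing) {
    this.elements = elements;
    this.declaredWindowing = declaredWindowing;
  }

  /** Declares a windowing: only the metadata changes, the elements are untouched. */
  ToyPCollection declareWindowing(String windowing) {
    return new ToyPCollection(elements, windowing);
  }

  /** Re-applies a window function: every element is assigned a fresh window. */
  ToyPCollection reapply(String windowing, LongFunction<String> windowFn) {
    List<ToyElement> out = new ArrayList<>();
    for (ToyElement e : elements) {
      out.add(new ToyElement(e.timestampMillis, windowFn.apply(e.timestampMillis)));
    }
    return new ToyPCollection(out, windowing);
  }

  /** True when every element's window label matches the declared windowing. */
  boolean consistent() {
    for (ToyElement e : elements) {
      if (!e.window.startsWith(declaredWindowing)) {
        return false;
      }
    }
    return true;
  }
}

final class ReapplyDemo {
  /** Assigns a timestamp to a one-minute fixed window, e.g. "fixed[60000,120000)". */
  static String fixedMinute(long timestampMillis) {
    long start = timestampMillis - (timestampMillis % 60_000L);
    return "fixed[" + start + "," + (start + 60_000L) + ")";
  }

  public static void main(String[] args) {
    // Elements as they might look after buffering: all in the global window.
    ToyPCollection buffered =
        new ToyPCollection(
            List.of(new ToyElement(5_000L, "global"), new ToyElement(65_000L, "global")),
            "global");

    // Declaring fixed windowing leaves the elements in the global window.
    System.out.println(buffered.declareWindowing("fixed").consistent());
    // Re-applying the window function moves each element into a fixed window.
    System.out.println(buffered.reapply("fixed", ReapplyDemo::fixedMinute).consistent());
  }
}
```

Under these assumptions, only the re-applied collection has elements whose windows agree with its declared windowing, which is consistent with the reasoning given here for preferring `Window.into(<priorWindowFn>)`.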
--