You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/05 01:18:12 UTC

[GitHub] [beam] damccorm opened a new issue, #21690: Exception from ElasticSearch Write Module in DataFlow

damccorm opened a new issue, #21690:
URL: https://github.com/apache/beam/issues/21690

   I am seeing the following exception being thrown from the ElasticSearch Write module when running my streaming pipeline from Google DataFlow. I am using Apache Beam 2.39.0. I hope someone will be able to help me or tell me if this is a bug. Thanks a lot.
   
    
   ```
   
   2022-06-02T02:21:07.023945713ZError message from worker: java.lang.ClassCastException: class org.apache.beam.sdk.transforms.windowing.GlobalWindow
   cannot be cast to class org.apache.beam.sdk.transforms.windowing.IntervalWindow (org.apache.beam.sdk.transforms.windowing.GlobalWindow
   and org.apache.beam.sdk.transforms.windowing.IntervalWindow are in unnamed module of loader 'app') org.apache.beam.sdk.transforms.windowing.IntervalWindow$IntervalWindowCoder.registerByteSizeObserver(IntervalWindow.java:142)
   org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:209) org.apache.beam.sdk.coders.IterableLikeCoder.registerByteSizeObserver(IterableLikeCoder.java:59)
   org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:640)
   org.apache.beam.sdk.util.WindowedValue$FullWindowedValueCoder.registerByteSizeObserver(WindowedValue.java:558)
   org.apache.beam.runners.dataflow.worker.IntrinsicMapTaskExecutorFactory$ElementByteSizeObservableCoder.registerByteSizeObserver(IntrinsicMapTaskExecutorFactory.java:403)
   org.apache.beam.runners.dataflow.worker.util.common.worker.OutputObjectAndByteCounter.update(OutputObjectAndByteCounter.java:128)
   org.apache.beam.runners.dataflow.worker.DataflowOutputCounter.update(DataflowOutputCounter.java:67)
   org.apache.beam.runners.dataflow.worker.util.common.worker.OutputReceiver.process(OutputReceiver.java:43)
   org.apache.beam.runners.dataflow.worker.SimpleParDoFn$1.output(SimpleParDoFn.java:285) org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.outputWindowedValue(SimpleDoFnRunner.java:251)
   org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner.access$900(SimpleDoFnRunner.java:85)
   org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnFinishBundleArgumentProvider$Context.output(SimpleDoFnRunner.java:309)
   org.apache.beam.runners.dataflow.worker.repackaged.org.apache.beam.runners.core.SimpleDoFnRunner$DoFnFinishBundleArgumentProvider$Context.output(SimpleDoFnRunner.java:304)
   org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn$FinishBundleContextAdapter.output(ElasticsearchIO.java:2415)
   org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.flushAndOutputResults(ElasticsearchIO.java:2422)
   org.apache.beam.sdk.io.elasticsearch.ElasticsearchIO$BulkIO$BulkIOBaseFn.finishBundle(ElasticsearchIO.java:2382)
   ```
   
   
   Imported from Jira [BEAM-14551](https://issues.apache.org/jira/browse/BEAM-14551). Original Jira may contain additional context.
   Reported by: Declan.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aaltay commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155952571

   +1 to reverting and not blocking the release.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] aaltay commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
aaltay commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155831062

   Could we move this to 2.41.0? Does this need to be a blocker for 2.40.0 ?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles closed issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
kennknowles closed issue #21690: Exception from ElasticSearch Write Module in DataFlow
URL: https://github.com/apache/beam/issues/21690


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155276525

   Nice! Thanks for figuring it out.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156690551

   That’s a good point. I’ve honestly debated whether or not an IO ought to preserve windows at all given that responses from a Sink could be viewed as effectively a new Source. But I suppose preserving windows could be useful for joining data sent to a sink and data returned from that sink via CoGroupByKey.
   
   Back to your original idea of just using the windowing function, I’ll give that a try. Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] kennknowles commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
kennknowles commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155277141

   Does #17112 completely solve it? Can it be closed?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] steveniemitz commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
steveniemitz commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156598013

   Do we actually need to even bother preserving the "extra stuff" on the windowing configuration?  Given that the output is already passed through a buffer, does a user care about triggering/lateness at all at that point?  Maybe it's good enough to just restore the window function, which can be done w/o needing `setWindowingStrategyInternal`.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157031723

   I don’t know what all the decision criteria are for which is preferred. The proposed fix definitely corrects the error observed in Dataflow. I believe the fix is merely correcting a misunderstanding on my part that should have been included originally with #17112. 
   
   Reverting will reintroduce a potential data correctness bug. But I fully concede if the fix is deemed to simply not have had enough time for review and further testing. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156593434

   Thanks @steveniemitz , that very well could be the issue!  I definitely need some guidance.  I had seen in other places in the codebase that `Flatten` was used to explicitly force the creation of a new PCollection.  My original idea was to write a test so that I could play around with `setWindowingStrategyInternal` Vs. `Window.into` Vs. `Flatten`/`PCollection.createPrimitiveOutputInternal` but I wasn't able to reproduce this issue in testing.
   
   Any thoughts on alternative methods of reapplying the same windowing without using `setWindowingStrategyInternal`? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156955964

   ok for 2.40.0 what is the preferred course of action?
   - Revert - https://github.com/apache/beam/pull/21884
   or
   - What seems to be a fix - https://github.com/apache/beam/pull/21895
   
   Are we confident https://github.com/apache/beam/pull/21895 is a fix? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155894953

   The ElasticsearchIO connector is completely broken in 2.39.0 as a result of this issue. I would say it should be a blocker to either fix this or revert  https://github.com/apache/beam/pull/17112 and try again to fix the issue that it was meant to patch. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157000492

   I think I will proceed with Revert if I can't get a confirmation that the fix will be the preferred route : /


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157042645

   ok I'll cut the branch without either and we can resolve in the coming days. Thanks @egalpin 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157069718

   Thanks @pabloem , apologies for the extra work that entails as release manager 🙏 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1152742725

   I was able to repro on Dataflow today.  I would guess this issue is widespread for any pipeline with windowing and Sinking data to ES.  This is a regression introduced by https://github.com/apache/beam/pull/17112


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155289750

   @kennknowles Sorry my comment may have been unclear.  I don't know the approach to fix this issue yet, rather I believe this issue was introduced by https://github.com/apache/beam/pull/17112.  I could actually use some guidance by anyone with in depth knowledge of windowing.
   
   So far I'm working with the assumption that applying global windowing[1] to an input with non-global windowing results in a PCollection with mismatched WindowCoder[2]? I'm not able to reproduce this issue with a test yet, including a test that applies FixedWindows before running the `Write` transform.  
   
   [1] https://github.com/apache/beam/pull/17112/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2275
   [2] https://github.com/apache/beam/blob/master/sdks/java/core/src/main/java/org/apache/beam/sdk/util/WindowedValue.java#L571-L575 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] steveniemitz commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
steveniemitz commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156548416

   is the issue that expand() uses [setWindowingStrategyInternal]([https://github.com/apache/beam/blob/8bf289cb9af4bc44d93ae43205ac372fa79268ea/sdks/java/io/elasticsearch/src/main/java/org/apache/beam/sdk/io/elasticsearch/ElasticsearchIO.java#L2287) ?  This doesn't create a new PCollection so in theory "propegates back" to the producer (your batching DoFn) and ends up with a mismatched coder. (I think at least?)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1155305796

   I'll attach some original context as well:
   
   ES Write was found to be outputting data to incorrect windows(introduced by [#15381](
   [1] https://github.com/apache/beam/pull/15381/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2363)):  https://lists.apache.org/thread/mtwtno2o88lx3zl12jlz7o5w1lcgm2db
   
   And pseudo-code proposal of a potential fix, which I very well may have bastardized: https://lists.apache.org/thread/fnrlw7rkcg4of5xyc7s43mwz00dgqf72 (attempted to introduce the idea [here](https://github.com/apache/beam/pull/17112/files#diff-e119923661d41dfc1911da8f6e29cc0e2e9a0a50129718c1c20c7d1aa32119bfR2272-R2287))


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1156747294

   Just confirmed that @steveniemitz suggestion of using the WindowFn + Window.into rather than `setWindowingStrategyInternal` corrects the error described here.  I must admit I don't fully grasp _why_ yet (following up on @steveniemitz's comment RE `setWindowingStrategyInternal` not creating a new PCollection for my own understanding).
   
   Opened a PR at https://github.com/apache/beam/pull/21895 as an alternative to reverting.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1149068930

   I'm not sure how to reach the original ticket reporter via github username, but I'd like to get more details about how the pipeline was being run (i.e. which runner? x-lang? portable?) and what input windowing was being used.  Thanks!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] pabloem commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
pabloem commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157070068

   aw no worries! I appreciate your help with this fix.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] egalpin commented on issue #21690: Exception from ElasticSearch Write Module in DataFlow

Posted by GitBox <gi...@apache.org>.
egalpin commented on issue #21690:
URL: https://github.com/apache/beam/issues/21690#issuecomment-1157623959

   Ok so I did a little more digging for the sake of my own understanding and I think I have a much better grasp of what `setWindowingStrategyInternal` is used for.  It seems to be intended simply for cases where the programmer knows that the data contained in a new PCollection has a specific the same WindowingStrategy as a prior PCollection i.e. the data type and window type are known to be the same in a new PCollection as they were in a previously known PCollection.  Ex in Window.into[1].
   
   `setWindowingStrategyInternal` is not meant to be used to _actively restore_ i.e. re-apply a WindowFn, rather it's used to indicate that a previously applied WindowFn is already suitable.
   
   I feel I have better clarity on why re-applying (i.e. `Window.into(<priorWindowFn>)`) resolves this issue.  I would like to recommend that the fix in https://github.com/apache/beam/pull/21895 be the preferred path forward.
   
   [1] https://github.com/apache/beam/blob/29f30b3605824a47cbc68229f63af44a34cf0885/sdks/java/core/src/main/java/org/apache/beam/sdk/transforms/windowing/Window.java#L404-L408


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org