You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:35:17 UTC
[GitHub] [beam] damccorm opened a new issue, #21103: Python Direct Runner doesn't support both streaming & non streaming sources
damccorm opened a new issue, #21103:
URL: https://github.com/apache/beam/issues/21103
Please see Stack Overflow discussion:
[https://stackoverflow.com/questions/68125864/transform-node-appliedptransform-was-not-replaced-as-expected-error-with-the-dir](https://stackoverflow.com/questions/68125864/transform-node-appliedptransform-was-not-replaced-as-expected-error-with-the-dir)
When I create a GCS source & a Pub Source and try to flatten both, there is an error because of some incompatible transformation done by the direct runner.
Code example:
```
gcsEventsColl = p | "Read from GCS" >> beam.io.ReadFromText("gs://sample_events_for_beam/*.log") \
| 'convert to dict' >> beam.Map(lambda x: json.loads(x))
liveEventsColl = p | "Read
from Pubsub" >> beam.io.ReadFromPubSub(topic="projects/axxxx/topics/input_topic") \
| 'convert to dict2' >> beam.Map(lambda x: json.loads(x))
input_rec = (gcsEventsColl, liveEventsColl)
| 'flatten' >> beam.Flatten()
```
Error:
```
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 564, in run
return self.runner.run_pipeline(self, self._options)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
line 131, in run_pipeline
return runner.run_pipeline(pipeline, options)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
line 529, in run_pipeline
pipeline.replace_all(_get_transform_overrides(options))
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 504, in replace_all
self._check_replacement(override)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 478, in _check_replacement
self.visit(ReplacementValidator())
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 611, in visit
self._root_transform().visit(visitor, self, visited)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 1195, in visit
part.visit(visitor, pipeline, visited)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 1195, in visit
part.visit(visitor, pipeline, visited)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 1195, in visit
part.visit(visitor, pipeline, visited) [Previous line repeated 4 more times]
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 1198, in visit
visitor.visit_transform(self)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
line 476, in visit_transform
transform_node) RuntimeError: Transform node AppliedPTransform(Read
from GCS/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/ProcessKeyedElements/GroupByKey/GroupByKey,
_GroupByKeyOnly) was not replaced as expected.
```
The direct runner corrupts the pipeline when it rewrites the transforms.
Imported from Jira [BEAM-12586](https://issues.apache.org/jira/browse/BEAM-12586). Original Jira may contain additional context.
Reported by: rodriguezc.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] jamesandreou commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
jamesandreou commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1147703767
Hello. We are currently experiencing this issue as well trying to use beam.Flatten() on a historical Pcol from bigquery and a streaming Pcol from pub/sub.
Has anyone found a temporary workaround?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] krishnap commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
krishnap commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1202049582
any update on this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] precabal commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by "precabal (via GitHub)" <gi...@apache.org>.
precabal commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1591450400
any update on this?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1261670897
@damccorm is working on a fix for PeriodicImpulse transform that may help with that pattern. Not sure if it will work with DirectRunner though as it has other limitations.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] ptrmcrthr commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
ptrmcrthr commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1242760530
We are running into this issue trying to implement a slowly changing side input as seen here: https://beam.apache.org/documentation/patterns/side-inputs/
Maybe a note on that page saying it's not working with DirectRunner? Unfortunately my pipeline is not working with Flink runner
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1261671264
@BjornPrime - when you will document direct runner streaming limitations, incorporate https://github.com/apache/beam/issues/21103#issuecomment-1242760530
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1163096066
@jamesandreou would an in-process Flink runner work for you?
```
# (in a separate terminal)
docker run --net=host apache/beam_flink1.11_job_server:latest
python -m your_pipeline --runner PortableRunner --job_endpoint="localhost:8099" --environment_type="LOOPBACK" --streaming
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
Re: [I] Python Direct Runner doesn't support both streaming & non streaming sources [beam]
Posted by "franklin113 (via GitHub)" <gi...@apache.org>.
franklin113 commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1869640172
Are there any workarounds for this? Using PeriodicImpulse for updating side inputs in the DirectRunner throws this error in my streaming pipeline.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1206855986
I don't think there has been significant work on Python streaming direct runner recently.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [beam] paulleroyza commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources
Posted by "paulleroyza (via GitHub)" <gi...@apache.org>.
paulleroyza commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1685030779
This is also affecting my pipeline, snippet below:
with beam.Pipeline(argv=pipeline_args) as pipeline:
send_data = (pipeline | "Read Parquet" >> beam.io.ReadFromParquet(known_args.source)
#| "Print" >> beam.FlatMap(beam_print,subset=1)
| "Write to PubSub" >> beam.io.WriteToPubSub(topic=known_args.topic)
)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@beam.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org