You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/06/04 21:35:17 UTC

[GitHub] [beam] damccorm opened a new issue, #21103: Python Direct Runner doesn't support both streaming & non streaming sources

damccorm opened a new issue, #21103:
URL: https://github.com/apache/beam/issues/21103

   Please see Stack Overflow discussion:
   
   [https://stackoverflow.com/questions/68125864/transform-node-appliedptransform-was-not-replaced-as-expected-error-with-the-dir](https://stackoverflow.com/questions/68125864/transform-node-appliedptransform-was-not-replaced-as-expected-error-with-the-dir)
   
   When I create a GCS source & a Pub Source and try to flatten both, there is an error because of some incompatible transformation done by the direct runner.
   
   Code example:
   ```
   
   gcsEventsColl = p | "Read from GCS" >> beam.io.ReadFromText("gs://sample_events_for_beam/*.log") \
   
                    | 'convert to dict' >> beam.Map(lambda x: json.loads(x))
   liveEventsColl = p | "Read
   from Pubsub" >> beam.io.ReadFromPubSub(topic="projects/axxxx/topics/input_topic") \
                 
       | 'convert to dict2' >> beam.Map(lambda x: json.loads(x))
   
   
   input_rec = (gcsEventsColl, liveEventsColl)
   | 'flatten' >> beam.Flatten()
   
   ```
   
   Error:
   ```
   
   File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 564, in run
       return self.runner.run_pipeline(self, self._options)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
   line 131, in run_pipeline
       return runner.run_pipeline(pipeline, options)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/runners/direct/direct_runner.py",
   line 529, in run_pipeline
       pipeline.replace_all(_get_transform_overrides(options))
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 504, in replace_all
       self._check_replacement(override)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 478, in _check_replacement
       self.visit(ReplacementValidator())
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 611, in visit
       self._root_transform().visit(visitor, self, visited)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 1195, in visit
       part.visit(visitor, pipeline, visited)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 1195, in visit
       part.visit(visitor, pipeline, visited)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 1195, in visit
       part.visit(visitor, pipeline, visited)   [Previous line repeated 4 more times]
   
     File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 1198, in visit
       visitor.visit_transform(self)
      File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/apache_beam/pipeline.py",
   line 476, in visit_transform
       transform_node) RuntimeError: Transform node AppliedPTransform(Read
   from GCS/Read/SDFBoundedSourceReader/ParDo(SDFBoundedSourceDoFn)/ProcessKeyedElements/GroupByKey/GroupByKey,
   
      _GroupByKeyOnly) was not replaced as expected.
   
   ```
   
   The direct runner corrupts the pipeline when it rewrites the transforms.
   
    
   
   Imported from Jira [BEAM-12586](https://issues.apache.org/jira/browse/BEAM-12586). Original Jira may contain additional context.
   Reported by: rodriguezc.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] jamesandreou commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
jamesandreou commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1147703767

   Hello. We are currently experiencing this issue as well trying to use beam.Flatten() on a historical Pcol from bigquery and a streaming Pcol from pub/sub.
   
   Has anyone found a temporary workaround?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] krishnap commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
krishnap commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1202049582

   any update on this? 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] precabal commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by "precabal (via GitHub)" <gi...@apache.org>.
precabal commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1591450400

   any update on this?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1261670897

   @damccorm is working on a fix for PeriodicImpulse transform that may help with that pattern. Not sure if it will work with DirectRunner though as it has other limitations.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] ptrmcrthr commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
ptrmcrthr commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1242760530

   We are running into this issue trying to implement a slowly changing side input as seen here:  https://beam.apache.org/documentation/patterns/side-inputs/
   
   Maybe a note on that page saying it's not working with DirectRunner? Unfortunately my pipeline is not working with Flink runner
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1261671264

   @BjornPrime - when you will document direct runner streaming limitations, incorporate https://github.com/apache/beam/issues/21103#issuecomment-1242760530


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1163096066

   @jamesandreou would an in-process Flink runner work for you?
   
   ```
   # (in a separate terminal)
   docker run --net=host apache/beam_flink1.11_job_server:latest
   
   python -m your_pipeline --runner PortableRunner  --job_endpoint="localhost:8099" --environment_type="LOOPBACK"  --streaming
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [I] Python Direct Runner doesn't support both streaming & non streaming sources [beam]

Posted by "franklin113 (via GitHub)" <gi...@apache.org>.
franklin113 commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1869640172

   Are there any workarounds for this?  Using PeriodicImpulse for updating side inputs in the DirectRunner throws this error in my streaming pipeline.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] tvalentyn commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1206855986

   I don't think there has been significant work on Python streaming direct runner recently.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [beam] paulleroyza commented on issue #21103: Python Direct Runner doesn't support both streaming & non streaming sources

Posted by "paulleroyza (via GitHub)" <gi...@apache.org>.
paulleroyza commented on issue #21103:
URL: https://github.com/apache/beam/issues/21103#issuecomment-1685030779

   This is also affecting my pipeline, snippet below:
     with beam.Pipeline(argv=pipeline_args) as pipeline:
       send_data = (pipeline | "Read Parquet" >> beam.io.ReadFromParquet(known_args.source)
                             #| "Print" >> beam.FlatMap(beam_print,subset=1)
                             | "Write to PubSub" >> beam.io.WriteToPubSub(topic=known_args.topic)
                   )


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org