Posted to github@beam.apache.org by "ahmedabu98 (via GitHub)" <gi...@apache.org> on 2024/03/18 18:57:13 UTC

[PR] add ExternalTransformProvider example [beam]

ahmedabu98 opened a new pull request, #30666:
URL: https://github.com/apache/beam/pull/30666

   Adding an example of creating SchemaTransforms and using them with the ExternalTransformProvider API.
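The ExternalTransformProvider API exposes Java SchemaTransforms to Python as dynamically generated wrappers. One plausible piece of that is deriving a Python-friendly wrapper name from each transform's identifier. The sketch below illustrates that idea only: the URN layout `beam:schematransform:org.apache.beam:<name>:<version>`, the example identifiers, and the helper `wrapper_name_from_urn` are assumptions for illustration, not Beam's actual implementation.

```python
import re
from typing import Optional

# Assumed identifier layout for this sketch (not taken from Beam's source):
#   beam:schematransform:org.apache.beam:<snake_case_name>:<version>
URN_PATTERN = re.compile(r"^beam:schematransform:org\.apache\.beam:(\w+):(\w+)$")


def wrapper_name_from_urn(urn: str) -> Optional[str]:
    """Hypothetical helper: derive an UpperCamelCase wrapper name from a URN.

    Returns None when the URN does not follow the assumed layout.
    """
    match = URN_PATTERN.match(urn)
    if match is None:
        return None
    snake_name = match.group(1)
    # e.g. "extract_words" -> "ExtractWords"
    return "".join(part.capitalize() for part in snake_name.split("_") if part)


if __name__ == "__main__":
    # Hypothetical identifiers, loosely named after the example's providers.
    for urn in [
        "beam:schematransform:org.apache.beam:extract_words:v1",
        "beam:schematransform:org.apache.beam:java_count:v1",
        "some:other:urn",
    ]:
        print(urn, "->", wrapper_name_from_urn(urn))
```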


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] add ExternalTransformProvider example [beam]

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on PR #30666:
URL: https://github.com/apache/beam/pull/30666#issuecomment-2004717123

   R: @chamikaramj 
   R: @robertwb 
   R: @liferoad 




Re: [PR] add ExternalTransformProvider example [beam]

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1530961429


##########
examples/multi-language/python/wordcount_external.py:
##########
@@ -0,0 +1,102 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+
+import apache_beam as beam
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.external_transform_provider import ExternalTransformProvider
+from apache_beam.typehints.row_type import RowTypeConstraint
+"""A Python multi-language pipeline that counts words.
+
+This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in
+`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically
+provided in Python via the `ExternalTransformProvider` API.
+
+Example commands for executing this program:
+
+DirectRunner:
+$ python wordcount_external.py --runner DirectRunner --input <INPUT FILE> --output <OUTPUT FILE> --expansion_service_port <PORT>
+
+DataflowRunner:
+$ python wordcount_external.py \
+      --runner DataflowRunner \
+      --temp_location $TEMP_LOCATION \
+      --project $GCP_PROJECT \
+      --region $GCP_REGION \
+      --job_name $JOB_NAME \
+      --num_workers $NUM_WORKERS \
+      --input "gs://dataflow-samples/shakespeare/kinglear.txt" \
+      --output "gs://$GCS_BUCKET/wordcount_external/output" \
+      --expansion_service_port <PORT>

Review Comment:
   There's already a common section on running these pipelines in the [README.md](https://github.com/apache/beam/blob/master/examples/multi-language/README.md#instructions-for-running-the-pipelines) file.
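The run commands in the docstring pass `--input`, `--output`, and `--expansion_service_port` alongside ordinary pipeline options such as `--runner`. The script's own argument handling isn't shown in this thread; a minimal sketch, assuming it follows the common `argparse.parse_known_args` pattern and that the expansion service listens on localhost, could look like this (`parse_args` and `expansion_service_address` are hypothetical names):

```python
import argparse
from typing import List, Sequence, Tuple


def parse_args(argv: Sequence[str]) -> Tuple[argparse.Namespace, List[str]]:
    """Hypothetical helper: split example-specific flags from pipeline options."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--input", required=True, help="Input text file.")
    parser.add_argument("--output", required=True, help="Output file prefix.")
    parser.add_argument(
        "--expansion_service_port",
        required=True,
        type=int,
        help="Port of an already-running expansion service.")
    # parse_known_args leaves unrecognized flags (e.g. --runner) in a separate
    # list so they can be forwarded to the pipeline's options.
    known_args, pipeline_args = parser.parse_known_args(list(argv))
    return known_args, pipeline_args


def expansion_service_address(port: int) -> str:
    # Assumption: the expansion service runs on the same machine as the launcher.
    return f"localhost:{port}"
```

The leftover `pipeline_args` would then be handed to `PipelineOptions`, which is the reason for using `parse_known_args` rather than `parse_args`.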





Re: [PR] add ExternalTransformProvider example [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1529488929


##########
examples/multi-language/python/wordcount_external.py:
##########
@@ -0,0 +1,102 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+
+import apache_beam as beam
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.external_transform_provider import ExternalTransformProvider
+from apache_beam.typehints.row_type import RowTypeConstraint
+"""A Python multi-language pipeline that counts words.
+
+This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in
+`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically
+provided in Python via the `ExternalTransformProvider` API.
+
+Example commands for executing this program:
+
+DirectRunner:
+$ python wordcount_external.py --runner DirectRunner --input <INPUT FILE> --output <OUTPUT FILE> --expansion_service_port <PORT>
+
+DataflowRunner:
+$ python wordcount_external.py \
+      --runner DataflowRunner \
+      --temp_location $TEMP_LOCATION \
+      --project $GCP_PROJECT \
+      --region $GCP_REGION \
+      --job_name $JOB_NAME \
+      --num_workers $NUM_WORKERS \
+      --input "gs://dataflow-samples/shakespeare/kinglear.txt" \
+      --output "gs://$GCS_BUCKET/wordcount_external/output" \
+      --expansion_service_port <PORT>

Review Comment:
   Do we have good docs on how to run this expansion service?
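Whatever the docs end up saying, the pipeline assumes an expansion service is already listening on `--expansion_service_port`. A cheap preflight check can turn a confusing expansion failure into a clear error. This is a hypothetical helper (`expansion_service_is_listening` is not part of Beam), and it only confirms that the TCP port accepts connections, not that the listener is actually an expansion service:

```python
import socket


def expansion_service_is_listening(port: int, host: str = "localhost",
                                   timeout_s: float = 1.0) -> bool:
    """Hypothetical preflight check: True if host:port accepts TCP connections.

    This does not verify that the listener is a Beam expansion service.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False
```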





Re: [PR] add ExternalTransformProvider example [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1530967292


##########
examples/multi-language/README.md:
##########
@@ -28,6 +28,8 @@ This project provides examples of Apache Beam
 * **python/javacount** - A Python pipeline that counts words using the Java `Count.perElement()` transform.
 * **python/javadatagenerator** - A Python pipeline that produces a set of strings generated from Java.
                                   This example demonstrates the `JavaExternalTransform` API.
+* **python/wordcount_external** - A Python pipeline that runs the Word Count workflow using three external Java
+                transforms. This example demonstrates the simpler `ExternalTransformProvider` API.

Review Comment:
   Do we have a step-by-step guide somewhere on how to create a new JavaExternalTransform? If so, can we link it here? When I first looked at the Java code, I was a bit lost about which parts I need to create in order to use `ExternalTransformProvider`.
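To make the moving parts a little more concrete: each SchemaTransform is advertised under an identifier together with a configuration schema, and the Python side roughly turns that into a wrapper whose keyword arguments are the configuration fields. The following is a toy, pure-Python model of that idea under those assumptions. `ToySchemaTransformSpec`, `make_wrapper`, and the `drop` field are hypothetical and do not correspond to Beam classes:

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class ToySchemaTransformSpec:
    """Hypothetical stand-in for what a provider advertises: an identifier
    plus the names of its configuration fields."""
    identifier: str
    config_fields: Tuple[str, ...]


def make_wrapper(name: str, spec: ToySchemaTransformSpec) -> type:
    """Build a wrapper class whose constructor accepts only the advertised
    configuration fields as keyword arguments."""

    def __init__(self, **config):
        unknown = set(config) - set(spec.config_fields)
        if unknown:
            raise TypeError(f"unknown config fields: {sorted(unknown)}")
        self.identifier = spec.identifier
        self.config = dict(config)

    return type(name, (), {"__init__": __init__, "spec": spec})
```

In this toy model, misspelled configuration keys fail fast at construction time, which mirrors why a configuration schema is useful when the real transform lives in another SDK.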





Re: [PR] add ExternalTransformProvider example [beam]

Posted by "liferoad (via GitHub)" <gi...@apache.org>.
liferoad commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1529112663


##########
examples/multi-language/python/wordcount_external.py:
##########
@@ -0,0 +1,106 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+
+import apache_beam as beam
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.external_transform_provider import ExternalTransformProvider
+from apache_beam.typehints.row_type import RowTypeConstraint
+"""A Python multi-language pipeline that counts words.
+
+This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in
+`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically
+provided in Python via the `ExternalTransformProvider` API.
+
+Before running this program, make sure the expansion service is up and running. You can do so with the command:
+$ ./gradlew examples:multi-language:runExpansionService -PexpansionPort=<PORT>

Review Comment:
   We should provide the command without using gradlew, since users who just install Beam do not use it.
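One Gradle-free option is to start the service directly from a bundled jar with `java -jar`. The sketch below assumes a self-contained (shaded) jar whose main class is Beam's expansion service, which takes the port as its first argument; the jar path and the helper names are placeholders. The Python SDK's `JavaJarExpansionService` can also launch a service from a jar on the pipeline's behalf.

```python
import subprocess
from typing import List


def expansion_service_command(jar_path: str, port: int,
                              java: str = "java") -> List[str]:
    """Hypothetical helper: build a command that starts an expansion service.

    Assumes jar_path is a shaded jar whose main class is the expansion
    service and that the service reads the port from its first argument.
    """
    return [java, "-jar", jar_path, str(port)]


def start_expansion_service(jar_path: str, port: int) -> subprocess.Popen:
    # Starts the service in the background; the caller must terminate it.
    return subprocess.Popen(expansion_service_command(jar_path, port))
```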





Re: [PR] add ExternalTransformProvider example [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30666:
URL: https://github.com/apache/beam/pull/30666#issuecomment-2004719546

   Stopping reviewer notifications for this pull request: review requested by someone other than the bot, ceding control




Re: [PR] add ExternalTransformProvider example [beam]

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1529190740


##########
examples/multi-language/python/wordcount_external.py:
##########
@@ -0,0 +1,106 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+import logging
+
+import apache_beam as beam
+from apache_beam.io import ReadFromText
+from apache_beam.options.pipeline_options import PipelineOptions
+from apache_beam.transforms.external_transform_provider import ExternalTransformProvider
+from apache_beam.typehints.row_type import RowTypeConstraint
+"""A Python multi-language pipeline that counts words.
+
+This pipeline reads an input text file then extracts and counts the words using Java SDK SchemaTransforms provided in
+`ExtractWordsProvider`, `JavaCountProvider`, and `WriteWordsProvider`. Wrappers for these transforms are dynamically
+provided in Python via the `ExternalTransformProvider` API.
+
+Before running this program, make sure the expansion service is up and running. You can do so with the command:
+$ ./gradlew examples:multi-language:runExpansionService -PexpansionPort=<PORT>

Review Comment:
   Thanks for the catch; this was an old comment. Removed it.





Re: [PR] add ExternalTransformProvider example [beam]

Posted by "ahmedabu98 (via GitHub)" <gi...@apache.org>.
ahmedabu98 commented on code in PR #30666:
URL: https://github.com/apache/beam/pull/30666#discussion_r1538250014


##########
examples/multi-language/README.md:
##########
@@ -28,6 +28,8 @@ This project provides examples of Apache Beam
 * **python/javacount** - A Python pipeline that counts words using the Java `Count.perElement()` transform.
 * **python/javadatagenerator** - A Python pipeline that produces a set of strings generated from Java.
                                   This example demonstrates the `JavaExternalTransform` API.
+* **python/wordcount_external** - A Python pipeline that runs the Word Count workflow using three external Java
+                transforms. This example demonstrates the simpler `ExternalTransformProvider` API.

Review Comment:
   Clarifying that we are using SchemaTransforms here, not JavaExternalTransform.
   Also modifying comments in wordcount_external.py to clarify which Java transforms are used for this example.
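For readers unfamiliar with the three Java SchemaTransforms named in the module docstring (`ExtractWordsProvider`, `JavaCountProvider`, `WriteWordsProvider`): they operate on rows with a schema rather than on plain strings. Their Java source isn't quoted in this thread, so the following is only a local, pure-Python approximation of the extract, count, and format steps (the write step is left out). The row field names, the tokenization, and the `word: count` output format are assumptions:

```python
import re
from collections import Counter
from typing import Iterable, List, NamedTuple


# Hypothetical row schemas; the real field names may differ.
class LineRow(NamedTuple):
    line: str


class WordRow(NamedTuple):
    word: str


class CountRow(NamedTuple):
    word: str
    count: int


def extract_words(rows: Iterable[LineRow]) -> List[WordRow]:
    # Assumed tokenization: runs of letters and apostrophes, lowercased.
    return [WordRow(w.lower())
            for row in rows
            for w in re.findall(r"[A-Za-z']+", row.line)]


def count_words(rows: Iterable[WordRow]) -> List[CountRow]:
    # Count occurrences per word; sorted only to make output deterministic.
    counts = Counter(row.word for row in rows)
    return [CountRow(word, n) for word, n in sorted(counts.items())]


def format_counts(rows: Iterable[CountRow]) -> List[LineRow]:
    # Assumed output format, one "word: count" line per distinct word.
    return [LineRow(f"{row.word}: {row.count}") for row in rows]
```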


