Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/10/02 21:05:48 UTC

[GitHub] [beam] udim commented on a change in pull request #12913: Minor fixes to the get-started/wordcount-example webpage.

udim commented on a change in pull request #12913:
URL: https://github.com/apache/beam/pull/12913#discussion_r499050065



##########
File path: website/www/site/content/en/get-started/wordcount-example.md
##########
@@ -1424,15 +1424,15 @@ outputs.
 
 This example uses an unbounded `PCollection` and streams the results to
 Google Pub/Sub. The code formats the results and writes them to a Pub/Sub topic
-using [`beam.io.WriteStringsToPubSub`](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.gcp.pubsub.html#apache_beam.io.gcp.pubsub.WriteStringsToPubSub).
+using [`beam.io.WriteToPubSub`](https://beam.apache.org/releases/pydoc/{{< param release_latest >}}/apache_beam.io.gcp.pubsub.html#apache_beam.io.gcp.pubsub.WriteToPubSub).
 
 {{< highlight java >}}
   // This example is not currently available for the Beam SDK for Java.
 {{< /highlight >}}
 
 {{< highlight py >}}
   # Write to Pub/Sub
-  output | beam.io.WriteStringsToPubSub(known_args.output_topic)
+  output | beam.io.WriteToPubSub(known_args.output_topic)

Review comment:
       Similarly here: assuming strings are being written, they have to be converted to `bytes` first, since `WriteToPubSub` expects `bytes` elements.
   ```
   _ = (output
        | 'EncodeString' >> beam.Map(lambda s: s.encode('utf-8'))
        | beam.io.WriteToPubSub(known_args.output_topic))
   ```

##########
File path: website/www/site/content/en/get-started/wordcount-example.md
##########
@@ -1405,10 +1405,10 @@ messages from a Pub/Sub subscription or topic using
 {{< highlight py >}}
   # Read from Pub/Sub into a PCollection.
   if known_args.input_subscription:
-    lines = p | beam.io.ReadStringsFromPubSub(
+    lines = p | beam.io.ReadFromPubSub(

Review comment:
       An important difference is that `ReadFromPubSub` returns `bytes` elements (the raw message data), not strings.
   I would:
   - rename `lines` to `data`
   - add an additional step below: `lines = data | 'DecodeString' >> beam.Map(lambda d: d.decode('utf-8'))`
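
   A minimal sketch of the read side with both suggestions applied. It assumes UTF-8 payloads, as the comment does. The helper names `decode_message` and `read_lines` are hypothetical and are not part of the wordcount example. `ReadFromPubSub` is called with its `subscription`/`topic` keyword arguments and the default `with_attributes=False`, which yields `bytes`.

```python
from typing import Optional


def decode_message(data: bytes) -> str:
  """Decodes a raw Pub/Sub payload into text, assuming UTF-8 encoding."""
  return data.decode('utf-8')


def read_lines(p, input_subscription: Optional[str] = None,
               input_topic: Optional[str] = None):
  """Reads raw Pub/Sub messages and returns a PCollection of str.

  Hypothetical helper; exactly one of input_subscription or input_topic
  is expected to be set, mirroring the example's known_args.
  """
  # Imported here so decode_message can be used and tested without Beam.
  import apache_beam as beam

  # With the default with_attributes=False, ReadFromPubSub yields bytes.
  if input_subscription:
    data = p | beam.io.ReadFromPubSub(subscription=input_subscription)
  else:
    data = p | beam.io.ReadFromPubSub(topic=input_topic)
  # Convert the raw bytes to str before the word-count steps.
  return data | 'DecodeString' >> beam.Map(decode_message)
```

   In the example this would be used as `lines = read_lines(p, known_args.input_subscription, known_args.input_topic)`. The downstream steps that expect strings stay unchanged.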



