Posted to commits@beam.apache.org by da...@apache.org on 2024/03/04 19:02:16 UTC
(beam) branch master updated: Duet AI data encoding prompts (no links) (#30420)

This is an automated email from the ASF dual-hosted git repository.

damccorm pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/beam.git


The following commit(s) were added to refs/heads/master by this push:
     new fb58cfc0cad Duet AI data encoding prompts (no links) (#30420)
fb58cfc0cad is described below

commit fb58cfc0cad657036fd09e6d33008b4c064a04ca
Author: Daria Bezkorovaina <99...@users.noreply.github.com>
AuthorDate: Mon Mar 4 19:02:08 2024 +0000

    Duet AI data encoding prompts (no links) (#30420)
    
    * Create 32_data_encoding.md
    
    * Update 32_data_encoding.md
    
    * Create 33_coders_data_encoding.md
    
    * Update 33_coders_data_encoding.md
    
    * Create 34_change_coders_data_encoding.md
    
    * Update 34_change_coders_data_encoding.md
    
    * Update 34_change_coders_data_encoding.md
    
    * Update README.md
    
    * Update 01_io_kafka.md
    
    Nits to avoid passive voice
    
    * Update 02_io_pubsub.md
    
    * Update 02_io_pubsub.md
    
    Nits and minor typos
    
    * Update 02_io_pubsub.md
    
    * Update 03_io_bigquery.md
    
    * Update 04_io_bigtable.md
    
    * Update 04_io_bigtable.md
    
    * Update 05_io_spanner.md
    
    * Update 06_io_tfrecord.md
    
    * Update 07_io_json.md
    
    * Update 08_io_csv.md
    
    * Update 09_io_avro.md
    
    * Update 10_io_parquet.md
    
    * Update 11_io_jdbc.md
    
    * Update 06_io_tfrecord.md
    
    * Update 07_io_json.md
    
    * Update 08_io_csv.md
    
    * Update 09_io_avro.md
    
    * Update 09_io_avro.md
    
    * Update 10_io_parquet.md
    
    * Update 11_io_jdbc.md
    
    * Rename 33_coders_data_encoding.md to 48_coders_data_encoding.md
    
    * Rename 34_change_coders_data_encoding.md to 49_change_coders_data_encoding.md
    
    * Rename 48_coders_data_encoding.md to 34_coders_data_encoding.md
    
    * Rename 34_coders_data_encoding.md to 33_coders_data_encoding.md
    
    * Rename 49_change_coders_data_encoding.md to 34_change_coders_data_encoding.md
    
    * Update learning/prompts/documentation-lookup-nolinks/34_change_coders_data_encoding.md
    
    Implement PR review comments
    
    Co-authored-by: Danny McCormick <da...@google.com>
    
    ---------
    
    Co-authored-by: Danny McCormick <da...@google.com>
---
 learning/prompts/README.md                         |   6 +-
 learning/prompts/code-generation/01_io_kafka.md    |   4 +-
 learning/prompts/code-generation/02_io_pubsub.md   |  17 ++--
 learning/prompts/code-generation/03_io_bigquery.md |   7 +-
 learning/prompts/code-generation/04_io_bigtable.md |   8 +-
 learning/prompts/code-generation/05_io_spanner.md  |   5 +-
 learning/prompts/code-generation/06_io_tfrecord.md |   4 +-
 learning/prompts/code-generation/07_io_json.md     |   7 +-
 learning/prompts/code-generation/08_io_csv.md      |   6 +-
 learning/prompts/code-generation/09_io_avro.md     |   5 +-
 learning/prompts/code-generation/10_io_parquet.md  |   6 +-
 learning/prompts/code-generation/11_io_jdbc.md     |   7 +-
 .../32_data_encoding.md                            |  15 +++
 .../33_coders_data_encoding.md                     |  35 +++++++
 .../34_change_coders_data_encoding.md              | 103 +++++++++++++++++++++
 15 files changed, 198 insertions(+), 37 deletions(-)

diff --git a/learning/prompts/README.md b/learning/prompts/README.md
index ea5d7d8bd79..b8cce71b794 100644
--- a/learning/prompts/README.md
+++ b/learning/prompts/README.md
@@ -48,7 +48,7 @@ Features of a good response:
 - Starts with a brief introduction that explains the code sample.
 - Includes information about how to find the reference documentation.
 - Includes a link to the list of code samples.
-- Provides well documented code. Consider including an example of what the return result looks like.
+- Provides well-documented code. Consider including an example of what the execution result looks like.
 - Follows up with the user to ensure they don’t continue needlessly with false responses.
 
 
@@ -56,7 +56,7 @@ Features of a good response:
 Features of a good response:
 - Starts with a short overall description that tries to answer the question in the prompt.
 - Grounds the algorithm in any well-known context, if appropriate. For example, this is an implementation of X, a well-known algorithm to do Y.
-- Discusses the variables in the snippet, and what their purpose is relative to the runtime.
+- Discusses the variables in the snippet and their purpose relative to the runtime.
 - Discusses runtime and memory storage complexity.
 - Notes any interesting features of the code, or opportunities for improvement (optimizations, refactoring, syntax best practices, etc.)
 
@@ -73,4 +73,4 @@ learning/prompts/
 │   ├── 01_basic_learning_apache_beam.md
 │   └── ...
 └── README.md
-```
\ No newline at end of file
+```
diff --git a/learning/prompts/code-generation/01_io_kafka.md b/learning/prompts/code-generation/01_io_kafka.md
index e169e012ef4..9b225c85638 100644
--- a/learning/prompts/code-generation/01_io_kafka.md
+++ b/learning/prompts/code-generation/01_io_kafka.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from Apache Kafka using Apache Beam.
 
 Response:
-You can read data from [Apache Kafka](https://kafka.apache.org/) using the `ReadFromKafka` transform. The following Python code reads a topic `topic` from a Kafka server `bootstrap_servers`. The values for `topic` and `bootstrap_servers` are provided as command-line arguments. The data is logged to the console.
+Your Apache Beam pipeline can read data from [Apache Kafka](https://kafka.apache.org/) using the `ReadFromKafka` transform. The following Python code reads a topic `topic` from a Kafka server specified in `bootstrap_servers`. The values for `topic` and `bootstrap_servers` are provided as command-line arguments. The Apache Beam pipeline reads the data from a Kafka topic and logs it to the console.
 
 ```python
 import logging
@@ -43,4 +43,4 @@ with beam.Pipeline(options=options) as p:
 ```
 For more information about how to use the KafkaIO connector with the Python SDK for Apache Beam, see the [KafkaIO connector documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.kafka.html).
 
-For samples that show common pipeline configurations to parse pipeline options from command-line arguments, see [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/).
\ No newline at end of file
+For samples that show common pipeline configurations to parse pipeline options from command-line arguments, see [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/).
diff --git a/learning/prompts/code-generation/02_io_pubsub.md b/learning/prompts/code-generation/02_io_pubsub.md
index 90856e8956f..d5e2e2d833f 100644
--- a/learning/prompts/code-generation/02_io_pubsub.md
+++ b/learning/prompts/code-generation/02_io_pubsub.md
@@ -1,9 +1,10 @@
 Prompt:
-Write the python code to read messages from a Pub/Sub subscription.
+Write the Python code to read messages from a Pub/Sub subscription.
+
 Response:
-You can read messages from a Pub/Sub subscription or topic using the `ReadFromPubSub` transform. Pub/Sub is only supported in streaming pipelines.
+Your Apache Beam pipeline can read messages from a Pub/Sub subscription or topic using the `ReadFromPubSub` transform. Pub/Sub is only supported in streaming pipelines.
 
-The following Python code reads messages from a Pub/Sub subscription. The subscription is provided as a command line argument. The messages are logged to the console:
+The following Python code reads messages from a Pub/Sub subscription and logs the data to the console. The subscription is provided as a command-line argument:
 
 ```python
 import logging
@@ -14,9 +15,8 @@ from apache_beam.io import ReadFromPubSub
 from apache_beam.options.pipeline_options import PipelineOptions
 
 class PubSubReadOptions(PipelineOptions):
-"""
-Configure pipeline options for PubSub read transform.
-"""
+    """Configure pipeline options for Pub/Sub read transform."""
+
     @classmethod
     def _add_argparse_args(cls, parser):
       parser.add_argument(
@@ -42,10 +42,11 @@ if __name__ == '__main__':
   read_subscription()
 
 ```
-Reading messages directly from a topic is also supported. A temporary subscription will be created automatically.
+
+Apache Beam also supports reading messages directly from a topic. A temporary subscription will be created automatically.
 
 The messages could be returned as a byte string or as `PubsubMessage` objects. This behavior is controlled by the `with_attributes` parameter.
 
 For more information, see the [Pub/Sub I/O transform documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html).
 
-For a common pattern for configuring pipeline options see the [Pipeline option pattern](https://beam.apache.org/documentation/patterns/pipeline-options/).
\ No newline at end of file
+For a common pattern for configuring pipeline options, see [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/).
diff --git a/learning/prompts/code-generation/03_io_bigquery.md b/learning/prompts/code-generation/03_io_bigquery.md
index b13be4fd355..b31b434ccdf 100644
--- a/learning/prompts/code-generation/03_io_bigquery.md
+++ b/learning/prompts/code-generation/03_io_bigquery.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from BigQuery using Apache Beam.
 
 Response:
-You can read data from BigQuery using the `ReadFromBigQuery` transform. The following Python code reads data from a BigQuery table. The table name is provided as the command-line argument `table`. The data is logged to the console.
+Your Apache Beam pipeline can read data from BigQuery using the `ReadFromBigQuery` transform. The following Python code reads data from a BigQuery table and logs it to the console. The table name is provided as the command-line argument `table`.
 
 ```python
 import logging
@@ -39,7 +39,8 @@ if __name__ == '__main__':
     logging.getLogger().setLevel(logging.INFO)
     read_table()
 ```
-The `ReadFromBigQuery` transform also supports reading from a BigQuery query. The following Python code reads data from a BigQuery table using a SQL query. The query is provided as the command-line argument `query`. The data is logged to the console.
+
+The `ReadFromBigQuery` transform also supports reading from a BigQuery query. The following Python code reads data from a BigQuery table using a SQL query and logs it to the console. The query is provided as the command-line argument `query`.
 
 ```python
 with beam.Pipeline(options=options) as p:
@@ -48,7 +49,7 @@ with beam.Pipeline(options=options) as p:
       | Map(logging.info)
  ```
 
-For more information, see the [BigQueryIO connector documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html).
+For more information, see the [BigQuery I/O connector documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html).
 
 For samples that show common pipeline configurations, see [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/).
 
diff --git a/learning/prompts/code-generation/04_io_bigtable.md b/learning/prompts/code-generation/04_io_bigtable.md
index 22ea5ebcef9..e4b09153a69 100644
--- a/learning/prompts/code-generation/04_io_bigtable.md
+++ b/learning/prompts/code-generation/04_io_bigtable.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from Bigtable using Apache Beam.
 
 Response:
-You can read data from [Cloud Bigtable](https://cloud.google.com/bigtable) using the `ReadFromBigtable` transform. The following Python code reads a table `table_id` from a Bigtable instance `instance_id`. The `table_id` and `instance_id` are provided as command-line arguments. The data is logged to the console.
+Your Apache Beam pipeline can read data from [Cloud Bigtable](https://cloud.google.com/bigtable) using the `ReadFromBigtable` transform. The following Python code reads a table `table_id` from a Bigtable instance `instance_id` and logs the data to the console. The `table_id` and `instance_id` are provided as command-line arguments.
 
 ```python
 import logging
@@ -44,7 +44,7 @@ def run():
     options = BigtableOptions()
 
     with beam.Pipeline(options=options) as p:
-        #this pipeline reads from a BigQuery table and logs the data to the console
+        #this pipeline reads from a Bigtable table and logs the data to the console
         output = (p | "Read from Bigtable" >> ReadFromBigtable(
                     project_id=options.project_id,
                     instance_id=options.instance_id,
@@ -58,9 +58,9 @@ if __name__ == "__main__":
     logging.getLogger().setLevel(logging.INFO)
     run()
 ```
-The `ReadFromBigtable` transform returns a `PCollection` of `PartialRowData` objects, each representing a Bigtable row. For more information about this row object, see [PartialRowData (row_key)](https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey)
+The `ReadFromBigtable` transform returns a `PCollection` of `PartialRowData` objects, each representing a Bigtable row. For more information about this row object, see [PartialRowData (row_key)](https://cloud.google.com/python/docs/reference/bigtable/latest/row#class-googlecloudbigtablerowpartialrowdatarowkey).
 
-For more information, see the [BigTableIO connector documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigtableio.html).
+For more information, see the [Bigtable I/O connector documentation](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigtableio.html).
 
 For samples that show common pipeline configurations, see [Pipeline option patterns](https://beam.apache.org/documentation/patterns/pipeline-options/).
 
diff --git a/learning/prompts/code-generation/05_io_spanner.md b/learning/prompts/code-generation/05_io_spanner.md
index 24453ff01be..991347a7ef3 100644
--- a/learning/prompts/code-generation/05_io_spanner.md
+++ b/learning/prompts/code-generation/05_io_spanner.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from Cloud Spanner using Apache Beam.
 
 Response:
-You can read data from [Cloud Spanner](https://cloud.google.com/spanner) using the Apache Beam [SpannerIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.spanner.html). The SpannerIO connector lets you read data from and write data to Cloud Spanner. Cloud Spanner is supported in both batch pipelines and streaming pipelines.
+Your Apache Beam pipeline can read data from [Cloud Spanner](https://cloud.google.com/spanner) using the Apache Beam [SpannerIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.spanner.html). The SpannerIO connector lets you read data from and write data to Cloud Spanner. Cloud Spanner is supported in both batch pipelines and streaming pipelines.
 
 ```python
 import logging
@@ -55,9 +55,10 @@ with beam.Pipeline(options=options) as p:
 The `ReadFromSpanner` transform is a built-in Apache Beam transform that reads data from a Cloud Spanner table. The `ReadFromSpanner` transform returns a `PCollection` of `NamedTuple` objects. The `NamedTuple` object is a Python class that represents a row in a Cloud Spanner table.
 
 Registering a coder for `NamedTuple` is required to use `NamedTuple` as a row type:
+
 ```python
  coders.registry.register_coder(ExampleRow, coders.RowCoder)
 ```
 For more information about how to register a coder for a custom type, see [Data encoding and type safety](https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety).
 
-The `SpannerOptions` class defines the command-line arguments `project_id`, `instance_id` and `database_id`, which are used to configure the `ReadFromSpanner` transform. These arguments are parsed from the command line using [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/).
\ No newline at end of file
+The `SpannerOptions` class defines the command-line arguments `project_id`, `instance_id`, and `database_id`, which are used to configure the `ReadFromSpanner` transform. These arguments are parsed from the command line using [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/).
diff --git a/learning/prompts/code-generation/06_io_tfrecord.md b/learning/prompts/code-generation/06_io_tfrecord.md
index 7f36540b97b..873d75d759c 100644
--- a/learning/prompts/code-generation/06_io_tfrecord.md
+++ b/learning/prompts/code-generation/06_io_tfrecord.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from a TensorFlow TFRecord file using Apache Beam.
 
 Response:
-You can read data from a TensorFlow [TFRecord file](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset) using the Apache Beam [TFRecordIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html). The TFRecordIO connector lets you read and write data in `TFRecord` format.
+Your Apache Beam pipeline can read data from a TensorFlow [TFRecord file](https://www.tensorflow.org/api_docs/python/tf/data/TFRecordDataset) using the Apache Beam [TFRecordIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html). The TFRecordIO connector lets you read and write data in `TFRecord` format.
 
 ```python
 import logging
@@ -55,6 +55,6 @@ if __name__ == "__main__":
 
 The `TFRecordOptions` class defines the command-line argument `file_pattern`, which is used to configure the `ReadFromTFRecord` transform. This argument is parsed from the command line using [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/).
 
-The `ReadFromTFRecord` transform is a built-in Apache Beam transform that reads data from a `TFRecord` file. By default, `TFRecordIO` transforms use `coders.BytesCoder()`. For more information, see [ReadFromTFRecord transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html#apache_beam.io.tfrecordio.ReadFromTFRecord).
+The `ReadFromTFRecord` transform is a built-in Apache Beam transform that reads data from a `TFRecord` file. By default, `TFRecordIO` transforms use `coders.BytesCoder()`. For more information, see [`ReadFromTFRecord` transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html#apache_beam.io.tfrecordio.ReadFromTFRecord).
 
 
diff --git a/learning/prompts/code-generation/07_io_json.md b/learning/prompts/code-generation/07_io_json.md
index 6c142b49c45..6abe29e94ab 100644
--- a/learning/prompts/code-generation/07_io_json.md
+++ b/learning/prompts/code-generation/07_io_json.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from JSON files using Apache Beam.
 
 Response:
-You can read data from [JSON files](https://www.json.org/) using the Apache Beam [TextIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html).
+Your Apache Beam pipeline can read data from [JSON files](https://www.json.org/) using the Apache Beam [TextIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html).
 
 ```python
 import logging
@@ -42,7 +42,8 @@ if __name__ == "__main__":
     logging.getLogger().setLevel(logging.INFO)
     run()
 ```
-The `JsonOptions` class is a custom class that extends the `PipelineOptions` object. It is used to parse pipeline options from command-line arguments. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
 
-The [ReadFromJson](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromJson) transform is a built-in Apache Beam transform that reads data from a JSON file.
+The `JsonOptions` class is a custom class that extends the `PipelineOptions` object. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
+
+The [`ReadFromJson`](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromJson) transform is a built-in Apache Beam transform that reads data from a JSON file.
 
diff --git a/learning/prompts/code-generation/08_io_csv.md b/learning/prompts/code-generation/08_io_csv.md
index 3888e15bb70..7db9f0bf2af 100644
--- a/learning/prompts/code-generation/08_io_csv.md
+++ b/learning/prompts/code-generation/08_io_csv.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from CSV files using Apache Beam.
 
 Response:
-You can read data from [CSV files](https://en.wikipedia.org/wiki/Comma-separated_values) using the Apache Beam [TextIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html).
+Your Apache Beam pipeline can read data from [CSV files](https://en.wikipedia.org/wiki/Comma-separated_values) using the [TextIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html).
 
 ```python
 import logging
@@ -42,6 +42,6 @@ if __name__ == "__main__":
     run()
 ```
 
-The `CsvOptions` class is a custom class that extends the `PipelineOptions` object. It is used to parse pipeline options from command-line arguments. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
+The `CsvOptions` class is a custom class that extends the `PipelineOptions` object. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
 
-The [ReadFromCsv transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromCsv) is a built-in Apache Beam transform that reads data from a CSV file.
\ No newline at end of file
+The [`ReadFromCsv` transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html#apache_beam.io.textio.ReadFromCsv) is a built-in Apache Beam transform that reads data from a CSV file.
diff --git a/learning/prompts/code-generation/09_io_avro.md b/learning/prompts/code-generation/09_io_avro.md
index 2ccf5427822..5721bf3c301 100644
--- a/learning/prompts/code-generation/09_io_avro.md
+++ b/learning/prompts/code-generation/09_io_avro.md
@@ -2,7 +2,7 @@ Prompt:
 Write the Python code to read data from Apache Avro files using Apache Beam.
 
 Response:
-You can read data from [Apache Avro](https://avro.apache.org/) files using the Apache Beam [AvroIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html).
+Your Apache Beam pipeline can read data from [Apache Avro](https://avro.apache.org/) files using the [AvroIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html).
 
 ```python
 import logging
@@ -35,8 +35,9 @@ if __name__ == "__main__":
   logging.getLogger().setLevel(logging.INFO)
   run()
 ```
+
 The `ReadAvroOptions` class is used to define the command-line argument `--path`, which specifies the path to the Avro file. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) for the required `path` argument.
 
-A Beam pipeline is created using the `ReadAvroOptions` class and the [ReadFromAvro transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html#apache_beam.io.avroio.ReadFromAvro) is used to read data from the Avro file.
+The code creates an Apache Beam pipeline using the `ReadAvroOptions` class to set the Avro file path and the [`ReadFromAvro` transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html#apache_beam.io.avroio.ReadFromAvro) to read data from the file.
 
 
diff --git a/learning/prompts/code-generation/10_io_parquet.md b/learning/prompts/code-generation/10_io_parquet.md
index 5493c5e14d5..d84f9f3e74f 100644
--- a/learning/prompts/code-generation/10_io_parquet.md
+++ b/learning/prompts/code-generation/10_io_parquet.md
@@ -2,7 +2,8 @@ Prompt:
 Write the Python code to read data from Apache Parquet files using Apache Beam.
 
 Response:
-You can read data from [Apache Parquet](https://parquet.apache.org/) files using the Apache Beam [ParquetIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html).
+Your Apache Beam pipeline can read data from [Apache Parquet](https://parquet.apache.org/) files using the [ParquetIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html).
+
 ```python
 import logging
 
@@ -34,6 +35,7 @@ if __name__ == "__main__":
   logging.getLogger().setLevel(logging.INFO)
   run()
 ```
+
 The `ReadParquetOptions` class is used to define the command-line argument `--path`, which specifies the path to the Parquet file. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) for the required `path` argument.
 
-A Beam pipeline is created using the `ReadParquetOptions` class and the [ReadFromParquet transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html#apache_beam.io.parquetio.ReadFromParquet) is used to read data from the Parquet file.
+The code creates an Apache Beam pipeline using the `ReadParquetOptions` class to set the Parquet file path and the [`ReadFromParquet` transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html#apache_beam.io.parquetio.ReadFromParquet) to read data from the file.
diff --git a/learning/prompts/code-generation/11_io_jdbc.md b/learning/prompts/code-generation/11_io_jdbc.md
index 9d8a98d6e64..b833e90eeef 100644
--- a/learning/prompts/code-generation/11_io_jdbc.md
+++ b/learning/prompts/code-generation/11_io_jdbc.md
@@ -2,7 +2,8 @@ Prompt:
 Write the Python code to read data from Oracle JDBC using Apache Beam.
 
 Response:
-You can read data from an Oracle Java Database Connectivity (JDBC) enabled database using the Apache Beam [JdbcIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html).
+Your Apache Beam pipeline can read data from an Oracle Java Database Connectivity (JDBC) enabled database using the [JdbcIO connector](https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html).
+
 ```python
 import logging
 import apache_beam as beam
@@ -70,6 +71,6 @@ if __name__ == "__main__":
     run()
 ```
 
-The `JdbcOptions` class is a custom class that extends the `PipelineOptions` object. It is used to parse pipeline options from command-line arguments. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/).
+The `JdbcOptions` class is a custom class that extends the `PipelineOptions` object. This code uses [pipeline options](https://beam.apache.org/documentation/patterns/pipeline-options/) to parse command-line arguments.
 
-A Beam pipeline is created using the `JdbcOptions` class and the [ReadFromJdbc transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html#apache_beam.io.jdbc.ReadFromJdbc) is used to read data from the database via JDBC.
+The code creates an Apache Beam pipeline using the `JdbcOptions` class to set the JDBC connection information and the [`ReadFromJdbc` transform](https://beam.apache.org/releases/pydoc/current/apache_beam.io.jdbc.html#apache_beam.io.jdbc.ReadFromJdbc) to read data from the database via JDBC.
diff --git a/learning/prompts/documentation-lookup-nolinks/32_data_encoding.md b/learning/prompts/documentation-lookup-nolinks/32_data_encoding.md
new file mode 100644
index 00000000000..34617ac965e
--- /dev/null
+++ b/learning/prompts/documentation-lookup-nolinks/32_data_encoding.md
@@ -0,0 +1,15 @@
+Prompt:
+How does data encoding work in the Apache Beam model and why is it needed?
+
+Response:
+In Apache Beam, data encoding is a fundamental aspect of processing data in distributed environments. The model operates on the principle of encoding each element of a `PCollection` as a byte string.
+
+Data encoding is necessary to pass elements efficiently between distributed workers during the execution of an Apache Beam pipeline. Apache Beam is designed for distributed processing, where a pipeline may be executed on multiple VMs (workers) concurrently. During pipeline execution, workers collaborate to process data and often need to materialize the intermediate data in your `PCollection`. Communication between workers involves the exchange of encoded data over a network. The process  [...]
+
+Apache Beam offers a data encoding mechanism with built-in support for commonly used types, as well as the flexibility to specify custom encodings.
+
+The Beam SDKs use `Coder` objects to define how elements in a `PCollection` are encoded and decoded. These objects ensure type consistency and safety, guaranteeing that the encoding and decoding processes correctly interpret data based on its original data types.
+
+Every `PCollection` in an Apache Beam pipeline requires a corresponding `Coder`. In most cases, the Apache Beam SDKs can automatically infer a `Coder` for a `PCollection` based on its element type or the transform that produces it. However, in some cases, the pipeline author will need to specify a `Coder` explicitly or develop a `Coder` for their custom type.
+
+Apache Beam comes with many built-in `Coder` objects that work with a variety of standard data types. Users can also build custom coders or register a new default coder for a given type using the `CoderRegistry` object in their pipeline.
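
As a rough illustration of the encoding step described in 32_data_encoding.md, the following stdlib-only Python sketch round-trips elements through byte strings, the way elements are encoded before they cross a worker boundary. `ToyStrUtf8Coder`, `ToyVarIntCoder`, and `send_between_workers` are hypothetical names invented for this note and are not Apache Beam classes. The variable-length integer layout is one common scheme and is only assumed to resemble what a real coder does.

```python
# Hypothetical, stdlib-only illustration; these classes are not part of Apache Beam.


class ToyStrUtf8Coder:
    """Encodes str elements as UTF-8 byte strings."""

    def encode(self, value: str) -> bytes:
        return value.encode("utf-8")

    def decode(self, data: bytes) -> str:
        return data.decode("utf-8")


class ToyVarIntCoder:
    """Encodes non-negative ints as variable-length byte strings (7 bits per byte)."""

    def encode(self, value: int) -> bytes:
        if value < 0:
            raise ValueError("this sketch handles non-negative ints only")
        out = bytearray()
        while True:
            low_bits = value & 0x7F
            value >>= 7
            if value:
                out.append(low_bits | 0x80)  # high bit set: more bytes follow
            else:
                out.append(low_bits)
                return bytes(out)

    def decode(self, data: bytes) -> int:
        result = 0
        for position, byte in enumerate(data):
            result |= (byte & 0x7F) << (7 * position)
        return result


def send_between_workers(elements, coder):
    """Simulates a network hop: encode on one worker, decode on another."""
    wire = [coder.encode(element) for element in elements]  # bytes on the network
    return [coder.decode(payload) for payload in wire]  # elements on the receiver


received_words = send_between_workers(["beam", "héllo"], ToyStrUtf8Coder())
received_counts = send_between_workers([0, 1, 300], ToyVarIntCoder())
```

In a real pipeline you would normally rely on the coders shipped in `apache_beam.coders` rather than hand-written ones like these.
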
diff --git a/learning/prompts/documentation-lookup-nolinks/33_coders_data_encoding.md b/learning/prompts/documentation-lookup-nolinks/33_coders_data_encoding.md
new file mode 100644
index 00000000000..72d6cbba987
--- /dev/null
+++ b/learning/prompts/documentation-lookup-nolinks/33_coders_data_encoding.md
@@ -0,0 +1,35 @@
+Prompt:
+What is a coder in Apache Beam?
+
+Response:
+In Apache Beam, a `Coder` is an object that specifies how elements in a given `PCollection` are encoded into byte strings and decoded back. `Coder` objects provide type consistency and safety during distributed processing between workers, ensuring that data is interpreted correctly based on its original types.
+
+Every `PCollection` in an Apache Beam pipeline requires a corresponding `Coder`. While the Apache Beam SDKs can automatically infer a `Coder` for a `PCollection` based on its element type or the producing transform, there are cases where explicit specification or custom `Coder` development is necessary. It is important to note that multiple `Coder` objects can exist for a single data type.
+
+The Apache Beam SDKs use various mechanisms to automatically infer the `Coder` for a `PCollection`. Each pipeline object contains a `CoderRegistry` object representing a mapping of language types to the default coder for those types.
+
+In the Apache Beam SDKs for Python and Java, the `Coder` type provides the necessary methods for encoding and decoding data. The SDKs offer various `Coder` subclasses that work with standard Python and Java types, available in the `apache_beam.coders` package for Python and the `org.apache.beam.sdk.coders` package for Java.
+
+By default, the Beam SDKs use the type hints (Python) or the type parameters (Java) from the transform's function object (such as `DoFn`) to infer the `Coder` for elements in a `PCollection`. For example, in the Apache Beam Python SDK, a `ParDo` annotated with the type hints `@beam.typehints.with_input_types(int)` and `@beam.typehints.with_output_types(str)` indicates that it accepts `int` inputs and produces `str` outputs. The Python SDK automatically infers the default `Coder` for the output [...]
+
+You can use the `CoderRegistry` to look up the default coder for a given type or to register a new default coder. The following tables show the default coders for standard types in any pipeline created with the Beam SDKs for Java and Python:
+
+| **Java Type** | **Default Coder** |
+|---------------|-------------------|
+| Double        | DoubleCoder       |
+| Instant       | InstantCoder      |
+| Integer       | VarIntCoder       |
+| Iterable      | IterableCoder     |
+| KV            | KvCoder           |
+| List          | ListCoder         |
+| Map           | MapCoder          |
+| Long          | VarLongCoder      |
+| String        | StringUtf8Coder   |
+
+| **Python Type** | **Default Coder** |
+|-----------------|-------------------|
+| int             | VarIntCoder       |
+| float           | FloatCoder        |
+| str             | StrUtf8Coder      |
+| bytes           | BytesCoder        |
+| Tuple           | TupleCoder        |
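A minimal sketch of the point made in the file above that more than one coder can exist for a single data type. It assumes two toy classes, `FixedWidthIntCoder` and `DecimalTextIntCoder`, which are hypothetical and not part of Beam; they only mimic the `encode`/`decode` shape of a `Coder` using the standard library and do not reproduce Beam's actual wire formats.

```python
import struct


class FixedWidthIntCoder:
    """Hypothetical coder: 8-byte big-endian signed integers."""

    def encode(self, value: int) -> bytes:
        return struct.pack(">q", value)

    def decode(self, encoded: bytes) -> int:
        return struct.unpack(">q", encoded)[0]


class DecimalTextIntCoder:
    """Hypothetical coder: the integer's decimal digits as UTF-8 text."""

    def encode(self, value: int) -> bytes:
        return str(value).encode("utf-8")

    def decode(self, encoded: bytes) -> int:
        return int(encoded.decode("utf-8"))


fixed = FixedWidthIntCoder()
text = DecimalTextIntCoder()

# Same value, two different byte representations.
fixed_bytes = fixed.encode(42)
text_bytes = text.encode(42)

# Each coder recovers the original value from its own encoding.
round_trip_fixed = fixed.decode(fixed_bytes)
round_trip_text = text.decode(text_bytes)
```

Both classes are valid encodings for `int`, which is why a registry has to record which one is the default for a type.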
diff --git a/learning/prompts/documentation-lookup-nolinks/34_change_coders_data_encoding.md b/learning/prompts/documentation-lookup-nolinks/34_change_coders_data_encoding.md
new file mode 100644
index 00000000000..3c3c4915630
--- /dev/null
+++ b/learning/prompts/documentation-lookup-nolinks/34_change_coders_data_encoding.md
@@ -0,0 +1,103 @@
+Prompt:
+How to get, set, and create new coders for my Apache Beam pipeline?
+
+Response:
+Every `PCollection` in an Apache Beam pipeline requires a corresponding `Coder`. In most cases, the Apache Beam SDKs can automatically infer a `Coder` for a `PCollection` based on its element type or the producing transform. However, in some instances, you may need to explicitly set a `Coder` or create a custom `Coder`.
+
+In the Apache Beam SDKs for Python and Java, the `Coder` type provides the necessary methods for encoding and decoding data. To get, set, or register a coder for a particular pipeline, you can access and modify the pipeline’s `CoderRegistry` object.
+
+The following examples demonstrate how to get, set, and create a new `Coder` in an Apache Beam pipeline using the Python and Java SDKs.
+
+**Python SDK:**
+
+In the Python SDK, you can use the following:
+* `coders.registry`: the module-level `CoderRegistry` object that the pipeline uses.
+* `CoderRegistry.get_coder`: retrieves the default `Coder` for a type.
+* `CoderRegistry.register_coder`: sets a new default `Coder` for the target type.
+
+Here is an example illustrating how to set the default `Coder` in the Python SDK:
+
+```python
+import apache_beam  # BigEndianIntegerCoder must already be defined or imported
+apache_beam.coders.registry.register_coder(int, BigEndianIntegerCoder)
+```
+
+This example registers `BigEndianIntegerCoder` as the default `Coder` for `int` values in the pipeline.
+
+For custom or complex nested data types, you can implement a custom coder for your pipeline. To create a new `Coder`, you need to define a class that inherits from `Coder` and implement the required methods:
+* `encode`: takes input values and encodes them into byte strings.
+* `decode`: decodes the encoded byte string into its corresponding object.
+* `is_deterministic`: specifies whether this coder encodes values deterministically or not. A deterministic coder produces the same encoded representation of a given object every time, even if it is called on different workers at different moments. The method returns `True` or `False` based on your implementation.
+
+Here is an example of a custom `Coder` implementation in the Python SDK:
+
+```python
+from apache_beam.coders import Coder
+
+class CustomCoder(Coder):
+    def encode(self, value):
+        # Implementation for encoding 'value' into byte strings
+        pass
+
+    def decode(self, encoded):
+        # Implementation for decoding byte strings into the original object
+        pass
+
+    def is_deterministic(self):
+        # Specify whether this coder produces deterministic encodings
+        return True  # or False based on your implementation
+```
+
+**Java SDK:**
+
+In the Java SDK, you can use the following methods:
+* `Pipeline.getCoderRegistry`: retrieves the pipeline’s `CoderRegistry` object.
+* `getCoder`: retrieves the coder for an existing `PCollection`.
+* `CoderRegistry.getCoder`: retrieves the default `Coder` for a type.
+* `CoderRegistry.registerCoderForClass`: sets a new default `Coder` for the target type.
+
+Here is an example of how you can set the default `Coder` in the Java SDK:
+
+```java
+PipelineOptions options = PipelineOptionsFactory.create();
+Pipeline p = Pipeline.create(options);
+
+CoderRegistry cr = p.getCoderRegistry();
+cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
+```
+
+In this example, you use the method `CoderRegistry.registerCoderForClass` to register `BigEndianIntegerCoder` as the default coder for the target `Integer` type.
+
+For custom or complex nested data types, you can implement a custom coder for your pipeline. For this, the `Coder` class exposes the following key methods:
+* `encode`: takes input values and encodes them into byte strings.
+* `decode`: decodes the encoded byte string into its corresponding object.
+* `verifyDeterministic`: verifies that this coder produces deterministic encodings. A deterministic coder produces the same encoded representation of a given object every time, even if it is called on different workers at different moments. The method throws `NonDeterministicException` if the coder is not deterministic.
+
+Here’s an example of a custom `Coder` implementation in the Java SDK:
+
+```java
+import org.apache.beam.sdk.coders.Coder;
+import org.apache.beam.sdk.coders.CoderException;
+import org.apache.beam.sdk.coders.StructuredCoder;
+import java.io.IOException;
+import java.io.InputStream;
+import java.io.OutputStream;
+import java.util.Collections;
+import java.util.List;
+
+public class CustomCoder extends StructuredCoder<YourType> {
+    @Override
+    public void encode(YourType value, OutputStream outStream) throws CoderException, IOException {
+        // Implementation for encoding 'value' into byte strings
+    }
+
+    @Override
+    public YourType decode(InputStream inStream) throws CoderException, IOException {
+        // Implementation for decoding byte strings into the original object
+    }
+
+    @Override
+    public List<? extends Coder<?>> getCoderArguments() {
+        // This coder has no component coders
+        return Collections.emptyList();
+    }
+
+    @Override
+    public void verifyDeterministic() throws NonDeterministicException {
+        // Return normally if this coder produces deterministic encodings
+        // Throw NonDeterministicException if not deterministic
+    }
+}
+```
+
+Replace `YourType` with the actual type for which you want to create a new `Coder`, and implement the necessary methods based on your encoding/decoding logic.
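A minimal sketch of how the Python `CustomCoder` skeleton in the file above could be filled in, assuming the elements are JSON-serializable dicts with string keys. The class name `JsonDictCoder` is hypothetical, and the `try`/`except` stand-in for `Coder` is only there so the sketch also runs where Beam is not installed.

```python
import json

try:
    from apache_beam.coders import Coder
except ImportError:
    # Stand-in base class so the sketch runs without Beam installed.
    class Coder:
        pass


class JsonDictCoder(Coder):
    """Hypothetical coder for JSON-serializable dicts with string keys."""

    def encode(self, value):
        # Sorted keys and fixed separators give one stable byte string per dict.
        return json.dumps(value, sort_keys=True, separators=(",", ":")).encode("utf-8")

    def decode(self, encoded):
        return json.loads(encoded.decode("utf-8"))

    def is_deterministic(self):
        # Equal dicts always serialize to the same bytes with the settings above.
        return True


coder = JsonDictCoder()
encoded = coder.encode({"b": 2, "a": 1})
decoded = coder.decode(encoded)
```

Sorting the keys is the design choice that makes returning `True` from `is_deterministic` reasonable here; without it, two equal dicts built in a different insertion order could encode to different bytes.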