Posted to github@beam.apache.org by "dariabezkorovaina (via GitHub)" <gi...@apache.org> on 2024/02/06 11:15:36 UTC

[PR] Data encoding prompts for Duet AI [beam]

dariabezkorovaina opened a new pull request, #30231:
URL: https://github.com/apache/beam/pull/30231

   Adding 3 'golden' prompts about data encoding and type safety for Duet AI. 
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] [OLD - with links] Data encoding prompts for Duet AI [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] closed pull request #30231: [OLD - with links] Data encoding prompts for Duet AI
URL: https://github.com/apache/beam/pull/30231




Re: [PR] [OLD - with links] Data encoding prompts for Duet AI [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30231:
URL: https://github.com/apache/beam/pull/30231#issuecomment-2094149505

   This pull request has been closed due to lack of activity. If you think that is incorrect, or the pull request requires review, you can revive the PR at any time.




Re: [PR] Data encoding prompts for Duet AI [beam]

Posted by "dariabezkorovaina (via GitHub)" <gi...@apache.org>.
dariabezkorovaina commented on code in PR #30231:
URL: https://github.com/apache/beam/pull/30231#discussion_r1479609626


##########
learning/prompts/documentation-lookup/34_change_coders_data_encoding.md:
##########
@@ -0,0 +1,62 @@
+Prompt: 
+How to get, set, and create new coders for my Apache Beam pipeline?
+
+Response: 
+Every ['PCollection'](https://beam.apache.org/documentation/basics/#pcollection) in an Apache Beam pipeline requires a corresponding ['Coder'](https://beam.apache.org/documentation/programming-guide/#specifying-coders). In most cases, the Beam SDKs can automatically infer a 'Coder' for a 'PCollection' based on its element type or the transform that produces it. However, in some instances, you may need to explicitly set a 'Coder' or create a custom 'Coder'.
+
+In the Apache Beam SDKs for Python and Java, the 'Coder' type provides the necessary methods for encoding and decoding data. To get, set, or register a coder for a particular pipeline, you can access and modify the pipeline’s 'CoderRegistry' object.
+
+The examples below demonstrate how to get, set, and create new coders in Apache Beam pipelines using the Python and Java SDKs.
+
+**Python SDK:**
+
+In the Python SDK, you can use the following methods:
+* To retrieve the pipeline’s 'CoderRegistry' object, use 'coders.registry'.
+* To get the default 'Coder' for a type, use 'CoderRegistry.get_coder'.
+* To set a new default 'Coder' for the target type, use 'CoderRegistry.register_coder'.
+
+Here is an example illustrating how to set the default 'Coder' in the Python SDK:
+
+```python
+apache_beam.coders.registry.register_coder(int, BigEndianIntegerCoder)
+```
+
+This example registers 'BigEndianIntegerCoder' as the default 'Coder' for 'int' values in the pipeline.
+
+For custom or complex nested data types, you can implement a custom coder for your pipeline. To create a new 'Coder', you need to define a class that inherits from 'Coder' and implements the required methods:
+* The 'encode' method takes an input value and encodes it into a byte string.
+* The 'decode' method decodes the encoded byte string into its corresponding object.
+* The 'is_deterministic' method (optional) specifies whether this coder encodes values deterministically. A deterministic coder produces the same encoded representation of a given object every time, even if it is called on different workers at different moments. The method returns 'True' or 'False' based on your implementation.
+
+Here’s an example of a custom 'Coder' implementation in the Python SDK:
+https://towardsdatascience.com/data-pipelines-with-apache-beam-86cd8eb55fd8
+
+**Java SDK:**
+
+In the Java SDK, you can use the following methods:
+* To retrieve the pipeline’s 'CoderRegistry' object, use 'Pipeline.getCoderRegistry'.
+* To get the coder for an existing 'PCollection', use 'PCollection.getCoder'.
+* To get the default 'Coder' for a type, use 'CoderRegistry.getCoder'.
+* To set a new default 'Coder' for the target type, use 'CoderRegistry.registerCoderForClass'.
+
+Here is an example of how you can set the default 'Coder' in the Java SDK:
+
+```java
+PipelineOptions options = PipelineOptionsFactory.create();
+Pipeline p = Pipeline.create(options);
+
+CoderRegistry cr = p.getCoderRegistry();
+cr.registerCoderForClass(Integer.class, BigEndianIntegerCoder.of());
+```
+In this example, you use the method 'CoderRegistry.registerCoderForClass' to register 'BigEndianIntegerCoder' for the target 'Integer' type.
+
+For custom or complex nested data types, you can implement a custom coder for your pipeline. For this, the 'Coder' class exposes the following key methods:
+* The 'encode' method takes an input value and encodes it into bytes.
+* The 'decode' method decodes the encoded bytes into the corresponding object.
+* The 'verifyDeterministic' method (optional) specifies whether this coder produces deterministic encodings. A deterministic coder produces the same encoded representation of a given object every time, even if it is called on different workers at different moments. The method throws 'NonDeterministicException' if the coder is not deterministic.
+
+Here’s an example of a custom 'Coder' implementation in the Java SDK:
+https://www.waitingforcode.com/apache-beam/coders-apache-beam/read
+
+For more details about working with coders, you can refer to the [Apache Beam documentation on data encoding and type safety](https://beam.apache.org/documentation/programming-guide/#data-encoding-and-type-safety). 
+

Review Comment:
   ```suggestion
   ```
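
As a rough illustration of the custom coder contract described in the Python section of the file above ('encode', 'decode', and the optional 'is_deterministic'), a minimal sketch might look like the following. The `Point` class, the `PointCoder` class, and the JSON wire format are hypothetical and are not part of this PR; the import fallback is an assumption added only so the snippet also runs where Beam is not installed.

```python
import json

try:
    # Real base class when the Beam Python SDK is installed.
    from apache_beam.coders import Coder
except ImportError:
    # Assumption: fall back to a plain object so the sketch still runs without Beam.
    Coder = object


class Point:
    """Hypothetical element type used only for this illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


class PointCoder(Coder):
    """Hypothetical coder that stores a Point as UTF-8 JSON with sorted keys."""

    def encode(self, value):
        # Turn the object into a byte string.
        payload = {"x": value.x, "y": value.y}
        return json.dumps(payload, sort_keys=True).encode("utf-8")

    def decode(self, encoded):
        # Rebuild the object from the byte string produced by encode().
        data = json.loads(encoded.decode("utf-8"))
        return Point(data["x"], data["y"])

    def is_deterministic(self):
        # Sorted keys and integer fields give the same bytes for equal points.
        return True


coder = PointCoder()
restored = coder.decode(coder.encode(Point(1, 2)))
print(restored.x, restored.y)
```

With Beam installed, such a coder could then be made the default for its type with `apache_beam.coders.registry.register_coder(Point, PointCoder)`, mirroring the `register_coder` call shown in the file.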





Re: [PR] [OLD - with links] Data encoding prompts for Duet AI [beam]

Posted by "dariabezkorovaina (via GitHub)" <gi...@apache.org>.
dariabezkorovaina commented on PR #30231:
URL: https://github.com/apache/beam/pull/30231#issuecomment-1964719479

   This PR is an old version 




Re: [PR] [OLD - with links] Data encoding prompts for Duet AI [beam]

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #30231:
URL: https://github.com/apache/beam/pull/30231#issuecomment-2080559953

   This pull request has been marked as stale due to 60 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@beam.apache.org list. Thank you for your contributions.

