Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/06/26 07:56:36 UTC

[GitHub] [beam] AbhiY98 opened a new pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

AbhiY98 opened a new pull request #12097:
URL: https://github.com/apache/beam/pull/12097


   [[BEAM-10327](https://issues.apache.org/jira/browse/BEAM-10327)] Create a pattern showing use of Schema
   
   ------------------------
   
   Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:
   
    - [x] [**Choose reviewer(s)**](https://beam.apache.org/contribute/#make-your-change) and mention them in a comment (`R: @username`).
    - [x] Format the pull request title like `[BEAM-XXX] Fixes bug in ApproximateQuantiles`, where you replace `BEAM-XXX` with the appropriate JIRA issue, if applicable. This will automatically link the pull request to the issue.
    - [ ] Update `CHANGES.md` with noteworthy changes.
    - [ ] If this contribution is large, please file an Apache [Individual Contributor License Agreement](https://www.apache.org/licenses/icla.pdf).
   
   See the [Contributor Guide](https://beam.apache.org/contribute) for more tips on [how to make review process smoother](https://beam.apache.org/contribute/#make-reviewers-job-easier).
   
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] rezarokni commented on pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
rezarokni commented on pull request #12097:
URL: https://github.com/apache/beam/pull/12097#issuecomment-650895214


   @reuvenlax FYI, this is work to add a pattern for schema joins to the patterns section of the Beam docs.





[GitHub] [beam] abhiy13 commented on a change in pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
abhiy13 commented on a change in pull request #12097:
URL: https://github.com/apache/beam/pull/12097#discussion_r446963284



##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,56 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page describe common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure. Schemas are useful because Beam sources commonly produce JSON, Avro or database row objects all of which have a well-defined structure. 

Review comment:
       Done!







[GitHub] [beam] AbhiY98 commented on pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
AbhiY98 commented on pull request #12097:
URL: https://github.com/apache/beam/pull/12097#issuecomment-650045436


   R: @tvalentyn @rezarokni 
   Staged: http://apache-beam-website-pull-requests.storage.googleapis.com/12097/index.html





[GitHub] [beam] tvalentyn commented on a change in pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
tvalentyn commented on a change in pull request #12097:
URL: https://github.com/apache/beam/pull/12097#discussion_r446327362



##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 

Review comment:
       How about "The samples on this page describe common patterns using Schemas."
   
   

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 

Review comment:
       A Schema is a way to represent records with a fixed structure. Schemas are useful because Beam sources commonly produce JSON, Avro or database row objects all of which have a well-defined structure.

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 
+For more information, see the [programming guide section on Schemas](/documentation/programming-guide/#what-is-a-schema).
+
+{{< language-switcher java >}}
+
+## Using Joins
+
+Beam supports equijoins on schema `PCollections` of Schemas where the join condition depends on the equality of a subset of fields. 
+
+Consider using Join if you have multiple data sets that provide information about related things and their structure is known.
+
+For example let's say we have two different files with user data: one file has names and email addresses; the other file has names and phone numbers.

Review comment:
       Should we use consistent terminology here: 'collection' or 'dataset' instead of several terms: file, collection, dataset, data set?

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 

Review comment:
       Actually, let me add @rosetn and @davidwrede, who can give better guidance on the style of the website narrative.

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 

Review comment:
       Might be easier to read if we break up this sentence.

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 
+For more information, see the [programming guide section on Schemas](/documentation/programming-guide/#what-is-a-schema).
+
+{{< language-switcher java >}}
+
+## Using Joins
+
+Beam supports equijoins on schema `PCollections` of Schemas where the join condition depends on the equality of a subset of fields. 
+
+Consider using Join if you have multiple data sets that provide information about related things and their structure is known.

Review comment:
       s/Join/[`Join`](https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/schemas/transforms/Join.html)
   (Adding link + ticks)

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 
+For more information, see the [programming guide section on Schemas](/documentation/programming-guide/#what-is-a-schema).
+
+{{< language-switcher java >}}
+
+## Using Joins
+
+Beam supports equijoins on schema `PCollections` of Schemas where the join condition depends on the equality of a subset of fields. 
+
+Consider using Join if you have multiple data sets that provide information about related things and their structure is known.
+
+For example let's say we have two different files with user data: one file has names and email addresses; the other file has names and phone numbers.
+You can join the two data sets using the name as a common key and the other data as the associated values.
+After the join, you have one dataset that contains all the information (email address and phone numbers) associated with each name.
+
+The following conceptual examples uses two input collections to show the mechanism of Join.

Review comment:
       s/examples/example

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 
+For more information, see the [programming guide section on Schemas](/documentation/programming-guide/#what-is-a-schema).
+
+{{< language-switcher java >}}
+
+## Using Joins
+
+Beam supports equijoins on schema `PCollections` of Schemas where the join condition depends on the equality of a subset of fields. 
+
+Consider using Join if you have multiple data sets that provide information about related things and their structure is known.
+
+For example let's say we have two different files with user data: one file has names and email addresses; the other file has names and phone numbers.
+You can join the two data sets using the name as a common key and the other data as the associated values.
+After the join, you have one dataset that contains all the information (email address and phone numbers) associated with each name.
+
+The following conceptual examples uses two input collections to show the mechanism of Join.
+
+You can define the Schema and the schema `PCollection` and then perform join on the two `PCollections` using a [Join](https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/schemas/transforms/Join.html). 

Review comment:
       How about: "We define PCollections and their schemas, and then perform..."

##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,49 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page show you common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure, they are useful as common beam sources produce JSON, Avro or database row objects all of which have a well defined structure. 
+For more information, see the [programming guide section on Schemas](/documentation/programming-guide/#what-is-a-schema).
+
+{{< language-switcher java >}}
+
+## Using Joins
+
+Beam supports equijoins on schema `PCollections` of Schemas where the join condition depends on the equality of a subset of fields. 
+
+Consider using Join if you have multiple data sets that provide information about related things and their structure is known.
+
+For example let's say we have two different files with user data: one file has names and email addresses; the other file has names and phone numbers.
+You can join the two data sets using the name as a common key and the other data as the associated values.
+After the join, you have one dataset that contains all the information (email address and phone numbers) associated with each name.
+
+The following conceptual examples uses two input collections to show the mechanism of Join.
+
+You can define the Schema and the schema `PCollection` and then perform join on the two `PCollections` using a [Join](https://beam.apache.org/releases/javadoc/2.21.0/org/apache/beam/sdk/schemas/transforms/Join.html). 

Review comment:
       Nit: let's add ticks around Join since it refers to a code statement `Join`. 







[GitHub] [beam] rezarokni commented on a change in pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
rezarokni commented on a change in pull request #12097:
URL: https://github.com/apache/beam/pull/12097#discussion_r446763798



##########
File path: website/www/site/content/en/documentation/patterns/schema.md
##########
@@ -0,0 +1,56 @@
+---
+title: "Schema Patterns"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Schema Patterns
+
+The samples on this page describe common patterns using Schemas. 
+A Schema is a way to represent records with a fixed structure. Schemas are useful because Beam sources commonly produce JSON, Avro or database row objects all of which have a well-defined structure. 

Review comment:
       Maybe just borrow the text from:
   https://beam.apache.org/documentation/programming-guide/#what-is-a-schema
   
   Schemas provide us a type-system for Beam records that is independent of any specific programming-language type. There might be multiple Java classes that all have the same schema (for example a Protocol-Buffer class or a POJO class), and Beam will allow us to seamlessly convert between these types. Schemas also provide a simple way to reason about types across different programming-language APIs.
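
To make the "multiple Java classes, one schema" point concrete, here is a conceptual plain-Java sketch. It does not use the Beam API: the class names `EmailPojo` and `EmailBean`, the helpers `toRow` and `fromRow`, and the use of a `Map` as a stand-in for a row are all assumptions made for illustration. Beam infers schemas and performs these conversions itself.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Conceptual illustration only: the hand-written toRow/fromRow helpers are
// hypothetical stand-ins for the conversions Beam performs for schema types.
public class SharedSchemaSketch {

  // A POJO with public fields.
  static class EmailPojo {
    public String name;
    public String email;
  }

  // A bean-style class with the same two logical fields.
  static class EmailBean {
    private final String name;
    private final String email;

    EmailBean(String name, String email) {
      this.name = name;
      this.email = email;
    }

    String getName() {
      return name;
    }

    String getEmail() {
      return email;
    }
  }

  // Both classes map to the same generic row shape: {name: STRING, email: STRING}.
  static Map<String, Object> toRow(EmailPojo pojo) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("name", pojo.name);
    row.put("email", pojo.email);
    return row;
  }

  // Rebuild the other class from the language-independent row.
  static EmailBean fromRow(Map<String, Object> row) {
    return new EmailBean((String) row.get("name"), (String) row.get("email"));
  }

  public static void main(String[] args) {
    EmailPojo pojo = new EmailPojo();
    pojo.name = "person1";
    pojo.email = "person1@example.com";

    // Convert POJO -> row -> bean; the field values survive the round trip.
    EmailBean bean = fromRow(toRow(pojo));
    System.out.println(bean.getName() + " " + bean.getEmail());
  }
}
```

The row in the middle is what the quoted text calls a type that is independent of any specific programming-language type: both classes can be converted to it and from it because they share the same field names and field types.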

##########
File path: examples/java/src/test/java/org/apache/beam/examples/snippets/SnippetsTest.java
##########
@@ -154,6 +156,73 @@ public void testCoGroupByKeyTuple() throws IOException {
     p.run();
   }
 
+  /* Tests SchemaJoinPattern */
+  @Test
+  public void testSchemaJoinPattern() {
+    // [START SchemaJoinPatternCreate]
+    // Define Schemas
+    Schema emailSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("email", Schema.FieldType.STRING));
+
+    Schema phoneSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("phone", Schema.FieldType.STRING));
+
+    // Create User Data Collections
+    final List<Row> emailUsers =
+        Arrays.asList(
+            Row.withSchema(emailSchema).addValue("person1").addValue("person1@example.com").build(),
+            Row.withSchema(emailSchema).addValue("person2").addValue("person2@example.com").build(),
+            Row.withSchema(emailSchema).addValue("person3").addValue("person3@example.com").build(),
+            Row.withSchema(emailSchema)
+                .addValue("person4")
+                .addValue("person4@example.com")
+                .build());
+
+    final List<Row> phoneUsers =
+        Arrays.asList(
+            Row.withSchema(phoneSchema).addValue("person1").addValue("111-222-3333").build(),
+            Row.withSchema(phoneSchema).addValue("person2").addValue("222-333-4444").build(),
+            Row.withSchema(phoneSchema).addValue("person3").addValue("444-333-4444").build(),
+            Row.withSchema(phoneSchema).addValue("person4").addValue("555-333-4444").build());
+
+    // [END SchemaJoinPatternCreate]
+
+    PCollection<String> actualFormattedResult =
+        Snippets.SchemaJoinPattern.main(p, emailUsers, phoneUsers, emailSchema, phoneSchema);

Review comment:
       As this is example snippet code, consider whether it would be easier for the reader to have the code inlined here rather than abstracted into a class.

##########
File path: examples/java/src/test/java/org/apache/beam/examples/snippets/SnippetsTest.java
##########
@@ -154,6 +156,73 @@ public void testCoGroupByKeyTuple() throws IOException {
     p.run();
   }
 
+  /* Tests SchemaJoinPattern */
+  @Test
+  public void testSchemaJoinPattern() {
+    // [START SchemaJoinPatternCreate]
+    // Define Schemas
+    Schema emailSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("email", Schema.FieldType.STRING));
+
+    Schema phoneSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("phone", Schema.FieldType.STRING));
+
+    // Create User Data Collections
+    final List<Row> emailUsers =

Review comment:
       Consider whether adding a non-match would help the sample, for example an id that is missing from emailUsers or from phoneUsers, so that the reader is clear on the output when no match occurs.
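
A minimal plain-Java sketch of what such a non-match could look like. It only imitates inner equijoin semantics on the `name` field and does not call the Beam `Join` transform. The `person5` row, the class name, and the helper `innerJoinOnName` are hypothetical, and the sketch assumes each name appears at most once per input.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java illustration only; this does not use the Beam Join transform.
public class InnerJoinNonMatchSketch {

  // Inner equijoin on "name": keep a name only when both inputs contain it.
  static List<String> innerJoinOnName(
      Map<String, String> emailsByName, Map<String, String> phonesByName) {
    List<String> joined = new ArrayList<>();
    for (Map.Entry<String, String> email : emailsByName.entrySet()) {
      String phone = phonesByName.get(email.getKey());
      if (phone != null) {
        joined.add(email.getKey() + "," + email.getValue() + "," + phone);
      }
    }
    return joined;
  }

  public static void main(String[] args) {
    Map<String, String> emails = new LinkedHashMap<>();
    emails.put("person1", "person1@example.com");
    emails.put("person2", "person2@example.com");
    emails.put("person5", "person5@example.com"); // hypothetical: no phone entry

    Map<String, String> phones = new LinkedHashMap<>();
    phones.put("person1", "111-222-3333");
    phones.put("person2", "222-333-4444");
    phones.put("person4", "555-333-4444"); // no email entry

    // Only names present in both inputs are printed.
    for (String row : innerJoinOnName(emails, phones)) {
      System.out.println(row);
    }
  }
}
```

With this data only `person1` and `person2` appear in the result. `person5` and `person4` are dropped because each is missing from one side, which is the behavior the sample would make visible to the reader.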

##########
File path: examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
##########
@@ -914,4 +917,71 @@ public void process(ProcessContext c) {
       return result;
     }
   }
+
+  public static class SchemaJoinPattern {
+    public static PCollection<String> main(
+        Pipeline p,
+        final List<Row> emailUsers,
+        final List<Row> phoneUsers,
+        Schema emailSchema,
+        Schema phoneSchema) {
+      // [START SchemaJoinPatternJoin]
+      // Create/Read Schema PCollections
+      PCollection<Row> emailList =
+          p.apply("CreateEmails", Create.of(emailUsers).withRowSchema(emailSchema));
+
+      PCollection<Row> phoneList =
+          p.apply("CreatePhones", Create.of(phoneUsers).withRowSchema(phoneSchema));
+
+      // Perform Join
+      PCollection<Row> resultRow =
+          emailList.apply("Apply Join", Join.<Row, Row>innerJoin(phoneList).using("name"));
+
+      // Preview Result
+      resultRow.apply(
+          "Preview Result",
+          MapElements.into(TypeDescriptors.strings())
+              .via(
+                  x -> {
+                    System.out.println(x);
+                    return "";
+                  }));
+
+      /* Sample Output From the pipeline:

Review comment:
       Because this class is abstracted, the sample output is hard to tie back to its input: the data is not in this snippet but appears later on. Consider whether inlining would make the example easier to follow.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [beam] abhiy13 edited a comment on pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
abhiy13 edited a comment on pull request #12097:
URL: https://github.com/apache/beam/pull/12097#issuecomment-650045436


   R: @tvalentyn @rezarokni 
   Staged: http://apache-beam-website-pull-requests.storage.googleapis.com/12097/index.html
   Please have a look





[GitHub] [beam] rezarokni commented on a change in pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
rezarokni commented on a change in pull request #12097:
URL: https://github.com/apache/beam/pull/12097#discussion_r446765181



##########
File path: examples/java/src/test/java/org/apache/beam/examples/snippets/SnippetsTest.java
##########
@@ -154,6 +156,73 @@ public void testCoGroupByKeyTuple() throws IOException {
     p.run();
   }
 
+  /* Tests SchemaJoinPattern */
+  @Test
+  public void testSchemaJoinPattern() {
+    // [START SchemaJoinPatternCreate]
+    // Define Schemas
+    Schema emailSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("email", Schema.FieldType.STRING));
+
+    Schema phoneSchema =
+        Schema.of(
+            Schema.Field.of("name", Schema.FieldType.STRING),
+            Schema.Field.of("phone", Schema.FieldType.STRING));
+
+    // Create User Data Collections
+    final List<Row> emailUsers =

Review comment:
       Consider whether adding a non-match would help the sample, for example an id that is missing from either emailUsers or phoneUsers, so that the reader can see what the output looks like when no match occurs.







[GitHub] [beam] abhiy13 commented on pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
abhiy13 commented on pull request #12097:
URL: https://github.com/apache/beam/pull/12097#issuecomment-651547875


   PTAL @rezarokni @tvalentyn 





[GitHub] [beam] rezarokni commented on a change in pull request #12097: [BEAM-10327] Create a pattern that shows use of Schema using Joins

Posted by GitBox <gi...@apache.org>.
rezarokni commented on a change in pull request #12097:
URL: https://github.com/apache/beam/pull/12097#discussion_r446764890



##########
File path: examples/java/src/main/java/org/apache/beam/examples/snippets/Snippets.java
##########
@@ -914,4 +917,71 @@ public void process(ProcessContext c) {
       return result;
     }
   }
+
+  public static class SchemaJoinPattern {
+    public static PCollection<String> main(
+        Pipeline p,
+        final List<Row> emailUsers,
+        final List<Row> phoneUsers,
+        Schema emailSchema,
+        Schema phoneSchema) {
+      // [START SchemaJoinPatternJoin]
+      // Create/Read Schema PCollections
+      PCollection<Row> emailList =
+          p.apply("CreateEmails", Create.of(emailUsers).withRowSchema(emailSchema));
+
+      PCollection<Row> phoneList =
+          p.apply("CreatePhones", Create.of(phoneUsers).withRowSchema(phoneSchema));
+
+      // Perform Join
+      PCollection<Row> resultRow =
+          emailList.apply("Apply Join", Join.<Row, Row>innerJoin(phoneList).using("name"));
+
+      // Preview Result
+      resultRow.apply(
+          "Preview Result",
+          MapElements.into(TypeDescriptors.strings())
+              .via(
+                  x -> {
+                    System.out.println(x);
+                    return "";
+                  }));
+
+      /* Sample Output From the pipeline:

Review comment:
       Because this class is abstracted, the sample output is hard to tie back to its input: the data is not in this snippet but appears later on. Consider whether inlining would make the example easier to follow.



