Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/18 22:28:06 UTC

[GitHub] [beam] alxp1982 commented on a diff in pull request #24488: add schema-based transforms

alxp1982 commented on code in PR #24488:
URL: https://github.com/apache/beam/pull/24488#discussion_r1054009338


##########
learning/tour-of-beam/learning-content/java/schema-based-transforms/schema-concept/creating-schema/description.md:
##########
@@ -0,0 +1,153 @@
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+
+# Overview
+
+Most structured records share some common characteristics:
+
+→  They can be subdivided into separate named fields. Fields usually have string names, but sometimes - as in the case of indexed tuples - have numerical indices instead.
+
+→  There is a confined list of primitive types that a field can have. These often match primitive types in most programming languages: int, long, string, etc.
+
+→  Often a field type can be marked as optional (sometimes referred to as nullable) or required.
+
+Records often have a nested structure. A nested structure occurs when a field itself has subfields, so the field's type has its own schema. Array- and map-typed fields are also common in these structured records.
+
+For example, consider the following schema, representing actions in a fictitious e-commerce company:
+
+**Purchase**
+
+```
+Field Name              Field Type
+userId                  STRING
+itemId                  INT64
+shippingAddress         ROW(ShippingAddress)
+cost                    INT64
+transactions            ARRAY[ROW(Transaction)]
+```
+
+**ShippingAddress**
+
+```
+Field Name              Field Type
+streetAddress           STRING
+city                    STRING
+state                   nullable STRING
+country                 STRING
+postCode                STRING
+```
+
+**Transaction**
+
+```
+Field Name              Field Type
+bank                    STRING
+purchaseAmount          DOUBLE
+```
+
+Schemas provide a type system for Beam records that is independent of any specific programming-language type. There might be multiple Java classes that all have the same schema (for example, a Protocol Buffer class or a POJO class), and Beam lets us convert seamlessly between these types. Schemas also provide a simple way to reason about types across different programming-language APIs.
+
+A `PCollection` with a schema does not need to have a `Coder` specified, as Beam knows how to encode and decode Schema rows; Beam uses a special coder to encode schema types.
+
+### Creating Schemas
+
+While schemas themselves are language independent, they are designed to embed naturally into the programming languages of the Beam SDK being used. This allows Beam users to continue using native types while reaping the advantage of having Beam understand their element schemas.
+
+In Java you could use the following set of classes to represent the purchase schema. Beam will automatically infer the correct schema based on the members of the class.
+
+#### Java POJOs
+
+A `POJO` (Plain Old Java Object) is a Java object that is not bound by any restriction other than the Java Language Specification. A `POJO` can contain member variables that are primitives, other POJOs, or collections, maps, or arrays thereof. `POJO`s do not have to extend prespecified classes or implement any specific interfaces.
+
+If a `POJO` class is annotated with `@DefaultSchema(JavaFieldSchema.class)`, Beam will automatically infer a schema for this class. Nested classes are supported as are classes with List, array, and Map fields.

Review Comment:
   If a `POJO` class is annotated with `@DefaultSchema(JavaFieldSchema.class)`, Beam will automatically infer a schema for this class. Nested classes are supported, as are List, array, and Map fields.
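
For illustration only, here is a minimal sketch of POJOs that mirror the Purchase schema from the diff. The class layout and the `SchemaSketch.describe` helper are hypothetical and not part of the PR. To keep the example runnable without a Beam dependency, the `@DefaultSchema(JavaFieldSchema.class)` annotation appears only in comments, and the helper uses plain Java reflection to list public instance fields. It is a rough analogy for field-based inference, not Beam's implementation.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical POJOs mirroring the Purchase schema in the description.
// In a Beam pipeline each class would carry
// @DefaultSchema(JavaFieldSchema.class); it is omitted here so the
// sketch compiles with the JDK alone.
class Transaction {
    public String bank;
    public double purchaseAmount;
}

class ShippingAddress {
    public String streetAddress;
    public String city;
    public String state; // nullable in the schema
    public String country;
    public String postCode;
}

class Purchase {
    public String userId;
    public long itemId;
    public ShippingAddress shippingAddress; // nested record
    public long cost;
    public List<Transaction> transactions;  // array of nested records
}

class SchemaSketch {
    // Maps each public, non-static field name to the simple name of its
    // declared type. Assumption: public instance fields are the ones of
    // interest. A TreeMap keeps the output order deterministic.
    static Map<String, String> describe(Class<?> type) {
        Map<String, String> fields = new TreeMap<>();
        for (Field f : type.getDeclaredFields()) {
            int mods = f.getModifiers();
            if (Modifier.isPublic(mods) && !Modifier.isStatic(mods)) {
                fields.put(f.getName(), f.getType().getSimpleName());
            }
        }
        return fields;
    }

    public static void main(String[] args) {
        describe(Purchase.class)
            .forEach((name, t) -> System.out.println(name + " : " + t));
    }
}
```

The nested `ShippingAddress` and the `List<Transaction>` field correspond to the "nested classes" and "List, array, and Map fields" mentioned in the suggested wording.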


