Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/17 23:44:59 UTC

[GitHub] [beam] apilloud commented on a diff in pull request #24962: Add I/O Standards Page

apilloud commented on code in PR #24962:
URL: https://github.com/apache/beam/pull/24962#discussion_r1072943926


##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out prescriptive guidance for first-party and third-party (1P/3P) developers building an Apache Beam I/O connector. These guidelines aim to establish best practices for documentation, development, and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam GitHub repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner, and the metrics are published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and non-built-in I/Os.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good general example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provide code docs for the relevant language of the I/O. These should also link to any relevant external sources of information, whether on the Apache Beam site or at an external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. The following shows those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" width="" alt="I/O connector guides screenshot" title="I/O connector guides screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should be consistent throughout such that programmatic information extraction for other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>

Review Comment:
   This link is odd. Why does `Resource scalability` link to `#unit-tests`? Should the rest be links too?



##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out prescriptive guidance for first-party and third-party (1P/3P) developers building an Apache Beam I/O connector. These guidelines aim to establish best practices for documentation, development, and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam GitHub repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner, and the metrics are published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and non-built-in I/Os.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good general example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).

Review Comment:
   nit: Internal links should be relative. Drop https://beam.apache.org
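
A minimal sketch of how this nit could be applied mechanically, assuming the page's internal links all use the `https://beam.apache.org` origin inside a Markdown `](...)` target or an HTML `href="..."` attribute. The helper name `to_relative` is hypothetical and not part of the Beam website tooling; link text that merely displays a URL is left alone.

```python
import re

# Match the site origin only where it begins a link target:
# a Markdown "](" target or an HTML href=" attribute.
_BEAM_ORIGIN = re.compile(r'(\]\(|href=")https://beam\.apache\.org(?=/)')


def to_relative(text: str) -> str:
    """Drop the beam.apache.org origin from link targets, leaving site-relative paths."""
    return _BEAM_ORIGIN.sub(r"\1", text)


line = "can be found [here](https://beam.apache.org/documentation/)."
print(to_relative(line))
```

Links to other hosts (for example `pkg.go.dev` or `github.com`) do not match the pattern and pass through unchanged.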



##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out prescriptive guidance for first-party and third-party (1P/3P) developers building an Apache Beam I/O connector. These guidelines aim to establish best practices for documentation, development, and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam GitHub repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner, and the metrics are published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and non-built-in I/Os.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good general example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provide code docs for the relevant language of the I/O. These should also link to any relevant external sources of information, whether on the Apache Beam site or at an external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. The following shows those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" width="" alt="I/O connector guides screenshot" title="I/O connector guides screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should be consistent throughout such that programmatic information extraction for other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Reporting an Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html">JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating <a href="https://2022.beamsummit.org/sessions/relational-beam/">Relational Features</a> supported in their page under <strong>I/O connector guides</strong>.
+         <p>Relational Features are concepts that can help improve efficiency and can optionally be implemented by an I/O Connector. Using end user supplied pipeline configuration (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/SchemaIO.html">SchemaIO</a>) and user query (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/FieldAccessDescriptor.html">FieldAccessDescriptor</a>) data, relational theory is applied to derive improvements such as faster pipeline execution, lower operation costs and less data read/written.
+         <p>Example table:
+         <p><img src="/images/io-standards/io-supported-relational-features-table.png" width="" alt="Supported Relational Features" title="Supported Relational Features"></img>

Review Comment:
   Please make this text and not an image of a table. Markdown tables might be a good choice; if not, HTML tables work too.



##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1452 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out prescriptive guidance for first-party and third-party (1P/3P) developers building an Apache Beam I/O connector. These guidelines aim to establish best practices for documentation, development, and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam GitHub repository is known as a **Built-in I/O connector**. Built-in I/Os have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner, and the metrics are published publicly for [reference](#dashboard). Unless explicitly stated otherwise, the following guidelines apply to both built-in and non-built-in I/Os.
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). A good general example to follow is the built-in [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provide code docs for the relevant language of the I/O. These should also link to any relevant external sources of information, whether on the Apache Beam site or at an external documentation location.
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. The following shows those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" width="" alt="I/O connector guides screenshot" title="I/O connector guides screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc should be consistent throughout such that programmatic information extraction for other pages can be enabled in the future.
+         <p>Example <strong>subset</strong> of sections to include in your page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Reporting an Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html">JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating <a href="https://2022.beamsummit.org/sessions/relational-beam/">Relational Features</a> supported in their page under <strong>I/O connector guides</strong>.
+         <p>Relational Features are concepts that can help improve efficiency and can optionally be implemented by an I/O Connector. Using end user supplied pipeline configuration (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/io/SchemaIO.html">SchemaIO</a>) and user query (<a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/FieldAccessDescriptor.html">FieldAccessDescriptor</a>) data, relational theory is applied to derive improvements such as faster pipeline execution, lower operation costs and less data read/written.
+         <p>Example table:
+         <p><img src="/images/io-standards/io-supported-relational-features-table.png" width="" alt="Supported Relational Features" title="Supported Relational Features"></img>
+         <p>Example implementations:
+         <p>BigQueryIO <a href="https://github.com/apache/beam/blob/5bb13fa35b9bc36764895c57f23d3890f0f1b567/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L1813">Column Pruning</a> via ProjectionPushdown to return only necessary columns indicated by an end user's query. This is achieved using BigQuery DirectRead API.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a page under <strong>Common pipeline patterns</strong>, if necessary, outlining common usage patterns involving your I/O.
+         <p><a href="https://beam.apache.org/documentation/patterns/bigqueryio/">https://beam.apache.org/documentation/patterns/bigqueryio/</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Update <strong>I/O Connectors</strong> with your I/O’s information
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors">https://beam.apache.org/documentation/io/connectors/#built-in-io-connectors</a>
+         <p><img src="/images/io-standards/io-supported-via-screenshot.png" width="" alt="alt_text" title="image_tooltip">
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Provide setup steps to use the I/O, under a <strong>Before you start</strong> header.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start">https://beam.apache.org/documentation/io/built-in/parquet/#before-you-start</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Include a canonical read/write code snippet after the initial description for each supported language. The example below shows the Hadoop page, which has snippets for Java.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio">https://beam.apache.org/documentation/io/built-in/hadoop/#reading-using-hadoopformatio</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps for elements are assigned. This includes batch sources to allow for future I/Os which may provide more useful information than current_time().
+         <p>Example:
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Indicate how timestamps are advanced; for Batch sources this will be marked as n/a in most cases.
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Outline any temporary resources (for example, files) that the connector will create.
+         <p>Example:
+         <p>BigQuery batch loads first create a temp GCS location
+         <p><a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455">https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java#L455</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Provide, under an <strong>Authentication</strong> subheader, how to acquire partner authorization material to securely access the source/sink.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#authentication">https://beam.apache.org/documentation/io/built-in/snowflake/#authentication</a>
+         <p>BigQuery calls this topic permissions, but it covers similar material:
+         <p><a href="https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html">https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.html</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>I/Os should provide links to the Source/Sink documentation under the <strong>Before you start</strong> header.
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/">https://beam.apache.org/documentation/io/built-in/snowflake/</a>
+      </td>
+  </tr>
+  <tr>
+      <td>
+         <p>Indicate if there is native or X-language support in each language with a link to the docs.
+         <p>Example:
+         <p>Kinesis I/O has a native implementation in Java and cross-language (X-language) support for Python, but no support for Go.
+      </td>
+  </tr>
+  <tr>
+   <td>
+      <p>Indicate known limitations under a <strong>Limitations</strong> header. If the limitation has a tracking issue, please link it inline.
+      <p>Example:
+      <p><a href="https://beam.apache.org/documentation/io/built-in/snowflake/#limitations">https://beam.apache.org/documentation/io/built-in/snowflake/#limitations</a>
+   </td>
+  </tr>
+</table>
+</div>
+
+
+
+### I/O (not built-in)
+
+Custom I/Os are not included in the Apache Beam GitHub repository. One example is [SolaceIO](https://github.com/SolaceProducts/solace-apache-beam).
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-connectors">
+   <tr>
+      <td>
+         <p>Update I/O connectors with your I/O information
+         <p>Example:
+         <p><a href="https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam">https://beam.apache.org/documentation/io/connectors/#other-io-connectors-for-apache-beam</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+## Development
+
+This section outlines API Syntax, Semantics and Feature Adoption recommendations for new and existing Apache Beam I/O Connectors.
+
+Development guidelines are written with the following principles in mind:
+
+
+
+* Consistency makes an API easier to learn
+    * If there are multiple ways of doing something, we should strive to be consistent first
+* With a couple of minutes of studying the documentation, users should be able to pick up most I/O connectors
+* The design of a new I/O should consider the possibility of evolution
+* Transforms should integrate well with other Beam utilities
+
+
+### All SDKs
+
+
+#### Pipeline Configuration / Execution / Streaming / Windowing semantics guidelines
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <th>
+         <p>Topic
+      </th>
+      <th>
+         <p>Semantics
+      </th>
+   </tr>
+   <tr>
+      <td>
+         <p>Pipeline Options
+      </td>
+      <td>
+         <p>An I/O should rarely rely on a PipelineOptions subclass to tune internal parameters.
+         <p>A connector-related pipeline options class should:
+         <ul>
+            <li>Document clearly, for each option, the effect it has and why one may modify it.
+            <li>Namespace option names to avoid collisions.
+            <li>Use the class name {Connector}Options.
+            <li>Use the method names set{Connector}{Option} and get{Connector}{Option}.
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Source Windowing
+      </td>
+      <td>
+         <p>A source must return elements in the GlobalWindow unless explicitly parameterized in the API by the user.
+         <p>Allowable Non-global-window patterns:
+         <ul>
+            <li>ReadFromIO(window_by=...)
+            <li>ReadFromIO.IntoFixedWindows(...)
+            <li>ReadFromIO(apply_windowing=True/False) (e.g. <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.transforms.periodicsequence.html#apache_beam.transforms.periodicsequence.PeriodicImpulse">PeriodicImpulse</a>)
+            <li>IO.read().withWindowing(...)
+            <li>IO.read().windowBy(...)
+            <li>IO.read().withFixedWindows(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Sink Windowing
+      </td>
+      <td>
+         <p>A sink should be window agnostic and handle elements sent with any windowing method, unless explicitly parameterized or expressed in its API.
+         <p>A sink may change the windowing of a PCollection internally as needed. However, the metadata that it returns as part of its Result object must be:
+         <ul>
+            <li>In the same window, unless explicitly declared in the API
+            <li>With accurate timestamps
+            <li><strong>It may</strong> also return metadata with information about windowing (e.g. a BigQuery job may have a timestamp, but also a window associated with it).
+         </ul>
+         <p>Allowable non-global-window patterns:
+         <ul>
+            <li>WriteToIO(triggering_frequency=...) - e.g. <a href="https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.bigquery.html#apache_beam.io.gcp.bigquery.WriteToBigQuery">WriteToBigQuery</a> (This only sets the windowing within the transform - input data is still in the Global Window).
+            <li>WriteBatchesToIO(...)
+            <li>WriteWindowsToIO(...)
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Throttling
+      </td>
+      <td>
+         <p>A streaming sink (or any transform accessing an external service) may throttle its requests to avoid overloading the external service.
+         <p>TODO: Beam should expose throttling utilities (<a href="https://github.com/apache/beam/issues/24743">Tracking Issue</a>):
+         <ul>
+            <li>Per-key fixed throttling
+            <li>Adaptive throttling with sink-reported backpressure
+            <li>Ramp-up throttling from a start point
+            </li>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Error handling
+      </td>
+      <td>
+         <p>TODO: <a href="https://github.com/apache/beam/issues/24742">Tracking Issue</a>
+      </td>
+   </tr>
+</table>
+</div>
+
+
+
+### Java
+
+
+#### General
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>The primary class used in working with the connector should be named <strong>{connector}IO</strong>
+         <p>Example:
+         <p>The BigQuery I/O is <strong>org.apache.beam.sdk.io.gcp.bigquery.BigQueryIO</strong>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The class should be placed in the package <strong>org.apache.beam.sdk.io.{connector}</strong>
+         <p>Example:
+         <p>BigQueryIO belongs to the Java package <a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java">org.apache.beam.sdk.io.gcp.bigquery</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>The integration and performance tests should live under the package <strong>org.apache.beam.sdk.io.{connector}.testing</strong>. This causes those tests to work with the standard user-facing interfaces of the connector.
+         <p>Unit tests should reside in the same package (i.e. <strong>org.apache.beam.sdk.io.{connector}</strong>), as they may often test internals of the connector.
+         <p>BigQueryIO belongs to the Java package <a href="https://github.com/apache/beam/blob/master/sdks/java/io/google-cloud-platform/src/main/java/org/apache/beam/sdk/io/gcp/bigquery/BigQueryIO.java">org.apache.beam.sdk.io.gcp.bigquery</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>An I/O transform should avoid receiving user lambdas to map elements from a user type to a connector-specific type. Instead, it should interface with a connector-specific data type (with schema information when possible).
+         <p>When necessary, an I/O transform should receive a type parameter that specifies the input type (for sinks) or output type (for sources) of the transform.
+         <p>An I/O transform may not have a type parameter <strong>only if it is certain that its output type will not change</strong> (e.g. <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.MatchAll.html">FileIO.MatchAll</a> and other <a href="https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html">FileIO transforms</a>).
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>As part of the API of an I/O, it is highly discouraged to directly expose third-party libraries in the public API of a Beam API or connector.
+         <ul>
+            <li>It reduces Apache Beam’s compatibility guarantees - Changes to third-party libraries can/will directly break existing users’ pipelines.
+            <li>It makes code maintainability hard - If libraries are directly exposed at API level, a dependency change will require multiple changes throughout the I/O implementation code
+            <li>It forces third-party dependencies onto end users
+            </li>
+         </ul>
+         <p>Instead, we highly recommend exposing Beam-native interfaces and implementing an adapter to translate between them and the third-party library.
+         <p>If you believe that the library in question is extremely static in nature, please note it in the I/O itself.
+         <p>As part of the API of an I/O, it is <strong>highly discouraged</strong> to expose third-party libraries in the public API of a Beam API or connector. Instead, a Beam-native interface should be used and adapted into the third-party library object.

Review Comment:
   This repeats the same thing again.


