You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2023/01/17 10:30:08 UTC

[GitHub] [beam] hermanmak commented on a diff in pull request #24962: Add I/O Standards Page

hermanmak commented on code in PR #24962:
URL: https://github.com/apache/beam/pull/24962#discussion_r1072035340


##########
website/www/site/content/en/documentation/io/io-standards.md:
##########
@@ -0,0 +1,1465 @@
+---
+title: "IO Standards"
+---
+<!--
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+-->
+# I/O Standards
+
+## Overview
+
+This Apache Beam I/O Standards document lays out the prescriptive guidance for 1P/3P developers developing an Apache Beam I/O connector. These guidelines aim to create best practices encompassing documentation, development and testing in a simple and concise manner.
+
+
+### What are built-in I/O Connectors?
+
+An I/O connector (I/O) living in the Apache Beam Github repository is known as a **Built-in I/O connector**. Built-in I/O’s have their [integration tests](#integration-tests) and performance tests routinely run by the Google Cloud Dataflow Team using the Dataflow Runner and metrics published publicly for [reference](#dashboard). Otherwise, the following guidelines will apply to both unless explicitly stated. 
+
+
+# Guidance
+
+
+## Documentation
+
+This section lays out the superset of all documentation that is expected to be made available with an I/O. The Apache Beam documentation referenced throughout this section can be found [here](https://beam.apache.org/documentation/). And generally a good example to follow would be the built-in I/O, [Snowflake I/O](https://beam.apache.org/documentation/io/built-in/snowflake/).
+
+
+### Built-in I/O
+
+<div class="table-container-wrapper">
+<table class="table table-bordered table-io-standards">
+   <tr>
+      <td>
+         <p>Provided code docs for the relevant language of the I/O. This should also have links to any external sources of information within the Apache Beam site or external documentation location. 
+         <p>Examples:
+         <ul>
+            <li><a href="https://beam.apache.org/releases/javadoc/current/overview-summary.html">Java doc</a>
+            <li><a href="https://beam.apache.org/releases/pydoc/current/">Python doc</a>
+            <li><a href="https://pkg.go.dev/github.com/apache/beam/sdks/v2/go/pkg/beam">Go doc</a>
+         </ul>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Add a new page under <strong>I/O connector guides</strong> that covers specific tips and configurations. The following shows those for <a href="https://beam.apache.org/documentation/io/built-in/parquet/">Parquet</a>, <a href="https://beam.apache.org/documentation/io/built-in/hadoop/">Hadoop</a> and others.
+         <p>Examples:
+         <p><img src="/images/io-standards/io-connector-guides-screenshot.png" width="" alt="I/O connector guides screenshot" title="I/O connector guides screenshot"></img>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>Formatting of the section headers in your Javadoc/Pythondoc<strong> </strong>should be consistent throughout such that programmatic information extraction for other pages can be enabled in the future.
+         <p>In the future we may want to use the documents as a source of information for programmatic information extraction for other pages. The position and naming of headers should be consistent to allow this to be enabled in the future. (Take Resource Scalability contents and transfer it to Java/Python docs).
+         <p>Example <strong>subset</strong> of sections to include in your page in order:
+         <ol>
+            <li>Before you start
+            <li>{Connector}IO basics
+            <li>Supported Features
+               <ol>
+                  <li>Relational
+                  </li>
+               </ol>
+            <li>Authentication
+            <li>Reading from {Connector}
+            <li>Writing to {Connector}
+            <li><a href="#unit-tests">Resource scalability</a>
+            <li>Limitations
+            <li>Report and Issue
+            </li>
+         </ol>
+         <p>Example:
+         <p>The KafkaIO <a href="https://beam.apache.org/releases/javadoc/2.1.0/org/apache/beam/sdk/io/kafka/KafkaIO.html">output JavaDoc</a>
+      </td>
+   </tr>
+   <tr>
+      <td>
+         <p>I/O Connectors should include a note indicating the Relational Features supported in their page under <strong>I/O connector guides</strong>.
+         <p>Relational Features are efficiency concepts that can be implemented by an I/O Connector. Using end user supplied pipeline configuration (SchemaIO) and user query (FieldAccessDescriptor) data, apply relational theory to derive improvements. Results in faster pipeline execution, lower operation cost, less data read/written.

Review Comment:
   Apologies Mortiz, it was not my intention to overlook your comments. Let me re-evaluate.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@beam.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org