Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/05/27 19:51:02 UTC

[GitHub] [beam] garrettjonesgoogle commented on a change in pull request #11802: [BEAM-9916] Update I/O documentation links and create more complete I/O matrix

garrettjonesgoogle commented on a change in pull request #11802:
URL: https://github.com/apache/beam/pull/11802#discussion_r431403908



##########
File path: website/www/site/data/io_matrix.yaml
##########
@@ -0,0 +1,377 @@
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+categories:
+  - name: File-based
+    description: These I/O connectors involve working with files.
+    rows:
+      - transform: FileIO
+        description: "General-purpose transforms for working with files: listing files (matching), reading and writing."
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.FileIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/FileIO.html
+          - language: py
+            name: apache_beam.io.FileIO
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.fileio.html
+      - transform: AvroIO
+        description: PTransforms for reading from and writing to [Avro](https://avro.apache.org/) files.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.AvroIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/AvroIO.html
+          - language: py
+            name: apache_beam.io.avroio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.avroio.html
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/avroio
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/avroio
+      - transform: TextIO
+        description: PTransforms for reading and writing text files.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.TextIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TextIO.html
+          - language: py
+            name: apache_beam.io.textio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.textio.html
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/textio
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/textio
+      - transform: TFRecordIO
+        description: PTransforms for reading and writing [TensorFlow TFRecord](https://www.tensorflow.org/tutorials/load_data/tfrecord) files.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.TFRecordIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/TFRecordIO.html
+          - language: py
+            name: apache_beam.io.tfrecordio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.tfrecordio.html
+      - transform: XmlIO
+        description: Transforms for reading and writing XML files using [JAXB](https://www.oracle.com/technical-resources/articles/javase/jaxb.html) mappers.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.xml.XmlIO
+            url: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/xml/XmlIO.html
+      - transform: TikaIO
+        description: Transforms for parsing arbitrary files using [Apache Tika](https://tika.apache.org/).
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.tika.TikaIO
+            url: https://beam.apache.org/releases/javadoc/2.3.0/org/apache/beam/sdk/io/tika/TikaIO.html
+      - transform: ParquetIO
+        description: IO for reading from and writing to [Parquet](https://parquet.apache.org/) files.
+        docs: /documentation/io/built-in/parquet/
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.parquet.ParquetIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/parquet/ParquetIO.html
+          - language: py
+            name: apache_beam.io.parquetio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.parquetio.html
+      - transform: ThriftIO
+        description: PTransforms for reading and writing files containing [Thrift](https://thrift.apache.org/)-encoded data.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.thrift.ThriftIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/thrift/ThriftIO.html
+      - transform: VcfIO
+        description: A source for reading from [VCF files](https://samtools.github.io/hts-specs/VCFv4.2.pdf) (version 4.x).
+        implementations:
+          - language: py
+            name: apache_beam.io.vcfio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.vcfio.html
+      - transform: S3IO
+        description: A source for reading from and writing to [Amazon S3](https://aws.amazon.com/s3/).
+        implementations:
+          - language: py
+            name: apache_beam.io.aws.s3io
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.aws.s3io.html
+      - transform: GcsIO
+        description: A source for reading from and writing to [Google Cloud Storage](https://cloud.google.com/storage).
+        implementations:
+          - language: py
+            name: apache_beam.io.gcp.gcsio
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.gcsio.html
+  - name: FileSystem
+    description: Beam provides a File system interface that defines APIs for writing file systems agnostic code. Several I/O connectors are implemented as a FileSystem implementation.
+    rows:
+      - transform: HadoopFileSystem
+        description: "`FileSystem` implementation for accessing [Hadoop](https://hadoop.apache.org/) Distributed File System files."
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.hdfs.HadoopFileSystemRegistrar
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/hdfs/HadoopFileSystemRegistrar.html
+          - language: py
+            name: apache_beam.io.hadoopfilesystem
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.hadoopfilesystem.html
+      - transform: GcsFileSystem
+        description: "`FileSystem` implementation for [Google Cloud Storage](https://cloud.google.com/storage)."
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.extensions.gcp.storage.GcsFileSystemRegistrar
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/extensions/gcp/storage/GcsFileSystemRegistrar.html
+          - language: py
+            name: apache_beam.io.gcp.gcsfilesystem
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.gcsfilesystem.html
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/gcs
+      - transform: LocalFileSystem
+        description: "`FileSystem` implementation for accessing files on disk."
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.LocalFileSystemRegistrar
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/LocalFileSystemRegistrar.html
+          - language: py
+            name: apache_beam.io.localfilesystem
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.localfilesystem.html
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/local
+      - transform: S3FileSystem
+        description: "`FileSystem` implementation for [Amazon S3](https://aws.amazon.com/s3/)."
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.aws.s3.S3FileSystemRegistrar
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/hdfs/package-summary.html
+      - transform: In-memory
+        description: "`FileSystem` implementation in memory; useful for testing."
+        implementations:
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/memfs
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/filesystem/memfs
+  - name: Messaging
+    description: These I/O connectors typically involve working with unbounded sources that come from messaging sources.
+    rows:
+      - transform: KinesisIO
+        description: PTransforms for reading from and writing to [Kinesis](https://aws.amazon.com/kinesis/) streams.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.kinesis.KinesisIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kinesis/KinesisIO.html
+      - transform: AmqpIO
+        description: AMQP 1.0 protocol using the Apache QPid Proton-J library
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.amqp.AmqpIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/amqp/AmqpIO.html
+      - transform: KafkaIO
+        description: Read and Write PTransforms for [Apache Kafka](https://kafka.apache.org/).
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.kafka.KafkaIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/kafka/KafkaIO.html
+          - language: py
+            name: apache_beam.io.external.kafka
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.external.kafka.html
+      - transform: PubSubIO
+        description: Read and Write PTransforms for [Google Cloud Pub/Sub](https://cloud.google.com/pubsub) streams.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.gcp.pubsub.PubsubIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubIO.html
+          - language: py
+            name: apache_beam.io.gcp.pubsub
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.gcp.pubsub.html
+          - language: py
+            name: apache_beam.io.external.gcp.pubsub
+            url: https://beam.apache.org/releases/pydoc/current/apache_beam.io.external.gcp.pubsub.html
+          - language: go
+            name: github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio
+            url: https://godoc.org/github.com/apache/beam/sdks/go/pkg/beam/io/pubsubio
+      - transform: JmsIO
+        description: An unbounded source for [JMS](https://www.oracle.com/java/technologies/java-message-service.html) destinations (queues or topics).
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.jms.JmsIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/jms/JmsIO.html
+      - transform: MqttIO
+        description: An unbounded source for [MQTT](https://mqtt.org/) broker.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.mqtt.MqttIO
+            url: https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/mqtt/MqttIO.html
+      - transform: RabbitMqIO
+        description: A IO to publish or consume messages with a RabbitMQ broker.
+        implementations:
+          - language: java
+            name: org.apache.beam.sdk.io.rabbitmq.RabbitMqIO
+            url: https://github.com/apache/beam/blob/master/sdks/java/io/rabbitmq/src/main/java/org/apache/beam/sdk/io/rabbitmq/RabbitMqIO.java

Review comment:
       PR #7197 only switched javadoc exporting from an include list to an exclude list. It preserved the existing export setting for every component that existed at that point. RabbitMqIO and KuduIO did not have javadocs exported before my change either, so their missing javadocs predate it.
   
   My guess is that javadoc exporting used an include list before my change, which made it easy to leave components out, and those two were probably omitted by accident.
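
A minimal sketch of the argument above. It assumes a simplified model in which each component is just a name and export is decided by a list lookup. The component names are hypothetical, not Beam's real Gradle configuration. Converting an include list to its equivalent exclude list leaves the exported set unchanged, so anything the include list had missed stays missed.

```python
# Hypothetical component names for illustration; this is not Beam's actual
# Gradle configuration or its real list of modules.
ALL_COMPONENTS = ["io:kafka", "io:kinesis", "io:mqtt", "io:rabbitmq", "io:kudu"]


def exported_with_include_list(components, include):
    """Export only components named in the include list; anything unlisted is skipped."""
    return [c for c in components if c in include]


def exported_with_exclude_list(components, exclude):
    """Export every component unless it is named in the exclude list."""
    return [c for c in components if c not in exclude]


def to_exclude_list(components, include):
    """Build the exclude list that reproduces an include list's result for the current components."""
    return {c for c in components if c not in include}


# An include list that (hypothetically) forgot rabbitmq and kudu.
include = {"io:kafka", "io:kinesis", "io:mqtt"}
exclude = to_exclude_list(ALL_COMPONENTS, include)

before = exported_with_include_list(ALL_COMPONENTS, include)
after = exported_with_exclude_list(ALL_COMPONENTS, exclude)

print(sorted(exclude))  # ['io:kudu', 'io:rabbitmq']
print(before == after)  # True: the switch alone changes nothing for existing components

# A component added after the switch is exported by default.
print(exported_with_exclude_list(ALL_COMPONENTS + ["io:thrift"], exclude))
```

The trade-off: under an exclude list, a newly added component is exported unless someone opts it out. An include list silently skips anything nobody remembered to add.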
   



