Posted to dev@beam.apache.org by Damon Douglas via dev <de...@beam.apache.org> on 2022/11/16 19:00:04 UTC

Pub/Sub Client | Add Java getSchema and Go GetSchema methods

Hello Everyone,

*For those new to Beam, even if this is your first day, consider yourself
a welcome contributor to this conversation.  Below are
definitions/references and a suggested learning path to understand this
email.*

Proposal | Java

For Java, I would like to add the following to PubsubClient [1].

import org.apache.beam.sdk.schemas.Schema;


public abstract Schema getSchema(SchemaPath schemaPath) throws IOException;

public static class SchemaPath {
  /* Supports the projects/<project>/schemas/<schema> resource path [2]. */
}


Additionally, I would like to propose two static helper methods to support
PubsubGrpcClient [3] and PubsubJsonClient [4].

static Schema fromPubsubSchema(
    com.google.api.services.pubsub.model.Schema pubsubSchema) {
  /* Converts Pub/Sub model Schema to Beam Schema; for use by
     PubsubJsonClient. */
}


static Schema fromPubsubSchema(com.google.pubsub.v1.Schema pubsubSchema) {
  /* Converts Pub/Sub model Schema to Beam Schema; for use by
     PubsubGrpcClient. */
}


Finally, to support tests, I would like to add a Schema field
to PubsubTestClient's [5] private State class.

import org.apache.beam.sdk.schemas.Schema;

private static class State {

/** Expected Pub/Sub mapped Beam Schema. */
@Nullable Schema schema;

}

Proposal | Go

For Go, I would like to add the following to pubsubx [6].

func GetSchema(ctx context.Context, client *pubsub.SchemaClient,
    schemaId string) (*pubsub.SchemaConfig, error) { ... }

func EncodeSchema(dst reflect.Type, src *pubsub.SchemaConfig) ([]byte,
error) { ... }
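
To make the naming concrete: the Java proposal models the full
projects/<project>/schemas/<schema> resource path [2], while the Go GetSchema
above takes a schema ID. Below is a minimal Go sketch of parsing and
rendering such a path. SchemaPath and ParseSchemaPath are hypothetical names
used only for illustration; they are not part of pubsubx or the Pub/Sub
client library.

```go
package main

import (
	"fmt"
	"strings"
)

// SchemaPath is a hypothetical Go counterpart to the proposed Java SchemaPath.
type SchemaPath struct {
	Project string
	Schema  string
}

// ParseSchemaPath splits "projects/<project>/schemas/<schema>" into its parts
// and rejects anything that does not follow that pattern.
func ParseSchemaPath(path string) (SchemaPath, error) {
	parts := strings.Split(path, "/")
	if len(parts) != 4 || parts[0] != "projects" || parts[2] != "schemas" ||
		parts[1] == "" || parts[3] == "" {
		return SchemaPath{}, fmt.Errorf(
			"invalid schema path %q: want projects/<project>/schemas/<schema>", path)
	}
	return SchemaPath{Project: parts[1], Schema: parts[3]}, nil
}

// String renders the full resource path.
func (p SchemaPath) String() string {
	return "projects/" + p.Project + "/schemas/" + p.Schema
}

func main() {
	// The project and schema names here are placeholders.
	p, err := ParseSchemaPath("projects/my-project/schemas/my-schema")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(p.Project, p.Schema)
}
```

Rejecting malformed paths up front would let a caller fail before making
any request to Pub/Sub.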


Rationale

Querying the Pub/Sub Schema [7] and converting it to a Beam Schema [8]
would allow us to validate that the two schemas match, preventing potential
errors.  This supports the design goals of Pub/Sub schemas: to facilitate a
contract between publisher and subscriber and to provide a single source of
truth for inter-team production and consumption.  This feature surfaced
while implementing work related to Pub/Sub and Beam Schemas, the details of
which are excluded from this email.
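
As a rough illustration of the validation step described above, here is a
minimal Go sketch that compares two field lists and reports the differences.
It assumes both schemas have already been reduced to simple name/type pairs;
the Field type and DiffFields function are hypothetical and stand in for
whatever representation the real conversion would produce.

```go
package main

import (
	"fmt"
	"sort"
)

// Field is a hypothetical, simplified field description: a name and a type name.
type Field struct {
	Name string
	Type string
}

// DiffFields reports every way in which got differs from want, ignoring
// field order. An empty result means the two field lists match.
func DiffFields(want, got []Field) []string {
	wantTypes := make(map[string]string, len(want))
	for _, f := range want {
		wantTypes[f.Name] = f.Type
	}

	var diffs []string
	seen := make(map[string]bool, len(got))
	for _, f := range got {
		seen[f.Name] = true
		wantType, ok := wantTypes[f.Name]
		switch {
		case !ok:
			diffs = append(diffs, fmt.Sprintf("unexpected field %q", f.Name))
		case wantType != f.Type:
			diffs = append(diffs, fmt.Sprintf(
				"field %q: want type %s, got %s", f.Name, wantType, f.Type))
		}
	}
	for _, f := range want {
		if !seen[f.Name] {
			diffs = append(diffs, fmt.Sprintf("missing field %q", f.Name))
		}
	}
	sort.Strings(diffs) // stable output regardless of input order
	return diffs
}

func main() {
	// Placeholder field lists; real ones would come from the two schemas.
	pubsubFields := []Field{{"id", "string"}, {"amount", "double"}}
	pipelineFields := []Field{{"id", "string"}, {"amount", "int64"}}
	for _, d := range DiffFields(pubsubFields, pipelineFields) {
		fmt.Println(d)
	}
}
```

Collecting all mismatches, rather than stopping at the first, gives the
pipeline author one complete report to act on.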

Definitions/References

[1] PubsubClient: An (abstract) helper class for talking to Pubsub via an
underlying transport.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubClient.html

[2] *Resource Path*: Google Cloud resource naming adheres to the Google API
design guide using a '/' delimited pattern.
https://cloud.google.com/apis/design/resource_names

[3] PubsubGrpcClient: An implementation of PubsubClient [1] using gRPC.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubGrpcClient.html

[4] *PubsubJsonClient*: An implementation of PubsubClient [1] using JSON
transport.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubJsonClient.html

[5] PubsubTestClient: A partial implementation of PubsubClient [1] for use
by unit tests.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/io/gcp/pubsub/PubsubTestClient.html

[6] pubsubx: The Beam Go SDK's package containing utilities for working with
Pub/Sub.
https://pkg.go.dev/github.com/apache/beam/sdks/v2@v2.42.0/go/pkg/beam/util/pubsubx

[7] *Pub/Sub Schema*: A format to which Pub/Sub data must adhere.  It
facilitates a contract between publisher and subscriber that Pub/Sub will
enforce.
https://cloud.google.com/pubsub/docs/schemas

[8] *Beam Schema*:  An object that describes Beam data elements such as
field names and their data types.
https://beam.apache.org/releases/javadoc/current/org/apache/beam/sdk/schemas/Schema.html

Suggested Learning Path To Understand This Email

1. *What is Pub/Sub?* -
https://www.youtube.com/playlist?list=PLIivdWyY5sqKwVLe4BLJ-vlh9r9zCdOse
2. *What is Apache Beam?* -
https://www.youtube.com/watch?v=65lmwL7rSy4&t=223s
3. *Apache Beam Overview* -
https://beam.apache.org/documentation/programming-guide/#overview
4. *Transforms (Up to section 4.1)* -
https://beam.apache.org/documentation/programming-guide/#transforms
5. *Pipeline I/O* -
https://beam.apache.org/documentation/programming-guide/#pipeline-io
6. Schemas -
https://beam.apache.org/documentation/programming-guide/#schemas

Re: Pub/Sub Client | Add Java getSchema and Go GetSchema methods

Posted by Robert Burke <ro...@frantil.com>.
+1 to the Go side proposal. That helper package is the right place for that.

Note that Beam Schemas in the Go SDK are vanilla Go structs, and the intent
is that users don't need to interact with the Beam protos at all.
