Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2020/12/28 19:11:25 UTC

[GitHub] [beam] lostluck commented on a change in pull request #13611: [BEAM-9615] Custom Schema Coder Support

lostluck commented on a change in pull request #13611:
URL: https://github.com/apache/beam/pull/13611#discussion_r549456890



##########
File path: sdks/go/pkg/beam/core/graph/coder/row_decoder.go
##########
@@ -0,0 +1,275 @@
+// Licensed to the Apache Software Foundation (ASF) under one or more
+// contributor license agreements.  See the NOTICE file distributed with
+// this work for additional information regarding copyright ownership.
+// The ASF licenses this file to You under the Apache License, Version 2.0
+// (the "License"); you may not use this file except in compliance with
+// the License.  You may obtain a copy of the License at
+//
+//    http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing, software
+// distributed under the License is distributed on an "AS IS" BASIS,
+// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+// See the License for the specific language governing permissions and
+// limitations under the License.
+
+package coder
+
+import (
+	"fmt"
+	"io"
+	"reflect"
+
+	"github.com/apache/beam/sdks/go/pkg/beam/internal/errors"
+)
+
+// RowDecoderBuilder allows one to build Beam Schema row encoders for provided types.
+type RowDecoderBuilder struct {
+	allFuncs   map[reflect.Type]decoderProvider
+	ifaceFuncs []reflect.Type
+}
+
+type decoderProvider = func(reflect.Type) (func(io.Reader) (interface{}, error), error)

Review comment:
       Yes! Great questions.
   Two reasons: interface types, and generating independent coders.
   
   We keep providers because we want bundles to be independent, with each one able to create its own coder instances. The factory function has to stay around because we don't know which types are in use until pipeline construction time, and providers need to be registered before that.
   
   Independent coders are valuable because they avoid locking overhead and avoid redoing work in per-element calls, which account for most of the cost.
   
   The factory also enables interface types: we can register a factory against an interface type, and then pass the concrete type to the factory. We want this approach, rather than registering a one-size-fits-all coder, so that interface coders aren't forced to make per-element decisions based on static type attributes.
   
   E.g. with a fixed type, we know its structure ahead of time and can use simple coders only.
   With a coder factory bound to an interface type, the factory is still given the concrete type used in a pipeline (like a specific protocol buffer), but it can generate the coder once and use it for multiple elements. This avoids looking up coders per element in some global registry, which could require locking and add overhead.
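   
   A rough sketch of that flow, not the code in this PR: `registry`, `wireDecodable`, `point` and `newWireProvider` are hypothetical names made up for illustration, and only the `decoderProvider` alias and the `allFuncs`/`ifaceFuncs` field names are taken from the diff above. The factory is registered against an interface type, resolved once with the concrete type, and the resulting decoder is reused for every element.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
	"reflect"
)

// decoderProvider mirrors the alias in the diff: given the concrete type,
// it returns a decoder function bound to that type.
type decoderProvider = func(reflect.Type) (func(io.Reader) (interface{}, error), error)

// registry is a hypothetical stand-in for RowDecoderBuilder.
type registry struct {
	allFuncs   map[reflect.Type]decoderProvider
	ifaceFuncs []reflect.Type
}

// register stores a provider for a concrete or an interface type.
func (r *registry) register(rt reflect.Type, p decoderProvider) {
	if r.allFuncs == nil {
		r.allFuncs = map[reflect.Type]decoderProvider{}
	}
	r.allFuncs[rt] = p
	if rt.Kind() == reflect.Interface {
		r.ifaceFuncs = append(r.ifaceFuncs, rt)
	}
}

// build resolves a provider once, at construction time: exact type first,
// then any registered interface the concrete type implements.
func (r *registry) build(rt reflect.Type) (func(io.Reader) (interface{}, error), error) {
	if p, ok := r.allFuncs[rt]; ok {
		return p(rt)
	}
	for _, it := range r.ifaceFuncs {
		if rt.Implements(it) {
			return r.allFuncs[it](rt)
		}
	}
	return nil, fmt.Errorf("no decoder provider registered for %v", rt)
}

// wireDecodable is a hypothetical user interface (think: a proto-like message).
type wireDecodable interface {
	DecodeFrom(io.Reader) error
}

type point struct{ X, Y int32 }

func (p *point) DecodeFrom(r io.Reader) error { return binary.Read(r, binary.BigEndian, p) }

// newWireProvider returns a provider for wireDecodable; calls counts how
// often the factory itself runs.
func newWireProvider(calls *int) decoderProvider {
	return func(rt reflect.Type) (func(io.Reader) (interface{}, error), error) {
		*calls++
		if rt.Kind() != reflect.Ptr {
			return nil, fmt.Errorf("want pointer type, got %v", rt)
		}
		elem := rt.Elem() // the reflection work happens here, once
		return func(r io.Reader) (interface{}, error) {
			v := reflect.New(elem).Interface().(wireDecodable)
			if err := v.DecodeFrom(r); err != nil {
				return nil, err
			}
			return v, nil
		}, nil
	}
}

// decodeTwoPoints encodes two points, then decodes them with one decoder.
func decodeTwoPoints() ([]point, int, error) {
	var buf bytes.Buffer
	for _, p := range []point{{1, 2}, {3, 4}} {
		if err := binary.Write(&buf, binary.BigEndian, p); err != nil {
			return nil, 0, err
		}
	}
	var reg registry
	calls := 0
	reg.register(reflect.TypeOf((*wireDecodable)(nil)).Elem(), newWireProvider(&calls))
	dec, err := reg.build(reflect.TypeOf(&point{}))
	if err != nil {
		return nil, calls, err
	}
	var out []point
	for buf.Len() > 0 {
		v, err := dec(&buf)
		if err != nil {
			return nil, calls, err
		}
		out = append(out, *v.(*point))
	}
	return out, calls, nil
}

func main() {
	pts, calls, err := decodeTwoPoints()
	fmt.Println(pts, calls, err)
}
```

   The per-element path here is just a closure call: no registry lookup and no lock, which is the point of handing the concrete type to the factory up front.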
   
   Further, we can bolster support for interfaces. E.g. a generic interface can be used as an element type, and the factory can then do per-element work if it so chooses, at the cost of whatever overhead that incurs. Even in this case we can avoid per-element locking: the coder is only used in single-threaded contexts, so it can cache its own coders. That's more expensive than statically structured coders, but it doesn't block the path either way.
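   
   To make the "cache its own coders" case concrete, here is a minimal sketch under made-up assumptions: each element carries a one-byte type tag, and `buildFor`, `newCachingDecoder` and `decodeMixed` are hypothetical helpers, not part of the SDK. Each decoder instance owns a plain map with no mutex, which is only safe because a single goroutine uses that instance.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"fmt"
	"io"
)

type elemDecoder = func(io.Reader) (interface{}, error)

// buildFor is a hypothetical, comparatively expensive constructor for the
// decoder of one dynamic type, identified here by a one-byte tag.
func buildFor(tag byte) (elemDecoder, error) {
	switch tag {
	case 'i':
		return func(r io.Reader) (interface{}, error) {
			var v int64
			if err := binary.Read(r, binary.BigEndian, &v); err != nil {
				return nil, err
			}
			return v, nil
		}, nil
	case 'b':
		return func(r io.Reader) (interface{}, error) {
			var v [1]byte
			if _, err := io.ReadFull(r, v[:]); err != nil {
				return nil, err
			}
			return v[0] != 0, nil
		}, nil
	}
	return nil, fmt.Errorf("unknown type tag %q", tag)
}

// newCachingDecoder returns a decoder for a generic interface element type.
// The cache is private to the returned closure, so no locking is needed as
// long as the instance is not shared across goroutines.
func newCachingDecoder(builds *int) elemDecoder {
	cache := map[byte]elemDecoder{}
	return func(r io.Reader) (interface{}, error) {
		var tag [1]byte
		if _, err := io.ReadFull(r, tag[:]); err != nil {
			return nil, err
		}
		dec, ok := cache[tag[0]]
		if !ok {
			var err error
			if dec, err = buildFor(tag[0]); err != nil {
				return nil, err
			}
			cache[tag[0]] = dec
			*builds++ // counts cache misses, for demonstration only
		}
		return dec(r)
	}
}

// decodeMixed encodes two int64 values and one bool, then decodes them all
// through a single caching decoder.
func decodeMixed() ([]interface{}, int, error) {
	var buf bytes.Buffer
	for _, v := range []int64{7, 8} {
		buf.WriteByte('i')
		if err := binary.Write(&buf, binary.BigEndian, v); err != nil {
			return nil, 0, err
		}
	}
	buf.Write([]byte{'b', 1})

	builds := 0
	dec := newCachingDecoder(&builds)
	var out []interface{}
	for buf.Len() > 0 {
		v, err := dec(&buf)
		if err != nil {
			return nil, builds, err
		}
		out = append(out, v)
	}
	return out, builds, nil
}

func main() {
	out, builds, err := decodeMixed()
	fmt.Println(out, builds, err)
}
```

   There is still a map lookup per element, so this costs more than a statically structured coder, but the sub-decoders are built once per type per instance rather than once per element.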
   
   Finally, we're accepting interface{} in the register call for these, since we may wish to expand the factory setup later for optimizations that could avoid additional allocations.



