Posted to github@beam.apache.org by GitBox <gi...@apache.org> on 2022/11/09 01:44:35 UTC

[GitHub] [beam] lostluck commented on pull request #23285: Golang SpannerIO Implementation

lostluck commented on PR #23285:
URL: https://github.com/apache/beam/pull/23285#issuecomment-1308082338

   My recommendation here is to not worry about a full-on integration test for now. If you've run this on a portable runner (Flink or Cloud Dataflow, for example) and it works, that's sufficient E2E verification.
   
   I'd much prefer a robust in-memory unit test of the ProcessLogic though, and we can do that by refactoring the logic into separate standalone functions that are easier to unit test. The pipeline construction logic is a bit hairy to test outside of running pipelines or writing elaborate pre-setup (creating a Spanner instance, tearing it down, etc.). Trying to connect to a test database is also tricky because it typically requires everything to be in memory, or to have something arbitrary distributed runners can connect to (which won't be true on Dataflow, for example).
   
   The "simple" way to test this is to migrate the DoFn code that calls the client into testable functions, and pass those functions the client. This allows unit testing the important logic of calling the Spanner APIs, if not the Beam-specific logic of setting up the client.
   
   This is what we do to test the pubsubx "helper" logic for example: https://github.com/apache/beam/blob/3c5ea0dfd500c2f6b97eaaf4e39612c406afc9f5/sdks/go/pkg/beam/util/pubsubx/pubsub_test.go which is used to set up some of the streaming examples against pubsub.
   
   So, we can set some things up, like `query(ctx context.Context, client *spanner.Client, query string, rt reflect.Type, emit func(beam.X))`
   
   Then you can write a test that sets up the spannertest client with the data and passes in a closure like `func(v beam.X) { values = append(values, v) }`. That lets us check all the emitted values, which validates most of that logic.
   
   And you can validate you're getting the expected types out too. And similarly for the writing variant.
   
   The spannertest package has code you can borrow for setting up the client data properly: https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L127
   and https://github.com/googleapis/google-cloud-go/blob/main/spanner/spannertest/integration_test.go#L212
   
   I don't have great advice for this in general, as most of the effort on IOs is heavily focused on Java. Outside of simple style guidelines, there's not much in the way of "This is the comprehensive way to write a testable IO" that I could translate from Java to Go for you, and it's not something I feel I should "wing".

