Posted to commits@beam.apache.org by "Anna Smith (JIRA)" <ji...@apache.org> on 2017/12/06 20:26:00 UTC

[jira] [Created] (BEAM-3311) Extend BigTableIO to write Iterable of KV

Anna Smith created BEAM-3311:
--------------------------------

             Summary: Extend BigTableIO to write Iterable of KV 
                 Key: BEAM-3311
                 URL: https://issues.apache.org/jira/browse/BEAM-3311
             Project: Beam
          Issue Type: Improvement
          Components: sdk-java-gcp
    Affects Versions: 2.2.0
            Reporter: Anna Smith
            Assignee: Chamikara Jayalath


The motivation is to reach the QPS advertised for BigTable when writing from Dataflow in streaming mode (e.g. 300k QPS for a 30-node cluster).  We currently do not see this throughput because the bundle size is small in streaming mode, so each request is dominated by its authentication header.  For example, the recommended payload size for reaching the advertised QPS is ~1KB, but without batching each payload is 7KB, most of which is the authentication header.

Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>,...>, where batching is done per bundle and flushed in finishBundle.  We would like to be able to batch manually using a DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>,...> so that we can work around the small bundle size in streaming mode.  We have seen some improvement in QPS to BigTable when running on Dataflow with this approach.

Our initial thought on implementation is to extend Write with a BulkWrite that accepts Iterable<KV<ByteString, Iterable<Mutation>>>.
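
To illustrate the manual batching idea, here is a minimal, library-free sketch of grouping rows into fixed-size batches before they are handed to a writer.  It does not use Beam or the Bigtable client: the class name ManualBatchSketch, the batch helper, and the batch size are hypothetical, Map.Entry stands in for KV, String for ByteString, and List<String> for Iterable<Mutation>.  It shows only the grouping step, not the proposed BulkWrite itself.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of BigTableIO.
class ManualBatchSketch {

    // Splits the input into consecutive batches of at most maxBatchSize
    // elements, preserving input order. The last batch may be smaller.
    public static <T> List<List<T>> batch(Iterable<T> elements, int maxBatchSize) {
        if (maxBatchSize <= 0) {
            throw new IllegalArgumentException("maxBatchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>();
        for (T element : elements) {
            current.add(element);
            if (current.size() == maxBatchSize) {
                batches.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        // Stand-ins (assumptions): String row key for ByteString,
        // List<String> for Iterable<Mutation>.
        List<Map.Entry<String, List<String>>> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new SimpleImmutableEntry<>("row-" + i, Arrays.asList("mutation-" + i)));
        }

        // Each inner list corresponds to one element of the proposed
        // Iterable<KV<ByteString, Iterable<Mutation>>> input.
        List<List<Map.Entry<String, List<String>>>> batches = batch(rows, 2);
        for (List<Map.Entry<String, List<String>>> b : batches) {
            System.out.println(b.size() + " row(s) in this batch");
        }
    }
}
```

In a pipeline, each such batch would be one input element, so the per-request overhead is shared across all rows in the batch rather than paid once per small bundle.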



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)