Posted to commits@beam.apache.org by "Anna Smith (JIRA)" <ji...@apache.org> on 2017/12/06 20:26:00 UTC
[jira] [Created] (BEAM-3311) Extend BigTableIO to write Iterable of KV
Anna Smith created BEAM-3311:
--------------------------------
Summary: Extend BigTableIO to write Iterable of KV
Key: BEAM-3311
URL: https://issues.apache.org/jira/browse/BEAM-3311
Project: Beam
Issue Type: Improvement
Components: sdk-java-gcp
Affects Versions: 2.2.0
Reporter: Anna Smith
Assignee: Chamikara Jayalath
The motivation is to achieve the QPS that Bigtable advertises when writing from Dataflow in streaming mode (e.g., 300k QPS for a 30-node cluster). We are currently not reaching it, because bundles are small in streaming mode and each request is dominated by the authentication header. For example, to achieve the advertised QPS each payload is recommended to be ~1 KB, but without batching each payload is ~7 KB, most of which is the authentication header.
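To make the overhead argument concrete, here is a back-of-the-envelope model. It assumes (a reading of the numbers above, not a measurement) that an unbatched ~7 KB request is ~6 KB of authentication header plus ~1 KB of payload, and that the header is paid once per request. Under that assumption, packing N mutations into one request brings the average cost per mutation down to 6/N + 1 KB. `HeaderOverhead` and its constants are hypothetical names used only for illustration.

```java
// Back-of-the-envelope model of per-request header overhead.
// ASSUMPTION (not stated in the issue): a ~7 KB unbatched request is
// ~6 KB of authentication header plus ~1 KB of mutation payload, and the
// header is sent once per request no matter how many mutations it carries.
final class HeaderOverhead {
    static final double HEADER_KB = 6.0;   // assumed fixed cost per request
    static final double PAYLOAD_KB = 1.0;  // recommended payload per mutation

    /** Average KB on the wire per mutation when batchSize mutations share one request. */
    static double kbPerMutation(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        // The header is amortized across the batch; the payload is not.
        return HEADER_KB / batchSize + PAYLOAD_KB;
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 10, 100}) {
            System.out.printf("batch=%d -> %.2f KB per mutation%n", n, kbPerMutation(n));
        }
    }
}
```

With these assumed numbers, a batch of 10 brings the per-mutation cost from 7 KB to about 1.6 KB, which is the effect manual batching aims for.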
Currently, BigtableIO writes through a DoFn<KV<ByteString, Iterable<Mutation>>, ...> that batches per bundle and flushes in finishBundle. We would like to batch manually with a DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>, ...> to work around the small bundle size in streaming. Using this approach on Dataflow, we have already seen some improvement in Bigtable QPS.
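The manual batching described above can be sketched, under simplifying assumptions, as a helper that groups (row key, mutations) pairs into fixed-size batches so that each downstream element is an Iterable of rows. This is plain Java with no Beam dependency: `Map.Entry<String, List<String>>` stands in for `KV<ByteString, Iterable<Mutation>>`, and `ManualBatcher` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical, dependency-free sketch of manual batching.
// Map.Entry<String, List<String>> stands in for KV<ByteString, Iterable<Mutation>>.
final class ManualBatcher {
    /** Groups elements into consecutive batches of at most maxBatchSize. */
    static <T> List<List<T>> batch(Iterable<T> elements, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        List<T> current = new ArrayList<>(maxBatchSize);
        for (T element : elements) {
            current.add(element);
            if (current.size() == maxBatchSize) {
                // Batch is full: emit it and start a new one.
                batches.add(current);
                current = new ArrayList<>(maxBatchSize);
            }
        }
        if (!current.isEmpty()) {
            // Emit the partial remainder, analogous to a flush in finishBundle.
            batches.add(current);
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, List<String>>> rows = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            rows.add(Map.entry("row-" + i, List.of("setCell")));
        }
        List<List<Map.Entry<String, List<String>>>> batches = batch(rows, 10);
        System.out.println(batches.size()); // prints 3 (batches of 10, 10 and 5)
    }
}
```

This shows only the grouping logic. In a real streaming pipeline, a buffer kept inside a DoFn is still flushed at the end of each bundle, so carrying batches across bundles requires state; one option is Beam's `GroupIntoBatches` transform, which batches keyed input per key and window.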
An initial implementation idea is to extend Write with a BulkWrite that accepts Iterable<KV<ByteString, Iterable<Mutation>>>.
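A hypothetical sketch of how the proposed BulkWrite could consume such batches: because each input element is already a batch, the writer can issue one request per element instead of letting request size depend on bundle size. `BulkWriteSketch` and `FakeClient` are illustrative stand-ins, not Beam or Bigtable client APIs; how a real BulkWrite would issue requests is left open in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a BulkWrite-style writer. Each input element is a
// whole batch of (rowKey, mutations) pairs; Map.Entry<String, List<String>>
// stands in for KV<ByteString, Iterable<Mutation>>.
final class BulkWriteSketch {
    private final FakeClient client;

    BulkWriteSketch(FakeClient client) {
        this.client = client;
    }

    /** Analogue of processElement for an Iterable<KV<ByteString, Iterable<Mutation>>> input. */
    void processBatch(Iterable<Map.Entry<String, List<String>>> batch) {
        List<Map.Entry<String, List<String>>> rows = new ArrayList<>();
        for (Map.Entry<String, List<String>> row : batch) {
            rows.add(row);
        }
        if (!rows.isEmpty()) {
            // One request per batch, regardless of how many batches a bundle holds.
            client.sendBulkRequest(rows);
        }
    }

    public static void main(String[] args) {
        FakeClient client = new FakeClient();
        BulkWriteSketch writer = new BulkWriteSketch(client);
        writer.processBatch(List.of(
                Map.entry("a", List.of("m1")),
                Map.entry("b", List.of("m2", "m3"))));
        writer.processBatch(List.of(Map.entry("c", List.of("m4"))));
        System.out.println(client.requestRowCounts); // prints [2, 1]
    }
}

// Hypothetical client that only records how many rows each request carried.
final class FakeClient {
    final List<Integer> requestRowCounts = new ArrayList<>();

    void sendBulkRequest(List<Map.Entry<String, List<String>>> rows) {
        requestRowCounts.add(rows.size());
    }
}
```

The design point is that request size is now controlled by whoever builds the batches upstream, not by the runner's bundling.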
--
This message was sent by Atlassian JIRA
(v6.4.14#64029)