Posted to commits@beam.apache.org by "Solomon Duskis (JIRA)" <ji...@apache.org> on 2018/01/25 21:41:00 UTC

[jira] [Closed] (BEAM-3311) Extend BigTableIO to write Iterable of KV

     [ https://issues.apache.org/jira/browse/BEAM-3311?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Solomon Duskis closed BEAM-3311.
--------------------------------
       Resolution: Won't Fix
    Fix Version/s: Not applicable

Use Flatten.iterables() instead of duplicating that functionality in BigtableIO.
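
To illustrate the suggestion, here is a minimal plain-Java model of what flattening does to manually built batches. It is a sketch, not Beam code: the class FlattenBatchesSketch and the helpers kv and flatten are hypothetical names, and String stands in for both ByteString row keys and Mutation. In an actual pipeline the equivalent step would presumably be applying Beam's Flatten.iterables() to the PCollection of batches before handing the result to BigtableIO's write transform.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Plain-Java model only (assumption): String stands in for ByteString and Mutation.
final class FlattenBatchesSketch {

    // Hypothetical helper: builds one row key with its list of mutations.
    static Map.Entry<String, List<String>> kv(String key, String... mutations) {
        return new SimpleEntry<>(key, Arrays.asList(mutations));
    }

    // Unpacks every batch into its individual key/mutations pairs, keeping order.
    // This mirrors the element-level effect of flattening a collection of iterables.
    static <K, V> List<Map.Entry<K, V>> flatten(
            Iterable<? extends Iterable<Map.Entry<K, V>>> batches) {
        List<Map.Entry<K, V>> out = new ArrayList<>();
        for (Iterable<Map.Entry<K, V>> batch : batches) {
            for (Map.Entry<K, V> element : batch) {
                out.add(element);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Two manually built batches, as an upstream step might emit them.
        List<List<Map.Entry<String, List<String>>>> batches = Arrays.asList(
                Arrays.asList(kv("row-a", "m1"), kv("row-b", "m2", "m3")),
                Arrays.asList(kv("row-c", "m4")));

        // After flattening, each element has the per-row shape the existing write path takes.
        for (Map.Entry<String, List<String>> element : flatten(batches)) {
            System.out.println(element.getKey() + " -> " + element.getValue());
        }
    }
}
```

The flattened elements have the same shape as the KV<ByteString, Iterable<Mutation>> input the existing write path already accepts, which is why a separate bulk-write variant would duplicate functionality.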

> Extend BigTableIO to write Iterable of KV 
> ------------------------------------------
>
>                 Key: BEAM-3311
>                 URL: https://issues.apache.org/jira/browse/BEAM-3311
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-gcp
>    Affects Versions: 2.2.0
>            Reporter: Anna Smith
>            Assignee: Solomon Duskis
>            Priority: Major
>             Fix For: Not applicable
>
>
> The motivation is to achieve the QPS that BigTable advertises (e.g., 300k QPS for a 30-node cluster) when writing from Dataflow in streaming mode. We are currently not seeing this because the bundle size is small in streaming mode and each request is dominated by its authentication header. For example, the recommended payload size for reaching the advertised QPS is ~1 KB, but without batching each payload is 7 KB, most of which is the authentication header.
> Currently BigTableIO supports DoFn<KV<ByteString, Iterable<Mutation>>,...>, where batching is done per bundle and flushed in finishBundle. We would like to batch manually using a DoFn<Iterable<KV<ByteString, Iterable<Mutation>>>,...> so that we can work around the small bundle size in streaming. We have seen some improvement in QPS to BigTable with this approach when running on Dataflow.
> Our initial thought on implementation is to extend Write with a BulkWrite that accepts Iterable<KV<ByteString, Iterable<Mutation>>>.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)