Posted to commits@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/04/12 07:40:00 UTC

[jira] [Work logged] (BEAM-4049) Improve write throughput of CassandraIO

     [ https://issues.apache.org/jira/browse/BEAM-4049?focusedWorklogId=90315&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-90315 ]

ASF GitHub Bot logged work on BEAM-4049:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 12/Apr/18 07:39
            Start Date: 12/Apr/18 07:39
    Worklog Time Spent: 10m 
      Work Description: adejanovski opened a new pull request #5112: [BEAM-4049] Improve CassandraIO write throughput by performing async queries
URL: https://github.com/apache/beam/pull/5112
 
 
   The current design uses synchronous queries, which serializes all queries and is not the recommended way to perform writes in Cassandra.
   This commit uses asynchronous queries instead, and protects the cluster from being overwhelmed by sending no more than 100 queries at the same time.
   The ListenableFuture interface, which the Datastax Java driver uses, had to be excluded from the guava relocation: because it is not included in the shaded jar, invocations of `saveAsync()` would otherwise fail at runtime.
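
   The batching idea described above can be sketched roughly as follows. This is not the code from the pull request: `BoundedAsyncWriter` and `pending()` are hypothetical names, and `CompletableFuture` with a plain `Function` stands in for the mapper's `saveAsync()` (which works with `ListenableFuture`) so that the sketch runs with the JDK alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

/** Hypothetical helper (not Beam code): caps the number of async writes in flight. */
final class BoundedAsyncWriter<T> {
  // Stand-in for the mapper's saveAsync(); assumed to return a future per write.
  private final Function<T, CompletableFuture<Void>> saveAsync;
  private final int maxInFlight;
  private final List<CompletableFuture<Void>> inFlight = new ArrayList<>();

  BoundedAsyncWriter(Function<T, CompletableFuture<Void>> saveAsync, int maxInFlight) {
    if (maxInFlight < 1) {
      throw new IllegalArgumentException("maxInFlight must be at least 1");
    }
    this.saveAsync = saveAsync;
    this.maxInFlight = maxInFlight;
  }

  /** Starts one async write; once maxInFlight writes are pending, waits for all of them. */
  void write(T entity) {
    inFlight.add(saveAsync.apply(entity));
    if (inFlight.size() >= maxInFlight) {
      flush();
    }
  }

  /** Blocks until every pending write completes; join() rethrows the first failure. */
  void flush() {
    try {
      for (CompletableFuture<Void> future : inFlight) {
        future.join();
      }
    } finally {
      inFlight.clear();
    }
  }

  /** Number of writes started but not yet waited on. */
  int pending() {
    return inFlight.size();
  }
}
```

   With a limit of 100, a caller would invoke `write(entity)` once per element and call `flush()` one final time when the bundle ends, so that the last partial batch is also waited on. Waiting for the whole batch is the simplest policy; a semaphore that releases a permit as each write completes would keep the pipeline fuller, at the cost of more bookkeeping.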
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

            Worklog Id:     (was: 90315)
            Time Spent: 10m
    Remaining Estimate: 0h

> Improve write throughput of CassandraIO
> ---------------------------------------
>
>                 Key: BEAM-4049
>                 URL: https://issues.apache.org/jira/browse/BEAM-4049
>             Project: Beam
>          Issue Type: Improvement
>          Components: io-java-cassandra
>    Affects Versions: 2.4.0
>            Reporter: Alexander Dejanovski
>            Assignee: Jean-Baptiste Onofré
>            Priority: Major
>              Labels: performance
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> CassandraIO currently uses the mapper to perform writes synchronously.
> This means that writes are serialized, which is a very suboptimal way of writing to Cassandra.
> The IO should use the saveAsync() method instead of save(), and should wait for completion each time 100 queries are in flight, in order to avoid overwhelming clusters.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)