Posted to issues@beam.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2019/01/02 16:26:00 UTC

[jira] [Work logged] (BEAM-5964) Add ClickHouseIO.Write

     [ https://issues.apache.org/jira/browse/BEAM-5964?focusedWorklogId=180228&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-180228 ]

ASF GitHub Bot logged work on BEAM-5964:
----------------------------------------

                Author: ASF GitHub Bot
            Created on: 02/Jan/19 16:25
            Start Date: 02/Jan/19 16:25
    Worklog Time Spent: 10m 
      Work Description: kanterov commented on issue #7006: [BEAM-5964] Add ClickHouseIO.Write
URL: https://github.com/apache/beam/pull/7006#issuecomment-450909993
 
 
   @chamikaramj did you have a chance to take a look?
 
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Issue Time Tracking
-------------------

    Worklog Id:     (was: 180228)
    Time Spent: 6h 40m  (was: 6.5h)

> Add ClickHouseIO.Write
> ----------------------
>
>                 Key: BEAM-5964
>                 URL: https://issues.apache.org/jira/browse/BEAM-5964
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Gleb Kanterov
>            Assignee: Gleb Kanterov
>            Priority: Major
>          Time Spent: 6h 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of data that is updated in real time. The project was released as open-source software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Support writes only; reads aren't useful because ClickHouse is designed for OLAP queries
> 2. Write in batches and rely on the idempotent and atomic inserts supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on the logic for casting rows between schemas from BEAM-5918 rather than putting it in ClickHouseIO.Write
> h3. References
> [1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
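
The batching idea in design point 2 of the quoted description can be illustrated with a minimal, dependency-free sketch. This is not the code from the pull request and it does not use the Beam or ClickHouse APIs: the class name BatchingWriter, the batch size parameter, and the sink callback (standing in for a single INSERT of one batch) are all hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Hypothetical sketch: buffers rows and hands them to a sink in batches.
 * The sink stands in for one INSERT statement carrying a whole batch.
 */
final class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Adds one row and flushes as soon as the buffer holds a full batch. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends any buffered rows as one batch; does nothing if the buffer is empty. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Pass a copy so the sink owns its batch independently of the buffer.
        sink.accept(new ArrayList<>(buffer));
        // Cleared only after the sink returns, so a failed batch is not dropped.
        buffer.clear();
    }
}
```

A caller would add rows one at a time and call flush() once at the end so that a final partial batch is also written. How retries interact with the idempotent inserts mentioned in the description is outside the scope of this sketch.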



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)