You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@beam.apache.org by "Gleb Kanterov (JIRA)" <ji...@apache.org> on 2019/01/11 10:54:00 UTC

[jira] [Work started] (BEAM-5964) Add ClickHouseIO.Write

     [ https://issues.apache.org/jira/browse/BEAM-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on BEAM-5964 started by Gleb Kanterov.
-------------------------------------------
> Add ClickHouseIO.Write
> ----------------------
>
>                 Key: BEAM-5964
>                 URL: https://issues.apache.org/jira/browse/BEAM-5964
>             Project: Beam
>          Issue Type: New Feature
>          Components: io-ideas
>            Reporter: Gleb Kanterov
>            Assignee: Gleb Kanterov
>            Priority: Major
>          Time Spent: 8h 40m
>  Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is open-source columnar DBMS for OLAP. It allows analysis of data that is updated in real time. The project was released as open-source software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Do only writes, reads aren't useful because ClickHouse is designed for OLAP queries
> 2. For writes, do write in batches and rely on idempotent and atomic inserts supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on having logic for casting rows between schemas in BEAM-5918, and don't put it in ClickHouseIO.Write
> h3. References
> [1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)