Posted to issues@beam.apache.org by "Gleb Kanterov (JIRA)" <ji...@apache.org> on 2019/01/11 10:54:00 UTC
[jira] [Work started] (BEAM-5964) Add ClickHouseIO.Write
[ https://issues.apache.org/jira/browse/BEAM-5964?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Work on BEAM-5964 started by Gleb Kanterov.
-------------------------------------------
> Add ClickHouseIO.Write
> ----------------------
>
> Key: BEAM-5964
> URL: https://issues.apache.org/jira/browse/BEAM-5964
> Project: Beam
> Issue Type: New Feature
> Components: io-ideas
> Reporter: Gleb Kanterov
> Assignee: Gleb Kanterov
> Priority: Major
> Time Spent: 8h 40m
> Remaining Estimate: 0h
>
> h3. Motivation
> ClickHouse is an open-source columnar DBMS for OLAP. It allows analysis of data that is updated in real time. The project was released as open-source software under the Apache 2 license in June 2016.
> h3. Design and implementation
> 1. Support writes only; reads aren't useful because ClickHouse is designed for OLAP queries
> 2. Write in batches and rely on the idempotent and atomic inserts supported by replicated tables in ClickHouse
> 3. Implement ClickHouseIO.Write as PTransform<PCollection<Row>, PDone>
> 4. Rely on the logic for casting rows between schemas from BEAM-5918 instead of putting it in ClickHouseIO.Write
> h3. References
> [1] http://highscalability.com/blog/2017/9/18/evolution-of-data-structures-in-yandexmetrica.html
> [2] https://blog.cloudflare.com/how-cloudflare-analyzes-1m-dns-queries-per-second/
> [3] https://blog.cloudflare.com/http-analytics-for-6m-requests-per-second-using-clickhouse/
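Item 2 of the design above (batched writes that rely on idempotent, atomic inserts) can be sketched roughly as follows. This is only an illustration in plain Java, not the ClickHouseIO.Write implementation: BatchSink and BatchingWriter are hypothetical names, and the sink is assumed to send each batch as a single INSERT into a replicated table, where re-sending an identical batch is assumed to be deduplicated by the server.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sink: sends one batch as a single INSERT. Not a Beam or ClickHouse API. */
interface BatchSink<T> {
    void insert(List<T> batch);
}

/** Buffers rows and flushes them in fixed-size batches, retrying a failed batch unchanged. */
class BatchingWriter<T> {
    private final int batchSize;
    private final int maxAttempts;
    private final BatchSink<T> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, int maxAttempts, BatchSink<T> sink) {
        if (batchSize < 1 || maxAttempts < 1) {
            throw new IllegalArgumentException("batchSize and maxAttempts must be >= 1");
        }
        this.batchSize = batchSize;
        this.maxAttempts = maxAttempts;
        this.sink = sink;
    }

    /** Adds a row and flushes once the buffer reaches the batch size. */
    void add(T row) {
        buffer.add(row);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    /** Sends the buffered rows as one batch; a retry re-sends the same rows in the same order. */
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        // Copy so every attempt sees identical content; the idempotence
        // assumption only holds if the retried batch is byte-for-byte the same.
        List<T> batch = new ArrayList<>(buffer);
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                sink.insert(batch);
                buffer.clear();
                return;
            } catch (RuntimeException e) {
                last = e; // assumed transient; retry the same batch
            }
        }
        throw last;
    }
}
```

The point of the design choice is that the writer never has to know whether a failed attempt actually reached the server: as long as the retried batch is unchanged, a duplicate delivery is expected to be harmless.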
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)