You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@ignite.apache.org by "Kirill Shirokov (JIRA)" <ji...@apache.org> on 2018/01/17 11:44:00 UTC
[jira] [Updated] (IGNITE-6917) SQL: implement COPY command for efficient data loading

     [ https://issues.apache.org/jira/browse/IGNITE-6917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kirill Shirokov updated IGNITE-6917:
------------------------------------
    Description: 
Inspired by Postgres [1]

Common use case - bulk data load through JDBC/ODBC interface. Currently it is only possible to execute single commands one by one. We already can batch them to improve performance, but there is still big room for improvement.

We should think of a completely new command - {{COPY}}. It will accept a file (or input stream in general case) on the client side, then transfer data to the cluster, and then execute update inside the cluster, e.g. through streamer.

First of all we need to create quick and dirty prototype to assess potential performance improvement. It speedup is confirmed, we should build base implementation which will accept only files. But at the same time we should understand how it will evolve in future: multiple file formats (probably including Hadoop formarts, e.g. Parquet), escape characters, input streams, etc..

[1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
h1. Proposed syntax

Curent implementation:
{noformat}
COPY 
    FROM "file.name"
    INTO <schema>.<table>
    [COLUMNS (col-name, ...)]
    FORMAT <format-name>
{noformat}

  was:
Inspired by Postgres [1]

Common use case - bulk data load through JDBC/ODBC interface. Currently it is only possible to execute single commands one by one. We already can batch them to improve performance, but there is still big room for improvement.

We should think of a completely new command - {{COPY}}. It will accept a file (or input stream in general case) on the client side, then transfer data to the cluster, and then execute update inside the cluster, e.g. through streamer. 

First of all we need to create quick and dirty prototype to assess potential performance improvement. It speedup is confirmed, we should build base implementation which will accept only files. But at the same time we should understand how it will evolve in future: multiple file formats (probably including Hadoop formarts, e.g. Parquet), escape characters, input streams, etc..

[1] https://www.postgresql.org/docs/9.6/static/sql-copy.html


> SQL: implement COPY command for efficient data loading
> ------------------------------------------------------
>
>                 Key: IGNITE-6917
>                 URL: https://issues.apache.org/jira/browse/IGNITE-6917
>             Project: Ignite
>          Issue Type: New Feature
>          Components: sql
>            Reporter: Vladimir Ozerov
>            Assignee: Kirill Shirokov
>            Priority: Major
>              Labels: iep-1
>
> Inspired by Postgres [1]
> Common use case - bulk data load through JDBC/ODBC interface. Currently it is only possible to execute single commands one by one. We already can batch them to improve performance, but there is still big room for improvement.
> We should think of a completely new command - {{COPY}}. It will accept a file (or input stream in general case) on the client side, then transfer data to the cluster, and then execute update inside the cluster, e.g. through streamer.
> First of all we need to create quick and dirty prototype to assess potential performance improvement. It speedup is confirmed, we should build base implementation which will accept only files. But at the same time we should understand how it will evolve in future: multiple file formats (probably including Hadoop formarts, e.g. Parquet), escape characters, input streams, etc..
> [1] [https://www.postgresql.org/docs/9.6/static/sql-copy.html]
> h1. Proposed syntax
> Curent implementation:
> {noformat}
> COPY 
>     FROM "file.name"
>     INTO <schema>.<table>
>     [COLUMNS (col-name, ...)]
>     FORMAT <format-name>
> {noformat}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)