You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Tyler Hobbs (JIRA)" <ji...@apache.org> on 2016/07/05 19:17:11 UTC

[jira] [Updated] (CASSANDRA-12079) CQLSH to retrieve column names from data file header

     [ https://issues.apache.org/jira/browse/CASSANDRA-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tyler Hobbs updated CASSANDRA-12079:
------------------------------------
    Assignee:     (was: Tyler Hobbs)

> CQLSH to retrieve column names from data file header
> ----------------------------------------------------
>
>                 Key: CASSANDRA-12079
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-12079
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Core
>         Environment: Cassandra 2.1.14.1346
>            Reporter: Jose Martinez Poblete
>            Priority: Minor
>             Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Suppose a have a table with 3 columns
> Then the data is copied to a delimited file with *HEADER*
> {noformat}
> cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true';
> cqlsh> use my_keyspace ;
> cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, col3 text );
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ;
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ;
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ;
> cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table with columns ['col1', 'col2'].
> Processed: 3 rows; Rate:      10 rows/s; Avg. rate:       4 rows/s
> 3 rows exported to 1 files in 0.861 seconds.
> {noformat}
> This will create a file with these contents
> {noformat}
> col1|col2
> 3|3rd row
> 2|2nd row
> 1|1st row
> {noformat}
> Then we create another table with same DDL 
> {noformat}
> cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, col3 text );
> {noformat}
> A restore from the recently created delimited file *with header* data file WILL FAIL because no columns were specified so it is expecting all columns to be in the delimited file - but we have a header row and the header option was specified
> {noformat}
> cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3'].
> Failed to import 3 rows: ParseError - Invalid row length 2 should be 3,  given up without retries
> Failed to process 3 rows; failed rows written to import_my_keyspace_my_table2.err
> Processed: 3 rows; Rate:       5 rows/s; Avg. rate:       7 rows/s
> 3 rows imported from 1 files in 0.442 seconds (0 skipped).
> {noformat}
> Provided that *HEADER = true*,  It would be very handy if CQLSH looks into the *header row* and retrieves the column names so they do not have to be entered manually on the copy command - especially where there is a significant number of columns
> {noformat}
> cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2'].
> Processed: 3 rows; Rate:       3 rows/s; Avg. rate:       4 rows/s
> 3 rows imported from 1 files in 0.708 seconds (0 skipped).
> cqlsh:my_keyspace> select * from my_table2;
>  col1 | col2    | col3
> ------+---------+------
>     1 | 1st row | null
>     2 | 2nd row | null
>     3 | 3rd row | null
> (3 rows)
> cqlsh:my_keyspace> 
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)