You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Tyler Hobbs (JIRA)" <ji...@apache.org> on 2016/07/05 19:17:11 UTC
[jira] [Updated] (CASSANDRA-12079) CQLSH to retrieve column names
from data file header
[ https://issues.apache.org/jira/browse/CASSANDRA-12079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tyler Hobbs updated CASSANDRA-12079:
------------------------------------
Assignee: (was: Tyler Hobbs)
> CQLSH to retrieve column names from data file header
> ----------------------------------------------------
>
> Key: CASSANDRA-12079
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12079
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Environment: Cassandra 2.1.14.1346
> Reporter: Jose Martinez Poblete
> Priority: Minor
> Fix For: 2.2.x, 3.0.x, 3.x
>
>
> Suppose a have a table with 3 columns
> Then the data is copied to a delimited file with *HEADER*
> {noformat}
> cqlsh> create KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1 } AND durable_writes = 'true';
> cqlsh> use my_keyspace ;
> cqlsh:my_keyspace> CREATE TABLE my_table ( col1 int PRIMARY KEY, col2 text, col3 text );
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 1, '1st row') ;
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 2, '2nd row') ;
> cqlsh:my_keyspace> insert INTO my_table (col1, col2) VALUES ( 3, '3rd row') ;
> cqlsh:my_keyspace> COPY my_keyspace.my_table ( col1, col2 ) TO 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table with columns ['col1', 'col2'].
> Processed: 3 rows; Rate: 10 rows/s; Avg. rate: 4 rows/s
> 3 rows exported to 1 files in 0.861 seconds.
> {noformat}
> This will create a file with these contents
> {noformat}
> col1|col2
> 3|3rd row
> 2|2nd row
> 1|1st row
> {noformat}
> Then we create another table with same DDL
> {noformat}
> cqlsh:my_keyspace> CREATE TABLE my_table2 ( col1 int PRIMARY KEY, col2 text, col3 text );
> {noformat}
> A restore from the recently created delimited file *with header* data file WILL FAIL because no columns were specified so it is expecting all columns to be in the delimited file - but we have a header row and the header option was specified
> {noformat}
> cqlsh:my_keyspace> COPY my_table2 FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2', 'col3'].
> Failed to import 3 rows: ParseError - Invalid row length 2 should be 3, given up without retries
> Failed to process 3 rows; failed rows written to import_my_keyspace_my_table2.err
> Processed: 3 rows; Rate: 5 rows/s; Avg. rate: 7 rows/s
> 3 rows imported from 1 files in 0.442 seconds (0 skipped).
> {noformat}
> Provided that *HEADER = true*, It would be very handy if CQLSH looks into the *header row* and retrieves the column names so they do not have to be entered manually on the copy command - especially where there is a significant number of columns
> {noformat}
> cqlsh:my_keyspace> COPY my_table2 (col1, col2) FROM 'my_table.dat' WITH DELIMITER = '|' AND HEADER = true ;
> Reading options from the command line: {'header': 'true', 'delimiter': '|'}
> Using 3 child processes
> Starting copy of my_keyspace.my_table2 with columns ['col1', 'col2'].
> Processed: 3 rows; Rate: 3 rows/s; Avg. rate: 4 rows/s
> 3 rows imported from 1 files in 0.708 seconds (0 skipped).
> cqlsh:my_keyspace> select * from my_table2;
> col1 | col2 | col3
> ------+---------+------
> 1 | 1st row | null
> 2 | 2nd row | null
> 3 | 3rd row | null
> (3 rows)
> cqlsh:my_keyspace>
> {noformat}
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)