Posted to commits@cassandra.apache.org by "fuggy_yama (JIRA)" <ji...@apache.org> on 2015/07/13 19:48:05 UTC
[jira] [Comment Edited] (CASSANDRA-9773) Hadoop Cassandra
integration - cannot output to table with only primary key columns
[ https://issues.apache.org/jira/browse/CASSANDRA-9773?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14625020#comment-14625020 ]
fuggy_yama edited comment on CASSANDRA-9773 at 7/13/15 5:48 PM:
----------------------------------------------------------------
One possible *workaround* is to add a dummy value column that is not part of the primary key:
{code}CREATE TABLE IF NOT EXISTS summary
(
it int,
id int,
x float,
y float,
dummy boolean,
PRIMARY KEY (it, id, x, y)
) WITH compact storage{code}
The update statement defined in the Hadoop job would then look like this:
{code}String outputQuery = "UPDATE " + params.get("output_keyspace") + "." + params.get("output_column_family") + " SET dummy=?";
CqlConfigHelper.setOutputCql(job.getConfiguration(), outputQuery);{code}
Finally, the reducers write some dummy value into this column.
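A minimal sketch of the data a reducer would emit for this workaround, assuming the output types of CqlOutputFormat: a Map<String, ByteBuffer> of primary-key values (bound to the WHERE clause the record writer generates) and a List<ByteBuffer> of values bound to the SET placeholders. The class DummyColumnOutput and its helpers are hypothetical; serialization is hand-rolled with java.nio.ByteBuffer (CQL int and float are 4-byte big-endian, boolean is a single byte) so the block has no Cassandra dependency.

{code:java}
import java.nio.ByteBuffer;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds reducer output for the "SET dummy=?" workaround.
class DummyColumnOutput {
    // CQL int: 4 bytes, big-endian (ByteBuffer's default byte order).
    static ByteBuffer cqlInt(int v) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putInt(v);
        b.flip();
        return b;
    }

    // CQL float: 4-byte IEEE 754, big-endian.
    static ByteBuffer cqlFloat(float v) {
        ByteBuffer b = ByteBuffer.allocate(4);
        b.putFloat(v);
        b.flip();
        return b;
    }

    // CQL boolean: one byte, 1 for true and 0 for false.
    static ByteBuffer cqlBoolean(boolean v) {
        ByteBuffer b = ByteBuffer.allocate(1);
        b.put((byte) (v ? 1 : 0));
        b.flip();
        return b;
    }

    // Primary-key values, keyed by column name (assumed to be bound to the generated WHERE clause).
    static Map<String, ByteBuffer> keys(int it, int id, float x, float y) {
        Map<String, ByteBuffer> keys = new LinkedHashMap<>();
        keys.put("it", cqlInt(it));
        keys.put("id", cqlInt(id));
        keys.put("x", cqlFloat(x));
        keys.put("y", cqlFloat(y));
        return keys;
    }

    // Values for the SET clause: a single placeholder value for the dummy column.
    static List<ByteBuffer> values() {
        return Collections.singletonList(cqlBoolean(true));
    }
}
{code}

In a reducer these would presumably be emitted as context.write(DummyColumnOutput.keys(it, id, x, y), DummyColumnOutput.values()). The actual value of dummy does not matter; the column only exists so the SET clause can name a non-key column.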
> Hadoop Cassandra integration - cannot output to table with only primary key columns
> -----------------------------------------------------------------------------------
>
> Key: CASSANDRA-9773
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9773
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Environment: Cassandra 2.0.13, Hadoop 1.0.4
> Reporter: fuggy_yama
>
> I have the following table in Cassandra:
> {code:sql}CREATE TABLE IF NOT EXISTS summary
> (
> it int,
> id int,
> x float,
> y float,
> PRIMARY KEY (it, id, x, y)
> ) WITH compact storage{code}
> In the Hadoop job definition I set the output/update query:
> {code:java}String outputQuery = "UPDATE " + params.get("output_keyspace") + "." + params.get("output_column_family") + " SET x=?, y=?";
> CqlConfigHelper.setOutputCql(job.getConfiguration(), outputQuery);{code}
> When the Hadoop job tries to write results from the reducers to Cassandra, I get this exception:
> {code:java}java.io.IOException: java.lang.RuntimeException: failed to prepare cql query UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:256)
> Caused by: java.lang.RuntimeException: failed to prepare cql query UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.preparedStatement(CqlRecordWriter.java:300)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.run(CqlRecordWriter.java:237)
> Caused by: InvalidRequestException(why:PRIMARY KEY part x found in SET part)
> at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:51017)
> at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result$prepare_cql3_query_resultStandardScheme.read(Cassandra.java:50994)
> at org.apache.cassandra.thrift.Cassandra$prepare_cql3_query_result.read(Cassandra.java:50933)
> at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:78)
> at org.apache.cassandra.thrift.Cassandra$Client.recv_prepare_cql3_query(Cassandra.java:1756)
> at org.apache.cassandra.thrift.Cassandra$Client.prepare_cql3_query(Cassandra.java:1742)
> at org.apache.cassandra.hadoop.cql3.CqlRecordWriter$RangeClient.preparedStatement(CqlRecordWriter.java:296)
> ... 1 more{code}
> When we want to insert/update columns that are part of the PK definition, the generated CQL query contains a conflict (the x and y columns appear in both the SET and WHERE clauses):
> *UPDATE kmeans_out_cs.summary SET x=?, y=? WHERE "it" = ? AND "id" = ? AND "x" = ? AND "y" = ?*
> *Can a Hadoop job write data to a Cassandra table that has only PRIMARY KEY columns?*
> *UPDATE1*
> I checked the source code and noticed that the output CQL query actually has to be an UPDATE statement (not an INSERT).
> The UPDATE statement syntax requires a non-empty "SET a=b" clause, so there is no way to avoid duplicating column names in the final update query.
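The conflict can be illustrated with a hypothetical reconstruction of how the final statement is assembled: every PRIMARY KEY column is appended as a WHERE condition to the user-supplied output query. OutputQuerySketch and its methods are illustrative only, not the actual CqlRecordWriter code, and the SET parser is deliberately naive (it assumes an upper-case SET keyword and simple col=? assignments).

{code:java}
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch, not the real CqlRecordWriter implementation.
class OutputQuerySketch {
    // Append a WHERE clause that binds every primary-key column, in key order.
    static String withWhereClause(String outputQuery, List<String> pkColumns) {
        StringBuilder sb = new StringBuilder(outputQuery).append(" WHERE ");
        for (int i = 0; i < pkColumns.size(); i++) {
            if (i > 0) sb.append(" AND ");
            sb.append('"').append(pkColumns.get(i)).append("\" = ?");
        }
        return sb.toString();
    }

    // Return the SET-clause columns that are also primary-key columns,
    // i.e. the ones Cassandra rejects with "PRIMARY KEY part ... found in SET part".
    static List<String> conflictingColumns(String outputQuery, List<String> pkColumns) {
        List<String> conflicts = new ArrayList<>();
        int set = outputQuery.indexOf(" SET "); // assumes an upper-case SET keyword
        if (set < 0) return conflicts;
        for (String assignment : outputQuery.substring(set + 5).split(",")) {
            String column = assignment.split("=")[0].trim();
            if (pkColumns.contains(column)) conflicts.add(column);
        }
        return conflicts;
    }

    public static void main(String[] args) {
        List<String> pk = Arrays.asList("it", "id", "x", "y");
        String query = "UPDATE kmeans_out_cs.summary SET x=?, y=?";
        System.out.println(withWhereClause(query, pk));
        System.out.println(conflictingColumns(query, pk));
    }
}
{code}

For the query from this issue, withWhereClause reproduces the statement shown in the exception above and conflictingColumns reports [x, y]; for the dummy-column query "SET dummy=?" it reports no conflict, which is why the workaround in the comment avoids the error.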
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)