You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@cassandra.apache.org by "Brandon Williams (Updated) (JIRA)" <ji...@apache.org> on 2012/02/08 15:02:01 UTC
[jira] [Updated] (CASSANDRA-3371) Cassandra inferred schema and
actual data don't match
[ https://issues.apache.org/jira/browse/CASSANDRA-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Brandon Williams updated CASSANDRA-3371:
----------------------------------------
Attachment: 0002-Output-support-to-match-input.txt
0001-Rework-pig-schema.txt
New patches build upon v5 and adds output storage to match. The catch, however, is that either the tuples (indexed/validated columns) or the bag (unknown columns) are optional, so if you are dealing with narrow output, you can just make a tuple like (key, (name,value), (name,value)) and that will work.
> Cassandra inferred schema and actual data don't match
> -----------------------------------------------------
>
> Key: CASSANDRA-3371
> URL: https://issues.apache.org/jira/browse/CASSANDRA-3371
> Project: Cassandra
> Issue Type: Bug
> Components: Hadoop
> Affects Versions: 0.8.7
> Reporter: Pete Warden
> Assignee: Brandon Williams
> Attachments: 0001-Rework-pig-schema.txt, 0002-Output-support-to-match-input.txt, 3371-v2.txt, 3371-v3.txt, 3371-v4.txt, 3371-v5-rebased.txt, 3371-v5.txt, pig.diff
>
>
> It's looking like there may be a mismatch between the schema that's being reported by the latest CassandraStorage.java, and the data that's actually returned. Here's an example:
> rows = LOAD 'cassandra://Frap/PhotoVotes' USING CassandraStorage();
> DESCRIBE rows;
> rows: {key: chararray,columns: {(name: chararray,value: bytearray,photo_owner: chararray,value_photo_owner: bytearray,pid: chararray,value_pid: bytearray,matched_string: chararray,value_matched_string: bytearray,src_big: chararray,value_src_big: bytearray,time: chararray,value_time: bytearray,vote_type: chararray,value_vote_type: bytearray,voter: chararray,value_voter: bytearray)}}
> DUMP rows;
> (691831038_1317937188.48955,{(photo_owner,1596090180),(pid,6855155124568798560),(matched_string,),(src_big,),(time,Thu Oct 06 14:39:48 -0700 2011),(vote_type,album_dislike),(voter,691831038)})
> getSchema() is reporting the columns as an inner bag of tuples, each of which contains 16 values. In fact, getNext() seems to return an inner bag containing 7 tuples, each of which contains two values.
> It appears that things got out of sync with this change:
> http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?r1=1177083&r2=1177082&pathrev=1177083
> See more discussion at:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/pig-cassandra-problem-quot-Incompatible-field-schema-quot-error-tc6882703.html
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira