Posted to commits@cassandra.apache.org by "Brandon Williams (Commented) (JIRA)" <ji...@apache.org> on 2012/01/19 20:44:40 UTC

[jira] [Commented] (CASSANDRA-3371) Cassandra inferred schema and actual data don't match

    [ https://issues.apache.org/jira/browse/CASSANDRA-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13189334#comment-13189334 ] 

Brandon Williams commented on CASSANDRA-3371:
---------------------------------------------

bq. 1. Fix schema so that this ticket's problem is resolved

v4 does this; however, it's not quite all of what we want.

bq. 2. have the default return value from CassandraStorage be (key, column, value) as is thought of for transposing wide rows

After thinking about this more, that's the wrong approach: if you DO want to work within the row, you now have to do an expensive group to get back what we had before (a nested structure), whereas breaking that structure up into (k, c, v) is extremely cheap if that's what you want.  So ultimately, we need to stick with a bag for spillage, and thus keep the existing schema.  v4 does this.
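
As a rough illustration of that cost asymmetry, here is a sketch in plain Python rather than Pig. The sample rows and the helper names (flatten, regroup) are made up for illustration; the sort stands in for the shuffle a GROUP would need.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows shaped like the nested form: (key, bag of (name, value) tuples).
rows = [
    ("row1", [("photo_owner", "1596090180"), ("vote_type", "album_dislike")]),
    ("row2", [("voter", "691831038")]),
]

def flatten(rows):
    """Nested -> (key, column, value): one streaming pass, no regrouping."""
    for key, columns in rows:
        for name, value in columns:
            yield (key, name, value)

def regroup(triples):
    """(key, column, value) -> nested: every triple for a key must be brought together."""
    # The sort is a stand-in for the shuffle/sort phase behind a GROUP.
    ordered = sorted(triples, key=itemgetter(0))
    return [
        (key, [(name, value) for _, name, value in group])
        for key, group in groupby(ordered, key=itemgetter(0))
    ]

triples = list(flatten(rows))
```

Going from the nested form to triples never needs to look at more than one row at a time, while going back requires collecting all triples per key first.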

v4 also names the *values* of indexed/validated columns after their name, which is more pygmalion-style, since you'll always want to filter the value, not the name.

The problem, however, is that we're back to strange parsing errors:

{noformat}
ERROR 1200: Pig script failed to parse: 
<file foo.pig, line 3, column 7> pig script failed to validate: org.apache.pig.impl.logicalLayer.FrontendException: ERROR 1128: Cannot find field owner_id in :tuple(name:chararray,owner_id:chararray)
{noformat}

This seems related to the fact that schema-wise, a bag can only contain a single tuple, although that tuple can then contain any number of items.  Apparently this is only a hard requirement in Pig 0.9 or later, but I tested it up to trunk, so the requirement doesn't look like it's going away.

In practice, however, getNext doesn't actually return this 'container' tuple.  If you do return it, you get casting errors.
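
To make the shape mismatch concrete, here is a small Python sketch (not Pig code) using the DESCRIBE and DUMP output from the original report quoted below. The helper name arity_mismatches is made up, and the empty column values from the DUMP are represented as empty strings.

```python
# Field names of the single inner tuple declared by the schema,
# copied from the DESCRIBE output in the ticket description.
declared_fields = [
    "name", "value",
    "photo_owner", "value_photo_owner",
    "pid", "value_pid",
    "matched_string", "value_matched_string",
    "src_big", "value_src_big",
    "time", "value_time",
    "vote_type", "value_vote_type",
    "voter", "value_voter",
]

# Bag contents as DUMP shows them: one (name, value) tuple per column.
actual_bag = [
    ("photo_owner", "1596090180"),
    ("pid", "6855155124568798560"),
    ("matched_string", ""),
    ("src_big", ""),
    ("time", "Thu Oct 06 14:39:48 -0700 2011"),
    ("vote_type", "album_dislike"),
    ("voter", "691831038"),
]

def arity_mismatches(declared_fields, bag):
    """Return the tuples whose field count differs from the declared inner tuple."""
    return [t for t in bag if len(t) != len(declared_fields)]

mismatched = arity_mismatches(declared_fields, actual_bag)
```

Every tuple in the bag ends up in mismatched, since each has two fields against a declared tuple of sixteen.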

I'm not really sure how we can fix this, and other examples of LoadMetadata implemented with bags are hard to come by.


                
> Cassandra inferred schema and actual data don't match
> -----------------------------------------------------
>
>                 Key: CASSANDRA-3371
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3371
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.7
>            Reporter: Pete Warden
>            Assignee: Brandon Williams
>         Attachments: 3371-v2.txt, 3371-v3.txt, 3371-v4.txt, pig.diff
>
>
> It's looking like there may be a mismatch between the schema that's being reported by the latest CassandraStorage.java, and the data that's actually returned. Here's an example:
> rows = LOAD 'cassandra://Frap/PhotoVotes' USING CassandraStorage();
> DESCRIBE rows;
> rows: {key: chararray,columns: {(name: chararray,value: bytearray,photo_owner: chararray,value_photo_owner: bytearray,pid: chararray,value_pid: bytearray,matched_string: chararray,value_matched_string: bytearray,src_big: chararray,value_src_big: bytearray,time: chararray,value_time: bytearray,vote_type: chararray,value_vote_type: bytearray,voter: chararray,value_voter: bytearray)}}
> DUMP rows;
> (691831038_1317937188.48955,{(photo_owner,1596090180),(pid,6855155124568798560),(matched_string,),(src_big,),(time,Thu Oct 06 14:39:48 -0700 2011),(vote_type,album_dislike),(voter,691831038)})
> getSchema() is reporting the columns as an inner bag of tuples, each of which contains 16 values. In fact, getNext() seems to return an inner bag containing 7 tuples, each of which contains two values. 
> It appears that things got out of sync with this change:
> http://svn.apache.org/viewvc/cassandra/branches/cassandra-0.8/contrib/pig/src/java/org/apache/cassandra/hadoop/pig/CassandraStorage.java?r1=1177083&r2=1177082&pathrev=1177083
> See more discussion at:
> http://cassandra-user-incubator-apache-org.3065146.n2.nabble.com/pig-cassandra-problem-quot-Incompatible-field-schema-quot-error-tc6882703.html
