Posted to commits@cassandra.apache.org by "Jonathan Ellis (JIRA)" <ji...@apache.org> on 2011/07/05 18:02:16 UTC

[jira] [Commented] (CASSANDRA-2855) Add hadoop support option to skip rows with empty columns

    [ https://issues.apache.org/jira/browse/CASSANDRA-2855?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13059972#comment-13059972 ] 

Jonathan Ellis commented on CASSANDRA-2855:
-------------------------------------------

I don't like the idea of adding flags to change behavior.

What I think we *could* do is leave empty rows out of the result set, IF we are doing a slice query for the entire row.  (As soon as the tombstones expire, those rows will be gone anyway.)
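
To make the idea above concrete, here is a minimal sketch in Python. It is only a model of the rule, not Cassandra code: rows are plain (key, columns) pairs, and the assumption that an empty start and an empty finish denote a slice over the entire row, along with the helper names is_full_row_slice and filter_result, are illustrative choices that do not come from the ticket.

```python
def is_full_row_slice(start, finish):
    # Assumption for this sketch: an empty start and an empty finish
    # mean "slice the entire row".
    return start == b"" and finish == b""


def filter_result(rows, start, finish):
    """Drop rows with no columns, but only for a full-row slice.

    For a narrower slice the rows are returned unchanged, since a row
    may simply have no columns inside the requested range.
    """
    if not is_full_row_slice(start, finish):
        return list(rows)
    return [(key, columns) for key, columns in rows if columns]


rows = [("a", {"c1": "v1"}), ("ghost", {}), ("b", {"c2": "v2"})]
full = filter_result(rows, b"", b"")
partial = filter_result(rows, b"c1", b"c3")
```

In this model, full keeps only the rows with columns, while partial still contains the empty row.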

> Add hadoop support option to skip rows with empty columns
> ---------------------------------------------------------
>
>                 Key: CASSANDRA-2855
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2855
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Hadoop
>            Reporter: Jeremy Hanna
>            Assignee: Jeremy Hanna
>              Labels: hadoop
>
> We have been finding that range ghosts appear in results from Hadoop via Pig.  This could also happen if rows don't have data for the given slice predicate.  This leads to a painful amount of defensive checking on the Pig side, especially in the case of range ghosts.
> We would like to add an option to skip rows that have no column values.  That functionality previously existed in core Cassandra but was removed because of the performance penalty of the check.  However, the Hadoop support in the RecordReader is batch-oriented anyway, so individual row-read performance isn't as much of an issue.  We would also make it an optional config parameter for each job, so people wouldn't have to incur that penalty if they are confident that there won't be empty rows or they don't care.
> The parameter could be named cassandra.skip.empty.rows and take the values true/false.
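
A rough sketch of how the proposed per-job option could behave, written in Python for brevity. Only the parameter name cassandra.skip.empty.rows comes from the description above; the job configuration is modelled as a plain dict of strings, rows as (key, columns) pairs, and read_rows is a hypothetical stand-in for the record reader, not an existing Cassandra or Hadoop API. The default of false is also an assumption.

```python
SKIP_EMPTY_ROWS_KEY = "cassandra.skip.empty.rows"  # name proposed in the ticket


def read_rows(rows, job_conf):
    """Yield (key, columns) pairs, optionally dropping rows with no columns."""
    # Assumed default: "false", so existing jobs see no change in behavior.
    skip_empty = job_conf.get(SKIP_EMPTY_ROWS_KEY, "false").lower() == "true"
    for key, columns in rows:
        if skip_empty and not columns:
            continue  # empty row, e.g. a range ghost
        yield key, columns


rows = [("a", {"name": "x"}), ("ghost", {}), ("b", {"name": "y"})]
kept = [key for key, _ in read_rows(rows, {SKIP_EMPTY_ROWS_KEY: "true"})]
all_keys = [key for key, _ in read_rows(rows, {})]
```

With the option set to true, the empty row is dropped before it reaches the job; without it, every row is passed through and the defensive checks stay on the Pig side.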
