Posted to commits@cassandra.apache.org by "Mck SembWever (JIRA)" <ji...@apache.org> on 2011/09/07 21:19:09 UTC

[jira] [Issue Comment Edited] (CASSANDRA-3150) ColumnFormatRecordReader loops forever

    [ https://issues.apache.org/jira/browse/CASSANDRA-3150?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13099234#comment-13099234 ] 

Mck SembWever edited comment on CASSANDRA-3150 at 9/7/11 7:17 PM:
------------------------------------------------------------------

Here keyRange runs from startToken to split.getEndToken().
startToken is updated on each iteration to the last row read (each iteration reads batchRowCount rows).

What happens if split.getEndToken() doesn't correspond to any of the rowKeys?
As I read it, startToken will hop over split.getEndToken() and get_range_slices(..) will start returning wrapping ranges. These still return rows, so the iteration continues, now forever.

The only ways out of this code today are a) startToken equals split.getEndToken(), or b) get_range_slices(..) is called with startToken equal to split.getEndToken() OR with a gap so small that no rows exist in between.
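The paging argument above can be checked against a small, self-contained model. This is a hypothetical sketch, not Cassandra's record reader: tokens are plain `long`s on a ring, `rangeSlice` stands in for `get_range_slices(..)` with (start, end] semantics that wrap when start >= end, and `readSplit` advances startToken to the last row of each batch and stops only on the two exits listed above. The `overshoot` flag is an assumed fault, not a diagnosed cause: it makes a slice also return the first row past an end token that matches no row key, which is one way startToken could hop over split.getEndToken(). `maxPages` is a hypothetical cap standing in for "forever".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical model of the paging loop discussed above; none of these names are Cassandra APIs.
public class RecordReaderPagingSketch {

    // Stand-in for get_range_slices(..): row tokens in (start, end], at most `count` of them.
    // When start >= end the range wraps: tokens after start, then tokens up to end.
    // ASSUMED FAULT: with overshoot=true, if `end` is not a row key the slice also returns
    // the first row after `end`, letting startToken hop over the split's end token.
    static List<Long> rangeSlice(NavigableSet<Long> ring, long start, long end,
                                 int count, boolean overshoot) {
        List<Long> rows = new ArrayList<>();
        if (start < end) {
            rows.addAll(ring.subSet(start, false, end, true));
        } else {
            rows.addAll(ring.tailSet(start, false)); // tokens after start
            rows.addAll(ring.headSet(end, true));    // then wrap to tokens up to end
        }
        if (overshoot && !ring.contains(end)) {
            Long next = ring.higher(end);
            if (next != null && !rows.contains(next)) {
                rows.add(next);
            }
        }
        return rows.size() > count ? new ArrayList<>(rows.subList(0, count)) : rows;
    }

    // Pages through one split: startToken starts at the split start and is advanced to the
    // last row of each batch. Exits: (a) startToken equals the end token, (b) an empty batch.
    // maxPages is a hypothetical cap standing in for "forever".
    static List<Long> readSplit(NavigableSet<Long> ring, long splitStart, long splitEnd,
                                int batchRowCount, boolean overshoot, int maxPages) {
        List<Long> read = new ArrayList<>();
        long startToken = splitStart;
        for (int page = 0; page < maxPages; page++) {
            if (startToken == splitEnd) {
                break; // exit (a)
            }
            List<Long> rows = rangeSlice(ring, startToken, splitEnd, batchRowCount, overshoot);
            if (rows.isEmpty()) {
                break; // exit (b)
            }
            read.addAll(rows);
            startToken = rows.get(rows.size() - 1); // last row read
        }
        return read;
    }

    public static void main(String[] args) {
        NavigableSet<Long> ring = new TreeSet<>(List.of(10L, 20L, 30L, 40L, 50L, 60L, 70L, 80L, 90L));
        // End token 40 is a row key: exit (a) fires even with the assumed overshoot.
        System.out.println(readSplit(ring, 15, 40, 2, true, 20)); // [20, 30, 40]
        // End token 45 is not a row key, slices behave: exit (b) fires.
        System.out.println(readSplit(ring, 15, 45, 2, false, 20)); // [20, 30, 40]
        // End token 45 is not a row key and slices overshoot: startToken hops to 50,
        // (50, 45] wraps, and only the page cap stops the loop.
        System.out.println(readSplit(ring, 15, 45, 2, true, 20).size()); // 37
    }
}
```

In this model an end token that is a row key (40) ends the read through exit a), and an end token between row keys (45) with well-behaved slices ends it through the empty batch of exit b). Only once startToken lands past the end token does the range (50, 45] wrap around, keep returning rows, and cycle until the page cap.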

> ColumnFormatRecordReader loops forever
> --------------------------------------
>
>                 Key: CASSANDRA-3150
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-3150
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Hadoop
>    Affects Versions: 0.8.4
>            Reporter: Mck SembWever
>            Assignee: Mck SembWever
>            Priority: Critical
>         Attachments: CASSANDRA-3150.patch
>
>
> From http://thread.gmane.org/gmane.comp.db.cassandra.user/20039
> {quote}
> bq. Cassandra-0.8.4 w/ ByteOrderedPartitioner
> bq. CFIF's inputSplitSize=196608
> bq. 3 map tasks (of 4013) are still running after reading 25 million rows.
> bq. Can this be a bug in StorageService.getSplits(..) ?
> getSplits looks pretty foolproof to me but I guess we'd need to add
> more debug logging to rule out a bug there for sure.
> I guess the main alternative would be a bug in the recordreader paging.
> {quote}

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira