Posted to user@cassandra.apache.org by Joost Ouwerkerk <jo...@openplaces.org> on 2010/04/30 23:55:55 UTC

Single Split ColumnFamilyRecordReader returns duplicate rows

There's a bug in ColumnFamilyRecordReader that appears when processing
a single split.  When the start and end tokens of the split are equal,
duplicate rows can be returned.

Example with 5 rows:
token (start and end) = 53193025635115934196771903670925341736

Tokens returned by first get_range_slices iteration:
 16955237001963240173058271559858726497
 40670782773005619916245995581909898190
 99079589977253916124855502156832923443
 144992942750327304334463589818972416113
 166860289390734216023086131251507064403

Tokens returned by the next iteration (start token is the last token
from the previous iteration; end token is unchanged):
 16955237001963240173058271559858726497
 40670782773005619916245995581909898190

Tokens returned by the final iteration (start token is the last token
from the previous iteration; end token is unchanged):
 [] (empty)

In this example, the mapper has processed 7 rows in total, 2 of which
were duplicates.
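
The sequence above can be reproduced with a small toy model. This is
only a sketch and not Cassandra's code: range_slice and read_split are
hypothetical names, the range is assumed to be (start, end] on a
wrapping ring, and equal start and end tokens are assumed to select the
whole ring in token order, which is what the first iteration above
suggests. The batch limit of 10 is also an assumption.

```python
# The five row tokens and the split token from the example above.
TOKENS = [
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
]
SPLIT = 53193025635115934196771903670925341736


def range_slice(tokens, start, end, limit):
    """Toy stand-in for get_range_slices over the range (start, end]."""
    ordered = sorted(tokens)
    if start == end:
        # Assumption: equal tokens select the whole ring, in token order.
        hits = ordered
    elif start < end:
        hits = [t for t in ordered if start < t <= end]
    else:
        # Wrapping range: the tail of the ring, then the head up to end.
        hits = ([t for t in ordered if t > start]
                + [t for t in ordered if t <= end])
    return hits[:limit]


def read_split(tokens, start, end, limit):
    """Page through one split as described in the report: each request
    starts at the last token seen and the end token is left unchanged."""
    seen = []
    while True:
        batch = range_slice(tokens, start, end, limit)
        if not batch:
            return seen
        seen.extend(batch)
        start = batch[-1]


rows = read_split(TOKENS, SPLIT, SPLIT, limit=10)
print(len(rows), "rows read,", len(rows) - len(set(rows)), "duplicates")
```

Under these assumptions the second request wraps past the top of the
ring and re-reads the two rows whose tokens are below the split token,
because the first request had already covered the whole ring.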

Joost.

Re: Single Split ColumnFamilyRecordReader returns duplicate rows

Posted by Joost Ouwerkerk <jo...@openplaces.org>.
Created CASSANDRA-1042.

On Sat, May 1, 2010 at 12:01 AM, Jonathan Ellis <jb...@gmail.com> wrote:
> Can you create a ticket?

Re: Single Split ColumnFamilyRecordReader returns duplicate rows

Posted by Jonathan Ellis <jb...@gmail.com>.
Can you create a ticket?

-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com