Posted to user@cassandra.apache.org by Jonathan Ellis <jb...@gmail.com> on 2010/05/01 06:01:49 UTC

Re: Single Split ColumnFamilyRecordReader returns duplicate rows

Can you create a ticket?

On Fri, Apr 30, 2010 at 4:55 PM, Joost Ouwerkerk <jo...@openplaces.org> wrote:
> There's a bug in ColumnFamilyRecordReader that appears when processing
> a single split.  When the start and end tokens of the split are equal,
> duplicate rows can be returned.
>
> Example with 5 rows:
> token (start and end) = 53193025635115934196771903670925341736
>
> Tokens returned by first get_range_slices iteration:
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
>  99079589977253916124855502156832923443
>  144992942750327304334463589818972416113
>  166860289390734216023086131251507064403
>
> Tokens returned by the next iteration (start token is the last token
> from the previous iteration, end token is unchanged):
>  16955237001963240173058271559858726497
>  40670782773005619916245995581909898190
>
> Tokens returned by the final iteration (start token is the last token
> from the previous iteration, end token is unchanged):
>  [] (empty)
>
> In this example, the mapper has processed 7 rows in total, 2 of which
> were duplicates.
>
> Joost.
>
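
The report above can be reproduced with a small toy model. This is only a sketch, not the ColumnFamilyRecordReader or Cassandra source: the function names, the `count` parameter, and the range rules (equal start and end tokens mean the full ring, rows come back in ascending token order, start > end means a wrapping range) are assumptions inferred from the token lists in the report.

```python
# Toy model of the paging loop described in the report; NOT Cassandra's code.
# All names and range rules here are assumptions made for illustration.

SPLIT_TOKEN = 53193025635115934196771903670925341736  # start == end

ROW_TOKENS = sorted([
    16955237001963240173058271559858726497,
    40670782773005619916245995581909898190,
    99079589977253916124855502156832923443,
    144992942750327304334463589818972416113,
    166860289390734216023086131251507064403,
])


def get_range_slices(start, end, count=100):
    """Hypothetical stand-in for the real call: return up to `count` row
    tokens in the range (start, end], in ascending token order."""
    if start == end:
        # Assumption: equal tokens select the whole ring.
        matches = list(ROW_TOKENS)
    elif start < end:
        matches = [t for t in ROW_TOKENS if start < t <= end]
    else:
        # start > end: the range wraps around the end of the ring.
        matches = [t for t in ROW_TOKENS if t > start or t <= end]
    return matches[:count]


def read_split(start, end):
    """Page through a split as the report describes: each request starts at
    the last token seen, and the end token never changes."""
    seen = []
    while True:
        batch = get_range_slices(start, end)
        if not batch:
            return seen
        seen.extend(batch)
        start = batch[-1]


rows = read_split(SPLIT_TOKEN, SPLIT_TOKEN)
print(len(rows), "rows processed,", len(rows) - len(set(rows)), "duplicates")
# prints: 7 rows processed, 2 duplicates
```

In this model the second request runs from the largest row token to the unchanged end token, so it wraps around the ring and picks up the two rows whose tokens are below the split token a second time.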



-- 
Jonathan Ellis
Project Chair, Apache Cassandra
co-founder of Riptano, the source for professional Cassandra support
http://riptano.com

Re: Single Split ColumnFamilyRecordReader returns duplicate rows

Posted by Joost Ouwerkerk <jo...@openplaces.org>.
Created CASSANDRA-1042.
