Posted to commits@cassandra.apache.org by "Terje Marthinussen (JIRA)" <ji...@apache.org> on 2011/06/24 17:39:48 UTC

[jira] [Created] (CASSANDRA-2823) NPE during range slices with rowrepairs

NPE during range slices with rowrepairs
---------------------------------------

                 Key: CASSANDRA-2823
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
             Project: Cassandra
          Issue Type: Bug
    Affects Versions: 0.8.2
         Environment: This is a trunk build with 2521 and 2433
I somewhat doubt that is related however.
            Reporter: Terje Marthinussen


I was doing some heavy testing: relatively fast feeding (5000+ mutations/sec), plus repair on all nodes, plus range slices,
while occasionally killing a node here and there and restarting it.

This triggers the following NPE:
 ERROR [pool-2-thread-3] 2011-06-24 20:56:27,289 Cassandra.java (line 3210) Internal error processing get_range_slices
java.lang.NullPointerException
	at org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:109)
	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:112)
	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:83)
	at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:161)
	at org.apache.cassandra.utils.MergeIterator.computeNext(MergeIterator.java:88)
	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
	at org.apache.cassandra.service.RangeSliceResponseResolver.resolve(RangeSliceResponseResolver.java:120)
	at org.apache.cassandra.service.RangeSliceResponseResolver.resolve(RangeSliceResponseResolver.java:43)

Looking at the code in getReduced:

{noformat}
                ColumnFamily resolved = versions.size() > 1
                                      ? RowRepairResolver.resolveSuperset(versions)
                                      : versions.get(0);
{noformat}
It seems that resolved becomes null when this happens and versions.size() is larger than 1.

RowRepairResolver.resolveSuperset() does actually return null if it cannot resolve anything, so there is definitely a case here that can occur and is not handled.
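
The failure mode described above can be shown in isolation. The following is a minimal sketch, not Cassandra's code: NullResolveSketch, the List<String> stand-in for ColumnFamily, and the body of resolveSuperset are all hypothetical, and it assumes that "cannot resolve anything" means every replica's version is null.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the resolver logic; not Cassandra's actual classes.
final class NullResolveSketch {
    // Merges the non-null versions into one list of column names.
    // Returns null when there is nothing to merge (assumed behaviour).
    static List<String> resolveSuperset(List<List<String>> versions) {
        List<String> merged = null;
        for (List<String> version : versions) {
            if (version == null)
                continue;
            if (merged == null)
                merged = new ArrayList<>();
            for (String column : version)
                if (!merged.contains(column))
                    merged.add(column);
        }
        return merged;
    }

    // Same shape as the ternary quoted from getReduced.
    static List<String> reduce(List<List<String>> versions) {
        return versions.size() > 1 ? resolveSuperset(versions) : versions.get(0);
    }

    public static void main(String[] args) {
        List<List<String>> versions = new ArrayList<>();
        versions.add(null); // replica 1 had no data for the row
        versions.add(null); // replica 2 had no data either
        List<String> resolved = reduce(versions);
        System.out.println("resolved = " + resolved);
        // Any later code that dereferences resolved without a check would throw.
        System.out.println(resolved == null ? "unchecked use would NPE" : "safe to use");
    }
}
```

With two null versions, the size check passes, the merge finds nothing, and the caller receives null, which matches the path seen in the stack trace.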

It may also be worth asking whether it is guaranteed that
versions.add(current.left.cf);
can never add a null entry?

Jonathan suggested on IRC that maybe 
{noformat}
                ColumnFamily resolved = versions.size() > 1
                                      ? RowRepairResolver.resolveSuperset(versions)
                                      : versions.get(0);
                if (resolved == null)
                      return new Row(key, resolved);
{noformat}

could be a fix.


--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055597#comment-13055597 ] 

Jonathan Ellis commented on CASSANDRA-2823:
-------------------------------------------

ah, right -- skipping the clear would be buggy.  +1 again. :)

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>         Attachments: 2823.patch
>
>

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055596#comment-13055596 ] 

Sylvain Lebresne commented on CASSANDRA-2823:
---------------------------------------------

Yeah, I didn't do that mostly because there are still a few lines of code (besides maybe scheduling repair) that we need to run even if resolved is null (the debugging message in RowRepairResolver and, more importantly, the clear of versions and versionSources in RangeSliceResponseResolver).
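
The point about clearing per-row state can be sketched as follows. This is an illustration under assumptions, not the contents of 2823.patch: ClearingReducerSketch, pending(), countStale() and the List<String> stand-in for ColumnFamily are hypothetical. It guards only the repair step, so the clear still runs when resolved is null.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical reducer; field names echo the thread but this is not Cassandra's code.
final class ClearingReducerSketch {
    private final List<List<String>> versions = new ArrayList<>();
    private final List<String> versionSources = new ArrayList<>();
    int repairsScheduled = 0;

    // Collects one replica's version of the current row.
    void reduce(String source, List<String> columns) {
        versions.add(columns);
        versionSources.add(source);
    }

    // Returns the merged columns (possibly null) and always resets per-row state.
    List<String> getReduced() {
        List<String> resolved = versions.size() > 1 ? resolveSuperset(versions) : versions.get(0);
        if (resolved != null)
            repairsScheduled += countStale(resolved); // skipped when nothing resolved
        versions.clear();        // must run even when resolved is null,
        versionSources.clear();  // or the next row would see stale entries
        return resolved;
    }

    int pending() {
        return versions.size();
    }

    // Counts replica versions that differ from the merged result.
    private int countStale(List<String> resolved) {
        int stale = 0;
        for (List<String> version : versions)
            if (!resolved.equals(version))
                stale++;
        return stale;
    }

    // Merges non-null versions; returns null when all of them are null.
    private static List<String> resolveSuperset(List<List<String>> versions) {
        List<String> merged = null;
        for (List<String> version : versions) {
            if (version == null)
                continue;
            if (merged == null)
                merged = new ArrayList<>();
            for (String column : version)
                if (!merged.contains(column))
                    merged.add(column);
        }
        return merged;
    }

    public static void main(String[] args) {
        ClearingReducerSketch reducer = new ClearingReducerSketch();
        reducer.reduce("node1", null);
        reducer.reduce("node2", null);
        System.out.println(reducer.getReduced()); // null, and no exception
        System.out.println(reducer.pending());    // 0: state was cleared
        reducer.reduce("node1", Arrays.asList("a"));
        reducer.reduce("node2", Arrays.asList("a", "b"));
        System.out.println(reducer.getReduced()); // [a, b]
    }
}
```

An early return placed before the two clear() calls would leave the null versions in the list, so the following row would be merged against leftovers from the previous one.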

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>         Attachments: 2823.patch
>
>

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055591#comment-13055591 ] 

Jonathan Ellis commented on CASSANDRA-2823:
-------------------------------------------

+1

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>         Attachments: 2823.patch
>
>

        

[jira] [Issue Comment Edited] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055593#comment-13055593 ] 

Jonathan Ellis edited comment on CASSANDRA-2823 at 6/27/11 3:06 PM:
--------------------------------------------------------------------

although I slightly prefer the "if == null return" immediately after initializing resolved, to keep those two pieces of logic together.

      was (Author: jbellis):
    although I slightly prefer the "if == null return version" immediately after initializing resolved to keep those two pieces of logic together.
  
> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>         Attachments: 2823.patch
>
>

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056458#comment-13056458 ] 

Jonathan Ellis commented on CASSANDRA-2823:
-------------------------------------------

does 0.7 need this?

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>         Attachments: 2823.patch
>
>

        

[jira] [Resolved] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne resolved CASSANDRA-2823.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 0.8.2
         Reviewer: jbellis

Committed, thanks

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>         Attachments: 2823.patch
>
>

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056472#comment-13056472 ] 

Hudson commented on CASSANDRA-2823:
-----------------------------------

Integrated in Cassandra-0.7 #514 (See [https://builds.apache.org/job/Cassandra-0.7/514/])
    Fix potential NPE during read repair
patch by slebresne; reviewed by jbellis for CASSANDRA-2823

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1140550
Files : 
* /cassandra/branches/cassandra-0.7/CHANGES.txt
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java
* /cassandra/branches/cassandra-0.7/src/java/org/apache/cassandra/service/RowRepairResolver.java


> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>         Attachments: 2823.patch
>
>

        

[jira] [Assigned] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jonathan Ellis reassigned CASSANDRA-2823:
-----------------------------------------

    Assignee: Sylvain Lebresne

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>

        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056464#comment-13056464 ] 

Sylvain Lebresne commented on CASSANDRA-2823:
---------------------------------------------

You're right, 0.7 needs that too. I've committed it there too.

> NPE during range slices with rowrepairs
> ---------------------------------------
>
>                 Key: CASSANDRA-2823
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-2823
>             Project: Cassandra
>          Issue Type: Bug
>    Affects Versions: 0.8.2
>         Environment: This is a trunk build with 2521 and 2433
> I somewhat doubt that is related however.
>            Reporter: Terje Marthinussen
>            Assignee: Sylvain Lebresne
>             Fix For: 0.8.2
>
>         Attachments: 2823.patch
>
>
> Doing some heavy testing of relatively fast feeding (5000+ mutations/sec) + repair on all nodes + range slices.
> Then occasionally killing a node here and there and restarting it.
> Triggers the following NPE
>  ERROR [pool-2-thread-3] 2011-06-24 20:56:27,289 Cassandra.java (line 3210) Internal error processing get_range_slices
> java.lang.NullPointerException
> 	at org.apache.cassandra.service.RowRepairResolver.maybeScheduleRepairs(RowRepairResolver.java:109)
> 	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:112)
> 	at org.apache.cassandra.service.RangeSliceResponseResolver$2.getReduced(RangeSliceResponseResolver.java:83)
> 	at org.apache.cassandra.utils.MergeIterator$ManyToOne.consume(MergeIterator.java:161)
> 	at org.apache.cassandra.utils.MergeIterator.computeNext(MergeIterator.java:88)
> 	at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:140)
> 	at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:135)
> 	at org.apache.cassandra.service.RangeSliceResponseResolver.resolve(RangeSliceResponseResolver.java:120)
> 	at org.apache.cassandra.service.RangeSliceResponseResolver.resolve(RangeSliceResponseResolver.java:43)
> Looking at the code in getReduced:
> {noformat}
>                 ColumnFamily resolved = versions.size() > 1
>                                       ? RowRepairResolver.resolveSuperset(versions)
>                                       : versions.get(0);
> {noformat}
> seems like resolved becomes null when this happens and versions.size is larger than 1.
> RowRepairResolver.resolveSuperset() does actually return null if it cannot resolve anything, so there is definitely a case here which can occur and is not handled.
> It may also be an interesting question if it is guaranteed that                
> versions.add(current.left.cf);
> can never return null?
> Jonathan suggested on IRC that maybe 
> {noformat}
>                 ColumnFamily resolved = versions.size() > 1
>                                       ? RowRepairResolver.resolveSuperset(versions)
>                                       : versions.get(0);
>                 if (resolved == null)
>                       return new Row(key, resolved);
> {noformat}
> could be a fix.


        

[jira] [Updated] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Sylvain Lebresne (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Sylvain Lebresne updated CASSANDRA-2823:
----------------------------------------

    Attachment: 2823.patch

I think the problem is with the call to removeDeleted in resolveSuperset() (which is fairly new). Basically, the code is fine with resolved being null as long as this means that all the versions are null. But the removeDeleted call makes it possible for resolved to be null even if the versions are not null, for instance if a row tombstone expires between the time it was returned by the node and the time it is resolved by the coordinator.

Attaching a patch that skips the maybeScheduleRepairs() call if resolved == null. Even in that case there is nothing to repair, since the tombstones have expired by then.
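
For readers following along, here is a minimal sketch of the guard described above. It is not the contents of 2823.patch: NullGuardSketch, Cf, gcBefore and scheduledRepairs are hypothetical stand-ins for the real Cassandra classes, and the merge logic is deliberately simplified to a single row tombstone per version. It only illustrates how resolved can be null while the versions are not, and why skipping the repair call avoids the NPE.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; none of these are the real Cassandra types.
final class NullGuardSketch {
    /** Stand-in for a ColumnFamily that holds only a row tombstone. */
    static final class Cf {
        final int localDeletionTime; // seconds
        Cf(int localDeletionTime) { this.localDeletionTime = localDeletionTime; }
    }

    /** Records which replicas would have been sent a repair. */
    static final List<String> scheduledRepairs = new ArrayList<>();

    /** Merge the versions, then drop the result if its tombstone has expired. */
    static Cf resolveSuperset(List<Cf> versions, int gcBefore) {
        Cf merged = null;
        for (Cf v : versions) {
            if (v == null) continue;
            if (merged == null || v.localDeletionTime > merged.localDeletionTime)
                merged = v;
        }
        if (merged == null) return null;                    // all versions were null
        return merged.localDeletionTime < gcBefore ? null   // expired: nothing left
                                                   : merged;
    }

    /** Dereferences resolved, so it throws an NPE if called with null. */
    static void maybeScheduleRepairs(Cf resolved, List<Cf> versions) {
        for (int i = 0; i < versions.size(); i++) {
            Cf v = versions.get(i);
            if (v == null || v.localDeletionTime != resolved.localDeletionTime)
                scheduledRepairs.add("replica-" + i);
        }
    }

    static Cf getReduced(List<Cf> versions, int gcBefore) {
        Cf resolved = versions.size() > 1
                    ? resolveSuperset(versions, gcBefore)
                    : versions.get(0);
        // The guard: an expired tombstone leaves nothing to repair.
        if (resolved != null)
            maybeScheduleRepairs(resolved, versions);
        return resolved;
    }

    public static void main(String[] args) {
        // Both replicas returned a tombstone that expired before resolution.
        Cf expired = getReduced(Arrays.asList(new Cf(100), new Cf(100)), 200);
        System.out.println("resolved is null: " + (expired == null));
        System.out.println("repairs scheduled: " + scheduledRepairs);
    }
}
```

Without the `resolved != null` check, the first call in main would fail inside maybeScheduleRepairs in this sketch, which mirrors the top frame of the reported stack trace.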


        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Jonathan Ellis (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13055593#comment-13055593 ] 

Jonathan Ellis commented on CASSANDRA-2823:
-------------------------------------------

although I slightly prefer the "if == null return version" immediately after initializing resolved to keep those two pieces of logic together.
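
A rough sketch of that alternative, with the null check placed directly after resolved is initialized. This is an illustration only, not the committed code: EarlyReturnSketch, the String-based versions, the "expired-tombstone" marker and the repairCalls counter are all hypothetical simplifications of the real Row and ColumnFamily handling.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified model; not the real Cassandra classes.
final class EarlyReturnSketch {
    /** Stand-in for Row: a key plus a possibly-null column family. */
    static final class Row {
        final String key;
        final String cf; // null means "no live data for this key"
        Row(String key, String cf) { this.key = key; this.cf = cf; }
    }

    static int repairCalls = 0;

    /** Returns the first live version, or null if every version is null or expired. */
    static String resolveSuperset(List<String> versions) {
        for (String v : versions)
            if (v != null && !v.equals("expired-tombstone"))
                return v;
        return null;
    }

    /** Dereferences resolved, so it must never be called with null. */
    static void maybeScheduleRepairs(String resolved, List<String> versions) {
        for (String v : versions)
            if (!resolved.equals(v))
                repairCalls++;
    }

    static Row getReduced(String key, List<String> versions) {
        String resolved = versions.size() > 1
                        ? resolveSuperset(versions)
                        : versions.get(0);
        // Early return keeps the null handling next to the initialization.
        if (resolved == null)
            return new Row(key, null);

        maybeScheduleRepairs(resolved, versions);
        return new Row(key, resolved);
    }

    public static void main(String[] args) {
        Row empty = getReduced("k1", Arrays.asList("expired-tombstone", "expired-tombstone"));
        System.out.println("cf is null: " + (empty.cf == null));
        Row live = getReduced("k2", Arrays.asList("a", null));
        System.out.println("cf: " + live.cf + ", repair calls: " + repairCalls);
    }
}
```

Both placements avoid the NPE; the early return simply keeps the two pieces of logic adjacent, so code added later between the initialization and the repair call cannot dereference a null resolved by accident.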


        

[jira] [Commented] (CASSANDRA-2823) NPE during range slices with rowrepairs

Posted by "Hudson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/CASSANDRA-2823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13056383#comment-13056383 ] 

Hudson commented on CASSANDRA-2823:
-----------------------------------

Integrated in Cassandra-0.8 #195 (See [https://builds.apache.org/job/Cassandra-0.8/195/])
    Fix potential NPE in range slice read repair
patch by slebresne; reviewed by jbellis for CASSANDRA-2823

slebresne : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1140470
Files : 
* /cassandra/branches/cassandra-0.8/CHANGES.txt
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/RowRepairResolver.java
* /cassandra/branches/cassandra-0.8/src/java/org/apache/cassandra/service/RangeSliceResponseResolver.java

