Posted to user@cassandra.apache.org by shalom sagges <sh...@gmail.com> on 2019/06/05 11:32:46 UTC

AbstractLocalAwareExecutorService Exception During Upgrade

Hi All,

I'm in a bad situation: after upgrading 2 nodes (binaries only) from
2.1.21 to 3.11.4, I'm getting a lot of warnings like the following:

AbstractLocalAwareExecutorService.java:167 - Uncaught exception on thread
Thread[ReadStage-5,5,main]: {}
java.lang.ArrayIndexOutOfBoundsException: null


I also see errors about repairs, but no repair is running at all. I
verified this with the ps -ef command and nodetool compactionstats. The
error I see is:
Failed creating a merkle tree for [repair
#a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], /1.2.3.4
(see log for details)

I saw repair errors on data tables as well.
nodetool status shows all nodes are UN, and nodetool describecluster
shows two schema versions, as expected.
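(One thing that may help with a repair session nobody remembers starting:
the session ID in that merkle-tree error looks like a time-based
(version 1) UUID, so its creation instant can be decoded from the ID
itself. A quick sketch, using only the standard library:)

```python
import uuid
from datetime import datetime, timedelta

# Time-based (version 1) UUIDs embed their creation timestamp, so the
# moment a repair session ID was generated can be recovered from the ID
# alone. UUID.time is in 100-ns intervals since the Gregorian epoch.
def uuid1_to_datetime(session_id: str) -> datetime:
    u = uuid.UUID(session_id)
    if u.version != 1:
        raise ValueError("not a time-based UUID")
    return datetime(1582, 10, 15) + timedelta(microseconds=u.time // 10)

# The session ID from the merkle-tree error above:
print(uuid1_to_datetime("a95498f0-8783-11e9-b065-81cdbc6bee08"))
```

(For this particular ID it decodes to 2019-06-05 UTC, i.e. the same day
the errors were reported.)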


After the warnings appeared, clients' read/write queries started timing
out.
Restarting the 2 nodes solved the clients' connection issues, but the
warnings are still being generated in the logs.
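(To see whether the warnings are confined to the read threads or hit
other pools too, a small tally by thread pool can be made from the log
lines. A sketch only: the sample line is the one quoted above, and in
practice you'd feed it the real system.log, whose path varies by
install.)

```python
import re
from collections import Counter

# Extract the thread-pool name (e.g. "ReadStage") from lines like:
#   Uncaught exception on thread Thread[ReadStage-5,5,main]: {}
THREAD_RE = re.compile(r"Uncaught exception on thread Thread\[([A-Za-z]+)-\d+")

def count_warnings(lines):
    """Tally uncaught-exception warnings per thread pool."""
    hits = (THREAD_RE.search(line) for line in lines)
    return Counter(m.group(1) for m in hits if m)

sample = [
    "WARN  AbstractLocalAwareExecutorService.java:167 - "
    "Uncaught exception on thread Thread[ReadStage-5,5,main]: {}",
]
print(count_warnings(sample))
```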

Has anyone encountered such an issue and knows what it means?

Thanks!

Re: AbstractLocalAwareExecutorService Exception During Upgrade

Posted by shalom sagges <sh...@gmail.com>.
Hi Again,

Bumping this thread, as I wasn't able to find the root cause of the
issue.
Perhaps I need to upgrade to 3.0 first?
Will be happy to get some ideas.

Opened https://issues.apache.org/jira/browse/CASSANDRA-15172 with more
details.

Thanks!

Re: AbstractLocalAwareExecutorService Exception During Upgrade

Posted by Jonathan Koppenhofer <jo...@koppedomain.com>.
Not sure why repair is running, but we are also seeing the same merkle
tree issue in a mixed-version cluster in which we have intentionally
started a repair against 2 upgraded DCs. We are currently researching
and can post back if we find the cause, but we would also appreciate any
suggestions. We have also run a local repair in an upgraded DC in this
same mixed-version cluster without issue.

We are going from 2.1.x to 3.0.x... and yes, we know you are not
supposed to run repairs in mixed-version clusters, so don't do it :)
This is kind of a special circumstance where other things have gone
wrong.

Thanks

Re: AbstractLocalAwareExecutorService Exception During Upgrade

Posted by shalom sagges <sh...@gmail.com>.
If anyone has any idea what might be causing this issue, that'd be great.

I don't understand what could trigger this exception.
But what I really can't understand is why repairs suddenly started
running :-\
There's no cron job running, no active repair process, no validation
compactions, and Reaper is turned off...  I see repair running only in
the logs.
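(Since the logs are the only evidence of repair, it may help to pull the
repair session IDs and the keyspace/table they target out of the log
text. A rough sketch, matched against the merkle-tree error line quoted
in this thread; the exact log format varies between Cassandra versions:)

```python
import re

# Matches lines like:
#   Failed creating a merkle tree for [repair
#   #a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], ...
REPAIR_RE = re.compile(r"repair #([0-9a-f-]{36}) on (\S+)/(\S+)")

def repair_sessions(log_text):
    """Return (session_id, keyspace, table) tuples found in log text."""
    return [(m.group(1), m.group(2), m.group(3).rstrip(","))
            for m in REPAIR_RE.finditer(log_text)]

sample = ("Failed creating a merkle tree for [repair "
          "#a95498f0-8783-11e9-b065-81cdbc6bee08 on system_auth/users, []], "
          "/1.2.3.4 (see log for details)")
print(repair_sessions(sample))
```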

Thanks!

