Posted to commits@cassandra.apache.org by "Jeremy Justus (JIRA)" <ji...@apache.org> on 2019/07/03 21:51:00 UTC

[jira] [Updated] (CASSANDRA-15199) Cassandra throwing occasional NPE 3.11.x

     [ https://issues.apache.org/jira/browse/CASSANDRA-15199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jeremy Justus updated CASSANDRA-15199:
--------------------------------------
    Description: 
Hey folks, decided to raise an official Jira (never done one of these before) about an issue we have found with Kong API Gateway using Cassandra as our DB.

We run a C* cluster across 2 nearby DCs: 6 C* nodes total, 3 in each DC, with data replicated across all nodes. We have found that C* sometimes throws NPEs in response to calls made by the lua-cassandra driver that Kong uses ([https://github.com/thibaultcha/lua-cassandra]). Specifically, it seems to occur when paging across multiple C* nodes. When we persistently page against a single C* node, we can't reproduce the NPEs in C*.

The exact error C* throws, with its stack trace, can be seen here:

[https://github.com/Kong/kong/issues/4194#issuecomment-497572751]

And again when we tried to upgrade from 3.11.2 to 3.11.4 in hopes it was already resolved:

[https://github.com/Kong/kong/issues/4194#issuecomment-497590235]

Same error, same line numbers, so the code must be the same in this portion.

 

Sample of our C* Config: [https://github.com/Kong/kong/issues/4194#issuecomment-497595766]

We discussed this in the ASF Slack flow as well:

ASF Slack discussion ([https://the-asf.slack.com/archives/CJZLTM05A/p1559321422028200] )

Some of the more important technical comments people posted are here:

[https://github.com/Kong/kong/issues/4194#issuecomment-497858824]

 

I am not a C* DBA, so I don't have an exact repro for you other than stating what the client application was attempting to do (paging across multiple C* nodes within a DC) when we saw the failures. Do any Apache C* folks think they see the issue, or could someone drop me a C* JAR with extra debugging print statements that I could run in dev to feed you more info? Or, if you see the problem and can one-shot a fix so that there is no more NPE and Cassandra responds to the client with an error message describing what was wrong, that would be insightful.

 

Thanks!

  was:
Hey folks, decided to raise an official Jira (never done one of these before) about an issue we have found with Kong API Gateway using Cassandra as our DB.


We run a C* cluster across 2 nearby DCs: 6 C* nodes total, 3 in each DC, with data replicated across all nodes. We have found that C* sometimes throws NPEs in response to calls made by the lua-cassandra driver that Kong uses ([https://github.com/thibaultcha/lua-cassandra]). Specifically, it seems to occur when paging across multiple C* nodes. When we persistently page against a single C* node, we can't reproduce the NPEs in C*.

The exact error C* throws, with its stack trace, can be seen here:

[https://github.com/Kong/kong/issues/4194#issuecomment-497572751]

And again when we tried to upgrade from 3.11.2 to 3.11.4 in hopes it was already resolved:

[https://github.com/Kong/kong/issues/4194#issuecomment-497590235]

Same error, same line numbers, so the code must be the same in this portion.

 

Sample of our C* Config: [https://github.com/Kong/kong/issues/4194#issuecomment-497595766]

We discussed this in the ASF Slack flow as well:

ASF Slack discussion ([https://the-asf.slack.com/archives/CJZLTM05A/p1559321422028200] )

Some of the more important technical comments people posted are here:

[https://github.com/Kong/kong/issues/4194#issuecomment-497858824]

 

I am not a C* DBA, so I don't have an exact repro for you other than stating what the client application was attempting to do (paging across multiple C* nodes within a DC) when we saw the failures. Do any Apache C* folks think they see the issue, or could someone drop me a C* JAR with extra debugging print statements that I could run in dev to feed you more info? Or, if you see the problem and can one-shot a fix so that there is no more NPE and Cassandra responds to the client with an error message describing what was wrong, that would be helpful.


> Cassandra throwing occasional NPE 3.11.x
> ----------------------------------------
>
>                 Key: CASSANDRA-15199
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15199
>             Project: Cassandra
>          Issue Type: Bug
>            Reporter: Jeremy Justus
>            Priority: Normal
>
> Hey folks, decided to raise an official Jira (never done one of these before) about an issue we have found with Kong API Gateway using Cassandra as our DB.
> We run a C* cluster across 2 nearby DCs: 6 C* nodes total, 3 in each DC, with data replicated across all nodes. We have found that C* sometimes throws NPEs in response to calls made by the lua-cassandra driver that Kong uses ([https://github.com/thibaultcha/lua-cassandra]). Specifically, it seems to occur when paging across multiple C* nodes. When we persistently page against a single C* node, we can't reproduce the NPEs in C*.
> The exact error C* throws, with its stack trace, can be seen here:
> [https://github.com/Kong/kong/issues/4194#issuecomment-497572751]
> And again when we tried to upgrade from 3.11.2 to 3.11.4 in hopes it was already resolved:
> [https://github.com/Kong/kong/issues/4194#issuecomment-497590235]
> Same error, same line numbers, so the code must be the same in this portion.
>  
> Sample of our C* Config: [https://github.com/Kong/kong/issues/4194#issuecomment-497595766]
> We discussed this in the ASF Slack flow as well:
> ASF Slack discussion ([https://the-asf.slack.com/archives/CJZLTM05A/p1559321422028200] )
> Some of the more important technical comments people posted are here:
> [https://github.com/Kong/kong/issues/4194#issuecomment-497858824]
>  
> I am not a C* DBA, so I don't have an exact repro for you other than stating what the client application was attempting to do (paging across multiple C* nodes within a DC) when we saw the failures. Do any Apache C* folks think they see the issue, or could someone drop me a C* JAR with extra debugging print statements that I could run in dev to feed you more info? Or, if you see the problem and can one-shot a fix so that there is no more NPE and Cassandra responds to the client with an error message describing what was wrong, that would be insightful.
>  
> Thanks!



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
