Posted to user@cassandra.apache.org by Mohammed Guller <mo...@glassbeam.com> on 2015/01/09 08:33:11 UTC

C* throws OOM error despite use of automatic paging

Hi -

We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table.

The application code looks something like this:

Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
      Row row = rs.one();
      process(row);
}
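
For reference, the 2.1 driver's ResultSet is Iterable, so the same scan can also be written as a for-each loop; automatic paging behaves identically in both forms:

for (Row row : rs) {
      process(row);  // the driver transparently fetches the next page as needed
}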

Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something.

Does anybody have insights as to what could be happening? Thanks.

Mohammed



Re: C* throws OOM error despite use of automatic paging

Posted by Mohammed Guller <mo...@glassbeam.com>.
There are no tombstones.

Mohammed


On Jan 12, 2015, at 9:11 PM, Dominic Letz <do...@exosite.com> wrote:

Does your use case include many tombstones? If yes, that might explain the OOM situation.

If you want to know for sure, you can enable heap dump generation on crash in cassandra-env.sh: just uncomment JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError" and then run your query again. The heap dump will have the answer.




On Tue, Jan 13, 2015 at 10:54 AM, Mohammed Guller <mo...@glassbeam.com> wrote:
The heap usage is pretty low (less than 700MB) when the application starts. I can see the heap usage gradually climbing once the application starts. C* does not log any errors before the OOM happens.

Data is on EBS. Write throughput is quite high with two applications simultaneously pumping data into C*.


Mohammed

From: Ryan Svihla [mailto:rs@foundev.pro]
Sent: Monday, January 12, 2015 3:39 PM
To: user

Subject: Re: C* throws OOM error despite use of automatic paging

I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'.

What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes?


On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller <mo...@glassbeam.com> wrote:
nodetool cfstats shows 9GB. We are storing simple primitive values. No blobs or collections.

Mohammed

From: DuyHai Doan [mailto:doanduyhai@gmail.com]
Sent: Friday, January 9, 2015 12:51 AM
To: user@cassandra.apache.org
Subject: Re: C* throws OOM error despite use of automatic paging

What is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values?

On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com> wrote:
Hi –

We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table.

The application code looks something like this:

Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
      Row row = rs.one();
      process(row);
}

Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something.

Does anybody have insights as to what could be happening? Thanks.

Mohammed






--

Thanks,
Ryan Svihla



--
Dominic Letz
Director of R&D
Exosite <http://exosite.com>


Re: C* throws OOM error despite use of automatic paging

Posted by Dominic Letz <do...@exosite.com>.
Does your use case include many tombstones? If yes, that might explain
the OOM situation.

If you want to know for sure, you can enable heap dump generation on
crash in cassandra-env.sh: just uncomment JVM_OPTS="$JVM_OPTS
-XX:+HeapDumpOnOutOfMemoryError" and then run your query again. The
heap dump will have the answer.
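
The relevant lines in cassandra-env.sh look roughly like this (the exact
wording varies between Cassandra versions, and the dump path is an
optional extra; the directory below is a placeholder and must exist):

# default: heap dump on OOM disabled
# JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"

# uncommented, plus an optional explicit dump location
JVM_OPTS="$JVM_OPTS -XX:+HeapDumpOnOutOfMemoryError"
JVM_OPTS="$JVM_OPTS -XX:HeapDumpPath=/var/lib/cassandra"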




On Tue, Jan 13, 2015 at 10:54 AM, Mohammed Guller <mo...@glassbeam.com>
wrote:

>  The heap usage is pretty low (less than 700MB) when the application
> starts. I can see the heap usage gradually climbing once the application
> starts. C* does not log any errors before the OOM happens.
>
>
>
> Data is on EBS. Write throughput is quite high with two applications
> simultaneously pumping data into C*.
>
>
>
>
>
> Mohammed
>
>
>
> *From:* Ryan Svihla [mailto:rs@foundev.pro]
> *Sent:* Monday, January 12, 2015 3:39 PM
> *To:* user
>
> *Subject:* Re: C* throws OOM error despite use of automatic paging
>
>
>
> I think it's more accurate to say that auto paging prevents one type
> of OOM. It's premature to diagnose it as 'not happening'.
>
>
>
> What is heap usage when you start? Are you storing your data on EBS? What
> kind of write throughput do you have going on at the same time? What errors
> do you have in the Cassandra logs before this crashes?
>
>
>
>
>
> On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller <mo...@glassbeam.com>
> wrote:
>
> nodetool cfstats shows 9GB. We are storing simple primitive values. No
> blobs or collections.
>
>
>
> Mohammed
>
>
>
> *From:* DuyHai Doan [mailto:doanduyhai@gmail.com]
> *Sent:* Friday, January 9, 2015 12:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: C* throws OOM error despite use of automatic paging
>
>
>
> What is the data size of the column family you're trying to fetch with
> paging? Are you storing big blobs or just primitive values?
>
>
>
> On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com>
> wrote:
>
> Hi –
>
>
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application is
> using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since
> the Java driver supports automatic paging, I was under the impression that
> SELECT queries should not cause an OOM error on the C* nodes. However, even
> with just 16GB of data on each node, the C* nodes start throwing OOM errors
> as soon as the application starts iterating through the rows of a table.
>
>
>
> The application code looks something like this:
>
>
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM
> cf").setFetchSize(5000);
>
> ResultSet rs = session.execute(stmt);
>
> while (!rs.isExhausted()) {
>
>       Row row = rs.one();
>
>       process(row);
>
> }
>
>
>
> Even after we reduced the page size to 1000, the C* nodes still crash. C*
> is running on M3.xlarge machines (4 cores, 15GB). We manually increased the
> heap size to 8GB just to see how much heap C* consumes. Within 10-15
> minutes, the heap usage climbs to 7.6GB. That does not make sense. Either
> automatic paging is not working or we are missing something.
>
>
>
> Does anybody have insights as to what could be happening? Thanks.
>
>
>
> Mohammed
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Thanks,
>
> Ryan Svihla
>



-- 
Dominic Letz
Director of R&D
Exosite <http://exosite.com>

RE: C* throws OOM error despite use of automatic paging

Posted by Mohammed Guller <mo...@glassbeam.com>.
The heap usage is pretty low (less than 700MB) when the application starts. I can see the heap usage gradually climbing once the application starts. C* does not log any errors before the OOM happens.

Data is on EBS. Write throughput is quite high with two applications simultaneously pumping data into C*.


Mohammed

From: Ryan Svihla [mailto:rs@foundev.pro]
Sent: Monday, January 12, 2015 3:39 PM
To: user
Subject: Re: C* throws OOM error despite use of automatic paging

I think it's more accurate to say that auto paging prevents one type of OOM. It's premature to diagnose it as 'not happening'.

What is heap usage when you start? Are you storing your data on EBS? What kind of write throughput do you have going on at the same time? What errors do you have in the Cassandra logs before this crashes?


On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller <mo...@glassbeam.com> wrote:
nodetool cfstats shows 9GB. We are storing simple primitive values. No blobs or collections.

Mohammed

From: DuyHai Doan [mailto:doanduyhai@gmail.com]
Sent: Friday, January 9, 2015 12:51 AM
To: user@cassandra.apache.org
Subject: Re: C* throws OOM error despite use of automatic paging

What is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values?

On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com> wrote:
Hi –

We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table.

The application code looks something like this:

Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
      Row row = rs.one();
      process(row);
}

Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something.

Does anybody have insights as to what could be happening? Thanks.

Mohammed






--

Thanks,
Ryan Svihla

Re: C* throws OOM error despite use of automatic paging

Posted by Ryan Svihla <rs...@foundev.pro>.
I think it's more accurate to say that auto paging prevents one type
of OOM. It's premature to diagnose it as 'not happening'.

What is heap usage when you start? Are you storing your data on EBS? What
kind of write throughput do you have going on at the same time? What errors
do you have in the Cassandra logs before this crashes?
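
Heap usage can be sampled on a node with nodetool, for example (the
10-second interval is just a suggestion; watch is a stock Linux utility):

nodetool info                 # reports heap memory used / max, among other stats
watch -n 10 nodetool info     # crude way to observe the heap climbing over time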


On Sat, Jan 10, 2015 at 1:48 PM, Mohammed Guller <mo...@glassbeam.com>
wrote:

>  nodetool cfstats shows 9GB. We are storing simple primitive values. No
> blobs or collections.
>
>
>
> Mohammed
>
>
>
> *From:* DuyHai Doan [mailto:doanduyhai@gmail.com]
> *Sent:* Friday, January 9, 2015 12:51 AM
> *To:* user@cassandra.apache.org
> *Subject:* Re: C* throws OOM error despite use of automatic paging
>
>
>
> What is the data size of the column family you're trying to fetch with
> paging? Are you storing big blobs or just primitive values?
>
>
>
> On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com>
> wrote:
>
> Hi –
>
>
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application is
> using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since
> the Java driver supports automatic paging, I was under the impression that
> SELECT queries should not cause an OOM error on the C* nodes. However, even
> with just 16GB of data on each node, the C* nodes start throwing OOM errors
> as soon as the application starts iterating through the rows of a table.
>
>
>
> The application code looks something like this:
>
>
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM
> cf").setFetchSize(5000);
>
> ResultSet rs = session.execute(stmt);
>
> while (!rs.isExhausted()) {
>
>       Row row = rs.one();
>
>       process(row);
>
> }
>
>
>
> Even after we reduced the page size to 1000, the C* nodes still crash. C*
> is running on M3.xlarge machines (4 cores, 15GB). We manually increased the
> heap size to 8GB just to see how much heap C* consumes. Within 10-15
> minutes, the heap usage climbs to 7.6GB. That does not make sense. Either
> automatic paging is not working or we are missing something.
>
>
>
> Does anybody have insights as to what could be happening? Thanks.
>
>
>
> Mohammed
>
>
>
>
>
>
>



-- 

Thanks,
Ryan Svihla

RE: C* throws OOM error despite use of automatic paging

Posted by Mohammed Guller <mo...@glassbeam.com>.
nodetool cfstats shows 9GB. We are storing simple primitive values. No blobs or collections.
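
For reference, that figure comes from the per-table statistics; the
keyspace and table names below are placeholders:

nodetool cfstats mykeyspace.cf    # "Space used (live)" is the on-disk data size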

Mohammed

From: DuyHai Doan [mailto:doanduyhai@gmail.com]
Sent: Friday, January 9, 2015 12:51 AM
To: user@cassandra.apache.org
Subject: Re: C* throws OOM error despite use of automatic paging

What is the data size of the column family you're trying to fetch with paging? Are you storing big blobs or just primitive values?

On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com> wrote:
Hi –

We have an ETL application that reads all rows from Cassandra (2.1.2), filters them, and stores a small subset in an RDBMS. Our application is using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since the Java driver supports automatic paging, I was under the impression that SELECT queries should not cause an OOM error on the C* nodes. However, even with just 16GB of data on each node, the C* nodes start throwing OOM errors as soon as the application starts iterating through the rows of a table.

The application code looks something like this:

Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(5000);
ResultSet rs = session.execute(stmt);
while (!rs.isExhausted()) {
      Row row = rs.one();
      process(row);
}

Even after we reduced the page size to 1000, the C* nodes still crash. C* is running on M3.xlarge machines (4 cores, 15GB). We manually increased the heap size to 8GB just to see how much heap C* consumes. Within 10-15 minutes, the heap usage climbs to 7.6GB. That does not make sense. Either automatic paging is not working or we are missing something.

Does anybody have insights as to what could be happening? Thanks.

Mohammed




Re: C* throws OOM error despite use of automatic paging

Posted by DuyHai Doan <do...@gmail.com>.
What is the data size of the column family you're trying to fetch with
paging? Are you storing big blobs or just primitive values?

On Fri, Jan 9, 2015 at 8:33 AM, Mohammed Guller <mo...@glassbeam.com>
wrote:

>  Hi –
>
>
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application is
> using DataStax's Java driver (2.1.4) to fetch data from the C* nodes. Since
> the Java driver supports automatic paging, I was under the impression that
> SELECT queries should not cause an OOM error on the C* nodes. However, even
> with just 16GB of data on each node, the C* nodes start throwing OOM errors
> as soon as the application starts iterating through the rows of a table.
>
>
>
> The application code looks something like this:
>
>
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM
> cf").setFetchSize(5000);
>
> ResultSet rs = session.execute(stmt);
>
> while (!rs.isExhausted()) {
>
>       Row row = rs.one();
>
>       process(row);
>
> }
>
>
>
> Even after we reduced the page size to 1000, the C* nodes still crash. C*
> is running on M3.xlarge machines (4 cores, 15GB). We manually increased the
> heap size to 8GB just to see how much heap C* consumes. Within 10-15
> minutes, the heap usage climbs to 7.6GB. That does not make sense. Either
> automatic paging is not working or we are missing something.
>
>
>
> Does anybody have insights as to what could be happening? Thanks.
>
>
>
> Mohammed
>
>
>
>
>

RE: C* throws OOM error despite use of automatic paging

Posted by Mohammed Guller <mo...@glassbeam.com>.
Hi Jens,
Thank you for sharing the results of your tests.

I even tried setFetchSize with 100 and it didn't help much. I am coming to the conclusion that the correct number for setFetchSize depends on the data. In some cases, the default is fine, whereas in others it needs to be significantly lower than 5000. As you mentioned, that leaves a lot of operational risk for production use.

It would be great if there were documented guidelines on how to select the correct number for setFetchSize.
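
One pattern worth noting (a sketch against the 2.1 Java driver; the fetch size and prefetch threshold below are illustrative guesses, not recommendations): a small page size limits how much each request asks of the node, and the driver can prefetch the next page asynchronously while the current one is being processed.

Statement stmt = new SimpleStatement("SELECT x,y,z FROM cf").setFetchSize(500);
ResultSet rs = session.execute(stmt);
for (Row row : rs) {
      // When the current page runs low, start fetching the next one in the
      // background so iteration never stalls on a full synchronous round trip.
      if (rs.getAvailableWithoutFetching() == 100 && !rs.isFullyFetched()) {
            rs.fetchMoreResults();
      }
      process(row);
}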

Mohammed


-----Original Message-----
From: Jens-U. Mozdzen [mailto:jmozdzen@nde.ag] 
Sent: Friday, January 9, 2015 4:02 AM
To: user@cassandra.apache.org
Subject: Re: C* throws OOM error despite use of automatic paging

Hi Mohammed,

Quoting Mohammed Guller <mo...@glassbeam.com>:
> Hi -
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application is
> using DataStax's Java driver (2.1.4) to fetch data from the C* nodes.
> Since the Java driver supports automatic paging, I was under the
> impression that SELECT queries should not cause an OOM error on the C*
> nodes. However, even with just 16GB of data on each node, the C* nodes
> start throwing OOM errors as soon as the application starts iterating
> through the rows of a table.
>
> The application code looks something like this:
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM
> cf").setFetchSize(5000);
> ResultSet rs = session.execute(stmt);
> while (!rs.isExhausted()) {
>       Row row = rs.one();
>       process(row);
> }
>
> Even after we reduced the page size to 1000, the C* nodes still crash.
> C* is running on M3.xlarge machines (4 cores, 15GB).

I've been running a few tests to determine the effect of setFetchSize()
on heap pressure on the Cassandra nodes and came to the conclusion that
a limit of "500" is much more helpful than values above "1000"... with
values that were too high, we put so much pressure on the nodes that we
had to restart them.

This, btw, leaves a lot of operational risk for production use. I have,
for example, found no way to influence time-outs or fetch size with the
DataStax JDBC driver, with corresponding consequences for the queries
(time-outs) and C* node behavior (esp. heap pressure). Hence, operating
a C* cluster needs a lot of trust in the skills of the "users"
(developers/maintainers of the client-side solutions) and their tools :(.

Regards,
Jens





Re: C* throws OOM error despite use of automatic paging

Posted by "Jens-U. Mozdzen" <jm...@nde.ag>.
Hi Mohammed,

Quoting Mohammed Guller <mo...@glassbeam.com>:
> Hi -
>
> We have an ETL application that reads all rows from Cassandra (2.1.2),
> filters them, and stores a small subset in an RDBMS. Our application is
> using DataStax's Java driver (2.1.4) to fetch data from the C* nodes.
> Since the Java driver supports automatic paging, I was under the
> impression that SELECT queries should not cause an OOM error on the C*
> nodes. However, even with just 16GB of data on each node, the C* nodes
> start throwing OOM errors as soon as the application starts iterating
> through the rows of a table.
>
> The application code looks something like this:
>
> Statement stmt = new SimpleStatement("SELECT x,y,z FROM
> cf").setFetchSize(5000);
> ResultSet rs = session.execute(stmt);
> while (!rs.isExhausted()) {
>       Row row = rs.one();
>       process(row);
> }
>
> Even after we reduced the page size to 1000, the C* nodes still crash.
> C* is running on M3.xlarge machines (4 cores, 15GB).

I've been running a few tests to determine the effect of setFetchSize()
on heap pressure on the Cassandra nodes and came to the conclusion that
a limit of "500" is much more helpful than values above "1000"... with
values that were too high, we put so much pressure on the nodes that we
had to restart them.

This, btw, leaves a lot of operational risk for production use. I have,
for example, found no way to influence time-outs or fetch size with the
DataStax JDBC driver, with corresponding consequences for the queries
(time-outs) and C* node behavior (esp. heap pressure). Hence, operating
a C* cluster needs a lot of trust in the skills of the "users"
(developers/maintainers of the client-side solutions) and their tools :(.
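
For completeness, the native DataStax Java driver (as opposed to the JDBC
wrapper) does expose both knobs at cluster-build time. A minimal sketch,
assuming driver 2.1; the contact point, fetch size, and timeout values are
placeholders, not recommendations:

import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.QueryOptions;
import com.datastax.driver.core.SocketOptions;

Cluster cluster = Cluster.builder()
    .addContactPoint("10.0.0.1")              // placeholder node address
    .withQueryOptions(new QueryOptions()
        .setFetchSize(500))                   // default page size for all queries
    .withSocketOptions(new SocketOptions()
        .setReadTimeoutMillis(20000))         // client-side per-request timeout
    .build();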

Regards,
Jens