Posted to user@accumulo.apache.org by Geoffry Roberts <th...@gmail.com> on 2014/09/30 18:03:40 UTC

Out of memory when putting many rows in an Acc table

I am trying to pump some data into Accumulo, but I keep encountering

Exception in thread "Thrift Connection Pool Checker"
java.lang.OutOfMemoryError: Java heap space

at java.util.HashMap.newValueIterator(HashMap.java:971)

at java.util.HashMap$Values.iterator(HashMap.java:1038)

at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)

at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)

at java.lang.Thread.run(Thread.java:745)

I tried, as a workaround, creating a new BatchWriter and closing the old
one every ten thousand rows, but to no avail.  Data gets written up to the
200,000th row, then the error.

I have a table of 8M rows in an RDB that I am pumping into Acc via a Groovy
script.  The rows are narrow: a short text field and four floats.
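
For reference, the loader is shaped roughly like the sketch below. It is shown
in Java rather than the actual Groovy script, and the instance, table, and
column names are made up, but the structure is the same:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class Loader {
  public static void main(String[] args) throws Exception {
    // Hypothetical instance, credentials, and table names.
    Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181")
        .getConnector("loader", new PasswordToken("secret"));
    BatchWriter writer = conn.createBatchWriter("measurements", new BatchWriterConfig());

    try (Connection db = DriverManager.getConnection("jdbc:mysql://dbhost/mydb", "user", "pass");
         Statement st = db.createStatement();
         ResultSet rs = st.executeQuery("SELECT id, a, b, c, d FROM src")) {
      while (rs.next()) {
        // One narrow RDB row (a short text id plus four floats) becomes one Mutation.
        Mutation m = new Mutation(rs.getString("id"));
        for (String col : new String[] {"a", "b", "c", "d"}) {
          m.put("data", col, new Value(Float.toString(rs.getFloat(col)).getBytes()));
        }
        writer.addMutation(m);
      }
    } finally {
      writer.close(); // flushes any buffered mutations
    }
  }
}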

I googled of course but nothing was helpful.  What can be done?

Thanks so much.

-- 
There are ways and there are ways,

Geoffry Roberts

Re: Out of memory when putting many rows in an Acc table

Posted by Geoffry Roberts <th...@gmail.com>.
Apparently, my JDBC driver, no matter what settings one sets, always tries
to load the entire table into memory.  I tried using Groovy's paging facility,
thinking it would fetch rows in increments of, say, 10k.  It creates the
impression that it's doing this, but behind the scenes it still tries to
read in the entire table.

Whatever! The whole thing was cleaning my clock.  So I rolled my own paging
mechanism, which works by resubmitting the query with an offset and limit,
so I only get as many rows at a time as I want.  And it is working.
BatchWriter is doing fine.  I'm getting my 8M-row table loaded and will move
on to the bigger ones.

For the record, the RDB is MySQL using the latest driver.
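
In case it is useful, the hand-rolled paging looks roughly like the sketch
below (again in Java rather than Groovy; the table, column names, and page
size are illustrative).  I have also read that MySQL Connector/J can stream a
result set row by row if you use a forward-only, read-only statement with a
fetch size of Integer.MIN_VALUE, which might avoid the paging altogether, but
I have not tried that here.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.data.Mutation;
import org.apache.accumulo.core.data.Value;

public class PagedCopy {
  // Re-issues the query with LIMIT/OFFSET so only one page of rows is ever
  // held on the client. A stable ORDER BY is needed so pages do not overlap.
  static void copyTable(Connection db, BatchWriter writer) throws Exception {
    final int pageSize = 10000;
    String sql = "SELECT id, a, b, c, d FROM src ORDER BY id LIMIT ? OFFSET ?";
    try (PreparedStatement ps = db.prepareStatement(sql)) {
      for (long offset = 0; ; offset += pageSize) {
        ps.setInt(1, pageSize);
        ps.setLong(2, offset);
        int rows = 0;
        try (ResultSet rs = ps.executeQuery()) {
          while (rs.next()) {
            rows++;
            Mutation m = new Mutation(rs.getString("id"));
            for (String col : new String[] {"a", "b", "c", "d"}) {
              m.put("data", col, new Value(Float.toString(rs.getFloat(col)).getBytes()));
            }
            writer.addMutation(m);
          }
        }
        writer.flush();              // push this page to the tablet servers
        if (rows < pageSize) break;  // short or empty page means we are done
      }
    }
  }
}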

Thanks for the help.

On Wed, Oct 1, 2014 at 2:56 PM, Josh Elser <jo...@gmail.com> wrote:

> Or nofile (too). ulimit is your friend :)
>
> Eric Newton wrote:
>
>>     I realized this could be due to an inability by the JVM to create
>>     additional native threads
>>
>>
>> You may need to increase the nproc limit on your systems.
>>
>> -Eric
>>
>>
>> On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts <threadedblue@gmail.com
>> <ma...@gmail.com>> wrote:
>>
>>     Thanks for the response.
>>
>>     The only reason I was creating a new BatchWriter periodically was to
>>     determine if BatchWriter was holding on to memory even after a
>>     flush.   I had memory on my BatchWriterConfig set to 1M already.  I
>>     am reading my RDB tables in pages of 10K rows.
>>
>>     Bumping up the JVM size didn't help,
>>
>>     I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
>>     produce any output (an hprof file), I realized this could be due to an
>>     inability by the JVM to create additional native threads.
>>
>>     What I now think is the problem is not with Acc directly but hiding
>>     out on the JDBC side.
>>
>>     Perhaps this is not an Acc issue at all but merely masquerading as
>>     one.  We'll see.
>>
>>
>>     On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser <josh.elser@gmail.com
>>     <ma...@gmail.com>> wrote:
>>
>>         You shouldn't have to create a new BatchWriter -- have you tried
>>         reducing the amount of memory the BatchWriter will use? It keeps
>>         a cache internally to try to do an amortization of Mutations to
>>         send to a given tabletserver.
>>
>>         To limit this memory, use the
>>         BatchWriterConfig#setMaxMemory(long) method. By default, the
>>         maxMemory value is set to 50MB. Reducing this may be enough to
>>         hold less data in your client and give you some more head room.
>>
>>         Alternatively, you could give your client JVM some more heap :)
>>
>>
>>         Geoffry Roberts wrote:
>>
>>             I am trying to pump some data into Accumulo, but I keep encountering
>>
>>             Exception in thread "Thrift Connection Pool Checker"
>>             java.lang.OutOfMemoryError: Java heap space
>>
>>             at java.util.HashMap.newValueIterator(HashMap.java:971)
>>
>>             at java.util.HashMap$Values.iterator(HashMap.java:1038)
>>
>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
>>
>>             at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
>>
>>             at java.lang.Thread.run(Thread.java:745)
>>
>>
>>             I tried, as a workaround, creating a new BatchWriter and
>>             closing the old one every ten thousand rows, but to no avail.
>>             Data gets written up to the 200,000th row, then the error.
>>
>>             I have a table of 8M rows in an RDB that I am pumping into
>>             Acc via a Groovy script.  The rows are narrow: a short text
>>             field and four floats.
>>
>>             I googled of course but nothing was helpful.  What can be
>> done?
>>
>>             Thanks so much.
>>
>>             --
>>             There are ways and there are ways,
>>
>>             Geoffry Roberts
>>
>>
>>
>>
>>     --
>>     There are ways and there are ways,
>>
>>     Geoffry Roberts
>>
>>
>>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Out of memory when putting many rows in an Acc table

Posted by Josh Elser <jo...@gmail.com>.
Or nofile (too). ulimit is your friend :)

Eric Newton wrote:
>     I realized this could be due to an inability by the JVM to create
>     additional native threads
>
>
> You may need to increase the nproc limit on your systems.
>
> -Eric
>
>
> On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts <threadedblue@gmail.com
> <ma...@gmail.com>> wrote:
>
>     Thanks for the response.
>
>     The only reason I was creating a new BatchWriter periodically was to
>     determine if BatchWriter was holding on to memory even after a
>     flush.   I had memory on my BatchWriterConfig set to 1M already.  I
>     am reading my RDB tables in pages of 10K rows.
>
>     Bumping up the JVM size didn't help,
>
>     I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
>     produce any output (an hprof file), I realized this could be due to an
>     inability by the JVM to create additional native threads.
>
>     What I now think is the problem is not with Acc directly but hiding
>     out on the JDBC side.
>
>     Perhaps this is not an Acc issue at all but merely masquerading as
>     one.  We'll see.
>
>
>     On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser <josh.elser@gmail.com
>     <ma...@gmail.com>> wrote:
>
>         You shouldn't have to create a new BatchWriter -- have you tried
>         reducing the amount of memory the BatchWriter will use? It keeps
>         a cache internally to try to do an amortization of Mutations to
>         send to a given tabletserver.
>
>         To limit this memory, use the
>         BatchWriterConfig#setMaxMemory(long) method. By default, the
>         maxMemory value is set to 50MB. Reducing this may be enough to
>         hold less data in your client and give you some more head room.
>
>         Alternatively, you could give your client JVM some more heap :)
>
>
>         Geoffry Roberts wrote:
>
>             I am trying to pump some data into Accumulo, but I keep encountering
>
>             Exception in thread "Thrift Connection Pool Checker"
>             java.lang.OutOfMemoryError: Java heap space
>
>             at java.util.HashMap.newValueIterator(HashMap.java:971)
>
>             at java.util.HashMap$Values.iterator(HashMap.java:1038)
>
>             at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
>
>             at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
>
>             at java.lang.Thread.run(Thread.java:745)
>
>
>             I tried, as a workaround, creating a new BatchWriter and
>             closing the old one every ten thousand rows, but to no avail.
>             Data gets written up to the 200,000th row, then the error.
>
>             I have a table of 8M rows in an RDB that I am pumping into
>             Acc via a Groovy script.  The rows are narrow: a short text
>             field and four floats.
>
>             I googled of course but nothing was helpful.  What can be done?
>
>             Thanks so much.
>
>             --
>             There are ways and there are ways,
>
>             Geoffry Roberts
>
>
>
>
>     --
>     There are ways and there are ways,
>
>     Geoffry Roberts
>
>

Re: Out of memory when putting many rows in an Acc table

Posted by Eric Newton <er...@gmail.com>.
>
> I realized this could be due to an inability by the JVM to create
> additional native threads


You may need to increase the nproc limit on your systems.

-Eric


On Wed, Oct 1, 2014 at 11:12 AM, Geoffry Roberts <th...@gmail.com>
wrote:

> Thanks for the response.
>
> The only reason I was creating a new BatchWriter periodically was to
> determine if BatchWriter was holding on to memory even after a flush.   I
> had memory on my BatchWriterConfig set to 1M already.  I am reading my RDB
> tables in pages of 10K rows.
>
> Bumping up the JVM size didn't help,
>
> I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not
> produce any output (an hprof file), I realized this could be due to an
> inability by the JVM to create additional native threads.
>
> What I now think is the problem is not with Acc directly but hiding out on
> the JDBC side.
>
> Perhaps this is not an Acc issue at all but merely masquerading as one.
> We'll see.
>
>
> On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser <jo...@gmail.com> wrote:
>
>> You shouldn't have to create a new BatchWriter -- have you tried reducing
>> the amount of memory the BatchWriter will use? It keeps a cache internally
>> to try to do an amortization of Mutations to send to a given tabletserver.
>>
>> To limit this memory, use the BatchWriterConfig#setMaxMemory(long)
>> method. By default, the maxMemory value is set to 50MB. Reducing this may
>> be enough to hold less data in your client and give you some more head room.
>>
>> Alternatively, you could give your client JVM some more heap :)
>>
>>
>> Geoffry Roberts wrote:
>>
>>> I am trying to pump some data into Accumulo, but I keep encountering
>>>
>>> Exception in thread "Thrift Connection Pool Checker"
>>> java.lang.OutOfMemoryError: Java heap space
>>>
>>> at java.util.HashMap.newValueIterator(HashMap.java:971)
>>>
>>> at java.util.HashMap$Values.iterator(HashMap.java:1038)
>>>
>>> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
>>>
>>> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
>>>
>>> at java.lang.Thread.run(Thread.java:745)
>>>
>>>
>>> I tried, as a workaround, creating a new BatchWriter and closing the
>>> old one every ten thousand rows, but to no avail.  Data gets written up
>>> to the 200,000th row, then the error.
>>>
>>> I have a table of 8M rows in an RDB that I am pumping into Acc via a
>>> Groovy script.  The rows are narrow: a short text field and four floats.
>>>
>>> I googled of course but nothing was helpful.  What can be done?
>>>
>>> Thanks so much.
>>>
>>> --
>>> There are ways and there are ways,
>>>
>>> Geoffry Roberts
>>>
>>
>
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts
>

Re: Out of memory when putting many rows in an Acc table

Posted by Geoffry Roberts <th...@gmail.com>.
Thanks for the response.

The only reason I was creating a new BatchWriter periodically was to
determine if BatchWriter was holding on to memory even after a flush.  I
had the memory on my BatchWriterConfig set to 1M already.  I am reading my RDB
tables in pages of 10K rows.

Bumping up the JVM size didn't help.

I tried setting -XX:+HeapDumpOnOutOfMemoryError, and when it did not produce
any output (an hprof file), I realized this could be due to an inability by
the JVM to create additional native threads.

What I now think is that the problem is not with Acc directly but is hiding
out on the JDBC side.

Perhaps this is not an Acc issue at all but merely masquerading as one.
We'll see.

On Tue, Sep 30, 2014 at 12:17 PM, Josh Elser <jo...@gmail.com> wrote:

> You shouldn't have to create a new BatchWriter -- have you tried reducing
> the amount of memory the BatchWriter will use? It keeps a cache internally
> to try to do an amortization of Mutations to send to a given tabletserver.
>
> To limit this memory, use the BatchWriterConfig#setMaxMemory(long)
> method. By default, the maxMemory value is set to 50MB. Reducing this may
> be enough to hold less data in your client and give you some more head room.
>
> Alternatively, you could give your client JVM some more heap :)
>
>
> Geoffry Roberts wrote:
>
>> I am trying to pump some data into Accumulo, but I keep encountering
>>
>> Exception in thread "Thrift Connection Pool Checker"
>> java.lang.OutOfMemoryError: Java heap space
>>
>> at java.util.HashMap.newValueIterator(HashMap.java:971)
>>
>> at java.util.HashMap$Values.iterator(HashMap.java:1038)
>>
>> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
>>
>> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
>>
>> at java.lang.Thread.run(Thread.java:745)
>>
>>
>> I tried, as a workaround, creating a new BatchWriter and closing the
>> old one every ten thousand rows, but to no avail.  Data gets written up
>> to the 200,000th row, then the error.
>>
>> I have a table of 8M rows in an RDB that I am pumping into Acc via a
>> Groovy script.  The rows are narrow: a short text field and four floats.
>>
>> I googled of course but nothing was helpful.  What can be done?
>>
>> Thanks so much.
>>
>> --
>> There are ways and there are ways,
>>
>> Geoffry Roberts
>>
>


-- 
There are ways and there are ways,

Geoffry Roberts

Re: Out of memory when putting many rows in an Acc table

Posted by Josh Elser <jo...@gmail.com>.
You shouldn't have to create a new BatchWriter -- have you tried 
reducing the amount of memory the BatchWriter will use? It keeps a cache 
internally to amortize the cost of sending Mutations to a given 
tabletserver.

To limit this memory, use the BatchWriterConfig#setMaxMemory(long) 
method. By default, the maxMemory value is set to 50MB. Reducing this 
may be enough to hold less data in your client and give you some more 
head room.
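
For instance, a config along these lines -- the numbers are only illustrative,
so tune them to your client's heap:

import java.util.concurrent.TimeUnit;

import org.apache.accumulo.core.client.BatchWriter;
import org.apache.accumulo.core.client.BatchWriterConfig;
import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.TableNotFoundException;

public class WriterFactory {
  static BatchWriter createWriter(Connector conn, String table) throws TableNotFoundException {
    BatchWriterConfig cfg = new BatchWriterConfig()
        .setMaxMemory(10 * 1024 * 1024L)      // buffer at most ~10MB of Mutations client-side (default is 50MB)
        .setMaxLatency(30, TimeUnit.SECONDS)  // send buffered data at least every 30 seconds
        .setMaxWriteThreads(4);               // threads used to send mutations to tablet servers
    return conn.createBatchWriter(table, cfg);
  }
}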

Alternatively, you could give your client JVM some more heap :)

Geoffry Roberts wrote:
> I am trying to pump some data into Accumulo, but I keep encountering
>
> Exception in thread "Thrift Connection Pool Checker"
> java.lang.OutOfMemoryError: Java heap space
>
> at java.util.HashMap.newValueIterator(HashMap.java:971)
>
> at java.util.HashMap$Values.iterator(HashMap.java:1038)
>
> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.closeConnections(ThriftTransportPool.java:103)
>
> at org.apache.accumulo.core.client.impl.ThriftTransportPool$Closer.run(ThriftTransportPool.java:147)
>
> at java.lang.Thread.run(Thread.java:745)
>
>
> I tried, as a workaround, creating a new BatchWriter and closing the
> old one every ten thousand rows, but to no avail.  Data gets written up
> to the 200,000th row, then the error.
>
> I have a table of 8M rows in an RDB that I am pumping into Acc via a
> Groovy script.  The rows are narrow: a short text field and four floats.
>
> I googled of course but nothing was helpful.  What can be done?
>
> Thanks so much.
>
> --
> There are ways and there are ways,
>
> Geoffry Roberts