Posted to user@storm.apache.org by Navin Ipe <na...@searchlighthealth.com> on 2017/02/05 11:53:39 UTC

Identifying the source of the memory error in Storm

Hi,
I have a bolt which sometimes emits around 15000 tuples, and sometimes more
than 20000. I think when this happens there's a memory issue and the workers
get restarted. This is what worker.log.err contains:

Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000f1000000, 62914560, 0) failed; error='Cannot allocate memory' (errno=12)
# There is insufficient memory for the Java Runtime Environment to continue.
# Native memory allocation (mmap) failed to map 62914560 bytes for committing reserved memory.
# An error report file with more information is saved as:
# /home/storm/apache-storm-1.0.0/storm-local/workers/6a1a70ad-d094-437a-a9c5-e837fc1b3535/hs_err_pid2766.log

The odd part is that in all my bolts I have:

    @Override
    public void execute(Tuple tuple) {
        try {
            // ...some code, including the code that emits tuples
        } catch (Exception ex) {
            logger.info("The exception {}, {}", ex.getCause(), ex.getMessage());
        }
    }

But in the logs I never see the string "The exception". However, worker.log shows:

2017-02-05 09:14:01.320 STDERR [INFO] Java HotSpot(TM) 64-Bit Server VM warning: INFO: os::commit_memory(0x00000000e6f80000, 37748736, 0) failed; error='Cannot allocate memory' (errno=12)
2017-02-05 09:14:01.320 STDERR [INFO] #
2017-02-05 09:14:01.330 STDERR [INFO] # There is insufficient memory for the Java Runtime Environment to continue.
2017-02-05 09:14:01.330 STDERR [INFO] # Native memory allocation (mmap) failed to map 37748736 bytes for committing reserved memory.
2017-02-05 09:14:01.331 STDERR [INFO] # An error report file with more information is saved as:
2017-02-05 09:14:01.331 STDERR [INFO] # /home/storm/apache-storm-1.0.0/storm-local/workers/2685b445-c4a9-4f7e-94e1-1ce3fe13de47/hs_err_pid3022.log
2017-02-05 09:14:06.904 o.a.s.d.worker [INFO] Launching worker for HydraCellGen-138-1486283223 on 3fc3c05e-9769-4033-bf7d-df609d6c4963:6701 with id 575bd7ed-a3fc-4f7f-a7d0-cdd4054c9fc5 and conf {"topology.builtin.metrics.bucket.size.secs" 60, "nimbus.childopts" "-Xmx1024m",... etc

These are the settings I'm using for the topology:

        Config stormConfig = new Config();
        stormConfig.setNumWorkers(20);
        stormConfig.setNumAckers(20);
        stormConfig.put(Config.TOPOLOGY_DEBUG, false);
        stormConfig.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);
        stormConfig.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 65536);
        stormConfig.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 65536);
        stormConfig.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 2);
        stormConfig.put(Config.TOPOLOGY_MESSAGE_TIMEOUT_SECS, 2200);
        stormConfig.put(Config.STORM_ZOOKEEPER_SERVERS, Arrays.asList(new String[]{"localhost"}));
        stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "2g");



So am I right in assuming the exception is not thrown in my code but in the
worker thread? Do such exceptions happen when the worker receives more tuples
than its queue can hold?
What can I do to avoid this problem?

-- 
Regards,
Navin

Re: Identifying the source of the memory error in Storm

Posted by Navin Ipe <na...@searchlighthealth.com>.
Thank you Mostafa.



-- 
Regards,
Navin

Re: Identifying the source of the memory error in Storm

Posted by Mostafa Gomaa <mg...@trendak.com>.
I had a similar issue and I solved it by setting this option: worker.heap.memory.mb
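For reference, worker.heap.memory.mb is normally set in storm.yaml on the
supervisor nodes and gives the heap size, in MB, that each worker JVM is
launched with. A minimal sketch, assuming a Storm 1.x storm.yaml; the value is
only an example and should be sized to the machine:

    # storm.yaml on each supervisor node (example value, adjust to your hardware)
    worker.heap.memory.mb: 2048

The supervisors read storm.yaml at startup, so they need to be restarted
before newly launched workers pick up the new heap size.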



Re: Identifying the source of the memory error in Storm

Posted by Navin Ipe <na...@searchlighthealth.com>.
Hi,

Even though I ran the topology on a server with 30GB RAM, it still crashed.
I had set stormConfig.put(Config.TOPOLOGY_WORKER_CHILDOPTS, "-Xmx" + "15g");

But when I look at the workers in htop, their virtual memory is shown as 15G,
yet toward the right of the screen, under the command column, it shows "java
-Xmx2048m" and a few other options. I assume this is the command that Storm
used to start the worker.

So how come my memory setting isn't being used by the worker? Why is it
still using 2GB instead of 15GB?
Also, out of the 30GB, 25GB was getting used. How could that happen when I
have only 4 slots and 4 workers running? The exact same thing was taking up
just 5GB on a system with 10GB RAM, where I configured -Xmx to "2g".

Could you help me understand this?
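
One way to check which -Xmx actually took effect is to log the worker's own
JVM arguments and heap ceiling from inside the topology. A minimal sketch,
assuming Storm 1.x and SLF4J; HeapLoggingBolt is a hypothetical diagnostic
bolt, not part of the topology above:

    import java.lang.management.ManagementFactory;
    import java.util.Map;
    import org.apache.storm.task.OutputCollector;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.base.BaseRichBolt;
    import org.apache.storm.tuple.Tuple;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    public class HeapLoggingBolt extends BaseRichBolt {
        private static final Logger logger = LoggerFactory.getLogger(HeapLoggingBolt.class);
        private OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
            this.collector = collector;
            // The JVM flags this worker process was actually launched with (shows the effective -Xmx)
            logger.info("Worker JVM args: {}", ManagementFactory.getRuntimeMXBean().getInputArguments());
            // The heap ceiling the JVM ended up with, in MB
            logger.info("Worker max heap: {} MB", Runtime.getRuntime().maxMemory() / (1024 * 1024));
        }

        @Override
        public void execute(Tuple tuple) {
            collector.ack(tuple); // pass-through; this bolt exists only to log the heap settings
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output streams
        }
    }

If the logged max heap is about 2GB even though topology.worker.childopts asks
for 15g, then something else in the launch command (for example worker.childopts
or worker.gc.childopts in storm.yaml) is likely supplying the -Xmx2048m.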



-- 
Regards,
Navin

Re: Identifying the source of the memory error in Storm

Posted by Navin Ipe <na...@searchlighthealth.com>.
Thank you. I've been monitoring it via JConsole, and this is what I see:
Supervisor used memory: 61MB

Supervisor committed memory: 171MB
Supervisor Max memory: 239.1MB

Nimbus used memory: 44.3MB
Nimbus committed memory: 169.3MB
Nimbus max memory: 954.7MB

Zookeeper used memory: 224MB
Zookeeper committed memory: 529MB
Zookeeper Max memory: 1.9GB

Worker used memory: 941MB

Worker committed memory: 1.4GB
Worker Max memory: 1.9GB

So from the looks of it, even if the worker memory is managed and kept low,
the supervisor can crash because of low memory. The solution appears to be to
increase supervisor memory in storm.yaml, use more RAM, and use swap space.
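
A minimal storm.yaml sketch along those lines, assuming Storm 1.x defaults;
the values are placeholders and would need to be sized to the machine:

    # storm.yaml on the supervisor node (example values only)
    supervisor.childopts: "-Xmx512m"    # heap for the supervisor daemon itself
    worker.heap.memory.mb: 2048         # heap given to each worker JVM, in MB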

If you have any other opinions, please let me know.




-- 
Regards,
Navin

Re: Identifying the source of the memory error in Storm

Posted by Andrea Gazzarini <gx...@gmail.com>.
Hi Navin,
I think this line is a good starting point for your analysis:

/"There is insufficient memory for the Java Runtime Environment to 
continue."

/I don't believe this scenario is caught by the JVM as a checked 
exception: in my opinion it belongs to the "Error" class, and that would 
explain why the catch block is never reached.
In addition, your assumption could be also right: the part of code that 
raises the exception could be everywhere in the worker code, not 
necessarily within your class; this because memory errors, differently 
from what in general happens for exceptions, don't have a deterministic 
point of failure, they depends on the system state at a given moment.
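A tiny sketch of why a catch (Exception ex) block misses this kind of failure:
OutOfMemoryError extends Error, not Exception, so only a handler for Throwable
(or Error) would ever see it. And in this particular case the native allocation
failure aborts the JVM before any Java handler can run at all.

    public class ErrorVsException {
        public static void main(String[] args) {
            try {
                // Simulated: a real worker hits this while allocating; it is not thrown by user code.
                throw new OutOfMemoryError("simulated heap exhaustion");
            } catch (Exception ex) {
                // Never reached: OutOfMemoryError is an Error, which is not a subclass of Exception.
                System.out.println("caught as Exception: " + ex);
            } catch (Throwable t) {
                System.out.println("caught only because we also catch Throwable: " + t);
            }
        }
    }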

Please describe your architecture, nodes and hardware resources in a bit more
detail (or investigate them yourself), along with any other information that
can help us understand your context. Tools like JVisualVM, JConsole and the
Storm GUI are precious friends in these contexts.

Best,
Andrea
