Posted to dev@usergrid.apache.org by "Michael Russo (JIRA)" <ji...@apache.org> on 2016/10/06 17:17:20 UTC
[jira] [Comment Edited] (USERGRID-1259) Re-indexing ElasticSearch entity data from Cassandra - Possible Memory Leaks in Usergrid

    [ https://issues.apache.org/jira/browse/USERGRID-1259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15552554#comment-15552554 ] 

Michael Russo edited comment on USERGRID-1259 at 10/6/16 5:17 PM:
------------------------------------------------------------------

I also just pushed some commits to master, specific to re-index, that do fix the memory problem in re-index.  Before the fix, I tested on a cluster with 100 million entities and it easily ran out of memory.  Now it does fine: it queues up the index events very quickly, and how fast they are then processed depends on the number of queue consumers running ( elasticsearch.worker_count ).  The fix in master also adds a new 'utility' queue with dedicated consumers ( elasticsearch.worker_count_utility ) so that the re-index queue events do not disrupt runtime entity creation/update index events.
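
As a rough illustration of the two settings named above, the fragment below shows how they might appear in a Usergrid properties file. Only the two property names come from this comment; the values are hypothetical placeholders, not recommendations, and should be sized for the cluster at hand.

```properties
# Consumers for the regular index queue (runtime entity create/update events).
# Value is a hypothetical example.
elasticsearch.worker_count=8

# Dedicated consumers for the separate 'utility' queue that carries re-index events.
# Value is a hypothetical example.
elasticsearch.worker_count_utility=2
```

With separate consumer pools, a large re-index backlog on the utility queue should not hold up the consumers that handle runtime index events.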


was (Author: mrusso):
I also just pushed some commits to master specific to re-index which does actually fix the memory problem in re-index.  I had tested pre fix on a cluster with 100 million entities and it easily ran out of memory.  Now, it does fine, just queues up the index events super fast and then dependent on the # of queue consumers running ( elasticsearch.worker_count ).  Also the fix in master adds a new 'utility' queue with dedicated consumers ( elasticsearch.worker_count_utility ) so that the re-index queue events to do cause disruption to runtime entity creation/update index events.

> Re-indexing ElasticSearch entity data from Cassandra - Possible Memory Leaks in Usergrid
> ----------------------------------------------------------------------------------------
>
>                 Key: USERGRID-1259
>                 URL: https://issues.apache.org/jira/browse/USERGRID-1259
>             Project: Usergrid
>          Issue Type: Story
>    Affects Versions: 2.1.0
>            Reporter: Jaskaran
>            Assignee: Michael Russo
>             Fix For: 2.2.0
>
>
> Full system re-index job (http://localhost:8080/system/index/rebuild) seems to stop / hang after 20-30 hours of indexing. Usergrid seems to exhaust the 4.5 GB of RAM. Please see logs below:
> 1. Usergrid logs (out of Java heap space)
> Feb 04 17:06:32 Usergrid-2 catalina.out:  06:36:31,961  WARN OioServerSocketPipelineSink:83 - Failed to accept a connection.
> Feb 04 17:06:32 Usergrid-2 catalina.out:  java.lang.OutOfMemoryError: Java heap space
> Feb 04 15:05:03 Usergrid-2 catalina.out:  04:34:25,166  WARN jvm:203 - [default] [gc][old][29454][2683] duration [54.2s], collections [3]/[54.3s], total [54.2s]/[13.2h], memory [4.4gb]->[4.4gb]/[4.4gb], all_pools {[young] [532.5mb]->[532.5mb]/[532.5mb]}{[survivor] [62.5mb]->[65mb]/[66.5mb]}{[old] [3.8gb]->[3.8gb]/[3.8gb]}
> Feb 03 20:38:34 Usergrid-2 catalina.out: 10:08:34,616 ERROR AmazonAsyncEventService:361 - Failed to index message: 886ea9bd-708d-4bea-ab1c-844ff97c947c 
> 2. ES logs (time out, removed non-data node from cluster)
> Feb 04 16:00:33 Elasticsearch elasticsearch.log:  [2016-02-04 05:31:17,243][INFO ][cluster.service          ] [Arlette Truffaut] removed {[default][3GuynlamR6GvdiCAhhTBmw][ip-10-0-0-237][inet[/10.0.0.237:9301]]{client=true, data=false},}, reason: zen-disco-node_failed([default][3GuynlamR6GvdiCAhhTBmw][ip-10-0-0-237][inet[/10.0.0.237:9301]]{client=true, data=false}), reason failed to ping, tried [3] times, each with maximum [30s] timeout
> Feb 04 16:00:33 Elasticsearch elasticsearch.log:  [2016-02-04 05:31:17,256][DEBUG][action.admin.cluster.node.stats] [Arlette Truffaut] failed to execute on node [3GuynlamR6GvdiCAhhTBmw]
> Feb 04 16:00:33 Elasticsearch elasticsearch.log:  org.elasticsearch.transport.NodeDisconnectedException: [default][inet[/10.0.0.237:9301]][cluster:monitor/nodes/stats[n]] disconnected
> Is it possible that the re-indexing code in Usergrid has memory leaks and thus uses up all the Java heap memory?
> Please help.
> Thanks
> Jaskaran



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)