Posted to solr-dev@lucene.apache.org by "Ryuuichi Kumai (JIRA)" <ji...@apache.org> on 2009/02/28 06:13:12 UTC

[jira] Created: (SOLR-1042) Memory leaks in DIH

Memory leaks in DIH
-------------------

                 Key: SOLR-1042
                 URL: https://issues.apache.org/jira/browse/SOLR-1042
             Project: Solr
          Issue Type: Bug
          Components: contrib - DataImportHandler
    Affects Versions: 1.3
            Reporter: Ryuuichi Kumai
         Attachments: SOLR-1042.patch

If delta-import is executed many times, heap usage keeps growing until an OutOfMemoryError finally occurs.

When delta-import is executed with SqlEntityProcessor, TemplateString instances are cached in VariableResolverImpl#TEMPLATE_STRING#cache.
If the deltaQuery contains a variable such as `last_index_time', this cache keeps filling with entries that are never used again.
I suspect the cache also grows when each modified row is fetched by its primary key.
I think these queries should not be cached.
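A minimal Java sketch of the growth described above, assuming (as the report suggests) that query text which varies between runs or rows, such as a substituted timestamp or an embedded primary key, ends up as the cache key. TemplateCacheLeakDemo, ParsedTemplate and parse() are hypothetical names, not the Solr 1.3 classes, and the code uses modern Java rather than the Java 5 of the original code base.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified model of a process-wide template cache; not the Solr 1.3 source.
class TemplateCacheLeakDemo {

    // Stand-in for a parsed template (a real implementation would store the parsed pieces).
    static final class ParsedTemplate {
        final String source;

        ParsedTemplate(String source) {
            this.source = source;
        }
    }

    // Static and unbounded: entries live as long as the class is loaded and are never evicted.
    static final Map<String, ParsedTemplate> CACHE = new HashMap<>();

    // Looks up a template by its raw text, parsing and caching it on first use.
    static ParsedTemplate parse(String template) {
        return CACHE.computeIfAbsent(template, ParsedTemplate::new);
    }

    public static void main(String[] args) {
        // Identical text is cached only once, however often it is looked up.
        for (int i = 0; i < 1000; i++) {
            parse("select id, name from item");
        }
        System.out.println("identical text: " + CACHE.size());

        // Text that changes on every delta-import run (the run index stands in
        // for a substituted timestamp) adds one entry per run.
        for (int run = 0; run < 500; run++) {
            parse("select id from item where last_modified > '" + run + "'");
        }
        System.out.println("after per-run text: " + CACHE.size());

        // Text built per modified row embeds a different primary key each time,
        // so every row adds an entry that is never looked up again.
        for (int pk = 0; pk < 100_000; pk++) {
            parse("select * from item where id='" + pk + "'");
        }
        System.out.println("after per-row text: " + CACHE.size());
    }
}
```

Running main prints 1, 501 and 100501. Because the map is static and never cleared, every per-run and per-row entry stays reachable for the life of the JVM.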

I came up with two solutions:

 1) Do not cache the queries that fetch modified rows.
 2) Make VariableResolverImpl#TEMPLATE_STRING non-static, or clear its cache when delta-import finishes.

I think #1 performs better than #2, but #2 is the simpler way to solve the problem.
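Both proposals can be sketched with a hypothetical resolver (PerImportResolver is an illustrative name, and this is not the code in SOLR-1042.patch): a cacheable flag models #1, while the instance field and clearCache() model the two variants of #2.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical resolver illustrating the two proposed fixes; not the code in SOLR-1042.patch.
class PerImportResolver {

    // Fix #2: an instance field instead of a static one, so the cache becomes
    // garbage together with the resolver once the import that created it ends.
    private final Map<String, String> cache = new HashMap<>();

    // Fix #1: callers pass cacheable=false for one-off queries (such as a per-row
    // lookup by primary key) so those queries never enter the cache.
    String resolve(String template, boolean cacheable) {
        if (!cacheable) {
            return parse(template);
        }
        return cache.computeIfAbsent(template, PerImportResolver::parse);
    }

    // Fix #2, second variant: if one resolver is reused across imports,
    // empty its cache when a delta-import finishes.
    void clearCache() {
        cache.clear();
    }

    int cacheSize() {
        return cache.size();
    }

    // Placeholder for real template parsing; here it returns the text unchanged.
    private static String parse(String template) {
        return template;
    }

    public static void main(String[] args) {
        PerImportResolver resolver = new PerImportResolver();
        resolver.resolve("select id from item where last_modified > '${dataimporter.last_index_time}'", true);
        for (int pk = 0; pk < 1000; pk++) {
            resolver.resolve("select * from item where id='" + pk + "'", false);
        }
        System.out.println(resolver.cacheSize()); // 1
        resolver.clearCache();                    // e.g. at the end of delta-import
        System.out.println(resolver.cacheSize()); // 0
    }
}
```

The trade-off matches the one stated above: #1 keeps useful cache hits but requires every caller to decide what is worth caching, while #2 needs no call-site changes, provided a fresh resolver is created (or the cache cleared) for each import.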

I made a patch using approach #2 and then tested two Solr applications, original and patched, with the `-XX:+PrintClassHistogram' option.
The results after importing several million rows from a MySQL database are as follows:

 * original solr-1.3:
 num     #instances         #bytes  class name
----------------------------------------------
...
  6:       2983024      119320960  org.apache.solr.handler.dataimport.TemplateString
...

 * patched solr-1.3:
 num     #instances         #bytes  class name
----------------------------------------------
...
 748:             3            120  org.apache.solr.handler.dataimport.TemplateString
...
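The two histogram rows can be checked with a small parser for the layout shown above. HistogramRow is a hypothetical helper, and newer JVMs print extra columns (such as a module name) that it does not handle. Both rows work out to 40 bytes per TemplateString, so the whole difference comes from the instance count.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// One data row of a HotSpot class histogram: "rank: #instances #bytes class-name".
class HistogramRow {
    private static final Pattern ROW =
            Pattern.compile("^\\s*(\\d+):\\s+(\\d+)\\s+(\\d+)\\s+(\\S+)\\s*$");

    final int rank;
    final long instances;
    final long bytes;
    final String className;

    private HistogramRow(int rank, long instances, long bytes, String className) {
        this.rank = rank;
        this.instances = instances;
        this.bytes = bytes;
        this.className = className;
    }

    // Parses a single row; throws if the line is not in the expected layout.
    static HistogramRow parse(String line) {
        Matcher m = ROW.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not a histogram row: " + line);
        }
        return new HistogramRow(Integer.parseInt(m.group(1)), Long.parseLong(m.group(2)),
                Long.parseLong(m.group(3)), m.group(4));
    }

    // Average shallow size of one instance of the class.
    double bytesPerInstance() {
        return (double) bytes / instances;
    }

    public static void main(String[] args) {
        HistogramRow before = parse(
                "  6:       2983024      119320960  org.apache.solr.handler.dataimport.TemplateString");
        HistogramRow after = parse(
                " 748:             3            120  org.apache.solr.handler.dataimport.TemplateString");
        // Locale.ROOT keeps the decimal separator a '.' regardless of the JVM's default locale.
        System.out.printf(Locale.ROOT, "before: %.1f bytes/instance, %.1f MiB%n",
                before.bytesPerInstance(), before.bytes / (1024.0 * 1024.0));
        System.out.printf(Locale.ROOT, "after:  %.1f bytes/instance, %d bytes%n",
                after.bytesPerInstance(), after.bytes);
    }
}
```

The #bytes column counts only the TemplateString objects themselves, so the strings they reference would come on top of the roughly 113.8 MiB shown before the patch.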

I tested only version 1.3, but the current nightly build probably has the same problem.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Updated: (SOLR-1042) Memory leaks in DIH

Posted by "Ryuuichi Kumai (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ryuuichi Kumai updated SOLR-1042:
---------------------------------

    Attachment: SOLR-1042.patch



[jira] Resolved: (SOLR-1042) Memory leaks in DIH

Posted by "Shalin Shekhar Mangar (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/SOLR-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shalin Shekhar Mangar resolved SOLR-1042.
-----------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.4
         Assignee: Shalin Shekhar Mangar

Committed revision 748969.

Thanks Ryuuichi!



[jira] Commented: (SOLR-1042) Memory leaks in DIH

Posted by "Noble Paul (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/SOLR-1042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12677658#action_12677658 ] 

Noble Paul commented on SOLR-1042:
----------------------------------

I guess #2 is a better solution. 

Going forward, we recommend making 'deltaImportQuery' mandatory for delta imports. Then this problem should go away.
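One plausible reading of why a mandatory deltaImportQuery helps: each modified row is then fetched through the same template, something like select * from item where id='${dataimporter.delta.id}', and only the substituted value changes, so a cache keyed by template text keeps a single entry. The sketch assumes that reading; DeltaImportQueryDemo and its simple ${name} substitution are hypothetical, not DIH's VariableResolverImpl.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: a fixed template is parsed and cached once, then resolved per row.
class DeltaImportQueryDemo {
    private static final Pattern VARIABLE = Pattern.compile("\\$\\{([^}]+)\\}");

    // Parsed templates, keyed by the template text (not by the resolved query).
    private final Map<String, List<String>> parsed = new HashMap<>();

    // Splits a template into pieces: even indices are literals, odd indices are variable names.
    private static List<String> split(String template) {
        List<String> pieces = new ArrayList<>();
        Matcher m = VARIABLE.matcher(template);
        int last = 0;
        while (m.find()) {
            pieces.add(template.substring(last, m.start())); // literal text before ${...}
            pieces.add(m.group(1));                          // variable name inside ${...}
            last = m.end();
        }
        pieces.add(template.substring(last)); // trailing literal
        return pieces;
    }

    // Substitutes variables; unknown names resolve to an empty string.
    String resolve(String template, Map<String, String> vars) {
        List<String> pieces = parsed.computeIfAbsent(template, DeltaImportQueryDemo::split);
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < pieces.size(); i++) {
            sb.append(i % 2 == 0 ? pieces.get(i) : vars.getOrDefault(pieces.get(i), ""));
        }
        return sb.toString();
    }

    int cacheSize() {
        return parsed.size();
    }

    public static void main(String[] args) {
        DeltaImportQueryDemo resolver = new DeltaImportQueryDemo();
        String deltaImportQuery = "select * from item where id='${dataimporter.delta.id}'";
        String last = null;
        for (int pk = 1; pk <= 1000; pk++) {
            last = resolver.resolve(deltaImportQuery,
                    Collections.singletonMap("dataimporter.delta.id", String.valueOf(pk)));
        }
        System.out.println(last);                 // select * from item where id='1000'
        System.out.println(resolver.cacheSize()); // 1
    }
}
```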
