Posted to solr-user@lucene.apache.org by Ryuuichi KUMAI <ry...@gmail.com> on 2009/01/26 14:37:11 UTC

Concurrency problem with delta-import

Hello,

I'm using Solr 1.3 and I have a problem with DataImportHandler.

My environment:
 - Solr 1.3
 - MySQL 5.1.30, Connector/J 5.1.6
 - Linux 2.6.9 x86_64 (RHEL4)
 - Sun JDK 1.6.0_11
 - Apache Tomcat 6.0.18

Our Solr server runs multiple cores, and every core uses the same schema.
When delta-import runs concurrently in two or more cores, the CPU is
saturated and indexing makes no progress. No exceptions are thrown.
The problem does not occur when delta-import runs in a single core (not
concurrently). Does anyone know what causes this?

As a workaround, I temporarily changed the code of TemplateString in DIH
as follows, and it works. I don't understand the cause in detail, but I
suspect it is related to the cache held in the static TemplateString
object.

Index: TemplateString.java
===================================================================
--- TemplateString.java	(revision 729410)
+++ TemplateString.java	(working copy)
@@ -17,9 +17,9 @@
 package org.apache.solr.handler.dataimport;

 import java.util.ArrayList;
-import java.util.HashMap;
 import java.util.List;
-import java.util.Map;
+import java.util.concurrent.ConcurrentHashMap;
+import java.util.concurrent.ConcurrentMap;
 import java.util.regex.Matcher;
 import java.util.regex.Pattern;

@@ -40,10 +40,10 @@

   private List<String> pcs = new ArrayList<String>();

-  private Map<String, TemplateString> cache;
+  private ConcurrentMap<String, TemplateString> cache;

   public TemplateString() {
-    cache = new HashMap<String, TemplateString>();
+    cache = new ConcurrentHashMap<String, TemplateString>();
   }

   private TemplateString(String s) {
@@ -70,7 +70,9 @@
     TemplateString ts = cache.get(string);
     if (ts == null) {
       ts = new TemplateString(string);
-      cache.put(string, ts);
+      TemplateString cachedTs = cache.putIfAbsent(string, ts);
+      if (cachedTs != null)
+        ts = cachedTs;
     }
     return ts.fillTokens(resolver);
   }
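
The get / putIfAbsent idiom used in the patch can also be tried outside of
Solr. The following is only a minimal standalone sketch: TemplateCacheSketch,
Compiled and lookup are hypothetical names for illustration and are not part
of DIH.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class TemplateCacheSketch {
  // Hypothetical stand-in for a parsed template; not the real TemplateString.
  static final class Compiled {
    final String source;
    Compiled(String source) { this.source = source; }
  }

  private final ConcurrentMap<String, Compiled> cache =
      new ConcurrentHashMap<String, Compiled>();

  Compiled lookup(String template) {
    Compiled c = cache.get(template);
    if (c == null) {
      c = new Compiled(template);
      // putIfAbsent returns the value already mapped to the key, or null
      // if this call inserted ours. A non-null result means another
      // thread won the race, so we use its instance instead.
      Compiled existing = cache.putIfAbsent(template, c);
      if (existing != null) {
        c = existing;
      }
    }
    return c;
  }

  public static void main(String[] args) throws Exception {
    final TemplateCacheSketch sketch = new TemplateCacheSketch();
    Thread[] threads = new Thread[4];
    final Compiled[] seen = new Compiled[threads.length];
    for (int i = 0; i < threads.length; i++) {
      final int idx = i;
      threads[i] = new Thread(new Runnable() {
        public void run() { seen[idx] = sketch.lookup("${item.id}"); }
      });
      threads[i].start();
    }
    for (Thread t : threads) t.join();
    for (Compiled c : seen) {
      if (c != seen[0]) throw new AssertionError("different instances");
    }
    System.out.println("all threads share one cached instance");
  }
}
```

Two threads may still both construct a Compiled for the same key, but only
one instance is ever stored, and every caller ends up with that one.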

Does anyone have a better idea?
I would appreciate any help.

Regards,
Ryuuichi Kumai.

Re: Concurrency problem with delta-import

Posted by Ryuuichi KUMAI <ry...@gmail.com>.
Hello Shalin,

Thank you for your reply.

I opened the issue and attached the patch.

https://issues.apache.org/jira/browse/SOLR-985

> The lists are OK since they are modified only in the constructor. The map
> needs to be changed to a ConcurrentHashMap as you did in the patch.

Understood. Many thanks!

Regards,
Ryuuichi Kumai.


Re: Concurrency problem with delta-import

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
The lists are OK since they are modified only in the constructor. The map
needs to be changed to a ConcurrentHashMap as you did in the patch.
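
The reasoning about the lists can be illustrated with a small sketch. This
is not the DIH code: ReadOnlyAfterConstruction and its members are
hypothetical names, and the final field and unmodifiable wrapper are
assumptions added here to make the "written only in the constructor" rule
explicit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ReadOnlyAfterConstruction {
  // Filled once in the constructor and never written again, so any
  // number of threads may read it afterwards without locking.
  private final List<String> pieces;

  ReadOnlyAfterConstruction(String s) {
    List<String> tmp = new ArrayList<String>();
    for (String p : s.split("\\.")) {
      tmp.add(p);
    }
    // The wrapper turns accidental later writes into an exception.
    this.pieces = Collections.unmodifiableList(tmp);
  }

  int size() { return pieces.size(); }

  String piece(int i) { return pieces.get(i); }

  public static void main(String[] args) {
    ReadOnlyAfterConstruction r = new ReadOnlyAfterConstruction("a.b.c");
    System.out.println(r.size());   // 3
    System.out.println(r.piece(1)); // b
  }
}
```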

On Mon, Jan 26, 2009 at 7:23 PM, Shalin Shekhar Mangar <
shalinmangar@gmail.com> wrote:

> Wow, well spotted. TemplateString is not thread-safe but it is being used
> concurrently by many cores due to the static instance. Apart from the cache
> map, the lists will also need to be taken care of.
>
> Can you please open an issue and attach this patch?
>
> https://issues.apache.org/jira/browse/SOLR



-- 
Regards,
Shalin Shekhar Mangar.

Re: Concurrency problem with delta-import

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
Wow, well spotted. TemplateString is not thread-safe but it is being used
concurrently by many cores due to the static instance. Apart from the cache
map, the lists will also need to be taken care of.
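
A minimal sketch of why a static field couples otherwise independent cores
(SharedStaticSketch and Core are hypothetical names, not Solr classes): a
static map exists once per class loader, so two cores running imports on
separate threads reach the same unsynchronized HashMap. One well-known
failure mode of java.util.HashMap under concurrent writes on JDKs of that
era is a corrupted bucket chain that makes get() or put() loop forever,
which would match the reported symptom of a busy CPU with no exception,
though nothing in this thread confirms that this is what happened.

```java
import java.util.HashMap;
import java.util.Map;

public class SharedStaticSketch {
  // Hypothetical stand-in for a per-core component.
  static final class Core {
    // One map per class loader, shared by every Core instance.
    private static final Map<String, String> SHARED =
        new HashMap<String, String>();
    // One map per Core instance.
    private final Map<String, String> own = new HashMap<String, String>();

    Map<String, String> shared() { return SHARED; }
    Map<String, String> own() { return own; }
  }

  public static void main(String[] args) {
    Core a = new Core();
    Core b = new Core();
    System.out.println(a.shared() == b.shared()); // true
    System.out.println(a.own() == b.own());       // false
  }
}
```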

Can you please open an issue and attach this patch?

https://issues.apache.org/jira/browse/SOLR

-- 
Regards,
Shalin Shekhar Mangar.