Posted to dev@deltacloud.apache.org by Michal Fojtik <mf...@redhat.com> on 2013/04/18 13:15:49 UTC

Thread safety problems with Deltacloud

Hi,

I recently discovered an interesting bug that occurs when you send many 
parallel requests to the Deltacloud API.

Let's say you start the Deltacloud API with the 'mock' driver as the 
default driver. Then you make 3 parallel requests to retrieve RHEV-M 
images, realms and hardware_profiles. In that case I get this error:

<snip>
E, [2013-04-18T12:44:13.629135 #11892] ERROR -- 500: [LoadError] 
uninitialized constant Deltacloud::Drivers::Rhevm

/home/mfojtik/code/core/server/lib/deltacloud/helpers/driver_helper.rb:57:in 
`rescue in driver'
/home/mfojtik/code/core/server/lib/deltacloud/helpers/driver_helper.rb:54:in 
`driver'

127.0.0.1 - - [18/Apr/2013 12:44:13] "GET /api HTTP/1.1" rhevm 
http://provider.url 500 127410 0.1480
</snip>

Note that when I make the curl request manually with the same parameters 
(provider, accept), I get a 200 back.

I think this might be a threading issue, but I'm not sure how to fix it.
I started by looking at the 'driver' method: it requires the driver 
source file and then returns the initialized driver.

We do this for every request (except that we skip the 'require' if the 
driver class already exists in the current namespace).

My impression is that the 'require' method is not thread-safe. This code 
shows where it can go wrong:

begin
   driver_class
rescue NameError => e
   require_relative(driver_source_name) ? retry :
      raise(LoadError.new(e.message))
end

In this case, we try to return 'driver_class', and if the constant does 
not exist, we require the driver source file and call 'retry', which 
tries to return it again. I think that with multiple parallel requests, 
'require_relative' (which resolves the path relative to the current file 
and otherwise works like 'require') behaves incorrectly, because 
multiple threads are requiring the same file in parallel.
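
One detail that may matter here: 'require' returns false, rather than raising an error, when the file has already been loaded. As far as I can tell, that means the ternary above would raise LoadError for a thread that hits the NameError while another thread is loading (or has just loaded) the same file. The sketch below only illustrates that return value; the temporary file and the FakeDriver name are made up for the example and are not part of Deltacloud.

```ruby
require 'tmpdir'

# Hypothetical stand-in for a driver source file (not a real Deltacloud file).
dir  = Dir.mktmpdir
path = File.join(dir, 'fake_driver.rb')
File.write(path, "module FakeDriver; end\n")

first  = require(path)  # the file is loaded by this call
second = require(path)  # the file is already loaded, so nothing is loaded again

puts "first require:  #{first.inspect}"
puts "second require: #{second.inspect}"
```

Under this reading, a false return value is not a failure, so checking whether the constant exists after the 'require' may be more reliable than branching on what 'require' returns.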

My fix for this, which works for me so far, is to change the line after 
'rescue NameError => e' to:

Thread.exclusive { require_relative(driver_source_name) } ? retry : 
raise(LoadError.new(e.message))

With this I don't run into any problems with parallel requests (so far).
However, I'm not a Ruby threads expert, so any advice from somebody more 
experienced is appreciated.
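
Thread.exclusive was deprecated in later Ruby releases and is no longer available in Ruby 3.x, so on a newer Ruby the same idea would need an explicit lock. Below is a minimal sketch of that, assuming a single process-wide Mutex; LOAD_LOCK, load_driver and SketchDriver are hypothetical names for illustration, not Deltacloud code. It looks the constant up again after the 'require' instead of relying on the return value of 'require'.

```ruby
require 'tmpdir'

# Hypothetical process-wide lock guarding on-demand driver loading.
LOAD_LOCK = Mutex.new

# Returns the constant named const_name, loading source_path first if needed.
def load_driver(const_name, source_path)
  LOAD_LOCK.synchronize do
    # Only one thread at a time gets here; late arrivals see the constant
    # already defined and skip the require.
    require(source_path) unless Object.const_defined?(const_name)
  end
  Object.const_get(const_name)
rescue NameError => e
  # The file loaded but did not define the expected constant.
  raise LoadError, e.message
end

# Demo with a made-up driver file; the sleep widens the race window.
dir  = Dir.mktmpdir
path = File.join(dir, 'sketch_driver.rb')
File.write(path, "sleep 0.05\nmodule SketchDriver; end\n")

results = Array.new(5) { Thread.new { load_driver(:SketchDriver, path) } }.map(&:value)
puts results.uniq.inspect
```

A plain Mutex is not reentrant, so this sketch assumes a driver file never triggers load_driver itself while it is being loaded; a Monitor could be used instead if that assumption does not hold.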

Also, I think one task we need to do in Deltacloud/CIMI in the near 
future is to identify spots in our code base that might not be 
thread-safe.

Having somebody write a reasonable benchmarking tool (like 'ab', but 
with different URLs/drivers/etc.) would help identify these spots.
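
As a starting point for such a tool, here is a rough sketch of a harness that releases a set of jobs at nearly the same moment and collects their results. The fire_together helper and the deltacloud_get wrapper are hypothetical; the host, port 3001, the request paths and the X-Deltacloud-Driver header are assumptions about a local setup and should be adjusted to match the server under test.

```ruby
require 'net/http'

# Runs every job in its own thread, holding them all at a gate so they
# start as close together as possible. Returns results in job order.
def fire_together(jobs)
  gate = Queue.new
  threads = jobs.map do |job|
    Thread.new do
      gate.pop # block until a token is pushed onto the gate
      job.call
    end
  end
  jobs.size.times { gate << :go } # one token per waiting thread
  threads.map(&:value)
end

# Hypothetical request helper: GET a path with a per-request driver header
# and return the HTTP status code as a string.
# Host, port and header name are assumptions, not taken from this thread.
def deltacloud_get(path, driver, host: 'localhost', port: 3001)
  headers = { 'X-Deltacloud-Driver' => driver, 'Accept' => 'application/xml' }
  Net::HTTP.start(host, port) { |http| http.get(path, headers).code }
end

# Example jobs mirroring the three parallel requests described earlier.
jobs = %w[/api/images /api/realms /api/hardware_profiles].map do |path|
  -> { deltacloud_get(path, 'rhevm') }
end

# Uncomment with a server running to see the status codes:
# puts fire_together(jobs).inspect
```

Keeping the jobs as plain callables means the same harness can mix drivers and URLs freely, or be exercised with stub lambdas when no server is available.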

   -- Michal

-- 

Michal Fojtik <mf...@redhat.com>
Deltacloud API, CloudForms

Re: Thread safety problems with Deltacloud

Posted by Joseph VLcek <jv...@redhat.com>.
On Apr 22, 2013, at 9:40 AM, Francesco Vollero wrote:

> On 4/22/13 3:32 PM, Joseph VLcek wrote:
>> On Apr 18, 2013, at 7:15 AM, Michal Fojtik wrote:
>> 
>>> Hi,
>>> 
>>> I recently discovered an interesting bug that occurs when you send many parallel requests to the Deltacloud API.
>>> 
>>> Let's say you start the Deltacloud API with the 'mock' driver as the default driver. Then you make 3 parallel requests to retrieve RHEV-M images, realms and hardware_profiles. In that case I get this error:
>>> 
>>> <snip>
> [snip]
>> 
>> Hey Michal,
>> 
>> As I mentioned on IRC last week, the research I did seems to indicate that the solution you present here
>> is the right way to go.
>> 
>> This describes the problem you found:
>> http://betterlogic.com/roger/2008/10/rubys-require-is-not-thread-safe/
>> 
>> This describes Ruby threading and seems to support your proposed solution of using Thread.exclusive:
>> http://cs.calvin.edu/curriculum/cs/214/adams/labs/11/ruby/
>> 
>> 
>> So it seems your solution is the correct way to go.
> 
> Thanks, Joe, for giving us those links.
> At the moment I am trying to find out when this happens by putting more load on the various instances.
> That way we will be able to pinpoint clearly what is happening, when, and why.
> 
> -FV
> 
> 
>> Joe
>> 
>> 
> 

Great!

Not sure if it is worth it, but one suggestion I have would be to write a small snippet of
code that kicks off multiple Deltacloud threads as close to simultaneously as possible.

I realize this may not be what happens in the real world, but it may be the best way to
iron out this issue.

... Just a suggestion.

Thanks!
   Joe

Re: Thread safety problems with Deltacloud

Posted by Francesco Vollero <ra...@gmail.com>.
On 4/22/13 3:32 PM, Joseph VLcek wrote:
> On Apr 18, 2013, at 7:15 AM, Michal Fojtik wrote:
>
>> Hi,
>>
>> I recently discovered an interesting bug that occurs when you send many parallel requests to the Deltacloud API.
>>
>> Let's say you start the Deltacloud API with the 'mock' driver as the default driver. Then you make 3 parallel requests to retrieve RHEV-M images, realms and hardware_profiles. In that case I get this error:
>>
>> <snip>
[snip]
>
> Hey Michal,
>
> As I mentioned on IRC last week, the research I did seems to indicate that the solution you present here
> is the right way to go.
>
> This describes the problem you found:
> http://betterlogic.com/roger/2008/10/rubys-require-is-not-thread-safe/
>
> This describes Ruby threading and seems to support your proposed solution of using Thread.exclusive:
> http://cs.calvin.edu/curriculum/cs/214/adams/labs/11/ruby/
>
>
> So it seems your solution is the correct way to go.

Thanks, Joe, for giving us those links.
At the moment I am trying to find out when this happens by putting more 
load on the various instances.
That way we will be able to pinpoint clearly what is happening, when, 
and why.

-FV


> Joe
>
>


Re: Thread safety problems with Deltacloud

Posted by Joseph VLcek <jv...@redhat.com>.
On Apr 18, 2013, at 7:15 AM, Michal Fojtik wrote:

> [snip]


Hey Michal,

As I mentioned on IRC last week, the research I did seems to indicate that the solution you present here
is the right way to go.

This describes the problem you found:
http://betterlogic.com/roger/2008/10/rubys-require-is-not-thread-safe/

This describes Ruby threading and seems to support your proposed solution of using Thread.exclusive:
http://cs.calvin.edu/curriculum/cs/214/adams/labs/11/ruby/


So it seems your solution is the correct way to go.

Joe