Posted to user@zookeeper.apache.org by Mark <st...@gmail.com> on 2011/11/08 18:18:56 UTC

Design question

I have a general design question regarding ZooKeeper.

Our use case: We currently have 3 RESTful recommendation servers that
simply wrap a Mahout GenericBooleanPrefItemBasedRecommender. We started
off using a JDBCDataModel, but for performance reasons we had to switch
to a FileDataModel so everything would be kept in memory. Our
recommendation service is now blazing fast, but the startup/reload
time for each of these services is measured in minutes. If we try to
update all services at once, all recommendation requests come to a halt.
As a result, whenever we push a new model we have to do it in stages,
i.e. disable server1, update, wait, re-enable, disable server2, and so on.
We've "automated" this using cron by simply updating one server, waiting
10 minutes, then updating the next, and so on. We are trying to figure
out if this coordination would be better managed via ZooKeeper.

I've read a bit about ZooKeeper and it seems like it would be easy to set
a watch on a node that fires when a model has changed, thus triggering a
refresh of our recommender. Where I get lost is how I would coordinate
this so that only one server at a time goes down. When it comes back up,
the next server should be updated. Can someone please explain how this
could be accomplished? Thanks
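
One common way to get "only one server at a time" is the lock recipe from the ZooKeeper documentation: each server creates an ephemeral sequential znode under a shared parent, the server whose znode has the lowest sequence number goes ahead, and every other server watches the znode just ahead of its own. The sketch below covers only the decision step and has no ZooKeeper dependency. The class name, method names, and the "reload-" znode prefix are assumptions made for illustration, and the parsing assumes the 10-digit zero-padded suffix that ZooKeeper appends to sequential znodes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Decision step of a ZooKeeper-style lock, without any ZooKeeper calls.
 * The znode names used here (e.g. "reload-0000000003") are hypothetical.
 */
public class ReloadTurn {

    /** Sequence number of a sequential znode: the last 10 digits of its name. */
    static int sequenceOf(String znodeName) {
        return Integer.parseInt(znodeName.substring(znodeName.length() - 10));
    }

    /**
     * Returns null if myNode has the lowest sequence number (it may reload now),
     * otherwise the name of the znode just ahead of it, which it should watch.
     */
    static String nodeToWaitFor(List<String> children, String myNode) {
        List<String> sorted = new ArrayList<>(children);
        sorted.sort((a, b) -> Integer.compare(sequenceOf(a), sequenceOf(b)));
        int myIndex = sorted.indexOf(myNode);
        if (myIndex < 0) {
            throw new IllegalStateException(myNode + " is not among the lock children");
        }
        return myIndex == 0 ? null : sorted.get(myIndex - 1);
    }

    public static void main(String[] args) {
        // Children as they might come back from the lock parent, in no particular order.
        List<String> children = Arrays.asList(
            "reload-0000000002", "reload-0000000000", "reload-0000000001");
        for (String child : children) {
            String ahead = nodeToWaitFor(children, child);
            System.out.println(child + (ahead == null ? " reloads now" : " waits for " + ahead));
        }
    }
}
```

In a real deployment the list of children would come from the ZooKeeper client, and a server would delete its znode once it is serving again so that the next one in line can proceed. Because the znodes are ephemeral, a server that crashes mid-reload releases its turn when its session expires.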

Re: Design question

Posted by Ted Dunning <te...@gmail.com>.

Been there!

Having both in memory is the ideal world scenario.  Some days we have to
live in the real world.  The chapter 16 example may still help.

That example is available, btw, at https://github.com/tdunning/Chapter-16

On Wed, Nov 9, 2011 at 10:29 AM, Mark <st...@gmail.com> wrote:

> Memory constraints of those machines prevent us from being able to load
> two models at the same time.
>
> On 11/8/11 10:10 PM, Ted Dunning wrote:
>
>> Yes.  This definitely could be done with ZK.
>>
>> See chapter 16 of Mahout in Action for an example of how to manage this for
>> a farm of classifiers which have very similar issues (although loading a
>> new model is much faster).
>>
>> One trick that might work is to load the new model before dropping the old
>> one.  You might be able to do a very fast handover that way.

Re: Design question

Posted by Mark <st...@gmail.com>.

Memory constraints of those machines prevent us from being able to load 
two models at the same time.

On 11/8/11 10:10 PM, Ted Dunning wrote:
> Yes.  This definitely could be done with ZK.
>
> See chapter 16 of Mahout in Action for an example of how to manage this for
> a farm of classifiers which have very similar issues (although loading a
> new model is much faster).
>
> One trick that might work is to load the new model before dropping the old
> one.  You might be able to do a very fast handover that way.

Re: Design question

Posted by Ted Dunning <te...@gmail.com>.

Yes.  This definitely could be done with ZK.

See chapter 16 of Mahout in Action for an example of how to manage this for
a farm of classifiers which have very similar issues (although loading a
new model is much faster).

One trick that might work is to load the new model before dropping the old
one.  You might be able to do a very fast handover that way.
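
A minimal sketch of that handover in plain Java, assuming the service reaches its model through a single shared reference. ModelHolder and its methods are hypothetical names, not part of Mahout or ZooKeeper, and the type parameter M stands in for whatever the service wraps.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Holds the model currently used to serve requests and swaps in a
 * replacement only after the replacement is fully loaded.
 */
public class ModelHolder<M> {
    private final AtomicReference<M> current;

    public ModelHolder(M initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Request threads call this; it never blocks on a reload. */
    public M get() {
        return current.get();
    }

    /**
     * Runs the slow load first, then publishes the result in one atomic step.
     * Returns the old model so the caller can release it.
     */
    public M reload(Supplier<M> loader) {
        M fresh = loader.get();          // slow part: the old model keeps serving
        return current.getAndSet(fresh); // fast part: the handover
    }

    public static void main(String[] args) {
        ModelHolder<String> holder = new ModelHolder<>("model-v1");
        System.out.println("serving " + holder.get());
        String old = holder.reload(() -> "model-v2");
        System.out.println("retired " + old + ", serving " + holder.get());
    }
}
```

The cost of this approach is that both models are resident in memory from the moment the new one finishes loading until the old one is released.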
