Posted to user@zookeeper.apache.org by Mark <st...@gmail.com> on 2011/11/08 18:18:56 UTC
Design question
I have a general design question regarding ZooKeeper.
Our use case: We currently have 3 RESTful recommendation servers that
simply wrap a Mahout GenericBooleanPrefItemBasedRecommender. We started
off using a JDBCDataModel, but for performance reasons we had to switch
to a FileDataModel so everything would be kept in memory. Our
recommendation service is now blazing fast, but the startup/reload
time for each of these services is in the minutes. If we try to update
all services at once, all recommendation requests come to a halt. As
a result, whenever we push a new model we have to do it in
stages, i.e. disable server1, update, wait, re-enable, disable server2, ...
We've "automated" this using cron by simply updating one server, waiting
10 mins, then updating the next, and so on. We are trying to figure out
if this coordination would be better managed via ZooKeeper.
I've read a bit about ZooKeeper and it seems like it would be easy to set
a watch on a node that fires when a model has changed, thus triggering a
refresh of our recommender. Where I get lost is how I would coordinate
this so only one server at a time goes down. When it comes back up, then
the next server should be updated. Can someone please explain how this
could be accomplished? Thanks
Re: Design question
Posted by Ted Dunning <te...@gmail.com>.
Been there!
Having both in memory is the ideal world scenario. Some days we have to
live in the real world. The chapter 16 example may still help.
That example is available, btw, at https://github.com/tdunning/Chapter-16
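When only one model fits in memory, the servers have to take turns going
down. A minimal Java sketch of that turn-taking follows. It is not taken
from the chapter 16 example and does not talk to ZooKeeper: a fair
java.util.concurrent.Semaphore stands in for a cluster-wide lock, and
RollingReload, onModelChanged and run are hypothetical names. In a real
deployment the semaphore would be replaced by a ZooKeeper-based lock (for
instance the lock recipe built on ephemeral sequential znodes, so that a
crashed server gives up its turn when its session ends), and
onModelChanged would be called from the watch on the model node.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

final class RollingReload {
    // In-process stand-in for a cluster-wide lock. This is NOT ZooKeeper.
    // One permit means at most one server is down at any time.
    private final Semaphore reloadLock = new Semaphore(1, true);
    private final AtomicInteger reloading = new AtomicInteger();
    private final AtomicInteger maxReloading = new AtomicInteger();

    /** What each server would do when told that a new model is available. */
    void onModelChanged(String server) throws InterruptedException {
        reloadLock.acquire();                 // wait for our turn
        try {
            int now = reloading.incrementAndGet();
            maxReloading.accumulateAndGet(now, Math::max);
            System.out.println(server + " reloading");
            Thread.sleep(50);                 // placeholder: stop serving, load model, resume
            reloading.decrementAndGet();
        } finally {
            reloadLock.release();             // lets the next waiting server proceed
        }
    }

    /** Notifies all servers at once; returns the peak number reloading together. */
    static int run(int servers) {
        RollingReload cluster = new RollingReload();
        List<Thread> threads = new ArrayList<>();
        for (int i = 1; i <= servers; i++) {
            String name = "server" + i;
            Thread t = new Thread(() -> {
                try {
                    cluster.onModelChanged(name);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return cluster.maxReloading.get();
    }

    public static void main(String[] args) {
        System.out.println("peak concurrent reloads: " + run(3));
    }
}
```

All three servers are notified at the same moment, yet the peak number
reloading together stays at 1, which is the property the cron job with
its 10 minute gaps only approximates.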
On Wed, Nov 9, 2011 at 10:29 AM, Mark <st...@gmail.com> wrote:
> Memory constraints of those machines prevent us from being able to load
> two models at the same time.
Re: Design question
Posted by Mark <st...@gmail.com>.
Memory constraints of those machines prevent us from being able to load
two models at the same time.
On 11/8/11 10:10 PM, Ted Dunning wrote:
> Yes. This definitely could be done with ZK.
>
> See chapter 16 of Mahout in Action for an example of how to manage this for
> a farm of classifiers which have very similar issues (although loading a
> new model is much faster).
>
> One trick that might work is to load the new model before dropping the old
> one. You might be able to do a very fast handover that way.
Re: Design question
Posted by Ted Dunning <te...@gmail.com>.
Yes. This definitely could be done with ZK.
See chapter 16 of Mahout in Action for an example of how to manage this for
a farm of classifiers which have very similar issues (although loading a
new model is much faster).
One trick that might work is to load the new model before dropping the old
one. You might be able to do a very fast handover that way.
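The handover trick could look roughly like this in Java. This is only a
sketch: ModelHolder, reload and the String used in place of a real model
are hypothetical names, not part of Mahout or the chapter 16 example, and
it assumes the JVM has enough memory to hold both models while the new
one loads.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/** Holds the live model and swaps it once a replacement is fully loaded. */
final class ModelHolder<M> {
    private final AtomicReference<M> current;

    ModelHolder(M initial) {
        this.current = new AtomicReference<>(initial);
    }

    /** Request threads call this; they always see a fully loaded model. */
    M get() {
        return current.get();
    }

    /**
     * Builds the new model while the old one keeps serving, then swaps.
     * Returns the model that was replaced.
     */
    M reload(Supplier<M> loader) {
        M fresh = loader.get();           // slow part: the old model still serves
        return current.getAndSet(fresh);  // fast part: a single reference swap
    }
}

final class HandoverDemo {
    public static void main(String[] args) {
        ModelHolder<String> holder = new ModelHolder<>("model-v1");
        String old = holder.reload(() -> "model-v2");
        System.out.println("replaced " + old + " with " + holder.get());
    }
}
```

Requests never block on the load itself; the only shared step is the
reference swap. The cost is that both models are resident until the old
one becomes unreachable and is garbage collected.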
On Tue, Nov 8, 2011 at 12:18 PM, Mark <st...@gmail.com> wrote:
> I have a general design question regarding ZooKeeper.
>
> Our use case: We currently have 3 RESTful recommendation servers that
> simply wrap a Mahout GenericBooleanPrefItemBasedRecommender. We started
> off using a JDBCDataModel, but for performance reasons we had to switch to a
> FileDataModel so everything would be kept in memory. Our recommendation
> service is now blazing fast, but the startup/reload time for
> each of these services is in the minutes. If we try to update all services
> at once, all recommendation requests come to a halt. As a result,
> whenever we push a new model we have to do it in stages, i.e. disable
> server1, update, wait, re-enable, disable server2, ... We've "automated" this
> using cron by simply updating one server, waiting 10 mins, then updating the
> next, and so on. We are trying to figure out if this coordination would be
> better managed via ZooKeeper.
>
> I've read a bit into ZooKeeper and it seems like it would be easy to set a
> watch on a node to trigger when a model has changed thus triggering a
> refresh of our recommender. Where I get lost is how would I coordinate this
> so only one server at a time goes down? When it comes back up then the next
> server should be updated. Can someone please explain how this could be
> accomplished? Thanks
>