Posted to user@mahout.apache.org by Brandon Root <br...@gmail.com> on 2012/08/28 23:41:37 UTC

Deploying a classification model using zookeeper

Hey! First off, Mahout is pretty much the bee's knees.

Anyhoo, I'm deploying my Mahout classifier using ZooKeeper, following the
technique from Mahout in Action, but my models often exceed the 1 MB limit
ZooKeeper wants you to stick to. I'm using the AdaptiveLogisticRegression
algorithm, and I think I'm doing all the things I'm supposed to (only
serializing the best model, etc.).

Here is my code:

ModelSerializer.writeBinary("/var/www/shared/model/products.model",
learningAlgorithm.getBest().getPayload().getLearner().getModels().get(0));
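For what it's worth, one way to see whether a serialized model will fit under ZooKeeper's default limit is to serialize it to memory first and compare against jute.maxbuffer's default (0xfffff bytes). This is only a sketch: the fitsInZnode helper and the dummy payload are mine, not anything from Mahout; the commented-out line shows where the real model write would go.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class ZnodeSizeCheck {
  // ZooKeeper's default jute.maxbuffer: 0xfffff bytes, just under 1 MB.
  static final int DEFAULT_MAX_BUFFER = 0xfffff;

  // True if a payload this size fits in a znode under the default limit.
  static boolean fitsInZnode(byte[] payload) {
    return payload.length <= DEFAULT_MAX_BUFFER;
  }

  public static void main(String[] args) throws IOException {
    // Serialize to memory instead of straight to a file, so the size
    // can be measured before deciding where the bytes should live.
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    // ... learningAlgorithm.getBest().getPayload().getLearner()
    //         .getModels().get(0).write(out);   // the real call in Mahout
    out.write(new byte[1_800_000]);  // stand-in: ~1.8 MB, like the models above
    out.flush();

    System.out.println(fitsInZnode(buf.toByteArray()));  // prints "false"
  }
}
```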

I feel like I'm missing something; most of my models are clocking in at
something like 1.8 MB. The complete model is, of course, somewhere around 200 MB.

Do most people boost the znode size? Am I simply being too ambitious with
the number of features I'm using?

ZooKeeper lists jute.maxbuffer under "unsafe options," but I don't
know how big a deal this is.

(From http://zookeeper.apache.org/doc/r3.1.2/zookeeperAdmin.html)

(Java system property: jute.maxbuffer)

This option can only be set as a Java system property. There is no
zookeeper prefix on it. It specifies the maximum size of the data that can
be stored in a znode. The default is 0xfffff, or just under 1M. If this
option is changed, the system property must be set on all servers and
clients otherwise problems will arise. This is really a sanity check.
ZooKeeper is designed to store data on the order of kilobytes in size.
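As the doc says, the property has to agree on every server and every client JVM. A rough sketch of where the flag would go, assuming the standard zkServer.sh/zkEnv.sh setup (the 4 MB value and the application jar name are arbitrary examples, not recommendations):

```shell
# Raise jute.maxbuffer to 4 MB (value is in bytes; choose your own ceiling).
# The same value must be set on EVERY server and EVERY client JVM.

# Server side: SERVER_JVMFLAGS is picked up by zkServer.sh via zkEnv.sh.
export SERVER_JVMFLAGS="-Djute.maxbuffer=4194304"

# Client side: pass the identical property to your application's JVM.
java -Djute.maxbuffer=4194304 -jar my-classifier-app.jar
```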

Any help would be much appreciated, thanks!

Brandon Root

Re: Deploying a classification model using zookeeper

Posted by Ted Dunning <te...@gmail.com>.
It isn't a big deal to increase the znode size, but it is bad practice. ZK
isn't a file store; it is a coordination server. The size limit is
intended to prevent large operations from slowing down other operations. If you
aren't sharing your ZK, or your neighbors don't have response-time
expectations, bumping the size is fine.
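In that coordination-server spirit, a common alternative to bumping the limit is to keep the model bytes somewhere else (a shared filesystem, HDFS, S3) and store only a small pointer in the znode, which clients watch and reload from. A rough sketch of such a pointer payload; the path/version/checksum layout is my own convention, not a Mahout or ZooKeeper API:

```java
import java.nio.charset.StandardCharsets;

public class ModelPointer {
  // Tiny payload to store in the znode instead of the model itself:
  // where the model lives, plus a version and fingerprint so clients
  // can tell when to reload and verify what they read.
  static byte[] pointerPayload(String path, long version, String sha256) {
    String payload = path + "\n" + version + "\n" + sha256;
    return payload.getBytes(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    byte[] data = pointerPayload(
        "/var/www/shared/model/products.model",  // model stays on disk
        42L,                                      // bumped on each retrain
        "hypothetical-sha256-hex");               // integrity check for readers
    // This is what you'd hand to zk.setData(...): a few dozen bytes,
    // far below jute.maxbuffer.
    System.out.println(data.length);
  }
}
```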

Keep in mind also that the ALR seems to lock down the learning rate too
quickly for many problems. I haven't had time to investigate, but it is a
good idea to treat ALR models with some caution. They shouldn't be wildly
off-base, but they are likely to be less converged than is desirable.
