Posted to user@manifoldcf.apache.org by msaunier <ms...@citya.com> on 2018/06/05 08:54:13 UTC

Zk ManifoldCF just questions

Hello Karl,

I just have several questions.

Today, I use single-process ManifoldCF. I have 5 jobs per server, and a
script checks whether the previous job has finished before starting the
next one, so each server crawls only 1 job at a time.

Could multiprocess be useful for a configuration like this?

Second question:

What is ZooKeeper used for in a multiprocess setup? To distribute the
configuration?

And the last one:

Is multiprocess ManifoldCF stable enough for a production environment? And
with ZooKeeper?

Thanks,

Maxence 
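The single-process workflow described above (five jobs per server, with a script that waits for one job to finish before starting the next) can be sketched as below. This illustrates only the scheduling logic, under stated assumptions: `run_jobs_sequentially`, `start_job`, and `job_is_done` are hypothetical names, and the two hooks stand in for whatever ManifoldCF interface the existing script actually calls. No specific ManifoldCF API is implied.

```python
import time
from typing import Callable, Iterable, List


def run_jobs_sequentially(
    job_ids: Iterable[str],
    start_job: Callable[[str], None],
    job_is_done: Callable[[str], bool],
    poll_interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Start each job only after the previous one has finished.

    start_job and job_is_done are hypothetical hooks standing in for whatever
    ManifoldCF interface the existing script calls. Returns the job IDs in
    the order they were started.
    """
    started: List[str] = []
    for job_id in job_ids:
        start_job(job_id)
        started.append(job_id)
        # Wait for the current job to finish before starting the next one,
        # so at most one job runs on this server at any time.
        while not job_is_done(job_id):
            sleep(poll_interval_s)
    return started


if __name__ == "__main__":
    # Simulated jobs: each needs a given number of extra polls before it is done.
    remaining_polls = {"job-1": 2, "job-2": 0, "job-3": 1}
    running = set()

    def fake_start(job_id: str) -> None:
        assert not running, "another job is still running"
        running.add(job_id)

    def fake_is_done(job_id: str) -> bool:
        if remaining_polls[job_id] == 0:
            running.discard(job_id)
            return True
        remaining_polls[job_id] -= 1
        return False

    order = run_jobs_sequentially(
        ["job-1", "job-2", "job-3"], fake_start, fake_is_done, sleep=lambda s: None
    )
    print(order)  # ['job-1', 'job-2', 'job-3']
```

Passing `sleep` in as a parameter keeps the polling loop testable without real waits; a real script would simply use the default `time.sleep`.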


RE: Zk ManifoldCF just questions

Posted by msaunier <ms...@citya.com>.
OK, so I will test this functionality. Thank you.

 

Maxence

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, June 5, 2018 11:11
To: user@manifoldcf.apache.org
Subject: Re: Zk ManifoldCF just questions

 



Re: Zk ManifoldCF just questions

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

You should be able to migrate without issue.

Karl

On Tue, Jun 5, 2018 at 5:17 AM msaunier <ms...@citya.com> wrote:

> Last question:
>
> Can I migrate to ZooKeeper multiprocess and keep the same database? (so
> as not to lose the current data)
>
> Thanks

RE: Zk ManifoldCF just questions

Posted by msaunier <ms...@citya.com>.
Last question:

Can I migrate to ZooKeeper multiprocess and keep the same database? (so as not to lose the current data)

Thanks

From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Tuesday, June 5, 2018 11:11
To: user@manifoldcf.apache.org
Subject: Re: Zk ManifoldCF just questions

 



Re: Zk ManifoldCF just questions

Posted by Karl Wright <da...@gmail.com>.
Hi Maxence,

I think this will answer your questions:

(1) Multiprocess MCF is stable, yes.  ZooKeeper is the recommended
configuration; the shared-file configuration is deprecated.  ZooKeeper is
used to coordinate cluster processes and store global configuration.
(2) Multiprocess MCF is best viewed as a cluster.  Its behavior does not
change when more cluster members are added: documents are still processed
in the same order, but more of them can be processed at once.

Karl
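Point (2) can be illustrated with a toy model in which several workers, standing in for cluster members, pull documents from one shared, priority-ordered queue. This is a simplified assumption for illustration, not ManifoldCF's actual scheduler; `dispatch_rounds` and the integer priorities are hypothetical. Whatever the number of workers, documents are handed out in the same order; more workers only means more documents per round.

```python
import heapq
from typing import List, Tuple


def dispatch_rounds(docs: List[Tuple[int, str]], workers: int) -> List[List[str]]:
    """Toy model: `workers` processes pull from one shared priority queue.

    In each round every worker takes the next document in priority order
    (lower number first). This illustrates the ordering claim only; it is
    not ManifoldCF's actual scheduler.
    """
    if workers < 1:
        raise ValueError("need at least one worker")
    queue = list(docs)  # copy so the caller's list is not reordered
    heapq.heapify(queue)
    rounds: List[List[str]] = []
    while queue:
        # Each worker takes one document; the last round may be partial.
        batch = [heapq.heappop(queue)[1] for _ in range(min(workers, len(queue)))]
        rounds.append(batch)
    return rounds


if __name__ == "__main__":
    docs = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
    one = dispatch_rounds(docs, workers=1)
    three = dispatch_rounds(docs, workers=3)
    # Same overall order; more workers just means fewer rounds.
    print([d for r in one for d in r])    # ['a', 'b', 'c', 'd', 'e']
    print([d for r in three for d in r])  # ['a', 'b', 'c', 'd', 'e']
    print(len(one), len(three))           # 5 2
```

In this model, going from one worker to three cuts the number of rounds from 5 to 2 while the dispatch order stays `a` through `e`.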
