Posted to user@manifoldcf.apache.org by Ricardo Ruiz <ri...@gmail.com> on 2022/06/13 05:39:40 UTC

Can't delete a job when solr output connection can't connect to the instance.

Hi all,
My team uses MCF to crawl documents and index them into Solr instances, but
for reasons beyond our control, the instances or collections are sometimes
deleted.
When we try to delete a job and the Solr instance or collection no longer
exists, the job reaches the "End notification" status and gets stuck there.
No other job can be aborted or deleted until the initial error is fixed.

We can clean up the errors by following these steps:

1.  Reconfigure the output connector to point to an existing Solr instance
    and collection.
2.  Reset the output connection, so it forgets any indexed documents.
3.  Reset the job, so it forgets any indexed documents.
4.  Restart the ManifoldCF server.

Is there any other way we can solve this error? Is there any way we can
force-delete the job if we no longer care about the job's documents?

Thanks in advance.
Ricardo.
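The cleanup steps above could, in principle, be scripted against ManifoldCF's JSON API service instead of being clicked through in the UI. The following is a minimal Python sketch, not a tested procedure: the base URL (port 8345 of the single-process example deployment), the resource paths `outputconnections/`, `clearrecords/` and `reseed/`, the mapping of step 3 to `reseed/`, and the helper names `build_cleanup_plan` and `run_plan` are all assumptions to verify against the ManifoldCF programmatic-operation documentation for your version. Step 4 (restarting the server) stays manual.

```python
# Hypothetical sketch: steps 1-3 of the cleanup above, sent to the
# ManifoldCF JSON API. The base URL, the resource paths and the mapping
# of "reset the job" to reseed/ are ASSUMPTIONS; verify them against the
# programmatic-operation docs for your ManifoldCF version.
import json
import urllib.parse
import urllib.request

# Assumed default for the single-process example deployment.
BASE_URL = "http://localhost:8345/mcf-api-service/json/"


def _enc(name):
    # Assumption: plain percent-encoding is enough for names and job IDs.
    return urllib.parse.quote(str(name), safe="")


def build_cleanup_plan(output_name, job_id, base_url=BASE_URL):
    """Return the (method, url) requests for steps 1-3, without sending them."""
    out = _enc(output_name)
    return [
        # Step 1: redefine the output connection so it targets a live Solr.
        ("PUT", base_url + "outputconnections/" + out),
        # Step 2: assumed equivalent of the "forget" / remove-records button.
        ("PUT", base_url + "clearrecords/" + out),
        # Step 3: assumed mapping of "reset the job" to the reseed resource.
        ("PUT", base_url + "reseed/" + _enc(job_id)),
    ]


def run_plan(plan, new_output_config):
    """Send each request in order. Only step 1 carries a body: the new
    connection definition, in whatever JSON shape your API version expects."""
    for index, (method, url) in enumerate(plan):
        body = json.dumps(new_output_config).encode("utf-8") if index == 0 else b""
        request = urllib.request.Request(
            url,
            data=body,
            method=method,
            headers={"Content-Type": "application/json"},
        )
        # urlopen raises HTTPError on a 4xx/5xx reply, stopping the sequence.
        with urllib.request.urlopen(request) as response:
            print(method, url, response.status)
```

Because `build_cleanup_plan("solr-out", 1234)` only returns the request list, you can review the URLs before `run_plan` sends anything to a live server.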

Re: Can't delete a job when solr output connection can't connect to the instance.

Posted by Karl Wright <da...@gmail.com>.
Remember, there is already a "forget" button on the output connection,
which will remove everything associated with the connection.  It's meant to
be used when the output index has been reset and is empty.  I'm not sure
what you'd do differently functionally.

Karl


On Tue, Jun 14, 2022 at 2:04 AM Koji Sekiguchi <ko...@rondhuit.com>
wrote:

> +1.
>
> I respect the design concept of ManifoldCF, but I think force-delete
> options would make MCF more useful for those who use MCF as a crawler.
> Adding force-delete options doesn't change the default behavior, and it
> doesn't break backward compatibility.
>
> Koji

Re: Can't delete a job when solr output connection can't connect to the instance.

Posted by Koji Sekiguchi <ko...@rondhuit.com>.
+1.

I respect the design concept of ManifoldCF, but I think force-delete options would make MCF more
useful for those who use MCF as a crawler. Adding force-delete options doesn't change the default
behavior, and it doesn't break backward compatibility.

Koji

On 2022/06/14 14:46, Ricardo Ruiz wrote:
> Hi Karl
> We are using ManifoldCF more as a crawler than as a synchronizer. We are thinking of contributing to
> ManifoldCF by adding a force job delete and a force output connector delete, taking into account, of course,
> the things that need to be deleted with them (DB, etc.). Do you think this is possible?
> We think that not only we but also the community would benefit from this kind of functionality.
> 
> Ricardo.

Re: Can't delete a job when solr output connection can't connect to the instance.

Posted by Ricardo Ruiz <ri...@gmail.com>.
Hi Karl
We are using ManifoldCF more as a crawler than as a synchronizer. We are
thinking of contributing to ManifoldCF by adding a force job delete and a
force output connector delete, taking into account, of course, the things
that need to be deleted with them (DB, etc.). Do you think this is possible?
We think that not only we but also the community would benefit from this
kind of functionality.

Ricardo.

On Mon, Jun 13, 2022 at 7:34 PM Karl Wright <da...@gmail.com> wrote:

> Because ManifoldCF is not just a crawler, but a synchronizer, a job
> represents and includes a list of documents that have been indexed.
> Deleting the job requires deleting the documents that have been indexed
> also.  It's part of the basic model.
>
> So if you tear down your target output instance and then try to tear down
> the job, it won't work.  ManifoldCF won't just throw away the memory of
> those documents and act as if nothing happened.
>
> If you're just using ManifoldCF as a crawler, therefore, your fix is about
> as good as it gets.
>
> You can get into similar trouble if, for example, you reinstall ManifoldCF
> but forget to include a connector class that was there before.  Carnage
> ensues.
>
> Karl

Re: Can't delete a job when solr output connection can't connect to the instance.

Posted by Karl Wright <da...@gmail.com>.
Because ManifoldCF is not just a crawler, but a synchronizer, a job
represents and includes a list of documents that have been indexed.
Deleting the job requires deleting the documents that have been indexed
also.  It's part of the basic model.

So if you tear down your target output instance and then try to tear down
the job, it won't work.  ManifoldCF won't just throw away the memory of
those documents and act as if nothing happened.

If you're just using ManifoldCF as a crawler, therefore, your fix is about
as good as it gets.

You can get into similar trouble if, for example, you reinstall ManifoldCF
but forget to include a connector class that was there before.  Carnage
ensues.

Karl
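Karl's point about the basic model can be made concrete with a toy simulation. The Python below is NOT ManifoldCF code: the classes `FakeOutput` and `Job`, the `force` flag, and the status strings are invented for illustration. It shows why a job that keeps a record of every document it has indexed cannot finish deleting while the output index is unreachable, and what the force-delete option proposed in this thread would trade away: with the default `force=False` the existing behavior is unchanged, while `force=True` drops the job's records and leaves any documents still in the index orphaned.

```python
# Toy model, NOT ManifoldCF code: every name below is invented to illustrate
# why a synchronizer cannot delete a job whose output index has vanished.


class OutputUnavailable(Exception):
    """The target index (e.g. a deleted Solr collection) cannot be reached."""


class FakeOutput:
    def __init__(self):
        self.reachable = True
        self.docs = set()  # document IDs currently in the index

    def remove(self, doc_id):
        if not self.reachable:
            raise OutputUnavailable(doc_id)
        self.docs.discard(doc_id)


class Job:
    def __init__(self, output):
        self.output = output
        self.indexed = set()  # the job's memory of what it has indexed
        self.status = "active"

    def index(self, doc_id):
        self.output.docs.add(doc_id)
        self.indexed.add(doc_id)

    def delete(self, force=False):
        """Default: remove each document from the output, then forget it.
        force=True (the hypothetical option): forget the records only."""
        self.status = "deleting"
        for doc_id in sorted(self.indexed):
            if not force:
                self.output.remove(doc_id)  # raises while the index is gone
            self.indexed.discard(doc_id)
        self.status = "deleted"


if __name__ == "__main__":
    output = FakeOutput()
    job = Job(output)
    job.index("a")
    job.index("b")
    output.reachable = False  # the collection is deleted out from under the job
    try:
        job.delete()
    except OutputUnavailable:
        print(job.status, sorted(job.indexed))  # deleting ['a', 'b']
    job.delete(force=True)
    print(job.status, sorted(output.docs))  # deleted ['a', 'b'] (orphans left)
```

In this toy, setting `output.reachable = True` and retrying the default delete would succeed, which mirrors step 1 of Ricardo's workaround (pointing the connector at a live instance so the deletions have somewhere to go).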


On Mon, Jun 13, 2022 at 1:39 AM Ricardo Ruiz <ri...@gmail.com> wrote:

> Hi all,
> My team uses MCF to crawl documents and index them into Solr instances, but
> for reasons beyond our control, the instances or collections are sometimes
> deleted.
> When we try to delete a job and the Solr instance or collection no longer
> exists, the job reaches the "End notification" status and gets stuck there.
> No other job can be aborted or deleted until the initial error is fixed.
>
> We can clean up the errors by following these steps:
>
> 1.  Reconfigure the output connector to point to an existing Solr instance
>     and collection.
> 2.  Reset the output connection, so it forgets any indexed documents.
> 3.  Reset the job, so it forgets any indexed documents.
> 4.  Restart the ManifoldCF server.
>
> Is there any other way we can solve this error? Is there any way we can
> force-delete the job if we no longer care about the job's documents?
>
> Thanks in advance.
> Ricardo.
>