Posted to dev@manifoldcf.apache.org by Matteo Grolla <m....@sourcesense.com> on 2014/06/16 18:47:49 UTC
getDocumentVersions returning null
Hi,
I see that if getDocumentVersions() returns null for a document (more precisely, the corresponding entries of the returned array are null),
processDocuments is not called for those identifiers.
However, the document is not deleted from the target repository.
I'm using the filesystem connector, so those are my settings for the crawling mode.
Supposing that my source repository gives me the list of deleted documents, what should I do to handle the deletion?
Cheers
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
Re: getDocumentVersions returning null
Posted by Matteo Grolla <m....@sourcesense.com>.
My fault, actually.
I was experimenting, so I:
indexed document D directly into Solr,
added a reference to doc D in processDocuments,
had getDocumentVersions() return null for doc D,
but it wasn't removed…
Then I realized that ManifoldCF doesn't remove what it didn't index itself (not all crawlers behave this way).
So I ran another test, indexing doc D through ManifoldCF, and everything works as expected.
Hope this helps others.
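
The behavior described above can be modeled with a small, self-contained sketch. This is only an illustration of the idea that a crawler issues deletes solely for documents it indexed itself; it is not ManifoldCF's implementation, and the class and method names (IngestTrackingSketch, index, handleVersion) are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class IngestTrackingSketch {
    // Identifiers this crawler indexed itself (hypothetical bookkeeping).
    private final Set<String> ingested = new HashSet<>();
    // Delete requests that would be sent to the target index.
    private final List<String> deletesSent = new ArrayList<>();

    // Record that the crawler indexed this document.
    public void index(String id) {
        ingested.add(id);
    }

    // A null version means "the source no longer has this document".
    // A delete is issued only if the crawler indexed the document itself.
    public void handleVersion(String id, String version) {
        if (version == null && ingested.remove(id)) {
            deletesSent.add(id);
        }
    }

    public List<String> getDeletesSent() {
        return deletesSent;
    }

    public static void main(String[] args) {
        IngestTrackingSketch crawler = new IngestTrackingSketch();
        crawler.index("indexedByCrawler");
        // "indexedDirectly" was pushed to the index outside the crawler.
        crawler.handleVersion("indexedByCrawler", null);
        crawler.handleVersion("indexedDirectly", null);
        System.out.println("Deletes sent: " + crawler.getDeletesSent());
    }
}
```

In this model, a document pushed to the index by other means is never tracked, so a null version for it produces no delete request, which matches what the first experiment showed.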
--
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com
On 16 Jun 2014, at 19:11, Karl Wright wrote:
> Hi Matteo,
>
> The document should be deleted from the target repository when you return a
> null document version. Why do you think it does not?
>
> As for your second question, please read up on the various models that the
> crawler supports. They're described pretty thoroughly in ManifoldCF in
> Action.
>
> Karl
Re: getDocumentVersions returning null
Posted by Karl Wright <da...@gmail.com>.
Hi Matteo,
The document should be deleted from the target repository when you return a
null document version. Why do you think it does not?
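
As a rough illustration of the contract described here, the following self-contained sketch returns one version string per identifier and null for documents the source no longer has. It deliberately does not use the real ManifoldCF connector interface: the simplified method signature and the sourceVersions map are assumptions made for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class NullVersionSketch {
    /**
     * Hypothetical stand-in for a connector's getDocumentVersions():
     * returns one version string per identifier, or null when the
     * source repository no longer has the document.
     */
    public static String[] getDocumentVersions(String[] documentIdentifiers,
                                               Map<String, String> sourceVersions) {
        String[] versions = new String[documentIdentifiers.length];
        for (int i = 0; i < documentIdentifiers.length; i++) {
            // Map.get returns null for a missing key, used here as the "deleted" signal.
            versions[i] = sourceVersions.get(documentIdentifiers[i]);
        }
        return versions;
    }

    public static void main(String[] args) {
        Map<String, String> source = new HashMap<>();
        source.put("docA", "v1");
        // "docD" is absent: the source repository reports it as deleted.
        String[] ids = {"docA", "docD"};
        String[] versions = getDocumentVersions(ids, source);
        for (int i = 0; i < ids.length; i++) {
            String action = (versions[i] == null) ? "delete from target" : "process";
            System.out.println(ids[i] + " -> " + action);
        }
    }
}
```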
As for your second question, please read up on the various models that the
crawler supports. They're described pretty thoroughly in ManifoldCF in
Action.
Karl