Posted to dev@manifoldcf.apache.org by Matteo Grolla <m....@sourcesense.com> on 2014/06/16 18:47:49 UTC

getDocumentVersions returning null

Hi,
	I see that if I return null in getDocumentVersions() (actually, the array values are null),
the method processDocuments is not called for the corresponding identifiers,
but the document is not deleted from the target repository.
I'm using the filesystem connector, so my crawling-mode settings are the ones it uses.
Supposing that my source repository gives me the list of deleted documents, what should I do to handle the deletion?

Cheers

-- 
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com


Re: getDocumentVersions returning null

Posted by Matteo Grolla <m....@sourcesense.com>.
My fault, actually.
	I was running experiments, so:
	I indexed document D directly into Solr,
	added a reference to doc D in processDocuments,
		and getDocumentVersions() returned null for doc D,
	but it wasn't removed…

	Then I realized that ManifoldCF doesn't remove documents it didn't index itself (not all crawlers behave this way).
	So I ran another test, indexing doc D through ManifoldCF, and everything works as expected.

	Hope this helps others.
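To make the behaviour described above concrete, here is a small self-contained Java model. It is a hypothetical sketch, not the ManifoldCF API: the class name, the in-memory `targetIndex` standing in for Solr, and the `ingestedByCrawler` set standing in for the crawler's own ingestion records are all invented for illustration. It assumes, as the thread reports, that a null version triggers deletion only for documents the crawler indexed itself, and that identifiers with a non-null version are the ones passed on for processing.

```java
import java.util.*;

/**
 * Hypothetical model of the behaviour reported in this thread; NOT the ManifoldCF API.
 * The "framework" remembers which documents it sent to the output index and, when a
 * connector reports a null version, deletes only documents it ingested itself.
 */
public class IngestionRecordSketch {
    // Hypothetical stand-in for the target index (e.g. Solr): document id -> content
    static final Map<String, String> targetIndex = new HashMap<>();
    // Hypothetical stand-in for the crawler's own record of what it ingested
    static final Set<String> ingestedByCrawler = new HashSet<>();

    /** Indexes a document through the crawler, so the crawler remembers it. */
    static void ingest(String id, String content) {
        targetIndex.put(id, content);
        ingestedByCrawler.add(id);
    }

    /** Applies one version pass; a null version means the document is gone at the source. */
    static List<String> applyVersions(String[] ids, String[] versions) {
        List<String> toProcess = new ArrayList<>();
        for (int i = 0; i < ids.length; i++) {
            if (versions[i] == null) {
                // Only remove what this crawler put there; foreign documents are left alone
                if (ingestedByCrawler.remove(ids[i])) {
                    targetIndex.remove(ids[i]);
                }
            } else {
                // Non-null version: the document would be handed on for processing
                toProcess.add(ids[i]);
            }
        }
        return toProcess;
    }

    public static void main(String[] args) {
        targetIndex.put("D", "indexed directly, bypassing the crawler");
        ingest("E", "indexed by the crawler");

        List<String> toProcess = applyVersions(
            new String[]{"D", "E", "F"}, new String[]{null, null, "v1"});

        System.out.println(targetIndex.containsKey("D")); // true
        System.out.println(targetIndex.containsKey("E")); // false
        System.out.println(toProcess);                    // [F]
    }
}
```

Document D, added to the index behind the crawler's back, survives the null version; E, which went through the crawler, is removed; F still has a version, so it is the only identifier passed on for processing.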
-- 
Matteo Grolla
Sourcesense - making sense of Open Source
http://www.sourcesense.com

On 16 Jun 2014, at 19:11, Karl Wright wrote:

> Hi Matteo,
> 
> The document should be deleted from the target repository when you return a
> null document version.  Why do you think it does not?
> 
> As for your second question, please read up on the various models that the
> crawler supports.  They're described pretty thoroughly in ManifoldCF in
> Action.
> 
> Karl


Re: getDocumentVersions returning null

Posted by Karl Wright <da...@gmail.com>.
Hi Matteo,

The document should be deleted from the target repository when you return a
null document version.  Why do you think it does not?

As for your second question, please read up on the various models that the
crawler supports.  They're described pretty thoroughly in ManifoldCF in
Action.

Karl
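For the second question (a source repository that reports deleted documents), a connector could map that list onto the null-version convention described above. This is a minimal sketch under stated assumptions: `computeVersions`, the `deletedAtSource` set and the `lastModified` map are hypothetical names, using a last-modified timestamp as the version string is just one plausible choice, and the real getDocumentVersions method has a different signature. Which crawling model the connector declares still matters, as noted above.

```java
import java.util.*;

/**
 * Hypothetical connector-side helper; NOT the real ManifoldCF getDocumentVersions signature.
 * It illustrates the convention discussed in this thread: a null entry in the returned
 * version array tells the framework the document no longer exists and should be deleted.
 */
public class VersionSketch {
    static String[] computeVersions(String[] ids, Set<String> deletedAtSource,
                                    Map<String, Long> lastModified) {
        String[] versions = new String[ids.length];
        for (int i = 0; i < ids.length; i++) {
            Long ts = lastModified.get(ids[i]);
            if (deletedAtSource.contains(ids[i]) || ts == null) {
                // Reported deleted, or unknown to the source: signal deletion with null
                versions[i] = null;
            } else {
                // Assumed version scheme: the string changes whenever the document changes
                versions[i] = Long.toString(ts);
            }
        }
        return versions;
    }

    public static void main(String[] args) {
        Map<String, Long> lastModified = new HashMap<>();
        lastModified.put("a.txt", 1402937269000L);
        lastModified.put("b.txt", 1402937300000L);
        Set<String> deleted = new HashSet<>(Collections.singletonList("b.txt"));

        String[] versions = computeVersions(
            new String[]{"a.txt", "b.txt", "c.txt"}, deleted, lastModified);
        System.out.println(Arrays.toString(versions)); // [1402937269000, null, null]
    }
}
```

Here b.txt is on the source's deleted list and c.txt is simply unknown, so both get a null version; only a.txt keeps a version and would go on to processing.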
