Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2012/04/01 15:34:16 UTC

Re: Running 2 jobs to update same document Index but different

Hi Anupam,

I looked at the code at some length, and there is another deletion
pathway that would not have been caught by my previous patch.
However, this pathway is only triggered *before* indexing.  Still, we
should rule it out.

Can you remove the old patch I gave you and apply this one instead:

Index: framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java
===================================================================
--- framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java	(revision 1307815)
+++ framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java	(working copy)
@@ -1589,6 +1589,7 @@
   protected void removeDocument(IOutputConnection connection, String documentURI, String outputDescription, IOutputRemoveActivity activities)
     throws ManifoldCFException, ServiceInterruption
   {
+    Logging.ingest.error("Removing document",new Exception("Removing document"));
     IOutputConnector connector = OutputConnectorFactory.grab(threadContext,connection.getClassName(),connection.getConfigParams(),connection.getMaxConnections());
     if (connector == null)
       // The connector is not installed; treat this as a service interruption.


Same instructions as before.
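
For reference, the trick both patches rely on is that an Exception records the call stack at the point where it is constructed, so logging a freshly created one shows which code path reached the deletion. Below is a minimal standalone sketch of that idea. It does not use ManifoldCF's Logging class; the class DeletionTraceDemo and the methods processDeleteQueue and render are hypothetical names made up for illustration, and removeDocument here is only a stand-in for the patched method.

```java
import java.io.PrintWriter;
import java.io.StringWriter;

public class DeletionTraceDemo {

    // Hypothetical caller, standing in for whatever pathway triggers a deletion.
    public static String processDeleteQueue(String documentURI) {
        return removeDocument(documentURI);
    }

    // Stand-in for the patched method: the Exception is created here only to
    // capture the current call stack; it is never thrown.
    public static String removeDocument(String documentURI) {
        Exception marker = new Exception("Removing document " + documentURI);
        return render(marker);
    }

    // Renders a Throwable the way a logger typically would when it is handed
    // one: the message followed by one "at ..." line per stack frame.
    public static String render(Throwable t) {
        StringWriter buffer = new StringWriter();
        t.printStackTrace(new PrintWriter(buffer, true));
        return buffer.toString();
    }

    public static void main(String[] args) {
        // Prints the message plus frames for removeDocument,
        // processDeleteQueue and main.
        System.out.println(processDeleteQueue("file:/tmp/example.txt"));
    }
}
```

In the real patch the logger does the rendering, so each deletion should produce a comparable block of "at ..." lines in the log, and the frames above removeDocument identify the pathway that requested it.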

Thanks,
Karl


On Sat, Mar 31, 2012 at 4:34 PM, Karl Wright <da...@gmail.com> wrote:
> I tried modifying the file system connector here to use the same crawling
> model as the document connector.  Everything still behaved exactly as
> expected.  So we are really going to need that trace to make any further
> progress.
>
>
> Karl
>
> Sent from my Windows Phone
> ________________________________
> From: Karl Wright
> Sent: 3/31/2012 4:11 PM
> To: Anupam Bhattacharya
> Subject: RE: Running 2 jobs to update same document Index but different
>
> The output from the patch I gave you will go to manifoldcf.log.  If you
> aren't seeing it please send me your properties.xml file.
>
> Karl
>
> Sent from my Windows Phone
> ________________________________
> From: Anupam Bhattacharya
> Sent: 3/31/2012 10:37 AM
> To: Karl Wright
> Subject: Re: Running 2 jobs to update same document Index but different
>
> Hello Karl,
>
> Today I ran a filesystem crawl with the Null output connector and the
> Filesystem connector as input.
> The job ran properly without deleting the files I crawled.
>
> However, I could not find any log messages in any of the manifoldcf.log
> files. I did rebuild after changing IncrementalIngester.java.
>
> Can you please tell me where I need to look for log messages related to
> this?
>
> Regards
> Anupam
>
>
> On Fri, Mar 30, 2012 at 4:21 PM, Karl Wright <da...@gmail.com> wrote:
>>
>> I did not see that you tried creating a filesystem connection and job.
>>  Did you do that, and did it work for you without sending a deletion?
>> If not, please go back to using the manifoldcf id field and try that
>> first.
>>
>> Here is the patch I'd like you to apply:
>>
>> ===================================================================
>> --- framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java	(revision 1307149)
>> +++ framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java	(working copy)
>> @@ -697,6 +697,8 @@
>>   {
>>     IOutputConnection connection = connectionManager.load(outputConnectionName);
>>
>> +    Logging.ingest.error("Deleting documents!", new Exception("Deletion stack trace"));
>> +
>>     if (Logging.ingest.isDebugEnabled())
>>     {
>>       int i = 0;
>>
>>
>> Then, rebuild ManifoldCF.  Every document that is deleted from the
>> index will generate a trace in the log.  Run your crawl and send me
>> one of those traces.
>>
>> Karl
>>
>>
>> On Fri, Mar 30, 2012 at 6:06 AM, Anupam Bhattacharya
>> <an...@gmail.com> wrote:
>> > I checked the ManifoldCF logs and there were no exceptions.
>> >
>> > Additionally, I changed the id (uniqueKey) in SOLR to the
>> > Documentum-specific unique id, i.e. r_object_id, and ran the job.
>> > This time I could easily create the indexes.
>> >
>> > For (4), please tell me the places where I need to enable logging.
>> >
>> > On Thu, Mar 29, 2012 at 6:56 PM, Karl Wright <da...@gmail.com> wrote:
>> >>
>> >> "But as per my observation the deletion happens only when uniqueKey in
>> >> SOLR schema is set to id. "
>> >>
>> >> The SOLR setup cannot influence the flow in ManifoldCF unless it causes
>> >> SOLR to reject the ManifoldCF requests.  So I suspect that the delete
>> >> request is happening in both cases, and it is not getting acted upon by
>> >> SOLR in the case where uniqueKey is not set to "id".  That's because the
>> >> delete request from ManifoldCF will be for a key that solr doesn't
>> >> recognize as such.
>> >>
>> >> Please do try recommendations (3) and (4).
>> >>
>> >> Karl
>> >>
>> >>
>> >
>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>