Posted to user@manifoldcf.apache.org by Karl Wright <da...@gmail.com> on 2012/04/01 15:34:16 UTC
Re: Running 2 jobs to update same document Index but different
Hi Anupam,
I looked at the code at some length, and there is another deletion
pathway that would not have been caught by my previous patch.
However, this pathway is only triggered *before* indexing. Still, we
should rule it out.
Can you remove the old patch I gave you and apply this one instead:
Index: framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java
===================================================================
--- framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java (revision 1307815)
+++ framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java (working copy)
@@ -1589,6 +1589,7 @@
 protected void removeDocument(IOutputConnection connection, String documentURI, String outputDescription, IOutputRemoveActivity activities)
 throws ManifoldCFException, ServiceInterruption
 {
+ Logging.ingest.error("Removing document",new Exception("Removing document"));
 IOutputConnector connector = OutputConnectorFactory.grab(threadContext,connection.getClassName(),connection.getConfigParams(),connection.getMaxConnections());
 if (connector == null)
 // The connector is not installed; treat this as a service interruption.
Same instructions as before.
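Both patches in this thread rely on the same debugging trick: logging a freshly constructed Exception, so that the log entry carries the full call stack and shows which code path asked for the deletion. The sketch below illustrates the idea with java.util.logging. It is not ManifoldCF code: ManifoldCF's own Logging.ingest logger is not reproduced, and the logger name and the methods removeDocument and someJobCleanupPass are hypothetical stand-ins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;

public class DeletionTraceDemo {
    // Hypothetical stand-in for ManifoldCF's Logging.ingest; java.util.logging is used
    // here only so the example is self-contained.
    static final Logger INGEST = Logger.getLogger("demo.ingest");

    // Hypothetical stand-in for IncrementalIngester.removeDocument().
    static void removeDocument(String documentURI) {
        // The Exception is never thrown; it is created only so the logger records
        // the call stack at this point, i.e. who asked for the removal.
        INGEST.log(Level.SEVERE, "Removing document " + documentURI,
                new Exception("Removing document"));
        // ...the real method would go on to ask the output connector to delete it...
    }

    // Hypothetical caller representing whatever code path triggers the deletion.
    static void someJobCleanupPass() {
        removeDocument("hypothetical-document-uri");
    }

    // Collects log records in memory so the recorded stack can be inspected.
    static final class CapturingHandler extends Handler {
        final List<LogRecord> records = new ArrayList<>();
        @Override public void publish(LogRecord record) { records.add(record); }
        @Override public void flush() { }
        @Override public void close() { }
    }

    // Runs the hypothetical caller and returns the method names from the logged stack,
    // innermost frame first.
    static List<String> captureDeletionTrace() {
        CapturingHandler handler = new CapturingHandler();
        INGEST.addHandler(handler);
        try {
            someJobCleanupPass();
        } finally {
            INGEST.removeHandler(handler);
        }
        List<String> frames = new ArrayList<>();
        for (StackTraceElement e : handler.records.get(0).getThrown().getStackTrace()) {
            frames.add(e.getMethodName());
        }
        return frames;
    }

    public static void main(String[] args) {
        // The default console handler also prints the SEVERE record with its stack trace.
        System.out.println(captureDeletionTrace());
    }
}
```

In a real crawl the useful part is the set of frames beneath removeDocument: they name the caller that initiated the delete, which is what the requested trace is meant to reveal.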
Thanks,
Karl
On Sat, Mar 31, 2012 at 4:34 PM, Karl Wright <da...@gmail.com> wrote:
> I tried modifying the file system connector here to use the same crawling
> model as the document connector. Everything still behaved exactly as
> expected. So we are really going to need that trace to make any further
> progress.
>
>
> Karl
>
> Sent from my Windows Phone
> ________________________________
> From: Karl Wright
> Sent: 3/31/2012 4:11 PM
> To: Anupam Bhattacharya
> Subject: RE: Running 2 jobs to update same document Index but different
>
> The output from the patch I gave you will go to manifoldcf.log. If you
> aren't seeing it, please send me your properties.xml file.
>
> Karl
>
> ________________________________
> From: Anupam Bhattacharya
> Sent: 3/31/2012 10:37 AM
> To: Karl Wright
> Subject: Re: Running 2 jobs to update same document Index but different
>
> Hello Karl,
>
> Today I ran a filesystem crawl with the output connection set to the Null
> output connector and the input connection using the File System connector.
> The job ran properly, without deleting the files I crawled.
>
> However, I could not find any of the patch's log messages in any
> manifoldcf.log file. I did rebuild after applying the patch to
> IncrementalIngester.java.
>
> Can you please tell me where I need to look for the log messages related to
> this?
>
> Regards
> Anupam
>
>
> On Fri, Mar 30, 2012 at 4:21 PM, Karl Wright <da...@gmail.com> wrote:
>>
>> I did not see that you tried creating a filesystem connection and job.
>> Did you do that, and did it work for you without sending a deletion?
>> If not, please go back to using the manifoldcf id field and try that
>> first.
>>
>> Here is the patch I'd like you to apply:
>>
>> ===================================================================
>> --- framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java (revision 1307149)
>> +++ framework/agents/src/main/java/org/apache/manifoldcf/agents/incrementalingest/IncrementalIngester.java (working copy)
>> @@ -697,6 +697,8 @@
>>  {
>>  IOutputConnection connection = connectionManager.load(outputConnectionName);
>>
>> + Logging.ingest.error("Deleting documents!", new Exception("Deletion stack trace"));
>> +
>>  if (Logging.ingest.isDebugEnabled())
>>  {
>>  int i = 0;
>>
>> Then, rebuild ManifoldCF. Every document that is deleted from the
>> index will generate a trace in the log. Run your crawl and send me
>> one of those traces.
>>
>> Karl
>>
>>
>> On Fri, Mar 30, 2012 at 6:06 AM, Anupam Bhattacharya <an...@gmail.com> wrote:
>> > I checked the ManifoldCF logs and there were no exceptions.
>> >
>> > Additionally, I changed the id (uniqueKey) in SOLR to the Documentum-specific
>> > unique id, i.e. r_object_id, and ran the job. This time I could create the
>> > indexes without any problem.
>> >
>> > For (4), please tell me where I need to enable logging.
>> >
>> > On Thu, Mar 29, 2012 at 6:56 PM, Karl Wright <da...@gmail.com> wrote:
>> >>
>> >> "But as per my observation the deletion happens only when uniqueKey in
>> >> SOLR schema is set to id. "
>> >>
>> >> The SOLR setup cannot influence the flow in ManifoldCF unless it causes
>> >> SOLR to reject the ManifoldCF requests. So I suspect that the delete
>> >> request is happening in both cases, and that SOLR is simply not acting on
>> >> it when uniqueKey is not set to "id". That's because the delete request
>> >> from ManifoldCF will be for a key that SOLR doesn't recognize as such.
>> >>
>> >> Please do try recommendations (3) and (4).
>> >>
>> >> Karl
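Karl's explanation above can be illustrated with a toy model. This is not SolrJ or ManifoldCF code: ToyIndex, remainingAfterDelete and the field values are hypothetical, and it assumes that ManifoldCF identifies each document to SOLR by its URI in the "id" field. What it shows is that a delete-by-id matches only the value of the schema's uniqueKey field, so the same delete request removes the document when uniqueKey is "id" but silently does nothing when uniqueKey is r_object_id.

```java
import java.util.HashMap;
import java.util.Map;

public class UniqueKeyDeleteDemo {

    // Toy index, NOT SolrJ: documents are stored under the value of whichever field
    // the (hypothetical) schema declares as uniqueKey.
    static final class ToyIndex {
        private final String uniqueKeyField;
        private final Map<String, Map<String, String>> docs = new HashMap<>();

        ToyIndex(String uniqueKeyField) {
            this.uniqueKeyField = uniqueKeyField;
        }

        void add(Map<String, String> doc) {
            docs.put(doc.get(uniqueKeyField), doc);
        }

        // Delete-by-id matches only the uniqueKey value; anything else is a silent no-op.
        boolean deleteById(String id) {
            return docs.remove(id) != null;
        }

        int size() {
            return docs.size();
        }
    }

    // Indexes one document, then issues a delete carrying the document URI
    // (assumption: the crawler identifies documents to the index by URI in "id").
    static int remainingAfterDelete(String uniqueKeyField) {
        ToyIndex index = new ToyIndex(uniqueKeyField);
        Map<String, String> doc = new HashMap<>();
        doc.put("id", "hypothetical-document-uri");
        doc.put("r_object_id", "hypothetical-object-id");
        index.add(doc);
        index.deleteById("hypothetical-document-uri");
        return index.size();
    }

    public static void main(String[] args) {
        System.out.println("uniqueKey=id          -> documents left: " + remainingAfterDelete("id"));
        System.out.println("uniqueKey=r_object_id -> documents left: " + remainingAfterDelete("r_object_id"));
    }
}
```

Under this model the two schemas produce different outcomes from an identical delete request, which is consistent with Karl's suspicion that ManifoldCF sends the delete in both cases.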
>> >>
>> >>
>> >
>
>
>
>
> --
> Thanks & Regards
> Anupam Bhattacharya
>
>