Posted to user@manifoldcf.apache.org by Shinichiro Abe <sh...@gmail.com> on 2011/07/04 15:37:58 UTC

Changing Include files doesn't work

Hi.
I scheduled a job to crawl once, and on the Paths tab I set it to include *.txt files only.
I started the job, and the text files were indexed.
After that, I changed *.txt to *.xls.
I started the job again. It should have indexed the xls files, but the text files were not deleted and the xls files were not indexed.

This problem occurs with Tomcat plus a separate agent process, but not with Jetty.
(Restarting the agent process after changing the target files resolves it.)
Which is the correct behavior?
It seems to be caused by cache management, or perhaps by a bug around seeding,
or by something wrong in my Tomcat environment.

It occurs with both one-time and continuous crawling.
It also occurs with both the jcifs and filesystem connectors.

Regards, Shinichiro Abe
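What Shinichiro expected from the second run can be stated as two set differences over the files in the crawled paths. The sketch below only illustrates that expectation and assumes a hypothetical `IncludeSpecDiff` class (not ManifoldCF code). It uses Java's `PathMatcher` glob syntax, which is not necessarily identical to the wildcard matching ManifoldCF's connectors apply to the Paths tab.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

/**
 * Hypothetical illustration, not ManifoldCF code: given the include pattern used on
 * the previous run and the pattern configured now, compute which already-indexed
 * files should be removed from the index and which files should newly be indexed.
 */
public class IncludeSpecDiff {

    /** True if the bare file name matches a glob such as "*.txt". */
    static boolean matches(String glob, String fileName) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(fileName).getFileName());
    }

    /** Files included by the old pattern but not by the new one: delete from the index. */
    static Set<String> toDelete(String oldGlob, String newGlob, List<String> files) {
        Set<String> result = new TreeSet<>();
        for (String f : files) {
            if (matches(oldGlob, f) && !matches(newGlob, f)) {
                result.add(f);
            }
        }
        return result;
    }

    /** Files included by the new pattern but not by the old one: index them now. */
    static Set<String> toAdd(String oldGlob, String newGlob, List<String> files) {
        Set<String> result = new TreeSet<>();
        for (String f : files) {
            if (matches(newGlob, f) && !matches(oldGlob, f)) {
                result.add(f);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> files = Arrays.asList("notes.txt", "report.xls", "readme.txt", "image.png");
        System.out.println("delete: " + toDelete("*.txt", "*.xls", files));
        System.out.println("add:    " + toAdd("*.txt", "*.xls", files));
    }
}
```

Run as-is, it prints `delete: [notes.txt, readme.txt]` and `add:    [report.xls]`, which is the outcome the reported run did not produce until the agent process was restarted.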

Re: Changing Include files doesn't work

Posted by Karl Wright <da...@gmail.com>.
Feel free to open a ticket.  But this cannot be done at the script
level; there would be a race condition if you did that.  So any fix
needs to be in the AgentRun class.

Karl

On Tue, Jul 5, 2011 at 12:13 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> Thank you.
> I set up the sync directory property, and then it worked well.
>
> BTW, though I don't know whether this is related to synchronization,
> could it check whether more than one instance of the agent is running?
> For example, if one runs "executecommand.sh ~ agents.AgentRun" twice,
> could the command warn that it is already running?
> Currently, if one runs the command twice, two Java processes are started.
>
> Thank you,
> Shinichiro Abe
>
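Karl's point about the race can be illustrated: a wrapper script that first checks for a running agent and then starts one leaves a window in which two copies can both pass the check. Taking an exclusive OS file lock from inside the Java process closes that window, because acquiring the lock is itself the check. A minimal sketch, assuming a hypothetical `SingleInstanceGuard` class and lock-file location (neither is part of ManifoldCF or AgentRun), using only the standard `FileChannel.tryLock`:

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.channels.OverlappingFileLockException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

/**
 * Hypothetical sketch, not ManifoldCF's AgentRun: refuse to start a second agents
 * process by taking an exclusive OS-level lock on a well-known file. Intended to be
 * called once per process at startup.
 */
public final class SingleInstanceGuard implements AutoCloseable {
    private final FileChannel channel;
    private final FileLock lock;

    private SingleInstanceGuard(FileChannel channel, FileLock lock) {
        this.channel = channel;
        this.lock = lock;
    }

    /** Returns a guard holding the lock, or null if another holder already has it. */
    public static SingleInstanceGuard tryAcquire(Path lockFile) throws IOException {
        FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
        FileLock lock;
        try {
            // The OS performs the test and the acquisition in one step,
            // so two processes cannot both succeed.
            lock = ch.tryLock();
        } catch (OverlappingFileLockException e) {
            lock = null; // this JVM already holds the lock
        } catch (IOException | RuntimeException e) {
            ch.close();
            throw e;
        }
        if (lock == null) {
            ch.close();
            return null;
        }
        return new SingleInstanceGuard(ch, lock);
    }

    /** Releases the lock; the OS also drops it if the process exits or crashes. */
    @Override
    public void close() throws IOException {
        lock.release();
        channel.close();
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical lock location; a real installation would choose its own path.
        Path lockFile = Paths.get(System.getProperty("java.io.tmpdir"), "agents.lock");
        try (SingleInstanceGuard guard = tryAcquire(lockFile)) {
            if (guard == null) {
                System.err.println("Another agents process appears to be running; exiting.");
                return;
            }
            System.out.println("Lock acquired; the agent's work would run here.");
        }
    }
}
```

Unlike a PID file, the lock disappears when the process exits, even after a crash, so a stale file cannot block a restart. Whether a file lock stops other programs from touching the file is system-dependent, so this only keeps cooperating copies of the agent from running twice.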

Re: Changing Include files doesn't work

Posted by Shinichiro Abe <sh...@gmail.com>.
Thank you.
I set up the sync directory property, and then it worked well.

BTW, though I don't know whether this is related to synchronization,
could it check whether more than one instance of the agent is running?
For example, if one runs "executecommand.sh ~ agents.AgentRun" twice,
could the command warn that it is already running?
Currently, if one runs the command twice, two Java processes are started.

Thank you,
Shinichiro Abe

On 2011/07/05, at 9:26, Karl Wright wrote:

> It sounds like you have not set up synchronization properly for the
> multi-process installation you are running.  See:
> http://incubator.apache.org/connectors/how-to-build-and-deploy.html#Examples
> 
> The synch directory needs to be specified for multi-process installations.
> 
> Karl


Re: Changing Include files doesn't work

Posted by Karl Wright <da...@gmail.com>.
It sounds like you have not set up synchronization properly for the
multi-process installation you are running.  See:
http://incubator.apache.org/connectors/how-to-build-and-deploy.html#Examples

The synch directory needs to be specified for multi-process installations.

Karl


On Mon, Jul 4, 2011 at 9:37 AM, Shinichiro Abe
<sh...@gmail.com> wrote:
> Hi.
> I scheduled a job to crawl once, and on the Paths tab I set it to include *.txt files only.
> I started the job, and the text files were indexed.
> After that, I changed *.txt to *.xls.
> I started the job again. It should have indexed the xls files, but the text files were not deleted and the xls files were not indexed.
>
> This problem occurs with Tomcat plus a separate agent process, but not with Jetty.
> (Restarting the agent process after changing the target files resolves it.)
> Which is the correct behavior?
> It seems to be caused by cache management, or perhaps by a bug around seeding,
> or by something wrong in my Tomcat environment.
>
> It occurs with both one-time and continuous crawling.
> It also occurs with both the jcifs and filesystem connectors.
>
> Regards, Shinichiro Abe
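Karl's answer is that the web application in Tomcat and the separate agents process need a shared synch directory to coordinate. One plausible reading of the symptom (the old include pattern stays in effect until the agents process is restarted) is that, without it, the agents process keeps using cached job data that the Tomcat process has no way to invalidate. The sketch below illustrates that general technique only: `SharedGeneration`, `CachedValue`, the `.gen`/`.lock` file names and the in-memory "database" are all hypothetical, and this is not how ManifoldCF's lock or cache manager is implemented. In a real installation the synch directory is simply configured as Karl describes.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.FileLock;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical illustration of cross-process cache invalidation through a shared
 * directory. Not ManifoldCF's lock or cache manager; all names are made up.
 */
public final class SharedGeneration {
    private final Path genFile;   // holds the current generation number as text
    private final Path lockFile;  // serializes writers across processes

    public SharedGeneration(Path synchDir, String key) {
        this.genFile = synchDir.resolve(key + ".gen");
        this.lockFile = synchDir.resolve(key + ".lock");
    }

    /** Current generation, or 0 if nobody has bumped it yet. */
    public long read() throws IOException {
        if (!Files.exists(genFile)) {
            return 0L;
        }
        String text = new String(Files.readAllBytes(genFile), StandardCharsets.UTF_8).trim();
        return text.isEmpty() ? 0L : Long.parseLong(text);
    }

    /** Called by whichever process changed the shared data (e.g. the web UI). */
    public void bump() throws IOException {
        try (FileChannel ch = FileChannel.open(lockFile,
                StandardOpenOption.CREATE, StandardOpenOption.WRITE);
             FileLock ignored = ch.lock()) {
            long next = read() + 1;
            // Write a temp file and rename it, so readers never see a half-written number.
            Path tmp = Files.createTempFile(genFile.getParent(), "gen", ".tmp");
            Files.write(tmp, Long.toString(next).getBytes(StandardCharsets.UTF_8));
            Files.move(tmp, genFile, StandardCopyOption.REPLACE_EXISTING,
                    StandardCopyOption.ATOMIC_MOVE);
        }
    }

    /** Per-process cache that reloads whenever the shared generation moves. */
    public static final class CachedValue<T> {
        private final SharedGeneration generation;
        private final Supplier<T> loader; // stands in for "read the job from the database"
        private long seenGeneration = -1;
        private T value;

        public CachedValue(SharedGeneration generation, Supplier<T> loader) {
            this.generation = generation;
            this.loader = loader;
        }

        public synchronized T get() throws IOException {
            // Read the generation before loading, so a bump that races with
            // the load still forces a reload on the next call.
            long current = generation.read();
            if (current != seenGeneration) {
                value = loader.get();
                seenGeneration = current;
            }
            return value;
        }
    }

    public static void main(String[] args) throws IOException {
        Path synchDir = Files.createTempDirectory("synch");
        AtomicReference<String> database = new AtomicReference<>("*.txt");
        SharedGeneration gen = new SharedGeneration(synchDir, "job-1");
        CachedValue<String> agentView = new CachedValue<>(gen, database::get);

        System.out.println(agentView.get()); // *.txt
        database.set("*.xls");               // the UI edits the job...
        System.out.println(agentView.get()); // still *.txt: nothing signalled the change
        gen.bump();                          // ...and signals it through the shared directory
        System.out.println(agentView.get()); // *.xls
    }
}
```

The middle call shows the stale value the thread describes; only the signal through the shared directory makes the second process reload.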