Posted to user@manifoldcf.apache.org by Anupam Bhattacharya <an...@gmail.com> on 2012/08/02 19:12:57 UTC

The Schedulers are not starting automatically

I have a job that indexes properly, including incremental indexing, when it
is started manually. However, even after adding a specific scheduled time,
the job does not start on its own.

What is the ideal configuration for a job that runs automatically every day
at 12 am and does an incremental re-indexing of the repository (looking only
for documents that are new or modified since the last crawl)?

Is it necessary to enter the total run time when adding a specific
schedule time?

Regards
Anupam

Re: The Schedulers are not starting automatically

Posted by Karl Wright <da...@gmail.com>.
On Tue, Nov 6, 2012 at 1:41 PM, Anupam Bhattacharya <an...@gmail.com> wrote:
> Hi Karl,
>
> Unfortunately I don't currently have a copy of your book, so I am asking
> all of these architectural and configuration questions.
> Can you please confirm that by the first option you mean "rescan
> dynamically" and by the second option "scan every document once"?
>

Yes.

> Regarding my second question: if I go to List Output Connections, click
> View for an existing Solr connection, and click "Re-ingest all associated
> documents", what changes occur within ManifoldCF? Does this action delete
> any records from existing tables?
>

Yes, of course it does, otherwise it wouldn't do anything.  It removes
entries from the ingeststatus table.
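
As a rough illustration of why removing those entries forces re-indexing,
here is a minimal Python sketch. It is not ManifoldCF code: the `crawl`
function and the `ingest_status` dictionary are hypothetical stand-ins for
the crawler and the ingeststatus table, and the sketch assumes each document
carries a version string that changes when the document is modified.

```python
from typing import Dict, List


def crawl(repository: Dict[str, str], ingest_status: Dict[str, str]) -> List[str]:
    """Return the ids of documents that would be sent to the index.

    `repository` maps document id -> current version string.
    `ingest_status` maps document id -> version last sent to the index
    (a hypothetical stand-in for the ingeststatus table).
    """
    sent = []
    for doc_id, version in sorted(repository.items()):
        # Only new documents (no record) or modified ones (version differs)
        # are sent again; everything else is skipped.
        if ingest_status.get(doc_id) != version:
            sent.append(doc_id)
            ingest_status[doc_id] = version
    return sent


repository = {"a": "v1", "b": "v1"}
ingest_status: Dict[str, str] = {}

first_run = crawl(repository, ingest_status)    # everything is new
second_run = crawl(repository, ingest_status)   # nothing changed

repository["b"] = "v2"                          # one modified document
repository["c"] = "v1"                          # one new document
third_run = crawl(repository, ingest_status)

ingest_status.clear()                           # analogous to "re-ingest all"
fourth_run = crawl(repository, ingest_status)   # everything is sent again
```

In this toy model, clearing the recorded versions is the only thing that
makes unchanged documents look new again, which matches the behavior
described in the question.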

Listen, you sound like you are working too hard to understand the
internals of the project.  It might really help you to understand how
to *use* the project instead.  That is why I pointed you at the book -
it describes how ManifoldCF works.  The first chapter is free.  I
suggest you try just reading it.

Karl


Re: The Schedulers are not starting automatically

Posted by Anupam Bhattacharya <an...@gmail.com>.
Hi Karl,

Unfortunately I don't currently have a copy of your book, so I am asking
all of these architectural and configuration questions.
Can you please confirm that by the first option you mean "rescan
dynamically" and by the second option "scan every document once"?

Regarding my second question: if I go to List Output Connections, click
View for an existing Solr connection, and click "Re-ingest all associated
documents", what changes occur within ManifoldCF? Does this action delete
any records from existing tables?

Regards
Anupam

On Tue, Nov 6, 2012 at 11:46 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Anupam,
>
> I'm having difficulty understanding what you posted here, but I will
> try to explain the difference between "rescan dynamically" and "scan
> every document once".  You may find more help also in ManifoldCF in
> Action, at http://www.manning.com/wright .
>
> The first option causes your job to run forever.  The job runs only in
> the schedule windows allotted for it.  It periodically "discovers" new
> documents, and (depending on the crawling model of the connector) may
> check for existence or modification of an already-crawled document.
> Each document has its own schedule for doing this.
>
> The second option is more likely to be what you want.  Each job
> starts, runs, and completes, being sure to run only in the scheduling
> windows you provide.  You then run it again, and again (or your job
> schedule makes that happen).  It will do the minimal work to keep your
> index up to date.
>
> There are significant differences between how you would set up a job
> using one model vs. the other.  I strongly suggest you read at least
> the first few chapters of the book.
>
> Karl

Re: The Schedulers are not starting automatically

Posted by Karl Wright <da...@gmail.com>.
Hi Anupam,

I'm having difficulty understanding what you posted here, but I will
try to explain the difference between "rescan dynamically" and "scan
every document once".  You may find more help also in ManifoldCF in
Action, at http://www.manning.com/wright .

The first option causes your job to run forever.  The job runs only in
the schedule windows allotted for it.  It periodically "discovers" new
documents, and (depending on the crawling model of the connector) may
check for existence or modification of an already-crawled document.
Each document has its own schedule for doing this.

The second option is more likely to be what you want.  Each job
starts, runs, and completes, being sure to run only in the scheduling
windows you provide.  You then run it again, and again (or your job
schedule makes that happen).  It will do the minimal work to keep your
index up to date.
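
To make the idea of a scheduling window concrete, here is a small Python
sketch of a daily window check. The function name and its parameters are
made up for illustration, and treating a missing maximum run time as an
open-ended window is an assumption of this sketch, not a statement about
how ManifoldCF itself evaluates schedules.

```python
from datetime import datetime, timedelta
from typing import Optional


def in_schedule_window(now: datetime, start_hour: int, start_minute: int,
                       max_run_minutes: Optional[int]) -> bool:
    """Return True if `now` falls inside a daily window.

    The window opens every day at start_hour:start_minute and stays open
    for max_run_minutes. None means "no limit" (assumption of this sketch).
    """
    # Check the window that opened today and the one that opened yesterday,
    # because a window may extend past midnight.
    for days_back in (0, 1):
        opened = (now - timedelta(days=days_back)).replace(
            hour=start_hour, minute=start_minute, second=0, microsecond=0)
        if opened > now:
            continue  # today's window has not opened yet
        if max_run_minutes is None:
            return True  # no limit: the most recent window is still open
        if now < opened + timedelta(minutes=max_run_minutes):
            return True
    return False


# A window that opens at midnight and lasts one hour.
inside = in_schedule_window(datetime(2012, 8, 3, 0, 30), 0, 0, 60)
outside = in_schedule_window(datetime(2012, 8, 3, 1, 30), 0, 0, 60)

# A window that opens at 23:00 and runs two hours, crossing midnight.
crossing = in_schedule_window(datetime(2012, 8, 3, 0, 30), 23, 0, 120)
```

A job in the "scan every document once" model would, in these terms, do
work only while such a check is true and then wait for the next window.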

There are significant differences between how you would set up a job
using one model vs. the other.  I strongly suggest you read at least
the first few chapters of the book.

Karl

On Tue, Nov 6, 2012 at 12:35 PM, Anupam Bhattacharya
<an...@gmail.com> wrote:
> My incremental indexing was working previously, but I have changed a few
> settings, and now the documents indexed on the previous day get deleted and
> only the new ones show up. I suspect this is due to the setting under
> List all Jobs > Edit selected job > Scheduling > Schedule type: "Rescan
> documents dynamically" or "Scan every document once". Please let me know
> the appropriate setting for indexing only the new documents in the
> repository.
>
> After deleting the Solr index data folder and clearing the records in the
> jobqueue, repohistory, and ingeststatus tables, I found that ManifoldCF
> still scans only the remaining new documents, until I go to List Output
> Connections, click View for the Solr connection, and click and confirm
> "Re-ingest all associated documents". How does it keep track of which
> documents were ingested previously so that it fetches only the new ones?
>
> Regards
> Anupam

Re: The Schedulers are not starting automatically

Posted by Anupam Bhattacharya <an...@gmail.com>.
My incremental indexing was working previously, but I have changed a few
settings, and now the documents indexed on the previous day get deleted and
only the new ones show up. I suspect this is due to the setting under
List all Jobs > Edit selected job > Scheduling > Schedule type: "Rescan
documents dynamically" or "Scan every document once". Please let me know
the appropriate setting for indexing only the new documents in the
repository.

After deleting the Solr index data folder and clearing the records in the
jobqueue, repohistory, and ingeststatus tables, I found that ManifoldCF
still scans only the remaining new documents, until I go to List Output
Connections, click View for the Solr connection, and click and confirm
"Re-ingest all associated documents". How does it keep track of which
documents were ingested previously so that it fetches only the new ones?

Regards
Anupam


On Tue, Aug 14, 2012 at 10:01 AM, Anupam Bhattacharya
<an...@gmail.com> wrote:

> Thanks.
>
> There is an option to set the Start Method on the Connection tab in the job
> settings. I changed it to "Start when the Schedule window starts" and
> the problem was resolved.
>
> Regards
> Anupam

Re: The Schedulers are not starting automatically

Posted by Anupam Bhattacharya <an...@gmail.com>.
Thanks.

There is an option to set the Start Method on the Connection tab in the job
settings. I changed it to "Start when the Schedule window starts" and
the problem was resolved.

Regards
Anupam

On Thu, Aug 2, 2012 at 10:59 PM, Karl Wright <da...@gmail.com> wrote:

> The incremental will work the same whether the job is run manually or
> started automatically.
>
> If you have added the appropriate schedule record to your job, you
> also have to select the "run job automatically" radio button on one of
> the other job tabs for automatic runs to take place.  I suspect that
> is what you are missing.
>
> Karl

Re: The Schedulers are not starting automatically

Posted by Karl Wright <da...@gmail.com>.
The incremental will work the same whether the job is run manually or
started automatically.

If you have added the appropriate schedule record to your job, you
also have to select the "run job automatically" radio button on one of
the other job tabs for automatic runs to take place.  I suspect that
is what you are missing.
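
The two conditions described above (a schedule record plus an automatic
start setting) can be summarized in a short Python sketch. The `Job` class,
its field names, and the `"manual"` / `"window_start"` values are
hypothetical and only model the idea; they are not ManifoldCF's actual data
model or API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Job:
    # Hypothetical values: "manual" or "window_start".
    start_method: str
    # Hours of the day (0-23) at which a schedule record opens a window.
    schedule_hours: List[int] = field(default_factory=list)


def starts_automatically(job: Job, hour: int) -> bool:
    """Return True only if the job is set to start on its own AND a
    schedule record matches the given hour."""
    if job.start_method == "manual":
        # A schedule record alone is not enough when the start is manual.
        return False
    return hour in job.schedule_hours


manual_job = Job(start_method="manual", schedule_hours=[0])
automatic_job = Job(start_method="window_start", schedule_hours=[0])

manual_at_midnight = starts_automatically(manual_job, 0)
automatic_at_midnight = starts_automatically(automatic_job, 0)
automatic_at_five = starts_automatically(automatic_job, 5)
```

The first case mirrors the situation in the original question: the schedule
record exists, but the job still waits to be started by hand.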

Karl

On Thu, Aug 2, 2012 at 1:12 PM, Anupam Bhattacharya <an...@gmail.com> wrote:
> I have a job that indexes properly, including incremental indexing, when it
> is started manually. However, even after adding a specific scheduled time,
> the job does not start on its own.
>
> What is the ideal configuration for a job that runs automatically every day
> at 12 am and does an incremental re-indexing of the repository (looking only
> for documents that are new or modified since the last crawl)?
>
> Is it necessary to enter the total run time when adding a specific
> schedule time?
>
> Regards
> Anupam