
Posted to dev@manifoldcf.apache.org by "Colreavy, Niall" <Ni...@fmr.com.INVALID> on 2015/09/17 17:07:35 UTC

Potential Issue with pausing jobs

Hi,

I am experimenting with pausing a job. The job has a simple JDBC connection and a null output connection. I notice that when I resume the job and monitor its progress in the simple history report, the job never seems to run the data query any more. I can see that it runs the seed query, but it doesn't progress to the data query. If I abort the job and restart it, it does seem to start running the data query again.

Can anyone explain this behaviour? And on a side note, what is the difference between pausing a job and aborting a job?

Thanks,

Niall

Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

Aborting a job, or restarting it, is perfectly safe and will lose no data.
As I said before, the difference lies in the fact that pausing does not
disrupt the document fetching and seeding schedules, while aborting will
disrupt these, and make everything start over schedule-wise.

Karl
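
The pause-versus-abort distinction above, together with the recrawl back-off described elsewhere in this thread for continuous jobs, can be sketched roughly as follows. This is only a toy model, not ManifoldCF code: the class name DocumentSchedule, its fields, the doubling factor, the interval values, and the choice to shrink the interval again when a document changes are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DocumentSchedule:
    """Toy model of one document's recrawl schedule in a continuous job.

    All names and numbers here are hypothetical; they are not taken from
    ManifoldCF itself.
    """
    initial_interval: float = 60.0   # assumed: seconds until the first recheck
    max_interval: float = 3600.0     # assumed: upper limit on the recheck interval
    growth: float = 2.0              # assumed: back-off factor for unchanged documents
    interval: float = 0.0
    next_check: float = 0.0

    def __post_init__(self) -> None:
        self.reset(now=0.0)

    def reset(self, now: float) -> None:
        """Abort + restart: the schedule starts over and the document is due at once."""
        self.interval = self.initial_interval
        self.next_check = now

    def record_check(self, now: float, changed: bool) -> None:
        """After a check, push the next one further out if nothing changed."""
        if changed:
            # Assumption: a changed document goes back to the shortest interval.
            self.interval = self.initial_interval
        else:
            self.interval = min(self.interval * self.growth, self.max_interval)
        self.next_check = now + self.interval

    def is_due(self, now: float) -> bool:
        """True once the scheduled recheck time has been reached."""
        return now >= self.next_check


if __name__ == "__main__":
    doc = DocumentSchedule()
    doc.record_check(now=0.0, changed=False)
    doc.record_check(now=120.0, changed=False)

    # Pause and resume: nothing in the schedule is touched, so the document
    # simply waits for its existing next_check time.
    print("due after pause/resume at t=200:", doc.is_due(200.0))

    # Abort and restart: the schedule is reset and the document is due immediately.
    doc.reset(now=200.0)
    print("due after abort/restart at t=200:", doc.is_due(200.0))
```

In this model a correctly working pause leaves next_check alone, which is why a resumed job would only re-run the data query once the document's scheduled time arrives, whereas a restart makes it due straight away.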



RE: Potential Issue with pausing jobs

Posted by "Colreavy, Niall" <Ni...@fmr.com.INVALID>.

Hi Karl,

Thanks for looking into that. In the interim, we are going to abort, rather than pause, the job to work around the issue. Just out of curiosity, what is the difference between aborting the job and pausing the job? We are a little concerned that there could be adverse effects from regularly aborting the job.

Thanks,

Niall


Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

I've attached a patch to the CONNECTORS-1242 ticket.

Karl



Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

I was able to reproduce this; CONNECTORS-1242.

Karl



Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

I'm interested in the time it is supposed to be processed, actually.

I'm trying to recreate your example here to see if I can get more
information.

Karl




RE: Potential Issue with pausing jobs

Posted by "Colreavy, Niall" <Ni...@fmr.com.INVALID>.

The document is in a state of 'Processed' and the status is 'Ready for processing'

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: 17 September 2015 5:28
To: dev
Subject: Re: Potential Issue with pausing jobs

When it is in the state after the job has resumed, can you do a Document
Status report and tell me what that says for your document?

Thanks,
Karl


On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> Thanks for that. I think the problem might be more fundamental. When I
> start my job and monitor the simple job history I can see the job doing
> things like:
>
> Run the seed query
> Run the data query
> Run the seed query
> Run the data query
>
> Etc.
>
> It continues to do this indefinitely from what I have observed. As soon as
> I pause and resume the job, all I can see in the simple job history is:
>
> Run the seed query
> Run the seed query
> Run the seed query
>
> It's like it's never going to run the data query again?
>
> Kind Regards,
>
> Niall
>
> -----Original Message-----
> From: Karl Wright [mailto:daddywri@gmail.com]
> Sent: 17 September 2015 4:53
> To: dev
> Subject: Re: Potential Issue with pausing jobs
>
> Hi Niall,
>
> A continuous job reseeds on a schedule, which you set as part of the job
> setup.  For a continuous job, if the document has been crawled, it will be
> recrawled again at a specific time in the future, and if at that time it
> hasn't changed, it will be scheduled for checking again even further out,
> up to a certain limit (also settable within the job).
>
> You can look at the document's schedule, by the way, using the "Document
> Status" report, and it should be pretty clear from that what should happen
> and when.
>
> When you abort the job and restart it, everything is reset, so the document
> will be checked immediately at that point, and relatively frequently for a
> while until the system figures out that the document isn't changing very
> rapidly.
>
> Thanks,
> Karl
>
>
>
>
>
>
> On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
> Niall.Colreavy@fmr.com.invalid> wrote:
>
> > Hi Karl,
> >
> > You'll have to forgive me if my answer is a bit uncertain but I am very
> > new to MCF. Just to clarify, I have a very simple job. For the JDBC
> > connector, I am literally just selecting 1 for the id, 'myurl' for the
> > url and 'mydata' for the data. So there is only ever 1 document being
> > processed.
> >
> > So to answer the questions:
> >
> > 1. There are 0 active documents on the queue.
> > 2. Single process
> > 3. Yes, this is a continuous crawl.
> >
> > Kind Regards,
> >
> > Niall
> >
> > -----Original Message-----
> > From: Karl Wright [mailto:daddywri@gmail.com]
> > Sent: 17 September 2015 4:27
> > To: dev
> > Subject: Re: Potential Issue with pausing jobs
> >
> > Hi Niall,
> >
> > Pausing and resuming a job should have no effects *other* than
> > reprioritization of the active documents on the queue, which if there
> are a
> > lot of them, may take some time.
> >
> > So let's ask some basic questions.  (1) How many active documents on your
> > queue? (2) What kind of synchronization are you using?  Is this single
> > process, or multiprocess?  (3) Is this a continuous crawl?
> >
> > >>>>>>
> > And on a side note, what is the difference between pausing a job and
> > aborting a job?
> > <<<<<<
> >
> > I can't fully answer that unless I know the characteristics of your job,
> > especially continuous crawl vs. crawl to completion.
> >
> > Karl
> >
> >
> > On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
> > Niall.Colreavy@fmr.com.invalid> wrote:
> >
> > > Hi,
> > >
> > > I am experimenting with pausing a job. The job has a simple JDBC
> > > connection and a null output connection. I was experimenting with
> pausing
> > > the job and I notice that when I resume the job, and monitor it's
> > progress
> > > in the simple history report, the job never seems to run the data query
> > any
> > > more. I can see that it runs the seed query but it doesn't progress to
> > the
> > > data query. If I abort the job and restart it, it does seem to start
> > > running the data query again.
> > >
> > > Can anyone explain this behaviour? And on a side note, what is the
> > > difference between pausing a job and aborting a job?
> > >
> > > Thanks,
> > >
> > > Niall
> > >
> >
>

Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

Once the job has resumed and is in that state, can you run a Document
Status report and tell me what it says for your document?

Thanks,
Karl


On Thu, Sep 17, 2015 at 12:16 PM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> Thanks for that. I think the problem might be more fundamental. When I
> start my job and monitor the simple job history I can see the job doing
> things like:
>
> Run the seed query
> Run the data query
> Run the seed query
> Run the data query
>
> Etc.
>
> It continues to do this indefinitely from what I have observed. As soon as
> I pause and resume the job, all I can see in the simple job history is:
>
> Run the seed query
> Run the seed query
> Run the seed query
>
> It's like it's never going to run the data query again?
>
> Kind Regards,
>
> Niall

RE: Potential Issue with pausing jobs

Posted by "Colreavy, Niall" <Ni...@fmr.com.INVALID>.

Hi Karl,

Thanks for that. I think the problem might be more fundamental. When I start my job and monitor the simple job history I can see the job doing things like:

Run the seed query
Run the data query
Run the seed query
Run the data query

Etc.

It continues to do this indefinitely from what I have observed. As soon as I pause and resume the job, all I can see in the simple job history is:

Run the seed query
Run the seed query
Run the seed query

It looks as though it is never going to run the data query again.

Kind Regards,

Niall

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: 17 September 2015 4:53
To: dev
Subject: Re: Potential Issue with pausing jobs

Hi Niall,

A continuous job reseeds on a schedule, which you set as part of the job
setup.  For a continuous job, if the document has been crawled, it will be
recrawled again at a specific time in the future, and if at that time it
hasn't changed, it will be scheduled for checking again even further out,
up to a certain limit (also settable within the job).

You can look at the document's schedule, by the way, using the "Document
Status" report, and it should be pretty clear from that what should happen
and when.

When you abort the job and restart it, everything is reset, so the document
will be checked immediately at that point, and relatively frequently for a
while until the system figures out that the document isn't changing very
rapidly.

Thanks,
Karl

Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

Hi Niall,

A continuous job reseeds on a schedule, which you set as part of the job
setup.  For a continuous job, once a document has been crawled, it will be
recrawled at a specific time in the future, and if it hasn't changed by
then, the next check is scheduled even further out, up to a certain limit
(also settable within the job).

You can look at the document's schedule, by the way, using the "Document
Status" report, and it should be pretty clear from that what should happen
and when.

When you abort the job and restart it, everything is reset, so the document
will be checked immediately at that point, and relatively frequently for a
while until the system figures out that the document isn't changing very
rapidly.

Thanks,
Karl
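
The rescheduling behaviour described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not ManifoldCF's actual algorithm: the function names, the doubling factor, and the reset-on-change rule are all hypothetical. It shows an unchanged document being checked at growing intervals up to a limit, and a restart (modelled as having no previous interval) putting the document back on the shortest interval.

```python
from typing import List, Optional


def next_interval(previous: Optional[float], changed: bool,
                  initial: float, maximum: float,
                  growth: float = 2.0) -> float:
    """Return the wait before the next check of one document.

    previous=None models a fresh start (for example after abort and
    restart). The growth factor and the reset-on-change rule are
    assumptions made for this sketch.
    """
    if previous is None or changed:
        return initial
    # Unchanged document: wait longer, but never beyond the limit.
    return min(previous * growth, maximum)


def simulate(changes: List[bool], initial: float, maximum: float) -> List[float]:
    """Return the times at which the document is checked."""
    now = 0.0
    previous: Optional[float] = None
    times = []
    for changed in changes:
        times.append(now)
        previous = next_interval(previous, changed, initial, maximum)
        now += previous
    return times


check_times = simulate([False] * 5, initial=60, maximum=480)
```

With these made-up numbers, an unchanged document is checked at widening gaps (60, 120, 240, then 480 seconds), which is why a long-running continuous job can appear idle for a document while the Document Status report still shows a future check time.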

On Thu, Sep 17, 2015 at 11:38 AM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi Karl,
>
> You'll have to forgive me if my answer is a bit uncertain but I am very
> new to MCF. Just to clarify, I have a very simple job. For the JDBC
> connector, I am literally just selecting 1 for the id, 'myurl' for the url
> and 'mydata' for the data. So there is only ever 1 document being processed.
>
> So to answer the questions:
>
> 1. There are 0 active documents on the queue.
> 2. Single process
> 3. Yes, this is a continuous crawl.
>
> Kind Regards,
>
> Niall

RE: Potential Issue with pausing jobs

Posted by "Colreavy, Niall" <Ni...@fmr.com.INVALID>.

Hi Karl,

You'll have to forgive me if my answer is a bit uncertain, but I am very new to MCF. Just to clarify, I have a very simple job. For the JDBC connector, I am literally just selecting 1 for the id, 'myurl' for the URL and 'mydata' for the data, so there is only ever one document being processed.

So to answer the questions:

1. There are 0 active documents on the queue.
2. Single process
3. Yes, this is a continuous crawl.

Kind Regards,

Niall
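
As a rough illustration of the test setup described above, the two constant-valued queries might look like the sketch below. The query text, the `run` helper, and the use of an in-memory SQLite database are assumptions made so the example is self-contained; the real JDBC connector queries use connector-specific placeholders that are not reproduced here, and the seed query shown is a guess at what "selecting 1 for the id" means for seeding.

```python
import sqlite3

# Hypothetical stand-ins for the job's seed and data queries.
SEED_QUERY = "SELECT 1 AS id"
DATA_QUERY = "SELECT 1 AS id, 'myurl' AS url, 'mydata' AS data"


def run(query):
    """Run a query against a throwaway in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        return conn.execute(query).fetchall()
    finally:
        conn.close()


seed_rows = run(SEED_QUERY)   # one row, so one document id is seeded
data_rows = run(DATA_QUERY)   # one row carrying id, url and data
```

Because both queries return a single constant row, the job only ever has one document to schedule, which makes it easy to see in the history whether the data query ran.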

-----Original Message-----
From: Karl Wright [mailto:daddywri@gmail.com] 
Sent: 17 September 2015 4:27
To: dev
Subject: Re: Potential Issue with pausing jobs

Hi Niall,

Pausing and resuming a job should have no effects *other* than
reprioritization of the active documents on the queue, which if there are a
lot of them, may take some time.

So let's ask some basic questions.  (1) How many active documents on your
queue? (2) What kind of synchronization are you using?  Is this single
process, or multiprocess?  (3) Is this a continuous crawl?

>>>>>>
And on a side note, what is the difference between pausing a job and
aborting a job?
<<<<<<

I can't fully answer that unless I know the characteristics of your job,
especially continuous crawl vs. crawl to completion.

Karl

Re: Potential Issue with pausing jobs

Posted by Karl Wright <da...@gmail.com>.

Hi Niall,

Pausing and resuming a job should have no effects *other* than
reprioritization of the active documents on the queue, which, if there are
a lot of them, may take some time.

So let's ask some basic questions.  (1) How many active documents on your
queue? (2) What kind of synchronization are you using?  Is this single
process, or multiprocess?  (3) Is this a continuous crawl?

>>>>>>
And on a side note, what is the difference between pausing a job and
aborting a job?
<<<<<<

I can't fully answer that unless I know the characteristics of your job,
especially continuous crawl vs. crawl to completion.

Karl

On Thu, Sep 17, 2015 at 11:07 AM, Colreavy, Niall <
Niall.Colreavy@fmr.com.invalid> wrote:

> Hi,
>
> I am experimenting with pausing a job. The job has a simple JDBC
> connection and a null output connection. I was experimenting with pausing
> the job and I notice that when I resume the job, and monitor its progress
> in the simple history report, the job never seems to run the data query any
> more. I can see that it runs the seed query but it doesn't progress to the
> data query. If I abort the job and restart it, it does seem to start
> running the data query again.
>
> Can anyone explain this behaviour? And on a side note, what is the
> difference between pausing a job and aborting a job?
>
> Thanks,
>
> Niall
>