Posted to user@manifoldcf.apache.org by Jitu <ab...@gmail.com> on 2014/12/19 08:30:12 UTC

schedule information

Hi Karl,
Thanks for all your support. One of our customers needs job schedule
information to be sent to the output connector. Basically, the customer
wants to find out, through a Solr search, which files were indexed in a
given job run.

For example, if my job ran on 17 Dec 2014 at 11:23 AM, I would send a
unique string such as "JobName 17-12-2014 11:23" as part of the file
metadata to the Solr output connector. A Solr search on this string would
then return all the files indexed in that job run.

Please correct me if I am wrong, or suggest how to achieve this.

Thanks,
Jitu
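
The run tag described in this message could be built and turned into a Solr
query along the following lines. This is only a sketch under assumptions: the
class RunTag, its methods, and the field name job_run are hypothetical, and
nothing here is part of ManifoldCF or SolrJ.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of ManifoldCF: builds a per-run tag
// and a Solr phrase query that matches it.
final class RunTag {
  // Same layout as the "17-12-2014 11:23" example in the message.
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm");

  // Combine the job name and the run's start time into one string.
  static String build(String jobName, LocalDateTime startTime) {
    return jobName + " " + startTime.format(FORMAT);
  }

  // Build an exact-phrase query; "field" is whichever Solr field
  // the tag is indexed into (an assumption of this sketch).
  static String solrQuery(String field, String tag) {
    String escaped = tag.replace("\\", "\\\\").replace("\"", "\\\"");
    return field + ":\"" + escaped + "\"";
  }

  public static void main(String[] args) {
    String tag = build("JobName", LocalDateTime.of(2014, 12, 17, 11, 23));
    System.out.println(tag);                        // JobName 17-12-2014 11:23
    System.out.println(solrQuery("job_run", tag));  // job_run:"JobName 17-12-2014 11:23"
  }
}
```

For exact matching, an untokenized (string) field on the Solr side is
probably the better fit for a tag like this.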

Re: schedule information

Posted by Karl Wright <da...@gmail.com>.

Hi Jitu,

This is nothing like what I recommended for you to do.  I said to look in
WorkerThread.  Inside the ProcessActivity class, you will have access to
both the RepositoryDocument object and the IJobDescription object for that
job.

Karl
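
One way to picture this suggestion is a small helper that receives the job
and the document and adds a per-run field just before indexing. The sketch
below is self-contained and uses stand-in types: Job and Doc are hypothetical
placeholders for IJobDescription and RepositoryDocument, the field name
job_run is made up, and the job start time is assumed to be available as
epoch milliseconds interpreted in UTC.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Stand-in for the job description; holds only what this sketch needs.
final class Job {
  final String description;
  final long startTimeMillis;

  Job(String description, long startTimeMillis) {
    this.description = description;
    this.startTimeMillis = startTimeMillis;
  }
}

// Stand-in for the document that is about to be indexed.
final class Doc {
  final Map<String, String> fields = new LinkedHashMap<>();

  void addField(String name, String value) {
    fields.put(name, value);
  }
}

// Hypothetical helper that attaches the run tag to a document.
final class RunTagger {
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("dd-MM-yyyy HH:mm").withZone(ZoneOffset.UTC);

  // Adds "<job description> <start time>" under the assumed field name.
  static void tag(Job job, Doc doc) {
    String stamp = FORMAT.format(Instant.ofEpochMilli(job.startTimeMillis));
    doc.addField("job_run", job.description + " " + stamp);
  }

  public static void main(String[] args) {
    long start = LocalDateTime.of(2014, 12, 17, 11, 23)
        .toInstant(ZoneOffset.UTC).toEpochMilli();
    Doc doc = new Doc();
    tag(new Job("JobName", start), doc);
    System.out.println(doc.fields.get("job_run")); // JobName 17-12-2014 11:23
  }
}
```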


On Tue, Dec 23, 2014 at 7:31 AM, Jitu <ab...@gmail.com> wrote:

> Hi Karl,
>
> I checked the source code, and in IncrementalIngester.java, at line 555 in
> the checkFetchDocument() method, the forced metadata of the previous run is
> compared with that of the current run. If there is a change, the file is
> considered updated. So please advise: how can I send a parameter from the
> StartupThread class to the output connector when its value changes on every
> job execution?
>
> Thanks,
> Jitu

Re: schedule information

Posted by Jitu <ab...@gmail.com>.

Hi Karl,

I checked the source code, and in IncrementalIngester.java, at line 555 in
the checkFetchDocument() method, the forced metadata of the previous run is
compared with that of the current run. If there is a change, the file is
considered updated. So please advise: how can I send a parameter from the
StartupThread class to the output connector when its value changes on every
job execution?

Thanks,
Jitu
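
The behaviour described in this message can be illustrated with a toy model.
The sketch below is not the IncrementalIngester code: ChangeCheck and its
methods are hypothetical, and it simply assumes that forced metadata is
folded into the value compared between runs. Under that assumption, a
metadata value that changes on every run makes every document look updated.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Toy model (hypothetical, not ManifoldCF code) of change detection
// that includes forced metadata in the comparison between runs.
final class ChangeCheck {
  // Fold the document version and the forced metadata into one string.
  static String signature(String documentVersion, Map<String, String> forcedMetadata) {
    StringBuilder sb = new StringBuilder(documentVersion);
    // TreeMap gives a stable key order, so equal maps yield equal signatures.
    for (Map.Entry<String, String> e : new TreeMap<>(forcedMetadata).entrySet()) {
      sb.append('|').append(e.getKey()).append('=').append(e.getValue());
    }
    return sb.toString();
  }

  // A document is re-indexed when its signature differs from the previous run's.
  static boolean needsReindex(String previousSignature, String currentSignature) {
    return !previousSignature.equals(currentSignature);
  }

  public static void main(String[] args) {
    // Same document version, but a different per-run ID each time.
    String run1 = signature("v1", Collections.singletonMap("JOB_INSTANCE_ID", "1001"));
    String run2 = signature("v1", Collections.singletonMap("JOB_INSTANCE_ID", "1002"));
    // Same document version with a constant forced value.
    String fixed = signature("v1", Collections.singletonMap("JOB_NAME", "JobName"));

    System.out.println(needsReindex(run1, run2));   // true
    System.out.println(needsReindex(fixed, fixed)); // false
  }
}
```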

On Tue, Dec 23, 2014 at 5:32 PM, Jitu <ab...@gmail.com> wrote:

> Hi Karl,
>
> Thanks for your support. Here is what I tried. In the run method of
> StartupThread.java, I create a unique ID called instanceId and store it in
> the forced metadata, which is sent to the output connector. That part works
> fine. But when I re-run the same job, all files are crawled again every
> time. Is this because the forced metadata has changed? Is forced metadata
> used to check whether a file has been updated?
>
> code snippet:
>
>   // Generate a fresh ID for this job run.
>   final String instanceId = IDFactory.make(threadContext);
>   // Only now record the fact that we are trying to start the job.
>   connectionMgr.recordHistory(jobDescription.getConnectionName(),
>     null,connectionMgr.ACTIVITY_JOBSTART,null,
>     jobID.toString()+"("+jobDescription.getDescription()+")",null,instanceId,null);
>   // Replace any existing forced metadata with the per-run ID and save the job.
>   jobDescription.clearForcedMetadata();
>   jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
>   jobManager.save(jobDescription);
>
>
> Thanks,
> Jitu

Re: schedule information

Posted by Jitu <ab...@gmail.com>.

Hi Karl,

Thanks for your support. Here is what I tried. In the run method of
StartupThread.java, I create a unique ID called instanceId and store it in
the forced metadata, which is sent to the output connector. That part works
fine. But when I re-run the same job, all files are crawled again every
time. Is this because the forced metadata has changed? Is forced metadata
used to check whether a file has been updated?

code snippet:

  // Generate a fresh ID for this job run.
  final String instanceId = IDFactory.make(threadContext);
  // Only now record the fact that we are trying to start the job.
  connectionMgr.recordHistory(jobDescription.getConnectionName(),
    null,connectionMgr.ACTIVITY_JOBSTART,null,
    jobID.toString()+"("+jobDescription.getDescription()+")",null,instanceId,null);
  // Replace any existing forced metadata with the per-run ID and save the job.
  jobDescription.clearForcedMetadata();
  jobDescription.addForcedMetadataValue("JOB_INSTANCE_ID", instanceId);
  jobManager.save(jobDescription);


Thanks,
Jitu

On Mon, Dec 22, 2014 at 6:58 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Jitu,
>
> Your client's needs seem rather unusual, and will potentially be somewhat
> expensive performance-wise.  So unless I hear from others as well that this
> is a key feature, there's no point in contributing a patch.
>
> You will of course need to keep track of whatever changes you develop so
> that you can later upgrade to newer versions of MCF.
>
> Thanks,
> Karl

Re: schedule information

Posted by Karl Wright <da...@gmail.com>.

Hi Jitu,

Your client's needs seem rather unusual, and will potentially be somewhat
expensive performance-wise.  So unless I hear from others as well that this
is a key feature, there's no point in contributing a patch.

You will of course need to keep track of whatever changes you develop so
that you can later upgrade to newer versions of MCF.

Thanks,
Karl


On Mon, Dec 22, 2014 at 8:14 AM, Jitu <ab...@gmail.com> wrote:

> Hi Karl,
>
> Thanks for the quick reply and support. This is exactly what I was looking
> for. Thank you so much. If I modify WorkerThread.java, do I need to submit a
> patch for it?
>
> Thanks,
> Jitu

Re: schedule information

Posted by Jitu <ab...@gmail.com>.

Hi Karl,

Thanks for the quick reply and support. This is exactly what I was looking
for. Thank you so much. If I modify WorkerThread.java, do I need to submit a
patch for it?

Thanks,
Jitu

On Mon, Dec 22, 2014 at 4:12 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Jitu,
>
> I'm sorry for the miscommunication.  What I meant is that without any
> modifications, you can add the job's name as metadata for all documents
> indexed with the job.
>
> If you need to index hard-wired metadata for every job run, you will need
> to modify WorkerThread.java.  The IJobDescription object is readily
> available there, but you will also need to write a SQL query to obtain the
> job's start time.
>
> Karl

Re: schedule information

Posted by Karl Wright <da...@gmail.com>.

Hi Jitu,

I'm sorry for the miscommunication.  What I meant is that without any
modifications, you can add the job's name as metadata for all documents
indexed with the job.

If you need to index hard-wired metadata for every job run, you will need
to modify WorkerThread.java.  The IJobDescription object is readily
available there, but you will also need to write a SQL query to obtain the
job's start time.

Karl


On Mon, Dec 22, 2014 at 4:33 AM, Jitu <ab...@gmail.com> wrote:

> Hi Karl,
>
> Thanks for the quick reply and support. I have gone through the source code
> of "ForcedMetadataConnector.java" as well as the end-user documentation at
> http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster.
> It says we can add a string constant for every job run. My client, however,
> wants to know which files were crawled in each run of the job. To search for
> that, I need to send a unique ID for every job run as part of the metadata.
> This unique ID changes on every job run, so I cannot use
> ForcedMetadataConnector. You advised, "It's certainly possible to add the
> current job's start time field as hard-wired metadata." Please let me know
> how to achieve this.
>
> Thanks,
> Jitu

Re: schedule information

Posted by Jitu <ab...@gmail.com>.

Hi Karl,

Thanks for the quick reply and support. I have gone through the source code
of "ForcedMetadataConnector.java" as well as the end-user documentation at
http://manifoldcf.apache.org/release/trunk/en_US/end-user-documentation.html#metadataadjuster.
It says we can add a string constant for every job run. My client, however,
wants to know which files were crawled in each run of the job. To search for
that, I need to send a unique ID for every job run as part of the metadata.
This unique ID changes on every job run, so I cannot use
ForcedMetadataConnector. You advised, "It's certainly possible to add the
current job's start time field as hard-wired metadata." Please let me know
how to achieve this.

Thanks,
Jitu

On Fri, Dec 19, 2014 at 1:09 PM, Karl Wright <da...@gmail.com> wrote:

> Hi Jitu,
>
> You can certainly add a unique string associated with a job to every
> document using the Metadata Adjuster transformation connector (which of
> course can be the job name).  The time of indexing is already sent as a
> metadata field (can't remember which one off the top of my head, but I'm
> sure you can find it).  What you can't get, mainly because it basically has
> little meaning in MCF, is the time the job was started.  It's certainly
> possible to add the current job's start time field as hard-wired metadata,
> but I bet your client would prefer the actual time of indexing of the
> document anyhow.
>
> Thanks,
> Karl

Re: schedule information

Posted by Karl Wright <da...@gmail.com>.

Hi Jitu,

You can certainly add a unique string associated with a job to every
document using the Metadata Adjuster transformation connector (which of
course can be the job name).  The time of indexing is already sent as a
metadata field (can't remember which one off the top of my head, but I'm
sure you can find it).  What you can't get, mainly because it basically has
little meaning in MCF, is the time the job was started.  It's certainly
possible to add the current job's start time field as hard-wired metadata,
but I bet your client would prefer the actual time of indexing of the
document anyhow.

Thanks,
Karl

On Fri, Dec 19, 2014 at 2:30 AM, Jitu <ab...@gmail.com> wrote:
>
> Hi Karl,
>
> Thanks for all your support. One of our customers needs job schedule
> information to be sent to the output connector. Basically, the customer
> wants to find out, through a Solr search, which files were indexed in a
> given job run.
>
> For example, if my job ran on 17 Dec 2014 at 11:23 AM, I would send a
> unique string such as "JobName 17-12-2014 11:23" as part of the file
> metadata to the Solr output connector. A Solr search on this string would
> then return all the files indexed in that job run.
>
> Please correct me if I am wrong, or suggest how to achieve this.
>
> Thanks,
> Jitu
>