Posted to dev@manifoldcf.apache.org by Minoru Osuka <mi...@gmail.com> on 2013/05/29 14:18:49 UTC

How to get an OutputSpecification object in addOrReplaceDocument or removeDocument ?

Hi,

I would like to use an OutputSpecification object in addOrReplaceDocument or removeDocument.
Could you advise me on how to get it in those methods?

Thanks,
Minoru


Minoru Osuka
minoru.osuka@gmail.com





Re: How to get an OutputSpecification object in addOrReplaceDocument or removeDocument ?

Posted by Minoru Osuka <mi...@gmail.com>.
Hi, Karl

I'll try it. 
Thank you for your help.

Thanks, 
Minoru



Minoru Osuka
minoru.osuka@gmail.com






Re: How to get an OutputSpecification object in addOrReplaceDocument or removeDocument ?

Posted by Karl Wright <da...@gmail.com>.
Hi Minoru,

The method you want to implement is:

  /** Get an output version string, given an output specification.  The
  * output version string is used to uniquely describe the pertinent details
  * of the output specification and the configuration, to allow the Connector
  * Framework to determine whether a document will need to be output again.
  * Note that the contents of the document cannot be considered by this
  * method, and that a different version string (defined in
  * IRepositoryConnector) is used to describe the version of the actual
  * document.
  *
  * This method presumes that the connector object has been configured, and
  * it is thus able to communicate with the output data store should that be
  * necessary.
  *@param spec is the current output specification for the job that is doing
  * the crawling.
  *@return a string, of unlimited length, which uniquely describes output
  * configuration and specification in such a way that if two such strings
  * are equal, the document will not need to be sent again to the output data
  * store.
  */
  public String getOutputDescription(OutputSpecification spec)
    throws ManifoldCFException, ServiceInterruption;


In this method, you should pack into your string anything you will need to
use in addOrReplaceDocument or removeDocument.  It is done this way because
otherwise it is far too easy to inadvertently make your output connector's
behavior depend on the output specification without properly setting the
output description, which is how ManifoldCF keeps track of whether indexing
needs to be redone when the output specification changes.

The base class (org.apache.manifoldcf.core.connector.BaseConnector) has
methods in it which should assist you in this.
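
To illustrate the packing idea, here is a minimal sketch in plain Java.  It
does not use the BaseConnector helpers (their signatures are not shown in
this thread); OutputDescriptionSketch, pack and unpack, and the setting names
"collection" and "commitWithin" are all hypothetical stand-ins.  It assumes
the settings you need can be reduced to non-null string pairs.  Keys are
sorted so that equal settings always give equal strings, and each field is
length-prefixed so that values may contain any character.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not a ManifoldCF class.
final class OutputDescriptionSketch {

  /** Pack settings into one deterministic string (keys in sorted order). */
  static String pack(Map<String, String> settings) {
    StringBuilder sb = new StringBuilder();
    // TreeMap sorts the keys, so equal settings always yield equal strings.
    for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
      appendField(sb, e.getKey());
      appendField(sb, e.getValue());
    }
    return sb.toString();
  }

  /** Recover the settings from a string produced by pack(). */
  static Map<String, String> unpack(String description) {
    Map<String, String> settings = new TreeMap<>();
    int[] pos = {0}; // read cursor shared with readField
    while (pos[0] < description.length()) {
      String key = readField(description, pos);
      String value = readField(description, pos);
      settings.put(key, value);
    }
    return settings;
  }

  // Writes "<length>:<text>", so the text itself may contain ':' safely.
  private static void appendField(StringBuilder sb, String text) {
    sb.append(text.length()).append(':').append(text);
  }

  // Reads one "<length>:<text>" field and advances the cursor past it.
  private static String readField(String s, int[] pos) {
    int colon = s.indexOf(':', pos[0]);
    int len = Integer.parseInt(s.substring(pos[0], colon));
    int start = colon + 1;
    pos[0] = start + len;
    return s.substring(start, pos[0]);
  }

  public static void main(String[] args) {
    Map<String, String> settings = new TreeMap<>();
    settings.put("collection", "docs");   // assumed setting name
    settings.put("commitWithin", "5000"); // assumed setting name
    String description = pack(settings);
    System.out.println(description);
    System.out.println(unpack(description));
  }
}
```

In a connector along these lines, getOutputDescription would build the map
from the specification and return pack(...), while addOrReplaceDocument and
removeDocument would call unpack(...) on the description string instead of
consulting the specification directly.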

If you find that it is a lot more convenient to have both the string and
the specification, as processDocuments does in the IRepositoryConnector
interface, I am open to adding that to the IOutputConnector interface.  But
please also remember that the output specification may have changed since
the output description was made, and that is a potential problem with
having both.  (This is true mainly of the removeDocument method.)

Karl


