Posted to dev@manifoldcf.apache.org by Rafa Haro <rh...@apache.org> on 2014/07/01 19:04:51 UTC

Configuration Management at Transformation Connectors

Hi guys,

I'm trying to develop my first Transformation Connector. Before starting 
to code, I first tried to read enough documentation, and I have also 
studied the Tika extractor as a transformation connector example. 
Currently, I'm just trying to implement an initial version of my 
connector, starting with something simple so I can complicate things a 
little bit later. The first problem I'm facing is configuration 
management, where I'm probably missing something. In my case, I need a 
fixed configuration when creating an instance of the connector and an 
extended configuration per job. Let's say that the connector 
configuration has to set up a service, and the job configuration will 
define how the service should work for each job. With both 
configurations, I need to create an object which is expensive to 
instantiate. Here is where the doubts arise:

1. I would like to initialize the configuration object only once per job 
execution. Because the configuration is not supposed to change during a 
job execution, I would like to be able to take the configuration 
parameters from the ConfigParams and Specification objects and create a 
single instance of my configuration object.

2. The getPipelineDescription method is quite confusing for me. In the 
Tika Extractor, this method is used to pack the configuration of the 
Tika processor into a string. Then this string is unpacked again in the 
addOrReplaceDocumentWithException method to read the configuration. My 
question is: why? As far as I understand, the configuration can't change 
during the job execution, and according to the documentation "the 
contents of the document cannot be considered by this method, and that a 
different version string (defined in IRepositoryConnector) is used to 
describe the version of the actual document". So, if only configuration 
data can be used to create the output version string, this version 
string could probably be checked by the system once before starting the 
job rather than produced and checked per document, because basically all 
the documents are going to produce exactly the same output version 
string. I'm probably missing something but, looking at the Tika 
Transformation connector for example, it seems pretty clear that there 
would be no difference between the output version strings of all the 
documents, because it uses only configuration data to create the string.

3. In the addOrReplaceDocumentWithException method, why is the 
pipelineDescription passed as a parameter instead of the connector 
Specification, to make it easier for the developer to access the 
configuration without marshalling and unmarshalling it?

4. Is there a way to reuse a single configuration object per job 
execution? In my Output connector, I used to initialize my custom stuff 
in the connect method (I'm not sure if this strategy is valid anyway), 
but for Transformation connectors I'm not even sure this method is 
called.

Thanks a lot in advance for your help. Please note that these questions 
are of course not intended as criticism. This mail is just a dump of 
doubts that will probably help me to better understand the workflows in 
ManifoldCF.

Re: Configuration Management at Transformation Connectors

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

El 02/07/14 13:17, Karl Wright escribió:
> Hi Rafa,
>
> Without a concrete example, you won't get a concrete recommendation.  But
> in general please understand that the architecture is what it is, for very
> good reasons, and if you want your connector to work within it, you need to
> play by the rules.

Of course, I just wanted to be sure about the options. I really 
appreciate your help and your clarifications, and I hope that the use 
cases of the developers integrating ManifoldCF can also help the 
community to improve it by addressing (or not :->) the proposed issues. 
From my side, I'm going to further investigate how the framework works 
at the code level to see if I can extend the Transformation connector 
workflow to allow resource initialization from the connector 
Configuration and the Output Specification.

Thanks,
Rafa

Re: Configuration Management at Transformation Connectors

Posted by Karl Wright <da...@gmail.com>.
Hi Rafa,

Without a concrete example, you won't get a concrete recommendation.  But
in general please understand that the architecture is what it is, for very
good reasons, and if you want your connector to work within it, you need to
play by the rules.

Thanks,
Karl




Re: Configuration Management at Transformation Connectors

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

El 02/07/14 12:37, Karl Wright escribió:
> Hi Rafa,
>
> If there are a limited number of read-only models, then you can build a
> static hashmap between model data and the file name, and populate that on
> demand.
It is a possibility, although the models stuff was just an example. The 
generic question is: how can resources be reused in a Transformation 
Connector? Or, what is the proper place in the implementation to 
initialize a resource that has to be reused while a job is running?

I'm still missing this because I still don't know where to access the 
Specification object apart from the getPipelineDescription method. So, 
if there is no other point in the architecture, I'm afraid I have to 
check whether the initialization has been done and even synchronize it 
myself.
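
Something like the following is what I'm afraid I would end up writing 
by hand (just a rough sketch; all the names are made up):

  // Per-instance lazy initialization, synchronized by hand because pooled
  // connector instances can be used by several worker threads.
  public class LazyResourceHolder
  {
    private Object resource;        // the expensive object
    private String initializedSpec; // packed spec the resource was built from

    public synchronized Object getResource(String packedSpec)
    {
      // Rebuild only when uninitialized or when the job specification changed.
      if (resource == null || !packedSpec.equals(initializedSpec))
      {
        resource = loadExpensiveResource(packedSpec);
        initializedSpec = packedSpec;
      }
      return resource;
    }

    private Object loadExpensiveResource(String packedSpec)
    {
      // Placeholder for the costly construction (e.g. loading a binary model).
      return new Object();
    }
  }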

Thanks for your help so far Karl


Re: Configuration Management at Transformation Connectors

Posted by Karl Wright <da...@gmail.com>.
Hi Rafa,

If there are a limited number of read-only models, then you can build a
static hashmap between model data and the file name, and populate that on
demand.  I would write a class (ModelRepository or some such) that
abstracts the hashmap and the loading process.  OR, better yet, you can
include the models as resources in your connector jar.  (It's better
because it's limited only to what you supply in advance, and so the memory
constraints are known in advance and cannot grow.)
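
A minimal sketch of what I mean (the model type and the loading code are
placeholders):

  import java.util.concurrent.ConcurrentHashMap;
  import java.util.concurrent.ConcurrentMap;

  // Static, on-demand cache from model file name to the loaded model.
  public class ModelRepository
  {
    private static final ConcurrentMap<String,Object> models =
      new ConcurrentHashMap<String,Object>();

    private ModelRepository()
    {
    }

    // Loads each model at most once per JVM; later calls reuse the cached copy.
    public static Object getModel(String fileName)
    {
      return models.computeIfAbsent(fileName, ModelRepository::load);
    }

    private static Object load(String fileName)
    {
      // Placeholder: deserialize the binary model from disk, or from a
      // resource bundled inside the connector jar.
      return new Object();
    }
  }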

Thanks,
Karl



Re: Configuration Management at Transformation Connectors

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

El 02/07/14 12:13, Karl Wright escribió:
> Hi Rafa,
>
> First, some more basics.
>
> - Configuration information is about "how" documents are indexed
> - Specification information is about "what" documents or metadata is indexed
>
> Configuration is specified as part of the connection definition;
> specification is edited as part of the job.  Not all connectors have
> configuration information; Tika doesn't, for instance.  Not all connectors
> have specification information either.
Good. This is how I already understood it.
>
> The pooling is specific to Configuration information.  Connections which
> have the same Configuration can be used in multiple jobs and have different
> Specification information.  That is why you really should not be creating a
> design where there's a large cost penalty in interpreting your connector's
> Specification information; it's meant to be relatively light (and so far it
> has been, in all connectors I've ever heard of.)  Furthermore, and this is
>> really really key, the specification information MUST be digestible to a
> simple string, because that is how ManifoldCF determines whether changes
> have occurred or not from one run to another.  This, too, must be easy and
> fast to do, or your connector will be quite slow.
The specification information can be very easily digestible, but in some 
cases it can be very difficult to manage. Let me give an example: Named 
Entity Recognition models are binary models that you need to load in 
memory in order to use them for extracting entities from documents' 
content. In terms of configuration, the model is simply a path to a 
file, so a single string. Also, I could of course have different models 
for different jobs. But I want to apply the same model to all the 
documents processed in a single job. So, I just need to load the model 
in memory only once for each job execution. And my real problem right 
now is that I can't find any proper place in the design of the 
Transformation Connector to do such a unique initialization, because the 
path to the model is hosted in the Specification object, but that object 
is only passed to the getPipelineDescription method, which is called for 
every document processed. So, I can probably use the idiom I included in 
the last email (check if the model object is null every time in that 
method), but I was wondering if there is a more appropriate place to do it.

Thanks,
Rafa


Re: Configuration Management at Transformation Connectors

Posted by Karl Wright <da...@gmail.com>.
Hi Rafa,

First, some more basics.

- Configuration information is about "how" documents are indexed
- Specification information is about "what" documents or metadata is indexed

Configuration is specified as part of the connection definition;
specification is edited as part of the job.  Not all connectors have
configuration information; Tika doesn't, for instance.  Not all connectors
have specification information either.

The pooling is specific to Configuration information.  Connections which
have the same Configuration can be used in multiple jobs and have different
Specification information.  That is why you really should not be creating a
design where there's a large cost penalty in interpreting your connector's
Specification information; it's meant to be relatively light (and so far it
has been, in all connectors I've ever heard of.)  Furthermore, and this is
really really key, the specification information MUST be digestible to a
simple string, because that is how ManifoldCF determines whether changes
have occurred or not from one run to another.  This, too, must be easy and
fast to do, or your connector will be quite slow.
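
For illustration, a rough sketch of what digesting a specification to a
string might look like (the field names are invented; the Tika extractor
does something similar with a small packer class).  The essential
property is determinism: the same settings must always yield the same
string.

  import java.util.ArrayList;
  import java.util.Collections;
  import java.util.List;

  public class SpecDigest
  {
    // Produce a canonical, unambiguous string from the job's settings.
    public static String pack(List<String> fieldMappings, boolean keepAllMetadata)
    {
      List<String> sorted = new ArrayList<String>(fieldMappings);
      Collections.sort(sorted);  // order-independence keeps the digest stable
      StringBuilder sb = new StringBuilder();
      sb.append(sorted.size()).append(':');
      for (String m : sorted)
      {
        // Length-prefix each item so concatenation can never be ambiguous.
        sb.append(m.length()).append(':').append(m);
      }
      sb.append(keepAllMetadata ? '+' : '-');
      return sb.toString();
    }
  }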

Thanks,
Karl




Re: Configuration Management at Transformation Connectors

Posted by Rafa Haro <rh...@apache.org>.
Hi Karl,

First of all, thanks for your answers. I will read the proposed chapters 
but, please, find inline further questions:

El 01/07/14 19:21, Karl Wright escribió:
> Hi Rafa,
>
> Let me answer one question at a time.
>
> bq. I would like to initialize the configuration object only once per job
> execution. Because the configuration is not supposed to be changed during a
> job execution, I would like to be able to take the configuration parameters
> from ConfigParams and from Specification objects and create a unique
> instance of my configuration object.
>
> Connection instances are all pooled and reused.  You need to read about
> their lifetime.  ManifoldCF in Action chapter 6 (IIRC) is where you will
> find this: https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/
> You should also be aware that there is *no* prohibition on configuration or
> specification changing during a job run; the framework is structured,
> however, so that you don't need to worry about this when writing your
> connector.
I understand this, Karl. And precisely because of the pooling, it is 
hard for me to believe that, during a job execution, the system is able 
to stop all the threads and freeze the execution in order to 
reinitialize all the connector instances in the pool if the user changes 
the configuration. If this does not actually happen, then in 
implementations like the current Tika Extractor, for example, the 
getPipelineDescription method will always return exactly the same output 
version for all the crawled documents in the current job. I understand 
the need to check the output version from job to job, but not per 
document within a single job.

Also, there is something I'm completely missing: what is the output 
specification, and how does it differ from the connector and job 
configurations?
>
>
> bq. The getPipelineDescription method is quite confusing for me...
>
> Getting a version string and indexing a document may well be separated in
> time, and since it is possible for things to change in-between, the version
> string should be the basis of decisions your connector is making about how
> to do things.  The version string is what gets actually stored in the DB,
> so any differences will be picked up on later crawls.
>
> FWIW, the IRepositoryConnector interface predates the decision to not
> include a document specification for every method call, and that has
> persisted for backwards compatibility reasons, although in MCF 2.0 that may
> change.  The current design enforces proper connector coding.
>
> bq. In the addOrReplaceDocumentWithException, why is the
> pipelineDescription passed by parameter instead of the
> connector Specification...?
>
> See answer above.
>
>
> bq. Is there a way to reuse a single configuration object per job
> execution? In the Output processor connector, I used to initialize my
> custom stuff in the connect method (I'm not sure if this strategy is valid
> anyway), but for the Transformation connectors I'm not even sure if this
> method is called.
>
> You really aren't supposed to have a *single* object, but rather one per
> connection instance.  Connection instances are long-lived, remember.  That
> object should also expire eventually if there is no use.  There's a
> particular design pattern you should try to adhere to, which is to have a
> getSession() method that sets up your long-lived member object, and have
> the poll() method free it after a certain amount of inactivity.  Pretty
> much all connectors these days use this pattern; for a modern
> implementation, have a look at the Jira connector.
Yes yes, of course. There would be a configuration object bound to each 
connector instance, of course. The problem I'm facing is that I want to 
create this object only once (it could perfectly well be a member of the 
connector), and I can't find a proper way/place to do it, because I need 
both the configuration of the connector (ConfigParams, which is always 
available, so that is fine) and the Specification object (which seems to 
contain the job configuration data), which as far as I know is only 
passed to the getPipelineDescription method. I would prefer not to do 
the initialization in that method, because it is called for each 
processed document, and I would like to avoid the typical hack like 
"if(customConfig == null) customConfig = new CustomConfig(params, 
specification);"
>
>
> FWIW, there's no MCF in Action chapter on transformation connectors yet,
> but they are quite similar to output connectors in many respects, so
> reading Chapter 9 may help a bit.
>
> Thanks,
> Karl
Thanks to you Karl,

Cheers,
Rafa


Re: Configuration Management at Transformation Connectors

Posted by Karl Wright <da...@gmail.com>.
I should also clarify that another reason that the current
pipelineDescription string design is important is because otherwise there
would be a possibility of something which affects indexing or
transformation *not* getting properly included in the version string.  Such
an omission would, of course, break incremental indexing, because there
would be no way to detect that a pertinent change had taken place.
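
Schematically (this is not framework code, just an illustration of the
contract):

  public class IncrementalCheck
  {
    // A document is re-processed only when its version string differs from
    // the one stored in the database by the previous crawl.
    public static boolean needsReindex(String storedVersion, String currentVersion)
    {
      return storedVersion == null || !storedVersion.equals(currentVersion);
    }

    public static void main(String[] args)
    {
      // A pertinent setting changed between crawls...
      String stored = "lowercase=true";
      String current = "lowercase=false";
      // ...so the document is reprocessed.  Had that setting been omitted
      // from the version string, the change would be invisible and the
      // document would wrongly be skipped.
      System.out.println(needsReindex(stored, current));  // prints: true
    }
  }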

Thanks,
Karl



Re: Configuration Management at Transformation Connectors

Posted by Karl Wright <da...@gmail.com>.
Hi Rafa,

Let me answer one question at a time.

bq. I would like to initialize the configuration object only once per job
execution. Because the configuration is not supposed to be changed during a
job execution, I would like to be able to take the configuration parameters
from ConfigParams and from Specification objects and create a unique
instance of my configuration object.

Connection instances are all pooled and reused.  You need to read about
their lifetime.  ManifoldCF in Action chapter 6 (IIRC) is where you will
find this: https://manifoldcfinaction.googlecode.com/svn/trunk/pdfs/
You should also be aware that there is *no* prohibition on configuration or
specification changing during a job run; the framework is structured,
however, so that you don't need to worry about this when writing your
connector.


bq. The getPipelineDescription method is quite confusing for me...

Getting a version string and indexing a document may well be separated in
time, and since it is possible for things to change in-between, the version
string should be the basis of decisions your connector is making about how
to do things.  The version string is what gets actually stored in the DB,
so any differences will be picked up on later crawls.

FWIW, the IRepositoryConnector interface predates the decision to not
include a document specification for every method call, and that has
persisted for backwards compatibility reasons, although in MCF 2.0 that may
change.  The current design enforces proper connector coding.

bq. In the addOrReplaceDocumentWithException, why is the
pipelineDescription passed by parameter instead of the
connector Specification...?

See answer above.
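
To make that concrete: the connector packs everything it needs into the
version string once, and later unpacks that same string inside
addOrReplaceDocumentWithException instead of consulting the
Specification again.  A rough sketch of such an unpacker, using an
invented "count:len:item..." format:

  import java.util.ArrayList;
  import java.util.List;

  public class SpecUnpack
  {
    // Reverses a packed string of the form "count:len:item...len:item".
    public static List<String> unpack(String packed)
    {
      List<String> items = new ArrayList<String>();
      int firstColon = packed.indexOf(':');
      int count = Integer.parseInt(packed.substring(0, firstColon));
      int pos = firstColon + 1;
      for (int k = 0; k < count; k++)
      {
        int colon = packed.indexOf(':', pos);
        int len = Integer.parseInt(packed.substring(pos, colon));
        items.add(packed.substring(colon + 1, colon + 1 + len));
        pos = colon + 1 + len;
      }
      return items;
    }
  }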


bq. Is there a way to reuse a single configuration object per job
execution? In the Output processor connector, I used to initialize my
custom stuff in the connect method (I'm not sure if this strategy is valid
anyway), but for the Transformation connectors I'm not even sure if this
method is called.

You really aren't supposed to have a *single* object, but rather one per
connection instance.  Connection instances are long-lived, remember.  That
object should also expire eventually if there is no use.  There's a
particular design pattern you should try to adhere to, which is to have a
getSession() method that sets up your long-lived member object, and have
the poll() method free it after a certain amount of inactivity.  Pretty
much all connectors these days use this pattern; for a modern
implementation, have a look at the Jira connector.
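
In outline, the pattern looks like this (the type names and the timeout
are placeholders; see the Jira connector for the real thing):

  public class SessionPatternSketch
  {
    // Release the expensive member object after 5 minutes of inactivity.
    private static final long SESSION_EXPIRATION_MS = 300000L;

    private Object session = null;       // the long-lived, expensive object
    private long lastSessionFetch = -1L;

    // Call this at the top of every method that needs the session.
    protected Object getSession()
    {
      if (session == null)
      {
        session = new Object();  // placeholder for the expensive setup
      }
      lastSessionFetch = System.currentTimeMillis();
      return session;
    }

    // The framework calls poll() periodically; drop the session when idle.
    public void poll()
    {
      if (session != null &&
        System.currentTimeMillis() - lastSessionFetch > SESSION_EXPIRATION_MS)
      {
        session = null;  // allow the expensive object to be garbage collected
      }
    }
  }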


FWIW, there's no MCF in Action chapter on transformation connectors yet,
but they are quite similar to output connectors in many respects, so
reading Chapter 9 may help a bit.

Thanks,
Karl



