Posted to user@manifoldcf.apache.org by Marc Emery <ma...@valtech.com> on 2016/10/03 14:40:36 UTC

Custom Transfo Connector - Strange behaviour

Hi,

First of all, thanks for this amazing framework!

I'm running ManifoldCF 2.4 in a command-driven, multi-process setup, with a custom transformation connector deployed in /connector-lib.
Once it is registered, I add the connector as the first pipeline stage after a web connector. Everything runs fine the first time:

Start Time               Activity                  Identifier           Result Code  Bytes  Time
10-03-2016 14:35:01.171  document ingest (Solr)    https://library....  OK               0    11
10-03-2016 14:35:01.162  extract [transfo tika]    https://library...   OK               0     3
10-03-2016 14:35:01.151  enhance [transfo biblio]  https://library...   ACCEPTED         0    35
10-03-2016 14:35:01.150  process                   https://library....  OK           12815    38
10-03-2016 14:35:00.009  fetch                     https://library...   200          12815  1136



But on subsequent runs, processing of each URL stops after a successful fetch, without reaching the downstream connectors.

Start Time               Activity  Identifier          Result Code  Bytes  Time
10-03-2016 16:06:01.085  fetch     https://library...  200          13992  1250
10-03-2016 16:05:56.084  fetch     https://library...  200          15505  1090
10-03-2016 16:05:51.084  fetch     https://library...  200          12876   922



I can’t see any errors in the logs.

How can I debug this? Thanks for your help.


Regards
marc

RE: Custom Transfo Connector - Strange behaviour

Posted by Marc Emery <ma...@valtech.com>.
Hi Karl,

You're right: removing the associated records forced the complete pipeline to run.
I will investigate this tomorrow and keep you informed.

Thanks a lot
marc



From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Monday, October 3, 2016 16:57
To: user@manifoldcf.apache.org
Subject: Re: Custom Transfo Connector - Strange behaviour

Hi Marc,

Sounds like you are running into the incremental nature of the platform.

The framework keeps track of a "version string" for each document from each connector involved in the pipeline.  If the version string differs, then the framework knows that it must continue pushing the document down the pipeline.  If not, then the framework may conclude that it is unnecessary to continue.

I would look at how other similar transformation connectors handle the version string that they return.  I suspect that your code may be missing a subtlety there.  You can also confirm this picture by going to the output connection's view page and clicking the appropriate "forget" button and running the job again. If you see ingestions, you will know that you have connector problems that prevent MCF from doing its incremental logic properly.

Please let me know what you find.

Thanks,
Karl




Re: Custom Transfo Connector - Strange behaviour

Posted by Karl Wright <da...@gmail.com>.
Hi Marc,

Sounds like you are running into the incremental nature of the platform.

The framework keeps track of a "version string" for each document from each
connector involved in the pipeline.  If the version string differs, then
the framework knows that it must continue pushing the document down the
pipeline.  If not, then the framework may conclude that it is unnecessary
to continue.
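
To make the idea concrete, here is a toy model of that decision in Java. It is only a sketch under assumptions: `IncrementalModel`, `combine`, `needsProcessing` and `forgetAll` are hypothetical names made up for illustration, not ManifoldCF classes, and the real framework's bookkeeping is certainly more involved.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model only: hypothetical names, not ManifoldCF code.
final class IncrementalModel {
    // Last combined version string recorded per document identifier.
    private final Map<String, String> recorded = new HashMap<>();

    // Combine the per-stage version strings into one comparable string.
    // Length-prefixing each part keeps ("ab", "c") distinct from ("a", "bc").
    static String combine(String... stageVersions) {
        StringBuilder sb = new StringBuilder();
        for (String v : stageVersions) {
            sb.append(v.length()).append(':').append(v);
        }
        return sb.toString();
    }

    // Records the new version and returns true if the document
    // has to be pushed down the pipeline again.
    boolean needsProcessing(String documentId, String combinedVersion) {
        String previous = recorded.put(documentId, combinedVersion);
        return !combinedVersion.equals(previous);
    }

    // Models a "forget" action: drop everything that was recorded.
    void forgetAll() {
        recorded.clear();
    }

    public static void main(String[] args) {
        IncrementalModel model = new IncrementalModel();
        String v1 = combine("web:doc-v1", "biblio:cfg-A", "tika:cfg-B");
        System.out.println(model.needsProcessing("doc1", v1)); // true  (first run)
        System.out.println(model.needsProcessing("doc1", v1)); // false (nothing changed)

        String v2 = combine("web:doc-v1", "biblio:cfg-A2", "tika:cfg-B");
        System.out.println(model.needsProcessing("doc1", v2)); // true  (one stage changed)

        model.forgetAll();
        System.out.println(model.needsProcessing("doc1", v2)); // true  (records forgotten)
    }
}
```

In this model, a stage that returns the same version string on every run, even though its behaviour changed, makes the second call return false, and the document stops right after the fetch.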

I would look at how other similar transformation connectors handle the
version string that they return.  I suspect that your code may be missing a
subtlety there.  You can also confirm this picture by going to the output
connection's view page and clicking the appropriate "forget" button and
running the job again. If you see ingestions, you will know that you have
connector problems that prevent MCF from doing its incremental logic
properly.
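
One subtlety that could be involved, sketched here under assumptions: a connector's version string should be a deterministic function of everything that affects its output, so that the same configuration always yields the same string and any change yields a different one. The helper below is hypothetical (`VersionStrings.fromConfig` is not a ManifoldCF API) and assumes the configuration is a flat map of non-null strings.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of ManifoldCF.
final class VersionStrings {
    // Build a version string from configuration values.
    // Keys are sorted so insertion order cannot change the result.
    // Assumes keys and values are non-null.
    static String fromConfig(Map<String, String> config) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(config).entrySet()) {
            append(sb, e.getKey());
            append(sb, e.getValue());
        }
        return sb.toString();
    }

    // Length-prefix each piece so adjacent values cannot run together.
    private static void append(StringBuilder sb, String s) {
        sb.append(s.length()).append(':').append(s);
    }

    public static void main(String[] args) {
        Map<String, String> a = new LinkedHashMap<>();
        a.put("field", "title");
        a.put("lang", "fr");

        Map<String, String> b = new LinkedHashMap<>();
        b.put("lang", "fr");
        b.put("field", "title");

        Map<String, String> c = new LinkedHashMap<>(a);
        c.put("lang", "en");

        System.out.println(fromConfig(a).equals(fromConfig(b))); // true:  same settings
        System.out.println(fromConfig(a).equals(fromConfig(c))); // false: one setting changed
    }
}
```

Comparing what your connector returns against something like this may help spot a version string that is empty, constant, or dependent on ordering.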

Please let me know what you find.

Thanks,
Karl

