Posted to user@manifoldcf.apache.org by Marc Emery <ma...@valtech.com> on 2016/10/03 14:40:36 UTC
Custom Transfo Connector - Strange behaviour
Hi,
First of all, thanks for this amazing framework!
I’m running ManifoldCF 2.4 in the command-driven multi-process configuration, with a custom transformation connector deployed in /connector-lib.
Once it was registered, I added the connector to the job pipeline in first position, right after the web connector. Everything runs fine the first time:
Start time               Activity                  Identifier           Result code  Bytes  Time (ms)
10-03-2016 14:35:01.171  document ingest (Solr)    https://library....  OK           0      11
10-03-2016 14:35:01.162  extract [transfo tika]    https://library...   OK           0      3
10-03-2016 14:35:01.151  enhance [transfo biblio]  https://library...   ACCEPTED     0      35
10-03-2016 14:35:01.150  process                   https://library....  OK           12815  38
10-03-2016 14:35:00.009  fetch                     https://library...   200          12815  1136
But on subsequent runs, processing of each URL stops after a successful fetch, without reaching the downstream connectors:
Start time               Activity  Identifier          Result code  Bytes  Time (ms)
10-03-2016 16:06:01.085  fetch     https://library...  200          13992  1250
10-03-2016 16:05:56.084  fetch     https://library...  200          15505  1090
10-03-2016 16:05:51.084  fetch     https://library...  200          12876  922
I can’t see any errors in the logs.
How could I debug this? Thanks for your help.
Regards
marc
RE: Custom Transfo Connector - Strange behaviour
Posted by Marc Emery <ma...@valtech.com>.
Hi Karl,
You’re right: removing the associated records forced the documents through the complete pipeline.
I will investigate this further tomorrow and keep you informed.
Thanks a lot
marc
From: Karl Wright [mailto:daddywri@gmail.com]
Sent: Monday, October 3, 2016 16:57
To: user@manifoldcf.apache.org
Subject: Re: Custom Transfo Connector - Strange behaviour
Re: Custom Transfo Connector - Strange behaviour
Posted by Karl Wright <da...@gmail.com>.
Hi Marc,
Sounds like you are running into the incremental nature of the platform.
The framework keeps track of a "version string" for each document from each
connector involved in the pipeline. If the version string differs from the
one recorded on the previous run, then the framework knows that it must
continue pushing the document down the pipeline. If not, then the framework
may conclude that it is unnecessary to continue.
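To make this incremental behaviour concrete, here is a minimal Java model of it. It is a hypothetical sketch, not the ManifoldCF API: the class and method names (IncrementalPipelineModel, needsProcessing, recordProcessed, forgetAll) and the version values are illustrative, and the real bookkeeping is simplified to one combined version string per document. It shows how a second run can fetch a page and then stop: if the repository version and every stage's version match what was recorded, nothing is pushed downstream, while clearing the recorded versions (loosely analogous to the "forget" button) forces reprocessing.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/*
 * Hypothetical model of version-string based incremental processing.
 * This is NOT the ManifoldCF API; names are illustrative only.
 */
class IncrementalPipelineModel {
    // Document identifier -> combined version recorded after the last successful run.
    private final Map<String, String> recordedVersions = new HashMap<>();

    // Length-prefix every part ("<len>:<text>") so that different
    // combinations of versions can never produce the same combined string.
    static String combinedVersion(String repositoryVersion, List<String> stageVersions) {
        StringBuilder sb = new StringBuilder();
        appendPart(sb, repositoryVersion);
        for (String v : stageVersions) {
            appendPart(sb, v);
        }
        return sb.toString();
    }

    private static void appendPart(StringBuilder sb, String part) {
        sb.append(part.length()).append(':').append(part);
    }

    // True if the document must continue down the pipeline on this run.
    boolean needsProcessing(String documentId, String repositoryVersion, List<String> stageVersions) {
        String current = combinedVersion(repositoryVersion, stageVersions);
        return !current.equals(recordedVersions.get(documentId));
    }

    // Remember what was sent downstream, so an identical later run can be skipped.
    void recordProcessed(String documentId, String repositoryVersion, List<String> stageVersions) {
        recordedVersions.put(documentId, combinedVersion(repositoryVersion, stageVersions));
    }

    // Loosely analogous to the output connection's "forget" button.
    void forgetAll() {
        recordedVersions.clear();
    }

    public static void main(String[] args) {
        IncrementalPipelineModel model = new IncrementalPipelineModel();
        // Hypothetical per-stage versions for a web -> biblio -> tika -> Solr pipeline.
        List<String> stages = List.of("biblio:v1", "tika:v1", "solr:v1");

        System.out.println(model.needsProcessing("doc-1", "etag-A", stages)); // true: never seen
        model.recordProcessed("doc-1", "etag-A", stages);
        System.out.println(model.needsProcessing("doc-1", "etag-A", stages)); // false: unchanged, skipped
        System.out.println(model.needsProcessing("doc-1", "etag-B", stages)); // true: content changed
        model.forgetAll();
        System.out.println(model.needsProcessing("doc-1", "etag-A", stages)); // true: history forgotten
    }
}
```

Running main prints true, false, true, true: a new document is processed, an unchanged one is skipped, a changed one goes through again, and after the history is cleared it goes through again.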
I would look at how other similar transformation connectors handle the
version string that they return. I suspect that your code may be missing a
subtlety there. You can also confirm this picture by going to the output
connection's view page and clicking the appropriate "forget" button and
running the job again. If you see ingestions, you will know that you have
connector problems that prevent MCF from doing its incremental logic
properly.
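One thing worth checking along these lines is that the version string a transformation connector reports is deterministic and covers every setting that affects its output. The Java sketch below is hypothetical and independent of the ManifoldCF classes: PipelineVersionBuilder, LOGIC_VERSION and the setting names are made up for illustration, and in a real connector the equivalent string would be the version information the connector reports for its pipeline stage (check the ManifoldCF transformation connector interface for the exact method). It sorts the settings so their order never changes the result, and escapes the separator characters so two different configurations cannot produce the same string.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/*
 * Hypothetical helper, not part of ManifoldCF: builds the version string a
 * transformation connector could report for its pipeline stage.
 */
final class PipelineVersionBuilder {
    // Hypothetical constant: bump it whenever the transformation logic
    // changes, so documents processed by the old logic are sent through again.
    static final String LOGIC_VERSION = "biblio-1";

    // Settings must have non-null keys and values in this sketch.
    static String build(Map<String, String> settings) {
        StringBuilder sb = new StringBuilder(LOGIC_VERSION);
        // TreeMap iterates in sorted key order, so the result does not
        // depend on the order in which the settings were read.
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            sb.append('+');
            appendEscaped(sb, e.getKey());
            sb.append('=');
            appendEscaped(sb, e.getValue());
        }
        return sb.toString();
    }

    // Escape the separator characters so distinct settings cannot
    // collapse into the same version string.
    private static void appendEscaped(StringBuilder sb, String value) {
        for (char c : value.toCharArray()) {
            if (c == '\\' || c == '+' || c == '=') {
                sb.append('\\');
            }
            sb.append(c);
        }
    }

    public static void main(String[] args) {
        // Hypothetical settings, inserted in two different orders.
        Map<String, String> first = new HashMap<>();
        first.put("targetField", "biblio");
        first.put("mode", "append");

        Map<String, String> second = new HashMap<>();
        second.put("mode", "append");
        second.put("targetField", "biblio");

        System.out.println(build(first));                       // biblio-1+mode=append+targetField=biblio
        System.out.println(build(first).equals(build(second))); // true
        second.put("mode", "replace");
        System.out.println(build(first).equals(build(second))); // false
    }
}
```

The design choice here is that the string changes exactly when the output could change: same settings in any order give the same version, while any edited setting, or a bumped LOGIC_VERSION, gives a new one and therefore pushes documents down the pipeline again.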
Please let me know what you find.
Thanks,
Karl