Posted to dev@oodt.apache.org by lewis john mcgibbney <le...@apache.org> on 2018/12/04 05:11:02 UTC
Updating Workflow Status to Post-Ingest
Hi Folks,
Whilst executing the following command
./crawler/bin/crawler_launcher \
  --filemgrUrl http://localhost:9000 \
  --operation --launchMetCrawler \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
  --productPath /usr/local/coal-sds-deploy/data/staging \
  --metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
  --metExtractorConfig /usr/local/coal-sds-deploy/data/met/tika.conf \
  --failureDir /usr/local/coal-sds-deploy/data/failure \
  --daemonPort 9003 \
  --daemonWait 2 \
  --successDir /usr/local/coal-sds-deploy/data/archive \
  --actionIds DeleteDataFile UpdateWorkflowStatusToIngest \
  --workflowMgrUrl http://localhost:9001
As you can see, I am trying to kick off a workflow after a successful file
ingestion task. The error I'm getting is as follows:
INFO: Performing action (id = UpdateWorkflowStatusToIngest : description = Triggers workflow event with the name [ProductType]Ingest)
21:00:22.537 [main] DEBUG org.apache.oodt.cas.workflow.system.rpc.RpcCommunicationFactory - Using workflow manager client factory : class org.apache.oodt.cas.workflow.system.rpc.AvroRpcWorkflowManagerFactory
21:00:22.549 [main] INFO org.apache.oodt.cas.workflow.system.AvroRpcWorkflowManagerClient - Client created successfully for workflow manager URL: http://localhost:9001
Dec 03, 2018 9:00:22 PM org.apache.oodt.cas.crawl.ProductCrawler performProductCrawlerActions
WARNING: Failed to perform crawler action : Action (id = UpdateWorkflowStatusToIngest : description = Triggers workflow event with the name [ProductType]Ingest) returned false
java.lang.Exception: Action (id = UpdateWorkflowStatusToIngest : description = Triggers workflow event with the name [ProductType]Ingest) returned false
    at org.apache.oodt.cas.crawl.ProductCrawler.performProductCrawlerActions(ProductCrawler.java:362)
    at org.apache.oodt.cas.crawl.ProductCrawler.performPostIngestOnSuccessActions(ProductCrawler.java:334)
    at org.apache.oodt.cas.crawl.ProductCrawler.handleFile(ProductCrawler.java:198)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:109)
    at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:76)
    at org.apache.oodt.cas.crawl.daemon.CrawlDaemon.startCrawling(CrawlDaemon.java:84)
    at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:56)
    at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
    at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:188)
    at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:37)
I have configured the Workflow Manager and have a PGE named 'pycoal-pge' which
includes several tasks. I am just not sure how to reference it from the
crawler_launcher input parameters.
Any ideas? Thanks in advance,
Lewis
--
http://home.apache.org/~lewismc/
http://people.apache.org/keys/committer/lewismc
Re: Updating Workflow Status to Post-Ingest
Posted by Lewis John McGibbney <le...@apache.org>.
By the way folks, the 'UpdateWorkflowStatusToIngest' action bean is really just 'TriggerPostIngestWorkflow'. The bean ID was incorrect, so I changed it to the latter.
Thanks for any information.
Re: Updating Workflow Status to Post-Ingest
Posted by Chris Mattmann <ma...@apache.org>.
Probably a good idea to read this too:
https://cwiki.apache.org/confluence/display/OODT/Understanding+the+flow+of+Metadata+during+PGE+based+Processing
https://cwiki.apache.org/confluence/display/OODT/Understanding+CAS-PGE+Metadata+Precendence
From: Lewis John McGibbney <le...@apache.org>
Reply-To: "dev@oodt.apache.org" <de...@oodt.apache.org>
Date: Tuesday, December 4, 2018 at 8:40 AM
To: "dev@oodt.apache.org" <de...@oodt.apache.org>
Subject: Re: Updating Workflow Status to Post-Ingest
The current state of my PGE and Workflow policy can be seen at
https://github.com/capstone-coal/coal-sds/tree/master/workflow/src/main/resources/policy
Re: Updating Workflow Status to Post-Ingest
Posted by Lewis John McGibbney <le...@apache.org>.
Hi Chris,
Working perfectly now thanks for the guidance.
My tika.conf has to contain the following key/value pairs:
ProductType=GenericFile
ProductTypeName=GenericFileIngest
My events.xml has to contain the following:
<cas:workflowevents xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <event name="GenericFileIngest">
    <workflow id="urn:oodt:PycoalWorkflow"/>
  </event>
</cas:workflowevents>
The 'GenericFileIngest' event name concerns me, as these files are not really considered generic within the COAL-SDS system... but it is something I can live with for sure.
Thanks again,
Lewis
Re: Updating Workflow Status to Post-Ingest
Posted by Chris Mattmann <ma...@apache.org>.
I think it needs to be “ProductTypeName” specifically, so try changing the
met field name to exactly that, and yes, it needs to be GenericFileIngest.
So basically you want an event with that name in events.xml, and then you
want to map it to the workflow you want automatically kicked off in the
event-to-workflow-map.xml.
Re: Updating Workflow Status to Post-Ingest
Posted by Lewis John McGibbney <le...@apache.org>.
Hi Chris,
On 2018/12/04 16:51:35, Chris Mattmann <ma...@apache.org> wrote:
> Do you have a workflow with the name [ProductTypeName]Ingest?
No, I do not. Is the key within the square brackets a variable, e.g. would it look like 'GenericFileIngest'?
I'm not entirely sure how and where to define this.
>
> Is ProductTypeName one of the extracted metadata fields?
In my Solr catalog every record ends up with the field "CAS.ProductTypeName": "GenericFile"; however, I will most likely change this from 'GenericFile' to something more specific and relevant, e.g. 'AVIRIS-NGDataFile', for each crawler daemon I have set up. This can be set in tika.conf.
> If not, then
> that part will be null when it goes to look it up in the crawler. Please check.
Is my above understanding correct?
Thanks
Re: Updating Workflow Status to Post-Ingest
Posted by Chris Mattmann <ma...@apache.org>.
Do you have a workflow with the name [ProductTypeName]Ingest?
Is ProductTypeName one of the extracted metadata fields? If not, then
that part will be null when the crawler goes to look it up. Please check.
Thanks,
Chris
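[Editor's note: Chris's check boils down to string concatenation — the post-ingest action builds the workflow event name from an extracted met field, so a field the extractor never set yields no event to trigger and the action returns false. A minimal sketch of that lookup, illustrative only and not the actual OODT source; the met key "ProductTypeName" follows Chris's advice above, while the action's log description prints [ProductType]Ingest.]

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of how a post-ingest crawler action might resolve the workflow
// event name from extracted metadata. Illustrative only -- not the real
// OODT implementation.
public class EventNameSketch {
    static String resolveEventName(Map<String, String> met) {
        String productTypeName = met.get("ProductTypeName");
        // If the extractor never set the field, there is no event name to
        // build (cf. the "returned false" warning in the crawler log).
        return productTypeName == null ? null : productTypeName + "Ingest";
    }

    public static void main(String[] args) {
        Map<String, String> met = new HashMap<>();
        met.put("ProductTypeName", "GenericFile");
        System.out.println(resolveEventName(met));            // prints GenericFileIngest
        System.out.println(resolveEventName(new HashMap<>())); // prints null
    }
}
```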
From: Lewis John McGibbney <le...@apache.org>
Reply-To: <de...@oodt.apache.org>
Date: Tuesday, December 4, 2018 at 8:40 AM
To: <de...@oodt.apache.org>
Subject: Re: Updating Workflow Status to Post-Ingest
The current state of my PGE and Workflow policy can be seen at
https://github.com/capstone-coal/coal-sds/tree/master/workflow/src/main/resources/policy
Re: Updating Workflow Status to Post-Ingest
Posted by Lewis John McGibbney <le...@apache.org>.
The current state of my PGE and Workflow policy can be seen at
https://github.com/capstone-coal/coal-sds/tree/master/workflow/src/main/resources/policy