Posted to dev@oodt.apache.org by Chris Mattmann <ch...@gmail.com> on 2014/10/08 20:51:43 UTC

Re: what is batch stub? Is it necessary?

Hi Val,

I don't think you need to run a CAS-PGE task to call
crawler_launcher. If you define blocks in the <output>..</output>
section of the XML file, a crawler will be forked in the job working
directory of CAS-PGE and crawl your specified output.
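
For example, a minimal <output> block might look like this (attribute
names per the usual CAS-PGE pgeConfig conventions; double-check them
against your OODT version):

<output>
   <!-- crawl the job working directory after the PGE runs -->
   <dir path="[JobDir]" createBeforeExe="true"/>
</output>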

I believe that will accomplish the same goal of what you are looking for.

No need to have crawling be a separate task from CAS-PGE - CAS-PGE will
do the crawling for you! :)

Cheers,
Chris

------------------------
Chris Mattmann
chris.mattmann@gmail.com




-----Original Message-----
From: "Verma, Rishi (398J)" <Ri...@jpl.nasa.gov>
Reply-To: <de...@oodt.apache.org>
Date: Thursday, October 9, 2014 at 2:44 AM
To: "dev@oodt.apache.org" <de...@oodt.apache.org>
Subject: Re: what is batch stub? Is it necessary?

>Hi Val,
>
>Yep - here's a link to the tasks.xml file:
>https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/policy/tasks.xml
>
>> The problem is that the ExternScriptTaskInstance is unable to recognize
>>the command line arguments that I want to pass to the crawler_launcher
>>script. 
>
>
>Hmm.. could you share your workflow manager log, or better yet, the
>batch_stub output? Curious to see what error is thrown.
>
>Is a script file being generated for your PGE? For example, inside your
>[PGE_HOME] directory, and within the particular job directory created for
>your execution of a workflow, you will see some files starting with
>"sciPgeExeScript_…". You'll find one for your pgeConfig, and you can
>check to see what the PGE commands actually translate into, with respect
>to a shell script format. If that file is there, take a look at it, and
>validate whether the command works within the script (i.e. copy/paste and
>run the crawler command manually).
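>
>Roughly, to validate by hand (exact file and directory names will vary):
>
>  cd [PGE_HOME]/<job-dir>      # the job directory mentioned above
>  cat sciPgeExeScript_*        # inspect what CAS-PGE generated
>  /bin/sh sciPgeExeScript_*    # re-run the commands manually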
>
>Another suggestion is to take a step back, and build up slowly, i.e.:
>1. Do an "echo" command within your PGE first. (e.g. <cmd>echo "Hello
>APL." > /tmp/test.txt</cmd>)
>2. If the above works, do an empty crawler_launcher command (e.g.
><cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the
>batch_stub or Workflow Manager prints some kind of output when you run
>the workflow.
>3. Build up your crawler_launcher command piece by piece to see where it
>is failing (a sketch of this progression is below).
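>
>For step 3, the build-up might look like this (argument values are
>illustrative, taken from your config):
>
>  crawler_launcher --operation --launchAutoCrawler
>  crawler_launcher --operation --launchAutoCrawler --filemgrUrl [FILEMGR_URL]
>  crawler_launcher --operation --launchAutoCrawler --filemgrUrl [FILEMGR_URL] \
>      --productPath [JobInputDir]
>
>and so on, adding one argument at a time until you find the one that
>breaks.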
>
>Thanks,
>Rishi
>
>On Oct 8, 2014, at 4:24 PM, Mallder, Valerie <Va...@jhuapl.edu>
>wrote:
>
>> Hi Rishi,
>> 
>> Thank you very much for pointing me to your working example. This is
>>very helpful.  My pgeConfig looks very similar to yours.  So, I
>>commented out the resource manager like you suggested and tried running
>>again without the resource manager. And my problem still exists. The
>>problem is that the ExternScriptTaskInstance is unable to recognize the
>>command line arguments that I want to pass to the crawler_launcher
>>script. Could you send me a link to your tasks.xml file? I'm curious as
>>to how you defined your task.  My pgeConfig and tasks.xml are below.
>> 
>> Thanks!
>> Val
>> 
>> 
>> <?xml version="1.0" encoding="UTF-8"?>
>> <pgeConfig>
>> 
>>   <!-- How to run the PGE -->
>>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
>>        <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation --launchAutoCrawler \
>>        --filemgrUrl [FILEMGR_URL] \
>>        --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>        --productPath [JobInputDir] \
>>        --mimeExtractorRepo [OODT_HOME]/extensions/policy/mime-extractor-map.xml \
>>        --actionIds MoveFileToLevel0Dir</cmd>
>>   </exe>
>> 
>>   <!-- Files to ingest -->
>>   <output/>
>> 
>> <!-- Custom metadata to add to output files -->
>>   <customMetadata>
>>      <metadata key="JobDir" val="[OODT_HOME]"/>
>>      <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
>>      <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
>>      <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
>>   </customMetadata>
>> 
>> </pgeConfig>
>> 
>> 
>> 
>> <!-- tasks.xml **************************************************-->
>> 
>> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
>> 
>>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
>>class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
>>      <conditions/>  <!-- There are no pre-execution conditions right now -->
>>      <configuration>
>> 
>>          <property name="ShellType" value="/bin/sh" />
>>          <property name="PathToScript"
>>value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
>> 
>>          <property name="PGETask_Name" value="crawler_launcher PGE
>>Task"/>
>>          <property name="PGETask_ConfigFilePath"
>>value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
>>envReplace="true" />
>>      </configuration>
>>   </task>
>> 
>> </cas:tasks>
>> 
>> Valerie A. Mallder
>> New Horizons Deputy Mission System Engineer
>> Johns Hopkins University/Applied Physics Laboratory
>> 
>> 
>>> -----Original Message-----
>>> From: Verma, Rishi (398J) [mailto:Rishi.Verma@jpl.nasa.gov]
>>> Sent: Wednesday, October 08, 2014 6:01 PM
>>> To: dev@oodt.apache.org
>>> Subject: Re: what is batch stub? Is it necessary?
>>> 
>>> Hi Valerie,
>>> 
>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>> in the CAS PGE environment.
>>> 
>>> Interesting. I have a working example here [1] you can look at that
>>> does this exact thing.
>>> 
>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>> to bother everyone again.)
>>> 
>>> Batchstub is only necessary if your Workflow Manager is sending jobs to
>>> Resource Manager for execution (where the default execution is to run
>>> the job in something called a "batch stub" executable). Think of batch
>>> stubs as a small wrapper program that takes a bundle of executable
>>> instructions from Resource Manager, and executes them in a shell
>>> environment within a given remote (or local) machine.
>>> 
>>> Here's my suggestion:
>>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
>>> following command (it'll start a batch stub in a terminal on port 2001):
>>>> ./batch_stub 2001
>>> 
>>> If the above step doesn't fix your problem, you can also try having
>>> Workflow Manager NOT send jobs to Resource Manager for execution, and
>>> instead execute jobs locally through Workflow Manager itself (on
>>> localhost only!). To disable job transfer to Resource Manager, you'll
>>> need to modify the Workflow Manager properties file
>>> ($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment out
>>> the "org.apache.oodt.cas.workflow.engine.resourcemgr.url" line.
>>> I've done this in my example code below, see [2] for an exact example
>>> of this. After modifying workflow.properties, make sure to restart
>>> workflow manager ($OODT_HOME/wmgr/bin/wmgr stop, followed by
>>> $OODT_HOME/wmgr/bin/wmgr start).
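>>> 
>>> Concretely, the edit looks something like this (the URL shown is just
>>> an example value; keep whatever your install already has, commented
>>> out):
>>> 
>>>   # in $OODT_HOME/wmgr/etc/workflow.properties
>>>   #org.apache.oodt.cas.workflow.engine.resourcemgr.url=http://localhost:9002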
>>> 
>>> Thanks,
>>> Rishi
>>> 
>>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsample.xml
>>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/workflow/src/main/resources/etc/workflow.properties
>>> 
>>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
>>> <pa...@jpl.nasa.gov> wrote:
>>> 
>>>> Valerie,
>>>> 
>>>> I would have thought it would have just not used a batch stub by
>>>> default. That said, if you go into $OODT_HOME/resmgr/bin there should
>>>> be a script to start a batch stub. I'm on my phone right now and
>>>> forget the name of the script, but if you view the file (e.g. with
>>>> more) you will see the Java class name that corresponds to the output
>>>> below. You should specify a port when you run the script, which from
>>>> the looks of the output below should be 2001.
>>>> 
>>>> HTH,
>>>> Paul R
>>>> 
>>>> Sent from my iPhone
>>>> 
>>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
>>>>><Va...@jhuapl.edu>
>>> wrote:
>>>>> 
>>>>> Well then, I'm proud to be a member :)  (I think .... )
>>>>> 
>>>>> 
>>>>> Valerie A. Mallder
>>>>> New Horizons Deputy Mission System Engineer Johns Hopkins
>>>>> University/Applied Physics Laboratory
>>>>> 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Bruce Barkstrom [mailto:brbarkstrom@gmail.com]
>>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
>>>>>> To: dev@oodt.apache.org
>>>>>> Subject: Re: what is batch stub? Is it necessary?
>>>>>> 
>>>>>> You have every right to bother everyone.
>>>>>> You won't get what you need unless you do.
>>>>>> 
>>>>>> You get one honorary membership in the Society of General Agitators
>>>>>> - at the rank of Major Agitator.
>>>>>> 
>>>>>> Bruce B.
>>>>>> 
>>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
>>>>>> <Valerie.Mallder@jhuapl.edu
>>>>>>> wrote:
>>>>>> 
>>>>>>> Hello,
>>>>>>> 
>>>>>>> I am still having trouble getting my CAS PGE crawler task to run
>>>>>>> due to
>>>>>>> http://localhost:2001 being "down". I have spent the last 2 days
>>>>>>> tracing through the resource manager code and tracked this down to
>>>>>>> line 146 of LRUScheduler, where the XmlRpcBatchMgr is failing to
>>>>>>> execute the task remotely: line 75 of XmlRpcBatchMgrProxy
>>>>>>> (instantiated by XmlRpcBatchMgr on its line 74) is trying to call
>>>>>>> "isAlive" on the webservice named "batchstub" which, to my
>>>>>>> knowledge, is not running because I have not done anything
>>>>>>> explicitly to run it.
>>>>>>> 
>>>>>>> All I am trying to do is run "crawler_launcher" as a workflow task
>>>>>>> in the CAS PGE environment.  I had it running perfectly before I
>>>>>>> started trying to make it run as part of a workflow.  I really miss
>>>>>>> my crawler and really want it to run again :(
>>>>>>> 
>>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
>>>>>>> what it is, why it is necessary, and how to run it (please provide
>>>>>>> exact syntax to put in my startup shell script, because I would
>>>>>>> never be able to figure it out for myself and I don't want to have
>>>>>>> to bother everyone again.)
>>>>>>> 
>>>>>>> Thanks so much!
>>>>>>> 
>>>>>>> Val
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> 
>>>>>>> Valerie A. Mallder
>>>>>>> 
>>>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
>>>>>>> University/Applied Physics Laboratory
>>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
>>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
>>>>>>> 
>>>>>>> 
>>> 
>>> ---
>>> Rishi Verma
>>> NASA Jet Propulsion Laboratory
>>> California Institute of Technology
>> 
>
>---
>Rishi Verma
>NASA Jet Propulsion Laboratory
>California Institute of Technology
>



RE: what is batch stub? Is it necessary?

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
Hi Lewis,

I will be happy to do this for you tomorrow. I'm on the east coast, so I just left for the day, and now I'm on my way to the barn to ride my horse. I just wanted to let you know so that you weren't waiting for a quick reply. Have a nice evening.

Valerie



Sent from my iPhone.
________________________________
From: Lewis John Mcgibbney <le...@gmail.com>
Sent: Thursday, October 9, 2014 5:25:07 PM
To: dev@oodt.apache.org
Subject: Re: what is batch stub? Is it necessary?

Can you do an ls -al of your /lib directory please?
Also, can you please provide the relevant snippet of your pom.xml that
contains the filemgr dependency?
Thank you
Lewis


--
*Lewis*

Re: what is batch stub? Is it necessary?

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Can you do an ls -al of your /lib directory please?
Also, can you please provide the relevant snippet of your pom.xml that
contains the filemgr dependency?
Thank you
Lewis
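
For reference, the filemgr dependency entry should look something like
the following (the version is a placeholder; use whatever your OODT
build is based on):

<dependency>
  <groupId>org.apache.oodt</groupId>
  <artifactId>cas-filemgr</artifactId>
  <!-- placeholder version, match your OODT release -->
  <version>0.7</version>
</dependency>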


-- 
*Lewis*

RE: what is batch stub? Is it necessary?

Posted by "Mallder, Valerie" <Va...@jhuapl.edu>.
Thanks Chris,  (Thanks everyone for all of the help, it was helpful, really it was :)  )

My brain is exhausted ..... (heavy sigh) and I feel like I have to start all over again.

My intention (after I got the crawler and filemanager working together last week) was to integrate it with the workflow manager, to demonstrate launching a workflow that consisted of a simple script that runs before the crawler, and then run the crawler.  After that, I was going to try to integrate a java application into the workflow, and continue integrating new things step by step. I think everything would have been fine in this simple setup if I could have just gotten the ExternScriptTaskInstance to run. But that was a huge fail.  It doesn't look like the test program for that class tests it the way I want to use it, so I have no idea if it actually works or not.  The code implies that you can specify arguments to your external script, but I could not find a way to get them read in.  The getAllMetadata method always returned a null list of arguments, which causes an exception on line 72.

So right now, I've basically gone back to the beginning of using CAS-PGE, and I'm trying to get the crawler to run as the very first step in my pipeline, ingesting the raw telemetry files that are dropped off by FEI.  After the ingestion and archival, one of the postIngestSuccess actions of the crawler copies all of the new raw telemetry files to a directory where we store all of the level 0 files.  The level 0 directory (and all of its subdirectories and files) is what I consider to be the "output" of this simple first step of the pipeline.  I realize that I may need to start a crawler again at a later point in the pipeline. But I want to focus on one step at a time.

Chris, In regards to your comments below, here are two questions followed by the contents of my .xml files for review.

[1]- When you say " define blocks in the <output>..</output> section of the XML file", what xml file are you referring to? I think the <output>..</output> tags can only go in the PGE config file, is that correct?

Here is what I have in my fei-crawler-pge-config.xml file. Is this OK?
   <!-- Files to ingest -->
   <output>
      <dir="[FEI_DROP_DIR]" envReplace="true" />
   </output>

[2] If I don't need to define a CAS-PGE Task, how do I tell the workflow to start the crawler?   Right now, I am trying to do it with a task, but if you can tell me how to do it without a task, I will be happy to try it.



So, here is my current workflow:

<cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" id="urn:oodt:jediWorkflowId" name="jediWorkflowName">
   <tasks>
       <task id="urn:oodt:feiCrawlerTaskId" name="feiCrawlerTaskName" />
   </tasks>
</cas:workflow>
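
Eventually, once this works, I assume extending it to the multi-step pipeline I described above just means listing tasks in execution order, something like the sketch below (the second task id is purely hypothetical):

<cas:workflow xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas" id="urn:oodt:jediWorkflowId" name="jediWorkflowName">
   <tasks>
       <task id="urn:oodt:feiCrawlerTaskId" name="feiCrawlerTaskName" />
       <task id="urn:oodt:javaAppTaskId" name="javaAppTaskName" />
   </tasks>
</cas:workflow>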

Here is my current task:

<cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
   <task id="urn:oodt:feiCrawlerTaskId" name="feiCrawlerTaskName" class="org.apache.oodt.cas.pge.StdPGETaskInstance">
      <configuration>
         <property name="PGETask/Name" value="feiCrawlerTaskname"/>
         <property name="PGETask/ConfigFilePath" value="[OODT_HOME]/extensions/config/fei-crawler-pge-config.xml" envReplace="true"/>
         <property name="PGETask/DumpMetadata" value="true"/>
         <property name="PGETask/WorkflowManagerUrl" value="[WORKFLOW_URL]" envReplace="true" />
         <property name="PGETask/Query/FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>
         <property name="PGETask/Ingest/FileManagerUrl"     value="[FILEMGR_URL]" envReplace="true"/>

         <property name="PGETask/Query/ClientTransferServiceFactory" value="org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory"/>
         <property name="PGETask/Ingest/CrawlerConfigFile" value="file:[CRAWLER_HOME]/policy/crawler-config.xml" envReplace="true"/>
         <property name="PGETask/Ingest/MimeExtractorRepo" value="file:[OODT_HOME]/extensions/policy/mime-extractor-map.xml" envReplace="true"/>
         <property name="PGETask/Ingest/ActionIds" value="MoveFileToLevel0Dir" envReplace="true"/>
         <property name="PGE_HOME" value="[PGE_HOME]" envReplace="true"/>
      </configuration>
   </task>
</cas:tasks>
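
For completeness, the MoveFileToLevel0Dir action id above refers to a bean in my crawler config (or the action-beans file it imports), modeled on the stock MoveFile action. This is a from-memory sketch, so treat the property names as guesses; /path/to/level0 stands in for our real level 0 directory:

   <bean id="MoveFileToLevel0Dir" class="org.apache.oodt.cas.crawl.action.MoveFile">
      <property name="toDir" value="/path/to/level0"/>
      <property name="createToDir" value="true"/>
      <property name="phases">
         <list>
            <value>postIngestSuccess</value>
         </list>
      </property>
   </bean>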

And, here is my current PGE config - fei-crawler-pge-config.xml

<pgeConfig>
   <!-- How to run the PGE -->
   <exe dir="[OODT_HOME]">
      <cmd>mkdir -p [JobDir]</cmd>
   </exe>

   <!-- Files to ingest -->
   <output>
      <dir="[FEI_DROP_DIR]" envReplace="true" />
   </output>

   <!-- Custom metadata to add to output files -->
   <customMetadata>
     <metadata key="JobDir" value="[OODT_HOME]/data/pge/jobs" />
   </customMetadata>
</pgeConfig>
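
If the task ever gets that far then, per Rishi's sciPgeExeScript tip earlier in the thread, I'd expect CAS-PGE to translate the <exe> block above into a small generated script in the job directory, roughly like this (only my guess at the expansion, with /path/to/oodt standing in for my real OODT_HOME):

   #!/bin/sh
   mkdir -p /path/to/oodt/data/pge/jobs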



With these settings, I do not get to the point where the first command in the PGE config gets executed. The data/pge/jobs directory does not get created.  However, the workflow starts, the task gets submitted to the resource manager, and a new thread called "Thread-2" gets spawned. But "Thread-2" gets an exception and that's it.  I thought maybe it was because the filemgr jar is not in the resmgr/lib directory when you do the RADiX install. So, I copied the filemgr jar file to resmgr/lib and ran again, but I still get the same exception.  And the filemgr IS running; I shut down the filemgr, workflow mgr, resmgr, and batch_stub before each run, so every run starts with fresh processes.
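
Looking at the trace again, the class is being loaded by GenericWorkflowObjectFactory inside the Workflow Manager itself, not the batch stub, so my next experiment (just a guess based on the RADiX layout, please correct me) is to put the filemgr jar on the workflow manager's classpath as well, and to double-check that the class really lives in that jar:

   cp $OODT_HOME/filemgr/lib/cas-filemgr-*.jar $OODT_HOME/wmgr/lib/
   jar tf $OODT_HOME/filemgr/lib/cas-filemgr-*.jar | grep CoreMetKeys

then restart the workflow manager ($OODT_HOME/wmgr/bin/wmgr stop, then start).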

If anyone has any recommendations on a better way to do this please let me know.

Thanks,
Val






INFO: Task: [feiCrawlerTaskName] has no required metadata fields
Exception in thread "Thread-2" java.lang.NoClassDefFoundError: org/apache/oodt/cas/filemgr/metadata/CoreMetKeys
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.ClassLoader.defineClass1(Native Method)
        at java.lang.ClassLoader.defineClass(ClassLoader.java:800)
        at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
        at java.net.URLClassLoader.defineClass(URLClassLoader.java:449)
        at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:190)
        at org.apache.oodt.cas.workflow.util.GenericWorkflowObjectFactory.getTaskObjectFromClassName(GenericWorkflowObjectFactory.java:169)
        at org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread.run(IterativeWorkflowProcessorThread.java:222)
        at EDU.oswego.cs.dl.util.concurrent.PooledExecutor$Worker.run(Unknown Source)
        at java.lang.Thread.run(Thread.java:745)
Caused by: java.lang.ClassNotFoundException: org.apache.oodt.cas.filemgr.metadata.CoreMetKeys
        at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
        at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
        at java.security.AccessController.doPrivileged(Native Method)
        at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
        ... 28 more



Valerie A. Mallder
New Horizons Deputy Mission System Engineer
Johns Hopkins University/Applied Physics Laboratory


> -----Original Message-----
> From: Chris Mattmann [mailto:chris.mattmann@gmail.com]
> Sent: Wednesday, October 08, 2014 2:52 PM
> To: dev@oodt.apache.org
> Subject: Re: what is batch stub? Is it necessary?
>
> Hi Val,
>
> I don't think you need to run a CAS-PGE task to call crawler_launcher. If you
> define blocks in the <output>..</output> section of the XML file, a crawler will be
> forked in the job working directory of CAS-PGE and crawl your specified output.
>
> I believe that will accomplish the same goal of what you are looking for.
>
> No need to have crawling be a separate task from CAS-PGE - CAS-PGE will do
> the crawling for you! :)
>
> Cheers,
> Chris
>
> ------------------------
> Chris Mattmann
> chris.mattmann@gmail.com
>
>
>
>
> -----Original Message-----
> From: "Verma, Rishi (398J)" <Ri...@jpl.nasa.gov>
> Reply-To: <de...@oodt.apache.org>
> Date: Thursday, October 9, 2014 at 2:44 AM
> To: "dev@oodt.apache.org" <de...@oodt.apache.org>
> Subject: Re: what is batch stub? Is it necessary?
>
> >Hi Val,
> >
> >Yep - here's a link to the tasks.xml file:
> >https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-netscan/w
> >ork flow/src/main/resources/policy/tasks.xml
> >
> >> The problem is that the ExternScriptTaskInstance is unable to
> >>recognize the command line arguments that I want to pass to the
> >>crawler_launcher script.
> >
> >
> >Hmm.. could you share your workflow manager log, or better yet, the
> >batch_stub output? Curious to see what error is thrown.
> >
> >Is a script file being generated for your PGE? For example, inside your
> >[PGE_HOME] directory, and within the particular job directory created
> >for your execution of a workflow, you will see some files starting with
> >"sciPgeExeScript_...". You'll find one for your pgeConfig, and you can
> >check to see what the PGE commands actually translate into, with
> >respect to a shell script format. If that file is there, take a look at
> >it, and validate whether the command works within the script (i.e.
> >copy/paste and run the crawler command manually).
> >
> >Another suggestion is to take a step back, and build up slowly, i.e.:
> >1. Do an "echo" command within your PGE first. (e.g. <cmd> echo "Hello
> >APL." > /tmp/test.txt</cmd>)
> >2. If above works, do a crawler_launcher empty command (e.g.
> ><cmd>/path/to/oodt/crawler/bin/crawler_launcher</cmd>) and verify the
> >batch_stub or Workflow Manager prints some kind of output when you run
> >the workflow.
> >3. Build up your crawler_launcher command piece by piece to see where
> >it is failing
> >
> >Thanks,
> >Rishi
> >
> >On Oct 8, 2014, at 4:24 PM, Mallder, Valerie
> ><Va...@jhuapl.edu>
> >wrote:
> >
> >> Hi Rishi,
> >>
> >> Thank you very much for pointing me to your working example. This is
> >>very helpful.  My pgeConfig looks very similar to yours.  So, I
> >>commented out the resource manager like you suggested and tried
> >>running again without the resource manager. And my problem still
> >>exists. The problem is that the ExternScriptTaskInstance is unable to
> >>recognize the command line arguments that I want to pass to the
> >>crawler_launcher script. Could you send me a link to your tasks.xml
> >>file? I'm curious as to how you defined your task.  My pgeConfig and tasks.xml
> are below.
> >>
> >> Thanks!
> >> Val
> >>
> >>
> >> <?xml version="1.0" encoding="UTF-8"?> <pgeConfig>
> >>
> >>   <!-- How to run the PGE -->
> >>   <exe dir="[JobDir]" shell="/bin/sh" envReplace="true">
> >>        <cmd>[CRAWLER_HOME]/bin/crawler_launcher --operation
> >>--launchAutoCrawler \
> >>        --filemgrUrl [FILEMGR_URL] \
> >>        --clientTransferer
> >>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
> >>        --productPath [JobInputDir] \
> >>        --mimeExtractorRepo
> >>[OODT_HOME]/extensions/policy/mime-extractor-map.xml \
> >>        --actionIds MoveFileToLevel0Dir</cmd>
> >>   </exe>
> >>
> >>   <!-- Files to ingest -->
> >>   <output/>
> >>
> >> <!-- Custom metadata to add to output files -->
> >>   <customMetadata>
> >>      <metadata key="JobDir" val="[OODT_HOME]"/>
> >>      <metadata key="JobInputDir" val="[FEI_DROP_DIR]"/>
> >>      <metadata key="JobOutputDir" val="[JobDir]/data/pge/jobs"/>
> >>      <metadata key="JobLogDir" val="[JobDir]/data/pge/logs"/>
> >>   </customMetadata>
> >>
> >> </pgeConfig>
> >>
> >>
> >>
> >> <!-- tasks.xml **************************************************-->
> >>
> >> <cas:tasks xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
> >>
> >>   <task id="urn:oodt:crawlerLauncherId" name="crawlerLauncherName"
> >>class="org.apache.oodt.cas.workflow.examples.ExternScriptTaskInstance">
> >>      <conditions/>  <!-- There are no pre execution conditions right
> >>now -->
> >>      <configuration>
> >>
> >>          <property name="ShellType" value="/bin/sh" />
> >>          <property name="PathToScript"
> >>value="[CRAWLER_HOME]/bin/crawler_launcher" envReplace="true" />
> >>
> >>          <property name="PGETask_Name" value="crawler_launcher PGE
> >>Task"/>
> >>          <property name="PGETask_ConfigFilePath"
> >>value="[OODT_HOME]/extensions/config/crawler-pge-config.xml"
> >>envReplace="true" />
> >>      </configuration>
> >>   </task>
> >>
> >> </cas:tasks>
> >>
> >> Valerie A. Mallder
> >> New Horizons Deputy Mission System Engineer Johns Hopkins
> >> University/Applied Physics Laboratory
> >>
> >>
> >>> -----Original Message-----
> >>> From: Verma, Rishi (398J) [mailto:Rishi.Verma@jpl.nasa.gov]
> >>> Sent: Wednesday, October 08, 2014 6:01 PM
> >>> To: dev@oodt.apache.org
> >>> Subject: Re: what is batch stub? Is it necessary?
> >>>
> >>> Hi Valerie,
> >>>
> >>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>> task in the CAS PGE environment.
> >>>
> >>> Interesting. I have a working example here [1] you can look at that
> >>>does this exact  thing.
> >>>
> >>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
> >>>>>>> what it is, why it is necessary, and how to run it (please
> >>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>> want to have to bother everyone again.)
> >>>
> >>> Batchstub is only necessary if your Workflow Manager is sending jobs
> >>>to Resource  Manager for execution (where the default execution is to
> >>>run the job in something called a "batch stub" executable). Think of
> >>>batch stubs as a small wrapper  program that takes a bundle of
> >>>executable instructions from Resource Manager,  and executes them in
> >>>a shell environment within a given remote (or
> >>>local) machine.
> >>>
> >>> Here's my suggestion:
> >>> 1. Like Paul suggested, go to $OODT_HOME/resmgr/bin, and execute the
> >>>following command (it'll start a batch stub in a terminal on port
> >>>2001):
> >>>> ./batch_stub 2001
> >>>
> >>> If the above step doesn't fix your problem, you can also try having
> >>>Workflow  Manager NOT send jobs to Resource Manager for execution,
> >>>and instead execute  jobs locally through Workflow Manager itself (on
> >>>localhost only!). To disable job transfer to Resource Manager, you'll
> >>>need to modify the Workflow Manager properties file
> >>>($OODT_HOME/wmgr/etc/workflow.properties), and specifically comment
> >>>out the "org.apache.oodt.cas.workflow.engine.resourcemgr.url"
> >>>line.
> >>> I've done this in my example code below, see [2] for an exact
> >>>example of this.
> >>> After modifying workflow.properties, make sure to restart workflow
> >>>manager
> >>> ($OODT_HOME/wmgr/bin/wmgr stop   followed by
> $OODT_HOME/wmgr/bin/wmgr
> >>> start).
> >>>
> >>> Thanks,
> >>> Rishi
> >>>
> >>> [1] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-
> >>>
> >>>netscan/pge/src/main/resources/policy/netscan-getipv4entriesrandomsam
> >>>ple
> >>>.xml
> >>> [2] https://github.com/riverma/xdata-jpl-netscan/blob/master/oodt-
> >>> netscan/workflow/src/main/resources/etc/workflow.properties
> >>>
> >>> On Oct 8, 2014, at 2:31 PM, Ramirez, Paul M (398J)
> >>> <pa...@jpl.nasa.gov> wrote:
> >>>
> >>>> Valerie,
> >>>>
> >>>> I would have thought it would have just not used a batch stub by
> >>>>default. That
> >>> said if you go into the $OODT_HOME/resmgr/bin there should be a
> >>>script to start a  batch stub. Right now on my phone I forget the
> >>>name of the script but if you more  the file you will see the Java
> >>>class name that corresponds to below.
> >>>You should
> >>> specify a port when you run the script which from the looks of the
> >>>output below  should be 2001.
> >>>>
> >>>> HTH,
> >>>> Paul R
> >>>>
> >>>> Sent from my iPhone
> >>>>
> >>>>> On Oct 8, 2014, at 2:04 PM, Mallder, Valerie
> >>>>><Va...@jhuapl.edu>
> >>> wrote:
> >>>>>
> >>>>> Well then, I'm proud to be a member :)  (I think .... )
> >>>>>
> >>>>>
> >>>>> Valerie A. Mallder
> >>>>> New Horizons Deputy Mission System Engineer Johns Hopkins
> >>>>> University/Applied Physics Laboratory
> >>>>>
> >>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bruce Barkstrom [mailto:brbarkstrom@gmail.com]
> >>>>>> Sent: Wednesday, October 08, 2014 4:54 PM
> >>>>>> To: dev@oodt.apache.org
> >>>>>> Subject: Re: what is batch stub? Is it necessary?
> >>>>>>
> >>>>>> You have every right to bother everyone.
> >>>>>> You won't get what you need unless you do.
> >>>>>>
> >>>>>> You get one honorary membership in the Society of General
> >>>>>> Agitators
> >>>>>> - at the rank of Major Agitator.
> >>>>>>
> >>>>>> Bruce B.
> >>>>>>
> >>>>>> On Wed, Oct 8, 2014 at 4:49 PM, Mallder, Valerie
> >>>>>> <Valerie.Mallder@jhuapl.edu
> >>>>>>> wrote:
> >>>>>>
> >>>>>>> Hello,
> >>>>>>>
> >>>>>>> I am still having trouble getting my CAS PGE crawler task to run
> >>>>>>>due to
> >>>>>>> http://localhost:2001 being "down". I have spent the last 2 days
> >>>>>>>tracing through the resource manager code and tracked this down
> >>>>>>>to  line 146 of LRUScheduler where the XmlRpcBatchMgr is failing
> >>>>>>>to  execute the task remotely, because on line 75 of
> >>>>>>>XmlRpcBatchMgrProxy (that was instantiated by XmlRpcBatchMgr on
> >>>>>>>its  line 74) is trying to call "isAlive" on the webservice named
> >>>>>>>"batchstub" which, to my knowledge, is not running because I have
> >>>>>>>not done
> >>> anything explicitly to run it.
> >>>>>>>
> >>>>>>> All I am trying to do is run "crawler_launcher" as a workflow
> >>>>>>> task in the CAS PGE environment.  I had it running perfectly
> >>>>>>> before I started trying to make it run as part of a workflow.  I
> >>>>>>> really miss my crawler and really want it to run again :(
> >>>>>>>
> >>>>>>> So, if "batchstub" is necessary in this scenario, please tell me
> >>>>>>> what it is, why it is necessary, and how to run it (please
> >>>>>>> provide exact syntax to put in my startup shell script, because
> >>>>>>> I would never be able to figure it out for myself and I don't
> >>>>>>> want to have to bother everyone again.)
> >>>>>>>
> >>>>>>> Thanks so much!
> >>>>>>>
> >>>>>>> Val
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>>
> >>>>>>> Valerie A. Mallder
> >>>>>>>
> >>>>>>> New Horizons Deputy Mission System Engineer The Johns Hopkins
> >>>>>>> University/Applied Physics Laboratory
> >>>>>>> 11100 Johns Hopkins Rd (MS 23-282), Laurel, MD 20723
> >>>>>>> 240-228-7846 (Office) 410-504-2233 (Blackberry)
> >>>>>>>
> >>>>>>>
> >>>
> >>> ---
> >>> Rishi Verma
> >>> NASA Jet Propulsion Laboratory
> >>> California Institute of Technology
> >>
> >
> >---
> >Rishi Verma
> >NASA Jet Propulsion Laboratory
> >California Institute of Technology
> >
>