Posted to dev@oodt.apache.org by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov> on 2014/11/01 18:47:49 UTC

Re: re: Question about OODT file manager

Dear Luke, just confirming: we solved this in class, right? It had
to do with the batch stub not being turned on.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Luke <sh...@usc.edu>
Date: Tuesday, October 28, 2014 at 12:52 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
<zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
<zi...@usc.edu>
Subject: RE: re: Question about OODT file manager

>Dear Professor Mattmann,
>Thank you very much for your kind help, and sorry for the delay in getting
>back to you. I have been running tests with OODT based on your advice, but
>unfortunately I have run into another problem.
>
>I am following the steps at
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>to get a sense of how to get a workflow to work.
>The problem is that the File-Concatenator-PGE (triggered by running the
>wmgr-client command line) does not seem to be invoked or executed. Instead,
>the tasks are stacking up in the workflow manager with status either
>"RSUBMIT" or "QUEUED" and never get executed (see the attached
>workflow_monitor.jpg). Note that by default the workflow min pool size is
>6, which leads to a second problem: I have 6 submitted tasks with status
>RSUBMIT, and any new incoming tasks are forwarded to the waiting queue
>with status "QUEUED". As workflow_monitor.jpg shows, I currently have 3
>QUEUED workflow tasks and 6 RSUBMIT tasks.
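The pool behaviour described above can be illustrated with a toy Python model. This is not OODT code: it assumes, for illustration only, that a workflow holds one of `pool_size` slots while its job sits with the resource manager ("RSUBMIT"), that later submissions wait for a free slot ("QUEUED"), and that a slot is freed only when a batch stub actually runs the job. The function name `simulate_workflow_pool` and the "FINISHED" label are hypothetical.

```python
from collections import deque


def simulate_workflow_pool(num_submissions, pool_size, batch_stub_running):
    """Toy model (NOT OODT code) of a fixed-size workflow pool.

    Workflows holding a pool slot have handed their job to the resource
    manager ("RSUBMIT"); the rest wait for a free slot ("QUEUED"). In this
    model a slot is freed only when a batch stub actually runs the job.
    """
    active = deque()   # workflows holding a pool slot
    waiting = deque()  # workflows waiting for a slot
    finished = []

    for wf_id in range(num_submissions):
        if len(active) < pool_size:
            active.append(wf_id)
        else:
            waiting.append(wf_id)

    # Without a batch stub nothing ever completes, so no slot is ever freed.
    if batch_stub_running:
        while active:
            finished.append(active.popleft())
            if waiting:
                active.append(waiting.popleft())

    status = {wf: "RSUBMIT" for wf in active}
    status.update({wf: "QUEUED" for wf in waiting})
    status.update({wf: "FINISHED" for wf in finished})
    return status


if __name__ == "__main__":
    from collections import Counter
    print(Counter(simulate_workflow_pool(9, 6, batch_stub_running=False).values()))
```

With 9 submissions, `pool_size=6` and no batch stub, the model leaves 6 workflows in RSUBMIT and 3 in QUEUED, the same counts Luke reports; with a batch stub running, every workflow reaches FINISHED.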
>
>Question 1): I am not sure why the workflow is not being executed and is
>hanging in the "RSUBMIT" state. After raising the log level, I see the
>following entry in the log; I am not sure whether it has anything to do
>with the workflow hanging in "RSUBMIT":
>	Oct 28, 2014 3:35:07 AM
>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>safeCheckJobComplete
>	WARNING: Exception checking completion status for job:
>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>java.lang.NullPointerException
>
>Question 2): I think any new workflow task I send with the following
>command is currently directed to the waiting queue because of the min pool
>size of 6 (I could increase it, though):
>    ./wmgr-client --url http://localhost:9200 --operation --sendEvent \
>    --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>If possible, I would like to know whether there is a way to purge the
>queue and get rid of the workflow tasks I have already sent that are in
>the "RSUBMIT" or "QUEUED" state.
>
>I am very sorry for troubling you with this. To be honest, I find OODT a
>bit challenging to grasp in a short time, probably because there is no
>"OODT in Action" book the way there is one for Solr, so what I am doing is
>trial and error blended with guesswork. I don't want to guess blindly, so
>I would appreciate it if you could also point me to where I can get more
>logging information, or other ways to troubleshoot. I think it would be
>worth tracking what happens when a workflow reaches the "RSUBMIT" status
>and how to get logging specific to it.
>
>Again, your advice and kind help are appreciated as usual.
>
>
>Thanks
>Luke
>
>> -----Original Message-----
>> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> Sent: October 26, 2014 22:18
>> To: Luke; 'Zichuan Wang'
>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>> dev@oodt.apache.org
>> Subject: Re: re: Question about OODT file manager
>> 
>> Hi Luke,
>> 
>> Thanks, and sorry it's taken me a while to reply. Here are some details
>> below:
>> 
>> 
>> -----Original Message-----
>> From: Luke <sh...@usc.edu>
>> Date: Sunday, October 26, 2014 at 6:19 PM
>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>> <zi...@usc.edu>
>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>> Subject: RE: re: Question about OODT file manager
>> 
>> >Hi Professor Mattmann and OODT DEV,
>> >
>> >Sorry to trouble you with this email. Our team has been struggling to
>> >use OODT to send JSON files to Solr.
>> >One of the difficulties is still getting the OODT workflow to call
>> >poster.py in ETLlib.
>> 
>> Sorry that you're having difficulty; let me try and help.
>> 
>> >
>> >I am not sure whether my understanding of the OODT requirements is
>> >correct; I hope you can advise and help with our confusion.
>> >
>> >The set of goals I have in mind with OODT is as follows; please confirm
>> >and clarify:
>> >
>> >1)
>> >Get the File-Manager up and running.
>> 
>> Yep, hopefully as installed via OODT RADIX.
>> 
>> >2)
>> >Send all JSON files with the wmgr-client command to the File Manager
>> >server. (I believe we can do this with a bash or Python script that
>> >calls the command line sequentially with each JSON file name as an
>> >argument?)
>> 
>> Suggestion:
>> 
>> 1. Use the OODT crawler and file manager to crawl/index the JSON files
>>    (in-place data transfer).
>> 2. Take a look at CAS-PGE; it will help you write a workflow task that
>>    will wrap ETLlib and the poster command.
>> 3. Once you are confident with #2, whip up a script that pages through
>>    all of your indexed JSON files and, for each one, submits a workflow
>>    event (you may need to look into aggregating them) that calls your
>>    CAS-PGE-wrapped poster task from ETLlib.
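Step 3 can be sketched in Python. This minimal sketch only builds (and prints) one `wmgr-client` command per JSON file, reusing the flags from the `wmgr-client` invocation quoted in this thread. It assumes the files are listed from a local directory rather than queried from the File Manager, and the event name `poster-pge`, the metadata key `JsonFile` and the helper name `build_send_event_commands` are hypothetical placeholders for whatever your CAS-PGE configuration defines.

```python
import shlex
from pathlib import Path


def build_send_event_commands(json_files, event_name,
                              url="http://localhost:9200", met_key="JsonFile"):
    """Return one wmgr-client argv list per JSON file.

    The flags mirror the wmgr-client call quoted in this thread; the
    metadata key "JsonFile" is a hypothetical name that the CAS-PGE task
    configuration would have to read.
    """
    commands = []
    for path in json_files:
        commands.append([
            "./wmgr-client", "--url", url,
            "--operation", "--sendEvent",
            "--eventName", event_name,
            "--metaData", "--key", met_key, str(path),
        ])
    return commands


if __name__ == "__main__":
    files = sorted(Path(".").glob("*.json"))
    # "poster-pge" is a hypothetical event name for a CAS-PGE-wrapped poster task.
    for argv in build_send_event_commands(files, "poster-pge"):
        # Dry run: print each command. Replace with subprocess.run(argv, check=True)
        # once the event and PGE are configured.
        print(shlex.join(argv))
```

Keeping the command as an argv list (rather than one shell string) avoids quoting problems with file names that contain spaces.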
>> 
>> >3)
>> >Once we have the JSON files sent and stored in the File Manager, we need
>> >to get the Workflow Manager up and running and create a workflow that
>> >sends those JSON files from the File Manager to Solr.
>> 
>> See above.
>> 
>> >4)
>> >Create a workflow according to the Workflow2 User Guide
>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>> >Here comes the problem:
>> >I am not sure how to create a workflow task that can call poster.py in
>> >the Python ETLlib. It looks like we need to create our own Java class
>> >that extends TaskInstance, an abstract Java class with one abstract
>> >method that has the following signature:
>> >
>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>> >
>> >However, that page leaves out where to find the corresponding libraries
>> >and where to put our implementation in the Workflow Manager. I am not
>> >sure whether we should use TaskInstance, but it seems the workflow needs
>> >an interface through which it can call the Python code (i.e. poster.py),
>> >so we would implement TaskInstance::performExecution with code that
>> >calls poster.py and returns the ResultsState.
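The idea of a task that wraps poster.py can be sketched in Python. The real `TaskInstance` is a Java class and, per the reply above, CAS-PGE is the suggested route; this sketch only shows the general pattern of running an external command and mapping its exit status to a success or failure result. `ResultsState` here is a stand-in enum, `perform_execution` is a hypothetical name, and the `poster.py` arguments are placeholders, not ETLlib's actual flags.

```python
import subprocess
import sys
from enum import Enum


class ResultsState(Enum):
    """Hypothetical stand-in for a task's success/failure result."""
    SUCCESS = "success"
    FAILURE = "failure"


def perform_execution(command, timeout_s=600):
    """Run an external program (e.g. a poster.py invocation) and map its
    exit status to a result state. Python sketch only; it is not the
    OODT TaskInstance API."""
    try:
        completed = subprocess.run(command, capture_output=True, text=True,
                                   timeout=timeout_s)
    except (OSError, subprocess.TimeoutExpired) as exc:
        # Program missing, not executable, or ran too long.
        print(f"task failed to run: {exc}", file=sys.stderr)
        return ResultsState.FAILURE
    if completed.returncode != 0:
        # Surface the child's stderr so it shows up in the task's log.
        print(completed.stderr, file=sys.stderr)
        return ResultsState.FAILURE
    return ResultsState.SUCCESS


if __name__ == "__main__":
    # Placeholder arguments: check ETLlib's own help for poster.py's real flags.
    print(perform_execution([sys.executable, "poster.py", "--help"]))
```

Treating a non-zero exit code as failure is the key design choice: it lets the workflow engine see that the wrapped script failed instead of silently reporting success.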
>> >
>> >
>> >I would greatly appreciate it if you could shed some light on how we
>> >can get a task instance to call poster.py. I am also not sure my
>> >understanding is correct, so please correct it if it is wrong. Your help
>> >is appreciated as usual.
>> >
>> >
>> >
>> >Thanks
>> >Luke
>> 
>> Thanks Luke, see above. Let me know if it helps.
>> 
>> Cheers!
>> 
>> Chris
>> 
>> >
>> >From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> >
>> >Sent: October 25, 2014 13:34
>> >To: Zichuan Wang
>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu; xiaoyanj@usc.edu
>> >Subject: Re: Re: Question about OODT file manager
>> >
>> >
>> >
>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>> >detail soon.
>> >
>> >Sent from my iPhone
>> 
>> >
>> >
>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>> >
>> >
>> >Dear Professor,
>> >
>> >
>> >
>> >Could you please also explain how I can crawl all JSON file names under
>> >a specific directory using CAS-PGE? I'm working through this example:
>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>> >but it doesn't mention anything about crawling; instead, it sets the
>> >input file paths manually.
>> >
>> >
>> >
>> >
>> >--
>> >
>> >Zichuan Wang
>> >
>> >University of Southern California, Department of Computer Science
>> >
>> >
>> >
>> >
>> >On Saturday, October 25, 2014 at 12:10 PM, Zichuan Wang
>> >wrote:
>> >
>> >Dear Professor,
>> >
>> >
>> >
>> >In the assignment 2 specification I noticed that you mentioned the OODT
>> >File Manager, but as I understand it, we are using the ETLlib poster,
>> >which talks directly to Solr. So how can we use the OODT File Manager in
>> >this assignment?
>> >
>> >
>> >
>> >--
>> >
>> >Zichuan Wang
>> >
>> >University of Southern California, Department of Computer Science
>> >
>> >
>


Re: re: Question about OODT file manager

Posted by Chris Mattmann <ch...@gmail.com>.
Thanks, I just meant the JVM -Xms and -Xmx parameters :)
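The `-Xms` (initial heap) and `-Xmx` (maximum heap) options are passed to the `java` command that starts a JVM; they are not OODT properties like `maxstacksize`. The small Python helper below, with hypothetical names `to_bytes` and `heap_flags`, builds the two options and rejects an initial heap larger than the maximum, which the JVM itself refuses at startup. Exactly where the options go (for example, the `java` line inside the `crawler_launcher` or batch stub start script) is an assumption to check against your installation, and the sizes are only examples.

```python
import re

# Accepts plain bytes or k/m/g suffixes (a subset of what the JVM accepts).
_SIZE = re.compile(r"^(\d+)([kKmMgG]?)$")
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def to_bytes(size):
    """Parse a JVM-style size such as '512m' or '4g' into bytes."""
    match = _SIZE.match(size)
    if not match:
        raise ValueError(f"not a JVM size: {size!r}")
    number, unit = match.groups()
    return int(number) * _UNITS[unit.lower()]


def heap_flags(initial, maximum):
    """Build -Xms/-Xmx options, refusing an initial heap larger than the max."""
    if to_bytes(initial) > to_bytes(maximum):
        raise ValueError("-Xms must not exceed -Xmx")
    return [f"-Xms{initial}", f"-Xmx{maximum}"]


if __name__ == "__main__":
    # Example java command prefix; the rest of the launcher's arguments follow.
    print(["java", *heap_flags("512m", "4g")])
```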

------------------------
Chris Mattmann
chris.mattmann@gmail.com




-----Original Message-----
From: Zichuan Wang <zi...@usc.edu>
Reply-To: <de...@oodt.apache.org>
Date: Wednesday, November 5, 2014 at 11:10 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>
Cc: Chris Mattmann <ma...@usc.edu>, <de...@oodt.apache.org>, Luke liu
<sh...@usc.edu>, <xi...@usc.edu>, <zh...@usc.edu>
Subject: Re: re: Question about OODT file manager

>Thanks for the quick reply.
>
>By increasing the heap size of the batch stub, do you mean increasing the
>value of org.apache.oodt.cas.resource.jobqueue.jobstack.maxstacksize?
>
>I tried that but still got the same error.
>
>Looking forward to your reply.
>
>
>—
>Zichuan Wang
>Department of Computer Science, USC
>
>On Wed, Nov 5, 2014 at 10:40 PM, Mattmann, Chris A (3980)
><ch...@jpl.nasa.gov> wrote:
>
>> Got it. Can you increase the heap space on your batch stub? That
>> should take care of it.
>> Cheers,
>> Chris
>> P.S. Great work!
>> -----Original Message-----
>> From: Zichuan Wang <zi...@usc.edu>
>> Date: Wednesday, November 5, 2014 at 11:12 PM
>> To: Chris Mattmann <ma...@usc.edu>
>> Cc: Chris Mattmann <Ch...@jpl.nasa.gov>,
>>"dev@oodt.apache.org"
>> <de...@oodt.apache.org>, Luke liu <sh...@usc.edu>, "xiaoyanj@usc.edu"
>> <xi...@usc.edu>, "zhoujian@usc.edu" <zh...@usc.edu>
>> Subject: Re: re: Question about OODT file manager
>>>Dear Professor,
>>>
>>>
>>>I finally figured out how to trigger a post-ingest event. However, when I
>>>tried to crawl the whole dataset, I got an OutOfMemoryError. Could you
>>>please take a look and perhaps give some suggestions?
>>>
>>>
>>>➜  bin  ./crawler_launcher \
>>>--operation --launchAutoCrawler \
>>>--filemgrUrl http://localhost:9000 \
>>>--clientTransferer
>>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>>--productPath /Users/zichuanwang/Downloads/output \
>>>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>>--workflowMgrUrl http://localhost:9200 \
>>>-ais TriggerPostIngestWorkflow
>>>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>>Setting property 'StdProductCrawler.clientTransferer'
>>>Setting property 'MetExtractorProductCrawler.clientTransferer'
>>>Setting property 'AutoDetectProductCrawler.clientTransferer'
>>>Setting property 'StdProductCrawler.filemgrUrl'
>>>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>>>Setting property 'StdProductCrawler.actionIds'
>>>Setting property 'MetExtractorProductCrawler.actionIds'
>>>Setting property 'AutoDetectProductCrawler.actionIds'
>>>Setting property 'StdProductCrawler.productPath'
>>>Setting property 'MetExtractorProductCrawler.productPath'
>>>Setting property 'AutoDetectProductCrawler.productPath'
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value
>>>[http://localhost:9200]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value
>>>[../policy/mime-extractor-map.xml]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>>INFO: Crawling /Users/zichuanwang/Downloads/output
>>>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>	at java.io.UnixFileSystem.list(Native Method)
>>>	at java.io.File.list(File.java:973)
>>>	at java.io.File.listFiles(File.java:1129)
>>>	at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>>	at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>>>	at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>>	at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>>	at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>>>	at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
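The trace shows the error raised in the `main` thread of the crawler launcher, inside `java.io.File.listFiles`, which returns every entry of a directory as one array. As an illustration only (not a patch to `ProductCrawler`), the Python sketch below walks a directory tree with `os.scandir`, yielding paths one at a time, and groups them into fixed-size batches, which is one way to do the aggregation mentioned earlier in the thread. `iter_files` and `batches` are hypothetical helper names; in Java, `java.nio.file.Files.newDirectoryStream` offers a comparable iterator-based listing.

```python
import os
from itertools import islice


def iter_files(root):
    """Yield file paths one at a time using os.scandir, which reads directory
    entries lazily instead of building one big list per directory."""
    stack = [root]
    while stack:
        current = stack.pop()
        with os.scandir(current) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=False):
                    stack.append(entry.path)  # visit subdirectories later
                elif entry.is_file():
                    yield entry.path


def batches(iterable, size):
    """Group an iterable into lists of at most `size` items."""
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


if __name__ == "__main__":
    for chunk in batches(iter_files("."), 1000):
        print(f"batch of {len(chunk)} files, first: {chunk[0]}")
```

Only one batch and the directory stack are held in memory at a time, so memory use depends on the batch size and tree depth rather than on the total number of files.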
>>>
>>>
>>>—
>>>Zichuan Wang
>>>Department of Computer Science, USC
>>>
>>>
>>>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
>>><ma...@usc.edu> wrote:
>>>
>>>
>>>Thanks Luke, I’ve given you permissions so you should now see an
>>>“edit” button on that wiki page.
>>>
>>>Cheers, 
>>>Chris 
>>>
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>Chris Mattmann, Ph.D.
>>>Adjunct Associate Professor, Computer Science Department
>>>University of Southern California
>>>Los Angeles, CA 90089 USA
>>>Email: mattmann@usc.edu
>>>WWW: http://sunset.usc.edu/~mattmann/
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Luke liu <sh...@usc.edu>
>>>Date: Wednesday, November 5, 2014 at 6:48 PM
>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>,
>>>"dev@oodt.apache.org"
>>><de...@oodt.apache.org>
>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan
>>>Wang'
>>><zi...@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>>
>>>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>>>following account details:
>>>> Account name: luke
>>>> Full Name: Shuai Liu (Luke)
>>>> Email: hanson311biz@gmail.com
>>>> Password: *******
>>>> 
>>>>But I am not sure where I can add my notes to the following article,
>>>>which I had trouble with. I also tried to create a new article but
>>>>failed, because I cannot find anywhere to edit. Does this have something
>>>>to do with my account not being allowed to see the "edit" or "comments"
>>>>action?
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>> 
>>>> 
>>>> 
>>>>Thanks 
>>>>Luke 
>>>>-----Original Message-----
>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>Sent: Sunday, November 2, 2014 6:59 AM
>>>>To: Luke liu; dev@oodt.apache.org
>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>'Zichuan 
>>>>Wang' 
>>>>Subject: Re: re: Question about OODT file manager
>>>> 
>>>>Yes Luke, making the instructions better would be much appreciated!
>>>> 
>>>>If you have an account on the wiki, please share it; otherwise, sign up
>>>>for an Apache OODT wiki account and share it with me or anyone else on
>>>>dev@oodt, and we'll add you.
>>>> 
>>>>Cheers, 
>>>>Chris 
>>>> 
>>>>-----Original Message-----
>>>>From: Luke liu <sh...@usc.edu>
>>>>Date: Sunday, November 2, 2014 at 1:32 AM
>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>,
>>>>"dev@oodt.apache.org"
>>>><de...@oodt.apache.org>
>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan
>>>>Wang' 
>>>><zi...@usc.edu>
>>>>Subject: RE: re: Question about OODT file manager
>>>> 
>>>>>Thanks, Professor Mattmann. Not running batch_stub was the main
>>>>>culprit, and there were some other issues, such as missing jars. Sorry
>>>>>for not confirming this right away; my laptop crashed and I only just
>>>>>had time to fix it. BTW, I was able to get the CAS-PGE example to work:
>>>>>even though the log showed the workflow failing its pre-condition check,
>>>>>the combined file and some metadata files (i.e. 3 files) were still
>>>>>successfully ingested and placed in the output directory.
>>>>> 
>>>>>BTW, I think there are a lot of mistakes in the documentation. Would
>>>>>you like us to help correct it (i.e.
>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>>>If possible, I would like to share my notes on the problematic steps
>>>>>mentioned there.
>>>>> 
>>>>>Anyway, thanks for your help; it is much appreciated.
>>>>> 
>>>>>Thanks 
>>>>>Luke 
>>>>>-----Original Message-----
>>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>>>To: Luke; dev@oodt.apache.org
>>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>'Zichuan Wang'
>>>>>Subject: Re: re: Question about OODT file manager
>>>>> 
>>>>>Dear Luke, just confirming, we solved this in class right? It had to
>>>>>do
>>>>>with the batch stub not being turned on.
>>>>> 
>>>>>Cheers, 
>>>>>Chris 
>>>>> 
>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>Chris Mattmann, Ph.D.
>>>>>Chief Architect
>>>>>Instrument Software and Science Data Systems Section (398) NASA Jet
>>>>>Propulsion Laboratory Pasadena, CA 91109 USA
>>>>>Office: 168-519, Mailstop: 168-527
>>>>>Email: chris.a.mattmann@nasa.gov
>>>>>WWW: http://sunset.usc.edu/~mattmann/
>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>>Adjunct Associate Professor, Computer Science Department University of
>>>>>Southern California, Los Angeles, CA 90089 USA
>>>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>> 
>>>>>-----Original Message-----
>>>>>From: Luke <sh...@usc.edu>
>>>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>,
>>>>>"dev@oodt.apache.org"
>>>>><de...@oodt.apache.org>
>>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan
>>>>>Wang' 
>>>>><zi...@usc.edu>
>>>>>Subject: RE: re: Question about OODT file manager
>>>>> 
>>>>>>Dear Professor Mattamnn,
>>>>>>Thanks a lot Professor Mattmann for the kind help, it is appreciated,
>>>>>>sorry for getting back to you with my appreciation, I have been
>>>>>>conducting tests with OODT based on your advice, but unfortunately I
>>>>>>am having another problem....
>>>>>> 
>>>>>>I am following the steps
>>>>>>(https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Ex
>>>>>>a
>>>>>>mpl 
>>>>>>e 
>>>>>>) to get a sense of how to get workflow to work.
>>>>>>The problem is that the File-Concatenator-PGE (by running the
>>>>>>wmgr-client 
>>>>>>command-line) does not seems to be invoked or executed, but I am
>>>>>>seeing the tasks are getting stacked up in the workflow manager with
>>>>>>status either "RSUBMIT" or "QUEUED", but they are not getting
>>>>>>executed, 
>>>>PFA: 
>>>>>>workflow_monitor.jpg, please note, by default the workflow min pool
>>>>>>size is 6; so here comes another problem, i have 6 submitted tasks
>>>>>>with status RSUBMIT, but any new incoming tasks will be forwarded to
>>>>>>the waiting QUEUE with status "QUEUED"...please refer to the
>>>>>>workflow_monitor.jpg for details, where I have 3 QUEUED workflow task
>>>>>>and 
>>>>6 RSUMBITE tasks.
>>>>>> 
>>>>>>Question 1): not sure why the workflow is not being executed, and
>>>>>>hanging at the state of "RSUBMIT", after enabling the log level, I am
>>>>>>seeing the following entry in the log, not sure if this has anything
>>>>>>to do with the "hanging" problem where workflow is not getting
>>>>>>executed and hanging at state of "RSUBMIT".
>>>>>> Oct 28, 2014 3:35:07 AM
>>>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>>>safeCheckJobComplete
>>>>>> WARNING: Exception checking completion status for job:
>>>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>>>java.lang.NullPointerException
>>>>>> 
>>>>>>Question 2): I think currently on my side any new incoming workflow
>>>>>>task I am sending with the following command is being directed to the
>>>>>>waiting "QUEUE" because of the min pool size (i.e. 6) (I can increase
>>>>>>this to a larger number though),
>>>>>> ./wmgr-client --url http://localhost:9200
>>>>>--operation --sendEvent
>>>>>>--eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>>> If possible, I would like to please know if there is a way we can
>>>>>purge 
>>>>>>the queue and get rid of those workflow tasks either in "RSUMBIT" and
>>>>>>"QUEUED" I have already sent, please kindly help.
>>>>>> 
>>>>>>Very sorry for troubling you with this, to be honest I find OODT a
>>>>>>bit
>>>>>>challenging to grasp within a short time frame, probably because
>>>>>>there
>>>>>>is no book like OODT in action like Solr.... and what I am doing is
>>>>>>just trial and error blended with guess, but I don’t want to make a
>>>>>>blind guess, it will be appreciated if you can please also shed some
>>>>>>lights on where I can get more information logging or other way where
>>>>>>I can troubleshoot. I think it might be worth tracking what is
>>>>>>happening when workflow reach the status "RSUBMIT" and how to get a
>>>>>>specific logging info specific to it...
>>>>>> 
>>>>>>Again your advice and kind help will be appreciated usual.
>>>>>> 
>>>>>> 
>>>>>>Thanks 
>>>>>>Luke 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mattmann, Chris A (3980)
>>>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> Sent: 2014年10月26日 22:18
>>>>>>> To: Luke; 'Zichuan Wang'
>>>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>>> dev@oodt.apache.org
>>>>>>> Subject: Re: re: Question about OODT file manager
>>>>>>> 
>>>>>>> Hi Luke, 
>>>>>>> 
>>>>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>>>>details 
>>>>>>>below: 
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Luke <sh...@usc.edu>
>>>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>>>>> <zi...@usc.edu>
>>>>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>>>>> Subject: RE: re: Question about OODT file manager
>>>>>>> 
>>>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>>>> > 
>>>>>>> >Sorry to trouble you with this email, our team has been struggling
>>>>>>> >in the oodt to send json files to solr.
>>>>>>> >One of the difficulties is still getting OODT workflow to call the
>>>>>>> >poster.py in etllib.
>>>>>>> 
>>>>>>> Sorry that you’re having difficulty let me try and help.
>>>>>>> 
>>>>>>> > 
>>>>>>> >I am not sure if my understanding is correct with OODT
>>>>>>>requirement,
>>>>>>> >I hope you can please kindly advice and help with our confusion.
>>>>>>> > 
>>>>>>> >a set of goals in my mind with OODT is as follows, please kindly
>>>>>>> >confirm and clarify:
>>>>>>> > 
>>>>>>> >1) 
>>>>>>> >Get the File-Manager up and running.
>>>>>>> 
>>>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>>>> 
>>>>>>> >2) 
>>>>>>> >send all json files with command wmgr-client to the fileManager
>>>>>>>server. 
>>>>>>> >(I believe we can achieve it with a bash script or probably python
>>>>>>> >that calls the command line sequentially with each json file name
>>>>>>> >as 
>>>>>>>an 
>>>>>>> >argument?!)
>>>>>>> 
>>>>>>> Suggestion:
>>>>>>> 
>>>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>>>>files (in place data transfer).
>>>>>>> 2. Take a look at CAS-PGE, it will help you write a workflow task
>>>>>>>that will wrap ETLlib and the poster command.
>>>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>>>>through all of your indexed JSON files, and then for each one,
>>>>>>>submits a workflow event (you may need to look into aggregating
>>>>>>>them) that calls your CAS-PGE wrapped poster task from ETLlib.
>>>>>>> 
>>>>>>> >3) 
>>>>>>> >Once we have json files sent and stored in the File-Manager, we
>>>>>>> >need 
>>>>>>>to 
>>>>>>> >get workflow-manager up and running, and we can create a workflow
>>>>>>>that 
>>>>>>> >send those jsons file from the file manager to solr.
>>>>>>> 
>>>>>>> See above. 
>>>>>>> 
>>>>>>> >4) 
>>>>>>> >Create a workflow according to
>>>>>>> >Workflow2 User Guide
>>>>>>> 
>>>>>>>><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Gu
>>>>>>>>i
>>>>>>>>de> 
>>>>>>> >>>>>>>>>>> here comes the problem…..
>>>>>>> > I am not sure how to create a workflow task which can call
>>>>>>>the 
>>>>>>> >poster.py in python etllib, it looks like we need to create our
>>>>>>>own
>>>>>>> >java class that extend <TaskInstance> which is an abstract Java
>>>>>>> >class with one abstract method that has the following signature:
>>>>>>> > 
>>>>>>> > 
>>>>>>> >protectedabstract ResultsState performExecution(ControlMetadata
>>>>>>> >crtlMetadata);
>>>>>>> > However, the detail of where to find the corresponding
>>>>>>> >libs and where to put our implementation in workflow manager is
>>>>>>> >being neglected in that page. I am not sure if we should use
>>>>>>> >TaskInstance, but it seems the workflow has to have an interface
>>>>>>> >thru which it can call the python code i.e. poster.py. and it
>>>>>>>looks
>>>>>>> >like we need to embody the TaskInstance::performExecution by
>>>>>>> >injecting the code that calls the poster.py and return the
>>>>resultState. 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >It would be greatly appreciated if you could please shed some
>>>>>>> >lights and advice how we can get a task instance to call the
>>>>>>> >poster.py. BTW,
>>>>>>>I 
>>>>>>> >am also not sure if my understanding is correct, please kindly
>>>>>>>correct 
>>>>>>> >it if inappropriate. Your help will be appreciated as usual.
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Thanks 
>>>>>>> >Luke 
>>>>>>> 
>>>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>>>> 
>>>>>>> Cheers! 
>>>>>>> 
>>>>>>> Chris 
>>>>>>> 
>>>>>>> > 
>>>>>>> >From: Mattmann, Chris A (3980)
>>>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> >Sent: October 25, 2014 13:34
>>>>>>> >To: Zichuan Wang
>>>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>>>> >xiaoyanj@usc.edu
>>>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>>>> > 
>>>>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>>>>>> >detail soon.
>>>>>>> > 
>>>>>>> >Sent from my iPhone
>>>>>>> 
>>>>>>> 
>>>>>>> > 
>>>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> >Could you please also explain how I can crawl all JSON files under
>>>>>>> >a specific directory using CAS-PGE? I worked through this example:
>>>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>>> >but it doesn't mention anything about crawling; instead, it manually
>>>>>>> >sets the input file paths...
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang
>>>>>>> >wrote:
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> >In the assignment 2 specification, I noticed that you mentioned the OODT
>>>>>>> >File Manager, but from my understanding, we are using the ETLlib poster,
>>>>>>> >which talks directly to Solr. So how can we use the OODT File Manager
>>>>>>> >in this assignment?
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science



Re: re: Question about OODT file manager

Posted by Zichuan Wang <zi...@usc.edu>.
Thanks for the quick reply.

By increasing the heap size of the batch stub, do you mean increasing the value of org.apache.oodt.cas.resource.jobqueue.jobstack.maxstacksize?

I tried it but still got the same error.

Looking forward to your reply.


—
Zichuan Wang
Department of Computer Science, USC

On Wed, Nov 5, 2014 at 10:40 PM, Mattmann, Chris A (3980)
<ch...@jpl.nasa.gov> wrote:

> Got it. Can you increase the heap space on your batch stub? That
> should take care of it.
> Cheers,
> Chris
> P.S. Great work!
> -----Original Message-----
> From: Zichuan Wang <zi...@usc.edu>
> Date: Wednesday, November 5, 2014 at 11:12 PM
> To: Chris Mattmann <ma...@usc.edu>
> Cc: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
> <de...@oodt.apache.org>, Luke liu <sh...@usc.edu>, "xiaoyanj@usc.edu"
> <xi...@usc.edu>, "zhoujian@usc.edu" <zh...@usc.edu>
> Subject: Re: re: Question about OODT file manager
>>Dear Professor,
>>
>>
>>I finally figured out how to trigger a post-ingest event. However, when I
>>try to crawl the whole dataset, I get an OutOfMemoryError. Could you
>>please take a look and maybe give some suggestions?
>>
>>
>>➜  bin  ./crawler_launcher \
>>--operation --launchAutoCrawler \
>>--filemgrUrl http://localhost:9000 \
>>--clientTransferer
>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>--productPath /Users/zichuanwang/Downloads/output \
>>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>--workflowMgrUrl http://localhost:9200 \
>>-ais TriggerPostIngestWorkflow
>>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>Setting property 'StdProductCrawler.clientTransferer'
>>Setting property 'MetExtractorProductCrawler.clientTransferer'
>>Setting property 'AutoDetectProductCrawler.clientTransferer'
>>Setting property 'StdProductCrawler.filemgrUrl'
>>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>>Setting property 'StdProductCrawler.actionIds'
>>Setting property 'MetExtractorProductCrawler.actionIds'
>>Setting property 'AutoDetectProductCrawler.actionIds'
>>Setting property 'StdProductCrawler.productPath'
>>Setting property 'MetExtractorProductCrawler.productPath'
>>Setting property 'AutoDetectProductCrawler.productPath'
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value
>>[http://localhost:9200]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value
>>[../policy/mime-extractor-map.xml]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>INFO: Crawling /Users/zichuanwang/Downloads/output
>>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>at java.io.UnixFileSystem.list(Native Method)
>>at java.io.File.list(File.java:973)
>>at java.io.File.listFiles(File.java:1129)
>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>>at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>>at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>
>>
>>—
>>Zichuan Wang
>>Department of Computer Science, USC
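The trace shows the OutOfMemoryError being thrown inside java.io.File.listFiles, which returns the whole directory listing as one array. The Javadoc for File.list notes that Files.newDirectoryStream may use fewer resources when working with very large directories. The sketch below is a hedged illustration of that alternative for a custom script that walks the output directory; it is not a patch to OODT's ProductCrawler, and in this thread the fix that actually worked was giving the crawler JVM more heap. The class name is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Sketch: count *.json files in a (possibly huge) directory one entry at a
// time instead of materializing an array via File.listFiles().
public class JsonDirScan {

    static long countJson(Path dir) throws IOException {
        long count = 0;
        // The glob filter is applied per entry; try-with-resources closes the
        // underlying directory handle when iteration finishes.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.json")) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry)) {
                    count++; // a real script would index or submit 'entry' here
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countJson(dir) + " JSON files under " + dir);
    }
}
```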
>>
>>
>>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
>><ma...@usc.edu> wrote:
>>
>>
>>Thanks Luke, I’ve given you permissions so you should now see an
>>“edit” button on that wiki page.
>>
>>Cheers, 
>>Chris 
>>
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Chris Mattmann, Ph.D.
>>Adjunct Associate Professor, Computer Science Department
>>University of Southern California
>>Los Angeles, CA 90089 USA
>>Email: mattmann@usc.edu
>>WWW: http://sunset.usc.edu/~mattmann/
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>-----Original Message-----
>>From: Luke liu <sh...@usc.edu>
>>Date: Wednesday, November 5, 2014 at 6:48 PM
>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>><de...@oodt.apache.org>
>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>><zi...@usc.edu>
>>Subject: RE: re: Question about OODT file manager
>>
>>>I just signed up on the wiki (i.e., https://cwiki.apache.org) with the
>>>following account details:
>>> Account name: luke
>>> Full Name: Shuai Liu (Luke)
>>> Email: hanson311biz@gmail.com
>>> Password: *******
>>> 
>>>But I am not sure where I can add my notes to the following article, which
>>>I had trouble with. I also tried to create a new article but failed, as I
>>>cannot find anywhere to edit. Does this have something to do with my
>>>account, given that I don't see the "edit" or "comments" actions?
>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>> 
>>> 
>>> 
>>>Thanks 
>>>Luke 
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>Sent: Sunday, November 2, 2014 6:59 AM
>>>To: Luke liu; dev@oodt.apache.org
>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>'Zichuan Wang'
>>>Subject: Re: re: Question about OODT file manager
>>> 
>>>Yes Luke, improving the instructions would be much appreciated!
>>> 
>>>If you have an account on the wiki, please share it; otherwise, sign up for
>>>an Apache OODT wiki account and share it with me or anyone else on
>>>dev@oodt, and we'll add you.
>>> 
>>>Cheers, 
>>>Chris 
>>> 
>>>-----Original Message-----
>>>From: Luke liu <sh...@usc.edu>
>>>Date: Sunday, November 2, 2014 at 1:32 AM
>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>><de...@oodt.apache.org>
>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>><zi...@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>> 
>>>>Thanks Professor Mattmann, not running batch_stub was the main culprit,
>>>>and there were some other issues such as missing jars. Sorry for not
>>>>confirming this right away; my laptop crashed, and I only just had time to
>>>>fix it. BTW, I was able to get the CAS-PGE example to work (even though
>>>>the log showed that the workflow failed to pass the pre-condition, the
>>>>combined file and some metadata files, i.e., 3 files, were still
>>>>successfully ingested and placed in the output directory).
>>>> 
>>>>BTW, I think there are a lot of mistakes in the documentation. Do you want
>>>>us to help correct it (i.e.,
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>>If possible, I would like to share my notes on the problematic steps
>>>>mentioned there.
>>>> 
>>>>Anyway, thanks for your help; it is much appreciated.
>>>> 
>>>>Thanks 
>>>>Luke 
>>>>-----Original Message-----
>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>>To: Luke; dev@oodt.apache.org
>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>'Zichuan Wang' 
>>>>Subject: Re: re: Question about OODT file manager
>>>> 
>>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>>with the batch stub not being turned on.
>>>> 
>>>>Cheers, 
>>>>Chris 
>>>> 
>>>>-----Original Message-----
>>>>From: Luke <sh...@usc.edu>
>>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>>><de...@oodt.apache.org>
>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>>><zi...@usc.edu>
>>>>Subject: RE: re: Question about OODT file manager
>>>> 
>>>>>Dear Professor Mattmann,
>>>>>Thanks a lot for the kind help, it is appreciated, and sorry for the delay
>>>>>in thanking you. I have been conducting tests with OODT based on your
>>>>>advice, but unfortunately I am having another problem....
>>>>> 
>>>>>I am following the steps at
>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>to get a sense of how to get the workflow to work.
>>>>>The problem is that the File-Concatenator-PGE (run via the wmgr-client
>>>>>command line) does not seem to be invoked or executed. Instead, I see the
>>>>>tasks stacking up in the Workflow Manager with status either "RSUBMIT" or
>>>>>"QUEUED", and they never get executed; PFA: workflow_monitor.jpg. Please
>>>>>note that by default the workflow min pool size is 6, which leads to
>>>>>another problem: I have 6 submitted tasks with status RSUBMIT, and any new
>>>>>incoming tasks are forwarded to the waiting queue with status "QUEUED".
>>>>>Please refer to workflow_monitor.jpg for details, where I have 3 QUEUED
>>>>>workflow tasks and 6 RSUBMIT tasks.
>>>>> 
>>>>>Question 1): I am not sure why the workflow is not being executed and is
>>>>>hanging in the "RSUBMIT" state. After raising the log level, I see the
>>>>>following entry in the log; I am not sure whether it has anything to do
>>>>>with the "hanging" problem where the workflow is stuck in "RSUBMIT".
>>>>> Oct 28, 2014 3:35:07 AM
>>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>>safeCheckJobComplete
>>>>> WARNING: Exception checking completion status for job:
>>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>>java.lang.NullPointerException
>>>>> 
>>>>>Question 2): I think any new workflow task I send with the following
>>>>>command is currently directed to the waiting queue because of the min pool
>>>>>size (i.e., 6), although I can increase it to a larger number:
>>>>> ./wmgr-client --url http://localhost:9200 --operation --sendEvent \
>>>>> --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>>If possible, I would like to know whether there is a way to purge the
>>>>>queue and get rid of the workflow tasks I have already sent, in either
>>>>>the "RSUBMIT" or the "QUEUED" state. Please kindly help.
>>>>> 
>>>>>Very sorry for troubling you with this. To be honest, I find OODT a bit
>>>>>challenging to grasp within a short time frame, probably because there is
>>>>>no "OODT in Action" book like there is for Solr, so what I am doing is
>>>>>just trial and error blended with guesswork. I don't want to guess
>>>>>blindly, so I would appreciate it if you could also shed some light on
>>>>>where I can get more logging information, or other ways to troubleshoot.
>>>>>I think it might be worth tracking what happens when a workflow reaches
>>>>>the "RSUBMIT" status, and how to get logging information specific to it...
>>>>> 
>>>>>Again, your advice and kind help will be appreciated as usual.
>>>>> 
>>>>> 
>>>>>Thanks 
>>>>>Luke 
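The pattern Luke describes (six tasks stuck in RSUBMIT while later ones sit in QUEUED) is what any fixed-size worker pool shows when its workers never finish. The sketch below is an analogy built on java.util.concurrent.ThreadPoolExecutor, not OODT's workflow engine: the pool size of 6 mirrors the default min pool size mentioned above, and each blocked worker stands in for a job waiting on a resource that never responds, such as a batch stub that is not running. The class name is hypothetical.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Analogy only (java.util.concurrent, not OODT's engine): a fixed pool of
// workers, each blocked on an external dependency, plus a FIFO wait queue.
public class PoolQueueDemo {

    // Submit 'tasks' jobs that all block until 'release' is counted down.
    // Returns {worker threads started, tasks waiting in the queue}.
    static int[] simulate(int poolSize, int tasks) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await(); // stands in for a job that never completes
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        // The first poolSize tasks each start a worker; the rest are queued.
        int[] snapshot = {pool.getPoolSize(), pool.getQueue().size()};
        release.countDown(); // unblock everything so the pool can shut down
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return snapshot;
    }

    public static void main(String[] args) throws InterruptedException {
        int[] s = simulate(6, 9);
        System.out.println("running=" + s[0] + " queued=" + s[1]);
    }
}
```

Run as-is, it prints `running=6 queued=3`, the same 6/3 split described above. Raising the pool size only moves the threshold; the stuck jobs still need their underlying dependency fixed.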
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Mattmann, Chris A (3980)
>>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>> Sent: October 26, 2014 22:18
>>>>>> To: Luke; 'Zichuan Wang'
>>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>> dev@oodt.apache.org
>>>>>> Subject: Re: re: Question about OODT file manager
>>>>>> 
>>>>>> Hi Luke, 
>>>>>> 
>>>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>>> details below:
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Luke <sh...@usc.edu>
>>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>>>> <zi...@usc.edu>
>>>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>>>> Subject: RE: re: Question about OODT file manager
>>>>>> 
>>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>>> > 
>>>>>> >Sorry to trouble you with this email. Our team has been struggling to
>>>>>> >use OODT to send JSON files to Solr.
>>>>>> >One of the difficulties is still getting the OODT workflow to call
>>>>>> >poster.py in ETLlib.
>>>>>> 
>>>>>> Sorry that you’re having difficulty; let me try and help.
>>>>>> 
>>>>>> > 
>>>>>> >I am not sure whether my understanding of the OODT requirement is
>>>>>> >correct; I hope you can kindly advise and help with our confusion.
>>>>>> > 
>>>>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>>>>> >confirm and clarify:
>>>>>> > 
>>>>>> >1) 
>>>>>> >Get the File-Manager up and running.
>>>>>> 
>>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>>> 
>>>>>> >2) 
>>>>>> >Send all JSON files to the File Manager server with the wmgr-client
>>>>>> >command. (I believe we can achieve this with a bash or Python script
>>>>>> >that calls the command line sequentially with each JSON file name as
>>>>>> >an argument?!)
>>>>>> 
>>>>>> Suggestion: 
>>>>>> 
>>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>>> files (in-place data transfer).
>>>>>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>>>>>> that wraps ETLlib and the poster command.
>>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>>> through all of your indexed JSON files and, for each one,
>>>>>> submits a workflow event (you may need to look into aggregating
>>>>>> them) that calls your CAS-PGE-wrapped poster task from ETLlib.
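Step 3 of Chris's suggestion can be sketched as follows, under several assumptions: the JSON files are found by listing a directory rather than by querying the File Manager catalog, one event is sent per file (no aggregation), and the event name `poster-pge`, the metadata key `JsonFile`, and the class name are hypothetical. The wmgr-client flags are the ones used in the sendEvent command elsewhere in this thread.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Sketch: one workflow event per JSON file. Hypothetical names: event
// "poster-pge", metadata key "JsonFile". Files come from a directory
// listing here, not from a File Manager catalog query.
public class SendPosterEvents {

    // Build the wmgr-client argument list for a single file, reusing the
    // flags from the sendEvent command quoted in this thread.
    static List<String> eventCommand(String wmgrUrl, String eventName, String key, Path file) {
        return List.of("./wmgr-client", "--url", wmgrUrl,
                "--operation", "--sendEvent",
                "--eventName", eventName,
                "--metaData", "--key", key, file.toString());
    }

    // Collect one command per *.json entry in 'dir'.
    static List<List<String>> commandsFor(Path dir, String wmgrUrl) throws IOException {
        List<List<String>> commands = new ArrayList<>();
        try (DirectoryStream<Path> jsons = Files.newDirectoryStream(dir, "*.json")) {
            for (Path json : jsons) {
                commands.add(eventCommand(wmgrUrl, "poster-pge", "JsonFile",
                        json.toAbsolutePath()));
            }
        }
        return commands;
    }

    public static void main(String[] args) throws Exception {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (List<String> cmd : commandsFor(dir, "http://localhost:9200")) {
            // Dry run: print only. To submit, run each command with
            // new ProcessBuilder(cmd).inheritIO().start().waitFor()
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Printing instead of executing keeps this a dry run, so the generated commands can be checked before any events reach the Workflow Manager.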
>>>>>> 
>>>>>> >3) 
>>>>>> >Once we have the JSON files sent to and stored in the File Manager, we
>>>>>> >need to get the Workflow Manager up and running, and then we can create
>>>>>> >a workflow that sends those JSON files from the File Manager to Solr.
>>>>>> 
>>>>>> See above. 
>>>>>> 
>>>>>> >4) 
>>>>>> >Create a workflow according to the Workflow2 User Guide
>>>>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>>>>> >Here comes the problem: I am not sure how to create a workflow task
>>>>>> >that can call poster.py in the Python ETLlib. It looks like we need to
>>>>>> >create our own Java class that extends TaskInstance, an abstract Java
>>>>>> >class with one abstract method that has the following signature:
>>>>>> > 
>>>>>> >protected abstract ResultsState performExecution(ControlMetadata
>>>>>> >crtlMetadata);
>>>>>> > 
>>>>>> >However, that page does not say where to find the corresponding
>>>>>> >libraries or where to put our implementation in the Workflow Manager.
>>>>>> >I am not sure whether we should use TaskInstance, but it seems the
>>>>>> >workflow needs an interface through which it can call the Python code
>>>>>> >(i.e., poster.py), so it looks like we need to implement
>>>>>> >TaskInstance::performExecution with code that calls poster.py and
>>>>>> >returns a ResultsState.
>>>>>> > 
>>>>>> >It would be greatly appreciated if you could shed some light and
>>>>>> >advise how we can get a task instance to call poster.py. BTW, I am
>>>>>> >also not sure my understanding is correct, so please kindly correct
>>>>>> >it if it is not. Your help will be appreciated as usual.
>>>>>> > 
>>>>>> >Thanks 
>>>>>> >Luke 
>>>>>> 
>>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>>> 
>>>>>> Cheers! 
>>>>>> 
>>>>>> Chris 
>>>>>> 
>>>>>> > 
>>>>>> >From: Mattmann, Chris A (3980)
>>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>> >Sent: October 25, 2014 13:34
>>>>>> >To: Zichuan Wang
>>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>>> >xiaoyanj@usc.edu
>>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>>> > 
>>>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>>>>> >detail soon.
>>>>>> > 
>>>>>> >Sent from my iPhone
>>>>>> 
>>>>>> 
>>>>>> > 
>>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>>>> > 
>>>>>> >Dear Professor,
>>>>>> > 
>>>>>> >Could you please also explain how I can crawl all JSON files under
>>>>>> >a specific directory using CAS-PGE? I worked through this example:
>>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>> >but it doesn't mention anything about crawling; instead, it manually
>>>>>> >sets the input file paths...
>>>>>> > 
>>>>>> >-- 
>>>>>> > 
>>>>>> >Zichuan Wang
>>>>>> > 
>>>>>> >University of Southern California, Department of Computer Science
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang
>>>>>> >wrote:
>>>>>> > 
>>>>>> >Dear Professor,
>>>>>> > 
>>>>>> >In the assignment 2 specification, I noticed that you mentioned the OODT
>>>>>> >File Manager, but from my understanding, we are using the ETLlib poster,
>>>>>> >which talks directly to Solr. So how can we use the OODT File Manager
>>>>>> >in this assignment?
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >-- 
>>>>>> > 
>>>>>> >Zichuan Wang
>>>>>> > 
>>>>>> >University of Southern California, Department of Computer Science

Re: re: Question about OODT file manager

Posted by Chris Mattmann <ch...@gmail.com>.
woot thanks

------------------------
Chris Mattmann
chris.mattmann@gmail.com




-----Original Message-----
From: Zichuan Wang <zi...@usc.edu>
Reply-To: <de...@oodt.apache.org>
Date: Wednesday, November 5, 2014 at 11:22 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>
Cc: Chris Mattmann <ma...@usc.edu>, <de...@oodt.apache.org>, Luke liu
<sh...@usc.edu>, <xi...@usc.edu>, <zh...@usc.edu>
Subject: Re: re: Question about OODT file manager

>I googled around and found this little trick:
>
>export JAVA_OPTS=-Xmx2048m
>
>It works now. Thanks, Professor!
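To confirm that a heap setting such as -Xmx2048m actually reached a JVM, one option is to print the maximum heap from inside the process. A minimal sketch follows; the class name is hypothetical, and whether a given OODT launcher script forwards JAVA_OPTS to its `java` command is something to check in that script.

```java
// Prints the maximum heap this JVM will attempt to use, e.g. to confirm
// that an -Xmx value passed via JAVA_OPTS actually reached the process.
public class HeapCheck {

    static long maxHeapMiB() {
        // maxMemory() returns Long.MAX_VALUE if there is no inherent limit.
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap: " + maxHeapMiB() + " MiB");
    }
}
```

Running `java -Xmx2048m HeapCheck` should print a value close to 2048 MiB; the reported figure can be somewhat lower than the -Xmx value because of how the JVM accounts for heap regions.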
>
>
>—
>Zichuan Wang
>Department of Computer Science, USC
>
>On Wed, Nov 5, 2014 at 10:40 PM, Mattmann, Chris A (3980)
><ch...@jpl.nasa.gov> wrote:
>
>> Got it. Can you increase the heap space on your batch stub? That
>> should take care of it.
>> Cheers,
>> Chris
>> P.S. Great work!
>> -----Original Message-----
>> From: Zichuan Wang <zi...@usc.edu>
>> Date: Wednesday, November 5, 2014 at 11:12 PM
>> To: Chris Mattmann <ma...@usc.edu>
>> Cc: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>> <de...@oodt.apache.org>, Luke liu <sh...@usc.edu>, "xiaoyanj@usc.edu"
>> <xi...@usc.edu>, "zhoujian@usc.edu" <zh...@usc.edu>
>> Subject: Re: re: Question about OODT file manager
>>>Dear Professor,
>>>
>>>
>>>I finally figured out how to trigger a post-ingest event. However, when I
>>>try to crawl the whole dataset, I get an OutOfMemoryError. Could you
>>>please take a look and maybe give some suggestions?
>>>
>>>
>>>➜  bin  ./crawler_launcher \
>>>--operation --launchAutoCrawler \
>>>--filemgrUrl http://localhost:9000 \
>>>--clientTransferer
>>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>>--productPath /Users/zichuanwang/Downloads/output \
>>>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>>--workflowMgrUrl http://localhost:9200 \
>>>-ais TriggerPostIngestWorkflow
>>>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>>Setting property 'StdProductCrawler.clientTransferer'
>>>Setting property 'MetExtractorProductCrawler.clientTransferer'
>>>Setting property 'AutoDetectProductCrawler.clientTransferer'
>>>Setting property 'StdProductCrawler.filemgrUrl'
>>>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>>>Setting property 'StdProductCrawler.actionIds'
>>>Setting property 'MetExtractorProductCrawler.actionIds'
>>>Setting property 'AutoDetectProductCrawler.actionIds'
>>>Setting property 'StdProductCrawler.productPath'
>>>Setting property 'MetExtractorProductCrawler.productPath'
>>>Setting property 'AutoDetectProductCrawler.productPath'
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value
>>>[http://localhost:9200]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value
>>>[../policy/mime-extractor-map.xml]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.actionIds' set to value
>>>[TriggerPostIngestWorkflow]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'AutoDetectProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value
>>>[http://localhost:9000]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'StdProductCrawler.clientTransferer' set to value
>>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>>Nov 5, 2014 10:07:47 PM
>>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>>processKey
>>>: Property 'MetExtractorProductCrawler.productPath' set to value
>>>[/Users/zichuanwang/Downloads/output]
>>>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>>INFO: Crawling /Users/zichuanwang/Downloads/output
>>>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>>at java.io.UnixFileSystem.list(Native Method)
>>>at java.io.File.list(File.java:973)
>>>at java.io.File.listFiles(File.java:1129)
>>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>>>at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>>at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>>at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>>>at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>>>
>>>
>>>—
>>>Zichuan Wang
>>>Department of Computer Science, USC
>>>
>>>
>>>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
>>><ma...@usc.edu> wrote:
>>>
>>>
>>>Thanks Luke, I’ve given you permissions so you should now see an
>>>“edit” button on that wiki page.
>>>
>>>Cheers, 
>>>Chris 
>>>
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>Chris Mattmann, Ph.D.
>>>Adjunct Associate Professor, Computer Science Department
>>>University of Southern California
>>>Los Angeles, CA 90089 USA
>>>Email: mattmann@usc.edu
>>>WWW: http://sunset.usc.edu/~mattmann/
>>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>
>>>
>>>
>>>
>>>-----Original Message-----
>>>From: Luke liu <sh...@usc.edu>
>>>Date: Wednesday, November 5, 2014 at 6:48 PM
>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>><de...@oodt.apache.org>
>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>><zi...@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>>
>>>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>>>following account details:
>>>> Account name: luke
>>>> Full Name: Shuai Liu (Luke)
>>>> Email: hanson311biz@gmail.com
>>>> Password: *******
>>>> 
>>>>But I am not sure where I can add my notes to the following article, which
>>>>I had trouble with. I also tried to create a new article, but failed to do
>>>>it, as I cannot find anywhere to edit. Does this have something to do with
>>>>my account, since the "edit" and "comments" actions are not visible to me?
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>> 
>>>> 
>>>> 
>>>>Thanks 
>>>>Luke 
>>>>-----Original Message-----
>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>Sent: Sunday, November 2, 2014 6:59 AM
>>>>To: Luke liu; dev@oodt.apache.org
>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>'Zichuan Wang'
>>>>Subject: Re: re: Question about OODT file manager
>>>> 
>>>>Yes Luke, making the instructions better would be much appreciated!
>>>> 
>>>>If you have an account on the wiki please share it, else sign up for an
>>>>Apache OODT wiki account and please share it with me or anyone else on
>>>>dev@oodt and we’ll add you.
>>>> 
>>>>Cheers, 
>>>>Chris 
>>>> 
>>>>-----Original Message-----
>>>>From: Luke liu <sh...@usc.edu>
>>>>Date: Sunday, November 2, 2014 at 1:32 AM
>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>>><de...@oodt.apache.org>
>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>>><zi...@usc.edu>
>>>>Subject: RE: re: Question about OODT file manager
>>>> 
>>>>>Thanks Professor Mattmann, not running batch_stub was the main culprit,
>>>>>and there were some other issues, such as missing jars. Sorry for not
>>>>>confirming this right away; my laptop crashed, and I only just had time
>>>>>to fix it. BTW, I was able to get the CAS-PGE example to work (even
>>>>>though the log showed the workflow failing its pre-condition check, the
>>>>>combined file and some metadata files, i.e. 3 files, were still
>>>>>successfully ingested and placed in the output directory).
>>>>> 
>>>>>BTW, I think there are a lot of mistakes in the documentation. Do you
>>>>>want us to help correct it (i.e.
>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>>>If possible, I would like to share my notes on some of the problematic
>>>>>steps described there.
>>>>> 
>>>>>Anyway, thanks for your help; it is much appreciated.
>>>>> 
>>>>>Thanks 
>>>>>Luke 
>>>>>-----Original Message-----
>>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>>>To: Luke; dev@oodt.apache.org
>>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>'Zichuan Wang'
>>>>>Subject: Re: re: Question about OODT file manager
>>>>> 
>>>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>>>with the batch stub not being turned on.
>>>>> 
>>>>>Cheers, 
>>>>>Chris 
>>>>> 
>>>>>-----Original Message-----
>>>>>From: Luke <sh...@usc.edu>
>>>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>>>><de...@oodt.apache.org>
>>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>>>><zi...@usc.edu>
>>>>>Subject: RE: re: Question about OODT file manager
>>>>> 
>>>>>>Dear Professor Mattmann,
>>>>>>Thanks a lot for the kind help, Professor Mattmann; it is appreciated.
>>>>>>Sorry for being slow to thank you: I have been conducting tests with
>>>>>>OODT based on your advice, but unfortunately I am having another
>>>>>>problem....
>>>>>> 
>>>>>>I am following the steps at
>>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>>to get a sense of how to get the workflow to work.
>>>>>>The problem is that the File-Concatenator-PGE (triggered by running the
>>>>>>wmgr-client command line) does not seem to be invoked or executed.
>>>>>>Instead, the tasks are stacking up in the workflow manager with status
>>>>>>either "RSUBMIT" or "QUEUED" and are never executed; please find
>>>>>>attached workflow_monitor.jpg. Note that by default the workflow min
>>>>>>pool size is 6, which leads to another problem: I have 6 submitted tasks
>>>>>>with status RSUBMIT, and any new incoming tasks are forwarded to the
>>>>>>waiting queue with status "QUEUED". Please refer to workflow_monitor.jpg
>>>>>>for details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>>>>>> 
>>>>>>Question 1): I am not sure why the workflow is not being executed and
>>>>>>hangs in the "RSUBMIT" state. After raising the log level, I see the
>>>>>>following entry in the log; I am not sure whether it has anything to do
>>>>>>with the workflow hanging in "RSUBMIT" instead of being executed.
>>>>>> Oct 28, 2014 3:35:07 AM
>>>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>>>safeCheckJobComplete
>>>>>> WARNING: Exception checking completion status for job:
>>>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>>>java.lang.NullPointerException
>>>>>> 
>>>>>>Question 2): I think that, currently, any new workflow task I send with
>>>>>>the following command is directed to the waiting "QUEUE" because of the
>>>>>>min pool size (i.e. 6), though I could increase that to a larger number:
>>>>>> ./wmgr-client --url http://localhost:9200 --operation --sendEvent \
>>>>>>   --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>>>If possible, I would like to know whether there is a way to purge the
>>>>>>queue and get rid of the workflow tasks I have already sent, in either
>>>>>>the "RSUBMIT" or "QUEUED" state. Please kindly help.
>>>>>> 
>>>>>>Very sorry to trouble you with this. To be honest, I find OODT a bit
>>>>>>challenging to grasp in a short time frame, probably because there is
>>>>>>no "OODT in Action" book the way there is a "Solr in Action", so what I
>>>>>>am doing is mostly trial and error blended with guesswork. I don't want
>>>>>>to guess blindly, though, so I would appreciate it if you could also
>>>>>>shed some light on where I can get more logging information, or other
>>>>>>ways to troubleshoot. I think it might be worth tracking what happens
>>>>>>when a workflow reaches the "RSUBMIT" status, and how to get logging
>>>>>>specific to it...
>>>>>> 
>>>>>>Again, your advice and kind help will be appreciated as usual.
>>>>>> 
>>>>>> 
>>>>>>Thanks 
>>>>>>Luke 
>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Mattmann, Chris A (3980)
>>>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> Sent: October 26, 2014 22:18
>>>>>>> To: Luke; 'Zichuan Wang'
>>>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>>> dev@oodt.apache.org
>>>>>>> Subject: Re: re: Question about OODT file manager
>>>>>>> 
>>>>>>> Hi Luke, 
>>>>>>> 
>>>>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>>>> details below:
>>>>>>> 
>>>>>>> 
>>>>>>> -----Original Message-----
>>>>>>> From: Luke <sh...@usc.edu>
>>>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>>>>> <zi...@usc.edu>
>>>>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>>>>> Subject: RE: re: Question about OODT file manager
>>>>>>> 
>>>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>>>> > 
>>>>>>> >Sorry to trouble you with this email; our team has been struggling
>>>>>>> >to use OODT to send JSON files to Solr.
>>>>>>> >One of the difficulties is still getting the OODT workflow to call
>>>>>>> >poster.py in ETLlib.
>>>>>>> 
>>>>>>> Sorry that you’re having difficulty; let me try and help.
>>>>>>> 
>>>>>>> > 
>>>>>>> >I am not sure whether my understanding of the OODT requirements is
>>>>>>> >correct; I hope you can kindly advise and help with our confusion.
>>>>>>> > 
>>>>>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>>>>>> >confirm and clarify:
>>>>>>> > 
>>>>>>> >1) Get the File-Manager up and running.
>>>>>>> 
>>>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>>>> 
>>>>>>> >2) Send all JSON files to the fileManager server with the wmgr-client
>>>>>>> >command. (I believe we can achieve this with a bash or Python script
>>>>>>> >that calls the command line sequentially with each JSON file name as
>>>>>>> >an argument?!)
>>>>>>> 
>>>>>>> Suggestion:
>>>>>>> 
>>>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>>>>files (in place data transfer).
>>>>>>> 2. Take a look at CAS-PGE, it will help you write a workflow task
>>>>>>>that will wrap ETLlib and the poster command.
>>>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>>>>through all of your indexed JSON files, and then for each one,
>>>>>>>submits a workflow event (you may need to look into aggregating
>>>>>>>them) that calls your CAS-PGE wrapped poster task from ETLlib.
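Step 3 could start from something like the following minimal sketch. It is hypothetical: it loops over *.json files in a local directory instead of paging through the File Manager catalog as suggested, reuses only the wmgr-client flags from the --sendEvent command quoted in this thread, and passes each file's base name as the RunID metadata value. The event name fileconcatenator-pge is just a placeholder for whatever event triggers your CAS-PGE-wrapped poster task, and WMGR_URL, EVENT_NAME and DRY_RUN are variables introduced here for illustration. With DRY_RUN=1 (the default) it only prints the commands it would run.

```shell
#!/bin/sh
# Hypothetical sketch: send one workflow event per JSON file in a directory.
# Flags mirror the wmgr-client --sendEvent command quoted in this thread;
# WMGR_URL, EVENT_NAME and DRY_RUN are assumptions, adjust for your setup.
WMGR_URL="${WMGR_URL:-http://localhost:9200}"
EVENT_NAME="${EVENT_NAME:-fileconcatenator-pge}"   # placeholder event name
DRY_RUN="${DRY_RUN:-1}"                             # 1 = print, do not send

submit_events() {
  dir="$1"
  for f in "$dir"/*.json; do
    [ -e "$f" ] || continue              # the glob matched nothing
    run_id=$(basename "$f" .json)        # e.g. doc42.json -> doc42
    if [ "$DRY_RUN" = "1" ]; then
      echo "./wmgr-client --url $WMGR_URL --operation --sendEvent" \
           "--eventName $EVENT_NAME --metaData --key RunID $run_id"
    else
      ./wmgr-client --url "$WMGR_URL" --operation --sendEvent \
        --eventName "$EVENT_NAME" --metaData --key RunID "$run_id"
    fi
  done
}
```

Run it first with the default DRY_RUN=1 to check the generated commands, then with DRY_RUN=0 from the workflow manager's bin directory. Sending many events at once will still queue behind the workflow manager's pool size, which is one reason aggregating files per event may be worth considering.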
>>>>>>> 
>>>>>>> >3) Once we have the JSON files sent to and stored in the File-Manager,
>>>>>>> >we need to get the workflow-manager up and running, and then we can
>>>>>>> >create a workflow that sends those JSON files from the file manager
>>>>>>> >to Solr.
>>>>>>> 
>>>>>>> See above. 
>>>>>>> 
>>>>>>> >4) Create a workflow according to the Workflow2 User Guide
>>>>>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>>>>>> >  --> here comes the problem...
>>>>>>> >I am not sure how to create a workflow task that can call poster.py
>>>>>>> >in the Python ETLlib. It looks like we need to create our own Java
>>>>>>> >class that extends TaskInstance, an abstract Java class with one
>>>>>>> >abstract method that has the following signature:
>>>>>>> > 
>>>>>>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>>>>>>> > 
>>>>>>> >However, that page leaves out where to find the corresponding libraries
>>>>>>> >and where to put our implementation in the workflow manager. I am not
>>>>>>> >sure whether we should use TaskInstance, but it seems the workflow needs
>>>>>>> >an interface through which it can call the Python code (i.e. poster.py),
>>>>>>> >and it looks like we need to implement TaskInstance::performExecution
>>>>>>> >with code that calls poster.py and returns the ResultsState.
>>>>>>> > 
>>>>>>> > 
>>>>>>> >It would be greatly appreciated if you could shed some light and
>>>>>>> >advise how we can get a task instance to call poster.py. BTW, I am
>>>>>>> >also not sure whether my understanding is correct; please kindly
>>>>>>> >correct it if not. Your help will be appreciated as usual.
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Thanks 
>>>>>>> >Luke 
>>>>>>> 
>>>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>>>> 
>>>>>>> Cheers! 
>>>>>>> 
>>>>>>> Chris 
>>>>>>> 
>>>>>>> > 
>>>>>>> >From: Mattmann, Chris A (3980)
>>>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>>> > 
>>>>>>> >Sent: October 25, 2014 13:34
>>>>>>> >To: Zichuan Wang
>>>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>>>> >xiaoyanj@usc.edu
>>>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>>>>>> >detail soon.
>>>>>>> > 
>>>>>>> >Sent from my iPhone
>>>>>>> 
>>>>>>> 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >Could you please also explain how I can crawl all JSON file names
>>>>>>> >under a specific directory using CAS-PGE? I'll work through this
>>>>>>> >example,
>>>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example,
>>>>>>> >but it doesn't mention anything about crawling; instead, it sets the
>>>>>>> >input file paths manually...
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>>>>>> > 
>>>>>>> >Dear Professor,
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >In the assignment 2 specification, I noticed that you mentioned the
>>>>>>> >OODT File Manager, but as I understand it, we are using the ETLlib
>>>>>>> >poster, which talks directly to Solr. So how can we use the OODT File
>>>>>>> >Manager in this assignment?
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> >-- 
>>>>>>> > 
>>>>>>> >Zichuan Wang
>>>>>>> > 
>>>>>>> >University of Southern California, Department of Computer Science
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>>> > 
>>>>>> 
>>>>> 
>>>>> 
>>>> 
>>>> 
>>>
>>>
>>>
>>>
>>>



Re: re: Question about OODT file manager

Posted by Zichuan Wang <zi...@usc.edu>.
I googled around and found this little trick:

export JAVA_OPTS=-Xmx2048m


It works now, thanks professor!
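The same fix can be scripted so it is applied every time the crawler is launched. This is a minimal sketch, assuming (as reported above) that the OODT bin scripts hand JAVA_OPTS to the JVM; HEAP is a hypothetical variable introduced here. Note that the stack trace in this thread ends in CrawlerLauncher.main, so the OutOfMemoryError was raised in the crawler's own JVM rather than in the batch stub, which is why setting JAVA_OPTS in the shell that runs crawler_launcher resolved it.

```shell
#!/bin/sh
# Minimal sketch: raise the crawler JVM's maximum heap before launching it.
# Assumption: the OODT launch scripts pass $JAVA_OPTS to java, as the fix
# reported in this thread implies. HEAP is a hypothetical knob; 2048m is
# the value that worked in the thread.
HEAP="${HEAP:-2048m}"

# Append -Xmx after any options already present; HotSpot honours the last
# -Xmx it is given, so this value wins over an earlier one.
JAVA_OPTS="${JAVA_OPTS:+$JAVA_OPTS }-Xmx$HEAP"
export JAVA_OPTS
echo "JAVA_OPTS=$JAVA_OPTS"

# Next, run ./crawler_launcher from this same shell with the same arguments.
```

If you save this to a file, source it (for example `. ./set-heap.sh`, a hypothetical file name) rather than executing it, so the exported variable stays in the shell you then launch the crawler from.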


—
Zichuan Wang
Department of Computer Science, USC

On Wed, Nov 5, 2014 at 10:40 PM, Mattmann, Chris A (3980)
<ch...@jpl.nasa.gov> wrote:

> Got it. Can you increase the heap space on your batch stub? That
> should take care of it.
> Cheers,
> Chris
> P.S. Great work!
> -----Original Message-----
> From: Zichuan Wang <zi...@usc.edu>
> Date: Wednesday, November 5, 2014 at 11:12 PM
> To: Chris Mattmann <ma...@usc.edu>
> Cc: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
> <de...@oodt.apache.org>, Luke liu <sh...@usc.edu>, "xiaoyanj@usc.edu"
> <xi...@usc.edu>, "zhoujian@usc.edu" <zh...@usc.edu>
> Subject: Re: re: Question about OODT file manager
>>Dear Professor,
>>
>>
>>I finally figured out how to trigger a post-ingest event. However, when I
>>tried to crawl the whole dataset, I got an OutOfMemoryError. Could you
>>please take a look and maybe give some suggestions?
>>
>>
>>➜  bin  ./crawler_launcher \
>>--operation --launchAutoCrawler \
>>--filemgrUrl http://localhost:9000 \
>>--clientTransferer
>>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>>--productPath /Users/zichuanwang/Downloads/output \
>>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>>--workflowMgrUrl http://localhost:9200 \
>>-ais TriggerPostIngestWorkflow
>>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>>Setting property 'StdProductCrawler.clientTransferer'
>>Setting property 'MetExtractorProductCrawler.clientTransferer'
>>Setting property 'AutoDetectProductCrawler.clientTransferer'
>>Setting property 'StdProductCrawler.filemgrUrl'
>>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>>Setting property 'StdProductCrawler.actionIds'
>>Setting property 'MetExtractorProductCrawler.actionIds'
>>Setting property 'AutoDetectProductCrawler.actionIds'
>>Setting property 'StdProductCrawler.productPath'
>>Setting property 'MetExtractorProductCrawler.productPath'
>>Setting property 'AutoDetectProductCrawler.productPath'
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value
>>[http://localhost:9200]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value
>>[../policy/mime-extractor-map.xml]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.actionIds' set to value
>>[TriggerPostIngestWorkflow]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'AutoDetectProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value
>>[http://localhost:9000]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'StdProductCrawler.clientTransferer' set to value
>>[org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>>Nov 5, 2014 10:07:47 PM
>>org.springframework.beans.factory.config.PropertyOverrideConfigurer
>>processKey
>>: Property 'MetExtractorProductCrawler.productPath' set to value
>>[/Users/zichuanwang/Downloads/output]
>>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>>INFO: Crawling /Users/zichuanwang/Downloads/output
>>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>>at java.io.UnixFileSystem.list(Native Method)
>>at java.io.File.list(File.java:973)
>>at java.io.File.listFiles(File.java:1129)
>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>>at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>>at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>>at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>>at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
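The trace above shows the heap being exhausted inside java.io.File.listFiles, called from ProductCrawler.crawl, i.e. while the crawler builds an in-memory listing of a directory under --productPath. Before choosing a heap size it can help to know how large that listing is. A minimal sketch; count_entries is a hypothetical helper, not part of OODT:

```shell
#!/bin/sh
# Hypothetical helper (not part of OODT): prints two numbers for a directory,
#   1) its immediate entries, i.e. what one File.listFiles() call returns;
#   2) the regular files anywhere beneath it, i.e. the size of the whole crawl.
count_entries() {
  dir="$1"
  # tr strips the leading padding that BSD/macOS wc adds to its count.
  direct=$(find "$dir" -mindepth 1 -maxdepth 1 | wc -l | tr -d ' ')
  total=$(find "$dir" -type f | wc -l | tr -d ' ')
  echo "$direct $total"
}
```

For example, `count_entries /Users/zichuanwang/Downloads/output` prints both counts on one line. File names containing newlines would be over-counted, which is acceptable for a rough estimate.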
>>
>>
>>—
>>Zichuan Wang
>>Department of Computer Science, USC
>>
>>
>>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
>><ma...@usc.edu> wrote:
>>
>>
>>Thanks Luke, I’ve given you permissions so you should now see an
>>“edit” button on that wiki page.
>>
>>Cheers, 
>>Chris 
>>
>>-----Original Message-----
>>From: Luke liu <sh...@usc.edu>
>>Date: Wednesday, November 5, 2014 at 6:48 PM
>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>><de...@oodt.apache.org>
>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>><zi...@usc.edu>
>>Subject: RE: re: Question about OODT file manager
>>
>>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>>following account details:
>>> Account name: luke
>>> Full Name: Shuai Liu (Luke)
>>> Email: hanson311biz@gmail.com
>>> Password: *******
>>> 
>>>But I am not sure where I can add my notes to the following article, which
>>>I had trouble with. I also tried to create a new article, but failed to do
>>>it, as I cannot find anywhere to edit. Does this have something to do with
>>>my account, since the "edit" and "comments" actions are not visible to me?
>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>> 
>>> 
>>> 
>>>Thanks 
>>>Luke 
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>Sent: Sunday, November 2, 2014 6:59 AM
>>>To: Luke liu; dev@oodt.apache.org
>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>'Zichuan Wang'
>>>Subject: Re: re: Question about OODT file manager
>>> 
>>>Yes Luke, making the instructions better would be much appreciated!
>>> 
>>>If you have an account on the wiki please share it, else sign up for an
>>>Apache OODT wiki account and please share it with me or anyone else on
>>>dev@oodt and we’ll add you.
>>> 
>>>Cheers, 
>>>Chris 
>>> 
>>>-----Original Message-----
>>>From: Luke liu <sh...@usc.edu>
>>>Date: Sunday, November 2, 2014 at 1:32 AM
>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>><de...@oodt.apache.org>
>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>><zi...@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>> 
>>>>Thanks Professor Mattmann, not running batch_stub was the main culprit,
>>>>and there were some other issues, such as missing jars. Sorry for not
>>>>confirming this right away; my laptop crashed, and I only just had time
>>>>to fix it. BTW, I was able to get the CAS-PGE example to work (even
>>>>though the log showed the workflow failing its pre-condition check, the
>>>>combined file and some metadata files, i.e. 3 files, were still
>>>>successfully ingested and placed in the output directory).
>>>> 
>>>>BTW, I think there are a lot of mistakes in the documentation. Do you
>>>>want us to help correct it (i.e.
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>>If possible, I would like to share my notes on some of the problematic
>>>>steps described there.
>>>> 
>>>>Anyway, thanks for your help; it is much appreciated.
>>>> 
>>>>Thanks 
>>>>Luke 
>>>>-----Original Message-----
>>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>>To: Luke; dev@oodt.apache.org
>>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>'Zichuan Wang' 
>>>>Subject: Re: re: Question about OODT file manager
>>>> 
>>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>>with the batch stub not being turned on.
>>>> 
>>>>Cheers, 
>>>>Chris 
>>>> 
>>>>-----Original Message-----
>>>>From: Luke <sh...@usc.edu>
>>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>>><de...@oodt.apache.org>
>>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>>><zi...@usc.edu>
>>>>Subject: RE: re: Question about OODT file manager
>>>> 
>>>>>Dear Professor Mattmann,
>>>>>Thanks a lot for the kind help, Professor Mattmann; it is appreciated.
>>>>>Sorry for being slow to thank you: I have been conducting tests with
>>>>>OODT based on your advice, but unfortunately I am having another
>>>>>problem....
>>>>> 
>>>>>I am following the steps at
>>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>to get a sense of how to get the workflow to work.
>>>>>The problem is that the File-Concatenator-PGE (triggered by running the
>>>>>wmgr-client command line) does not seem to be invoked or executed.
>>>>>Instead, the tasks are stacking up in the workflow manager with status
>>>>>either "RSUBMIT" or "QUEUED" and are never executed; please find
>>>>>attached workflow_monitor.jpg. Note that by default the workflow min
>>>>>pool size is 6, which leads to another problem: I have 6 submitted tasks
>>>>>with status RSUBMIT, and any new incoming tasks are forwarded to the
>>>>>waiting queue with status "QUEUED". Please refer to workflow_monitor.jpg
>>>>>for details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>>>>> 
>>>>>Question 1): I am not sure why the workflow is not being executed and
>>>>>hangs in the "RSUBMIT" state. After raising the log level, I see the
>>>>>following entry in the log; I am not sure whether it has anything to do
>>>>>with the workflow hanging in "RSUBMIT" instead of being executed.
>>>>> Oct 28, 2014 3:35:07 AM
>>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>>safeCheckJobComplete
>>>>> WARNING: Exception checking completion status for job:
>>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>>java.lang.NullPointerException
>>>>> 
>>>>>Question 2): I think any new workflow task I send with the following
>>>>>command is currently being directed to the waiting queue because of the
>>>>>min pool size of 6 (I could increase this to a larger number, though):
>>>>> ./wmgr-client --url http://localhost:9200 --operation --sendEvent \
>>>>>   --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>> If possible, I would like to know whether there is a way to purge the
>>>>>queue and get rid of the workflow tasks I have already sent, in either
>>>>>the "RSUBMIT" or "QUEUED" state. Please kindly help.
>>>>> 
>>>>>Very sorry for troubling you with this. To be honest, I find OODT a bit
>>>>>challenging to grasp in a short time, probably because there is no
>>>>>"OODT in Action" book the way there is "Solr in Action", so what I am
>>>>>doing is mostly trial and error blended with guesswork. Since I would
>>>>>rather not guess blindly, I would appreciate any pointers on where to
>>>>>get more logging information, or other ways to troubleshoot. In
>>>>>particular, it might be worth tracking what happens when a workflow
>>>>>reaches the "RSUBMIT" status, and how to get logging specific to it.
>>>>> 
>>>>>Again, your advice and kind help will be appreciated as usual.
>>>>> 
>>>>> 
>>>>>Thanks 
>>>>>Luke 
>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Mattmann, Chris A (3980)
>>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>> Sent: October 26, 2014 22:18
>>>>>> To: Luke; 'Zichuan Wang'
>>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>>> dev@oodt.apache.org
>>>>>> Subject: Re: re: Question about OODT file manager
>>>>>> 
>>>>>> Hi Luke, 
>>>>>> 
>>>>>> Thanks and sorry it’s taken me a while to reply. Here are some details
>>>>>> below:
>>>>>> 
>>>>>> 
>>>>>> -----Original Message-----
>>>>>> From: Luke <sh...@usc.edu>
>>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>>>> <zi...@usc.edu>
>>>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>>>> Subject: RE: re: Question about OODT file manager
>>>>>> 
>>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>>> > 
>>>>>> >Sorry to trouble you with this email. Our team has been struggling to
>>>>>> >use OODT to send JSON files to Solr. One of the difficulties is still
>>>>>> >getting the OODT workflow to call poster.py in ETLlib.
>>>>>> 
>>>>>> Sorry that you’re having difficulty; let me try and help.
>>>>>> 
>>>>>> > 
>>>>>> >I am not sure whether my understanding of the OODT requirements is
>>>>>> >correct, so I hope you can kindly advise and help with our confusion.
>>>>>> > 
>>>>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>>>>> >confirm and clarify:
>>>>>> > 
>>>>>> >1) Get the File Manager up and running.
>>>>>> 
>>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>>> 
>>>>>> >2) Send all JSON files with the wmgr-client command to the File Manager
>>>>>> >server. (I believe we can achieve this with a bash or Python script
>>>>>> >that calls the command line sequentially with each JSON file name as
>>>>>> >an argument?)
>>>>>> 
>>>>>> Suggestion: 
>>>>>> 
>>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>>> files (in-place data transfer).
>>>>>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>>>>>> that wraps ETLlib and the poster command.
>>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>>> through all of your indexed JSON files and, for each one, submits a
>>>>>> workflow event (you may need to look into aggregating them) that
>>>>>> calls your CAS-PGE-wrapped poster task from ETLlib.
>>>>>> 
>>>>>> >3) Once the JSON files are sent and stored in the File Manager, we need
>>>>>> >to get the Workflow Manager up and running and create a workflow that
>>>>>> >sends those JSON files from the File Manager to Solr.
>>>>>> 
>>>>>> See above. 
>>>>>> 
>>>>>> >4) Create a workflow according to the Workflow2 User Guide
>>>>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>>>>>> >Here comes the problem: I am not sure how to create a workflow task
>>>>>> >that can call poster.py in the Python ETLlib. It looks like we need to
>>>>>> >create our own Java class that extends TaskInstance, an abstract Java
>>>>>> >class with one abstract method that has the following signature:
>>>>>> > 
>>>>>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>>>>>> > 
>>>>>> >However, that page leaves out where to find the corresponding libraries
>>>>>> >and where to put our implementation in the workflow manager. I am not
>>>>>> >sure whether we should use TaskInstance, but it seems the workflow needs
>>>>>> >an interface through which it can call the Python code (i.e. poster.py),
>>>>>> >and it looks like we need to implement TaskInstance::performExecution
>>>>>> >with code that calls poster.py and returns the ResultsState.
>>>>>> > 
>>>>>> > 
>>>>>> >It would be greatly appreciated if you could shed some light on how we
>>>>>> >can get a task instance to call poster.py. BTW, I am also not sure my
>>>>>> >understanding is correct, so please correct it if it is wrong. Your
>>>>>> >help will be appreciated as usual.
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >Thanks 
>>>>>> >Luke 
>>>>>> 
>>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>>> 
>>>>>> Cheers! 
>>>>>> 
>>>>>> Chris 
>>>>>> 
>>>>>> > 
>>>>>> >From: Mattmann, Chris A (3980)
>>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>>> > 
>>>>>> >Sent: October 25, 2014 13:34
>>>>>> >To: Zichuan Wang
>>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>>> >xiaoyanj@usc.edu
>>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>. I will reply in
>>>>>> >detail soon.
>>>>>> > 
>>>>>> >Sent from my iPhone
>>>>>> 
>>>>>> > 
>>>>>> > 
>>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>>>> > 
>>>>>> > 
>>>>>> >Dear Professor,
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >Could you please also explain how I can crawl all JSON file names under
>>>>>> >a specific directory using CAS-PGE? I’m working through this example:
>>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>>> >but it doesn’t mention anything about crawling; instead, it sets the
>>>>>> >input file paths manually.
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >-- 
>>>>>> > 
>>>>>> >Zichuan Wang
>>>>>> > 
>>>>>> >University of Southern California, Department of Computer Science
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>>>>> > 
>>>>>> >Dear Professor,
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >In the assignment 2 specification I noticed that you mentioned the OODT
>>>>>> >File Manager, but my understanding is that we are using the ETLlib
>>>>>> >poster, which talks directly to Solr. So how can we use the OODT File
>>>>>> >Manager in this assignment?
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> >-- 
>>>>>> > 
>>>>>> >Zichuan Wang
>>>>>> > 
>>>>>> >University of Southern California, Department of Computer Science
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
>>>>>> > 
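Luke's question above asks how a workflow task could call ETLlib's poster.py. Chris's recommendation was to wrap it with CAS-PGE; whichever wrapper is used, the Java side ultimately has to launch an external process. The following is a minimal sketch of only that piece, using the standard java.lang.ProcessBuilder. The class name ExternalCommandRunner, the interpreter name, and the poster.py path are hypothetical placeholders; this is not part of the OODT API, and how an exit code maps onto OODT's ResultsState is deliberately left out because that depends on the OODT version in use.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical helper (not part of OODT): runs an external command, such as
 * a Python script like ETLlib's poster.py, and reports its exit status.
 */
class ExternalCommandRunner {

    /** Exit code plus combined stdout/stderr lines of one run. */
    static final class Result {
        final int exitCode;
        final List<String> outputLines;

        Result(int exitCode, List<String> outputLines) {
            this.exitCode = exitCode;
            this.outputLines = outputLines;
        }

        boolean succeeded() {
            return exitCode == 0;
        }
    }

    /** Builds [pythonExe, script, args...]; all three are caller-supplied placeholders. */
    static List<String> pythonCommand(String pythonExe, String script, List<String> args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(pythonExe);
        cmd.add(script);
        cmd.addAll(args);
        return cmd;
    }

    /** Starts the command, drains its merged output, then waits for it to exit. */
    static Result run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.redirectErrorStream(true); // stderr joins stdout, so only one pipe must be drained
        Process process = pb.start();
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        }
        int exitCode = process.waitFor();
        return new Result(exitCode, lines);
    }

    public static void main(String[] args) throws Exception {
        // Placeholder interpreter and path: substitute your own poster.py location and arguments.
        List<String> cmd = pythonCommand("python", "/path/to/etllib/poster.py",
                Collections.<String>emptyList());
        Result result = run(cmd);
        System.out.println("exit=" + result.exitCode + " succeeded=" + result.succeeded());
        result.outputLines.forEach(System.out::println);
    }
}
```

Reading the merged output to the end before calling waitFor() keeps the child from blocking on a full pipe; a non-zero exit code is then the natural signal that the wrapped task failed.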
>>>>> 
>>>> 
>>>> 
>>> 
>>> 
>>
>>
>>
>>
>>
>>
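Chris's third suggestion above (a script that submits one workflow event per JSON file), which is also Luke's goal 2, can be sketched as a dry run that only prints commands. This sketch reuses the wmgr-client flags from the sendEvent command quoted in this thread; the event name poster-pge and the metadata key InputFile are hypothetical, and it lists the directory on disk rather than paging through the File Manager catalog as Chris suggested. The class name SendEventPlanner is likewise illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/**
 * Hypothetical dry-run planner (not part of OODT): builds one wmgr-client
 * --sendEvent command per *.json file in a directory.
 */
class SendEventPlanner {

    /** Same flag layout as: ./wmgr-client --url URL --operation --sendEvent --eventName NAME --metaData --key KEY VALUE */
    static List<String> sendEventCommand(String wmgrUrl, String eventName, String key, String value) {
        return new ArrayList<>(Arrays.asList(
                "./wmgr-client", "--url", wmgrUrl,
                "--operation", "--sendEvent",
                "--eventName", eventName,
                "--metaData", "--key", key, value));
    }

    /** Regular *.json files directly inside dir (not recursive), sorted by path. */
    static List<Path> jsonFiles(Path dir) throws IOException {
        List<Path> files = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.json")) {
            for (Path p : stream) {
                if (Files.isRegularFile(p)) {
                    files.add(p);
                }
            }
        }
        Collections.sort(files);
        return files;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        for (Path json : jsonFiles(dir)) {
            // "poster-pge" and "InputFile" are hypothetical; use your own event name and metadata key.
            List<String> cmd = sendEventCommand("http://localhost:9200", "poster-pge",
                    "InputFile", json.toAbsolutePath().toString());
            System.out.println(String.join(" ", cmd));
        }
    }
}
```

Printing rather than executing keeps the sketch safe to try; each printed line could be handed to a shell or to ProcessBuilder. Collecting and sorting the names holds the whole listing in memory, which is fine for modest directories but not for one large enough to exhaust the heap.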

Re: re: Question about OODT file manager

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Got it. Can you increase the heap space on your batch stub? That
should take care of it.

Cheers,
Chris

P.S. Great work!

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
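The stack trace in the quoted message below shows the OutOfMemoryError being thrown inside the crawler_launcher JVM itself (CrawlerLauncher.main), while java.io.File.listFiles() was listing every entry of /Users/zichuanwang/Downloads/output. The following minimal sketch is a diagnostic only, not part of OODT and not a patch to ProductCrawler: it prints the maximum heap the current JVM may use (handy for confirming that an -Xmx setting actually took effect) and counts matching entries with java.nio.file.DirectoryStream, which hands back entries one at a time instead of building one array holding the whole listing. The class name LargeDirectoryCheck is hypothetical.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical diagnostic (not part of OODT): reports the JVM's maximum heap
 * and counts directory entries without materializing the full listing.
 */
class LargeDirectoryCheck {

    /** Maximum heap this JVM may use, in MiB (reflects -Xmx when one is set). */
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    /**
     * Counts entries matching a glob such as "*.json". The stream is iterated
     * lazily, unlike File.listFiles(), which returns every entry in one array.
     */
    static long countMatching(Path dir, String glob) throws IOException {
        long count = 0;
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, glob)) {
            for (Path ignored : stream) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("*.json entries in " + dir + ": " + countMatching(dir, "*.json"));
    }
}
```

If the count is very large, the listing alone may need more heap than the crawler JVM was given, which is consistent with the suggestion above to raise the heap; the check simply tells you how big the directory is and how much heap you actually have.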






-----Original Message-----
From: Zichuan Wang <zi...@usc.edu>
Date: Wednesday, November 5, 2014 at 11:12 PM
To: Chris Mattmann <ma...@usc.edu>
Cc: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>, Luke liu <sh...@usc.edu>, "xiaoyanj@usc.edu"
<xi...@usc.edu>, "zhoujian@usc.edu" <zh...@usc.edu>
Subject: Re: re: Question about OODT file manager

>Dear Professor,
>
>
>I finally figured out how to trigger a post-ingest event. However, when I
>try to crawl the whole dataset, I get an OutOfMemoryError. Could you
>please take a look and give some suggestions?
>
>
>➜  bin  ./crawler_launcher \
>--operation --launchAutoCrawler \
>--filemgrUrl http://localhost:9000 \
>--clientTransferer
>org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
>--productPath /Users/zichuanwang/Downloads/output \
>--mimeExtractorRepo ../policy/mime-extractor-map.xml \
>--workflowMgrUrl http://localhost:9200 \
>-ais TriggerPostIngestWorkflow
>Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
>Setting property 'StdProductCrawler.clientTransferer'
>Setting property 'MetExtractorProductCrawler.clientTransferer'
>Setting property 'AutoDetectProductCrawler.clientTransferer'
>Setting property 'StdProductCrawler.filemgrUrl'
>Setting property 'MetExtractorProductCrawler.filemgrUrl'
>Setting property 'AutoDetectProductCrawler.filemgrUrl'
>Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
>Setting property 'StdProductCrawler.actionIds'
>Setting property 'MetExtractorProductCrawler.actionIds'
>Setting property 'AutoDetectProductCrawler.actionIds'
>Setting property 'StdProductCrawler.productPath'
>Setting property 'MetExtractorProductCrawler.productPath'
>Setting property 'AutoDetectProductCrawler.productPath'
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'StdProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value [http://localhost:9200]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'MetExtractorProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'StdProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'StdProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'AutoDetectProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'AutoDetectProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://localhost:9000]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
>Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
>: Property 'MetExtractorProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
>Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>INFO: Crawling /Users/zichuanwang/Downloads/output
>Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
>at java.io.UnixFileSystem.list(Native Method)
>at java.io.File.list(File.java:973)
>at java.io.File.listFiles(File.java:1129)
>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)
>at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)
>at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)
>at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)
>at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)
>at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)
>
>
>—
>Zichuan Wang
>Department of Computer Science, USC
>
>
>On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann
><ma...@usc.edu> wrote:
>
>
>Thanks Luke, I’ve given you permissions so you should now see an
>“edit” button on that wiki page.
>
>Cheers, 
>Chris 
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Adjunct Associate Professor, Computer Science Department
>University of Southern California
>Los Angeles, CA 90089 USA
>Email: mattmann@usc.edu
>WWW: http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>-----Original Message-----
>From: Luke liu <sh...@usc.edu>
>Date: Wednesday, November 5, 2014 at 6:48 PM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
><zi...@usc.edu>
>Subject: RE: re: Question about OODT file manager
>
>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>following account details:
>> Account name: luke
>> Full Name: Shuai Liu (Luke)
>> Email: hanson311biz@gmail.com
>> Password: *******
>> 
>>But I am not sure where I can add my notes to the following article,
>>which I had trouble with. I also tried to create a new article, but
>>failed because I could not find anywhere to edit. Is this because my
>>account does not have permission for the "edit" or "comments" actions?
>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>> 
>> 
>> 
>>Thanks 
>>Luke 
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>Sent: Sunday, November 2, 2014 6:59 AM
>>To: Luke liu; dev@oodt.apache.org
>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>'Zichuan Wang'
>>Subject: Re: re: Question about OODT file manager
>> 
>>Yes Luke, making the instructions better would be much appreciated!
>> 
>>If you have an account on the wiki please share it, else sign up for an
>>Apache OODT wiki account and please share it with me or anyone else on
>>dev@oodt and we’ll add you.
>> 
>>Cheers, 
>>Chris 
>> 
>> 
>>-----Original Message-----
>>From: Luke liu <sh...@usc.edu>
>>Date: Sunday, November 2, 2014 at 1:32 AM
>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>><de...@oodt.apache.org>
>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>><zi...@usc.edu>
>>Subject: RE: re: Question about OODT file manager
>> 
>>>Thanks Professor Mattmann. Not running batch_stub was the main culprit,
>>>and there were some other issues such as missing jars. Sorry for not
>>>confirming this right away; my laptop crashed and I only just had time
>>>to fix it. I was able to get the CAS-PGE example to work: even though
>>>the log showed the workflow failing its pre-condition check, the
>>>combined file and some metadata files (3 files in total) were still
>>>successfully ingested and placed in the output directory.
>>> 
>>>BTW, I think there are a lot of mistakes in the documentation. Would you
>>>like us to help correct it (i.e.
>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>If possible, I would like to share my notes on the problematic steps
>>>described there.
>>> 
>>>Anyway, thanks for your help; it is much appreciated.
>>> 
>>>Thanks 
>>>Luke 
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>To: Luke; dev@oodt.apache.org
>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>'Zichuan Wang' 
>>>Subject: Re: re: Question about OODT file manager
>>> 
>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>with the batch stub not being turned on.
>>> 
>>>Cheers, 
>>>Chris 
>
>
>
>
>
>


Re: re: Question about OODT file manager

Posted by Zichuan Wang <zi...@usc.edu>.
Dear Professor,

I finally figured out how to trigger a post-ingest event. However, when I try to crawl the whole dataset, I get an OutOfMemoryError. Could you please take a look and give some suggestions?

➜  bin  ./crawler_launcher \
--operation --launchAutoCrawler \
--filemgrUrl http://localhost:9000 \
--clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
--productPath /Users/zichuanwang/Downloads/output \
--mimeExtractorRepo ../policy/mime-extractor-map.xml \
--workflowMgrUrl http://localhost:9200 \
-ais TriggerPostIngestWorkflow
Setting property 'AutoDetectProductCrawler.mimeExtractorRepo'
Setting property 'StdProductCrawler.clientTransferer'
Setting property 'MetExtractorProductCrawler.clientTransferer'
Setting property 'AutoDetectProductCrawler.clientTransferer'
Setting property 'StdProductCrawler.filemgrUrl'
Setting property 'MetExtractorProductCrawler.filemgrUrl'
Setting property 'AutoDetectProductCrawler.filemgrUrl'
Setting property 'TriggerPostIngestWorkflow.workflowMgrUrl'
Setting property 'StdProductCrawler.actionIds'
Setting property 'MetExtractorProductCrawler.actionIds'
Setting property 'AutoDetectProductCrawler.actionIds'
Setting property 'StdProductCrawler.productPath'
Setting property 'MetExtractorProductCrawler.productPath'
Setting property 'AutoDetectProductCrawler.productPath'
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'StdProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'TriggerPostIngestWorkflow.workflowMgrUrl' set to value [http://localhost:9200]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'AutoDetectProductCrawler.mimeExtractorRepo' set to value [../policy/mime-extractor-map.xml]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'MetExtractorProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'AutoDetectProductCrawler.filemgrUrl' set to value [http://localhost:9000]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'AutoDetectProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey
: Property 'MetExtractorProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]
Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'StdProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'StdProductCrawler.filemgrUrl' set to value [http://localhost:9000]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'AutoDetectProductCrawler.actionIds' set to value [TriggerPostIngestWorkflow]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'AutoDetectProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'MetExtractorProductCrawler.filemgrUrl' set to value [http://localhost:9000]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'StdProductCrawler.clientTransferer' set to value [org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory]

Nov 5, 2014 10:07:47 PM org.springframework.beans.factory.config.PropertyOverrideConfigurer processKey

: Property 'MetExtractorProductCrawler.productPath' set to value [/Users/zichuanwang/Downloads/output]

Nov 5, 2014 10:07:47 PM org.apache.oodt.cas.crawl.ProductCrawler crawl

Ϣ: Crawling /Users/zichuanwang/Downloads/output

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

at java.io.UnixFileSystem.list(Native Method)

at java.io.File.list(File.java:973)

at java.io.File.listFiles(File.java:1129)

at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:104)

at org.apache.oodt.cas.crawl.ProductCrawler.crawl(ProductCrawler.java:75)

at org.apache.oodt.cas.crawl.cli.action.CrawlerLauncherCliAction.execute(CrawlerLauncherCliAction.java:58)

at org.apache.oodt.cas.cli.CmdLineUtility.execute(CmdLineUtility.java:331)

at org.apache.oodt.cas.cli.CmdLineUtility.run(CmdLineUtility.java:187)

at org.apache.oodt.cas.crawl.CrawlerLauncher.main(CrawlerLauncher.java:36)




—
Zichuan Wang
Department of Computer Science, USC
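The stack trace shows the heap running out inside File.listFiles, called from ProductCrawler.crawl; listFiles builds an array holding every entry of a directory at once. Without knowing the size of the data set, two plausible mitigations are giving the crawler JVM more heap (check whether and how the crawler_launcher script passes JVM options such as -Xmx) or spreading the files over several subdirectories. The sketch below is not OODT code: the LazyDirectoryScan class is hypothetical and only illustrates how java.nio.file.Files.newDirectoryStream visits entries one at a time instead of materializing the whole listing.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of OODT: iterates a directory's entries one
// at a time rather than building a File[] of every entry as listFiles() does.
class LazyDirectoryScan {

    // Counts regular files whose names end in ".json" directly under dir.
    static long countJsonFiles(Path dir) throws IOException {
        long count = 0;
        // The glob filters entries by file name; the stream must be closed.
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir, "*.json")) {
            for (Path entry : entries) {
                // Skip directories (or other non-regular files) named *.json.
                if (Files.isRegularFile(entry)) {
                    count++;
                }
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : ".");
        System.out.println(countJsonFiles(dir) + " JSON files in " + dir);
    }
}
```

For example, java LazyDirectoryScan /Users/zichuanwang/Downloads/output would report the number of top-level .json files while holding only one entry in memory at a time.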

On Wed, Nov 5, 2014 at 6:42 PM, Christian Alan Mattmann <ma...@usc.edu>
wrote:

> Thanks Luke, I’ve given you permissions so you should now see an
> “edit” button on that wiki page.
> Cheers,
> Chris
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Adjunct Associate Professor, Computer Science Department
> University of Southern California
> Los Angeles, CA 90089 USA
> Email: mattmann@usc.edu
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> -----Original Message-----
> From: Luke liu <sh...@usc.edu>
> Date: Wednesday, November 5, 2014 at 6:48 PM
> To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
> <de...@oodt.apache.org>
> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
> <zi...@usc.edu>
> Subject: RE: re: Question about OODT file manager
>>I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
>>following account details:
>>	Account name: luke
>>	Full Name: Shuai Liu (Luke)
>>	Email: hanson311biz@gmail.com
>>	Password: *******
>>
>>But I am not sure where I can add my notes to the following article,
>>with which I had trouble. I also tried to create a new article but
>>failed, as I cannot find anywhere to edit. Is this because my account
>>cannot see the "edit" or "comments" actions?
>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>
>>
>>Thanks
>>Luke
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>Sent: Sunday, November 2, 2014 6:59 AM
>>To: Luke liu; dev@oodt.apache.org
>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>'Zichuan Wang'
>>Subject: Re: re: Question about OODT file manager
>>
>>Yes, Luke, making the instructions better would be much appreciated!
>>
>>If you have an account on the wiki, please share it; otherwise, sign up
>>for an Apache OODT wiki account and share it with me or anyone else on
>>dev@oodt, and we’ll add you.
>>
>>Cheers,
>>Chris
>>
>>-----Original Message-----
>>From: Luke liu <sh...@usc.edu>
>>Date: Sunday, November 2, 2014 at 1:32 AM
>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>><de...@oodt.apache.org>
>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>><zi...@usc.edu>
>>Subject: RE: re: Question about OODT file manager
>>
>>>Thanks, Professor Mattmann. Not running batch_stub was the main culprit,
>>>and there were some other issues, such as missing jars. Sorry for not
>>>confirming this right away; my laptop had crashed, and I only just had
>>>time to fix it. BTW, I was able to get the CAS-PGE example to work
>>>(even though the log showed the workflow failing its pre-condition
>>>check, the combined file and some metadata files (i.e. 3 files) were
>>>still successfully ingested and placed in the output directory).
>>>
>>>BTW, I think there are a lot of mistakes in the documentation. Would you
>>>like us to help correct it (i.e.
>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
>>>If possible, I would like to share my notes on the problematic steps
>>>there.
>>>
>>>Anyway, thanks for your help; it is much appreciated.
>>>
>>>Thanks
>>>Luke
>>>-----Original Message-----
>>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>Sent: Saturday, November 1, 2014 10:48 AM
>>>To: Luke; dev@oodt.apache.org
>>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>'Zichuan Wang'
>>>Subject: Re: re: Question about OODT file manager
>>>
>>>Dear Luke, just confirming, we solved this in class right? It had to do
>>>with the batch stub not being turned on.
>>>
>>>Cheers,
>>>Chris
>>>
>>>-----Original Message-----
>>>From: Luke <sh...@usc.edu>
>>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>>><de...@oodt.apache.org>
>>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>>><zi...@usc.edu>
>>>Subject: RE: re: Question about OODT file manager
>>>
>>>>Dear Professor Mattmann,
>>>>Thanks a lot for the kind help, it is appreciated, and sorry for the
>>>>delay in saying so. I have been conducting tests with OODT based on your
>>>>advice, but unfortunately I am having another problem...
>>>>
>>>>I am following the steps at
>>>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>>>>to get a sense of how to get a workflow to work.
>>>>The problem is that the File-Concatenator-PGE (triggered by running the
>>>>wmgr-client command line) does not seem to be invoked or executed.
>>>>Instead, the tasks are stacking up in the workflow manager with status
>>>>either "RSUBMIT" or "QUEUED", and they are never executed; please find
>>>>attached workflow_monitor.jpg. Note that by default the workflow min pool
>>>>size is 6, which leads to another problem: I have 6 submitted tasks with
>>>>status RSUBMIT, and any new incoming tasks are forwarded to the waiting
>>>>queue with status "QUEUED". Please refer to workflow_monitor.jpg for
>>>>details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>>>>
>>>>Question 1): I am not sure why the workflow is not being executed and
>>>>is hanging in the "RSUBMIT" state. After raising the log level, I see
>>>>the following entry in the log; I am not sure whether it has anything
>>>>to do with the workflow hanging in "RSUBMIT".
>>>>	Oct 28, 2014 3:35:07 AM
>>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>>safeCheckJobComplete
>>>>	WARNING: Exception checking completion status for job:
>>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>>java.lang.NullPointerException
>>>>
>>>>Question 2): I think that, on my side, any new incoming workflow task
>>>>I send with the following command is directed to the waiting queue
>>>>because of the min pool size (i.e. 6) (I could increase this to a
>>>>larger number, though):
>>>>	./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>>	If possible, I would like to know whether there is a way to purge the
>>>>queue and get rid of the workflow tasks I have already sent, both those
>>>>in "RSUBMIT" and those in "QUEUED". Please kindly help.
>>>>
>>>>Very sorry for troubling you with this. To be honest, I find OODT a bit
>>>>challenging to grasp in a short time frame, probably because there is
>>>>no book like "Solr in Action" for OODT, so what I am doing is trial and
>>>>error blended with guesswork. I don't want to make blind guesses, so I
>>>>would appreciate it if you could also shed some light on where I can
>>>>get more logging information, or other ways to troubleshoot. I think it
>>>>might be worth tracking what happens when a workflow reaches the
>>>>"RSUBMIT" status and how to get logging specific to it...
>>>>
>>>>Again, your advice and kind help will be appreciated as usual.
>>>>
>>>>
>>>>Thanks
>>>>Luke
>>>>
>>>>> -----Original Message-----
>>>>> From: Mattmann, Chris A (3980)
>>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>> Sent: October 26, 2014 22:18
>>>>> To: Luke; 'Zichuan Wang'
>>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>>> dev@oodt.apache.org
>>>>> Subject: Re: re: Question about OODT file manager
>>>>> 
>>>>> Hi Luke,
>>>>> 
>>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>> details below:
>>>>> 
>>>>> 
>>>>> -----Original Message-----
>>>>> From: Luke <sh...@usc.edu>
>>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>>> <zi...@usc.edu>
>>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>>> Subject: RE: re: Question about OODT file manager
>>>>> 
>>>>> >Hi Professor Mattmann and OODT DEV,
>>>>> >
>>>>> >Sorry to trouble you with this email. Our team has been struggling
>>>>> >to use OODT to send JSON files to Solr.
>>>>> >One of the difficulties is still getting an OODT workflow to call
>>>>> >poster.py in ETLlib.
>>>>> 
>>>>> Sorry that you’re having difficulty; let me try and help.
>>>>> 
>>>>> >
>>>>> >I am not sure whether my understanding of the OODT requirement is
>>>>> >correct; I hope you can kindly advise and help with our confusion.
>>>>> >
>>>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>>>> >confirm and clarify:
>>>>> >
>>>>> >1)
>>>>> >Get the File Manager up and running.
>>>>> 
>>>>> Yep, hopefully as installed via OODT RADIX.
>>>>> 
>>>>> >2)
>>>>> >Send all JSON files to the File Manager server with the wmgr-client
>>>>> >command. (I believe we can achieve this with a bash or Python script
>>>>> >that calls the command line sequentially, passing each JSON file name
>>>>> >as an argument?)
>>>>> 
>>>>> Suggestion:
>>>>> 
>>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>> files (in-place data transfer).
>>>>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>>>>> that wraps ETLlib and the poster command.
>>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>> through all of your indexed JSON files and, for each one, submits a
>>>>> workflow event (you may need to look into aggregating them) that
>>>>> calls your CAS-PGE-wrapped poster task from ETLlib.
>>>>> 
>>>>> >3)
>>>>> >Once we have the JSON files sent to and stored in the File Manager,
>>>>> >we need to get the Workflow Manager up and running, and then we can
>>>>> >create a workflow that sends those JSON files from the File Manager
>>>>> >to Solr.
>>>>> 
>>>>> See above.
>>>>> 
>>>>> >4)
>>>>> >Create a workflow according to the Workflow2 User Guide
>>>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>>>>> >Here comes the problem:
>>>>> >I am not sure how to create a workflow task that can call poster.py
>>>>> >in the Python ETLlib. It looks like we need to create our own Java
>>>>> >class that extends TaskInstance, an abstract Java class with one
>>>>> >abstract method that has the following signature:
>>>>> >
>>>>> >protected abstract ResultsState performExecution(ControlMetadata
>>>>> >crtlMetadata);
>>>>> >
>>>>> >However, that page does not say where to find the corresponding libs
>>>>> >or where to put our implementation in the Workflow Manager. I am not
>>>>> >sure whether we should use TaskInstance, but it seems the workflow
>>>>> >needs an interface through which it can call the Python code, i.e.
>>>>> >poster.py, and it looks like we need to implement
>>>>> >TaskInstance::performExecution by injecting code that calls poster.py
>>>>> >and returns a ResultsState.
>>>>> >
>>>>> >
>>>>> >It would be greatly appreciated if you could shed some light and
>>>>> >advise how we can get a task instance to call poster.py. BTW, I am
>>>>> >also not sure whether my understanding is correct; please kindly
>>>>> >correct it if it is not. Your help will be appreciated as usual.
>>>>> >
>>>>> >
>>>>> >
>>>>> >Thanks
>>>>> >Luke
>>>>> 
>>>>> Thanks Luke, see above. Let me know if it helps.
>>>>> 
>>>>> Cheers!
>>>>> 
>>>>> Chris
>>>>> 
>>>>> >
>>>>> >From: Mattmann, Chris A (3980)
>>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>>> >Sent: October 25, 2014 13:34
>>>>> >To: Zichuan Wang
>>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>>> >xiaoyanj@usc.edu
>>>>> >Subject: Re: Re: Question about OODT file manager
>>>>> >
>>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply
>>>>> >in detail soon.
>>>>> >
>>>>> >Sent from my iPhone
>>>>> 
>>>>> 
>>>>> >
>>>>> >
>>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>>> >
>>>>> >
>>>>> >Dear Professor,
>>>>> >
>>>>> >Could you please also explain how I can crawl all JSON file names under
>>>>> >a specific directory using CAS-PGE? I’ll work through this example,
>>>>> >https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example,
>>>>> >but it doesn’t mention anything about crawling; instead, it
>>>>> >manually sets the input file paths...
>>>>> >
>>>>> >--
>>>>> >Zichuan Wang
>>>>> >University of Southern California, Department of Computer Science
>>>>> >
>>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>>>> >
>>>>> >Dear Professor,
>>>>> >
>>>>> >In the Assignment 2 specification, I noticed that you mentioned the OODT
>>>>> >File Manager, but as I understand it, we are using the ETLlib poster,
>>>>> >which talks directly to Solr. So how can we use the OODT File Manager
>>>>> >in this assignment?
>>>>> >
>>>>> >--
>>>>> >Zichuan Wang
>>>>> >University of Southern California, Department of Computer Science
>>>>> >
>>>>
>>>
>>>
>>
>>
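Chris's third suggestion in the quoted thread above (a script that submits one workflow event per indexed JSON file) might look roughly like the sketch below. Several parts are assumptions rather than anything stated in the thread: it walks a local directory as a stand-in for paging through the File Manager catalog, and the event name etllib-poster and the metadata key InputFile are hypothetical placeholders for whatever your CAS-PGE workflow actually defines. The command shape copies the wmgr-client --sendEvent call Luke used. By default it only prints the commands; a second argument of --run executes them.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Sketch only: prints (and optionally runs) one wmgr-client --sendEvent
// command per JSON file found under a directory.
class SendPosterEvents {

    // Assumed values; adjust to your deployment and workflow configuration.
    static final String WMGR_URL = "http://localhost:9200";
    static final String EVENT_NAME = "etllib-poster"; // hypothetical event name
    static final String MET_KEY = "InputFile";        // hypothetical metadata key

    // Mirrors: ./wmgr-client --url URL --operation --sendEvent
    //          --eventName NAME --metaData --key KEY VALUE
    static List<String> buildCommand(Path jsonFile) {
        return Arrays.asList(
                "./wmgr-client", "--url", WMGR_URL,
                "--operation", "--sendEvent",
                "--eventName", EVENT_NAME,
                "--metaData", "--key", MET_KEY,
                jsonFile.toAbsolutePath().toString());
    }

    // Recursively collects regular files ending in ".json", sorted by path.
    static List<Path> findJsonFiles(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths.filter(Files::isRegularFile)
                        .filter(p -> p.getFileName().toString().endsWith(".json"))
                        .sorted()
                        .collect(Collectors.toList());
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        Path root = Paths.get(args.length > 0 ? args[0] : ".");
        boolean run = args.length > 1 && args[1].equals("--run");
        for (Path json : findJsonFiles(root)) {
            List<String> cmd = buildCommand(json);
            System.out.println(String.join(" ", cmd));
            if (run) {
                // Assumes the working directory is the workflow manager's bin/.
                Process p = new ProcessBuilder(cmd).inheritIO().start();
                int exit = p.waitFor();
                if (exit != 0) {
                    System.err.println("wmgr-client exited with " + exit + " for " + json);
                }
            }
        }
    }
}
```

Aggregating several files into one event, as Chris hints, would mean passing more than one value per event; how wmgr-client handles multi-valued metadata is not covered by this sketch.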

Re: re: Question about OODT file manager

Posted by Christian Alan Mattmann <ma...@usc.edu>.
Thanks Luke, I’ve given you permissions so you should now see an
“edit” button on that wiki page.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Adjunct Associate Professor, Computer Science Department
University of Southern California
Los Angeles, CA 90089 USA
Email: mattmann@usc.edu
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++
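Luke's earlier question about having a workflow task call poster.py is answered in the thread by Chris's pointer to CAS-PGE. Purely to illustrate the mechanism underneath, here is a sketch of launching an external Python script from Java and treating a zero exit status as success. PythonScriptRunner is a hypothetical class that implements no OODT interface; the interpreter name and the poster.py arguments are placeholders to check against ETLlib's own documentation.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an OODT class: runs a Python script as a child
// process and reports whether it exited with status 0.
class PythonScriptRunner {

    // Builds the argument list: <python> <script> <scriptArgs...>
    static List<String> command(String python, Path script, List<String> scriptArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add(python);
        cmd.add(script.toString());
        cmd.addAll(scriptArgs);
        return cmd;
    }

    // Starts the process with its stdout/stderr forwarded to this JVM,
    // waits for it to finish, and returns true only on exit status 0.
    static boolean runAndCheck(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor() == 0;
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // Placeholder invocation: "python" and "poster.py" are assumptions,
        // and any command-line arguments are passed straight through.
        List<String> cmd = command("python", Paths.get("poster.py"), Arrays.asList(args));
        System.out.println("Running: " + String.join(" ", cmd));
        System.out.println(runAndCheck(cmd) ? "succeeded" : "failed");
    }
}
```

Mapping the boolean result onto a workflow task's success or failure is left out here, since that depends on the OODT task API you end up using.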




-----Original Message-----
From: Luke liu <sh...@usc.edu>
Date: Wednesday, November 5, 2014 at 6:48 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
<zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
<zi...@usc.edu>
Subject: RE: re: Question about OODT file manager

>I just signed up on the wiki(i.e. https://cwiki.apache.org ) with the
>following account detail:
>	Account name: luke
>	Full Name: Shuai Liu (Luke)
>	Email: hanson311biz@gmail.com
>	Password: *******
>
>But I am not sure where I can add my notes to the following web article
>with
>which I had trouble , I also tried to create a new article, but failed to
>do
>it as I cannot find a place where I can edit, does this have something do
>with my account that is not visible for the "edit" or "comments" action?
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>
>
>Thanks
>Luke
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Sunday, November 2, 2014 6:59 AM
>To: Luke liu; dev@oodt.apache.org
>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>'Zichuan
>Wang'
>Subject: Re: re: Question about OODT file manager
>
>Yes Luke, making the instructions better would be much appreciated!
>
>If you have an account on the wiki please share it, else sign up for an
>Apache OODT wiki account and please share it with me or anyone else on
>dev@oodt and we’ll add you.
>
>Cheers,
>Chris
>
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Chris Mattmann, Ph.D.
>Chief Architect
>Instrument Software and Science Data Systems Section (398) NASA Jet
>Propulsion Laboratory Pasadena, CA 91109 USA
>Office: 168-519, Mailstop: 168-527
>Email: chris.a.mattmann@nasa.gov
>WWW:  http://sunset.usc.edu/~mattmann/
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>Adjunct Associate Professor, Computer Science Department University of
>Southern California, Los Angeles, CA 90089 USA
>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
>
>
>
>
>-----Original Message-----
>From: Luke liu <sh...@usc.edu>
>Date: Sunday, November 2, 2014 at 1:32 AM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
><zi...@usc.edu>
>Subject: RE: re: Question about OODT file manager
>
>>Thanks Professor Mattmann, not running batch_stub was the main culprit
>>and there were some other issues such as missing jars; and sorry for
>>not confirming this right away, my laptop was actually crashing, and i
>>just had time to fix it; BTW, I was able to get the cas-pge example to
>>work, (even though I saw the workflow failed to pass the pre-condition
>>in the log, the combined file and some metadata files (i.e.3 files)
>>were still successfully ingested and placed in the output directory)
>>
>>BTW, i think there are a lot of mistakes in the documents, do you want
>>us to help correct the document(i.e.
>>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Examp
>>le
>>)?
>>If possible, I would like to please share my notes with some problem
>>steps mentioned there.
>>
>>Anyway, thanks for your help and appreciated.
>>
>>Thanks
>>Luke
>>-----Original Message-----
>>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>Sent: Saturday, November 1, 2014 10:48 AM
>>To: Luke; dev@oodt.apache.org
>>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>'Zichuan Wang'
>>Subject: Re: re: Question about OODT file manager
>>
>>Dear Luke, just confirming, we solved this in class right? It had to do
>>with the batch stub not being turned on.
>>
>>Cheers,
>>Chris
>>
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Chris Mattmann, Ph.D.
>>Chief Architect
>>Instrument Software and Science Data Systems Section (398) NASA Jet
>>Propulsion Laboratory Pasadena, CA 91109 USA
>>Office: 168-519, Mailstop: 168-527
>>Email: chris.a.mattmann@nasa.gov
>>WWW:  http://sunset.usc.edu/~mattmann/
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>Adjunct Associate Professor, Computer Science Department University of
>>Southern California, Los Angeles, CA 90089 USA
>>++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
>>
>>
>>
>>
>>
>>-----Original Message-----
>>From: Luke <sh...@usc.edu>
>>Date: Tuesday, October 28, 2014 at 12:52 PM
>>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
>><de...@oodt.apache.org>
>>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
>><zi...@usc.edu>
>>Subject: RE: re: Question about OODT file manager
>>
>>>Dear Professor Mattamnn,
>>>Thanks a lot Professor Mattmann for the kind help, it is appreciated,
>>>sorry for getting back to you with my appreciation, I have been
>>>conducting tests with OODT based on your advice, but unfortunately I
>>>am having another problem....
>>>
>>>I am following the steps
>>>(https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Exa
>>>mpl
>>>e
>>>) to get a sense of how to get workflow to work.
>>>The problem is that the File-Concatenator-PGE (by running the
>>>wmgr-client
>>>command-line) does not seems to be invoked or executed, but I am
>>>seeing the tasks are getting stacked up in the workflow manager with
>>>status either "RSUBMIT" or "QUEUED", but they are not getting executed,
>PFA:
>>>workflow_monitor.jpg, please note, by default the workflow min pool
>>>size is 6; so here comes another problem, i have 6 submitted tasks
>>>with status RSUBMIT, but any new incoming tasks will be forwarded to
>>>the waiting QUEUE with status "QUEUED"...please refer to the
>>>workflow_monitor.jpg for details, where I have 3 QUEUED workflow task
>>>and
>6 RSUMBITE tasks.
>>>
>>>Question 1): not sure why the workflow is not being executed, and
>>>hanging at the state of "RSUBMIT", after enabling the log level, I am
>>>seeing the following entry in the log, not sure if this has anything
>>>to do with the "hanging" problem where workflow is not getting
>>>executed and hanging at state of "RSUBMIT".
>>>	Oct 28, 2014 3:35:07 AM
>>>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>>>safeCheckJobComplete
>>>	WARNING: Exception checking completion status for job:
>>>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>>>java.lang.NullPointerException
>>>
>>>Question 2): I think currently on my side any new incoming workflow
>>>task I am sending with the following command is being directed to the
>>>waiting "QUEUE" because of the min pool size (i.e. 6) (I can increase
>>>this to a larger number though),
>>>			./wmgr-client --url http://localhost:9200
>>--operation --sendEvent
>>>--eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>>	If possible, I would like to please know if there is a way we can
>>purge
>>>the queue and get rid of those workflow tasks either in "RSUMBIT" and
>>>"QUEUED" I have already sent, please kindly help.
>>>
>>>Very sorry for troubling you with this, to be honest I find OODT a bit
>>>challenging to grasp within a short time frame, probably because there
>>>is no book like OODT in action like Solr.... and what I am doing is
>>>just trial and error blended with guess, but I don’t want to make a
>>>blind guess, it will be appreciated if you can please also shed some
>>>lights on where I can get more information logging or other way where
>>>I can troubleshoot. I think it might be worth tracking what is
>>>happening when workflow reach the status "RSUBMIT" and how to get a
>>>specific logging info specific to it...
>>>
>>>Again your advice and kind help will be appreciated usual.
>>>
>>>
>>>Thanks
>>>Luke
>>>
>>>> -----Original Message-----
>>>> From: Mattmann, Chris A (3980)
>>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>>> Sent: 2014年10月26日 22:18
>>>> To: Luke; 'Zichuan Wang'
>>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>>> dev@oodt.apache.org
>>>> Subject: Re: re: Question about OODT file manager
>>>> 
>>>> Hi Luke,
>>>> 
>>>> Thanks and sorry it’s taken me a while to reply. Here are some
>>>>details
>>>>below:
>>>> 
>>>> 
>>>> -----Original Message-----
>>>> From: Luke <sh...@usc.edu>
>>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>>> <zi...@usc.edu>
>>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>>> Subject: RE: re: Question about OODT file manager
>>>> 
>>>> >Hi Professor Mattmann and OODT DEV,
>>>> >
>>>> >Sorry to trouble you with this email, our team has been struggling
>>>> >in the oodt to send json files to solr.
>>>> >One of the difficulties is still getting OODT workflow to call the
>>>> >poster.py in etllib.
>>>> 
>>>> Sorry that you’re having difficulty let me try and help.
>>>> 
>>>> >
>>>> >I am not sure if my understanding is correct with OODT requirement,
>>>> >I hope you can please kindly advice and help with our confusion.
>>>> >
>>>> >a set of goals in my mind with OODT is as follows, please kindly
>>>> >confirm and clarify:
>>>> >
>>>> >1)
>>>> >Get the File-Manager up and running.
>>>> 
>>>> Yep, hopefully as installed via OODT RADIX.
>>>> 
>>>> >2)
>>>> >send all json files with command wmgr-client to the fileManager
>>>>server.
>>>> >(I believe we can achieve it with a bash script or probably  python
>>>> >that calls the command line sequentially with each json file name
>>>> >as
>>>>an
>>>> >argument?!)
>>>> 
>>>> Suggestion:
>>>> 
>>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>>files (in  place data transfer).
>>>> 2. Take a look at CAS-PGE, it will help you write a workflow task
>>>>that will wrap  ETLlib and the poster command.
>>>> 3. Once you are confident with #2, whip up a script that pages
>>>>through all of  your indexed JSON files, and then for each one,
>>>>submits a workflow event (you  may need to look into aggregating
>>>>them) that calls your CAS-PGE wrapped  poster task from ETLlib.
>>>> 
>>>> >3)
>>>> >Once we have json files sent and stored in the File-Manager, we
>>>> >need
>>>>to
>>>> >get workflow-manager up and running, and we can create a workflow
>>>>that
>>>> >send those jsons file from the file manager to solr.
>>>> 
>>>> See above.
>>>> 
>>>> >4)
>>>> >Create a workflow according to
>>>> >Workflow2 User Guide
>>>> 
>>>>><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Gui
>>>>>de>
>>>> >>>>>>>>>>> Here comes the problem...
>>>> >I am not sure how to create a workflow task that can call poster.py in
>>>> >the Python ETLlib. It looks like we need to create our own Java class
>>>> >that extends <TaskInstance>, which is an abstract Java class with one
>>>> >abstract method that has the following signature:
>>>> >
>>>> >
>>>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>>>> >
>>>> >However, that page leaves out where to find the corresponding libraries
>>>> >and where to put our implementation in the Workflow Manager. I am not
>>>> >sure whether we should use TaskInstance, but it seems the workflow needs
>>>> >an interface through which it can call the Python code, i.e. poster.py,
>>>> >and it looks like we need to implement TaskInstance::performExecution
>>>> >by injecting code that calls poster.py and returns the ResultsState.
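Whichever route you take (a custom task class, or CAS-PGE as Chris suggests), the core of the task is the same: run poster.py on a file and turn its exit status into pass/fail. The Python sketch below shows only that part; it is not the OODT TaskInstance API. The `[python, "poster.py", <file>]` invocation is a hypothetical placeholder, so check ETLlib's usage text for the arguments poster.py actually expects (for example, the Solr URL).

```python
import subprocess
import sys


def run_poster(json_file, poster_cmd=None, timeout=600):
    """Run a poster-style command on one JSON file.

    Returns (succeeded, output). Exit status 0 counts as success; a non-zero
    status, a missing executable, or a timeout counts as failure. That
    pass/fail result is what a workflow task ultimately has to report.
    """
    if poster_cmd is None:
        # Hypothetical invocation: check ETLlib's usage text for the real
        # arguments poster.py expects (for example, the Solr URL).
        poster_cmd = [sys.executable, "poster.py"]
    argv = list(poster_cmd) + [str(json_file)]
    try:
        proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return False, str(exc)
    return proc.returncode == 0, proc.stdout + proc.stderr
```

Keeping the command a parameter makes it easy to test the wrapper with a stand-in command before pointing it at the real poster.py.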
>>>> >
>>>> >
>>>> >It would be greatly appreciated if you could shed some light and advise
>>>> >how we can get a task instance to call poster.py. BTW, I am also not
>>>> >sure whether my understanding is correct; please kindly correct it if
>>>> >not. Your help will be appreciated as usual.
>>>> >
>>>> >
>>>> >
>>>> >Thanks
>>>> >Luke
>>>> 
>>>> Thanks Luke, see above. Let me know if it helps.
>>>> 
>>>> Cheers!
>>>> 
>>>> Chris
>>>> 
>>>> >
>>>> >From: Mattmann, Chris A (3980)
>>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>>> >
>>>> >Sent: October 25, 2014 13:34
>>>> >To: Zichuan Wang
>>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu;
>>>> >xiaoyanj@usc.edu
>>>> >Subject: Re: Re: Question about OODT file manager
>>>> >
>>>> >
>>>> >
>>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>>> >detail soon.
>>>> >
>>>> >Sent from my iPhone
>>>> 
>>>> 
>>>> >
>>>> >
>>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>>> >
>>>> >
>>>> >Dear Professor,
>>>> >
>>>> >
>>>> >
>>>> >Could you please also explain how I can crawl all JSON file names under
>>>> >a specific directory using CAS-PGE? I'll work through this example,
>>>> ><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>,
>>>> >but it doesn't mention anything about crawling; instead, it manually
>>>> >sets the input file paths...
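Chris's suggestion above is to let the OODT crawler discover and ingest the files. If you only want to see which JSON files a crawl of a directory should pick up, or to build an input list by hand for the CAS-PGE example, a plain-Python sketch (not the OODT crawler) is:

```python
from pathlib import Path


def find_json_files(root, recursive=True):
    """Return the sorted paths of all regular *.json files under root."""
    # "**/*.json" matches files at the top level and in every subdirectory.
    pattern = "**/*.json" if recursive else "*.json"
    return sorted(p for p in Path(root).glob(pattern) if p.is_file())
```

For example, `for path in find_json_files("/data/json"): print(path)` lists every candidate file (the directory name is hypothetical).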
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >--
>>>> >
>>>> >Zichuan Wang
>>>> >
>>>> >University of Southern California, Department of Computer Science
>>>> >
>>>> >
>>>> >
>>>> >
>>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>>> >
>>>> >Dear Professor,
>>>> >
>>>> >
>>>> >
>>>> >In the Assignment 2 specification I noticed that you mentioned the OODT
>>>> >File Manager, but from my understanding, we are using the ETLlib poster,
>>>> >which talks directly to Solr. So how can we use the OODT File Manager
>>>> >in this assignment?
>>>> >
>>>> >
>>>> >
>>>> >--
>>>> >
>>>> >Zichuan Wang
>>>> >
>>>> >University of Southern California, Department of Computer Science
>>>> >


RE: re: Question about OODT file manager

Posted by Luke liu <sh...@usc.edu>.
I just signed up on the wiki (i.e. https://cwiki.apache.org) with the
following account details:
	Account name: luke
	Full Name: Shuai Liu (Luke)
	Email: hanson311biz@gmail.com
	Password: *******

But I am not sure where I can add my notes to the following article, which
I had trouble with. I also tried to create a new article but failed, as I
cannot find anywhere to edit. Does this have something to do with my account
not having the "edit" or "comments" permission?
https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example


Thanks
Luke
-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Sunday, November 2, 2014 6:59 AM
To: Luke liu; dev@oodt.apache.org
Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu; 'Zichuan Wang'
Subject: Re: re: Question about OODT file manager

Yes Luke, making the instructions better would be much appreciated!

If you have an account on the wiki, please share it; otherwise, sign up for
an Apache OODT wiki account and share it with me or anyone else on
dev@oodt, and we’ll add you.

Cheers,
Chris


-----Original Message-----
From: Luke liu <sh...@usc.edu>
Date: Sunday, November 2, 2014 at 1:32 AM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
<zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
<zi...@usc.edu>
Subject: RE: re: Question about OODT file manager

>Thanks, Professor Mattmann. Not running batch_stub was the main culprit,
>and there were some other issues, such as missing jars. Sorry for not
>confirming this right away; my laptop was crashing, and I only just had
>time to fix it. BTW, I was able to get the CAS-PGE example to work: even
>though the log showed the workflow failing to pass the pre-condition,
>the combined file and some metadata files (i.e. 3 files) were still
>successfully ingested and placed in the output directory.
>
>BTW, I think there are a lot of mistakes in the documentation. Do you
>want us to help correct the document (i.e.
><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>)?
>If possible, I would like to share my notes on some of the problematic
>steps mentioned there.
>
>Anyway, thanks for your help; it is much appreciated.
>
>Thanks
>Luke
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Saturday, November 1, 2014 10:48 AM
>To: Luke; dev@oodt.apache.org
>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu; 'Zichuan Wang'
>Subject: Re: re: Question about OODT file manager
>
>Dear Luke, just confirming: we solved this in class, right? It had to do
>with the batch stub not being turned on.
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Luke <sh...@usc.edu>
>Date: Tuesday, October 28, 2014 at 12:52 PM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
><zi...@usc.edu>
>Subject: RE: re: Question about OODT file manager
>
>>Dear Professor Mattmann,
>>Thanks a lot for the kind help, it is much appreciated, and sorry for
>>being slow to thank you. I have been conducting tests with OODT based on
>>your advice, but unfortunately I am having another problem...
>>
>>I am following the steps at
>><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>
>>to get a sense of how to get the workflow to work.
>>The problem is that the File-Concatenator-PGE (launched by running the
>>wmgr-client command line) does not seem to be invoked or executed.
>>Instead, I see the tasks stacking up in the Workflow Manager with status
>>either "RSUBMIT" or "QUEUED", and they never execute; PFA
>>workflow_monitor.jpg. Please note that by default the workflow min pool
>>size is 6, so here comes another problem: I have 6 submitted tasks with
>>status RSUBMIT, and any new incoming tasks are forwarded to the waiting
>>queue with status "QUEUED". Please refer to workflow_monitor.jpg for
>>details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
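The counts Luke reports (6 RSUBMIT and 3 QUEUED with a pool size of 6) are what any fixed-size pool produces when nothing ever finishes. The toy model below illustrates that arithmetic; it is not OODT's engine code, and treating its "running" list as the RSUBMIT state is an assumption. In this model, raising the pool size alone only postpones the queueing; the fix Luke confirms earlier in this archive was getting jobs to execute at all (the batch stub was not running).

```python
from collections import deque


def simulate_pool(num_jobs, pool_size, jobs_ever_complete=False):
    """Toy model of a fixed-size worker pool (not OODT's actual engine).

    Each job takes a free slot if there is one, otherwise it waits in a FIFO
    queue. If jobs never complete, the first pool_size jobs hold their slots
    forever and every later job stays queued.
    """
    running, queued, done = [], deque(), []
    for job in range(1, num_jobs + 1):
        if len(running) < pool_size:
            running.append(job)
        else:
            queued.append(job)
    if jobs_ever_complete:
        # Drain: each finished job frees its slot for the next queued job.
        while running:
            done.append(running.pop(0))
            if queued:
                running.append(queued.popleft())
    return {"running": running, "queued": list(queued), "done": done}
```

`simulate_pool(9, 6)` leaves jobs 1 to 6 holding the slots and jobs 7 to 9 queued, matching the screenshot description.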
>>
>>Question 1): I am not sure why the workflow is not being executed and is
>>hanging in the "RSUBMIT" state. After raising the log verbosity, I see the
>>following entry in the log; I am not sure whether it has anything to do
>>with the "hanging" problem:
>>	Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread safeCheckJobComplete
>>	WARNING: Exception checking completion status for job: [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: java.lang.NullPointerException
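The warning only says that checking a job's completion status threw a NullPointerException. To see how many submitted jobs hit it, a quick scan of the Workflow Manager log can help. This is a minimal Python sketch that assumes each WARNING message sits on a single log line (the wrapping in the email comes from the mail client) and matches only this exact message text.

```python
import re

# Tailored to the one WARNING quoted above, not to OODT logs in general:
# group 1 is the bracketed job ID, group 2 the optional trailing detail.
JOB_WARNING = re.compile(
    r"WARNING: Exception checking completion status for job: "
    r"\[([^\]]+)\](?::\s*(.*))?"
)


def failing_jobs(lines):
    """Return (job_id, detail) pairs for each completion-check warning."""
    results = []
    for line in lines:
        match = JOB_WARNING.search(line)
        if match:
            results.append((match.group(1), (match.group(2) or "").strip()))
    return results
```

For example, `failing_jobs(open("logs/wmgr.log"))` (hypothetical path) returns one pair per occurrence; a job ID that keeps reappearing is one whose status check keeps failing.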
>>
>>Question 2): I think any new workflow task I send with the following
>>command is currently being directed to the waiting queue because of the
>>min pool size (i.e. 6) (I could increase this to a larger number, though):
>>			./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>	If possible, I would like to know whether there is a way to purge the
>>queue and get rid of the workflow tasks I have already sent, in either
>>the "RSUBMIT" or the "QUEUED" state. Please kindly help.
>>
>>Very sorry for troubling you with this. To be honest, I find OODT a bit
>>challenging to grasp in a short time frame, probably because there is no
>>"OODT in Action" book like the one for Solr, so what I am doing is trial
>>and error blended with guesswork. I don't want to guess blindly, so I
>>would appreciate it if you could also point me to where I can get more
>>logging information, or other ways to troubleshoot. I think it would be
>>worth tracking what happens when a workflow reaches the "RSUBMIT"
>>status, and how to get logging info specific to it...
>>
>>Again, your advice and kind help will be appreciated as usual.
>>
>>
>>Thanks
>>Luke
>>
>>> -----Original Message-----
>>> From: Mattmann, Chris A (3980) 
>>> [mailto:chris.a.mattmann@jpl.nasa.gov]
>>> Sent: October 26, 2014 22:18
>>> To: Luke; 'Zichuan Wang'
>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu; 
>>> dev@oodt.apache.org
>>> Subject: Re: re: Question about OODT file manager
>>> 
>>> Hi Luke,
>>> 
>>> Thanks, and sorry it’s taken me a while to reply. Here are some
>>> details below:
>>> 
>>> 
>>> -----Original Message-----
>>> From: Luke <sh...@usc.edu>
>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>> <zi...@usc.edu>
>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 
>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> Subject: RE: re: Question about OODT file manager
>>> 
>>> >Hi Professor Mattmann and OODT DEV,
>>> >
>>> >Sorry to trouble you with this email; our team has been struggling
>>> >to use OODT to send JSON files to Solr.
>>> >One of the difficulties is still getting the OODT workflow to call
>>> >poster.py in ETLlib.
>>> 
>>> Sorry that you’re having difficulty; let me try and help.
>>> 
>>> >
>>> >I am not sure whether my understanding of the OODT requirement is
>>> >correct; I hope you can kindly advise and help with our confusion.
>>> >
>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>> >confirm and clarify:
>>> >
>>> >1)
>>> >Get the File-Manager up and running.
>>> 
>>> Yep, hopefully as installed via OODT RADIX.
>>> 
>>> >2)
>>> >Send all JSON files to the File Manager server with the wmgr-client
>>> >command. (I believe we can achieve this with a bash or Python script
>>> >that calls the command line sequentially with each JSON file name as an
>>> >argument?!)
>>> 
>>> Suggestion:
>>> 
>>> 1. Use the OODT crawler and file manager to crawl/index the JSON
>>>    files (in-place data transfer).
>>> 2. Take a look at CAS-PGE; it will help you write a workflow task
>>>    that wraps ETLlib and the poster command.
>>> 3. Once you are confident with #2, whip up a script that pages
>>>    through all of your indexed JSON files and, for each one, submits
>>>    a workflow event (you may need to look into aggregating them) that
>>>    calls your CAS-PGE-wrapped poster task from ETLlib.
>>> 
>>> >3)
>>> >Once we have the JSON files sent to and stored in the File Manager, we
>>> >need to get the Workflow Manager up and running, and we can create a
>>> >workflow that sends those JSON files from the File Manager to Solr.
>>> 
>>> See above.
>>> 
>>> >4)
>>> >Create a workflow according to the Workflow2 User Guide
>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>> >>>>>>>>>>> Here comes the problem...
>>> >I am not sure how to create a workflow task that can call poster.py in
>>> >the Python ETLlib. It looks like we need to create our own Java class
>>> >that extends <TaskInstance>, which is an abstract Java class with one
>>> >abstract method that has the following signature:
>>> >
>>> >
>>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>>> >
>>> >However, that page leaves out where to find the corresponding libraries
>>> >and where to put our implementation in the Workflow Manager. I am not
>>> >sure whether we should use TaskInstance, but it seems the workflow needs
>>> >an interface through which it can call the Python code, i.e. poster.py,
>>> >and it looks like we need to implement TaskInstance::performExecution
>>> >by injecting code that calls poster.py and returns the ResultsState.
>>> >
>>> >
>>> >It would be greatly appreciated if you could shed some light and advise
>>> >how we can get a task instance to call poster.py. BTW, I am also not
>>> >sure whether my understanding is correct; please kindly correct it if
>>> >not. Your help will be appreciated as usual.
>>> >
>>> >
>>> >
>>> >Thanks
>>> >Luke
>>> 
>>> Thanks Luke, see above. Let me know if it helps.
>>> 
>>> Cheers!
>>> 
>>> Chris
>>> 
>>> >
>>> >From: Mattmann, Chris A (3980) 
>>> >[mailto:chris.a.mattmann@jpl.nasa.gov]
>>> >
>>> >Sent: October 25, 2014 13:34
>>> >To: Zichuan Wang
>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu; 
>>> >xiaoyanj@usc.edu
>>> >Subject: Re: Re: Question about OODT file manager
>>> >
>>> >
>>> >
>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>> >detail soon.
>>> >
>>> >Sent from my iPhone
>>> 
>>> 
>>> >
>>> >
>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>> >
>>> >
>>> >Dear Professor,
>>> >
>>> >
>>> >
>>> >Could you please also explain how I can crawl all JSON file names under
>>> >a specific directory using CAS-PGE? I'll work through this example,
>>> ><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>,
>>> >but it doesn't mention anything about crawling; instead, it manually
>>> >sets the input file paths...
>>> >
>>> >
>>> >
>>> >
>>> >--
>>> >
>>> >Zichuan Wang
>>> >
>>> >University of Southern California, Department of Computer Science
>>> >
>>> >
>>> >
>>> >
>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>> >
>>> >Dear Professor,
>>> >
>>> >
>>> >
>>> >In the Assignment 2 specification I noticed that you mentioned the OODT
>>> >File Manager, but from my understanding, we are using the ETLlib poster,
>>> >which talks directly to Solr. So how can we use the OODT File Manager
>>> >in this assignment?
>>> >
>>> >
>>> >
>>> >--
>>> >
>>> >Zichuan Wang
>>> >
>>> >University of Southern California, Department of Computer Science
>>> >



Re: re: Question about OODT file manager

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Yes Luke, making the instructions better would be much appreciated!

If you have an account on the wiki, please share it; otherwise, sign up
for an Apache OODT wiki account and share it with me or anyone else on
dev@oodt, and we’ll add you.

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++






-----Original Message-----
From: Luke liu <sh...@usc.edu>
Date: Sunday, November 2, 2014 at 1:32 AM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
<zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
<zi...@usc.edu>
Subject: RE: re: Question about OODT file manager

>Thanks, Professor Mattmann. Not running batch_stub was the main culprit,
>and there were some other issues, such as missing jars. Sorry for not
>confirming this right away; my laptop was crashing, and I only just had
>time to fix it. BTW, I was able to get the CAS-PGE example to work: even
>though the log showed the workflow failing to pass the pre-condition,
>the combined file and some metadata files (i.e. 3 files) were still
>successfully ingested and placed in the output directory.
>
>BTW, I think there are a lot of mistakes in the documentation. Do you
>want us to help correct the document (i.e.
><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>)?
>If possible, I would like to share my notes on some of the problematic
>steps mentioned there.
>
>Anyway, thanks for your help; it is much appreciated.
>
>Thanks
>Luke
>-----Original Message-----
>From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>Sent: Saturday, November 1, 2014 10:48 AM
>To: Luke; dev@oodt.apache.org
>Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu; 'Zichuan Wang'
>Subject: Re: re: Question about OODT file manager
>
>Dear Luke, just confirming: we solved this in class, right? It had
>to do with the batch stub not being turned on.
>
>Cheers,
>Chris
>
>
>-----Original Message-----
>From: Luke <sh...@usc.edu>
>Date: Tuesday, October 28, 2014 at 12:52 PM
>To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
><de...@oodt.apache.org>
>Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
><zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
><zi...@usc.edu>
>Subject: RE: re: Question about OODT file manager
>
>>Dear Professor Mattmann,
>>Thanks a lot for the kind help, it is much appreciated, and sorry for
>>being slow to thank you. I have been conducting tests with OODT based on
>>your advice, but unfortunately I am having another problem...
>>
>>I am following the steps at
>><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>
>>to get a sense of how to get the workflow to work.
>>The problem is that the File-Concatenator-PGE (launched by running the
>>wmgr-client command line) does not seem to be invoked or executed.
>>Instead, I see the tasks stacking up in the Workflow Manager with status
>>either "RSUBMIT" or "QUEUED", and they never execute; PFA
>>workflow_monitor.jpg. Please note that by default the workflow min pool
>>size is 6, so here comes another problem: I have 6 submitted tasks with
>>status RSUBMIT, and any new incoming tasks are forwarded to the waiting
>>queue with status "QUEUED". Please refer to workflow_monitor.jpg for
>>details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>>
>>Question 1): I am not sure why the workflow is not being executed and is
>>hanging in the "RSUBMIT" state. After raising the log verbosity, I see the
>>following entry in the log; I am not sure whether it has anything to do
>>with the "hanging" problem:
>>	Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread safeCheckJobComplete
>>	WARNING: Exception checking completion status for job: [2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: java.lang.NullPointerException
>>
>>Question 2): I think any new workflow task I send with the following
>>command is currently being directed to the waiting queue because of the
>>min pool size (i.e. 6) (I could increase this to a larger number, though):
>>			./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>>	If possible, I would like to know whether there is a way to purge the
>>queue and get rid of the workflow tasks I have already sent, in either
>>the "RSUBMIT" or the "QUEUED" state. Please kindly help.
>>
>>Very sorry for troubling you with this. To be honest, I find OODT a bit
>>challenging to grasp in a short time frame, probably because there is no
>>"OODT in Action" book like the one for Solr, so what I am doing is trial
>>and error blended with guesswork. I don't want to guess blindly, so I
>>would appreciate it if you could also point me to where I can get more
>>logging information, or other ways to troubleshoot. I think it would be
>>worth tracking what happens when a workflow reaches the "RSUBMIT"
>>status, and how to get logging info specific to it...
>>
>>Again, your advice and kind help will be appreciated as usual.
>>
>>
>>Thanks
>>Luke
>>
>>> -----Original Message-----
>>> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>> Sent: October 26, 2014 22:18
>>> To: Luke; 'Zichuan Wang'
>>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>>> dev@oodt.apache.org
>>> Subject: Re: re: Question about OODT file manager
>>> 
>>> Hi Luke,
>>> 
>>> Thanks, and sorry it’s taken me a while to reply. Here are some details
>>> below:
>>> 
>>> 
>>> -----Original Message-----
>>> From: Luke <sh...@usc.edu>
>>> Date: Sunday, October 26, 2014 at 6:19 PM
>>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>>> <zi...@usc.edu>
>>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>>> Subject: RE: re: Question about OODT file manager
>>> 
>>> >Hi Professor Mattmann and OODT DEV,
>>> >
>>> >Sorry to trouble you with this email; our team has been struggling to
>>> >use OODT to send JSON files to Solr.
>>> >One of the difficulties is still getting the OODT workflow to call
>>> >poster.py in ETLlib.
>>> 
>>> Sorry that you’re having difficulty; let me try and help.
>>> 
>>> >
>>> >I am not sure whether my understanding of the OODT requirement is
>>> >correct; I hope you can kindly advise and help with our confusion.
>>> >
>>> >The set of goals I have in mind for OODT is as follows; please kindly
>>> >confirm and clarify:
>>> >
>>> >1)
>>> >Get the File-Manager up and running.
>>> 
>>> Yep, hopefully as installed via OODT RADIX.
>>> 
>>> >2)
>>> >Send all JSON files to the File Manager server with the wmgr-client
>>> >command. (I believe we can achieve this with a bash or Python script
>>> >that calls the command line sequentially with each JSON file name as an
>>> >argument?!)
>>> 
>>> Suggestion:
>>> 
>>> 1. Use the OODT crawler and file manager to crawl/index the JSON files
>>>    (in-place data transfer).
>>> 2. Take a look at CAS-PGE; it will help you write a workflow task that
>>>    wraps ETLlib and the poster command.
>>> 3. Once you are confident with #2, whip up a script that pages through
>>>    all of your indexed JSON files and, for each one, submits a workflow
>>>    event (you may need to look into aggregating them) that calls your
>>>    CAS-PGE-wrapped poster task from ETLlib.
>>> 
>>> >3)
>>> >Once we have the JSON files sent to and stored in the File Manager, we
>>> >need to get the Workflow Manager up and running, and we can create a
>>> >workflow that sends those JSON files from the File Manager to Solr.
>>> 
>>> See above.
>>> 
>>> >4)
>>> >Create a workflow according to the Workflow2 User Guide
>>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>
>>> >>>>>>>>>>> Here comes the problem...
>>> >I am not sure how to create a workflow task that can call poster.py in
>>> >the Python ETLlib. It looks like we need to create our own Java class
>>> >that extends <TaskInstance>, which is an abstract Java class with one
>>> >abstract method that has the following signature:
>>> >
>>> >
>>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>>> >
>>> >However, that page leaves out where to find the corresponding libraries
>>> >and where to put our implementation in the Workflow Manager. I am not
>>> >sure whether we should use TaskInstance, but it seems the workflow needs
>>> >an interface through which it can call the Python code, i.e. poster.py,
>>> >and it looks like we need to implement TaskInstance::performExecution
>>> >by injecting code that calls poster.py and returns the ResultsState.
>>> >
>>> >
>>> >It would be greatly appreciated if you could shed some light and advise
>>> >how we can get a task instance to call poster.py. BTW, I am also not
>>> >sure whether my understanding is correct; please kindly correct it if
>>> >not. Your help will be appreciated as usual.
>>> >
>>> >
>>> >
>>> >Thanks
>>> >Luke
>>> 
>>> Thanks Luke, see above. Let me know if it helps.
>>> 
>>> Cheers!
>>> 
>>> Chris
>>> 
>>> >
>>> >From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>>> >
>>> >Sent: October 25, 2014 13:34
>>> >To: Zichuan Wang
>>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu; xiaoyanj@usc.edu
>>> >Subject: Re: Re: Question about OODT file manager
>>> >
>>> >
>>> >
>>> >Please cc dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in
>>> >detail soon.
>>> >
>>> >Sent from my iPhone
>>> 
>>> 
>>> >
>>> >
>>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>>> >
>>> >
>>> >Dear Professor,
>>> >
>>> >
>>> >
>>> >Could you please also explain how I can crawl all JSON file names under a
>>> >specific directory using CAS-PGE? I'll work through this example,
>>> ><https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example>,
>>> >but it doesn't mention anything about crawling; instead, it manually
>>> >sets the input file paths...
>>> >
>>> >
>>> >
>>> >
>>> >--
>>> >
>>> >Zichuan Wang
>>> >
>>> >University of Southern California, Department of Computer Science
>>> >
>>> >
>>> >
>>> >
>>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>>> >
>>> >Dear Professor,
>>> >
>>> >
>>> >
>>> >In the Assignment 2 specification I noticed that you mentioned the OODT
>>> >File Manager, but from my understanding, we are using the ETLlib poster,
>>> >which talks directly to Solr. So how can we use the OODT File Manager in
>>> >this assignment?
>>> >
>>> >
>>> >
>>> >--
>>> >
>>> >Zichuan Wang
>>> >
>>> >University of Southern California, Department of Computer Science
>>> >


RE: re: Question about OODT file manager

Posted by Luke liu <sh...@usc.edu>.
Thanks, Professor Mattmann. Not running batch_stub was the main culprit, and
there were some other issues, such as missing jars. Sorry for not confirming
this right away; my laptop was crashing, and I only just had time to fix it.
BTW, I was able to get the CAS-PGE example to work: even though the log
showed the workflow failing to pass the pre-condition, the combined file and
some metadata files (i.e. 3 files) were still successfully ingested and
placed in the output directory.

BTW, I think there are a lot of mistakes in the documentation. Do you want us
to help correct the document (i.e.
https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example)?
If possible, I would like to share my notes on some of the problematic steps
mentioned there.

Anyway, thanks for your help; it is much appreciated.

Thanks
Luke
-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov] 
Sent: Saturday, November 1, 2014 10:48 AM
To: Luke; dev@oodt.apache.org
Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu; 'Zichuan Wang'
Subject: Re: re: Question about OODT file manager

Dear Luke, just confirming: we solved this in class, right? It had
to do with the batch stub not being turned on.

Cheers,
Chris


-----Original Message-----
From: Luke <sh...@usc.edu>
Date: Tuesday, October 28, 2014 at 12:52 PM
To: Chris Mattmann <Ch...@jpl.nasa.gov>, "dev@oodt.apache.org"
<de...@oodt.apache.org>
Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
<zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>, 'Zichuan Wang'
<zi...@usc.edu>
Subject: RE: re: Question about OODT file manager

>Dear Professor Mattmann,
>Thanks a lot, Professor Mattmann, for the kind help; it is appreciated.
>Sorry for the delay in getting back to you with my thanks; I have been
>conducting tests with OODT based on your advice, but unfortunately I am
>having another problem....
>
>I am following the steps at
>https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example
>to get a sense of how to get a workflow to work.
>The problem is that the File-Concatenator-PGE (triggered by running the
>wmgr-client command line) does not seem to be invoked or executed. Instead,
>I see the tasks stacking up in the workflow manager with status either
>"RSUBMIT" or "QUEUED", and they are never executed (PFA:
>workflow_monitor.jpg). Please note that by default the workflow min pool
>size is 6, which leads to another problem: I have 6 submitted tasks with
>status RSUBMIT, and any new incoming tasks are forwarded to the waiting
>queue with status "QUEUED". Please refer to workflow_monitor.jpg for
>details, where I have 3 QUEUED workflow tasks and 6 RSUBMIT tasks.
>
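A toy model may help explain the symptom described above. The sketch below is plain Python and is not OODT's workflow engine code; it assumes only what the message reports: a fixed pool of 6 slots, tasks that hold a slot while shown as RSUBMIT, and extra tasks that wait as QUEUED. In this model a slot is freed only when a running task reports completion, so if jobs never report back (for example because, as noted at the top of this thread, the batch stub was not turned on), the queue never drains. The class BoundedPoolModel and its methods are hypothetical names.

```python
from collections import deque


class BoundedPoolModel:
    """Toy model of a fixed-size worker pool; not OODT's actual engine code."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.running = []      # tasks holding a slot (shown as RSUBMIT)
        self.queued = deque()  # tasks waiting for a free slot (shown as QUEUED)

    def submit(self, task_id):
        if len(self.running) < self.pool_size:
            self.running.append(task_id)
        else:
            self.queued.append(task_id)

    def complete(self, task_id):
        # A slot is freed only when a running task reports completion;
        # the oldest queued task then takes it over.
        self.running.remove(task_id)
        if self.queued:
            self.running.append(self.queued.popleft())

    def status(self):
        return {"RSUBMIT": len(self.running), "QUEUED": len(self.queued)}


pool = BoundedPoolModel(pool_size=6)
for n in range(1, 10):
    pool.submit(f"testNumber{n}")
print(pool.status())  # {'RSUBMIT': 6, 'QUEUED': 3}
```

Calling pool.complete("testNumber1") would move testNumber7 into a free slot; if no task ever completes, the counts stay at 6 and 3 indefinitely.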
>Question 1): I am not sure why the workflow is not being executed and hangs
>in the "RSUBMIT" state. After raising the log level, I see the following
>entry in the log; I am not sure whether it has anything to do with the
>"hanging" problem, where the workflow is not executed and stays in the
>"RSUBMIT" state.
>	Oct 28, 2014 3:35:07 AM
>org.apache.oodt.cas.workflow.engine.IterativeWorkflowProcessorThread
>safeCheckJobComplete
>	WARNING: Exception checking completion status for job:
>[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception:
>java.lang.NullPointerException
>
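For the troubleshooting question, one low-tech option is to pull the completion-check warnings out of the Workflow Manager log and see which job IDs keep failing. The sketch below assumes the WARNING text sits on a single line in the format quoted above (java.util.logging normally prints the message on the line after the timestamp and class name); the function name completion_check_failures is hypothetical, and the pattern matches the log's own "Messsage" spelling loosely.

```python
import re
from collections import Counter

# Matches the warning quoted in this message; "Mes+age" tolerates the
# triple-s spelling ("Messsage") that appears in the log text.
WARNING_RE = re.compile(
    r"WARNING: Exception checking completion status for job: "
    r"\[([^\]]+)\]: Mes+age: (.*)"
)


def completion_check_failures(lines):
    """Return (job_id, message) pairs for each completion-check warning."""
    failures = []
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            failures.append((match.group(1), match.group(2).strip()))
    return failures


sample = [
    "Oct 28, 2014 3:35:07 AM org.apache.oodt.cas.workflow.engine."
    "IterativeWorkflowProcessorThread safeCheckJobComplete",
    "WARNING: Exception checking completion status for job: "
    "[2014-10-28T01:59:32.813-07:00]: Messsage: java.lang.Exception: "
    "java.lang.NullPointerException",
]
failures = completion_check_failures(sample)
# Count warnings per job ID; a job ID that keeps repeating suggests its
# status check is failing on every poll.
print(Counter(job_id for job_id, _ in failures))
```

In practice the lines would come from reading the Workflow Manager's log file rather than from a hard-coded list.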
>Question 2): I think that, on my side, any new workflow task I send with the
>following command is currently directed to the waiting "QUEUE" because of
>the min pool size (i.e. 6) (I could increase this to a larger number,
>though):
>			./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1
>	If possible, I would like to know whether there is a way to purge the
>queue and get rid of the workflow tasks I have already sent that are in
>"RSUBMIT" or "QUEUED" state; please kindly help.
>
>Very sorry for troubling you with this. To be honest, I find OODT a bit
>challenging to grasp within a short time frame, probably because there is
>no "OODT in Action" book like the one for Solr, so what I am doing is just
>trial and error blended with guesswork. I don’t want to guess blindly,
>though, so I would appreciate it if you could also shed some light on
>where I can get more logging information, or other ways to troubleshoot.
>I think it might be worth tracking what happens when a workflow reaches
>the "RSUBMIT" status, and how to get logging output specific to it...
>
>Again, your advice and kind help will be appreciated as usual.
>
>
>Thanks
>Luke
>
>> -----Original Message-----
>> From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> Sent: October 26, 2014 22:18
>> To: Luke; 'Zichuan Wang'
>> Cc: 'Christian Alan Mattmann'; zhoujian@usc.edu; xiaoyanj@usc.edu;
>> dev@oodt.apache.org
>> Subject: Re: re: Question about OODT file manager
>> 
>> Hi Luke,
>> 
>> Thanks, and sorry it’s taken me a while to reply. Here are some details
>> below:
>> 
>> 
>> -----Original Message-----
>> From: Luke <sh...@usc.edu>
>> Date: Sunday, October 26, 2014 at 6:19 PM
>> To: Chris Mattmann <Ch...@jpl.nasa.gov>, 'Zichuan Wang'
>> <zi...@usc.edu>
>> Cc: Chris Mattmann <ma...@usc.edu>, "zhoujian@usc.edu"
>> <zh...@usc.edu>, "xiaoyanj@usc.edu" <xi...@usc.edu>,
>> "dev@oodt.apache.org" <de...@oodt.apache.org>
>> Subject: RE: re: Question about OODT file manager
>> 
>> >Hi Professor Mattmann and OODT DEV,
>> >
>> >Sorry to trouble you with this email; our team has been struggling to
>> >use OODT to send JSON files to Solr.
>> >One of the difficulties is still getting the OODT workflow to call
>> >poster.py in ETLlib.
>> 
>> Sorry that you’re having difficulty; let me try to help.
>> 
>> >
>> >I am not sure whether my understanding of the OODT requirements is
>> >correct; I hope you can kindly advise and help with our confusion.
>> >
>> >The set of goals I have in mind with OODT is as follows; please kindly
>> >confirm and clarify:
>> >
>> >1)
>> >Get the File-Manager up and running.
>> 
>> Yep, hopefully as installed via OODT RADIX.
>> 
>> >2)
>> >Send all JSON files to the File Manager server with the wmgr-client
>> >command. (I believe we can achieve this with a bash or Python script
>> >that calls the command line sequentially with each JSON file name as an
>> >argument?)
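Goal 2 can be sketched as a small Python driver. It reuses exactly the flags of the command Luke runs later in this thread (./wmgr-client --url http://localhost:9200 --operation --sendEvent --eventName fileconcatenator-pge --metaData --key RunID testNumber1). Passing more than one metadata pair by repeating --key is an assumption to check against wmgr-client's help output, and the event name poster-pge and the metadata key InputFile are hypothetical. Note that wmgr-client talks to the Workflow Manager, not the File Manager, so this submits workflow events rather than ingesting files.

```python
import subprocess


def send_event_cmd(event_name, metadata, url="http://localhost:9200",
                   client="./wmgr-client"):
    """Build a wmgr-client sendEvent command line.

    With a single {"RunID": ...} entry this reproduces the command used in
    this thread; repeating --key for extra pairs is an assumption.
    """
    cmd = [client, "--url", url, "--operation", "--sendEvent",
           "--eventName", event_name, "--metaData"]
    for key, value in metadata.items():
        cmd += ["--key", key, str(value)]
    return cmd


def submit_each(json_files, event_name="poster-pge", dry_run=True):
    """Build (and optionally run) one sendEvent command per JSON file.

    "poster-pge" and the "InputFile" key are hypothetical names.
    """
    commands = []
    for n, path in enumerate(sorted(json_files), start=1):
        cmd = send_event_cmd(event_name, {"RunID": f"run{n}", "InputFile": path})
        commands.append(cmd)
        if not dry_run:
            # check=True stops the loop at the first failed submission.
            subprocess.run(cmd, check=True)
    return commands


for cmd in submit_each(["b.json", "a.json"]):
    print(" ".join(cmd))
```

Keeping dry_run=True until the printed commands look right is a cheap safeguard, especially since each submitted event occupies a slot in the workflow pool.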
>> 
>> Suggestion:
>> 
>> 1. Use the OODT crawler and file manager to crawl/index the JSON files
>> (in-place data transfer).
>> 2. Take a look at CAS-PGE; it will help you write a workflow task that
>> wraps ETLlib and the poster command.
>> 3. Once you are confident with #2, whip up a script that pages through
>> all of your indexed JSON files and, for each one, submits a workflow
>> event (you may need to look into aggregating them) that calls your
>> CAS-PGE-wrapped poster task from ETLlib.
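For the aggregation mentioned in step 3, one option is to group the indexed file paths into fixed-size batches and submit one workflow event per batch instead of one per file. The sketch below only plans the events: how the file list is paged out of the File Manager catalog is not shown, and the InputFiles key with a comma-separated value is a hypothetical convention that the CAS-PGE-wrapped task would have to split on its side.

```python
from itertools import islice


def batches(items, size):
    """Yield successive lists of at most `size` items, preserving order."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def plan_events(paths, batch_size):
    """Return one metadata dict per batch (key names are hypothetical)."""
    return [
        {"RunID": f"batch{n}", "InputFiles": ",".join(chunk)}
        for n, chunk in enumerate(batches(paths, batch_size), start=1)
    ]


for event in plan_events(["a.json", "b.json", "c.json", "d.json", "e.json"], 2):
    print(event)
```

Each planned dict would then become the metadata of a single sendEvent submission, so five files at a batch size of 2 need three workflow instances rather than five.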
>> 
>> >3)
>> >Once we have the JSON files sent to and stored in the File Manager, we
>> >need to get the Workflow Manager up and running, and then create a
>> >workflow that sends those JSON files from the File Manager to Solr.
>> 
>> See above.
>> 
>> >4)
>> >Create a workflow according to the Workflow2 User Guide
>> ><https://cwiki.apache.org/confluence/display/OODT/Workflow2+User+Guide>.
>> >Here comes the problem:
>> >         I am not sure how to create a workflow task that can call
>> >poster.py in the Python ETLlib. It looks like we need to create our own
>> >Java class that extends TaskInstance, an abstract Java class with one
>> >abstract method that has the following signature:
>> >
>> >
>> >protected abstract ResultsState performExecution(ControlMetadata crtlMetadata);
>> >         However, that page leaves out the details of where to find the
>> >corresponding libraries and where to put our implementation in the
>> >Workflow Manager. I am not sure whether we should use TaskInstance, but
>> >it seems the workflow needs an interface through which it can call the
>> >Python code, i.e. poster.py. It looks like we need to implement
>> >TaskInstance::performExecution with code that calls poster.py and
>> >returns the resulting ResultsState.
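Whichever route is taken (a custom TaskInstance in Java or a CAS-PGE configuration), the core of the task is launching poster.py as an external process and turning its exit status into a success or failure result. The Python sketch below shows only that idea. The "SUCCESS"/"FAILURE" strings are placeholders, not OODT's ResultsState values, and poster.py's real arguments are not given in this thread, so the usage line runs a stand-in command.

```python
import subprocess
import sys


def run_external_step(cmd, timeout=None):
    """Run an external command and map its outcome to a coarse result.

    Returns (state, output) where state is "SUCCESS" for exit code 0 and
    "FAILURE" otherwise (including a missing executable or a timeout).
    """
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=timeout)
    except (OSError, subprocess.TimeoutExpired) as exc:
        return "FAILURE", str(exc)
    state = "SUCCESS" if proc.returncode == 0 else "FAILURE"
    return state, proc.stdout + proc.stderr


# Stand-in for something like [sys.executable, "poster.py", ...]; the real
# poster.py arguments are not shown here.
state, output = run_external_step([sys.executable, "-c", "print('posted')"])
print(state, output.strip())
```

In a real task the captured output would also be logged, which makes failures of the wrapped Python step visible from the workflow side.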
>> >
>> >
>> >It would be greatly appreciated if you could shed some light and advise
>> >how we can get a task instance to call poster.py. BTW, I am also not
>> >sure whether my understanding is correct; please kindly correct it if
>> >not. Your help will be appreciated as usual.
>> >
>> >
>> >
>> >Thanks
>> >Luke
>> 
>> Thanks Luke, see above. Let me know if it helps.
>> 
>> Cheers!
>> 
>> Chris
>> 
>> >
>> >From: Mattmann, Chris A (3980) [mailto:chris.a.mattmann@jpl.nasa.gov]
>> >
>> >Sent: October 25, 2014 13:34
>> >To: Zichuan Wang
>> >Cc: Christian Alan Mattmann; Luke; zhoujian@usc.edu; xiaoyanj@usc.edu
>> >Subject: Re: Re: Question about OODT file manager
>> >
>> >
>> >
>> >Please cc
>> >dev@oodt.apache.org <ma...@oodt.apache.org>; I will reply in detail
>> >soon.
>> >
>> >Sent from my iPhone
>> 
>> 
>> >
>> >
>> >On Oct 25, 2014, at 1:26 PM, "Zichuan Wang" <zi...@usc.edu> wrote:
>> >
>> >
>> >Dear Professor,
>> >
>> >
>> >
>> >Could you please also explain how I can crawl all JSON file names under
>> >a specific directory using CAS-PGE? I’ll work through this example
>> >(https://cwiki.apache.org/confluence/display/OODT/CAS-PGE+Learn+by+Example),
>> >but it doesn’t mention anything about crawling; instead, it sets the
>> >input file paths manually...
>> >
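As a plain-Python illustration of listing the inputs (this is not the OODT crawler, which Chris recommends above for actual ingestion), the sketch below collects every *.json file under a directory, recursively. The function name find_json_files is hypothetical; the resulting list could feed whatever sets the input file paths that the CAS-PGE example otherwise sets by hand.

```python
import tempfile
from pathlib import Path


def find_json_files(root):
    """Return sorted paths of all regular *.json files under `root`, recursively."""
    return sorted(p for p in Path(root).rglob("*.json") if p.is_file())


# Small demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "day1").mkdir()
    (base / "day1" / "a.json").write_text("{}")
    (base / "b.json").write_text("{}")
    (base / "notes.txt").write_text("skip me")
    found = [p.relative_to(base).as_posix() for p in find_json_files(base)]

print(found)
```

Subdirectories are included because rglob walks the whole tree; swap in Path.glob("*.json") if only the top-level directory should be scanned.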
>> >
>> >
>> >
>> >--
>> >
>> >Zichuan Wang
>> >
>> >University of Southern California, Department of Computer Science
>> >
>> >
>> >
>> >
>> >On Saturday, October 25, 2014, at 12:10 PM, Zichuan Wang wrote:
>> >
>> >Dear Professor,
>> >
>> >
>> >
>> >In the Assignment 2 specification, I noticed that you mentioned the OODT
>> >File Manager, but as I understand it, we are using the ETLLib poster,
>> >which talks directly to Solr. So how can we use the OODT File Manager in
>> >this assignment?
>> >
>> >
>