Posted to user@oodt.apache.org by YunHee Kang <yu...@gmail.com> on 2012/08/06 18:46:39 UTC

Problem happened when I tried to run the script "crawler_launcher"

Hi Chris,

I got an error message when I tried to run crawler_launcher from a
shell script. The error may be caused by a wrong filemgr URL.

$ ./crawler_launcher.sh
ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
allowed for option
[longOption='filemgrUrl',shortOption='fm',description='File Manager
URL'] - Allowed values = [http://.*:\d*]

The following is the shell script that I wrote:
$ cat crawler_launcher.sh
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
       -op --launchStdCrawler \
       --productPath $STAGE_AREA \
       --filemgrUrl http://localhost:8000/ \
       --failureDir /tmp \
       --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
       --metFileExtension tmp \
       --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer

I am wondering whether the problem is in the filemgr URL or elsewhere.

Thanks,
Yunhee

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi YunHee,

This looks like crawler option validation from the cmd-line-option-beans.xml
file, validating the filemgrUrl against the allowed pattern http://.*:\d*.
That pattern has to end with the port number, so I don't think you can have
the trailing slash at the end.
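
To see why the trailing slash is rejected, the check can be reproduced by hand. The following is only a sketch, assuming the validator requires the whole value to match the "Allowed values" pattern from the error message. The function name check_url is made up for this example, and [0-9] stands in for \d because POSIX extended regular expressions have no \d.

```shell
#!/bin/sh
# Sketch: emulate the "Allowed values" check shown in the error message.
# Assumption: the whole value must match the pattern (a full match).
# POSIX ERE has no \d, so [0-9] is used in its place.
pattern='http://.*:[0-9]*'

# check_url is a hypothetical helper, not part of OODT.
check_url() {
    # grep -x succeeds only when the entire line matches the pattern.
    if printf '%s\n' "$1" | grep -Eqx "$pattern"; then
        echo "allowed: $1"
    else
        echo "rejected: $1"
    fi
}

check_url 'http://localhost:8000/'
check_url 'http://localhost:8000'
```

Under that assumption, the first call prints "rejected" and the second prints "allowed", which matches the behaviour reported in this thread.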

Cheers,
Chris


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by Sheryl John <sh...@gmail.com>.
Hi Yunhee,

I'm sorry for the confusion caused by the guide on the OODT crawler
homepage. The new command-line options were introduced recently, so
the option 'crawlerId' is now obsolete. It has been replaced by
--launchStdCrawler (or -stdPC).
If you run ./crawler_launcher -h, you should see the new CLI options menu
added by Brian.
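
As a rough sketch, a run script using the newer options might look like the one below. Every option name and value is copied from messages in this thread, so please confirm them against ./crawler_launcher -h for your version. The DRY_RUN switch and the run helper are made up for this example: by default the script only prints the command instead of executing it.

```shell
#!/bin/sh
# Sketch: launch the standard crawler with the newer options instead of --crawlerId.
# Option names and values are taken from this thread; verify with ./crawler_launcher -h.
STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
FILEMGR_URL=http://localhost:8000   # no trailing slash
MET_EXT=info.tmp                    # suffix of the metadata files (*.he5.info.tmp here)

# run is a hypothetical helper: it prints the command unless DRY_RUN=0 is set.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

launch_std_crawler() {
    run ./crawler_launcher \
        -op --launchStdCrawler \
        --productPath "$STAGE_AREA" \
        --filemgrUrl "$FILEMGR_URL" \
        --failureDir /tmp \
        --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
        --metFileExtension "$MET_EXT" \
        --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
}

launch_std_crawler
```

Running it with DRY_RUN=0 from the crawler's bin directory would execute the command for real.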



On Tue, Aug 7, 2012 at 7:03 AM, YunHee Kang <yu...@gmail.com> wrote:

> Hi Chris and Sheryl,
>
> I understood my mistake after removing the trailing "/" from the URL.
> However, the Apache OODT crawler user guide
> (http://oodt.apache.org/components/maven/crawler/user/) shows a URL
> with a trailing slash as an option of crawler_launcher:
>  --filemgrUrl http://localhost:9000/ \
> That is what confused me.
>
> I then tried to run the command below, following the same page:
> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
> ERROR: Invalid option: 'crawlerId'
>
> As shown, it fails with the error above.
> Is the option 'crawlerId' obsolete?
>
> Thanks,
> Yunhee
>
>
> 2012/8/7 Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>:
> > Perfect, Sheryl, my thoughts exactly.
> >
> > Cheers,
> > Chris
> >
> > On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
> >
> >> Hi Yunhee,
> >>
> >> Check out this OODT wiki for crawler :
> >> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> >>
> >> Did you try giving 'http://localhost:8000' without the "/" in the end?
> >> Also, specify
> 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
> >> for  'clientTransferer' option.
> >>
> >>



-- 
-Sheryl

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by Sheryl John <sh...@gmail.com>.
Hi Yunhee,


On Thu, Aug 9, 2012 at 8:19 PM, YunHee Kang <yu...@gmail.com> wrote:

> Hi Sheryl,
>
> First off, I tried to run crawler_launcher with an option "-autoPC".
> Then I got a warning message as follows:
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> WARNING: Failed to pass preconditions for ingest of product:
>
> [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5]
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> INFO: Handling file
>
> /home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp
> Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> WARNING: Failed to pass preconditions for ingest of product:
>
> [/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp]
>
> I think that the warning message is related with preconditions for ingest.
> According to the run script for crawler_launcher,  it was wrong to
> describe the option "pids" for the preconditions.
> #!/bin/sh
> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> ./crawler_launcher \
>       -op   -stdPC \
>       -mfx tmp\
>       --productPath $STAGE_AREA\
>       --filemgrUrl http://localhost:8000\
>        --failureDir /tmp \
>        --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>        --metFileExtension tmp \
>        -pids CheckThatDataFileSizeIsGreaterThanZero \
>        --clientTransferer
> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
> Let me know how to fix the warning.
>
>
I see that your data file is *.he5 and the metadata file is *.he5.info.tmp.
Specify your '-mfx' option as 'info.tmp'.
StdProductCrawler appends your met file extension to the absolute path of the
data file to find the metadata file. Try that and see if it ingests the data
file. I should have noticed this before, but I only caught it after testing
it out.
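
To make the path handling concrete, here is a small sketch of how the metadata file name would be derived. It assumes the extension is joined to the data file path with a dot; the helper name met_file_for is made up for this example.

```shell
#!/bin/sh
# Sketch: where the standard crawler would look for the metadata file.
# Assumption: the extension is appended to the data file path after a dot.
# met_file_for is a hypothetical helper, not part of OODT.
met_file_for() {
    # $1 = data file path, $2 = value given to -mfx / --metFileExtension
    printf '%s.%s\n' "$1" "$2"
}

data=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5

met_file_for "$data" tmp        # ends in .he5.tmp, which the log does not show
met_file_for "$data" info.tmp   # ends in .he5.info.tmp, the file the log shows
```

With 'tmp' the derived name does not correspond to any file in the log output, while 'info.tmp' gives exactly the *.he5.info.tmp file that was reported.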

> Next I applied an option for the metadata crawler to the run script.
> #!/bin/sh
> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> ./crawler_launcher \
>        -op    -metPC\
>        -pp $STAGE_AREA\
>        -fm http://localhost:8000\
>        -mxc ../policy/crawler-config.xml\
>        -mx org.apache.oodt.cas.metadata.extractors.ExternMetExtractor\
>        -mxr ../policy/mime-extractor-map.xml\
>        --failureDir /tmp \
>        --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>        --metFileExtension tmp \
>        --clientTransferer
> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
>
> I also get the error message as follows:
>
> ERROR: Failed to launch crawler : Error creating bean with name
> 'MetExtractorProductCrawler' defined in file
>
> [/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/bin/../policy/crawler-beans.xml]:
> Error setting property values; nested exception is
> org.springframework.beans.PropertyBatchUpdateException; nested
> PropertyAccessExceptions (1) are:
> PropertyAccessException 1:
> org.springframework.beans.MethodInvocationException: Property
> 'metExtractor' threw exception; nested exception is
> org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Failed
> to parse config file : Failed to parser
> '/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/policy/crawler-config.xml'
> : null
>
> I just used the property file crawler-config.xml (as follows) in the
> policy directory.
>
> <beans xmlns="http://www.springframework.org/schema/beans"
>         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
> xmlns:p="http://www.springframework.org/schema/p"
>         xsi:schemaLocation="http://www.springframework.org/schema/beans
> http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
>         <bean
> class="org.apache.oodt.cas.crawl.util.CasPropertyOverrideConfigurer"
> />
>         <import resource="crawler-beans.xml" />
>         <import resource="action-beans.xml" />
>         <import resource="precondition-beans.xml" />
>         <import resource="naming-beans.xml" />
> </beans>
>
>

Your met extractor config (the -mxc option) should be a config file for your
external met extractor, and will look like this:
https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml

The crawler-config.xml file, by contrast, is what crawler_launcher itself
reads to load all the actions, preconditions, etc.
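
A quick sanity check along these lines can catch the mix-up before launching. This is only a sketch: it assumes that a file containing a <beans ...> element is a Spring bean file (like the crawler-config.xml quoted above) and therefore not an extractor config. The helper name check_mxc is made up for this example.

```shell
#!/bin/sh
# Sketch: warn when the file meant for -mxc looks like a Spring bean file.
# Assumption: a Spring file can be recognised by a <beans ...> element.
# check_mxc is a hypothetical helper, not part of OODT.
check_mxc() {
    if [ ! -r "$1" ]; then
        echo "error: cannot read $1"
    elif grep -q '<beans[ >]' "$1"; then
        echo "warning: $1 is a Spring beans file, not an extractor config"
    else
        echo "ok: $1"
    fi
}

# Check the file named on the command line, if any.
if [ "$#" -gt 0 ]; then
    check_mxc "$1"
fi
```

For example, saving it under any name and passing ../policy/crawler-config.xml as the argument should print the warning.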

I've not defined or used an external met extractor before, but you can see
an example of an extern met extractor and its config in the wiki:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

> So I need to understand how to write some XML files (including
> crawler-beans.xml, action-beans.xml, etc.), which are imported into the
> file crawler-config.xml.
> Could you share your experience with me?
> Thanks,
> Yunhee
>
>
Yep, you should write the above-mentioned extractor config file for your
specific external met extractor. But you don't have to write crawler-beans
or action-beans. You can just pick the action IDs you want with the
crawler_launcher CLI option '--actionIds' (or '-ais'); the available IDs are
listed in action-beans.xml. The same applies to the crawler-beans and
the preconditions.
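
One way to see which IDs are available is to pull them out of the policy files. The sketch below assumes each entry is declared on one line as <bean ... id="..."> with the id in double quotes, and that your grep supports -o (GNU and BSD grep do). The helper name list_bean_ids is made up for this example.

```shell
#!/bin/sh
# Sketch: list the bean ids defined in a policy file such as action-beans.xml.
# Assumption: each bean is declared on a single line as <bean ... id="...">.
# list_bean_ids is a hypothetical helper, not part of OODT.
list_bean_ids() {
    # Find bean tags carrying an id attribute, then keep only the id value.
    grep -o '<bean[^>]* id="[^"]*"' "$1" | sed 's/.* id="\([^"]*\)"$/\1/'
}

# List the ids of the file named on the command line, if any.
if [ "$#" -gt 0 ]; then
    list_bean_ids "$1"
fi
```

Pointing it at ../policy/action-beans.xml or ../policy/precondition-beans.xml should print one ID per line, ready to pass to '--actionIds' or '-pids'.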




-- 
-Sheryl

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by YunHee Kang <yu...@gmail.com>.
Hi Sheryl,

First off, I tried to run crawler_launcher with an option "-autoPC".
Then I got a warning message as follows:
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product:
[/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5]
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file
/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp
Aug 10, 2012 11:12:26 AM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product:
[/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2/TES-Aura_L2-CO2-Nadir_r0000002147_F06_09.he5.info.tmp]

I think the warning is related to the preconditions for ingest.
Perhaps I specified the "pids" option for the preconditions incorrectly
in the run script for crawler_launcher, shown below.
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
      -op   -stdPC \
      -mfx tmp\
      --productPath $STAGE_AREA\
      --filemgrUrl http://localhost:8000\
       --failureDir /tmp \
       --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
       --metFileExtension tmp \
       -pids CheckThatDataFileSizeIsGreaterThanZero \
       --clientTransferer
org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
Let me know how to fix the warning.

Next I applied an option for the metadata crawler to the run script.
#!/bin/sh
export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
./crawler_launcher \
       -op    -metPC\
       -pp $STAGE_AREA\
       -fm http://localhost:8000\
       -mxc ../policy/crawler-config.xml\
       -mx org.apache.oodt.cas.metadata.extractors.ExternMetExtractor\
       -mxr ../policy/mime-extractor-map.xml\
       --failureDir /tmp \
       --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
       --metFileExtension tmp \
       --clientTransferer
org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory

I also get the error message as follows:

ERROR: Failed to launch crawler : Error creating bean with name
'MetExtractorProductCrawler' defined in file
[/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/bin/../policy/crawler-beans.xml]:
Error setting property values; nested exception is
org.springframework.beans.PropertyBatchUpdateException; nested
PropertyAccessExceptions (1) are:
PropertyAccessException 1:
org.springframework.beans.MethodInvocationException: Property
'metExtractor' threw exception; nested exception is
org.apache.oodt.cas.metadata.exceptions.MetExtractionException: Failed
to parse config file : Failed to parser
'/home/yhkang/oodt-0.5/cas-crawler-0.5-SNAPSHOT/policy/crawler-config.xml'
: null

I just used the property file crawler-config.xml (as follows) in the
policy directory.

<beans xmlns="http://www.springframework.org/schema/beans"
        xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xmlns:p="http://www.springframework.org/schema/p"
        xsi:schemaLocation="http://www.springframework.org/schema/beans
http://www.springframework.org/schema/beans/spring-beans-2.5.xsd">
        <bean class="org.apache.oodt.cas.crawl.util.CasPropertyOverrideConfigurer"
/>
        <import resource="crawler-beans.xml" />
        <import resource="action-beans.xml" />
        <import resource="precondition-beans.xml" />
        <import resource="naming-beans.xml" />
</beans>

So I need to understand how to write some XML files (including
crawler-beans.xml, action-beans.xml, etc.), which are imported into the
file crawler-config.xml.
Could you share your experience with me?
Thanks,
Yunhee


Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by Sheryl John <sh...@gmail.com>.
Hi Yunhee,

What are the error messages you get while running the crawler?

I faced similar issues with the crawler when I first tried it out, too.
I went through the crawler user guide to understand the architecture, but
only understood how it worked after running the crawler several times
to ingest files.
I agree we need to update the guide. If you want to know about the
MetExtractorProductCrawler and AutoDetectProductCrawler, the wiki page that
I mentioned before will give you an idea of how to get them working (it
mentions the config files that you need to write for those two crawlers).



On Thu, Aug 9, 2012 at 6:27 AM, YunHee Kang <yu...@gmail.com> wrote:

> Hi Chris,
>
> I got a bunch of error messages when running the crawler_launcher script.
> First off, I think I need to understand how a crawler works.
> Can I get some materials to help me write configuration files for
> crawler_launcher?
>
> Honestly I am not familiar with Crawler.
> But I will try to file a JIRA issue to update the Crawler user guide.
>
> Thanks,
> Yunhee
>
>
>



-- 
-Sheryl

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by YunHee Kang <yu...@gmail.com>.
Hi Chris,

I got a bunch of error messages when running the crawler_launcher script.
First off, I think I need to understand how a crawler works.
Can I get some materials to help me write configuration files for
crawler_launcher?

Honestly, I am not familiar with the Crawler.
But I will try to file a JIRA issue to update the Crawler user guide.

Thanks,
Yunhee



2012/8/9 Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>:
> Hi YunHee,
>
> Sorry, we need to update the docs, that is for sure. Can you help
> us remember by filing a JIRA issue to update the Crawler user
> guide and to fix the URL there?
>
> As for crawlerId, yes it's obsolete, you can find the modern
> 0.4 and 0.5-trunk options by running ./crawler_launcher -h
>
> Cheers,
> Chris
>

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by Thomas Bennett <lm...@gmail.com>.
Hey YunHee and Chris,

Sorry, I've been out of the office for the last week and I've not gotten
around to checking my email yet :).

YunHee, apologies for the stale documentation relating to the crawler. I'll
head over there and update the guide to reflect the new interface when I
get some free time.

Thanks for trying it out - I'll post an update once I've fixed the
documentation.

Cheers,
Tom

On 8 August 2012 22:49, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi YunHee,
>
> Sorry, we need to update the docs, that is for sure. Can you help
> us remember by filing a JIRA issue to update the Crawler user
> guide and to fix the URL there?
>
> As for crawlerId, yes it's obsolete, you can find the modern
> 0.4 and 0.5-trunk options by running ./crawler_launcher -h
>
> Cheers,
> Chris
>

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi YunHee,

Sorry, we need to update the docs, that is for sure. Can you help
us remember by filing a JIRA issue to update the Crawler user
guide and to fix the URL there?

As for crawlerId, yes, it's obsolete. You can find the modern
0.4 and 0.5-trunk options by running ./crawler_launcher -h
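
One way to check a single option against that help output is sketched below. This is only a sketch: the helper name help_mentions_option is hypothetical and not part of OODT, and it assumes the help text lists long options in the same --name form used on the command line.

```shell
#!/bin/sh
# Hypothetical helper, not part of OODT: read help text on stdin and
# succeed if it mentions the long option --NAME (plain substring match,
# so "--foo" would also match "--fooBar").
help_mentions_option() {
    grep -q -F -e "--$1"
}

# Intended use, assuming crawler_launcher is in the current directory:
#   ./crawler_launcher -h 2>&1 | help_mentions_option crawlerId \
#       || echo "crawlerId is not a supported option"
```

The helper reads from stdin rather than calling crawler_launcher itself, so the same check can be run against saved help output.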

Cheers,
Chris

On Aug 7, 2012, at 7:03 AM, YunHee Kang wrote:

> Hi Chris and Sheryl,
> 
> I understood  my mistake after modifying a wrong URL with the "/".
> But there is the wrong  URL  that is used  as an option of
> crawler_launcher in the apache oodt
> homepage(http://oodt.apache.org/components/maven/crawler/user/).
> --filemgrUrl http://localhost:9000/ \
> So it made me confused.
> 
> I tried to run the command mentioned below  according to  the home
> page of apache oodt.
> $ ./crawler_launcher --crawlerId MetExtractorProductCrawler
> ERROR: Invalid option: 'crawlerId'
> 
> But the error described above  was occurred.
> Is the option 'crawlerid'  obsolete ?
> 
> Thanks,
> Yunhee
> 
> 
> 2012/8/7 Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>:
>> Perfect, Sheryl, my thoughts exactly.
>> 
>> Cheers,
>> Chris
>> 
>> On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:
>> 
>>> Hi Yunhee,
>>> 
>>> Check out this OODT wiki for crawler :
>>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
>>> 
>>> Did you try giving 'http://localhost:8000' without the "/" in the end?
>>> Also, specify 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
>>> for  'clientTransferer' option.
>>> 
>>> 
>>> On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <yu...@gmail.com> wrote:
>>> 
>>>> Hi Chris,
>>>> 
>>>> I got an error message when I tried to run crawler_launcher by using a
>>>> shell script. The error message may be caused by a  wrong URL of
>>>> filemgr.
>>>> $ ./crawler_launcher.sh
>>>> ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
>>>> allowed for option
>>>> [longOption='filemgrUrl',shortOption='fm',description='File Manager
>>>> URL'] - Allowed values = [http://.*:\d*]
>>>> 
>>>> The following is the shell script that I wrote:
>>>> $ cat crawler_launcher.sh
>>>> #!/bin/sh
>>>> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
>>>> ./crawler_launcher \
>>>>      -op --launchStdCrawler \
>>>>      --productPath $STAGE_AREA\
>>>>      --filemgrUrl http://localhost:8000/\
>>>>      --failureDir /tmp \
>>>>      --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>>>>      --metFileExtension tmp \
>>>>      --clientTransferer
>>>> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
>>>> 
>>>> I am wondering if there is a problem in the URL of the filemgr or elsewhere
>>>> 
>>>> Thanks,
>>>> Yunhee
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> -Sheryl
>> 
>> 
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Chris Mattmann, Ph.D.
>> Senior Computer Scientist
>> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
>> Office: 171-266B, Mailstop: 171-246
>> Email: chris.a.mattmann@nasa.gov
>> WWW:   http://sunset.usc.edu/~mattmann/
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> Adjunct Assistant Professor, Computer Science Department
>> University of Southern California, Los Angeles, CA 90089 USA
>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> 


++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by YunHee Kang <yu...@gmail.com>.
Hi Chris and Sheryl,

I understood my mistake after removing the trailing "/" from the URL.
However, a URL with a trailing "/" is used as an option of
crawler_launcher on the Apache OODT
homepage (http://oodt.apache.org/components/maven/crawler/user/):
 --filemgrUrl http://localhost:9000/ \
That confused me.

I then tried to run the command below, following the Apache OODT home
page, and got this error:
$ ./crawler_launcher --crawlerId MetExtractorProductCrawler
ERROR: Invalid option: 'crawlerId'

Is the option 'crawlerId' obsolete?

Thanks,
Yunhee


2012/8/7 Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>:
> Perfect, Sheryl, my thoughts exactly.
>
> Cheers,
> Chris
>

Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Perfect, Sheryl, my thoughts exactly.

Cheers,
Chris

On Aug 6, 2012, at 10:01 AM, Sheryl John wrote:

> Hi Yunhee,
> 
> Check out this OODT wiki for crawler :
> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
> 
> Did you try giving 'http://localhost:8000' without the "/" in the end?
> Also, specify 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
> for  'clientTransferer' option.
> 
> 




Re: Problem happened when I tried to run the script "crawler_launcher"

Posted by Sheryl John <sh...@gmail.com>.
Hi Yunhee,

Check out this OODT wiki page for the crawler:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help

Did you try giving 'http://localhost:8000' without the "/" at the end?
Also, specify 'org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory'
for the 'clientTransferer' option.
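
A minimal sketch of the URL check follows, assuming the validator applies the pattern quoted in the error message (http://.*:\d*) to the whole value. The helper names strip_trailing_slash and is_allowed_filemgr_url are hypothetical and not part of OODT, and the pattern is rewritten as a POSIX extended regular expression for grep.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OODT.

# Drop any trailing "/" characters from a URL.
strip_trailing_slash() {
    url=$1
    while [ "${url%/}" != "$url" ]; do
        url=${url%/}
    done
    printf '%s\n' "$url"
}

# Succeed only if the whole value matches http://.*:\d* (the pattern
# quoted in the validation error), written here as a POSIX ERE.
is_allowed_filemgr_url() {
    printf '%s\n' "$1" | grep -Eq '^http://.*:[0-9]*$'
}

FILEMGR_URL=$(strip_trailing_slash "http://localhost:8000/")
if is_allowed_filemgr_url "$FILEMGR_URL"; then
    echo "ok: $FILEMGR_URL"
else
    echo "rejected: $FILEMGR_URL" >&2
fi
```

The cleaned value in FILEMGR_URL could then be passed to --filemgrUrl in the launcher script. The trailing "/" fails the pattern because nothing but digits may follow the last ":".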


On Mon, Aug 6, 2012 at 9:46 AM, YunHee Kang <yu...@gmail.com> wrote:

> Hi Chris,
>
> I got an error message when I tried to run crawler_launcher by using a
> shell script. The error message may be caused by a  wrong URL of
> filemgr.
>  $ ./crawler_launcher.sh
> ERROR: Validation Failures: - Value 'http://localhost:8000/' is not
> allowed for option
> [longOption='filemgrUrl',shortOption='fm',description='File Manager
> URL'] - Allowed values = [http://.*:\d*]
>
> The following is the shell script that I wrote:
> $ cat crawler_launcher.sh
> #!/bin/sh
> export STAGE_AREA=/home/yhkang/oodt-0.5/cas-pushpull/staging/TESL2CO2
> ./crawler_launcher \
>        -op --launchStdCrawler \
>        --productPath $STAGE_AREA\
>        --filemgrUrl http://localhost:8000/\
>        --failureDir /tmp \
>        --actionIds DeleteDataFile MoveDataFileToFailureDir Unique \
>        --metFileExtension tmp \
>        --clientTransferer
> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferer
>
> I am wondering if there is a problem in the URL of the filemgr or elsewhere
>
> Thanks,
> Yunhee
>



-- 
-Sheryl