Posted to user@oodt.apache.org by Thomas Bennett <tb...@ska.ac.za> on 2011/06/01 21:54:59 UTC
cas-crawler does not pass preconditions
Hi,
I've successfully got the CmdLineIngester working with an ExternMetExtractor
(written in Python).
However, when I launch the crawler I get a warning telling me that the
preconditions for ingest have not been met. No .met file has been
created.
Two questions:
1) Is there any configuration that I'm missing?
2) Where should I start hunting in the code or logs to find out why my met
extractor was not run?
Kind regards,
Thomas
For your reference, here is the command and output.
bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath
/usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl
http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile
MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer
org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
--metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
--metExtractorConfig
/usr/local/meerkat/extractors/katextractor/katextractor.config
http://localhost:9000
StdProductCrawler
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file
/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product:
[/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
Re: cas-crawler does not pass preconditions
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Brian, thanks for this!
Cheers,
Chris
On Jun 1, 2011, at 2:20 PM, holenoter wrote:
> hey thomas,
>
> you are using StdProductCrawler, which assumes a *.met file already exists for each file (it has only one precondition: the existence of the *.met file). if you want a *.met file generated you will have to use one of the other 2 crawlers. running ./crawler_launcher -psc will give you a list of supported crawlers; you can then run ./crawler_launcher -h -cid <crawler_id>, where crawler_id is one of the ids from the previous command. unfortunately i don't think the other crawlers are documented all that extensively. MetExtractorProductCrawler will use a single extractor for all files; AutoDetectProductCrawler requires a mapping file to be filled out and mime-types defined.
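> (For concreteness: the StdProductCrawler precondition is satisfied by a sibling file named <product>.met, e.g. 1263940095.h5.met, in the CAS metadata XML format. A minimal sketch follows; the key and value shown are illustrative, not required names:)

```xml
<!-- hypothetical 1263940095.h5.met; the ProductType key/value here is illustrative -->
<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <keyval>
    <key>ProductType</key>
    <val>GenericFile</val>
  </keyval>
</cas:metadata>
```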
>
> * MetExtractorProductCrawler example configuration can be found in the source:
> - allows you to specify how the crawler will run your extractor
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
>
> * AutoDetectProductCrawler example configuration can be found in the source:
> - uses the same metadata extractor specification file (you will have one of these for each mime-type)
> - allows you to define your mime-types -- that is, give a mime-type for a given filename regular expression
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
>
> - your file might look something like:
>
> <mime-info>
>   <mime-type type="product/hdf5">
>     <glob pattern="*.h5"/>
>   </mime-type>
> </mime-info>
> - maps your mime-types to extractors
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
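> (The general shape of that mapping file, tying the mime-type above to an extractor, is sketched below. The element names here are a guess; treat the linked example in the source tree as authoritative:)

```xml
<!-- hypothetical mime-extractor-map sketch; verify element names against the linked example -->
<cas:mimetypemap xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">
  <mimetype name="product/hdf5">
    <extractor class="org.apache.oodt.cas.metadata.extractors.ExternMetExtractor">
      <config file="/usr/local/meerkat/extractors/katextractor/katextractor.config"/>
    </extractor>
  </mimetype>
</cas:mimetypemap>
```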
>
> Hope this helps . . .
> -brian
>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: cas-crawler does not pass preconditions
Posted by Cameron Goodale <si...@gmail.com>.
Thomas et al.,
The wiki page looks great! Thomas, thank you for giving some concrete
examples with an explanation of the problem you were trying to solve. I
think the community will get a lot of mileage out of the page.
-Cameron
On Mon, Jun 6, 2011 at 5:10 AM, Thomas Bennett <tb...@ska.ac.za> wrote:
> Hi,
>
> Yes, thanks for the help Brian! Sorry for the late reply.
>
> I finally got around to getting my AutoDetectProductCrawler working.
> In response, Chris, I hope you don't mind that I've given some feedback about my
> experiences with the crawler on the wiki page that you created below. I hope
> that's okay. Please feel free to modify/add/revert as you wish.
>
> Cheers,
> Tom
>
>
> On 4 June 2011 07:40, Mattmann, Chris A (388J) <
> chris.a.mattmann@jpl.nasa.gov> wrote:
>
>> Brian, I created a wiki page with your guidance below:
>>
>> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
>>
>> Others can feel free to jump on and contribute.
>>
>> Cheers,
>> Chris
>
>
> --
> Thomas Bennett
>
> SKA South Africa
>
> Office : +2721 506 7341
> Mobile : +2779 523 7105
> Email : tbennett@ska.ac.za
>
>
--
Sent from a Tin Can attached to a String
Re: cas-crawler does not pass preconditions
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Tom,
> I finally got around to getting my AutoDetectProductCrawler working. In response, Chris, I hope you don't mind that I've given some feedback about my experiences with the crawler on the wiki page that you created below. I hope that's okay. Please feel free to modify/add/revert as you wish.
Awesome!
Thanks for the contribution, Tom. Wow, you really rocked that page! Keep 'em comin'!
Cheers,
Chris
Re: cas-crawler does not pass preconditions
Posted by holenoter <ho...@mac.com>.
no problem ... appreciate you adding to the documentation!
-brian
On Jun 06, 2011, at 05:10 AM, Thomas Bennett <tb...@ska.ac.za> wrote:
Hi,
Yes, thanks for the help Brian! Sorry for the late reply.
I finally got around to getting my AutoDetectProductCrawler working. In response, Chris I hope you don't mind I've given some feedback about my experiences with the crawler on the wiki page that you created below. I hope thats okay. Please feel free to modify/add/revert as you wish.
Cheers,
Tom
On 4 June 2011 07:40, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
Brian, I created a wiki page with your guidance below:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
Others can feel free to jump on and contribute.
Cheers,
Chris
On Jun 1, 2011, at 2:20 PM, holenoter wrote:
> hey thomas,
>
> you are using StdProductCrawler which assumes a *.met file already exist for each file (it has only one precondition which is the existing of the *.met file) . . . if you want a *.met file generated you will have to use one of the other 2 crawlers. running: ./crawler_launcher -psc will give you a list of supported crawlers. you can then run: ./crawler_launcher -h -cid <crawler_id> where crawler id is one of the ids from the previous command . . . unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping file to be filled out an mime-types defined
>
> * MetExtractorProductCrawler example configuration can be found in the source:
> - allows you to specify how the crawler will run your extractor
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
>
> * AutoDetectProductCrawler example configuration can be found in the source:
> - uses the same metadata extractor specification file (you will have one of these for each mime-type)
> - allows you to define your mime-types -- that is, give a mime-type for a given filename regular expression
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
>
> - your file might look something like:
>
> <mime-info>
>
>
>
> <mime-type type="product/hdf5">
>
>
> <glob pattern="*.h5"/>
>
>
> </mime-type>
>
>
> </mime-info>
> - maps your mime-types to extractors
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
>
> Hope this helps . . .
> -brian
>
> On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote:
>
>> Hi,
>>
>> I've successfully got the CmdLineIngester working with an ExternMetExtractor (written in python):
>>
>> However, when I try launch the crawler I get a warning telling me the the preconditions for ingest have not been met. No .met file has been created.
>>
>> Two questions:
>> 1) I'm just wondering if there is any configuration that I'm missing.
>> 2) Where should I start hunting in the code or logs to find out why my met extractor was not run?
>>
>> Kind regards,
>> Thomas
>>
>> For your reference, here is the command and output.
>>
>> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
>> http://localhost:9000
>> StdProductCrawler
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawlProductCrawler crawl
>> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cascrawl.ProductCrawler handleFile
>> INFO: Handling file /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> WARNING: Failed to pass preconditions for ingest of product: [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
>>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--
Thomas Bennett
SKA South Africa
Office : +2721 506 7341
Mobile : +2779 523 7105
Email : tbennett@ska.ac.za
Re: cas-crawler does not pass preconditions
Posted by holenoter <ho...@mac.com>.
no problem ... appreciate you adding to the documentation!
-brian
On Jun 06, 2011, at 05:10 AM, Thomas Bennett <tb...@ska.ac.za> wrote:
Hi,
Yes, thanks for the help Brian! Sorry for the late reply.
I finally got around to getting my AutoDetectProductCrawler working. In response, Chris, I hope you don't mind that I've given some feedback about my experiences with the crawler on the wiki page that you created below. I hope that's okay. Please feel free to modify/add/revert as you wish.
Cheers,
Tom
On 4 June 2011 07:40, Mattmann, Chris A (388J) <ch...@jpl.nasa.gov> wrote:
Brian, I created a wiki page with your guidance below:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
Others can feel free to jump on and contribute.
Cheers,
Chris
On Jun 1, 2011, at 2:20 PM, holenoter wrote:
> hey thomas,
>
> you are using StdProductCrawler which assumes a *.met file already exists for each file (it has only one precondition, which is the existence of the *.met file) . . . if you want a *.met file generated you will have to use one of the other 2 crawlers. running: ./crawler_launcher -psc will give you a list of supported crawlers. you can then run: ./crawler_launcher -h -cid <crawler_id> where crawler id is one of the ids from the previous command . . . unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping file to be filled out and mime-types defined
>
> * MetExtractorProductCrawler example configuration can be found in the source:
> - allows you to specify how the crawler will run your extractor
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
>
> * AutoDetectProductCrawler example configuration can be found in the source:
> - uses the same metadata extractor specification file (you will have one of these for each mime-type)
> - allows you to define your mime-types -- that is, give a mime-type for a given filename regular expression
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
>
> - your file might look something like:
>
> <mime-info>
>   <mime-type type="product/hdf5">
>     <glob pattern="*.h5"/>
>   </mime-type>
> </mime-info>
> - maps your mime-types to extractors
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
>
> Hope this helps . . .
> -brian
>
> On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote:
>
>> Hi,
>>
>> I've successfully got the CmdLineIngester working with an ExternMetExtractor (written in python):
>>
>> However, when I try to launch the crawler I get a warning telling me that the preconditions for ingest have not been met. No .met file has been created.
>>
>> Two questions:
>> 1) I'm just wondering if there is any configuration that I'm missing.
>> 2) Where should I start hunting in the code or logs to find out why my met extractor was not run?
>>
>> Kind regards,
>> Thomas
>>
>> For your reference, here is the command and output.
>>
>> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
>> http://localhost:9000
>> StdProductCrawler
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> INFO: Handling file /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> WARNING: Failed to pass preconditions for ingest of product: [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
>>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
--
Thomas Bennett
SKA South Africa
Office : +2721 506 7341
Mobile : +2779 523 7105
Email : tbennett@ska.ac.za
Re: cas-crawler does not pass preconditions
Posted by Thomas Bennett <tb...@ska.ac.za>.
Hi,
Yes, thanks for the help Brian! Sorry for the late reply.
I finally got around to getting my AutoDetectProductCrawler working.
In response, Chris, I hope you don't mind that I've given some feedback about my
experiences with the crawler on the wiki page that you created below. I hope
that's okay. Please feel free to modify/add/revert as you wish.
Cheers,
Tom
On 4 June 2011 07:40, Mattmann, Chris A (388J) <
chris.a.mattmann@jpl.nasa.gov> wrote:
> Brian, I created a wiki page with your guidance below:
>
> https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
>
> Others can feel free to jump on and contribute.
>
> Cheers,
> Chris
>
> On Jun 1, 2011, at 2:20 PM, holenoter wrote:
>
> > hey thomas,
> >
> > you are using StdProductCrawler which assumes a *.met file already exists
> for each file (it has only one precondition, which is the existence of the
> *.met file) . . . if you want a *.met file generated you will have to use
> one of the other 2 crawlers. running: ./crawler_launcher -psc will give you
> a list of supported crawlers. you can then run: ./crawler_launcher -h -cid
> <crawler_id> where crawler id is one of the ids from the previous command .
> . . unfortunately i don't think the other crawlers are documented all that
> extensively . . . MetExtractorProductCrawler will use a single extractor for
> all files . . . AutoDetectProductCrawler requires a mapping file to be
> filled out and mime-types defined
> >
> > * MetExtractorProductCrawler example configuration can be found in the
> source:
> > - allows you to specify how the crawler will run your extractor
> >
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
> >
> > * AutoDetectProductCrawler example configuration can be found in the
> source:
> > - uses the same metadata extractor specification file (you will have one
> of these for each mime-type)
> > - allows you to define your mime-types -- that is, give a mime-type for
> a given filename regular expression
> >
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
> >
> > - your file might look something like:
> >
> > <mime-info>
> >   <mime-type type="product/hdf5">
> >     <glob pattern="*.h5"/>
> >   </mime-type>
> > </mime-info>
> > - maps your mime-types to extractors
> >
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
> >
> > Hope this helps . . .
> > -brian
> >
> > On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote:
> >
> >> Hi,
> >>
> >> I've successfully got the CmdLineIngester working with an
> ExternMetExtractor (written in python):
> >>
> >> However, when I try to launch the crawler I get a warning telling me that
> the preconditions for ingest have not been met. No .met file has been
> created.
> >>
> >> Two questions:
> >> 1) I'm just wondering if there is any configuration that I'm missing.
> >> 2) Where should I start hunting in the code or logs to find out why my
> met extractor was not run?
> >>
> >> Kind regards,
> >> Thomas
> >>
> >> For your reference, here is the command and output.
> >>
> >> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath
> /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl
> http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile
> MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer
> org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory
> --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor
> --metExtractorConfig
> /usr/local/meerkat/extractors/katextractor/katextractor.config
> >> http://localhost:9000
> >> StdProductCrawler
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
> >> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> >> INFO: Handling file
> /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
> >> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler
> handleFile
> >> WARNING: Failed to pass preconditions for ingest of product:
> [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
> >>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: chris.a.mattmann@nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
>
--
Thomas Bennett
SKA South Africa
Office : +2721 506 7341
Mobile : +2779 523 7105
Email : tbennett@ska.ac.za
Re: cas-crawler does not pass preconditions
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Brian, I created a wiki page with your guidance below:
https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help
Others can feel free to jump on and contribute.
Cheers,
Chris
On Jun 1, 2011, at 2:20 PM, holenoter wrote:
> hey thomas,
>
> you are using StdProductCrawler which assumes a *.met file already exists for each file (it has only one precondition, which is the existence of the *.met file) . . . if you want a *.met file generated you will have to use one of the other 2 crawlers. running: ./crawler_launcher -psc will give you a list of supported crawlers. you can then run: ./crawler_launcher -h -cid <crawler_id> where crawler id is one of the ids from the previous command . . . unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping file to be filled out and mime-types defined
>
> * MetExtractorProductCrawler example configuration can be found in the source:
> - allows you to specify how the crawler will run your extractor
> https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
>
> * AutoDetectProductCrawler example configuration can be found in the source:
> - uses the same metadata extractor specification file (you will have one of these for each mime-type)
> - allows you to define your mime-types -- that is, give a mime-type for a given filename regular expression
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
>
> - your file might look something like:
>
> <mime-info>
>   <mime-type type="product/hdf5">
>     <glob pattern="*.h5"/>
>   </mime-type>
> </mime-info>
> - maps your mime-types to extractors
> https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
>
> Hope this helps . . .
> -brian
>
> On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote:
>
>> Hi,
>>
>> I've successfully got the CmdLineIngester working with an ExternMetExtractor (written in python):
>>
>> However, when I try to launch the crawler I get a warning telling me that the preconditions for ingest have not been met. No .met file has been created.
>>
>> Two questions:
>> 1) I'm just wondering if there is any configuration that I'm missing.
>> 2) Where should I start hunting in the code or logs to find out why my met extractor was not run?
>>
>> Kind regards,
>> Thomas
>>
>> For your reference, here is the command and output.
>>
>> bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
>> http://localhost:9000
>> StdProductCrawler
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
>> INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> INFO: Handling file /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
>> Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
>> WARNING: Failed to pass preconditions for ingest of product: [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
>>
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: chris.a.mattmann@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Re: cas-crawler does not pass preconditions
Posted by holenoter <ho...@mac.com>.
hey thomas,
you are using StdProductCrawler which assumes a *.met file already exists for each file (it has only one precondition, which is the existence of the *.met file) . . . if you want a *.met file generated you will have to use one of the other 2 crawlers. running: ./crawler_launcher -psc will give you a list of supported crawlers. you can then run: ./crawler_launcher -h -cid <crawler_id> where crawler id is one of the ids from the previous command . . . unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping file to be filled out and mime-types defined
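The single precondition described above can be sketched in Python. This is only an illustration of the rule, not the actual OODT implementation; the function name is made up:

```python
import os

def passes_std_precondition(product_path, met_extension="met"):
    """Mimic StdProductCrawler's one precondition: a sibling
    <product>.<met_extension> file must already exist on disk."""
    return os.path.exists(product_path + "." + met_extension)

# A product such as 1263940095.h5 with no 1263940095.h5.met next to it
# fails this check, which matches the WARNING in the crawler log.
```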
* MetExtractorProductCrawler example configuration can be found in the source:
- allows you to specify how the crawler will run your extractor
https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
* AutoDetectProductCrawler example configuration can be found in the source:
- uses the same metadata extractor specification file (you will have one of these for each mime-type)
- allows you to define your mime-types -- that is, give a mime-type for a given filename regular expression
https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
  - your file might look something like:
<mime-info>
  <mime-type type="product/hdf5">
    <glob pattern="*.h5"/>
  </mime-type>
</mime-info>
- maps your mime-types to extractors
https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
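To sanity-check that a <glob> pattern from mimetypes.xml would actually cover your staged product names, the same shell-style matching can be tried with Python's fnmatch. This is just a stand-in for illustration; the crawler's own mime-type detection performs the real matching:

```python
from fnmatch import fnmatch

# The glob pattern from the mime-type entry above.
pattern = "*.h5"

print(fnmatch("1263940095.h5", pattern))   # -> True  (the staged product)
print(fnmatch("1263940095.met", pattern))  # -> False (the metadata sidecar)
```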
Hope this helps . . .
-brian
On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote:
Hi,
I've successfully got the CmdLineIngester working with an ExternMetExtractor (written in python):
However, when I try to launch the crawler I get a warning telling me that the preconditions for ingest have not been met. No .met file has been created.
Two questions:
1) I'm just wondering if there is any configuration that I'm missing.
2) Where should I start hunting in the code or logs to find out why my met extractor was not run?
Kind regards,
Thomas
For your reference, here is the command and output.
bin$ ./crawler_launcher --crawlerId StdProductCrawler --productPath /usr/local/meerkat/data/staging/products/hdf5 --filemgrUrl http://localhost:9000 --failureDir /tmp --actionIds DeleteDataFile MoveDataFileToFailureDir Unique --metFileExtension met --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory --metExtractor org.apache.oodt.cas.metadata.extractors.ExternMetExtractor --metExtractorConfig /usr/local/meerkat/extractors/katextractor/katextractor.config
http://localhost:9000
StdProductCrawler
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler crawl
INFO: Crawling /usr/local/meerkat/data/staging/products/hdf5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
INFO: Handling file /usr/local/meerkat/data/staging/products/hdf5/1263940095.h5
Jun 1, 2011 9:48:07 PM org.apache.oodt.cas.crawl.ProductCrawler handleFile
WARNING: Failed to pass preconditions for ingest of product: [/usr/local/meerkat/data/staging/products/hdf5/1263940095.h5]
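As a final sketch for anyone staying with StdProductCrawler: the missing sidecar can be produced by any external step before the crawl runs. The element names below follow the CAS metadata XML convention seen in the OODT example files, but verify them against your version; the function name is made up for illustration:

```python
from xml.sax.saxutils import escape

def write_met_file(product_path, metadata, met_extension="met"):
    """Write a sibling .met file so StdProductCrawler's precondition passes.

    metadata maps each key to a list of string values, serialized in the
    assumed CAS layout: cas:metadata > keyval > key/val elements.
    """
    lines = ['<cas:metadata xmlns:cas="http://oodt.jpl.nasa.gov/1.0/cas">']
    for key, values in metadata.items():
        lines.append("  <keyval>")
        lines.append("    <key>%s</key>" % escape(key))
        for val in values:
            lines.append("    <val>%s</val>" % escape(val))
        lines.append("  </keyval>")
    lines.append("</cas:metadata>")
    met_path = product_path + "." + met_extension
    with open(met_path, "w") as f:
        f.write("\n".join(lines))
    return met_path

# e.g. write_met_file("/data/1263940095.h5", {"ProductType": ["GenericFile"]})
# creates /data/1263940095.h5.met next to the product.
```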
Re: cas-crawler does not pass preconditions
Posted by holenoter <ho...@mac.com>.
hey thomas,
you are using StdProductCrawler which assumes a *.met file already exist for each file (it has only one precondition which is the existing of the *.met file) . . . if you want a *.met file generated you will have to use one of the other 2 crawlers. running: ./crawler_launcher -psc will give you a list of supported crawlers. you can then run: ./crawler_launcher -h -cid <crawler_id> where crawler id is one of the ids from the previous command . . . unfortunately i don't think the other crawlers are documented all that extensively . . . MetExtractorProductCrawler will use a single extractor for all files . . . AutoDetectProductCrawler requires a mapping file to be filled out an mime-types defined
* MetExtractorProductCrawler example configuration can be found in the source:
  - allows you to specify how the crawler will run your extractor
  - https://svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/resources/examples/extern-config.xml
* AutoDetectProductCrawler example configuration can be found in the source:
  - uses the same metadata extractor specification file (you will have one of these for each mime-type)
  - allows you to define your mime-types -- that is, assign a mime-type to filenames matching a given glob pattern:
    https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mimetypes.xml
  - your file might look something like:
    <mime-info>
      <mime-type type="product/hdf5">
        <glob pattern="*.h5"/>
      </mime-type>
    </mime-info>
  - maps your mime-types to extractors:
    https://svn.apache.org/repos/asf/oodt/trunk/crawler/src/main/resources/examples/mime-extractor-map.xml
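the glob-to-mime-type mapping in mimetypes.xml can be pictured like this (a toy sketch only -- the real crawler does the detection itself from its mime-type repo, this just shows the idea of matching a filename pattern to a mime-type):

```python
import fnmatch

# Toy version of the mimetypes.xml mapping above: mime-type -> glob patterns.
MIME_GLOBS = {"product/hdf5": ["*.h5"]}


def detect_mime(filename):
    """Return the first mime-type whose glob pattern matches filename, else None."""
    for mime, patterns in MIME_GLOBS.items():
        if any(fnmatch.fnmatch(filename, p) for p in patterns):
            return mime
    return None


print(detect_mime("1263940095.h5"))  # product/hdf5
```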
Hope this helps . . .
-brian
On Jun 01, 2011, at 12:54 PM, Thomas Bennett <tb...@ska.ac.za> wrote: