Posted to dev@oodt.apache.org by Mengying Wang <wa...@usc.edu> on 2014/09/19 21:46:26 UTC

Where is the TikaCmdLineMetExtractor?

Dear everyone,

I am trying to integrate the Apache OODT Crawler with Apache Tika.
According to the Apache OODT Crawler Help
(https://cwiki.apache.org/confluence/display/OODT/OODT+Crawler+Help), I
can use the TikaCmdLineMetExtractor directly. However, when I run the
following command:

./crawler_launcher \
  --filemgrUrl http://localhost:9000 \
  --operation --launchMetCrawler \
  --clientTransferer org.apache.oodt.cas.filemgr.datatransfer.LocalDataTransferFactory \
  --productPath /usr/local/meerkat/data/staging/products/hdf5 \
  --metExtractor org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor \
  --metExtractorConfig /usr/local/meerkat/extractors/tikaextractor/tikaextractor.config

It raises an error:
ERROR: Validation Failures: - Value
'org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor' for
option metExtractor is not a valid class

I am wondering whether this is a software version problem. Which
versions of the Apache OODT Crawler and Apache Tika should I use to run
such a command? Currently, I am using Apache OODT Cas-Crawler-0.6 and
Apache Tika-1.6.

Thank you very much for your time and help!

Best,
Angela Wang

Re: Where is the TikaCmdLineMetExtractor?

Posted by MengYing Wang <me...@gmail.com>.
Dear Prof. Mattmann,

Yes, the TikaCmdLineMetExtractor is available in Apache OODT
Cas-Crawler-0.7, and it works very well. Thank you!

Best,
Angela Wang

On Sun, Sep 21, 2014 at 7:46 PM, Mattmann, Chris A (3980) <
chris.a.mattmann@jpl.nasa.gov> wrote:

> Hi Angela,
>
> The TikaExtractor is available here:
>
> svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/extractors/
>
> Looking at CHANGES.txt, it looks like this only showed up in 0.7, so
> you'll need to upgrade to 0.7 (just released).
>
> Cheers,
> Chris
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Chief Architect
> Instrument Software and Science Data Systems Section (398)
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 168-519, Mailstop: 168-527
> Email: chris.a.mattmann@nasa.gov
> WWW:  http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Associate Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++


-- 
Best,
Mengying (Angela) Wang

Re: Where is the TikaCmdLineMetExtractor?

Posted by "Mattmann, Chris A (3980)" <ch...@jpl.nasa.gov>.
Hi Angela,

The TikaExtractor is available here:

svn.apache.org/repos/asf/oodt/trunk/metadata/src/main/java/org/apache/oodt/cas/metadata/extractors/

Looking at CHANGES.txt, it looks like this only showed up in 0.7, so
you'll need to upgrade to 0.7 (just released).

Cheers,
Chris

++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattmann@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
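
A quick way to confirm this kind of version mismatch is to check
whether the extractor class is present on the classpath at all. The
sketch below is only an illustration and is not part of OODT: the class
name ClassCheck and the method isLoadable are hypothetical, and it
assumes that the crawler's "is not a valid class" validation failure
means the class could not be loaded (the thread does not say how that
check is implemented). It uses only the standard Class.forName call.

```java
public class ClassCheck {

    /**
     * Returns true if the named class can be located by this class's
     * class loader, false otherwise. Hypothetical helper, not an OODT API.
     */
    public static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its
            // static initializers.
            Class.forName(className, false, ClassCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is missing, or one of its dependencies is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // Default to the extractor class named in this thread.
        String name = args.length > 0
                ? args[0]
                : "org.apache.oodt.cas.metadata.extractors.TikaCmdLineMetExtractor";
        System.out.println(name
                + (isLoadable(name) ? " found" : " NOT found on classpath"));
    }
}
```

To be meaningful, this would need to run with the same classpath the
crawler uses (for example, the jars shipped with the installed crawler
release). With a release that predates the extractor, it should report
the class as not found.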





