Posted to user@nutch.apache.org by Markus Jelsma <ma...@buyways.nl> on 2010/06/21 20:04:46 UTC

The parse-tika plug-in in 1.1

Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.
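For readers who hit the same symptom, here is a minimal sketch that lists every plugin id referenced in parse-plugins.xml that has no matching directory under plugins/. It assumes a POSIX shell, a grep that supports -o (GNU and BSD grep both do), and entries written as <plugin id="..." /> elements. The function name missing_parse_plugins and its default paths are hypothetical, not part of Nutch. As the rest of this thread shows, an id reported here may still be packaged inside the .job archive, so a hit is a prompt to look further, not proof that the plugin is absent.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): print plugin ids that are
# referenced in parse-plugins.xml but have no directory under plugins/.
# Assumes <plugin id="..."> elements and an unpacked binary distribution.
missing_parse_plugins() {
  conf=${1:-conf/parse-plugins.xml}   # assumed default location
  plugins_dir=${2:-plugins}           # assumed default plugin folder
  # Extract each id="..." value from <plugin ...> elements, de-duplicated.
  grep -o '<plugin id="[^"]*"' "$conf" \
    | sed 's/.*id="\([^"]*\)"/\1/' \
    | sort -u \
    | while read -r id; do
        # Report ids with no matching plugin directory.
        [ -d "$plugins_dir/$id" ] || echo "$id"
      done
}
```

Run from the top of an unpacked distribution as missing_parse_plugins, or pass both paths explicitly.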

 

 

Re: The parse-tika plug-in in 1.1

Posted by Alex McLintock <al...@gmail.com>.
On 22 June 2010 11:02, Markus Jelsma <ma...@buyways.nl> wrote:
> it's not documented and newcomers would probably not recognize the need to
> compile it before they can use Tika.

I've been trying to update the Wiki with correct and improved
documentation, but if the documentation that comes with Nutch is wrong,
then please submit a bug report, or a patch, or just tell me so I can
try to do the same :-)

I don't like the fact that we (collectively) improve the code, but
leave the documentation in an out-of-date state.

Alex

Re: The parse-tika plug-in in 1.1

Posted by Markus Jelsma <ma...@buyways.nl>.
Well, it's not that big a problem, of course, just another step before it's
ready for testing.

But I'm wondering, what would be a good reason not to ship the package as a jar
as well? I'd bet this is not going to be the first mail on this issue: it's
not documented, and newcomers would probably not recognize the need to
compile it before they can use Tika.


On Tuesday 22 June 2010 11:48:40 Alex McLintock wrote:
> Hi Markus,
> 
> >  The jars and wars are/were always just there and i can/could use them
> > instantly.
> 
> Sounds like we need to improve some documentation :-)
> 
> 
> I believe the package went to "source only" in the previous (1.0)
> version - so Chris is just following the current "best practice" by
> not creating all the jars in 1.1.
> 
> 
> I wasn't around for that decision but don't find it too onerous myself
> to run ant. Is it a problem for you?
> 
> Alex
> 

Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350


Re: The parse-tika plug-in in 1.1

Posted by Markus Jelsma <ma...@buyways.nl>.
Ah, I understand now! I confused the slightly older nightly build with the new
release. And to make matters worse, I've also got an even older nightly build.
One of them does not include the parse-tika plugin, for some reason...

Anyway, this solves the case ;)


On Tuesday 22 June 2010 15:22:56 Mattmann, Chris A (388J) wrote:
> Hey Alex, and Markus,
> 
> In fact, it was my preference to go to source only, but I actually included
> both source and binary in the 1.1 release to please the people who were
> used to getting binary distributions :) However, in the future, I would
> prefer and intend to pursue only doing source releases for Nutch. It's a
> community though so I'll have to convince my committer compatriots :)
>
> Regardless, parse-tika is in fact included in the Nutch 1.1 binary
> distribution:
> 
> wget "http://mirror.cloudera.com/apache/nutch/apache-nutch-1.1-bin.tar.gz"
> tar xvzf apache-nutch-1.1-bin.tar.gz
> cd apache-nutch-1.1-bin
> unzip -l nutch-1.1.job | grep parse-tika
> 
> shows:
> 
>         0  06-07-10 05:28   plugins/parse-tika/
>     43033  06-07-10 05:28   plugins/parse-tika/asm-3.1.jar
>    189233  06-07-10 05:28   plugins/parse-tika/bcmail-jdk14-136.jar
>    229116  06-07-10 05:28   plugins/parse-tika/bcmail-jdk15-1.45.jar
>   1401560  06-07-10 05:28   plugins/parse-tika/bcprov-jdk14-136.jar
>   1663318  06-07-10 05:28   plugins/parse-tika/bcprov-jdk15-1.45.jar
>    143847  06-07-10 05:28   plugins/parse-tika/commons-compress-1.0.jar
>     60686  06-07-10 05:28   plugins/parse-tika/commons-logging-1.1.1.jar
>    313898  06-07-10 05:28   plugins/parse-tika/dom4j-1.6.1.jar
>    153220  06-07-10 05:28   plugins/parse-tika/fontbox-1.1.0.jar
>     28804  06-07-10 05:28   plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
>     51211  06-07-10 05:28   plugins/parse-tika/jempbox-1.1.0.jar
>     90929  06-07-10 05:28   plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
>     21227  06-07-10 05:28   plugins/parse-tika/parse-tika.jar
>   4709746  06-07-10 05:28   plugins/parse-tika/pdfbox-1.1.0.jar
>      2439  04-06-10 11:38   plugins/parse-tika/plugin.xml
>   1539291  06-07-10 05:28   plugins/parse-tika/poi-3.6.jar
>    412783  06-07-10 05:28   plugins/parse-tika/poi-ooxml-3.6.jar
>   3774332  06-07-10 05:28   plugins/parse-tika/poi-ooxml-schemas-3.6.jar
>    795888  06-07-10 05:28   plugins/parse-tika/poi-scratchpad-3.6.jar
>     90023  06-07-10 05:28   plugins/parse-tika/tagsoup-1.2.jar
>    215263  06-07-10 05:28   plugins/parse-tika/tika-parsers-0.7.jar
>    109318  06-07-10 05:28   plugins/parse-tika/xml-apis-1.0.b2.jar
>   2666695  06-07-10 05:28   plugins/parse-tika/xmlbeans-2.3.0.jar
> 
> So, I'm not sure what you are seeing?
> 
> Cheers,
> Chris
> 
> 



Re: The parse-tika plug-in in 1.1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Alex, and Markus,

In fact, it was my preference to go to source only, but I actually included both source and binary in the 1.1 release to please the people who were used to getting binary distributions :) However, in the future, I would prefer and intend to pursue only doing source releases for Nutch. It's a community though so I'll have to convince my committer compatriots :)

Regardless, parse-tika is in fact included in the Nutch 1.1 binary distribution:

wget "http://mirror.cloudera.com/apache/nutch/apache-nutch-1.1-bin.tar.gz"
tar xvzf apache-nutch-1.1-bin.tar.gz
cd apache-nutch-1.1-bin
unzip -l nutch-1.1.job | grep parse-tika

shows:

        0  06-07-10 05:28   plugins/parse-tika/
    43033  06-07-10 05:28   plugins/parse-tika/asm-3.1.jar
   189233  06-07-10 05:28   plugins/parse-tika/bcmail-jdk14-136.jar
   229116  06-07-10 05:28   plugins/parse-tika/bcmail-jdk15-1.45.jar
  1401560  06-07-10 05:28   plugins/parse-tika/bcprov-jdk14-136.jar
  1663318  06-07-10 05:28   plugins/parse-tika/bcprov-jdk15-1.45.jar
   143847  06-07-10 05:28   plugins/parse-tika/commons-compress-1.0.jar
    60686  06-07-10 05:28   plugins/parse-tika/commons-logging-1.1.1.jar
   313898  06-07-10 05:28   plugins/parse-tika/dom4j-1.6.1.jar
   153220  06-07-10 05:28   plugins/parse-tika/fontbox-1.1.0.jar
    28804  06-07-10 05:28   plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
    51211  06-07-10 05:28   plugins/parse-tika/jempbox-1.1.0.jar
    90929  06-07-10 05:28   plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
    21227  06-07-10 05:28   plugins/parse-tika/parse-tika.jar
  4709746  06-07-10 05:28   plugins/parse-tika/pdfbox-1.1.0.jar
     2439  04-06-10 11:38   plugins/parse-tika/plugin.xml
  1539291  06-07-10 05:28   plugins/parse-tika/poi-3.6.jar
   412783  06-07-10 05:28   plugins/parse-tika/poi-ooxml-3.6.jar
  3774332  06-07-10 05:28   plugins/parse-tika/poi-ooxml-schemas-3.6.jar
   795888  06-07-10 05:28   plugins/parse-tika/poi-scratchpad-3.6.jar
    90023  06-07-10 05:28   plugins/parse-tika/tagsoup-1.2.jar
   215263  06-07-10 05:28   plugins/parse-tika/tika-parsers-0.7.jar
   109318  06-07-10 05:28   plugins/parse-tika/xml-apis-1.0.b2.jar
  2666695  06-07-10 05:28   plugins/parse-tika/xmlbeans-2.3.0.jar

So, I'm not sure what you are seeing?

Cheers,
Chris



On 6/22/10 2:48 AM, "Alex McLintock" <al...@gmail.com> wrote:

Hi Markus,

>  The jars and wars are/were always just there and i can/could use them instantly.

Sounds like we need to improve some documentation :-)


I believe the package went to "source only" in the previous (1.0)
version - so Chris is just following the current "best practice" by
not creating all the jars in 1.1.


I wasn't around for that decision but don't find it too onerous myself
to run ant. Is it a problem for you?

Alex



++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Chris Mattmann, Ph.D.
Senior Computer Scientist
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 171-266B, Mailstop: 171-246
Email: Chris.Mattmann@jpl.nasa.gov
WWW:   http://sunset.usc.edu/~mattmann/
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Adjunct Assistant Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
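Chris's unzip check can be generalized into a small helper that looks in both places a Nutch 1.x binary distribution might carry a plugin: an unpacked plugins/<name>/ directory, or a plugins/<name>/ entry inside a *.job archive, as in the listing above. This is a sketch under those assumptions. has_plugin is a hypothetical name, not a Nutch command, and unzip is only invoked when a .job file is actually present.

```shell
#!/bin/sh
# Hypothetical helper (not part of Nutch): report where a plugin is found
# in an unpacked binary distribution. Prints "directory", "job:<file>",
# or "missing"; returns non-zero only when the plugin is not found.
has_plugin() {
  dist_dir=$1   # e.g. apache-nutch-1.1-bin (assumed layout)
  name=$2       # e.g. parse-tika
  # 1) Unpacked plugin folder with its descriptor.
  if [ -f "$dist_dir/plugins/$name/plugin.xml" ]; then
    echo "directory"
    return 0
  fi
  # 2) Plugin packaged inside a .job archive (a zip file).
  for job in "$dist_dir"/*.job; do
    [ -f "$job" ] || continue   # glob matched nothing
    if unzip -l "$job" | grep -q "plugins/$name/plugin.xml"; then
      echo "job:$(basename "$job")"
      return 0
    fi
  done
  echo "missing"
  return 1
}
```

For example, has_plugin apache-nutch-1.1-bin parse-tika distinguishes "not in plugins/ but packaged in the job file" from "not shipped at all", which is the confusion this thread ran into.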


Re: Re: The parse-tika plug-in in 1.1

Posted by Alex McLintock <al...@gmail.com>.
Hi Markus,

>  The jars and wars are/were always just there and i can/could use them instantly.

Sounds like we need to improve some documentation :-)


I believe the package went to "source only" in the previous (1.0)
version - so Chris is just following the current "best practice" by
not creating all the jars in 1.1.


I wasn't around for that decision but don't find it too onerous myself
to run ant. Is it a problem for you?

Alex

RE: Re: The parse-tika plug-in in 1.1

Posted by Markus Jelsma <ma...@buyways.nl>.
Hmmm, I'm not building from source. I just download the package and get going! The jars and wars are/were always just there and I can/could use them instantly.

Maybe the compiled jar is just not included?

-----Original message-----
From: Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>
Sent: Mon 21-06-2010 20:07
To: user@nutch.apache.org; 
Subject: Re: The parse-tika plug-in in 1.1

Hi Markus,

Hmmm: I see it here?

http://svn.apache.org/repos/asf/nutch/tags/relase-1.1/src/plugin/parse-tika/

Where are you not seeing it?

Cheers,
Chris





Re: The parse-tika plug-in in 1.1

Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Markus,

Hmmm: I see it here?

http://svn.apache.org/repos/asf/nutch/tags/relase-1.1/src/plugin/parse-tika/

Where are you not seeing it?

Cheers,
Chris



On 6/21/10 11:04 AM, "Markus Jelsma" <ma...@buyways.nl> wrote:

Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.






