Posted to user@nutch.apache.org by Markus Jelsma <ma...@buyways.nl> on 2010/06/21 20:04:46 UTC
The parse-tika plug-in in 1.1
Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.
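A quick way to confirm what is described here is to test for the plugin's directory on disk. The following is only a sketch: `plugin_present` is a hypothetical helper name, and it assumes an unpacked layout of `plugins/<plugin-id>/plugin.xml`, which matches the archive listing shown later in this thread but may differ in other builds.

```shell
# Hypothetical helper (not part of Nutch): report whether a plugin is
# present under a plugins/ directory.
# Assumption: each plugin lives in <plugins_dir>/<plugin_id>/plugin.xml.
plugin_present() {
    plugins_dir=$1
    plugin_id=$2
    if [ -f "$plugins_dir/$plugin_id/plugin.xml" ]; then
        echo "present: $plugin_id"
    else
        echo "missing: $plugin_id"
        return 1
    fi
}
```

It could be called as `plugin_present plugins parse-tika` from the top of the unpacked distribution.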
Re: The parse-tika plug-in in 1.1
Posted by Alex McLintock <al...@gmail.com>.
On 22 June 2010 11:02, Markus Jelsma <ma...@buyways.nl> wrote:
> it's not documented and newcomers would probably not recognize the need to
> compile it before they can use Tika.
I've been trying to update the Wiki with correct and improved
documentation but if the documentation which comes with Nutch is wrong
then please submit a bug report, or a patch, or just tell me so I can
try to do the same :-)
I don't like the fact that we (collectively) improve the code, but
leave the documentation in an out of date state.
Alex
Re: The parse-tika plug-in in 1.1
Posted by Markus Jelsma <ma...@buyways.nl>.
Well, it's not that big a problem of course, just another step before it's
ready for testing.
But I'm wondering, what would be a good reason not to ship the package as a jar
as well? I'd bet this is not going to be the last mail on this issue: it's
not documented and newcomers would probably not recognize the need to
compile it before they can use Tika.
On Tuesday 22 June 2010 11:48:40 Alex McLintock wrote:
> Hi Markus,
>
> > The jars and wars are/were always just there and i can/could use them
> > instantly.
>
> Sounds like we need to improve some documentation :-)
>
>
> I believe the package went to "source only" in the previous (1.0)
> version - so Chris is just following the current "best practice" by
> not creating all the jars in 1.1.
>
>
> I wasn't around for that decision but don't find it too onerous myself
> to run ant. Is it a problem for you?
>
> Alex
>
Markus Jelsma - Technisch Architect - Buyways BV
http://www.linkedin.com/in/markus17
050-8536620 / 06-50258350
Re: The parse-tika plug-in in 1.1
Posted by Markus Jelsma <ma...@buyways.nl>.
Ah, I understand now! I confused the slightly older nightly build with the new
release. And to make matters worse, I've also got an even older nightly build.
One of them does not include the parse-tika plugin, for some reason...
Anyway, this solves the case ;)
On Tuesday 22 June 2010 15:22:56 Mattmann, Chris A (388J) wrote:
> Hey Alex, and Markus,
>
> In fact, it was my preference to go to source only, but I actually included
> both source and binary in the 1.1 release to please the people who were
> used to getting binary distributions :) However, in the future, I would
> prefer and intend to pursue only doing source releases for Nutch. It's a
> community though so I'll have to convince my committer compatriots :)
>
> Regardless, parse-tika is in fact included in the Nutch 1.1 binary
> distribution:
>
> wget "http://mirror.cloudera.com/apache/nutch/apache-nutch-1.1-bin.tar.gz"
> tar xvzf apache-nutch-1.1-bin.tar.gz
> cd apache-nutch-1.1-bin
> unzip -l nutch-1.1.job | grep parse-tika
>
> shows:
>
> 0 06-07-10 05:28 plugins/parse-tika/
> 43033 06-07-10 05:28 plugins/parse-tika/asm-3.1.jar
> 189233 06-07-10 05:28 plugins/parse-tika/bcmail-jdk14-136.jar
> 229116 06-07-10 05:28 plugins/parse-tika/bcmail-jdk15-1.45.jar
> 1401560 06-07-10 05:28 plugins/parse-tika/bcprov-jdk14-136.jar
> 1663318 06-07-10 05:28 plugins/parse-tika/bcprov-jdk15-1.45.jar
> 143847 06-07-10 05:28 plugins/parse-tika/commons-compress-1.0.jar
> 60686 06-07-10 05:28 plugins/parse-tika/commons-logging-1.1.1.jar
> 313898 06-07-10 05:28 plugins/parse-tika/dom4j-1.6.1.jar
> 153220 06-07-10 05:28 plugins/parse-tika/fontbox-1.1.0.jar
> 28804 06-07-10 05:28 plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
> 51211 06-07-10 05:28 plugins/parse-tika/jempbox-1.1.0.jar
> 90929 06-07-10 05:28 plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
> 21227 06-07-10 05:28 plugins/parse-tika/parse-tika.jar
> 4709746 06-07-10 05:28 plugins/parse-tika/pdfbox-1.1.0.jar
> 2439 04-06-10 11:38 plugins/parse-tika/plugin.xml
> 1539291 06-07-10 05:28 plugins/parse-tika/poi-3.6.jar
> 412783 06-07-10 05:28 plugins/parse-tika/poi-ooxml-3.6.jar
> 3774332 06-07-10 05:28 plugins/parse-tika/poi-ooxml-schemas-3.6.jar
> 795888 06-07-10 05:28 plugins/parse-tika/poi-scratchpad-3.6.jar
> 90023 06-07-10 05:28 plugins/parse-tika/tagsoup-1.2.jar
> 215263 06-07-10 05:28 plugins/parse-tika/tika-parsers-0.7.jar
> 109318 06-07-10 05:28 plugins/parse-tika/xml-apis-1.0.b2.jar
> 2666695 06-07-10 05:28 plugins/parse-tika/xmlbeans-2.3.0.jar
>
> So, I'm not sure what you are seeing?
>
> Cheers,
> Chris
>
>
>
> On 6/22/10 2:48 AM, "Alex McLintock" <al...@gmail.com> wrote:
>
> Hi Markus,
>
> > The jars and wars are/were always just there and i can/could use them
> > instantly.
>
> Sounds like we need to improve some documentation :-)
>
>
> I believe the package went to "source only" in the previous (1.0)
> version - so Chris is just following the current "best practice" by
> not creating all the jars in 1.1.
>
>
> I wasn't around for that decision but don't find it too onerous myself
> to run ant. Is it a problem for you?
>
> Alex
>
>
>
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Chris Mattmann, Ph.D.
> Senior Computer Scientist
> NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
> Office: 171-266B, Mailstop: 171-246
> Email: Chris.Mattmann@jpl.nasa.gov
> WWW: http://sunset.usc.edu/~mattmann/
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
> Adjunct Assistant Professor, Computer Science Department
> University of Southern California, Los Angeles, CA 90089 USA
> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>
Re: The parse-tika plug-in in 1.1
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hey Alex, and Markus,
In fact, it was my preference to go to source only, but I actually included both source and binary in the 1.1 release to please the people who were used to getting binary distributions :) However, in the future, I would prefer and intend to pursue only doing source releases for Nutch. It's a community though so I'll have to convince my committer compatriots :)
Regardless, parse-tika is in fact included in the Nutch 1.1 binary distribution:
wget "http://mirror.cloudera.com/apache/nutch/apache-nutch-1.1-bin.tar.gz"
tar xvzf apache-nutch-1.1-bin.tar.gz
cd apache-nutch-1.1-bin
unzip -l nutch-1.1.job | grep parse-tika
shows:
0 06-07-10 05:28 plugins/parse-tika/
43033 06-07-10 05:28 plugins/parse-tika/asm-3.1.jar
189233 06-07-10 05:28 plugins/parse-tika/bcmail-jdk14-136.jar
229116 06-07-10 05:28 plugins/parse-tika/bcmail-jdk15-1.45.jar
1401560 06-07-10 05:28 plugins/parse-tika/bcprov-jdk14-136.jar
1663318 06-07-10 05:28 plugins/parse-tika/bcprov-jdk15-1.45.jar
143847 06-07-10 05:28 plugins/parse-tika/commons-compress-1.0.jar
60686 06-07-10 05:28 plugins/parse-tika/commons-logging-1.1.1.jar
313898 06-07-10 05:28 plugins/parse-tika/dom4j-1.6.1.jar
153220 06-07-10 05:28 plugins/parse-tika/fontbox-1.1.0.jar
28804 06-07-10 05:28 plugins/parse-tika/geronimo-stax-api_1.0_spec-1.0.1.jar
51211 06-07-10 05:28 plugins/parse-tika/jempbox-1.1.0.jar
90929 06-07-10 05:28 plugins/parse-tika/metadata-extractor-2.4.0-beta-1.jar
21227 06-07-10 05:28 plugins/parse-tika/parse-tika.jar
4709746 06-07-10 05:28 plugins/parse-tika/pdfbox-1.1.0.jar
2439 04-06-10 11:38 plugins/parse-tika/plugin.xml
1539291 06-07-10 05:28 plugins/parse-tika/poi-3.6.jar
412783 06-07-10 05:28 plugins/parse-tika/poi-ooxml-3.6.jar
3774332 06-07-10 05:28 plugins/parse-tika/poi-ooxml-schemas-3.6.jar
795888 06-07-10 05:28 plugins/parse-tika/poi-scratchpad-3.6.jar
90023 06-07-10 05:28 plugins/parse-tika/tagsoup-1.2.jar
215263 06-07-10 05:28 plugins/parse-tika/tika-parsers-0.7.jar
109318 06-07-10 05:28 plugins/parse-tika/xml-apis-1.0.b2.jar
2666695 06-07-10 05:28 plugins/parse-tika/xmlbeans-2.3.0.jar
So, I'm not sure what you are seeing?
Cheers,
Chris
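The check in the message above (`unzip -l ... | grep parse-tika`) can be wrapped so that it works for any plugin id and yields a count instead of a raw listing. This is a minimal sketch, not something from the Nutch distribution: `count_plugin_entries` is a hypothetical name, and it assumes archive entries are laid out as `plugins/<plugin-id>/...`, as in the listing above.

```shell
# Hypothetical helper (not part of Nutch): count archive entries that
# belong to one plugin.
# Reads an `unzip -l` listing on standard input.
# Assumption: entries are stored under plugins/<plugin_id>/.
count_plugin_entries() {
    plugin_id=$1
    # -c prints the number of matching lines; -F matches the path literally.
    # grep exits non-zero when nothing matches, so keep the function's
    # status at 0 and let the printed count carry the result.
    grep -cF "plugins/$plugin_id/" || true
}
```

A possible invocation is `unzip -l nutch-1.1.job | count_plugin_entries parse-tika`; a result of 0 would suggest the plugin was not packaged in that particular build.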
On 6/22/10 2:48 AM, "Alex McLintock" <al...@gmail.com> wrote:
Hi Markus,
> The jars and wars are/were always just there and i can/could use them instantly.
Sounds like we need to improve some documentation :-)
I believe the package went to "source only" in the previous (1.0)
version - so Chris is just following the current "best practice" by
not creating all the jars in 1.1.
I wasn't around for that decision but don't find it too onerous myself
to run ant. Is it a problem for you?
Alex
Re: Re: The parse-tika plug-in in 1.1
Posted by Alex McLintock <al...@gmail.com>.
Hi Markus,
> The jars and wars are/were always just there and i can/could use them instantly.
Sounds like we need to improve some documentation :-)
I believe the package went to "source only" in the previous (1.0)
version - so Chris is just following the current "best practice" by
not creating all the jars in 1.1.
I wasn't around for that decision but don't find it too onerous myself
to run ant. Is it a problem for you?
Alex
RE: Re: The parse-tika plug-in in 1.1
Posted by Markus Jelsma <ma...@buyways.nl>.
Hmmm, I'm not building from source. I just download the package and get going! The jars and wars are/were always just there and I can/could use them instantly.
Maybe the compiled jar is just not included?
-----Original message-----
From: Mattmann, Chris A (388J) <ch...@jpl.nasa.gov>
Sent: Mon 21-06-2010 20:07
To: user@nutch.apache.org;
Subject: Re: The parse-tika plug-in in 1.1
Hi Markus,
Hmmm: I see it here?
http://svn.apache.org/repos/asf/nutch/tags/relase-1.1/src/plugin/parse-tika/
Where aren't you seeing it?
Cheers,
Chris
On 6/21/10 11:04 AM, "Markus Jelsma" <ma...@buyways.nl> wrote:
Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.
Re: The parse-tika plug-in in 1.1
Posted by "Mattmann, Chris A (388J)" <ch...@jpl.nasa.gov>.
Hi Markus,
Hmmm: I see it here?
http://svn.apache.org/repos/asf/nutch/tags/relase-1.1/src/plugin/parse-tika/
Where aren't you seeing it?
Cheers,
Chris
On 6/21/10 11:04 AM, "Markus Jelsma" <ma...@buyways.nl> wrote:
Well, where is it now? The parse-plugins.xml still refers to it, but it's not present in the plugins/ directory.