Posted to user@nutch.apache.org by jepse <jp...@jepse.net> on 2011/12/22 13:41:20 UTC

HtmlParser parse-html-plugin

Hi,

My goal is to use the Nutch HtmlParser as a standalone application.
Therefore I followed the instructions for RunNutchInEclipse. Now I have a
working Eclipse project, which I can use to start the plugin I need in a
standalone application (running the main class in HtmlParser.java). Now I
need to extract this runtime configuration for a standalone app. Is there a
way to execute a specific class with the relevant classpath?

Cheers, Philippe

--
View this message in context: http://lucene.472066.n3.nabble.com/HtmlParser-parse-html-plugin-tp3606486p3606486.html
Sent from the Nutch - User mailing list archive at Nabble.com.
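A minimal sketch of one way to do this outside Eclipse, assuming a layout with a `conf` directory and a `lib` directory of jars under one home directory (these directory names are assumptions; adjust them to your build). From the shell, the JVM's classpath wildcard is often enough, e.g. `java -cp "conf:lib/*" <main-class> <args>` on Unix. The hypothetical `ClasspathLauncher` below does the same thing programmatically: it collects the jars, builds a `URLClassLoader` over them, and invokes the target class's `main` via reflection.

```java
import java.io.IOException;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: runs a class's main() with a classpath assembled at runtime.
// Assumed (hypothetical) layout: <home>/conf for config files, <home>/lib for jars.
public class ClasspathLauncher {

    // Returns the *.jar files directly inside dir, sorted for a stable classpath order.
    static List<Path> collectJars(Path dir) throws IOException {
        List<Path> jars = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path p : stream) {
                jars.add(p);
            }
        }
        jars.sort(null); // natural Path ordering
        return jars;
    }

    // Loads mainClass through a new class loader over the given entries and calls its main(String[]).
    static void launch(List<Path> entries, String mainClass, String[] args) throws Exception {
        URL[] urls = new URL[entries.size()];
        for (int i = 0; i < urls.length; i++) {
            // For an existing directory, toUri() ends in "/", which URLClassLoader treats as a directory.
            urls[i] = entries.get(i).toAbsolutePath().toUri().toURL();
        }
        // The loader is deliberately not closed: main() may start threads that keep loading classes.
        URLClassLoader loader = new URLClassLoader(urls, ClasspathLauncher.class.getClassLoader());
        // Configuration code (e.g. Hadoop's) often finds resources via the context class loader.
        Thread.currentThread().setContextClassLoader(loader);
        Class<?> cls = Class.forName(mainClass, true, loader);
        Method main = cls.getMethod("main", String[].class);
        main.invoke(null, (Object) args); // cast so the array is passed as a single argument
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.err.println("usage: ClasspathLauncher <home-dir> <main-class> [args...]");
            System.exit(1);
        }
        Path home = Paths.get(args[0]);
        List<Path> entries = new ArrayList<>();
        entries.add(home.resolve("conf")); // directory entry so config files are found as resources
        entries.addAll(collectJars(home.resolve("lib")));
        launch(entries, args[1], Arrays.copyOfRange(args, 2, args.length));
    }
}
```

Usage, assuming the plugin's main class is `org.apache.nutch.parse.html.HtmlParser` and that it takes an input file as its argument (check your Nutch source): `java ClasspathLauncher /path/to/nutch org.apache.nutch.parse.html.HtmlParser page.html`. In a typical Nutch build the plugin's own jar and its bundled libraries may sit in a separate plugins directory rather than in `lib`, in which case they would need to be added as extra entries too.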

Re: HtmlParser parse-html-plugin

Posted by Lewis John Mcgibbney <le...@gmail.com>.
Absolutely, Markus.

Maybe you would consider using the parse-tika plugin within the
application. Since the Nutch core code does not do any parsing itself
(parsing is delegated to the parse plugins), this will also give you a
good understanding of how the parse plugins fit into the various core
classes.
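To make the point about the core not parsing concrete, here is a deliberately simplified, hypothetical sketch of the pattern: the "core" only looks up a parser by content type and delegates to it. `ContentParser`, `ParserRegistry` and the `"*"` fallback are invented for illustration and are not Nutch's actual parse API. In Nutch itself, plugins are declared in plugin descriptors and content types are mapped to parsers through configuration (for example `parse-plugins.xml`, depending on version).

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical illustration of "the core delegates, plugins parse".
// These types are NOT Nutch's real API.
public class ParseDispatchSketch {

    // Hypothetical extension point that every parse "plugin" implements.
    interface ContentParser {
        String parse(byte[] content);
    }

    // Hypothetical registry standing in for the core's plugin lookup.
    static final class ParserRegistry {
        private final Map<String, ContentParser> byContentType = new LinkedHashMap<>();

        void register(String contentType, ContentParser parser) {
            byContentType.put(contentType, parser);
        }

        // Exact content-type match first, then a catch-all "*" parser if one is registered.
        Optional<ContentParser> lookup(String contentType) {
            ContentParser p = byContentType.get(contentType);
            if (p == null) {
                p = byContentType.get("*");
            }
            return Optional.ofNullable(p);
        }

        // The "core" step: find a plugin and delegate; no parsing happens here.
        String parse(String contentType, byte[] content) {
            return lookup(contentType)
                    .orElseThrow(() -> new IllegalStateException("no parser for " + contentType))
                    .parse(content);
        }
    }

    public static void main(String[] args) {
        ParserRegistry registry = new ParserRegistry();
        // Toy "html" plugin: strips tags with a regex (illustration only, not a real HTML parser).
        registry.register("text/html",
                bytes -> new String(bytes, StandardCharsets.UTF_8).replaceAll("<[^>]*>", ""));
        // Toy catch-all plugin for every other content type.
        registry.register("*", bytes -> new String(bytes, StandardCharsets.UTF_8));

        System.out.println(registry.parse("text/html", "<p>Hello</p>".getBytes(StandardCharsets.UTF_8)));
        System.out.println(registry.parse("text/plain", "plain text".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Running `main` prints `Hello` and then `plain text`. The catch-all entry is roughly the role a broad, Tika-backed parser such as parse-tika can play, since Tika handles many formats beyond HTML.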

Thanks

On Fri, Dec 23, 2011 at 10:09 AM, Markus Jelsma
<ma...@openindex.io> wrote:
> I would recommend using Tika for parsing. It does much more and is being
> maintained as well.
>
> http://tika.apache.org/



-- 
Lewis

Re: HtmlParser parse-html-plugin

Posted by Markus Jelsma <ma...@openindex.io>.
I would recommend using Tika for parsing. It does much more (it handles
many document formats besides HTML) and is actively maintained as well.

http://tika.apache.org/

-- 
Markus Jelsma - CTO - Openindex