Posted to user@nutch.apache.org by Yash Thenuan Thenuan <ri...@iiita.ac.in> on 2018/03/13 13:21:25 UTC

Dependency between plugins

I am trying to import HtmlParser in my custom parser.
I did it the same way HtmlParser imports lib-nekohtml, but it didn't work.
Can anybody please tell me how to do it?
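
A note on the import itself: the compiler output quoted later in this thread ("symbol: class html", "location: package org.apache.nutch.parse") reads as though javac treated "org.apache.nutch.parse.html" as a class name. A single-type import has to name a class, not a package. The sketch below reproduces the distinction with JDK classes only; java.util stands in for org.apache.nutch.parse.html and ArrayList stands in for HtmlParser, both purely as illustrative assumptions.

```java
// Sketch only: java.util plays the role of the package, ArrayList the role of the class.

// import java.util;          // rejected by javac: a single-type import must name a class
import java.util.ArrayList;   // single-type import: names exactly one class
// import java.util.*;        // on-demand import: also valid, imports every class in the package

public class ImportDemo {
    /** Returns the fully qualified name of the class that was imported above. */
    static String importedClassName() {
        return ArrayList.class.getName();
    }

    public static void main(String[] args) {
        System.out.println(importedClassName());
    }
}
```

In the custom parser the corresponding line would presumably be "import org.apache.nutch.parse.html.HtmlParser;", assuming the parse-html jar is actually on the compile classpath through plugin.deps.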

RE: RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
If you look at the code of the HTML parser, you'll see that the parameter is passed the variable "root", the same variable that is passed to the methods that extract the outlinks, the title, and the text. So it simply can’t be null. It may be an issue with what toString is printing for this element (for example it may be printing the name of the root element, and it happens to not have a name).
Again, I strongly recommend debugging, so you can see the real value there.
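
To illustrate the toString point, here is a minimal sketch using only the DOM classes that ship with the JDK (not Nutch's own parser, so the concrete node classes are an assumption). Per the DOM specification, a DocumentFragment has the node name "#document-fragment" and a null node value, even when it has children. Implementations whose toString includes the node value can therefore print "null" for a fragment that is perfectly populated.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;

public class FragmentInspect {
    /** Builds a small fragment equivalent to <html><title>Hello</title></html>. */
    static DocumentFragment sampleFragment() {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment root = doc.createDocumentFragment();
            Element html = doc.createElement("html");
            Element title = doc.createElement("title");
            title.setTextContent("Hello");
            html.appendChild(title);
            root.appendChild(html);
            return root;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment root = sampleFragment();
        // toString() output is implementation-specific, so it is a poor health check.
        System.out.println("toString:    " + root);
        // These accessors are defined by the DOM specification and are safer to inspect.
        System.out.println("nodeName:    " + root.getNodeName());
        System.out.println("nodeValue:   " + root.getNodeValue());
        System.out.println("children:    " + root.getChildNodes().getLength());
        System.out.println("textContent: " + root.getTextContent());
    }
}
```

Checking the child count or the text content inside the parse filter (or in a debugger) should show whether the fragment really is empty.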

> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 15 March 2018 10:26
> To: user@nutch.apache.org
> Subject: RE: RE: Dependency between plugins
> 
> Yes, I am using the HTML parser, and yes, the document is getting parsed, but
> the document fragment is printing null.
> 
> On 15 Mar 2018 13:52, "Yossi Tamari" <yo...@pipl.com> wrote:
> 
> > Is your parser the HTML parser? I can say from experience that the
> > document is passed.
> > I really recommend debugging in local mode rather than using sysout.
> >
> > > -----Original Message-----
> > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > Sent: 15 March 2018 10:13
> > > To: user@nutch.apache.org
> > > Subject: RE: RE: Dependency between plugins
> > >
> > > > I tried printing the contents of the document fragment in
> > > > parsefilter-regex by writing System.out.println(doc), but it's
> > > > printing null!! And the document is getting parsed!!
> > >
> > > On 15 Mar 2018 13:15, "Yossi Tamari" <yo...@pipl.com> wrote:
> > >
> > > > Parse filters receive a DocumentFragment as their fourth parameter.
> > > >
> > > > > -----Original Message-----
> > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > Sent: 15 March 2018 08:50
> > > > > To: user@nutch.apache.org
> > > > > Subject: Re: RE: Dependency between plugins
> > > > >
> > > > > Hi Jorge and Yossi,
> > > > > > The reason I am trying to do this is exactly what Yossi said,
> > > > > > "removing Nutch overhead". I didn't think it would be that
> > > > > > complicated. All I am trying to do is call the existing parsers
> > > > > > from my own parser, but I am not able to do it correctly.
> > > > > > Maybe the chain approach is a better idea, but *does a parse
> > > > > > filter receive any DOM object* as a parameter, so that by
> > > > > > accessing it I can extract the data I want?
> > > > >
> > > > >
> > > > > On Wed, Mar 14, 2018 at 7:36 PM, Yossi Tamari
> > > > > <yo...@pipl.com>
> > > > > wrote:
> > > > >
> > > > > > > There is no built-in mechanism for this. However, are you sure
> > > > > > > you really want a parser for each website, rather than a
> > > > > > > parse-filter for each website (which will take the results of
> > > > > > > the HTML parser and apply some domain-specific customizations)?
> > > > > > > In both cases you can use a dispatcher approach, which your
> > > > > > > custom parser is, or a chain approach (every parser that is
> > > > > > > not intended for this domain returns null, or each
> > > > > > > parse-filter that is not intended for this domain returns the
> > > > > > > ParseResult that it received).
> > > > > > > The advantage of the chain approach is that each new website
> > > > > > > parser is a first-class, reusable Nutch object. The advantage
> > > > > > > of the dispatcher approach is that you don't need to deal with
> > > > > > > a lot of the Nutch overhead, but it is more monolithic (you
> > > > > > > can end up with one huge plugin that needs to be constantly
> > > > > > > modified whenever one of the websites is modified).
> > > > > >
> > > > > > > -----Original Message-----
> > > > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > > > Sent: 14 March 2018 15:28
> > > > > > > To: user@nutch.apache.org
> > > > > > > Subject: Re: RE: Dependency between plugins
> > > > > > >
> > > > > > > > Is there a way in Nutch to use a different parser for
> > > > > > > > different websites?
> > > > > > > > I am trying to do this by writing a custom parser which will
> > > > > > > > call different parsers for different websites.
> > > > > > > >
> > > > > > > > On 14 Mar 2018 14:19, "Semyon Semyonov" <se...@mail.com>
> > > > > > > > wrote:
> > > > > > >
> > > > > > > > > As a side note,
> > > > > > > > >
> > > > > > > > > I had to implement my own parser with extra functionality;
> > > > > > > > > a simple copy/paste of the HtmlParser code did the job.
> > > > > > > > >
> > > > > > > > > Inheriting instead of copy/pasting can be a bad idea.
> > > > > > > > > HtmlParser is a concrete, non-abstract class, so
> > > > > > > > > inheritance will not be as smooth as with contract
> > > > > > > > > implementations (the plugins are contracts, i.e.
> > > > > > > > > interfaces) and can easily break some OOP rules.
> > > > > > > >
> > > > > > > >
> > > > > > > > Sent: Wednesday, March 14, 2018 at 9:18 AM
> > > > > > > > From: "Yossi Tamari" <yo...@pipl.com>
> > > > > > > > To: user@nutch.apache.org
> > > > > > > > > Subject: RE: Dependency between plugins
> > > > > > > > >
> > > > > > > > > One suggestion I can make is to ensure that the parse-html
> > > > > > > > > plugin is built before your plugin (since you are including
> > > > > > > > > the jars that are generated in its build).
> > > > > > > >
> > > > > > > > > -----Original Message-----
> > > > > > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > > > > > Sent: 14 March 2018 09:55
> > > > > > > > > To: user@nutch.apache.org
> > > > > > > > > Subject: Re: Dependency between plugins
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > > > It didn't work in ant runtime.
> > > > > > > > > > I included "import org.apache.nutch.parse.html;" in my
> > > > > > > > > > custom parser code, but it throws an error when I run
> > > > > > > > > > ant runtime.
> > > > > > > > > >
> > > > > > > > > > [javac] /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41: error: cannot find symbol
> > > > > > > > > > [javac] import org.apache.nutch.parse.html;
> > > > > > > > > > [javac] ^
> > > > > > > > > > [javac] symbol: class html
> > > > > > > > > > [javac] location: package org.apache.nutch.parse
> > > > > > > > >
> > > > > > > > > > Below are the XML files of my parser.
> > > > > > > > > >
> > > > > > > > > > My ivy.xml:
> > > > > > > > > >
> > > > > > > > > > <ivy-module version="1.0">
> > > > > > > > > >   <info organisation="org.apache.nutch" module="${ant.project.name}">
> > > > > > > > > >     <license name="Apache 2.0"/>
> > > > > > > > > >     <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> > > > > > > > > >     <description>
> > > > > > > > > >       Apache Nutch
> > > > > > > > > >     </description>
> > > > > > > > > >   </info>
> > > > > > > > > >
> > > > > > > > > >   <configurations>
> > > > > > > > > >     <include file="../../../ivy/ivy-configurations.xml"/>
> > > > > > > > > >   </configurations>
> > > > > > > > > >
> > > > > > > > > >   <publications>
> > > > > > > > > >     <!--get the artifact from our module name-->
> > > > > > > > > >     <artifact conf="master"/>
> > > > > > > > > >   </publications>
> > > > > > > > > > </ivy-module>
> > > > > > > > >
> > > > > > > > > > build.xml:
> > > > > > > > > >
> > > > > > > > > > <project name="parse-custom" default="jar-core">
> > > > > > > > > >
> > > > > > > > > >   <import file="../build-plugin.xml"/>
> > > > > > > > > >
> > > > > > > > > >   <!-- Build compilation dependencies -->
> > > > > > > > > >   <target name="deps-jar">
> > > > > > > > > >     <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> > > > > > > > > >   </target>
> > > > > > > > > >
> > > > > > > > > >   <path id="plugin.deps">
> > > > > > > > > >     <fileset dir="${nutch.root}/build">
> > > > > > > > > >       <include name="**/parse-html/*.jar" />
> > > > > > > > > >     </fileset>
> > > > > > > > > >   </path>
> > > > > > > > > >
> > > > > > > > > >   <!-- Deploy Unit test dependencies -->
> > > > > > > > > >   <target name="deps-test">
> > > > > > > > > >     <ant target="deploy" inheritall="false" dir="../parse-html"/>
> > > > > > > > > >     <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
> > > > > > > > > >   </target>
> > > > > > > > > >
> > > > > > > > > > </project>
> > > > > > > > >
> > > > > > > > > > plugin.xml:
> > > > > > > > > >
> > > > > > > > > > <plugin
> > > > > > > > > >    id="parse-custom"
> > > > > > > > > >    name="Custom Parse Plug-in"
> > > > > > > > > >    version="1.0.0"
> > > > > > > > > >    provider-name="nutch.org">
> > > > > > > > > >
> > > > > > > > > >   <runtime>
> > > > > > > > > >     <library name="parse-custom.jar">
> > > > > > > > > >       <export name="*"/>
> > > > > > > > > >     </library>
> > > > > > > > > >   </runtime>
> > > > > > > > > >
> > > > > > > > > >   <requires>
> > > > > > > > > >     <import plugin="parse-html"/>
> > > > > > > > > >     <import plugin="nutch-extensionpoints"/>
> > > > > > > > > >   </requires>
> > > > > > > > > >
> > > > > > > > > >   <extension id="org.apache.nutch.parse.custom"
> > > > > > > > > >              name="CustomParse"
> > > > > > > > > >              point="org.apache.nutch.parse.Parser">
> > > > > > > > > >
> > > > > > > > > >     <implementation id="org.apache.nutch.parse.custom.CustomParser"
> > > > > > > > > >                     class="org.apache.nutch.parse.custom.CustomParser">
> > > > > > > > > >       <parameter name="contentType" value="text/html|application/xhtml+xml"/>
> > > > > > > > > >       <parameter name="pathSuffix" value=""/>
> > > > > > > > > >     </implementation>
> > > > > > > > > >
> > > > > > > > > >   </extension>
> > > > > > > > > >
> > > > > > > > > > </plugin>
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari
> > > > > > > > > <yo...@pipl.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Yash,
> > > > > > > > > >
> > > > > > > > > > > I don't know how to do it; I never tried, but if I had
> > > > > > > > > > > to, it would be a trial-and-error thing...
> > > > > > > > > > >
> > > > > > > > > > > If you want to increase the chances that someone will
> > > > > > > > > > > answer your question, I suggest you provide as much
> > > > > > > > > > > information as possible:
> > > > > > > > > > > Where did it not work? In "ant runtime", or when
> > > > > > > > > > > running in Hadoop?
> > > > > > > > > > > What was the error message?
> > > > > > > > > > > What is the content of your build.xml, plugin.xml, and
> > > > > > > > > > > ivy.xml?
> > > > > > > > > > > Is parse-html configured in your plugin.includes?
> > > > > > > > > > >
> > > > > > > > > > > If it's a problem during execution, I would suggest
> > > > > > > > > > > looking at or debugging the code of PluginClassLoader.
> > > > > > > > > >
> > > > > > > > > > > -----Original Message-----
> > > > > > > > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > > > > > > > Sent: 14 March 2018 08:34
> > > > > > > > > > > To: user@nutch.apache.org
> > > > > > > > > > > Subject: Re: Dependency between plugins
> > > > > > > > > > >
> > > > > > > > > > > Anybody please help me out regarding this.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan
> > > > > > > > > > > Thenuan < rit2014049@iiita.ac.in> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > > I am trying to import HtmlParser in my custom parser.
> > > > > > > > > > > > > I did it the same way HtmlParser imports lib-nekohtml,
> > > > > > > > > > > > > but it didn't work.
> > > > > > > > > > > > Can anybody please tell me how to do it?
> > > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > >
> > > > > >
> > > >
> > > >
> >
> >


RE: RE: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
Yes, I am using the HTML parser, and yes, the document is getting parsed, but
the document fragment is printing null.


RE: RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
Is your parser the HTML parser? I can say from experience that the document is passed.
I really recommend debugging in local mode rather than using sysout.



RE: RE: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
I tried printing the contents of the DocumentFragment in parsefilter-regex by
writing System.out.println(doc), but it prints null, even though the document
is getting parsed.

On 15 Mar 2018 13:15, "Yossi Tamari" <yo...@pipl.com> wrote:

> Parse filters receive a DocumentFragment as their fourth parameter.

RE: RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
Parse filters receive a DocumentFragment as their fourth parameter.
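
As a rough illustration of what a filter can do with such a DocumentFragment, here is a minimal sketch that uses only the JDK's org.w3c.dom classes. The class name FragmentDump and the helpers textOf and sampleFragment are made up for this example, and the fragment is built by hand here, whereas a real parse filter would receive it from Nutch. Note that the DOM specification defines the node value of a DocumentFragment as null, so a "null" in printed output does not by itself mean the fragment is empty.

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class FragmentDump {

    /** Concatenates the text of all text nodes below the given node, depth-first. */
    static String textOf(Node node) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            return node.getNodeValue();
        }
        StringBuilder sb = new StringBuilder();
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            sb.append(textOf(child));
        }
        return sb.toString();
    }

    /** Hand-built stand-in for the fragment a filter would be given: <p>Hello <b>world</b></p>. */
    static DocumentFragment sampleFragment() {
        try {
            Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            DocumentFragment frag = d.createDocumentFragment();
            Element p = d.createElement("p");
            p.appendChild(d.createTextNode("Hello "));
            Element b = d.createElement("b");
            b.appendChild(d.createTextNode("world"));
            p.appendChild(b);
            frag.appendChild(p);
            return frag;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        DocumentFragment doc = sampleFragment();
        // The fragment node itself carries no value; its content lives in its children.
        System.out.println("name:  " + doc.getNodeName());
        System.out.println("value: " + doc.getNodeValue());
        System.out.println("text:  " + textOf(doc));
    }
}
```

Walking the children (or stepping through in a debugger) shows the real content, which a plain print of the fragment object may not.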

> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 15 March 2018 08:50
> To: user@nutch.apache.org
> Subject: Re: RE: Dependency between plugins
> 
> Hi Jorge and Yossi,
> The reason I am trying to do this is exactly what Yossi said: "removing Nutch
> overhead". I didn't think it would be that complicated. All I am trying to do is
> call the existing parsers from my own parser, but I am not able to do it correctly.
> Maybe the chain approach is a better idea, but *does a parse filter receive
> any DOM object* as a parameter, so that I can extract the data I want by
> accessing it?


Re: RE: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
Hi Jorge and Yossi,
The reason I am trying to do this is exactly what Yossi said: "removing
Nutch overhead". I didn't think it would be that complicated. All I am
trying to do is call the existing parsers from my own parser, but I am not
able to do it correctly. Maybe the chain approach is a better idea, but
*does a parse filter receive any DOM object* as a parameter, so that I can
extract the data I want by accessing it?


On Wed, Mar 14, 2018 at 7:36 PM, Yossi Tamari <yo...@pipl.com> wrote:

> There is no built-in mechanism for this. However, are you sure you really
> want a parser for each website, rather than a parse-filter for each website
> (which will take the results of the HTML parser and apply some domain
> specific customizations)?
> In both cases you can use a dispatcher approach, which your custom parser
> is, or a chain approach (every parser that is not intended for this domain
> returns null, or each parse-filter that is not intended for this domain
> returns the ParseResult that it received).
> The advantage of the chain approach is that each new website parser is a
> first-class, reusable Nutch object. The advantage of the dispatcher
> approach is that you don't need to deal with a lot of the Nutch overhead,
> but it is more monolithic (You can end up with one huge plugin that needs
> to be constantly modified whenever one of the websites is modified).

RE: RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
There is no built-in mechanism for this. However, are you sure you really want a parser for each website, rather than a parse-filter for each website (which will take the results of the HTML parser and apply some domain-specific customizations)?
In both cases you can use a dispatcher approach, which is what your custom parser is, or a chain approach (every parser that is not intended for this domain returns null, or each parse-filter that is not intended for this domain returns the ParseResult that it received).
The advantage of the chain approach is that each new website parser is a first-class, reusable Nutch object. The advantage of the dispatcher approach is that you don't need to deal with a lot of the Nutch overhead, but it is more monolithic (you can end up with one huge plugin that needs to be constantly modified whenever one of the websites is modified).
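
The chain approach described above could be sketched roughly as follows. This is plain Java, not the Nutch API: the SiteFilter interface, the forHost and runChain helpers, and the example hosts are all assumptions made for illustration, with a Map of fields standing in for the ParseResult.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChainSketch {

    /** Hypothetical stand-in for a parse filter; this is NOT the Nutch interface. */
    interface SiteFilter {
        Map<String, String> filter(String url, Map<String, String> parsed);
    }

    /** Builds a filter that acts on one host only and passes every other page through. */
    static SiteFilter forHost(String host, String field, String value) {
        return (url, parsed) -> {
            if (!host.equals(URI.create(url).getHost())) {
                return parsed; // not our domain: return exactly what we received
            }
            Map<String, String> out = new HashMap<>(parsed);
            out.put(field, value); // domain-specific customization
            return out;
        };
    }

    /** Chain approach: every filter sees every page, in order. */
    static Map<String, String> runChain(List<SiteFilter> chain, String url,
                                        Map<String, String> parsed) {
        Map<String, String> result = parsed;
        for (SiteFilter f : chain) {
            result = f.filter(url, result);
        }
        return result;
    }

    public static void main(String[] args) {
        List<SiteFilter> chain = List.of(
                forHost("a.example", "site", "A"),
                forHost("b.example", "site", "B"));
        Map<String, String> parsed = Map.of("title", "Some page");
        System.out.println(runChain(chain, "http://a.example/page", parsed));
        System.out.println(runChain(chain, "http://c.example/page", parsed));
    }
}
```

A dispatcher would instead keep one lookup table from host to handler inside a single plugin and call only the matching handler, which is less code per site but concentrates every change in one place.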

> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 14 March 2018 15:28
> To: user@nutch.apache.org
> Subject: Re: RE: Dependency between plugins
> 
> Is there a way in Nutch to use different parsers for different websites?
> I am trying to do this by writing a custom parser that calls a different parser
> for each website.
> 
> On 14 Mar 2018 14:19, "Semyon Semyonov" <se...@mail.com>
> wrote:
> 
> > As a side note,
> >
> > I had to implement my own parser with extra functionality; a simple
> > copy/paste of the HtmlParser code did the job.
> >
> > Inheriting instead of copy-pasting may be a bad idea.
> > HtmlParser is a concrete, non-abstract class, so inheritance will not
> > be as smooth as with contract implementations (the plugins are
> > contracts, i.e. interfaces) and can easily break some OOP rules.
> >
> >
> > Sent: Wednesday, March 14, 2018 at 9:18 AM
> > From: "Yossi Tamari" <yo...@pipl.com>
> > To: user@nutch.apache.org
> > Subject: RE: Dependency between plugins
> >
> > One suggestion I can make is to ensure that the parse-html plugin is
> > built before your plugin (since you are including the jars that are
> > generated in its build).
> >
> > > -----Original Message-----
> > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > Sent: 14 March 2018 09:55
> > > To: user@nutch.apache.org
> > > Subject: Re: Dependency between plugins
> > >
> > > Hi,
> > > It didn't work in "ant runtime".
> > > I included "import org.apache.nutch.parse.html;" in my custom parser
> > > code, but it throws an error when I run "ant runtime":
> > >
> > > [javac] /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41: error: cannot find symbol
> > > [javac] import org.apache.nutch.parse.html;
> > > [javac] ^
> > > [javac] symbol: class html
> > > [javac] location: package org.apache.nutch.parse
> > >
> > >
> > > below are the xml files of my parser
> > >
> > > My ivy.xml
> > >
> > > <ivy-module version="1.0">
> > >   <info organisation="org.apache.nutch" module="${ant.project.name}">
> > >     <license name="Apache 2.0"/>
> > >     <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> > >     <description>
> > >       Apache Nutch
> > >     </description>
> > >   </info>
> > >
> > >   <configurations>
> > >     <include file="../../../ivy/ivy-configurations.xml"/>
> > >   </configurations>
> > >
> > >   <publications>
> > >     <!--get the artifact from our module name-->
> > >     <artifact conf="master"/>
> > >   </publications>
> > > </ivy-module>
> > >
> > > build.xml
> > >
> > > <project name="parse-custom" default="jar-core">
> > >
> > >   <import file="../build-plugin.xml"/>
> > >
> > >   <!-- Build compilation dependencies -->
> > >   <target name="deps-jar">
> > >     <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> > >   </target>
> > >
> > >   <path id="plugin.deps">
> > >     <fileset dir="${nutch.root}/build">
> > >       <include name="**/parse-html/*.jar" />
> > >     </fileset>
> > >   </path>
> > >
> > >   <!-- Deploy Unit test dependencies -->
> > >   <target name="deps-test">
> > >     <ant target="deploy" inheritall="false" dir="../parse-html"/>
> > >     <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
> > >   </target>
> > >
> > > </project>
> > >
> > > plugin.xml
> > >
> > > <plugin
> > >   id="parse-custom"
> > >   name="Custom Parse Plug-in"
> > >   version="1.0.0"
> > >   provider-name="nutch.org">
> > >
> > >   <runtime>
> > >     <library name="parse-custom.jar">
> > >       <export name="*"/>
> > >     </library>
> > >   </runtime>
> > >
> > >   <requires>
> > >     <import plugin="parse-html"/>
> > >     <import plugin="nutch-extensionpoints"/>
> > >   </requires>
> > >
> > >   <extension
> > >     id="org.apache.nutch.parse.custom"
> > >     name="CustomParse"
> > >     point="org.apache.nutch.parse.Parser">
> > >
> > >     <implementation id="org.apache.nutch.parse.custom.CustomParser"
> > >       class="org.apache.nutch.parse.custom.CustomParser">
> > >       <parameter name="contentType"
> > >         value="text/html|application/xhtml+xml"/>
> > >       <parameter name="pathSuffix" value=""/>
> > >     </implementation>
> > >
> > >   </extension>
> > >
> > > </plugin>
> > >
> > >
> > >
> > >
> > > On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari
> > > <yo...@pipl.com>
> > > wrote:
> > >
> > > > Hi Yash,
> > > >
> > > > I don't know how to do it, I never tried, but if I had to it would
> > > > be a trial and error thing....
> > > >
> > > > If you want to increase the chances that someone will answer your
> > > > question, I suggest you provide as much information as possible:
> > > > Where did it not work? In "ant runtime", or when running in Hadoop?
> > > > What was the error message?
> > > > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > > > Is parse-html configured in your plugin-includes?
> > > >
> > > > If it's a problem during execution, I would suggest looking at or
> > > > debugging the code of PluginClassLoader.
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > Sent: 14 March 2018 08:34
> > > > > To: user@nutch.apache.org
> > > > > Subject: Re: Dependency between plugins
> > > > >
> > > > > Anybody please help me out regarding this.
> > > > >
> > > > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > > > rit2014049@iiita.ac.in> wrote:
> > > > >
> > > > > > I am trying to import Htmlparser in my custom parser.
> > > > > > I did it in the same way by which Htmlparser imports
> > > > > > lib-nekohtml but it didn't worked.
> > > > > > Can anybody please tell me how to do it?
> > > > > >
> > > >
> > > >
> >
> >


Tika boilerpipe extractors

Posted by "Arora, Madhvi" <ma...@Automationdirect.com>.
Hi All,


We are using Nutch 1.13 and Solr 6. I am trying to use one of the extractors that come with Tika's boilerpipe support. For pages that contain only outlinks, such as the one below, I get the best results with CanolaExtractor:

https://support.automationdirect.com/faq/dl205.php

However, when I check in the Solr Admin tool, several outlinks are missing from the indexed content. I do not know why CanolaExtractor would leave out certain outlinks.

If I do not use boilerpipe in Nutch, all the outlinks get indexed. To disable the Tika extractor, I changed this property:

<property>
  <name>tika.extractor</name>
  <value>none</value>
  <description>
  Which text extraction algorithm to use. Valid values are: boilerpipe or none.
  </description>
</property>

Does anyone know why CanolaExtractor does not extract all the outlinks? Also, which Tika extractor should be used for the example page above?


Any help will be great!

Thanks,

Madhvi

Re: RE: Dependency between plugins

Posted by Jorge Betancourt <be...@gmail.com>.
Is there any reason why writing a `HtmlParseFilter` would not be enough?
The HTML parser will execute its own logic and provide a DOM representation
to all the filters and you can extract your own data from the DOM tree.
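
A minimal sketch of that idea, using only the JDK's DOM classes so that it runs on its own: it walks a DocumentFragment and collects the text of every element with a given tag name. In a real plugin this logic would presumably live in the filter method of a class implementing HtmlParseFilter; the class name DomTextCollector, its helpers, and the sample markup are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.DocumentFragment;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// Hypothetical helper, not part of Nutch.
final class DomTextCollector {

  // Recursively collect the trimmed text of every element named tagName.
  static List<String> collect(Node node, String tagName, List<String> out) {
    if (node.getNodeType() == Node.ELEMENT_NODE
        && tagName.equalsIgnoreCase(node.getNodeName())) {
      out.add(node.getTextContent().trim());
    }
    for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
      collect(c, tagName, out);
    }
    return out;
  }

  // Hand-built fragment standing in for the DOM an HTML parser would provide.
  static DocumentFragment sampleFragment() {
    try {
      Document doc =
          DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
      DocumentFragment root = doc.createDocumentFragment();
      Element body = doc.createElement("body");
      Element h1 = doc.createElement("h1");
      h1.setTextContent("Product page");
      Element p = doc.createElement("p");
      p.setTextContent("Price: 42");
      body.appendChild(h1);
      body.appendChild(p);
      root.appendChild(body);
      return root;
    } catch (ParserConfigurationException e) {
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println(collect(sampleFragment(), "h1", new ArrayList<>()));
  }
}
```

The traversal only reads the tree, so a filter built this way leaves the HTML parser's own output untouched.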

At the moment individual parsers are matched by mimetype (see
https://github.com/apache/nutch/blob/master/conf/parse-plugins.xml).

Regards,

On Wed, Mar 14, 2018 at 2:27 PM, Yash Thenuan Thenuan <
rit2014049@iiita.ac.in> wrote:

> Is there a way in nutch by which we can use different parser for different
> websites?
> I am trying to do this by writing a custom parser which will call different
> parsers for different websites?
>
> On 14 Mar 2018 14:19, "Semyon Semyonov" <se...@mail.com> wrote:
>
> > As a side note,
> >
> > I had to implement my own parser with extra functionality, simple
> > copy/past of the code of HTMLparser did the job.
> >
> > If you want to inherit instead of copy paste it can be a bad idea at all.
> > HTML parser is a concrete non abstract class, therefore the inheritance
> > will not be so smooth as in case of contract implementations(the plugins
> > are contracts, ie interfaces) and can easily break some OOP rules.
> >
> >
> > Sent: Wednesday, March 14, 2018 at 9:18 AM
> > From: "Yossi Tamari" <yo...@pipl.com>
> > To: user@nutch.apache.org
> > Subject: RE: Dependency between plugins
> > One suggestion I can make is to ensure that the html-parse plugin is
> built
> > before your plugin (since you are including the jars that are generated
> in
> > its build).
> >
> > > -----Original Message-----
> > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > Sent: 14 March 2018 09:55
> > > To: user@nutch.apache.org
> > > Subject: Re: Dependency between plugins
> > >
> > > Hi,
> > > It didn't worked in ant runtime.
> > > I included "import org.apache.nutch.parse.html;" in my custom parser
> > code.
> > > but it is throwing errror while i am doing ant runtime.
> > >
> > > [javac]
> > > /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-
> > > custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41:
> > > error: cannot find symbol
> > >
> > > [javac] import org.apache.nutch.parse.html;
> > >
> > > [javac] ^
> > >
> > > [javac] symbol: class html
> > >
> > > [javac] location: package org.apache.nutch.parse
> > >
> > >
> > > below are the xml files of my parser
> > >
> > >
> > > My ivy.xml
> > >
> > >
> > > <ivy-module version="1.0">
> > >
> > > <info organisation="org.apache.nutch" module="${ant.project.name}">
> > >
> > > <license name="Apache 2.0"/>
> > >
> > > <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> > >
> > > <description>
> > >
> > > Apache Nutch
> > >
> > > </description>
> > >
> > > </info>
> > >
> > >
> > > <configurations>
> > >
> > > <include file="../../../ivy/ivy-configurations.xml"/>
> > >
> > > </configurations>
> > >
> > >
> > > <publications>
> > >
> > > <!--get the artifact from our module name-->
> > >
> > > <artifact conf="master"/>
> > >
> > > </publications>
> > >
> > > </ivy-module>
> > >
> > > build.xml
> > >
> > > <project name="parse-custom" default="jar-core">
> > >
> > > <import file="../build-plugin.xml"/>
> > >
> > > <!-- Build compilation dependencies -->
> > > <target name="deps-jar">
> > > <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> > > </target>
> > >
> > >
> > > <path id="plugin.deps">
> > > <fileset dir="${nutch.root}/build">
> > > <include name="**/parse-html/*.jar" />
> > > </fileset>
> > > </path>
> > >
> > > <!-- Deploy Unit test dependencies -->
> > > <target name="deps-test">
> > > <ant target="deploy" inheritall="false" dir="../parse-html"/>
> > > <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"
> />
> > > </target>
> > >
> > > </project>
> > >
> > > plugin.xml
> > >
> > > <plugin
> > > id="parse-custom"
> > > name="Custom Parse Plug-in"
> > > version="1.0.0"
> > > provider-name="nutch.org">
> > >
> > > <runtime>
> > > <library name="parse-custom.jar">
> > > <export name="*"/>
> > > </library>
> > > </runtime>
> > >
> > > <requires>
> > > <import plugin="parse-html"/>
> > > <import plugin="nutch-extensionpoints"/>
> > > </requires>
> > > <extension id="org.apache.nutch.parse.custom"
> > > name="CustomParse"
> > > point="org.apache.nutch.parse.Parser">
> > >
> > > <implementation id="org.apache.nutch.parse.custom.CustomParser"
> > > class="org.apache.nutch.parse.custom.CustomParser">
> > > <parameter name="contentType"
> > > value="text/html|application/xhtml+xml"/>
> > > <parameter name="pathSuffix" value=""/>
> > > </implementation>
> > >
> > > </extension>
> > >
> > > </plugin>
> > >
> > >
> > >
> > >
> > > On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yo...@pipl.com>
> > > wrote:
> > >
> > > > Hi Yash,
> > > >
> > > > I don't know how to do it, I never tried, but if I had to it would be
> > > > a trial and error thing....
> > > >
> > > > If you want to increase the chances that someone will answer your
> > > > question, I suggest you provide as much information as possible:
> > > > Where did it not work? In "ant runtime", or when running in Hadoop?
> > > > What was the error message?
> > > > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > > > Is parse-html configured in your plugin-includes?
> > > >
> > > > If it's a problem during execution, I would suggest looking at or
> > > > debugging the code of PluginClassLoader.
> > > >
> > > >
> > > > > -----Original Message-----
> > > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > > Sent: 14 March 2018 08:34
> > > > > To: user@nutch.apache.org
> > > > > Subject: Re: Dependency between plugins
> > > > >
> > > > > Anybody please help me out regarding this.
> > > > >
> > > > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > > > rit2014049@iiita.ac.in> wrote:
> > > > >
> > > > > > I am trying to import Htmlparser in my custom parser.
> > > > > > I did it in the same way by which Htmlparser imports lib-nekohtml
> > > > > > but it didn't worked.
> > > > > > Can anybody please tell me how to do it?
> > > > > >
> > > >
> > > >
> >
> >
>

Re: RE: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
Is there a way in Nutch to use a different parser for different
websites?
I am trying to do this by writing a custom parser that calls different
parsers for different websites.
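
The dispatch idea can be sketched independently of Nutch: a router looks at the host of the URL and hands the page to whichever parser is registered for it, falling back to a default. This is only an illustration; PageParser, HostRoutingParser, and the host names are hypothetical and are not Nutch APIs. A real plugin would implement org.apache.nutch.parse.Parser instead.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical stand-in for a parser; not the Nutch Parser interface.
interface PageParser {
  String parse(String url, String html);
}

// Picks a parser by host name and falls back to a default parser.
final class HostRoutingParser implements PageParser {
  private final Map<String, PageParser> byHost = new HashMap<>();
  private final PageParser fallback;

  HostRoutingParser(PageParser fallback) {
    this.fallback = fallback;
  }

  HostRoutingParser register(String host, PageParser parser) {
    byHost.put(host.toLowerCase(Locale.ROOT), parser);
    return this;
  }

  @Override
  public String parse(String url, String html) {
    String host = URI.create(url).getHost();
    PageParser chosen = (host == null)
        ? fallback
        : byHost.getOrDefault(host.toLowerCase(Locale.ROOT), fallback);
    return chosen.parse(url, html);
  }
}
```

Keeping the routing table in one place means a new site only needs one more register call, not another branch inside the parser.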

On 14 Mar 2018 14:19, "Semyon Semyonov" <se...@mail.com> wrote:

> As a side note,
>
> I had to implement my own parser with extra functionality, simple
> copy/past of the code of HTMLparser did the job.
>
> If you want to inherit instead of copy paste it can be a bad idea at all.
> HTML parser is a concrete non abstract class, therefore the inheritance
> will not be so smooth as in case of contract implementations(the plugins
> are contracts, ie interfaces) and can easily break some OOP rules.
>
>
> Sent: Wednesday, March 14, 2018 at 9:18 AM
> From: "Yossi Tamari" <yo...@pipl.com>
> To: user@nutch.apache.org
> Subject: RE: Dependency between plugins
> One suggestion I can make is to ensure that the html-parse plugin is built
> before your plugin (since you are including the jars that are generated in
> its build).
>
> > -----Original Message-----
> > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > Sent: 14 March 2018 09:55
> > To: user@nutch.apache.org
> > Subject: Re: Dependency between plugins
> >
> > Hi,
> > It didn't worked in ant runtime.
> > I included "import org.apache.nutch.parse.html;" in my custom parser
> code.
> > but it is throwing errror while i am doing ant runtime.
> >
> > [javac]
> > /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-
> > custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41:
> > error: cannot find symbol
> >
> > [javac] import org.apache.nutch.parse.html;
> >
> > [javac] ^
> >
> > [javac] symbol: class html
> >
> > [javac] location: package org.apache.nutch.parse
> >
> >
> > below are the xml files of my parser
> >
> >
> > My ivy.xml
> >
> >
> > <ivy-module version="1.0">
> >
> > <info organisation="org.apache.nutch" module="${ant.project.name}">
> >
> > <license name="Apache 2.0"/>
> >
> > <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> >
> > <description>
> >
> > Apache Nutch
> >
> > </description>
> >
> > </info>
> >
> >
> > <configurations>
> >
> > <include file="../../../ivy/ivy-configurations.xml"/>
> >
> > </configurations>
> >
> >
> > <publications>
> >
> > <!--get the artifact from our module name-->
> >
> > <artifact conf="master"/>
> >
> > </publications>
> >
> > </ivy-module>
> >
> > build.xml
> >
> > <project name="parse-custom" default="jar-core">
> >
> > <import file="../build-plugin.xml"/>
> >
> > <!-- Build compilation dependencies -->
> > <target name="deps-jar">
> > <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> > </target>
> >
> >
> > <path id="plugin.deps">
> > <fileset dir="${nutch.root}/build">
> > <include name="**/parse-html/*.jar" />
> > </fileset>
> > </path>
> >
> > <!-- Deploy Unit test dependencies -->
> > <target name="deps-test">
> > <ant target="deploy" inheritall="false" dir="../parse-html"/>
> > <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
> > </target>
> >
> > </project>
> >
> > plugin.xml
> >
> > <plugin
> > id="parse-custom"
> > name="Custom Parse Plug-in"
> > version="1.0.0"
> > provider-name="nutch.org">
> >
> > <runtime>
> > <library name="parse-custom.jar">
> > <export name="*"/>
> > </library>
> > </runtime>
> >
> > <requires>
> > <import plugin="parse-html"/>
> > <import plugin="nutch-extensionpoints"/>
> > </requires>
> > <extension id="org.apache.nutch.parse.custom"
> > name="CustomParse"
> > point="org.apache.nutch.parse.Parser">
> >
> > <implementation id="org.apache.nutch.parse.custom.CustomParser"
> > class="org.apache.nutch.parse.custom.CustomParser">
> > <parameter name="contentType"
> > value="text/html|application/xhtml+xml"/>
> > <parameter name="pathSuffix" value=""/>
> > </implementation>
> >
> > </extension>
> >
> > </plugin>
> >
> >
> >
> >
> > On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yo...@pipl.com>
> > wrote:
> >
> > > Hi Yash,
> > >
> > > I don't know how to do it, I never tried, but if I had to it would be
> > > a trial and error thing....
> > >
> > > If you want to increase the chances that someone will answer your
> > > question, I suggest you provide as much information as possible:
> > > Where did it not work? In "ant runtime", or when running in Hadoop?
> > > What was the error message?
> > > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > > Is parse-html configured in your plugin-includes?
> > >
> > > If it's a problem during execution, I would suggest looking at or
> > > debugging the code of PluginClassLoader.
> > >
> > >
> > > > -----Original Message-----
> > > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > > Sent: 14 March 2018 08:34
> > > > To: user@nutch.apache.org
> > > > Subject: Re: Dependency between plugins
> > > >
> > > > Anybody please help me out regarding this.
> > > >
> > > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > > rit2014049@iiita.ac.in> wrote:
> > > >
> > > > > I am trying to import Htmlparser in my custom parser.
> > > > > I did it in the same way by which Htmlparser imports lib-nekohtml
> > > > > but it didn't worked.
> > > > > Can anybody please tell me how to do it?
> > > > >
> > >
> > >
>
>

Re: RE: Dependency between plugins

Posted by Semyon Semyonov <se...@mail.com>.
As a side note,

I had to implement my own parser with extra functionality; a simple copy/paste of the HtmlParser code did the job.

Inheriting instead of copying and pasting may be a bad idea. HtmlParser is a concrete, non-abstract class, so inheritance will not be as smooth as it is with contract implementations (the plugins are contracts, i.e. interfaces) and can easily break some OOP rules.
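
A third option besides copying and inheriting is delegation: implement the contract and wrap the existing parser. The sketch below shows the pattern with made-up types (DocParser, BasicTitleParser, and LengthAddingParser are hypothetical, not Nutch classes). Whether this is practical in Nutch depends on the wrapping plugin being able to see the other plugin's classes, which is the open question in this thread.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical contract, standing in for an extension-point interface.
interface DocParser {
  Map<String, String> parse(String html);
}

// Hypothetical concrete parser, standing in for an existing implementation.
final class BasicTitleParser implements DocParser {
  @Override
  public Map<String, String> parse(String html) {
    Map<String, String> fields = new LinkedHashMap<>();
    int start = html.indexOf("<title>");
    int end = html.indexOf("</title>");
    if (start >= 0 && end > start) {
      fields.put("title", html.substring(start + "<title>".length(), end));
    }
    return fields;
  }
}

// Adds a field by delegating to the wrapped parser instead of extending it.
final class LengthAddingParser implements DocParser {
  private final DocParser delegate;

  LengthAddingParser(DocParser delegate) {
    this.delegate = delegate;
  }

  @Override
  public Map<String, String> parse(String html) {
    // Copy the delegate's result so the wrapper never mutates it.
    Map<String, String> fields = new LinkedHashMap<>(delegate.parse(html));
    fields.put("length", Integer.toString(html.length()));
    return fields;
  }
}
```

The wrapper depends only on the interface, so the wrapped parser can change internally without breaking the extra functionality.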
 

Sent: Wednesday, March 14, 2018 at 9:18 AM
From: "Yossi Tamari" <yo...@pipl.com>
To: user@nutch.apache.org
Subject: RE: Dependency between plugins
One suggestion I can make is to ensure that the html-parse plugin is built before your plugin (since you are including the jars that are generated in its build).

> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 14 March 2018 09:55
> To: user@nutch.apache.org
> Subject: Re: Dependency between plugins
>
> Hi,
> It didn't worked in ant runtime.
> I included "import org.apache.nutch.parse.html;" in my custom parser code.
> but it is throwing errror while i am doing ant runtime.
>
> [javac]
> /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-
> custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41:
> error: cannot find symbol
>
> [javac] import org.apache.nutch.parse.html;
>
> [javac] ^
>
> [javac] symbol: class html
>
> [javac] location: package org.apache.nutch.parse
>
>
> below are the xml files of my parser
>
>
> My ivy.xml
>
>
> <ivy-module version="1.0">
>
> <info organisation="org.apache.nutch" module="${ant.project.name}">
>
> <license name="Apache 2.0"/>
>
> <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
>
> <description>
>
> Apache Nutch
>
> </description>
>
> </info>
>
>
> <configurations>
>
> <include file="../../../ivy/ivy-configurations.xml"/>
>
> </configurations>
>
>
> <publications>
>
> <!--get the artifact from our module name-->
>
> <artifact conf="master"/>
>
> </publications>
>
> </ivy-module>
>
> build.xml
>
> <project name="parse-custom" default="jar-core">
>
> <import file="../build-plugin.xml"/>
>
> <!-- Build compilation dependencies -->
> <target name="deps-jar">
> <ant target="compile-test" inheritall="false" dir="../parse-html"/>
> </target>
>
>
> <path id="plugin.deps">
> <fileset dir="${nutch.root}/build">
> <include name="**/parse-html/*.jar" />
> </fileset>
> </path>
>
> <!-- Deploy Unit test dependencies -->
> <target name="deps-test">
> <ant target="deploy" inheritall="false" dir="../parse-html"/>
> <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
> </target>
>
> </project>
>
> plugin.xml
>
> <plugin
> id="parse-custom"
> name="Custom Parse Plug-in"
> version="1.0.0"
> provider-name="nutch.org">
>
> <runtime>
> <library name="parse-custom.jar">
> <export name="*"/>
> </library>
> </runtime>
>
> <requires>
> <import plugin="parse-html"/>
> <import plugin="nutch-extensionpoints"/>
> </requires>
> <extension id="org.apache.nutch.parse.custom"
> name="CustomParse"
> point="org.apache.nutch.parse.Parser">
>
> <implementation id="org.apache.nutch.parse.custom.CustomParser"
> class="org.apache.nutch.parse.custom.CustomParser">
> <parameter name="contentType"
> value="text/html|application/xhtml+xml"/>
> <parameter name="pathSuffix" value=""/>
> </implementation>
>
> </extension>
>
> </plugin>
>
>
>
>
> On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yo...@pipl.com>
> wrote:
>
> > Hi Yash,
> >
> > I don't know how to do it, I never tried, but if I had to it would be
> > a trial and error thing....
> >
> > If you want to increase the chances that someone will answer your
> > question, I suggest you provide as much information as possible:
> > Where did it not work? In "ant runtime", or when running in Hadoop?
> > What was the error message?
> > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > Is parse-html configured in your plugin-includes?
> >
> > If it's a problem during execution, I would suggest looking at or
> > debugging the code of PluginClassLoader.
> >
> >
> > > -----Original Message-----
> > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > Sent: 14 March 2018 08:34
> > > To: user@nutch.apache.org
> > > Subject: Re: Dependency between plugins
> > >
> > > Anybody please help me out regarding this.
> > >
> > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > rit2014049@iiita.ac.in> wrote:
> > >
> > > > I am trying to import Htmlparser in my custom parser.
> > > > I did it in the same way by which Htmlparser imports lib-nekohtml
> > > > but it didn't worked.
> > > > Can anybody please tell me how to do it?
> > > >
> >
> >
 

RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
One suggestion I can make is to ensure that the parse-html plugin is built before your plugin (since you are including the jars that are generated in its build).
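
Separately, the javac output quoted below reports "symbol: class html" in "package org.apache.nutch.parse", which is how javac reacts when a single-type import names a package rather than a class. The rule can be reproduced with JDK packages alone; the snippet is a sketch of that rule, not of the plugin's code, and ImportDemo is a made-up name. If the class wanted is the parse-html plugin's parser, the import would presumably need to name it (org.apache.nutch.parse.html.HtmlParser), in addition to the jar being on the compile classpath.

```java
import java.util.ArrayList; // single-type import: must name a class
import java.util.List;
// import java.util;        // would not compile: javac looks for a class "util" in package "java"

// Hypothetical demo class; it only shows which import forms compile.
final class ImportDemo {
  static List<String> names() {
    List<String> out = new ArrayList<>();
    out.add("HtmlParser");
    return out;
  }

  public static void main(String[] args) {
    System.out.println(names());
  }
}
```

An on-demand import such as "import java.util.*;" is the other legal form when a whole package is wanted.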

> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 14 March 2018 09:55
> To: user@nutch.apache.org
> Subject: Re: Dependency between plugins
> 
> Hi,
> It didn't worked in ant runtime.
> I included  "import org.apache.nutch.parse.html;" in my custom parser code.
> but it is throwing errror while i am doing ant runtime.
> 
>     [javac]
> /Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-
> custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41:
> error: cannot find symbol
> 
>     [javac] import org.apache.nutch.parse.html;
> 
>     [javac]                              ^
> 
>     [javac]   symbol:   class html
> 
>     [javac]   location: package org.apache.nutch.parse
> 
> 
> below are the xml files of my parser
> 
> 
> My ivy.xml
> 
> 
> <ivy-module version="1.0">
> 
>   <info organisation="org.apache.nutch" module="${ant.project.name}">
> 
>     <license name="Apache 2.0"/>
> 
>     <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>
> 
>     <description>
> 
>         Apache Nutch
> 
>     </description>
> 
>   </info>
> 
> 
>   <configurations>
> 
>     <include file="../../../ivy/ivy-configurations.xml"/>
> 
>   </configurations>
> 
> 
>   <publications>
> 
>     <!--get the artifact from our module name-->
> 
>     <artifact conf="master"/>
> 
>   </publications>
> 
> </ivy-module>
> 
> build.xml
> 
> <project name="parse-custom" default="jar-core">
> 
>   <import file="../build-plugin.xml"/>
> 
>   <!-- Build compilation dependencies -->
>   <target name="deps-jar">
>     <ant target="compile-test" inheritall="false" dir="../parse-html"/>
>   </target>
> 
> 
>   <path id="plugin.deps">
>         <fileset dir="${nutch.root}/build">
>             <include name="**/parse-html/*.jar" />
>         </fileset>
> </path>
> 
>   <!-- Deploy Unit test dependencies -->
>   <target name="deps-test">
>     <ant target="deploy" inheritall="false" dir="../parse-html"/>
>     <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
>   </target>
> 
> </project>
> 
> plugin.xml
> 
> <plugin
>    id="parse-custom"
>    name="Custom Parse Plug-in"
>    version="1.0.0"
>    provider-name="nutch.org">
> 
>    <runtime>
>       <library name="parse-custom.jar">
>          <export name="*"/>
>       </library>
>    </runtime>
> 
>    <requires>
>       <import plugin="parse-html"/>
>       <import plugin="nutch-extensionpoints"/>
>     </requires>
>    <extension id="org.apache.nutch.parse.custom"
>               name="CustomParse"
>               point="org.apache.nutch.parse.Parser">
> 
>       <implementation id="org.apache.nutch.parse.custom.CustomParser"
>                       class="org.apache.nutch.parse.custom.CustomParser">
>         <parameter name="contentType"
> value="text/html|application/xhtml+xml"/>
>         <parameter name="pathSuffix" value=""/>
>       </implementation>
> 
>    </extension>
> 
> </plugin>
> 
> 
> 
> 
> On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yo...@pipl.com>
> wrote:
> 
> > Hi Yash,
> >
> > I don't know how to do it, I never tried, but if I had to it would be
> > a trial and error thing....
> >
> > If you want to increase the chances that someone will answer your
> > question, I suggest you provide as much information as possible:
> > Where did it not work? In "ant runtime", or when running in Hadoop?
> > What was the error message?
> > What is the content of your build.xml, plugin.xml, and ivy.xml?
> > Is parse-html configured in your plugin-includes?
> >
> > If it's a problem during execution, I would suggest looking at or
> > debugging the code of PluginClassLoader.
> >
> >
> > > -----Original Message-----
> > > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > > Sent: 14 March 2018 08:34
> > > To: user@nutch.apache.org
> > > Subject: Re: Dependency between plugins
> > >
> > > Anybody please help me out regarding this.
> > >
> > > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > > rit2014049@iiita.ac.in> wrote:
> > >
> > > > I am trying to import Htmlparser in my custom parser.
> > > > I did it in the same way by which Htmlparser imports lib-nekohtml
> > > > but it didn't worked.
> > > > Can anybody please tell me how to do it?
> > > >
> >
> >


Re: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
Hi,
It didn't work in "ant runtime".
I included "import org.apache.nutch.parse.html;" in my custom parser code,
but it throws an error when I run "ant runtime".

    [javac]
/Users/yasht/Downloads/apache-nutch-1.14/src/plugin/parse-custom/src/java/org/apache/nutch/parse/custom/CustomParser.java:41:
error: cannot find symbol

    [javac] import org.apache.nutch.parse.html;

    [javac]                              ^

    [javac]   symbol:   class html

    [javac]   location: package org.apache.nutch.parse


Below are the XML files of my parser.


My ivy.xml


<ivy-module version="1.0">

  <info organisation="org.apache.nutch" module="${ant.project.name}">

    <license name="Apache 2.0"/>

    <ivyauthor name="Apache Nutch Team" url="http://nutch.apache.org"/>

    <description>

        Apache Nutch

    </description>

  </info>


  <configurations>

    <include file="../../../ivy/ivy-configurations.xml"/>

  </configurations>


  <publications>

    <!--get the artifact from our module name-->

    <artifact conf="master"/>

  </publications>

</ivy-module>

build.xml

<project name="parse-custom" default="jar-core">

  <import file="../build-plugin.xml"/>

  <!-- Build compilation dependencies -->
  <target name="deps-jar">
    <ant target="compile-test" inheritall="false" dir="../parse-html"/>
  </target>


  <path id="plugin.deps">
    <fileset dir="${nutch.root}/build">
      <include name="**/parse-html/*.jar" />
    </fileset>
  </path>

  <!-- Deploy Unit test dependencies -->
  <target name="deps-test">
    <ant target="deploy" inheritall="false" dir="../parse-html"/>
    <ant target="deploy" inheritall="false" dir="../nutch-extensionpoints"/>
  </target>

</project>

plugin.xml

<plugin
   id="parse-custom"
   name="Custom Parse Plug-in"
   version="1.0.0"
   provider-name="nutch.org">

   <runtime>
      <library name="parse-custom.jar">
         <export name="*"/>
      </library>
   </runtime>

   <requires>
      <import plugin="parse-html"/>
      <import plugin="nutch-extensionpoints"/>
   </requires>
   <extension id="org.apache.nutch.parse.custom"
              name="CustomParse"
              point="org.apache.nutch.parse.Parser">

      <implementation id="org.apache.nutch.parse.custom.CustomParser"
                      class="org.apache.nutch.parse.custom.CustomParser">
        <parameter name="contentType" value="text/html|application/xhtml+xml"/>
        <parameter name="pathSuffix" value=""/>
      </implementation>

   </extension>

</plugin>




On Wed, Mar 14, 2018 at 1:02 PM, Yossi Tamari <yo...@pipl.com> wrote:

> Hi Yash,
>
> I don't know how to do it, I never tried, but if I had to it would be a
> trial and error thing....
>
> If you want to increase the chances that someone will answer your
> question, I suggest you provide as much information as possible:
> Where did it not work? In "ant runtime", or when running in Hadoop? What
> was the error message?
> What is the content of your build.xml, plugin.xml, and ivy.xml?
> Is parse-html configured in your plugin-includes?
>
> If it's a problem during execution, I would suggest looking at or
> debugging the code of PluginClassLoader.
>
>
> > -----Original Message-----
> > From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> > Sent: 14 March 2018 08:34
> > To: user@nutch.apache.org
> > Subject: Re: Dependency between plugins
> >
> > Anybody please help me out regarding this.
> >
> > On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> > rit2014049@iiita.ac.in> wrote:
> >
> > > I am trying to import Htmlparser in my custom parser.
> > > I did it in the same way by which Htmlparser imports lib-nekohtml but
> > > it didn't worked.
> > > Can anybody please tell me how to do it?
> > >
>
>

RE: Dependency between plugins

Posted by Yossi Tamari <yo...@pipl.com>.
Hi Yash,

I don't know how to do it; I have never tried, but if I had to, it would be a matter of trial and error.

If you want to increase the chances that someone will answer your question, I suggest you provide as much information as possible:
Where did it not work? In "ant runtime", or when running in Hadoop? What was the error message?
What is the content of your build.xml, plugin.xml, and ivy.xml?
Is parse-html configured in your plugin-includes?

If it's a problem during execution, I would suggest looking at or debugging the code of PluginClassLoader.
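
For the plugin.includes question, the property value is, as far as I know, treated as a regular expression that is matched against plugin ids (worth verifying against the PluginManager of your Nutch version). A quick standalone way to test a value is sketched below; the sample value is made up, not taken from this thread, and PluginIncludesCheck is a hypothetical helper.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Nutch.
final class PluginIncludesCheck {

  // True if the whole plugin id matches the plugin.includes regex.
  static boolean isIncluded(String pluginIncludes, String pluginId) {
    return Pattern.compile(pluginIncludes).matcher(pluginId).matches();
  }

  public static void main(String[] args) {
    // Assumed sample value for illustration only.
    String includes = "protocol-http|urlfilter-regex|parse-(html|custom)|index-basic";
    for (String id : new String[] {"parse-html", "parse-custom", "parse-tika"}) {
      System.out.println(id + " -> " + isIncluded(includes, id));
    }
  }
}
```

Because the match is against the whole id, a value that lists only parse-custom would leave parse-html out even though the custom plugin requires it.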


> -----Original Message-----
> From: Yash Thenuan Thenuan <ri...@iiita.ac.in>
> Sent: 14 March 2018 08:34
> To: user@nutch.apache.org
> Subject: Re: Dependency between plugins
> 
> Anybody please help me out regarding this.
> 
> On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
> rit2014049@iiita.ac.in> wrote:
> 
> > I am trying to import Htmlparser in my custom parser.
> > I did it in the same way by which Htmlparser imports lib-nekohtml but
> > it didn't worked.
> > Can anybody please tell me how to do it?
> >


Re: Dependency between plugins

Posted by Yash Thenuan Thenuan <ri...@iiita.ac.in>.
Can anybody please help me out with this?

On Tue, Mar 13, 2018 at 6:51 PM, Yash Thenuan Thenuan <
rit2014049@iiita.ac.in> wrote:

> I am trying to import Htmlparser in my custom parser.
> I did it in the same way by which Htmlparser imports lib-nekohtml but it
> didn't worked.
> Can anybody please tell me how to do it?
>