Posted to user@nutch.apache.org by "Prashant More (प्रशांत मोरे)" <mo...@gmail.com> on 2013/02/06 07:24:31 UTC

Customizing Nutch 1.5 in Eclipse Juno

Hi,
   I am modifying the Nutch source to direct the crawled content to a MySQL
database, using my own schema, for further processing. Initially, I
configured the Nutch 1.5 source in Eclipse Juno, and it crawls the data to my
file system as expected. Then I wrote some code to direct the crawled
data to my DB.

I added the code to the Nutch source and added the required libraries to
the build path. But at build time, the build cannot find the packages from
my libraries or the Hadoop packages.

I placed my jars/libraries in NUTCH_HOME/lib, as this is used by build.xml
for compiling.

The build reports compile errors; however, when I made the changes in the
Nutch source, no errors were shown.

Kindly let me know what I am missing.
-- 
More Prashant

Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by Julien Nioche <li...@gmail.com>.
Alternatively, use https://issues.apache.org/jira/browse/NUTCH-1047 and write
your own indexing backend. That's exactly what NUTCH-1047 is for.

-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble

Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by feng lu <am...@gmail.com>.
Hi Prashant,

I think the fastest method is to use Nutch 2.1, as Tejas says. It lets you
plug in your own back-end DB through Apache Gora, but it currently only
supports HBase, Cassandra, etc.

If you want to modify the Nutch 1.x source code to meet your needs, look at
the ParseOutputFormat class. It is used to output the parsed data, including
content, outlinks, metadata, etc. You can implement your own
ParseOutputFormat to direct that information to your DB.

I still do not recommend modifying the source code, though.


-- 
Don't Grow Old, Grow Up... :-)
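
For the database side of the suggestion above, a minimal sketch might look like the following. It assumes plain JDBC and a hypothetical table crawled_page(url, content, metadata); the class name, table, and columns are illustrative and not part of Nutch. How the URL, content, and metadata are obtained from a custom ParseOutputFormat is not shown here.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper, not part of Nutch. Table and column names are assumptions. */
final class CrawlRowWriter {

    // Assumed target table: crawled_page(url, content, metadata).
    static final String INSERT_SQL =
        "INSERT INTO crawled_page (url, content, metadata) VALUES (?, ?, ?)";

    /** Flattens metadata into "key=value" lines, sorted by key for stable output. */
    static String flattenMetadata(Map<String, String> metadata) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> e : new TreeMap<>(metadata).entrySet()) {
            sb.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        return sb.toString();
    }

    /** Inserts one page with a parameterized statement; the caller owns the connection. */
    static void insert(Connection conn, String url, String content,
                       Map<String, String> metadata) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, url);
            ps.setString(2, content);
            ps.setString(3, flattenMetadata(metadata));
            ps.executeUpdate();
        }
    }
}
```

Using a PreparedStatement keeps crawled text out of the SQL string itself, which matters because page content can contain quotes and other special characters.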

Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by "Prashant More (प्रशांत मोरे)" <mo...@gmail.com>.
Thank you Tejas.

Your tips helped a lot.

One more thing: after building, the plugin.folders property should point
to build/plugins for executing the crawl.

Now it is crawling fine. My concern is to locate the object that holds the
content and its metadata, so that I can capture it and direct it to my DB, as
mentioned earlier. How can I do that?

Thanks,

--
Prashant More
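
As a quick sanity check of the plugin folder setting mentioned above, something like the sketch below could list which subdirectories of a folder such as build/plugins contain a plugin.xml descriptor. The class name is hypothetical and this is not a Nutch tool; it only inspects the file system.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical diagnostic, not part of Nutch. */
final class PluginFolderCheck {

    /** Returns the sorted names of subdirectories of pluginRoot that contain plugin.xml. */
    static List<String> pluginDirs(File pluginRoot) {
        List<String> names = new ArrayList<>();
        File[] children = pluginRoot.listFiles(File::isDirectory);
        if (children == null) {
            return names; // pluginRoot is missing or is not a directory
        }
        for (File child : children) {
            if (new File(child, "plugin.xml").isFile()) {
                names.add(child.getName());
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) {
        // Pass the plugin folder as the first argument; build/plugins is the default.
        File root = new File(args.length > 0 ? args[0] : "build/plugins");
        System.out.println(root + " -> " + pluginDirs(root));
    }
}
```

An empty list would suggest that the folder passed in is not the one the build actually populated.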


Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by Tejas Patil <te...@gmail.com>.
On Wed, Feb 6, 2013 at 9:23 PM, Prashant More (प्रशांत मोरे) <
morepj@gmail.com> wrote:

> Thank you Tejas.
> I have added all the libraries/jars mentioned in [1], along with my source
> jar and other required jars to the classpath. The difference between the
> bin/nutch script and the tutorial [1] is adding java's tools.jar in the
> script, and not adding nutch's build directory in eclipse as we want to use
> the source for building nutch.

Ok.


> I have added the tools.jar and instead of
> build directory, I have added nutch's java source to the classpath.
>
> [1] http://wiki.apache.org/nutch/RunNutchInEclipse
>
> Still it is giving the same error.
>
What is the name of the package that you are adding: is it
org.apache.nutch.XXXX or something else?
How do you compile the code in Eclipse: by running the Ant build file or some
other way?
These are the relevant chunks in build.xml [1] that might help you: lines
86-100 and 455-460.
If you are running the Ant build file, try to print the classpath formed in the
compile-core target ([2] tells how to do that).
There are 2 possibilities:
1. The extra jars you added are not in the classpath: in this case, you can
debug the "copy-libs" target and check what is getting copied.
2. The extra jars you added are in the classpath and yet you see
compilation errors: this would be strange, but it points towards an Eclipse +
Ant issue and probably has nothing to do with Nutch.

[1] : http://svn.apache.org/viewvc/nutch/trunk/build.xml?view=markup
[2] : http://www.javalobby.org/java/forums/t71033.html
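
To help tell the two possibilities apart, one could check which jars in a directory actually contain the class the compiler cannot find. The following is a minimal sketch under that assumption; JarClassFinder is a hypothetical helper, not part of Nutch or Ant, and the directory and class name are whatever you pass in.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.jar.JarFile;

/** Hypothetical diagnostic, not part of Nutch or Ant. */
final class JarClassFinder {

    /** Converts "com.example.Foo" to the jar entry name "com/example/Foo.class". */
    static String entryName(String className) {
        return className.replace('.', '/') + ".class";
    }

    /** Returns the jars directly under libDir that contain the given class. */
    static List<File> jarsContaining(File libDir, String className) throws IOException {
        List<File> hits = new ArrayList<>();
        File[] jars = libDir.listFiles((dir, name) -> name.endsWith(".jar"));
        if (jars == null) {
            return hits; // libDir is missing or is not a directory
        }
        for (File f : jars) {
            try (JarFile jar = new JarFile(f)) {
                if (jar.getEntry(entryName(className)) != null) {
                    hits.add(f);
                }
            }
        }
        return hits;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: JarClassFinder <lib-dir> <fully.qualified.ClassName>");
            return;
        }
        for (File f : jarsContaining(new File(args[0]), args[1])) {
            System.out.println(f.getPath());
        }
    }
}
```

If the class turns up in a jar under the lib directory but the build still fails, that would point more towards possibility 2 than possibility 1.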


Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by "Prashant More (प्रशांत मोरे)" <mo...@gmail.com>.
Thank you Tejas.
I have added all the libraries/jars mentioned in [1], along with my source
jar and other required jars, to the classpath. The differences between the
bin/nutch script and the tutorial [1] are that the script adds Java's
tools.jar, and that Nutch's build directory is not added in Eclipse, as we want
to use the source for building Nutch. I have added tools.jar and, instead of
the build directory, I have added Nutch's Java source to the classpath.

[1] http://wiki.apache.org/nutch/RunNutchInEclipse

Still it is giving the same error.

Thanks,
Prashant More

Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by Tejas Patil <te...@gmail.com>.
If you look at the bin/nutch script, there are a lot of things that are
added to the classpath (CP) before the actual Nutch class is invoked. Looking
at the script will give you a hint about what is missing. Also, beware of your
package naming. The build script looks for source files only in specific
places, e.g.
includes="org/apache/nutch/**/*.java"
Tweaking the build file or placing your classes in the right place might help
you here.

thanks,
Tejas Patil
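
To illustrate the package-naming point above, here is a small sketch that approximates the quoted includes pattern for this one case only. It is not Ant's own matcher: it relies on the fact that, in Ant patterns, ** matches zero or more directories, so for this pattern a prefix check plus a suffix check is enough. The class name and the example paths are hypothetical.

```java
/** Hypothetical helper approximating a single Ant pattern; not Ant's own matcher. */
final class CoreSourcePattern {

    // The pattern quoted from build.xml in the message above.
    static final String PATTERN = "org/apache/nutch/**/*.java";

    // True if a path relative to the source root would be picked up by PATTERN.
    // "**" may match zero or more directories, so any .java file at or below
    // org/apache/nutch/ qualifies.
    static boolean matches(String relativePath) {
        String p = relativePath.replace('\\', '/');
        return p.startsWith("org/apache/nutch/") && p.endsWith(".java");
    }

    public static void main(String[] args) {
        // Hypothetical file names, for illustration only.
        System.out.println(matches("org/apache/nutch/mydb/MySqlWriter.java")); // true
        System.out.println(matches("com/example/db/MySqlWriter.java"));        // false
    }
}
```

Under this reading, a class placed in a package outside org.apache.nutch would be skipped by the compile step unless the includes attribute in the build file is widened.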



Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by "Prashant More (प्रशांत मोरे)" <mo...@gmail.com>.
Thank you, Tejas.

My DB is already in place for processing. I have configured and used
Nutch 1.0 from a shell script, but I want to configure and modify
Nutch 1.5 using Eclipse. So at present I do not want to use 2.1.

Thanks,
Prashant More

Re: Customizing Nutch 1.5 in Eclipse Juno

Posted by Tejas Patil <te...@gmail.com>.
Have you considered using Nutch 2.x? It has support for doing this. Search
for "nutch 2.x mySQL" to find some good tutorials like [0].

[0] : http://nlp.solutions.asia/?p=180

Thanks,
Tejas Patil

