Posted to dev@nutch.apache.org by Andrew Libby <an...@gmail.com> on 2006/05/01 15:31:38 UTC

A Developer's getting started doc?

Greetings,

I'm learning Nutch, and would like to insert debugging statements to
learn more about how Nutch works.  Specifically, I'm trying to debug
problems I'm having with the subcollections plugin. 

To this end, I'm looking to have a development copy of Nutch running.
Is there a good way to do this?  I'm looking to have the webapp running,
and do crawls of small local sites and then do an edit - compile - run
cycle.

Can anyone offer advice or describe how they go about doing this?

Thanks in advance.

Andy

-- 
Andrew Libby                                  
alibby@philadelphiariders.com
http://philadelphiariders.com/



Re: A Developer's getting started doc?

Posted by Lukas Vlcek <lu...@gmail.com>.
You are right, Thomas. I haven't stated the goal yet. And I agree
that the most important mission is probably the delivery of a stable
release. I'll stop spamming this thread with my complaints because in
fact the real trouble for me is setting up my IDE correctly (btw: I found
Stefan's media-style wiki helpful). That is why I thought that having
Maven's project.xml file would help me a lot. There are a lot of other
issues to focus on right now.

Regards,
Lukas


Re: A Developer's getting started doc?

Posted by TDLN <di...@gmail.com>.
Lukas,

Actually, before proposing any solution, you should identify the
problem. In this case IMO the problem has not been identified; the
build system is fine, the scripts are not really complex and do what
is expected. I can therefore fully understand if the focus of
development is not on replacing Ant with Maven or whatever other build
system, but on delivering a stable release.

Rgrds, Thomas


Re: A Developer's getting started doc?

Posted by Lukas Vlcek <lu...@gmail.com>.
Thanks Thomas,

I took a quick look at Ivy. It looks interesting.
But does it really bring a big simplification over Maven if I need
more advanced stuff? Does it allow Jelly integration? How widely is it
adopted across the open-source community? Is there any up-to-date Ivy
repository apart from the Maven repositories?

I know that these questions shouldn't be discussed on the nutch-dev
mailing list; however, it would really benefit me if
Nutch/Lucene/Hadoop were maintained with some project management system (be
it Ivy, Maven, M2 ...). Ant is good but I believe we could get more...

Regards,
Lukas


Re: A Developer's getting started doc?

Posted by TDLN <di...@gmail.com>.
Hi Lukas,

the .classpath and .project files are in the attached zip. You might
get some errors first, because you still have to download and add some
dependencies yourself (like PDFBox).

I personally am not such a big fan of Maven - IMO it adds a lot of
overhead for just small benefits. One thing I do like about it is that
it makes it possible to manage your dependencies in a clean way. But this
can also be achieved by a lightweight open source (BSD license) add-on
to Ant like Ivy Dependency Management
(http://www.jayasoft.fr/org/modules/ivy/overview.php). It has the
benefits of Maven, without the overhead and learning curve involved.

Rgrds. Thomas




Re: A Developer's getting started doc?

Posted by Lukas Vlcek <lu...@gmail.com>.
Thomas,

I would really appreciate your .classpath and .project files for
Eclipse (for Nutch-trunk). Could you send them to me? Or could you
upload them somewhere?

I don't think I am a novice in terms of Eclipse but frankly I am too lazy
to configure all these settings manually. I do use Maven all the time
for my own projects. I think somebody already noted that using Maven
for Nutch/Lucene/Hadoop would be highly appreciated. IMHO Maven is a
good investment (and thanks to mavenide Eclipse could learn where the
sources are, where the tests are, what libs to use [they wouldn't need to
be part of the SVN repository anymore] ...).

Is there any plan to migrate to Maven? I can participate if anybody is
interested.

Regards,
Lukas


Re: A Developer's getting started doc?

Posted by TDLN <di...@gmail.com>.
Hi Andrew,

you can either get one of the distributions, a nightly build, or check
out directly from SVN to get the sources.

Then I would suggest checking the targets in the ant build file; there
are targets for compiling, cleaning and testing. Use 'ant tar' to make
a release tarball that you can deploy in your sandbox. Add the bin
directory to your path and off you go.

BTW: Nutch uses JDK 1.4 logging - changing the default log level from
INFO to FINE already gives much more information.
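
As a rough illustration of the log level change mentioned above, here is a
minimal sketch using the standard java.util.logging API. It is not taken
from the Nutch sources: the class name, the helper method and the logger
name "org.example.debug" are hypothetical. It assumes the default setup,
where a console handler attached to the root logger only passes INFO and
above, so the handler level has to be lowered as well.

```java
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.Logger;

class FineLoggingSketch {

    /** Lowers the named logger and all root handlers to FINE (hypothetical helper). */
    static Logger enableFine(String loggerName) {
        Logger logger = Logger.getLogger(loggerName);
        logger.setLevel(Level.FINE);
        // Handlers filter records too; the default console handler stops at INFO.
        Handler[] handlers = Logger.getLogger("").getHandlers();
        for (int i = 0; i < handlers.length; i++) {
            handlers[i].setLevel(Level.FINE);
        }
        return logger;
    }

    public static void main(String[] args) {
        // "org.example.debug" is a placeholder, not a real Nutch logger name.
        Logger log = enableFine("org.example.debug");
        log.fine("a debugging statement that is hidden at the INFO level");
        log.info("an ordinary informational message");
    }
}
```

The same effect can usually be had without code changes by setting
".level = FINE" and "java.util.logging.ConsoleHandler.level = FINE" in the
logging.properties file that the JVM reads.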

If you would like to use Eclipse to mount the sources, just let me know; I
can send you the required .classpath and .project files.

HTH, Thomas
