Posted to dev@lucene.apache.org by David Worms <da...@simpledesign.com> on 2003/01/28 00:05:38 UTC
Avalonized WebCrawler
Lucene developers,
This mail follows a few threads which took place 2-3 months ago on both
the Lucene and Avalon lists:
http://marc.theaimsgroup.com/?l=lucene-dev&m=101518595918785&w=2
http://marc.theaimsgroup.com/?l=avalon-users&m=103706452017829&w=2
They were about porting the WebCrawler app to a component-based
application using Avalon. During the past few days, I did just that and
I will be happy to share the code with the community. There is still a
lot to do, but my goal was to contact you once the code reached a
level of development similar to the one in CVS. I did not contact the
list before because I wasn't sure where I was going :), and because I
do not have CVS access at Apache.
You can download the code @ http://67.116.155.180/~wdavidw/crawler.zip
Both the sources and binaries are present. On my local environment, I
use Maven as the build system. It isn't included in the download because
some of the jars I used are recent CVS snapshots not present on the
Maven remote repository (ibiblio.org). If I am not mistaken, all the
required libraries are present in the zip file.
Overall, the code behaves just like the present crawler hosted in the
Lucene Sandbox repository. Since I mostly refactored this code base, it
will be quite easy for the developer(s) to find out what happens. All
the comments, methods, etc. remain the same. I only changed the most
relevant parts. You will find the code divided into 2 packages, the
original package "de.lanlab.*" and the new one "org.crawl.*". The
reason behind this separation is that every time I created a new
component, I moved its code into the second package for clarity.
As the Avalon container, I chose Fortress. It is a stable and almost
released container (a matter of weeks). I am seriously thinking about
Merlin, but it is not a priority for now.
Here is a list of the created components/services:
fetcher-task-factory
host-manager
host-resolver
url-message-factory
web-document-factory
message-handler
message-listener-selector
. url-length-stage
. url-scope-stage
. robot-exclusion-stage
. url-visited-stage
. known-path-stage
. fetcher-stage
storage-pipeline
thread-monitor
fetcher-thread-factory
server-thread-factory
url-normalizer
url-visited-manager
one more to appear: thread-pool-manager
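To make the stage entries in the list above more concrete, here is a
minimal sketch of how a chain of URL filter stages could be wired
together. It is plain Java and uses neither the Avalon nor the crawler
classes; the names UrlStage, UrlLengthStage, UrlVisitedStage and
Pipeline are hypothetical and only loosely mirror the component names
above.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StagePipelineSketch {

    /** Hypothetical stage contract: return the URL to pass on, or null to drop it. */
    interface UrlStage {
        String handle(String url);
    }

    /** Drops URLs longer than a limit (loosely mirrors "url-length-stage"). */
    static final class UrlLengthStage implements UrlStage {
        private final int maxLength;

        UrlLengthStage(int maxLength) {
            this.maxLength = maxLength;
        }

        @Override
        public String handle(String url) {
            return url.length() <= maxLength ? url : null;
        }
    }

    /** Drops URLs that were seen before (loosely mirrors "url-visited-stage"). */
    static final class UrlVisitedStage implements UrlStage {
        private final Set<String> visited = new HashSet<>();

        @Override
        public String handle(String url) {
            // Set.add returns false when the URL was already present.
            return visited.add(url) ? url : null;
        }
    }

    /** Runs a URL through the stages in order and stops at the first rejection. */
    static final class Pipeline {
        private final List<UrlStage> stages = new ArrayList<>();

        Pipeline add(UrlStage stage) {
            stages.add(stage);
            return this;
        }

        String process(String url) {
            String current = url;
            for (UrlStage stage : stages) {
                current = stage.handle(current);
                if (current == null) {
                    return null; // a stage rejected the URL
                }
            }
            return current;
        }
    }

    public static void main(String[] args) {
        Pipeline pipeline = new Pipeline()
                .add(new UrlLengthStage(40))
                .add(new UrlVisitedStage());
        System.out.println(pipeline.process("http://example.org/")); // first visit
        System.out.println(pipeline.process("http://example.org/")); // repeat visit
    }
}
```

Keeping every stage behind the same small interface is what would let a
container assemble or reorder the chain without touching the stages.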
Configuration:
At this time, every config property is hard-coded in the component
class. It will be a fast and easy task to integrate the config file
because the components already implement the Avalon configuration
lifecycle.
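As an illustration of the idea, here is a minimal sketch of a component
whose hard-coded default can be overridden through a configuration
hook. In Avalon the hook is Configurable.configure(Configuration); the
SimpleConfigurable interface below is a simplified stand-in built on
java.util.Properties so that the example runs without the Avalon jars.
The "max-length" key and the default of 255 are assumptions.

```java
import java.util.Properties;

final class ConfigLifecycleSketch {

    /** Simplified stand-in for a configuration lifecycle hook (not the Avalon interface). */
    interface SimpleConfigurable {
        void configure(Properties config);
    }

    /** Hypothetical component with a hard-coded default that configure() may override. */
    static final class UrlLengthStage implements SimpleConfigurable {
        private int maxLength = 255; // hard-coded default (assumed value)

        @Override
        public void configure(Properties config) {
            // Keep the hard-coded default when the key is absent.
            String value = config.getProperty("max-length");
            if (value != null) {
                maxLength = Integer.parseInt(value.trim());
            }
        }

        boolean accepts(String url) {
            return url.length() <= maxLength;
        }
    }

    public static void main(String[] args) {
        Properties config = new Properties();
        config.setProperty("max-length", "20");

        UrlLengthStage stage = new UrlLengthStage();
        stage.configure(config); // a container would call this during startup
        System.out.println(stage.accepts("http://example.org/"));
        System.out.println(stage.accepts("http://example.org/a/very/long/path"));
    }
}
```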
Logging:
I had a hard time using the Fortress logging service. For now, only two
loggers are working, one for the Fortress system, the other for the
crawler. Once I understand where the logging issue is coming from,
each component could have its own logger without any code changes.
Integration:
Fortress can easily be plugged into any type of environment or run as a
standalone application. I am planning to write a Phoenix block soon.
Client connection:
The current Observer service will change completely. Instead of printing
information to the console, it will export some sort of application
state descriptor object via AltRMI, or anything else. It will be up to
the client to render that information.
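A state descriptor of this kind could look roughly like the sketch
below: an immutable, serializable snapshot that the crawler exports and
the client renders as it sees fit. The class name CrawlerState and its
three fields are assumptions, and plain Java serialization stands in
for whatever remoting layer (AltRMI or another) ends up being used.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

final class CrawlerStateSketch {

    /** Hypothetical immutable snapshot of the crawler state; the fields are assumptions. */
    static final class CrawlerState implements Serializable {
        private static final long serialVersionUID = 1L;

        final int queuedUrls;
        final int fetchedDocuments;
        final int activeThreads;

        CrawlerState(int queuedUrls, int fetchedDocuments, int activeThreads) {
            this.queuedUrls = queuedUrls;
            this.fetchedDocuments = fetchedDocuments;
            this.activeThreads = activeThreads;
        }
    }

    /** Serializes and deserializes the snapshot, as a remoting layer would on the wire. */
    static CrawlerState roundTrip(CrawlerState state) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(state);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (CrawlerState) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("serialization round trip failed", e);
        }
    }

    /** Client-side rendering: the client decides how to present the data. */
    static String render(CrawlerState s) {
        return "queued=" + s.queuedUrls
                + " fetched=" + s.fetchedDocuments
                + " threads=" + s.activeThreads;
    }

    public static void main(String[] args) {
        CrawlerState received = roundTrip(new CrawlerState(120, 45, 8));
        System.out.println(render(received));
    }
}
```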
Speed:
When running the current code against the Avalonized one, I get very
similar speed results. The only difference is that it takes somewhat
longer for the new one to reach a stable speed (about 15 seconds).
Avalon:
I kept my use of Avalon simple. For now, I didn't want to use all the
tools available. There are a few areas where Avalon could provide more
functionality:
- the lifestyle handler (both in Fortress and Merlin), which could
replace the usage of factories, for example.
- the thread library, which I left out because I didn't want to change
any of the current code.
- the event library, which would reinforce a SEDA architecture.
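SEDA (staged event-driven architecture) connects processing stages
through event queues. The sketch below shows that idea with plain
java.util.concurrent classes rather than the Avalon event library; the
Stage class and the two example stages are hypothetical.

```java
import java.util.Locale;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

final class SedaStageSketch {

    /** One stage: an input queue plus a worker thread that feeds the next queue. */
    static final class Stage<I, O> {
        final BlockingQueue<I> input = new LinkedBlockingQueue<>();
        private final Thread worker;

        Stage(String name, Function<I, O> handler, BlockingQueue<O> output) {
            worker = new Thread(() -> {
                try {
                    while (true) {
                        I event = input.take();         // blocks until an event arrives
                        O result = handler.apply(event);
                        if (result != null) {           // null means "drop the event"
                            output.put(result);
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // stop() was called
                }
            }, name);
            worker.setDaemon(true);
        }

        void start() {
            worker.start();
        }

        void stop() {
            worker.interrupt();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> fetched = new LinkedBlockingQueue<>();
        Stage<String, String> fetcher =
                new Stage<String, String>("fetcher", url -> "fetched:" + url, fetched);
        Stage<String, String> normalizer =
                new Stage<String, String>("normalizer",
                        url -> url.toLowerCase(Locale.ROOT), fetcher.input);
        fetcher.start();
        normalizer.start();

        normalizer.input.put("HTTP://Example.org/");
        System.out.println(fetched.poll(2, TimeUnit.SECONDS));

        normalizer.stop();
        fetcher.stop();
    }
}
```

Each stage owns its queue and its thread, so a slow stage only backs up
its own queue instead of blocking the stages in front of it.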
Javadocs:
No new ones; I kept the existing ones. I will describe every service in
more detail soon, when I finish all the refactoring.
Lucene:
I think Lucene should be separated from the crawler. One could easily
write a service which schedules the crawling process and exports the
results. Then, this service could use those results to create/update a
Lucene index.
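One way to picture this separation is a crawler that only knows an
export interface, with the index behind it. The sketch below is an
assumption-laden illustration: CrawledPage, CrawlResultSink,
CrawlService and InMemoryIndex are hypothetical names, the "web" is an
in-memory map, scheduling is left out, and a map stands in for the
Lucene index so that the example runs on its own.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class CrawlExportSketch {

    /** Hypothetical record of one crawled page. */
    static final class CrawledPage {
        final String url;
        final String text;

        CrawledPage(String url, String text) {
            this.url = url;
            this.text = text;
        }
    }

    /** Hypothetical export contract: the crawler knows only this interface. */
    interface CrawlResultSink {
        void accept(CrawledPage page);
    }

    /** Stand-in for a crawl run: reads pages from an in-memory map, not the network. */
    static final class CrawlService {
        private final Map<String, String> fakeWeb;

        CrawlService(Map<String, String> fakeWeb) {
            this.fakeWeb = fakeWeb;
        }

        void crawl(List<String> seeds, CrawlResultSink sink) {
            for (String url : seeds) {
                String text = fakeWeb.get(url);
                if (text != null) {
                    sink.accept(new CrawledPage(url, text));
                }
            }
        }
    }

    /** Stand-in for an index; a Lucene-backed sink would add or update a document here. */
    static final class InMemoryIndex implements CrawlResultSink {
        private final Map<String, String> documents = new LinkedHashMap<>();

        @Override
        public void accept(CrawledPage page) {
            documents.put(page.url, page.text); // a re-crawl replaces the old entry
        }

        int size() {
            return documents.size();
        }

        String textOf(String url) {
            return documents.get(url);
        }
    }

    public static void main(String[] args) {
        Map<String, String> web = new HashMap<>();
        web.put("http://example.org/", "hello");
        InMemoryIndex index = new InMemoryIndex();
        new CrawlService(web).crawl(
                Arrays.asList("http://example.org/", "http://example.org/missing"), index);
        System.out.println(index.size() + " document(s) indexed");
    }
}
```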
Future:
I am committed to pursuing the development of the crawler. I hope many
current and future developers will follow me. With your consent, I
would likely move this project to SourceForge, but all opinions are
welcome.
David
--
To unsubscribe, e-mail: <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>
Re: Avalonized WebCrawler
Posted by Clemens Marschner <cm...@lanlab.de>.
Great news, this will push us forward!
Will have a look at it immediately (after breakfast, of course! :-)
Clemens
----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>; "Avalon
framework users" <av...@jakarta.apache.org>
Sent: Tuesday, January 28, 2003 12:55 AM
Subject: Re: Avalonized WebCrawler
> Oh, no need to swallow any pride - some of us have been meaning to do
> this.....when we have more time...hah.
> So just a big thank you from us!
>
> Otis
Re: Avalonized WebCrawler
Posted by Otis Gospodnetic <ot...@yahoo.com>.
Oh, no need to swallow any pride - some of us have been meaning to do
this.....when we have more time...hah.
So just a big thank you from us!
Otis
Re: Avalonized WebCrawler
Posted by Paul Hammant <Pa...@yahoo.com>.
David,
Great work. I sure hope the Lucene peeps can swallow (a little) pride
and merge the best bits. It is always difficult receiving a mountain of
changes...
I look forward to using some of the components outside Lucene, and the
whole thing inside Phoenix when you have it ready :-)))
- Paul H
(hammant@apache)
Re: Avalonized WebCrawler
Posted by David Worms <da...@simpledesign.com>.
On Thursday, January 30, 2003, at 06:44 AM, ckirkendall wrote:
> David,
> I have been trying to look at the code. However the source
> organization in the zip does not follow the jar classes. Instead of
> "org.crawl.*" the files are located in "com.celavi.*".
I refactored the code before sending it to the list. Other than that, it
is the same.
> Also I have not
> been able to get anything other than google to run. It looks like this
> is hardcoded into your Main.java. I may just be a little confused on how
> to use this though. Should I be using your Main.java or another class?
>
> Creighton
You are right, everything is hard-coded... but ready to be configured.
I wanted first to get a working prototype, send it to the list, and see
where the code could go. I'll be working on the code for the next 2
weeks and will then send something a little more useful. Otherwise, you
should be able to create an Ant script (or Maven...) and compile it, run
it, hack it.
David
---------------------------------------------------------------------
To unsubscribe, e-mail: lucene-dev-unsubscribe@jakarta.apache.org
For additional commands, e-mail: lucene-dev-help@jakarta.apache.org
Re: Avalonized WebCrawler
Posted by ckirkendall <ck...@hobsons-us.com>.
David,
I have been trying to look at the code. However, the source
organization in the zip does not match the classes in the jar. Instead of
"org.crawl.*", the files are located in "com.celavi.*". Also, I have not
been able to get anything other than Google to run. It looks like this
is hard-coded into your Main.java. I may just be a little confused about
how to use this, though. Should I be using your Main.java or another class?
Creighton
On Tue, 2003-01-28 at 16:40, David Worms wrote:
>
> On Tuesday, January 28, 2003, at 07:48 AM, Otis Gospodnetic wrote:
>
> > David,
> >
> > I haven't had the chance to look at the code yet, but I wanted to
> > support what Clemens said (before you start a project on SF).
> > Please give us some time to evaluate this.
> ...
> >>
> >> I could imagine using lucene or lucene-sandbox as repository
> >>
>
> It was just a proposal, I leave the choice to the Lucene community.
>
> > just one thing, while I'm reading the code: Have you had a look at our
> > thoughts here:
> > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/
>
> By the way, thanks Clemens, I just read the documents. I now
> understand the big picture better and it explains why Lucene is a good
> place to host the project.
>
> David
>
Re: Avalonized WebCrawler
Posted by David Worms <da...@simpledesign.com>.
On Tuesday, January 28, 2003, at 07:48 AM, Otis Gospodnetic wrote:
> David,
>
> I haven't had the chance to look at the code yet, but I wanted to
> support what Clemens said (before you start a project on SF).
> Please give us some time to evaluate this.
...
>>
>> I could imagine using lucene or lucene-sandbox as repository
>>
It was just a proposal, I leave the choice to the Lucene community.
> just one thing, while I'm reading the code: Have you had a look at our
> thoughts here:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/
By the way, thanks Clemens, I just read the documents. I now
understand the big picture better and it explains why Lucene is a good
place to host the project.
David
Re: Avalonized WebCrawler
Posted by Otis Gospodnetic <ot...@yahoo.com>.
David,
I haven't had the chance to look at the code yet, but I wanted to
support what Clemens said (before you start a project on SF).
Please give us some time to evaluate this.
I think the Lucene user community would benefit if things were under a
single Jakarta/Lucene roof, too.
Give us some time to have a look at your work and see if it's suitable
for Lucene Sandbox... a few seconds pass... I just skimmed your
email and my feeling is that Lucene Sandbox would be the place to put
your work, if you don't mind us doing that.
Thanks again.
Otis
Re: Avalonized WebCrawler
Posted by Clemens Marschner <cm...@lanlab.de>.
David,
just one thing, while I'm reading the code: Have you had a look at our
thoughts here:
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/
?
> Future:
> I am committed to pursuing the development of the crawler. I hope many
> current and future developers will follow me.
That's great news!
> With your consent, I
> would likely move this project to SourceForge, but all opinions are
> welcome.
I could imagine using lucene or lucene-sandbox as repository
Clemens