
Posted to dev@lucene.apache.org by David Worms <da...@simpledesign.com> on 2003/01/28 00:05:38 UTC

Avalonized WebCrawler

Lucene developers,

This mail follows up on a few threads that took place 2-3 months ago on
both the Lucene and Avalon lists:

http://marc.theaimsgroup.com/?l=lucene-dev&m=101518595918785&w=2
http://marc.theaimsgroup.com/?l=avalon-users&m=103706452017829&w=2

They were about porting the WebCrawler app to a component-based
application using Avalon. Over the past few days I did just that, and
I will be happy to share the code with the community. There is still a
lot to do, but my goal was to contact you once the code reached a level
of development similar to the one in CVS. I did not contact the list
earlier because I wasn't sure where I was going :), and because I do not
have CVS access at Apache.

You can download the code @ http://67.116.155.180/~wdavidw/crawler.zip

Both the sources and binaries are included. In my local environment, I
use Maven as the build system. The Maven build isn't included in the
download because some of the jars I used are recent CVS snapshots that
are not present in the Maven remote repository (ibiblio.org). If I am
not mistaken, all the required libraries are present in the zip file.

Overall, the code behaves just like the current crawler hosted in the
Lucene Sandbox repository. Since I mostly refactored this code base, it
should be quite easy for the developer(s) to follow what happens. All
the comments, methods, etc. remain the same; I only changed the most
relevant parts. You will find the code divided into two packages, the
original package "de.lanlab.*" and the new one "org.crawl.*". The reason
for this separation is that every time I created a new component, I
moved its code into the second package for clarity.

As the Avalon container, I chose Fortress. It is stable and close to a
release (a matter of weeks). I am seriously considering Merlin, but it
is not a priority for now.

Here is a list of the created components/services:

fetcher-task-factory
host-manager
host-resolver
url-message-factory
web-document-factory
message-handler
message-listener-selector
  . url-length-stage
  . url-scope-stage
  . robot-exclusion-stage
  . url-visited-stage
  . known-path-stage
  . fetcher-stage
storage-pipeline
thread-monitor
fetcher-thread-factory
server-thread-factory
url-normalizer
url-visited-manager
one more to appear: thread-pool-manager
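The message-listener-selector and its nested stages read like a filter chain: each URL message passes through the stages in order, and any stage may drop it. Below is a minimal sketch of that idea in plain Java. The `UrlMessage`, `Stage` and `MessagePipeline` types are hypothetical and are not the crawler's actual classes; the two stages only mimic the names url-length-stage and url-visited-stage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical message type; the crawler's real URL message class is not shown here.
final class UrlMessage {
    final String url;
    UrlMessage(String url) { this.url = url; }
}

// Hypothetical stage contract: return the (possibly modified) message, or null to drop it.
interface Stage {
    UrlMessage handle(UrlMessage msg);
}

// Drops URLs longer than a limit (mimics "url-length-stage").
final class UrlLengthStage implements Stage {
    private final int maxLength;
    UrlLengthStage(int maxLength) { this.maxLength = maxLength; }
    public UrlMessage handle(UrlMessage msg) {
        return msg.url.length() <= maxLength ? msg : null;
    }
}

// Drops URLs that were already seen (mimics "url-visited-stage").
final class UrlVisitedStage implements Stage {
    private final Set<String> visited = new HashSet<>();
    public UrlMessage handle(UrlMessage msg) {
        // Set.add returns false when the URL is already present.
        return visited.add(msg.url) ? msg : null;
    }
}

// Runs a message through the stages in order, stopping at the first stage that drops it.
final class MessagePipeline {
    private final List<Stage> stages = new ArrayList<>();
    MessagePipeline add(Stage s) { stages.add(s); return this; }
    boolean process(UrlMessage msg) {
        for (Stage s : stages) {
            msg = s.handle(msg);
            if (msg == null) return false;
        }
        return true;
    }
}

public class PipelineDemo {
    public static void main(String[] args) {
        MessagePipeline p = new MessagePipeline()
                .add(new UrlLengthStage(40))
                .add(new UrlVisitedStage());
        System.out.println(p.process(new UrlMessage("http://example.com/a"))); // true
        System.out.println(p.process(new UrlMessage("http://example.com/a"))); // false: already visited
    }
}
```

Stage order matters in this sketch: a URL rejected by the length check never reaches the visited set, so cheap filters placed first also keep the later stages' state small.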

Configuration:
At this time, every config property is hard-coded in the component
class. Integrating a config file should be fast and easy, because the
components already implement the Avalon configuration lifecycle.
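Since the defaults currently live in the component classes, moving them to a config file mostly means letting a configure step override those fields before the component starts. A minimal sketch, assuming a hypothetical `SimpleConfigurable` interface and a `Map<String, String>` as stdlib stand-ins for Avalon's `Configurable` interface and `Configuration` object (in the real framework, `configure()` receives a `Configuration` tree). The `FetcherSettings` class, its property names and its default values are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for Avalon's Configurable lifecycle interface (hypothetical, stdlib-only).
interface SimpleConfigurable {
    void configure(Map<String, String> config);
}

// Hypothetical component: defaults stay in the class, config values override them.
final class FetcherSettings implements SimpleConfigurable {
    // Hard-coded defaults (illustrative values, not the crawler's real ones).
    private int threadCount = 10;
    private int maxUrlLength = 255;

    public void configure(Map<String, String> config) {
        threadCount = intOrDefault(config, "thread-count", threadCount);
        maxUrlLength = intOrDefault(config, "max-url-length", maxUrlLength);
    }

    // Keeps the current default when the key is absent.
    private static int intOrDefault(Map<String, String> config, String key, int def) {
        String v = config.get(key);
        return v == null ? def : Integer.parseInt(v.trim());
    }

    int threadCount() { return threadCount; }
    int maxUrlLength() { return maxUrlLength; }
}

public class ConfigDemo {
    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("thread-count", "25");
        FetcherSettings s = new FetcherSettings();
        s.configure(cfg);
        System.out.println(s.threadCount() + " " + s.maxUrlLength()); // 25 255
    }
}
```

Keeping the hard-coded values as field defaults means a missing config file still yields today's behaviour.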

Logging:
I had a hard time with the Fortress logging service. For now, only two
loggers are working: one for the Fortress system, the other for the
crawler. Once I understand where the logging issue is coming from, each
component should be able to have its own logger without any code changes.
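Per-component logging without code changes works because, in Avalon, a component implementing `LogEnabled` receives its logger from the container through `enableLogging()`. Below is a sketch of that idea using `java.util.logging`. The `LogAware` interface, the `HostManager` component and the `ToyContainer` are hypothetical stand-ins; Fortress's actual logger configuration is not shown.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Logger;

// Stand-in for Avalon's LogEnabled: the container hands the logger in.
interface LogAware {
    void enableLogging(Logger logger);
}

// Hypothetical component that uses whichever logger it was given.
final class HostManager implements LogAware {
    private Logger log;
    public void enableLogging(Logger logger) { this.log = logger; }
    void resolve(String host) { log.fine("resolving " + host); }
    String loggerName() { return log.getName(); }
}

// Toy container: one logger per role, named "<root>.<role>", so each component
// gets its own logger without the component code choosing the name.
final class ToyContainer {
    private final String root;
    private final Map<String, Object> components = new LinkedHashMap<>();
    ToyContainer(String root) { this.root = root; }

    void register(String role, Object component) {
        if (component instanceof LogAware) {
            ((LogAware) component).enableLogging(Logger.getLogger(root + "." + role));
        }
        components.put(role, component);
    }

    Object lookup(String role) { return components.get(role); }
}

public class LoggingDemo {
    public static void main(String[] args) {
        ToyContainer c = new ToyContainer("crawler");
        HostManager hm = new HostManager();
        c.register("host-manager", hm);
        hm.resolve("example.com");
        System.out.println(hm.loggerName()); // crawler.host-manager
    }
}
```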

Integration:
Fortress can easily be plugged into any kind of environment or run as a
standalone application. I am planning to write a Phoenix block soon.

Client connection:
The current Observer service will change completely. Instead of printing
information to the console, it will export some sort of application
state descriptor object via AltRMI, or any other mechanism. It will be up
to the client to render that information.
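Exporting a state object instead of printing can be sketched as an immutable, `Serializable` snapshot that any remoting layer (AltRMI, plain RMI, or something else) could ship to a client, which then decides how to render it. The `CrawlerState` class and its fields are hypothetical and are not part of the existing Observer service; the round trip through object streams only imitates what a transport would do.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical immutable snapshot of crawler state; it carries data, not presentation.
final class CrawlerState implements Serializable {
    private static final long serialVersionUID = 1L;
    final long timestampMillis;
    final int activeThreads;
    final long urlsQueued;
    final long documentsFetched;

    CrawlerState(long timestampMillis, int activeThreads, long urlsQueued, long documentsFetched) {
        this.timestampMillis = timestampMillis;
        this.activeThreads = activeThreads;
        this.urlsQueued = urlsQueued;
        this.documentsFetched = documentsFetched;
    }
}

public class StateDemo {
    // Serialize and deserialize, as a remoting transport would do on the wire.
    static CrawlerState roundTrip(CrawlerState s) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(s);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (CrawlerState) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CrawlerState copy = roundTrip(new CrawlerState(0L, 8, 1200L, 340L));
        // The client, not the crawler, chooses how to present the snapshot.
        System.out.printf("threads=%d queued=%d fetched=%d%n",
                copy.activeThreads, copy.urlsQueued, copy.documentsFetched);
    }
}
```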

Speed:
When running the current code against the Avalonized one, I get very
similar speed results. The only difference is that the new one takes
somewhat longer to reach a stable speed (about 15 seconds).

Avalon:
So far I have kept my use of Avalon simple; for now, I didn't want to
use all the tools available. There are a few areas where Avalon could
provide more functionality:
- the lifestyle handler (in both Fortress and Merlin), which could
replace the use of factories, for example;
- the thread library, which I have not adopted yet because I didn't want
to change any of the current code;
- the event library, which would reinforce a SEDA architecture.
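In a SEDA (staged event-driven architecture) design, stages are decoupled by event queues and each stage runs on its own thread(s), so a slow stage such as fetching backs up its queue instead of blocking its callers. Below is a minimal two-stage sketch built on `java.util.concurrent`. It is not Avalon's event library, whose API is not shown here; the stage names and the placeholder "fetch" work are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

public class SedaDemo {
    // Sentinel telling the downstream stage that no more events will arrive.
    static final String POISON = "__END__";

    public static List<String> run(List<String> urls) throws InterruptedException {
        BlockingQueue<String> fetchQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> storeQueue = new LinkedBlockingQueue<>();
        List<String> stored = new ArrayList<>();

        // Stage 1: "fetch" each URL (placeholder work) and enqueue the result for stage 2.
        Thread fetcher = new Thread(() -> {
            try {
                while (true) {
                    String url = fetchQueue.take();
                    if (url.equals(POISON)) { storeQueue.put(POISON); return; }
                    storeQueue.put("doc:" + url);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: "store" documents; only this thread writes to the list.
        Thread storer = new Thread(() -> {
            try {
                while (true) {
                    String doc = storeQueue.take();
                    if (doc.equals(POISON)) return;
                    stored.add(doc);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        fetcher.start();
        storer.start();
        for (String u : urls) fetchQueue.put(u);
        fetchQueue.put(POISON);
        fetcher.join();
        storer.join(); // join() makes the storer's writes visible to this thread
        return stored;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run(Arrays.asList("http://a/", "http://b/")));
    }
}
```

With one thread per stage and FIFO queues, results keep the input order; adding more threads to a stage would raise throughput but give up that ordering.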

Javadocs:
None yet; I kept the existing ones. I will describe every service in
more detail soon, when I finish the refactoring.

Lucene:
I think Lucene should be separated from the crawler. One could easily
write a service that schedules crawling processes and exports the
results. That service could then use those results to create or update
a Lucene index.
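The separation can be expressed as a narrow interface between the two sides: the crawler only pushes fetched documents to a sink, and an indexing service consumes them on its own terms. This is a sketch under stated assumptions: `CrawlResult`, `ResultSink` and `InMemoryIndexer` are hypothetical names, and the indexer builds a toy inverted index in a `Map` where a real implementation would write to a Lucene index (for example through Lucene's `IndexWriter`).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical result exported by the crawler; no Lucene types involved.
final class CrawlResult {
    final String url;
    final String text;
    CrawlResult(String url, String text) { this.url = url; this.text = text; }
}

// The only thing the crawler would depend on.
interface ResultSink {
    void accept(CrawlResult result);
}

// Stand-in indexer: a toy inverted index (term -> URLs) instead of a Lucene index.
final class InMemoryIndexer implements ResultSink {
    private final Map<String, Set<String>> postings = new TreeMap<>();

    public void accept(CrawlResult r) {
        for (String term : r.text.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (term.isEmpty()) continue; // split can yield a leading empty token
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(r.url);
        }
    }

    Set<String> search(String term) {
        return postings.getOrDefault(term.toLowerCase(Locale.ROOT), new TreeSet<>());
    }
}

public class DecoupledDemo {
    public static void main(String[] args) {
        List<CrawlResult> crawled = new ArrayList<>();
        crawled.add(new CrawlResult("http://a/", "Lucene is a search library"));
        crawled.add(new CrawlResult("http://b/", "A web crawler fetches pages"));

        InMemoryIndexer indexer = new InMemoryIndexer();
        ResultSink sink = indexer; // the crawler side only sees this interface
        for (CrawlResult r : crawled) sink.accept(r);

        System.out.println(indexer.search("lucene")); // [http://a/]
        System.out.println(indexer.search("a"));      // [http://a/, http://b/]
    }
}
```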

Future:
I am committed to pursuing the development of the crawler. I hope many
current and future developers will follow me. With your consent, I
would like to move this project to SourceForge, but all opinions are
welcome.

David


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Avalonized WebCrawler

Posted by Clemens Marschner <cm...@lanlab.de>.

Great news, this will push us forward!

Will have a look at it immediately (after breakfast, of course! :-)

Clemens

----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>; "Avalon
framework users" <av...@jakarta.apache.org>
Sent: Tuesday, January 28, 2003 12:55 AM
Subject: Re: Avalonized WebCrawler


> Oh, no need to swallow any pride - some of us have been meaning to do
> this.....when we have more time...hah.
> So just a big thank you from us!
>
> Otis

Re: Avalonized WebCrawler

Posted by Otis Gospodnetic <ot...@yahoo.com>.

Oh, no need to swallow any pride - some of us have been meaning to do
this.....when we have more time...hah.
So just a big thank you from us!

Otis


--- Paul Hammant <Pa...@yahoo.com> wrote:
> David,
> 
> Great work.  I sure hope the Lucene peeps can swallow (a little) pride
> and merge the best bits.  It is always difficult receiving a mountain of
> changes...
>
> I look forward to using some of the components outside Lucene, and the
> whole thing inside Phoenix when you have it ready :-)))
> 
> - Paul H
> (hammant@apache)

Re: Avalonized WebCrawler

Posted by Paul Hammant <Pa...@yahoo.com>.

David,

Great work.  I sure hope the Lucene peeps can swallow (a little) pride 
and merge the best bits.  It is always difficult receiving a mountain of 
changes...

I look forward to using some of the components outside Lucene, and the 
whole thing inside Phoenix when you have it ready :-)))

- Paul H
(hammant@apache)

Re: Avalonized WebCrawler

Posted by David Worms <da...@simpledesign.com>.

On Thursday, January 30, 2003, at 06:44  AM, ckirkendall wrote:

> David,
> 	I have been trying to look at the code.  However the source
> organization in the zip does not follow the jar classes.  Instead of
> "org.crawl.*" the files are located in "com.celavi.*".

I refactored the code before sending it to the list. Other than that, it 
is the same.

> Also I have not
> been able to get anything other than google to run.  It looks like this
> is hardcoded into your Main.java. I may just be a little confused on how
> to use this though.  Should I be using your Main.java or another class?
>
> Creighton

You are right, everything is hard-coded... but ready to be configured.
I first wanted to get a working prototype, send it to the list, and see
where the code could go. I'll be working on the code for the next 2
weeks and will then send something a little more useful. In the
meantime, you should be able to create an Ant script (or a Maven
project) to compile it, run it, and hack it.
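Until the configuration is wired in, one low-effort way to try sites other than the hard-coded one is to read the seed URL from the command line and fall back to a default. This is a sketch only: the `SeedArgs` class is hypothetical, `http://www.example.com/` is a placeholder default rather than the URL actually hard-coded in the zip's Main.java, and handing the seed to the crawler is left out because that entry point is not described in this thread.

```java
import java.net.URI;
import java.net.URISyntaxException;

public class SeedArgs {
    // Placeholder default; the real Main.java hard-codes its own start URL.
    static final String DEFAULT_SEED = "http://www.example.com/";

    // Returns the first command-line argument as a URI, or the default if none was given.
    static URI seedFrom(String[] args) throws URISyntaxException {
        String raw = args.length > 0 ? args[0] : DEFAULT_SEED;
        URI uri = new URI(raw);
        if (uri.getHost() == null) {
            throw new URISyntaxException(raw, "seed URL needs a host");
        }
        return uri;
    }

    public static void main(String[] args) throws URISyntaxException {
        URI seed = seedFrom(args);
        System.out.println("Starting crawl at " + seed + " (host: " + seed.getHost() + ")");
        // A real Main would now pass `seed` to the crawler's start-up code.
    }
}
```

Usage would then be along the lines of `java SeedArgs http://jakarta.apache.org/`, with the default used when no argument is given.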

David

Re: Avalonized WebCrawler

Posted by ckirkendall <ck...@hobsons-us.com>.

David,
	I have been trying to look at the code.  However, the source
organization in the zip does not match the jar classes.  Instead of
"org.crawl.*" the files are located in "com.celavi.*".  Also I have not
been able to get anything other than google to run.  It looks like this
is hardcoded into your Main.java. I may just be a little confused on how
to use this though.  Should I be using your Main.java or another class?

Creighton 


On Tue, 2003-01-28 at 16:40, David Worms wrote:
> 
> On Tuesday, January 28, 2003, at 07:48  AM, Otis Gospodnetic wrote:
> 
> > David,
> >
> > I haven't had the chance to look at the code yet, but I wanted to
> > support what Clemens said (before you start a project on SF).
> > Please give us some time to evaluate this.
> ...
> >>
> >> I could imagine using lucene or lucene-sandbox as repository
> >>
> 
> It was just a proposal, I leave the choice to the Lucene community.
> 
> > just one thing, while I'm reading the code: Have you had a look at our
> > thoughts here:
> > http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/
> 
> By the way, thanks Clemens, I just read the documents. I now
> understand the big picture better, and it explains why Lucene is a good
> place to host the project.
> 
> David

Re: Avalonized WebCrawler

Posted by David Worms <da...@simpledesign.com>.

On Tuesday, January 28, 2003, at 07:48  AM, Otis Gospodnetic wrote:

> David,
>
> I haven't had the chance to look at the code yet, but I wanted to
> support what Clemens said (before you start a project on SF).
> Please give us some time to evaluate this.
...
>>
>> I could imagine using lucene or lucene-sandbox as repository
>>

It was just a proposal, I leave the choice to the Lucene community.

> just one thing, while I'm reading the code: Have you had a look at our
> thoughts here:
> http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/

By the way, thanks Clemens, I just read the documents. I now
understand the big picture better, and it explains why Lucene is a good
place to host the project.

David

Re: Avalonized WebCrawler

Posted by Otis Gospodnetic <ot...@yahoo.com>.

David,

I haven't had the chance to look at the code yet, but I wanted to
support what Clemens said (before you start a project on SF).
Please give us some time to evaluate this.
I think the Lucene user community would benefit if things were under a
single Jakarta/Lucene roof, too.
Give us some time to have a look at your work and see if it's suitable
for Lucene Sandbox.....a few seconds pass....I just skimmed your
email and my feeling is that Lucene Sandbox would be the place to put
your work, if you don't mind us doing that.

Thanks again.
Otis

Re: Avalonized WebCrawler

Posted by Clemens Marschner <cm...@lanlab.de>.

David,

just one thing, while I'm reading the code: Have you had a look at our
thoughts here?
http://cvs.apache.org/viewcvs.cgi/jakarta-lucene-sandbox/projects/larm/docs/

> Future:
> I am committed to pursue the development of the crawler. I hope many
> current and future developers will follow me.

That's great news!

> With your consent, I
> would likely move this project to SourceForge, but all opinions are
> welcome.

I could imagine using lucene or lucene-sandbox as the repository.

Clemens

