Posted to dev@lucene.apache.org by Clemens Marschner <cm...@lanlab.de> on 2002/06/21 19:58:18 UTC

Avalon anybody?

 
> > One last thought:
> > - the crawler should be started as a daemon process (at least
> > optionally)
> > - it should wake up from time to time to crawl changed pages
> > - it should provide a management and status interface to the outside.
> > - it internally needs the ability to run service jobs while crawling
> > (keeping memory tidy, collecting stats, etc.)
> > 
> > from what I know, these matters could be addressed by the Apache
> > Avalon/Phoenix project. Does anyone know anything about it?
> 
> To me Avalon looks relatively complex, but from what I've read it is a
> piece of software designed to allow applications like your crawler to
> run on top of it.  I'm stating the obvious, for some.
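
For comparison, the four requirements quoted above can be roughed out with the JDK alone. This is only a sketch under assumptions, not Avalon/Phoenix code: the class name CrawlerDaemonSketch, its method names, and the counters are made up for illustration, and java.util.concurrent stands in for whatever scheduling a container would provide. Note that a daemon thread is not the same thing as a daemon process; running as a real OS service still needs a wrapper or a container.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: periodic crawl, background service jobs, status text. */
final class CrawlerDaemonSketch {
    private final AtomicLong crawlRuns = new AtomicLong();
    private final AtomicLong serviceRuns = new AtomicLong();

    // Daemon threads do not keep the JVM alive on their own.
    private final ScheduledExecutorService scheduler =
            Executors.newScheduledThreadPool(2, runnable -> {
                Thread thread = new Thread(runnable, "crawler-daemon");
                thread.setDaemon(true);
                return thread;
            });

    /** Wake up from time to time: both periods must be greater than zero. */
    void start(long crawlPeriodSeconds, long servicePeriodSeconds) {
        scheduler.scheduleWithFixedDelay(
                this::crawlChangedPages, 0, crawlPeriodSeconds, TimeUnit.SECONDS);
        scheduler.scheduleWithFixedDelay(
                this::runServiceJobs, servicePeriodSeconds, servicePeriodSeconds, TimeUnit.SECONDS);
    }

    /** Placeholder for the real crawl; here it only counts invocations. */
    void crawlChangedPages() {
        crawlRuns.incrementAndGet();
    }

    /** Placeholder for housekeeping (tidying memory, collecting stats). */
    void runServiceJobs() {
        serviceRuns.incrementAndGet();
    }

    /** Minimal status interface: a one-line summary for the outside world. */
    String status() {
        return "crawlRuns=" + crawlRuns.get() + ", serviceRuns=" + serviceRuns.get();
    }

    void stop() {
        scheduler.shutdownNow();
    }

    public static void main(String[] args) throws InterruptedException {
        CrawlerDaemonSketch crawler = new CrawlerDaemonSketch();
        crawler.start(1, 5);
        Thread.sleep(2500); // keep the JVM alive long enough to see a few runs
        System.out.println(crawler.status());
        crawler.stop();
    }
}
```

What a container such as Phoenix would add on top of this is lifecycle management, configuration and a management interface, which is the part the sketch leaves out.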

Does anybody have experience with Avalon Phoenix?

Some time ago I stumbled across an app that used it. Was it Slide? Maybe.

Regards,

Clemens


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Avalon anybody?

Posted by "Andrew C. Oliver" <ac...@apache.org>.
Otis Gospodnetic wrote:

>Jakarta's James and Cocoon projects are built on Phoenix, which is part
>of Avalon.  I just read an article about that on Tuesday.  The article
>was from http://www.onjava.com/, and it was just a very high-level
>overview of Avalon.
>
>Otis
>  
>
+1

>--- Clemens Marschner <cm...@lanlab.de> wrote:
>
>>>>One last thought:
>>>>- the crawler should be started as a daemon process (at least
>>>>optionally)
>>>>- it should wake up from time to time to crawl changed pages
>>>>- it should provide a management and status interface to the outside.
>>>>- it internally needs the ability to run service jobs while crawling
>>>>(keeping memory tidy, collecting stats, etc.)
>>>>
>>>>from what I know, these matters could be addressed by the Apache
>>>>Avalon/Phoenix project. Does anyone know anything about it?
>>>
>>>To me Avalon looks relatively complex, but from what I've read it is a
>>>piece of software designed to allow applications like your crawler to
>>>run on top of it.  I'm stating the obvious, for some.
>>
>>Does anybody have experience with Avalon Phoenix?
>>
>>Some time ago I stumbled across an app that used it. Was it Slide?
>>Maybe.
>>
>>Regards,
>>
>>Clemens


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by Erik Hatcher <li...@ehatchersolutions.com>.
Yes, I have this same idea floating around for my "copious free time".

But it's very similar to what Zoe (see previous posts on this) is, or at
least to my ideas of integrating James and Lucene.

    Erik

p.s. I hope to get my Ant <index> task finally into the Sandbox later this
week - finally done with the book and life now needs a new purpose!  :)


----- Original Message -----
From: "Otis Gospodnetic" <ot...@yahoo.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Monday, July 08, 2002 8:06 PM
Subject: Re: (VERY COOL IDEA) Re: Interesting idea


>
> --- "Andrew C. Oliver" <ac...@apache.org> wrote:
> > Very cool!  Might make Lucene into a useful plugin for James
> > too.
>
> _That_ (James plugin) is what I have been thinking about lately and was
> wondering why nobody wrote it already.
>
> Otis
>
>
> > -Andy
> >
> > Jon Scott Stevens wrote:
> >
> > >Adding support to Lucene for Nilsimsa seems like a cool idea...
> > >
> > >http://ixazon.dynip.com/~cmeclax/nilsimsa.html
> > >
> > >The index would be the hash and one could use Lucene to rank searches
> > >based on the Nilsimsa rating of the results...
> > >
> > >-jon


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by "Andrew C. Oliver" <ac...@apache.org>.
No, I get it.  Was just thinking.

>I think you guys are missing the point of the idea with integrating Nilsimsa
>and Lucene.
>
>Imagine that the index will be a constant size and much smaller (and faster
>to search) if you simply save the Nilsimsa hash and then get a nilsimsa
>result...
>
>-jon


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by "Andrew C. Oliver" <ac...@apache.org>.
ditto.

Otis Gospodnetic wrote:

>No, I think I know why you think it would be cool.
>I was just reacting to the word 'plugin' that, combined with the word
>'Lucene' triggered the James association in my mind.
>Anyhow, nice idea that Nilsimsa.
>
>Otis
>
>--- Jon Scott Stevens <jo...@latchkey.com> wrote:
>
>>on 7/8/02 5:06 PM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:
>>
>>>--- "Andrew C. Oliver" <ac...@apache.org> wrote:
>>>
>>>>Very cool!  Might make Lucene into a useful plugin for James too.
>>>
>>>_That_ (James plugin) is what I have been thinking about lately and was
>>>wondering why nobody wrote it already.
>>>
>>>Otis
>>
>>I think you guys are missing the point of the idea with integrating
>>Nilsimsa and Lucene.
>>
>>Imagine that the index will be a constant size and much smaller (and
>>faster to search) if you simply save the Nilsimsa hash and then get a
>>nilsimsa result...
>>
>>-jon


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by Otis Gospodnetic <ot...@yahoo.com>.
No, I think I know why you think it would be cool.
I was just reacting to the word 'plugin' that, combined with the word
'Lucene' triggered the James association in my mind.
Anyhow, nice idea that Nilsimsa.

Otis

--- Jon Scott Stevens <jo...@latchkey.com> wrote:
> on 7/8/02 5:06 PM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:
> 
> > --- "Andrew C. Oliver" <ac...@apache.org> wrote:
> >> Very cool!  Might make Lucene into a useful plugin for James too.
> > 
> > _That_ (James plugin) is what I have been thinking about lately and was
> > wondering why nobody wrote it already.
> > 
> > Otis
> 
> I think you guys are missing the point of the idea with integrating
> Nilsimsa and Lucene.
> 
> Imagine that the index will be a constant size and much smaller (and
> faster to search) if you simply save the Nilsimsa hash and then get a
> nilsimsa result...
> 
> -jon


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by Jon Scott Stevens <jo...@latchkey.com>.
on 7/8/02 5:06 PM, "Otis Gospodnetic" <ot...@yahoo.com> wrote:

> 
> --- "Andrew C. Oliver" <ac...@apache.org> wrote:
>> Very cool!  Might make Lucene into a useful plugin for James
>> too.
> 
> _That_ (James plugin) is what I have been thinking about lately and was
> wondering why nobody wrote it already.
> 
> Otis

I think you guys are missing the point of the idea with integrating Nilsimsa
and Lucene.

Imagine that the index will be a constant size and much smaller (and faster
to search) if you simply save the Nilsimsa hash and then get a nilsimsa
result...

-jon


Re: (VERY COOL IDEA) Re: Interesting idea

Posted by Otis Gospodnetic <ot...@yahoo.com>.
--- "Andrew C. Oliver" <ac...@apache.org> wrote:
> Very cool!  Might make Lucene into a useful plugin for James
> too.

_That_ (James plugin) is what I have been thinking about lately and was
wondering why nobody wrote it already.

Otis


> -Andy
> 
> Jon Scott Stevens wrote:
> 
> >Adding support to Lucene for Nilsimsa seems like a cool idea...
> >
> >http://ixazon.dynip.com/~cmeclax/nilsimsa.html
> >
> >The index would be the hash and one could use Lucene to rank searches
> >based on the Nilsimsa rating of the results...
> >
> >-jon


(VERY COOL IDEA) Re: Interesting idea

Posted by "Andrew C. Oliver" <ac...@apache.org>.
Very cool!  Might make Lucene into a useful plugin for James too.

-Andy

Jon Scott Stevens wrote:

>Adding support to Lucene for Nilsimsa seems like a cool idea...
>
>http://ixazon.dynip.com/~cmeclax/nilsimsa.html
>
>The index would be the hash and one could use Lucene to rank searches based
>on the Nilsimsa rating of the results...
>
>-jon


Re: Interesting idea

Posted by "Andrew C. Oliver" <ac...@apache.org>.
+1 -- Doug is a great source of information on all things indexing-related.
Reading Doug's emails and articles is very educational.

Jon Scott Stevens wrote:

>on 7/10/02 9:35 AM, "Doug Cutting" <cu...@lucene.com> wrote:
>
>>Nilsimsa appears to use what is called a "signature file" approach in
>>the literature, while Lucene uses an "inverted file".  A search on
>>Google for "signature file versus inverted index" turns up a paper by
>>Zobel et al., which concludes:
>>
>> Our conclusions are unequivocal. For typical document indexing
>> applications, current signature file techniques do not perform well
>> compared to current implementations of inverted file indexes.
>>
>>See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
>>
>>Doug
>
>Wow! Great response Doug. =) Learn something new every day!
>
>-jon


Re: Interesting idea

Posted by Jon Scott Stevens <jo...@latchkey.com>.
on 7/10/02 9:35 AM, "Doug Cutting" <cu...@lucene.com> wrote:

> Nilsimsa appears to use what is called a "signature file" approach in
> the literature, while Lucene uses an "inverted file".  A search on
> Google for "signature file versus inverted index" turns up a paper by
> Zobel et al., which concludes:
> 
>  Our conclusions are unequivocal. For typical document indexing
>  applications, current signature file techniques do not perform well
>  compared to current implementations of inverted file indexes.
> 
> See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
> 
> Doug

Wow! Great response Doug. =) Learn something new every day!

-jon


DateFieldYMD

Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi,

Does anyone have an objection to, or a better idea than, adding a new class
called DateFieldYMD? It would be very similar to DateField, but return a
different format.

It would support the method dateToString(date), which converts the date to
the format

YYYYMMDD

It would also add DateTimeToString(date), with the format

YYYYMMDDTHHMMSS

where T is the delimiter between the date and the time. I'm just trying to
follow a pseudo-convention and minimize the bits, so there are no other
delimiters.

The reason for a separate class rather than another method is that the
supporting methods, such as stringToDate() or stringToTime(), which decode
the string to a Date or a long, would be confusing in one class.

I think this would meet the needs of people who need support for dates
before 1970, and it would be more readable.
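
A minimal sketch of what such a class might look like, to make the two formats concrete. Everything here is an assumption for illustration rather than existing Lucene code: the class name DateFieldYmdSketch, the use of SimpleDateFormat, and the choice of UTC (the proposal does not say which time zone applies). The method is spelled dateTimeToString here to follow Java naming conventions.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Hypothetical sketch of the proposed DateFieldYMD; not part of Lucene. */
final class DateFieldYmdSketch {
    private static final String DATE_PATTERN = "yyyyMMdd";
    private static final String DATE_TIME_PATTERN = "yyyyMMdd'T'HHmmss";

    private DateFieldYmdSketch() {}

    // SimpleDateFormat is not thread-safe, so a fresh instance is built per call.
    // UTC is an assumption made for this sketch.
    private static SimpleDateFormat formatter(String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.ROOT);
        format.setTimeZone(TimeZone.getTimeZone("UTC"));
        format.setLenient(false);
        return format;
    }

    /** Encodes a date as YYYYMMDD. */
    static String dateToString(Date date) {
        return formatter(DATE_PATTERN).format(date);
    }

    /** Encodes a date and time as YYYYMMDDTHHMMSS. */
    static String dateTimeToString(Date date) {
        return formatter(DATE_TIME_PATTERN).format(date);
    }

    /** Decodes either format back to a Date, chosen by string length. */
    static Date stringToDate(String text) {
        String pattern = text.length() == DATE_PATTERN.length() ? DATE_PATTERN : DATE_TIME_PATTERN;
        try {
            return formatter(pattern).parse(text);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Not a YMD date: " + text, e);
        }
    }

    /** Decodes either format to milliseconds since the epoch (negative before 1970). */
    static long stringToTime(String text) {
        return stringToDate(text).getTime();
    }

    public static void main(String[] args) {
        Date date = stringToDate("19690720T201740");
        System.out.println(dateToString(date));                      // prints 19690720
        System.out.println(dateTimeToString(date));                  // prints 19690720T201740
        System.out.println(stringToTime("19690720T201740") < 0);     // prints true
    }
}
```

Because every field is zero-padded and ordered from most to least significant, plain string comparison of these values matches chronological order for four-digit years, which is what makes the format usable for range queries on terms.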

--Peter


Re: Interesting idea

Posted by Doug Cutting <cu...@lucene.com>.
Jon Scott Stevens wrote:
> Adding support to Lucene for Nilsimsa seems like a cool idea...
> 
> http://ixazon.dynip.com/~cmeclax/nilsimsa.html
> 
> The index would be the hash and one could use Lucene to rank searches based
> on the Nilsimsa rating of the results...

Nilsimsa employs a very different model than Lucene.  So this would 
require a re-write of the indexing and search portions of Lucene, which 
is most of the code.

Nilsimsa appears to use what is called a "signature file" approach in 
the literature, while Lucene uses an "inverted file".  A search on 
Google for "signature file versus inverted index" turns up a paper by 
Zobel et al., which concludes:

   Our conclusions are unequivocal. For typical document indexing
   applications, current signature file techniques do not perform well
   compared to current implementations of inverted file indexes.

See: http://www.cs.columbia.edu/~pirot/cs6111/Readings/zobel98.pdf
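
To make the distinction concrete, here is a toy sketch of the two models side by side. It is an illustration under assumptions, not Lucene's or Nilsimsa's actual data structures: the class name IndexModelsSketch, the 64-bit signature width, and the one-bit-per-term hashing are all made up for the example.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

/** Toy comparison of an inverted file and a signature file (hypothetical). */
final class IndexModelsSketch {
    static final int SIGNATURE_BITS = 64; // arbitrary width chosen for this sketch

    private IndexModelsSketch() {}

    /** Inverted file: each term maps to the ids of the documents containing it. */
    static Map<String, SortedSet<Integer>> buildInvertedFile(List<String> docs) {
        Map<String, SortedSet<Integer>> postings = new TreeMap<>();
        for (int id = 0; id < docs.size(); id++) {
            for (String term : tokenize(docs.get(id))) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
            }
        }
        return postings;
    }

    /** Signature file: one fixed-size bit signature per document. */
    static List<BitSet> buildSignatureFile(List<String> docs) {
        List<BitSet> signatures = new ArrayList<>();
        for (String doc : docs) {
            BitSet signature = new BitSet(SIGNATURE_BITS);
            for (String term : tokenize(doc)) {
                signature.set(bitFor(term));
            }
            signatures.add(signature);
        }
        return signatures;
    }

    /** Inverted lookup: a single map access returns the exact matches. */
    static SortedSet<Integer> searchInverted(Map<String, SortedSet<Integer>> postings, String term) {
        SortedSet<Integer> hits = postings.get(normalize(term));
        return hits == null ? new TreeSet<Integer>() : hits;
    }

    /** Signature lookup: scans every signature; hits are candidates only (false positives possible). */
    static List<Integer> searchSignatures(List<BitSet> signatures, String term) {
        int bit = bitFor(normalize(term));
        List<Integer> candidates = new ArrayList<>();
        for (int id = 0; id < signatures.size(); id++) {
            if (signatures.get(id).get(bit)) {
                candidates.add(id);
            }
        }
        return candidates;
    }

    private static int bitFor(String term) {
        return Math.floorMod(term.hashCode(), SIGNATURE_BITS);
    }

    private static String normalize(String term) {
        return term.toLowerCase(Locale.ROOT);
    }

    private static List<String> tokenize(String text) {
        List<String> terms = new ArrayList<>();
        for (String raw : text.split("\\W+")) {
            if (!raw.isEmpty()) {
                terms.add(normalize(raw));
            }
        }
        return terms;
    }
}
```

The sketch shows the trade-off in miniature: the signature file has a constant size per document, but a query has to touch every signature and then verify the candidates, while the inverted file goes straight to the matching documents.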

Doug


Interesting idea

Posted by Jon Scott Stevens <jo...@latchkey.com>.
Adding support to Lucene for Nilsimsa seems like a cool idea...

http://ixazon.dynip.com/~cmeclax/nilsimsa.html

The index would be the hash and one could use Lucene to rank searches based
on the Nilsimsa rating of the results...
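
A rough sketch of what ranking by Nilsimsa rating could mean in code. The assumptions are stated loudly: the digests are taken to be 256-bit values that have already been computed elsewhere (the hashing itself is not shown), the score is taken to be 128 minus the number of differing bits, as the comparison is usually described, and NilsimsaScoreSketch with its methods is a made-up name, not part of Lucene or of any Nilsimsa library.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: score and rank precomputed 256-bit digests. */
final class NilsimsaScoreSketch {
    static final int DIGEST_BYTES = 32; // 256 bits per document, regardless of document size

    private NilsimsaScoreSketch() {}

    /** Assumed scoring: 128 minus the Hamming distance, so 128 means identical digests. */
    static int score(byte[] a, byte[] b) {
        if (a.length != DIGEST_BYTES || b.length != DIGEST_BYTES) {
            throw new IllegalArgumentException("digests must be " + DIGEST_BYTES + " bytes");
        }
        int differingBits = 0;
        for (int i = 0; i < DIGEST_BYTES; i++) {
            differingBits += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return 128 - differingBits;
    }

    /** Returns the positions of the stored digests, best score first (linear scan). */
    static List<Integer> rank(byte[] query, List<byte[]> stored) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            ids.add(i);
        }
        ids.sort((x, y) -> Integer.compare(score(query, stored.get(y)), score(query, stored.get(x))));
        return ids;
    }

    public static void main(String[] args) {
        byte[] query = new byte[DIGEST_BYTES];     // all bits zero
        byte[] same = new byte[DIGEST_BYTES];      // identical digest
        byte[] oneBitOff = new byte[DIGEST_BYTES];
        oneBitOff[0] = 0x01;                       // differs in exactly one bit
        System.out.println(score(query, same));      // prints 128
        System.out.println(score(query, oneBitOff)); // prints 127
    }
}
```

Note that ranking this way compares the query digest against every stored digest, so it is a scan over the collection rather than a term lookup.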

-jon


Re: Avalon anybody?

Posted by Otis Gospodnetic <ot...@yahoo.com>.
Jakarta's James and Cocoon projects are built on Phoenix, which is part
of Avalon.  I just read an article about that on Tuesday.  The article
was from http://www.onjava.com/, and it was just a very high-level
overview of Avalon.

Otis

--- Clemens Marschner <cm...@lanlab.de> wrote:
> > > One last thought:
> > > - the crawler should be started as a daemon process (at least
> > > optionally)
> > > - it should wake up from time to time to crawl changed pages
> > > - it should provide a management and status interface to the outside.
> > > - it internally needs the ability to run service jobs while crawling
> > > (keeping memory tidy, collecting stats, etc.)
> > > 
> > > from what I know, these matters could be addressed by the Apache
> > > Avalon/Phoenix project. Does anyone know anything about it?
> > 
> > To me Avalon looks relatively complex, but from what I've read it is a
> > piece of software designed to allow applications like your crawler to
> > run on top of it.  I'm stating the obvious, for some.
> 
> Does anybody have experience with Avalon Phoenix?
> 
> Some time ago I stumbled across an app that used it. Was it Slide? Maybe.
> 
> Regards,
> 
> Clemens