Posted to dev@lucene.apache.org by none none <ko...@lycos.com> on 2002/05/14 22:12:56 UTC

Lucene Sandbox

Hi,
I have a question about CVS. Where should I post it?

If someone can help, my question is:
I don't know how to use "jcvs" to download code from the sandbox.
Do I need a username and password? If so, where can I get one?

Thanks,



--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>


Re: Lucene Sandbox

Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi,

Here is a link that explains how to use the Apache CVS repository.

http://jakarta.apache.org/site/cvsindex.html

I hope this answers your question.

--Peter


On 5/14/02 1:12 PM, "none none" <ko...@lycos.com> wrote:

> Hi,
> I have a question about CVS. Where should I post it?
> 
> If someone can help, my question is:
> I don't know how to use "jcvs" to download code from the sandbox.
> Do I need a username and password? If so, where can I get one?
> 
> Thanks,




Updated Jakarta Website with Lucene RC5 Release

Posted by Peter Carlson <ca...@bookandhammer.com>.
I have updated the Jakarta site pages (news, bin index and source index)
with the Lucene RC5 release.

--Peter




Re: Adding a TermExpansionQuery

Posted by Dmitry Serebrennikov <dm...@earthlink.net>.
Great idea!
I would suggest the following considerations:
- this should be implemented as an interface that can support multiple
implementations (such as one based on a simple lookup table and another
based on a WordNet-style synonym database)
- different implementations might well be used for the same index,
either at the same time or at different times
- for those using Lucene primarily as an API, the implementation is
best provided programmatically when creating a Searcher, I think.
- for those using Lucene as an application, perhaps the application
framework (such as the one people are working on in the sandbox) can
take care of finding and instantiating the right implementation.
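
The considerations above could be sketched roughly as follows. This is only
an illustration under stated assumptions: TermExpander, LookupTableExpander
and CompositeExpander are hypothetical names, not Lucene classes, and the
sketch shows only how several implementations might sit behind one interface
and be combined for the same index.

```java
import java.util.*;

// Hypothetical interface (an assumption, not Lucene API): one method that
// returns the extra terms to search for alongside the given term.
interface TermExpander {
    Set<String> expand(String term);
}

// One possible implementation, backed by a simple in-memory lookup table.
class LookupTableExpander implements TermExpander {
    private final Map<String, Set<String>> table;

    LookupTableExpander(Map<String, Set<String>> table) {
        this.table = table;
    }

    public Set<String> expand(String term) {
        Set<String> related = table.get(term);
        return related == null ? Collections.<String>emptySet() : related;
    }
}

// Lets several implementations be used for the same index at the same time
// by merging their results, keeping first-seen order and dropping duplicates.
class CompositeExpander implements TermExpander {
    private final List<TermExpander> delegates;

    CompositeExpander(TermExpander... delegates) {
        this.delegates = Arrays.asList(delegates);
    }

    public Set<String> expand(String term) {
        Set<String> all = new LinkedHashSet<String>();
        for (TermExpander delegate : delegates) {
            all.addAll(delegate.expand(term));
        }
        return all;
    }
}
```

An application using Lucene as an API would construct whichever TermExpander
it wants and hand it to its own search code; a WordNet-backed implementation
would simply be another class implementing the same interface.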

Good luck!
Dmitry.

Peter Carlson wrote:

>Hi,
>
>I was thinking of adding a TermExpansionQuery. Basically, if it finds the
>term in a lookup table, it would also include an associated set of
>terms.
>
>For example, if the search term was "pet" it might also add "dog", "cat",
>"bird".
>
>The issue that I am having is where to store the terms and how to have Lucene
>know where that information is stored.
>
>Should there be a Lucene properties file? Should this be another type of
>file in the Lucene index folder?
>
>Suggestions would be appreciated.
>
>--Peter






Re: Adding a TermExpansionQuery

Posted by Julien Nioche <Ju...@lingway.com>.
Hi folks,

Just a little advertising message for those who are interested in semantic
expansions:

http://kant.lingway.com/DemoUN is a demo of a multilingual IR system based
on Lucene.

Please take a look at it. Feedback is welcome!

Julien



----- Original Message -----
From: "Peter Carlson" <ca...@bookandhammer.com>
To: "Lucene Developers List" <lu...@jakarta.apache.org>
Sent: Wednesday, May 15, 2002 7:06 AM
Subject: Re: Adding a TermExpansionQuery


> Hi Eric,
>
> Thanks for the feedback. My intention was to abstract the source, but one
> of my questions was: does Lucene provide a configuration file that would
> enable this "Thesaurus" query, or will it have to be set up manually by
> the developer?
>
> Currently, Lucene does not provide a configuration file.
>
> As for whether the information belongs in the index directory, I was
> thinking this might be a nice place for it: it doesn't add any other
> overhead to the system (i.e., no configuration file), and it might make
> it easier to support multiple sources, since the index has already been
> abstracted. If you wanted to share the "Thesaurus" across many different
> indices, you could "copy" or "merge" that index component into the data
> source. This could even be part of the build process for a file system.
>
> --Peter
>
> On 5/15/02 6:45 AM, "Eric D. Friedman" <er...@conveysoftware.com> wrote:
>
> > Whichever storage mechanism you choose, you should be sure to abstract
> > its interface so that people can make other choices.  With that out of
> > the way, it doesn't matter too much whether you pick a properties file
> > or an XML file.
> >
> > That said, I wouldn't expect to find this data stored in the index
> > directory, since it's not part of the index and since users may want to
> > share the data across several indices.  I would also lean toward the
> > XML file (for a file solution, that is -- an RDBMS should be supported
> > too), since that lends itself more naturally to describing one-to-many
> > relations than a properties file does.
> >
> > Personal opinion: "Thesaurus" is a more descriptive term than
> > "TermExpansion." To me, term expansion suggests some kind of text
> > globbing, whereas a thesaurus is a reference (a "lookup table") that
> > provides *semantic* expansions of the kind you describe.  Oracle's
> > intermedia indexing engine has thesaurus features similar to what you
> > describe and calls them by that name.




Re: Adding a TermExpansionQuery

Posted by Otis Gospodnetic <ot...@yahoo.com>.
This sounds like something I could use :)
I'd say keep it out of the index, for the various reasons that a few people
have already mentioned. I also think "Thesaurus" is an easier word for
non-tech, non-IR people to understand.

Otis

--- Peter Carlson <ca...@bookandhammer.com> wrote:
> Hi Eric,
> 
> Thanks for the feedback. My intention was to abstract the source, but one
> of my questions was: does Lucene provide a configuration file that would
> enable this "Thesaurus" query, or will it have to be set up manually by
> the developer?
> 
> Currently, Lucene does not provide a configuration file.
> 
> As for whether the information belongs in the index directory, I was
> thinking this might be a nice place for it: it doesn't add any other
> overhead to the system (i.e., no configuration file), and it might make
> it easier to support multiple sources, since the index has already been
> abstracted. If you wanted to share the "Thesaurus" across many different
> indices, you could "copy" or "merge" that index component into the data
> source. This could even be part of the build process for a file system.
> 
> --Peter
> 
> On 5/15/02 6:45 AM, "Eric D. Friedman" <er...@conveysoftware.com> wrote:
> 
> > Whichever storage mechanism you choose, you should be sure to abstract
> > its interface so that people can make other choices.  With that out of
> > the way, it doesn't matter too much whether you pick a properties file
> > or an XML file.
> > 
> > That said, I wouldn't expect to find this data stored in the index
> > directory, since it's not part of the index and since users may want to
> > share the data across several indices.  I would also lean toward the
> > XML file (for a file solution, that is -- an RDBMS should be supported
> > too), since that lends itself more naturally to describing one-to-many
> > relations than a properties file does.
> > 
> > Personal opinion: "Thesaurus" is a more descriptive term than
> > "TermExpansion." To me, term expansion suggests some kind of text
> > globbing, whereas a thesaurus is a reference (a "lookup table") that
> > provides *semantic* expansions of the kind you describe.  Oracle's
> > intermedia indexing engine has thesaurus features similar to what you
> > describe and calls them by that name.




RE: Adding a TermExpansionQuery

Posted by Landon Cox <lc...@interactive-media.com>.
A basic question, in the spirit of exploring assumptions:

Why would Thesaurus/Expanded Terms be something 'stored' by Lucene at all?
Why wouldn't it be something provided by the application to a query that
accepted Thesaurus terms?  In other words, it seems like this functionality
can be built on what's there today - maybe some convenience classes that
accept the Thesaurus to build out the query string, but other than that, I
don't really see a need to change anything to accommodate this.

I can imagine different applications needing different Thesauruses but
having to use the same index. Unless the 'stored' Thesaurus was somehow
indexed by application name or even application instance or alternatively
marked as such in a properties file, all applications might have to live
with the same Thesaurus.

Seems easier and more flexible, with no change to Lucene required (except
convenience classes for query building if desired), to have the application
do the storing/providing.
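
A minimal sketch of such a convenience class might look like the following.
ThesaurusQueryBuilder is a hypothetical name, not an existing Lucene class,
and the sketch assumes the application supplies the thesaurus as a plain Map
and passes the resulting string to Lucene's query parser, whose syntax
accepts OR and parentheses. It only handles simple whitespace-separated
terms; phrases, field prefixes and special characters would need extra care.

```java
import java.util.*;

// Hypothetical convenience class (an assumption, not part of Lucene):
// rewrites a query string so that each term found in the application's
// thesaurus is OR-ed with its associated terms.
class ThesaurusQueryBuilder {
    static String expand(String query, Map<String, List<String>> thesaurus) {
        StringBuilder out = new StringBuilder();
        for (String term : query.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            List<String> related = thesaurus.get(term);
            if (related == null || related.isEmpty()) {
                // No thesaurus entry: leave the term untouched.
                out.append(term);
            } else {
                // Group the term with its expansions, e.g. (pet OR dog OR cat).
                out.append('(').append(term);
                for (String r : related) {
                    out.append(" OR ").append(r);
                }
                out.append(')');
            }
        }
        return out.toString();
    }
}
```

With a thesaurus mapping "pet" to "dog", "cat" and "bird", the query
"pet food" becomes "(pet OR dog OR cat OR bird) food". Because the thesaurus
is just an argument, different applications can use different thesauruses
against the same index.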

Landon




Re: Adding a TermExpansionQuery

Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi Eric,

Thanks for the feedback. My intention was to abstract the source, but one of
my questions was: does Lucene provide a configuration file that would enable
this "Thesaurus" query, or will it have to be set up manually by the
developer?

Currently, Lucene does not provide a configuration file.

As for whether the information belongs in the index directory, I was thinking
this might be a nice place for it: it doesn't add any other overhead to the
system (i.e., no configuration file), and it might make it easier to support
multiple sources, since the index has already been abstracted. If you wanted
to share the "Thesaurus" across many different indices, you could "copy" or
"merge" that index component into the data source. This could even be part of
the build process for a file system.

--Peter

On 5/15/02 6:45 AM, "Eric D. Friedman" <er...@conveysoftware.com> wrote:

> Whichever storage mechanism you choose, you should be sure to abstract its
> interface so that people can make other choices.  With that out of the way,
> it doesn't matter too much whether you pick a properties file or an XML
> file.
> 
> That said, I wouldn't expect to find this data stored in the index
> directory, since it's not part of the index and since users may want to
> share the data across several indices.  I would also lean toward the
> XML file (for a file solution, that is -- an RDBMS should be supported
> too), since that lends itself more naturally to describing one-to-many
> relations than a properties file does.
> 
> Personal opinion: "Thesaurus" is a more descriptive term than
> "TermExpansion." To me, term expansion suggests some kind of text
> globbing, whereas a thesaurus is a reference (a "lookup table") that
> provides *semantic* expansions of the kind you describe.  Oracle's
> intermedia indexing engine has thesaurus features similar to what you
> describe and calls them by that name.




Re: Adding a TermExpansionQuery

Posted by "Eric D. Friedman" <er...@conveysoftware.com>.
Whichever storage mechanism you choose, you should be sure to abstract its
interface so that people can make other choices.  With that out of the way,
it doesn't matter too much whether you pick a properties file or an XML
file.

That said, I wouldn't expect to find this data stored in the index
directory, since it's not part of the index and since users may want to
share the data across several indices.  I would also lean toward the
XML file (for a file solution, that is -- an RDBMS should be supported
too), since that lends itself more naturally to describing one-to-many
relations than a properties file does.

Personal opinion: "Thesaurus" is a more descriptive term than
"TermExpansion." To me, term expansion suggests some kind of text
globbing, whereas a thesaurus is a reference (a "lookup table") that
provides *semantic* expansions of the kind you describe.  Oracle's
intermedia indexing engine has thesaurus features similar to what you
describe and calls them by that name.
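
To illustrate the one-to-many point, here is a rough sketch of loading a
thesaurus from XML using only the JDK's DOM parser. The element and attribute
names (thesaurus, entry, term, related) and the XmlThesaurusLoader class are
made up for this example; nothing here is an existing Lucene format.

```java
import java.io.StringReader;
import java.util.*;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical loader (an assumption): reads documents shaped like
//   <thesaurus>
//     <entry term="pet"><related>dog</related><related>cat</related></entry>
//   </thesaurus>
// into a map from each term to its list of related terms.
class XmlThesaurusLoader {
    static Map<String, List<String>> load(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            Map<String, List<String>> result = new LinkedHashMap<String, List<String>>();
            NodeList entries = doc.getElementsByTagName("entry");
            for (int i = 0; i < entries.getLength(); i++) {
                Element entry = (Element) entries.item(i);
                List<String> related = new ArrayList<String>();
                NodeList children = entry.getElementsByTagName("related");
                for (int j = 0; j < children.getLength(); j++) {
                    related.add(children.item(j).getTextContent().trim());
                }
                result.put(entry.getAttribute("term"), related);
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or parser setup problems end up here.
            throw new IllegalArgumentException("Could not parse thesaurus XML", e);
        }
    }
}
```

A properties file would have to pack the related terms into a single
delimited value (for example pet=dog,cat,bird), whereas the nested elements
express the one-to-many relation directly. Because the loader only returns a
Map, the same data could just as well come from an RDBMS behind the same
abstraction.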

Eric

On Tue, 14 May 2002, Peter Carlson wrote:

> Hi,
>
> I was thinking of adding a TermExpansionQuery. Basically, if it finds the
> term in a lookup table, it would also include an associated set of
> terms.
>
> For example, if the search term was "pet" it might also add "dog", "cat",
> "bird".
>
> The issue that I am having is where to store the terms and how to have Lucene
> know where that information is stored.
>
> Should there be a Lucene properties file? Should this be another type of
> file in the Lucene index folder?
>
> Suggestions would be appreciated.
>
> --Peter




Adding a TermExpansionQuery

Posted by Peter Carlson <ca...@bookandhammer.com>.
Hi,

I was thinking of adding a TermExpansionQuery. Basically, if it finds the
term in a lookup table, it would also include an associated set of
terms.

For example, if the search term was "pet" it might also add "dog", "cat",
"bird".
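
As a rough illustration of the lookup described above, the table could be as
simple as a map from a term to its associated terms. TermExpansionTable is a
hypothetical helper, not existing Lucene code, and how the expanded terms
would then be turned into a Lucene query is left open here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper (an assumption, not part of Lucene): expands a
// search term using an in-memory lookup table.
class TermExpansionTable {
    private final Map<String, List<String>> table = new HashMap<String, List<String>>();

    // Registers the terms that should be searched along with `term`.
    void put(String term, String... related) {
        table.put(term, Arrays.asList(related));
    }

    // Returns the original term followed by its associated terms, if any.
    List<String> expand(String term) {
        List<String> result = new ArrayList<String>();
        result.add(term);
        List<String> related = table.get(term);
        result.addAll(related == null ? Collections.<String>emptyList() : related);
        return result;
    }
}
```

After put("pet", "dog", "cat", "bird"), calling expand("pet") returns pet,
dog, cat, bird, while a term with no entry comes back on its own.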

The issue that I am having is where to store the terms and how to have Lucene
know where that information is stored.

Should there be a Lucene properties file? Should this be another type of
file in the Lucene index folder?

Suggestions would be appreciated.

--Peter

