Posted to java-user@lucene.apache.org by Joe Consumer <fa...@yahoo.com> on 2002/11/20 23:59:28 UTC

Help on creating and maintaining an index that changes

Hi,

I'm a Lucene newbie.  I wanted to ask for an expert
opinion on how to attack this issue.  I have a set of
documents (a catalog) that many clients want to
register with the search server.  While a client is
reachable its catalog should be available, but if the
client logs off or disappears then I want to remove
its catalog from the index.

Currently, I have this implemented with two hashmaps.
Each client's catalog is assigned a unique key in the
first hashmap, and the catalog's contents are parsed
into keywords and put into the master hashmap, which
indexes into the first one.  When a client leaves, I
remove its catalog from the first hashmap but don't
clean up the references in the master hashmap.  If a
search hits a key that is null in the first hashmap, I
remove the reference from the master hashmap at that
point.
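
A minimal Java sketch of the two-hashmap scheme described
above, for reference.  The class and method names
(CatalogIndex, register, unregister, search) and the
whitespace/punctuation keyword split are assumptions made for
illustration, not the poster's actual code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: two maps with lazy cleanup of stale references.
final class CatalogIndex {
    // First hashmap: unique client key -> that client's catalog text.
    private final Map<String, String> catalogs = new HashMap<>();
    // Master hashmap: keyword -> keys of clients whose catalog contains it.
    private final Map<String, Set<String>> keywordToClients = new HashMap<>();

    void register(String clientKey, String catalogText) {
        catalogs.put(clientKey, catalogText);
        // Assumed keyword extraction: lowercase, split on non-word characters.
        for (String word : catalogText.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!word.isEmpty()) {
                keywordToClients
                    .computeIfAbsent(word, k -> new HashSet<>())
                    .add(clientKey);
            }
        }
    }

    // Only the first map is touched; references in the master map go stale.
    void unregister(String clientKey) {
        catalogs.remove(clientKey);
    }

    // Returns keys of live clients matching the keyword and prunes
    // stale references found along the way.
    List<String> search(String keyword) {
        String word = keyword.toLowerCase(Locale.ROOT);
        List<String> hits = new ArrayList<>();
        Set<String> clients = keywordToClients.get(word);
        if (clients == null) {
            return hits;
        }
        for (Iterator<String> it = clients.iterator(); it.hasNext(); ) {
            String key = it.next();
            if (catalogs.containsKey(key)) {
                hits.add(key);
            } else {
                it.remove(); // lazy cleanup of a departed client
            }
        }
        if (clients.isEmpty()) {
            keywordToClients.remove(word);
        }
        return hits;
    }
}
```

One consequence of the lazy cleanup: if a client re-registers
under the same key before its stale references are pruned, the
old keywords point at the new catalog again, so a real
implementation would probably want fresh keys per registration.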

I want to do something similar with Lucene, but I
don't know how to approach it.  I thought maybe
keeping the first hashmap as is, and building a
Directory in Lucene that replaces the master hashmap.
When I get hits back from Lucene I look them up in
the first hashmap, and return those.

How do I put the needed information into the Directory
so I can look the hits up in the first hashmap?  I would need
the unique id identifying the client, and a key that
identifies the document that the client has.

Then how do I clean up the Directory when a client is
not available?  How do I remove a document from
Lucene's Directory?

thanks in advance,
charlie


--
To unsubscribe, e-mail:   <ma...@jakarta.apache.org>
For additional commands, e-mail: <ma...@jakarta.apache.org>

Re: Help on creating and maintaining an index that changes

Posted by Karl Øie <ka...@gan.no>.

> I want to do something similar with Lucene, but I
> don't know how to approach it.  I thought maybe
> keeping the first hashmap as is, and building a
> Directory in Lucene that replaces the master hashmap.
> When I get hits back from Lucene I look them up in
> the first hashmap, and return those.

If your index is big, it's probably best to do it this way. I have
indexes that take up to 12 hours to build and take up about 1 GB of
disk space, but searching is still fast. If you put the client IDs
into keyword fields, you can have Lucene filter out hits from the
clients you know are offline by using a boolean NOT, either by
building the query manually or through the QueryParser.

> How do I put the needed information into the Directory
> so I can look the hits up in the first hashmap?  I would need
> the unique id identifying the client, and a key that
> identifies the document that the client has.

You add a keyword field to each document that contains the unique ID
identifying the client. This way you can search for documents from a
client, and also filter out documents from that client.
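
As a rough sketch of the QueryParser route, the helper below builds a
query string that requires the user's query and prohibits every
offline client, using the "+" and "-" operators of Lucene's query
syntax. The class name OfflineFilter, the method name, and the field
name "client" are made up for illustration, and it assumes client IDs
are plain alphanumeric tokens that the analyzer passed to the
QueryParser leaves unchanged.

```java
import java.util.List;

// Hypothetical helper: wraps a user query so that hits from
// offline clients are excluded via prohibited ("-") clauses.
final class OfflineFilter {
    static String excludeOffline(String userQuery,
                                 String clientField,
                                 List<String> offlineClientIds) {
        // "+(...)" makes the user's query a required clause.
        StringBuilder sb = new StringBuilder("+(").append(userQuery).append(")");
        for (String id : offlineClientIds) {
            // "-field:value" prohibits documents whose field matches the value.
            sb.append(" -").append(clientField).append(':').append(id);
        }
        return sb.toString();
    }
}
```

For example, excludeOffline("widget", "client", List.of("c1", "c2"))
yields the string "+(widget) -client:c1 -client:c2", which would then
be handed to the QueryParser. With many offline clients the query
grows by one clause each, so this suits a modest number of
exclusions.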

> Then how do I clean up the Directory when a client is
> not available?  How do I remove a document from
> Lucene's Directory?

The org.apache.lucene.index.IndexReader class contains a delete()
method to delete documents from Lucene. But as I said before, if your
index is big, it's best not to delete the documents just because a
client goes offline; it's better to filter out the hits.

Kind regards,
Karl Øie

