Posted to dev@lucene.apache.org by Halácsy Péter <ha...@axelero.hu> on 2002/04/02 21:55:29 UTC
lucene & avalon (was: Proposal for Lucene / new component)

Hello,
more than a month ago I promised to write an Avalon example application.
Now my project needs some Avalon components, so I "avalonized" Lucene. I
published the package as a zip file:
www.extra.hu/halacsyp/lucelon.zip

The main idea is to have two manager components, one for Searchers and one
for Writers. This is similar to the relationship between
DataSource/DriverManager and Connection in JDBC.
The interfaces of the two main components:
public interface SearcherManager extends Component {
    public Searcher getSearcher();
}

public interface IndexWriterManager extends Component {
    public IndexWriter getWriter(boolean create);
}

You can configure:
1. exactly which implementing Manager class to use (I implemented two
SearcherManagers, IndexSearcher and MultiSearcherManager, and one
IndexWriterManager);
2. in my implementation, which directory to use and, for the writer,
mergeFactor, maxDocs ...

I rewrote two demo files, SearchFiles and IndexFiles, to use my components.
You can compile them and try them out.

In my project I have two indexes in two different (filesystem) directories. I
have three Searchers:
1. one for directory I.
2. one for directory II
3. MultiSearcher

To configure this I have to write a config file:
<components logger="core">
  <directories>
    <filesystem name="topics"><path>c://temp/index/topics</path></filesystem>
    <filesystem name="messages"><path>c://temp/index/messages</path></filesystem>
  </directories>
  <analyzers>
    <!-- standard analyzer of lucene -->
    <standard name="standard">
      <stopwords>
        <w>the</w>
        <w>a</w>
        <w>this</w>
        <w>that</w>
        <w>an</w>
        <w>or</w>
      </stopwords>
    </standard>
  </analyzers>
  <searchers>
    <directory-searcher name="topics">
      <directory>topics</directory>
    </directory-searcher>
    <directory-searcher name="messages">
      <directory>messages</directory>
    </directory-searcher>
    <multi-searcher name="multi">
      <searcher>topics</searcher>
      <searcher>messages</searcher>
    </multi-searcher>
  </searchers>

  <writers>
    <directory-writer name="topics">
      <directory>topics</directory>
      <analyzer>standard</analyzer>
    </directory-writer>
    <directory-writer name="messages">
      <mergeFactor>20</mergeFactor>
      <directory>messages</directory>
      <analyzer>standard</analyzer>
    </directory-writer>
  </writers>

</components>
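
The config links components by name: a multi-searcher refers to searchers, and each directory-searcher refers to a directory. A minimal sketch of how those references could be resolved, using only the JDK's DOM parser, might look like this. SearcherConfig, parse and pathsFor are hypothetical names for illustration; they are not taken from the lucelon package, which presumably reads its configuration through Avalon instead.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical helper: maps each searcher name in the config to index paths. */
final class SearcherConfig {
    private final Map<String, String> directoryPaths = new LinkedHashMap<>();
    private final Map<String, List<String>> searcherPaths = new LinkedHashMap<>();

    static SearcherConfig parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        SearcherConfig cfg = new SearcherConfig();

        // <filesystem name="..."><path>...</path></filesystem>
        NodeList dirs = doc.getElementsByTagName("filesystem");
        for (int i = 0; i < dirs.getLength(); i++) {
            Element e = (Element) dirs.item(i);
            cfg.directoryPaths.put(e.getAttribute("name"), childText(e, "path"));
        }

        // <directory-searcher> refers to exactly one directory by name
        NodeList single = doc.getElementsByTagName("directory-searcher");
        for (int i = 0; i < single.getLength(); i++) {
            Element e = (Element) single.item(i);
            String path = cfg.directoryPaths.get(childText(e, "directory"));
            if (path == null) {
                throw new IllegalArgumentException(
                        "unknown directory in searcher " + e.getAttribute("name"));
            }
            cfg.searcherPaths.put(e.getAttribute("name"), Collections.singletonList(path));
        }

        // <multi-searcher> refers to already defined directory-searchers by name
        NodeList multi = doc.getElementsByTagName("multi-searcher");
        for (int i = 0; i < multi.getLength(); i++) {
            Element e = (Element) multi.item(i);
            List<String> paths = new ArrayList<>();
            NodeList refs = e.getElementsByTagName("searcher");
            for (int j = 0; j < refs.getLength(); j++) {
                String ref = refs.item(j).getTextContent().trim();
                List<String> known = cfg.searcherPaths.get(ref);
                if (known == null) {
                    throw new IllegalArgumentException("unknown searcher: " + ref);
                }
                paths.addAll(known);
            }
            cfg.searcherPaths.put(e.getAttribute("name"), paths);
        }
        return cfg;
    }

    /** Returns the index paths behind a searcher name, or null if undefined. */
    List<String> pathsFor(String searcherName) {
        return searcherPaths.get(searcherName);
    }

    private static String childText(Element parent, String tag) {
        return parent.getElementsByTagName(tag).item(0).getTextContent().trim();
    }
}
```

With the config shown, pathsFor("multi") would list both index paths, while pathsFor("topics") would list only the topics path.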

Why this is good for me:
1. I can hide the implementation details from the application
developer
2. I can configure the system via config files
3. my logging system is ready to use (provided by Apache LogKit)
4. I can change a component's implementation without modifying the code
(I'll change the analyzer because the standard Lucene analyzer can't work
with ISO-8859-2 characters [I'll check it tomorrow])

I have to work on a better SearcherManager. We know that several threads can
reuse the same IndexReader, but it should be closed and reopened when the
directory is modified. My problem is this:
Thread-1 gets a Searcher and Thread-2 gets another Searcher; the two
Searchers use the same IndexReader. Thread-1 finishes its work and closes
its Searcher. That Searcher will close the IndexReader, which is still in
use by Thread-2.
I think I have to implement something similar to an (SQL) connection cache.

Thread-1 uses a Searcher that uses an instance of CachedIndexReader. If
Thread-1 closes the CachedIndexReader, it doesn't close the physical
IndexReader; it only notifies the cache that its close method was called.
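
One way to get this behaviour is reference counting. The following is only a sketch under assumptions: SharedReader, Handle, acquire and retire are hypothetical names, and a plain java.io.Closeable stands in for the physical IndexReader so that the example compiles without Lucene. The physical resource is closed only after the manager has retired it (for example because the directory was modified) and the last handle has been closed.

```java
import java.io.Closeable;
import java.io.IOException;

/**
 * Sketch of a reference-counted wrapper around a shared resource.
 * The wrapped Closeable stands in for the physical IndexReader.
 */
final class SharedReader {
    private final Closeable physical;
    private int refCount = 0;
    private boolean retired = false;   // set once the index has been modified
    private boolean closed = false;

    SharedReader(Closeable physical) {
        this.physical = physical;
    }

    /** Hands out a handle to a thread; the handle should be closed when done. */
    synchronized Handle acquire() {
        if (retired) {
            throw new IllegalStateException("reader is retired, open a new one");
        }
        refCount++;
        return new Handle();
    }

    /** Called by the manager when the directory changes: close once idle. */
    synchronized void retire() throws IOException {
        retired = true;
        closeIfIdle();
    }

    synchronized boolean isPhysicallyClosed() {
        return closed;
    }

    private synchronized void release() throws IOException {
        refCount--;
        closeIfIdle();
    }

    // Only called while holding the lock on this object.
    private void closeIfIdle() throws IOException {
        if (retired && refCount == 0 && !closed) {
            closed = true;
            physical.close();
        }
    }

    /** What a thread holds; closing it only notifies the shared reader. */
    final class Handle implements Closeable {
        private boolean released = false;

        @Override
        public void close() throws IOException {
            synchronized (SharedReader.this) {
                if (released) {
                    return;            // tolerate a double close
                }
                released = true;
            }
            release();
        }
    }
}
```

In this sketch Thread-1 closing its handle leaves Thread-2's handle usable; the SearcherManager would call retire() on the old SharedReader and create a new one when it detects that the index has changed.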

Notice that we don't need to change the SearcherManager interface, so I can
plug in a new implementation. (To be honest, this kind of Manager class could
be used without Avalon: it is simply an application of the abstract factory
design pattern.)
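
To illustrate the abstract factory point without Avalon, here is a small self-contained sketch. The Searcher interface below is a stand-in for Lucene's Searcher, and DirectorySearcherManager, SearchClient and sources() are hypothetical names for illustration only. The client code depends on the SearcherManager interface alone, so the implementation can be swapped without touching it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Stand-in for Lucene's Searcher, so the sketch compiles on its own. */
interface Searcher {
    List<String> sources();
}

/** Same shape as the interface in the post, minus the Avalon Component marker. */
interface SearcherManager {
    Searcher getSearcher();
}

/** Produces searchers over a single index directory. */
final class DirectorySearcherManager implements SearcherManager {
    private final String path;

    DirectorySearcherManager(String path) {
        this.path = path;
    }

    @Override
    public Searcher getSearcher() {
        return () -> Arrays.asList(path);
    }
}

/** Produces searchers that combine the searchers of other managers. */
final class MultiSearcherManager implements SearcherManager {
    private final List<SearcherManager> parts;

    MultiSearcherManager(SearcherManager... parts) {
        this.parts = Arrays.asList(parts);
    }

    @Override
    public Searcher getSearcher() {
        List<String> all = new ArrayList<>();
        for (SearcherManager m : parts) {
            all.addAll(m.getSearcher().sources());
        }
        return () -> all;
    }
}

/** Application code: knows only the factory interface, not the implementation. */
final class SearchClient {
    private final SearcherManager manager;

    SearchClient(SearcherManager manager) {
        this.manager = manager;
    }

    String describe() {
        return "searching " + manager.getSearcher().sources();
    }
}
```

A SearchClient built with a MultiSearcherManager behaves the same from the caller's side as one built with a DirectorySearcherManager; only the wiring changes.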

Something else:
how about an IndexWriter called BatchIndexWriter that uses a RAMDirectory
to buffer documents before adding them to the index:
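// sketch; ramDirectory and analyzer are assumed fields of BatchIndexWriter
public void addDocument(Document d) throws IOException {
    count++;
    ramWriter.addDocument(d);
    if (count > aLimit) {
        // close the RAM writer so its buffered index is complete
        ramWriter.close();
        // merge the buffered index into the real one
        realWriter.addIndexes(new Directory[] { ramDirectory });
        // start a fresh in-memory buffer
        ramDirectory = new RAMDirectory();
        ramWriter = new IndexWriter(ramDirectory, analyzer, true);
        count = 0;
    }
}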

Of course, the value of the limit could be configurable.

peter

ps: a good tutorial:
http://jakarta.apache.org/avalon/developing/introduction.html

-------------------------------------------

RE: Proposal for Lucene / new component
From: Andrew C. Oliver
Subject: RE: Proposal for Lucene / new component
Date: Sun, 03 Mar 2002 11:48:27 -0800

> I think if you need logging, configuring, threading, pooling (for the
> crawler) and want to be component based you need a framework, something
> like Avalon. It took one day to understand Avalon and write the first
> Hello world application but you can save a lot of time while coding.
>

Great!  Can you post your work to get the Hello Avalon App somewhere?
If you could document along those lines as well then I'll be happy to go
and write a "getting started" guide for Avalon.

I'm not objecting to using Avalon provided I can actually understand
it.  I'm really close thanks to the fine work of Ken Barrozzi
(http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/poi/cocoon-poi/), but
I'm one step away from actually being able to start using Avalon.  It's
not an "I won't", it's an "I can't" issue.

