You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Darren Govoni <da...@ontrenet.com> on 2008/11/16 16:39:43 UTC

InstantiatedIndex help

Hi gang,
   I am trying to trace the 2.4 API to create an InstantiatedIndex, but
its rather difficult to connect directory,reader,search,index etc just
reading the javadocs. 

    I have a (POI - plain old index) directory already and want to
create a faster InstantiatedIndex and IndexSearcher to query it like
before. What's the proper order to do this? 

Also, if anyone has any empirical data on the performance or reliability
of InstantiatedIndex, I'd be curious.

Thanks for the tips!
Darren


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help + first impression

Posted by karl wettin <ka...@gmail.com>.
On Wed, Nov 19, 2008 at 3:27 AM, karl wettin <ka...@gmail.com> wrote:
> rewritten query. I.e. this is probably as much a store related expense
> as it is a Levenshtein calculation expense.

"this is probably *not* as much a store related.." that is.


    karl

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help + first impression

Posted by karl wettin <ka...@gmail.com>.
The actual performance depends on how much you load to the index. Can
you tell us how many documents and how large these documents are that
you have in your index?

Compared with RAMDirectory I'vee seen performance boosts of

up to 100x in a small index that contains (1-20) Wikipedia sized
documents, an index I used to apply user search agents on as new data
arrived to the primary index.
up to 25x when placing massive amounts of span queries on the apriori
index in LUCENE-626. This index contained tens of thousands of
documents with only a few (5-20) terms each.
up to 15x in a relatively large ngram index for classifications using
LUCENE-626. This is pure skipTo operations.

Regarding the fuzzy query, try to see how much time was spent
rewriting the query and then how much time was spend querying. I'm
almost certain you'll notice that the time spent rewriting the query
(comparing edit distance between the terms of the index and the query
term) is overwhelming  compared to the time spend searching for the
rewritten query. I.e. this is probably as much a store related expense
as it is a Levenshtein calculation expense.


    karl

(this is my second reply, the first one seems to be lost in space?)

On Mon, Nov 17, 2008 at 1:37 AM, Darren Govoni <da...@ontrenet.com> wrote:
> After I switched to InstantiatedIndex from RAMDirectory (but using the
> reader from my RAMDirectory to create the InstantiatedIndex), I see a
> less than 25% (.25) improvement in speed. Nowhere near the 100x (100.00)
> speed mentioned in the documentation. Probably I am doing something
> wrong.
>
> I am using too, a fuzzy query. e.g. "word:house~0.80" but I'd expect the
> improvement to be because of physical representation (memory graph) and
> mostly unaffected by the query. no?
>
> Could there be some lazy loading going on in RAMDirectory that prevents
> InstantiatedIndex from building out its graph and getting the expected
> speed?
>
> thanks to anyone who can verify this.
>
>
> On Sun, 2008-11-16 at 12:37 -0500, Darren Govoni wrote:
>> Yeah. That makes sense. Its not too hard to wrap those extra steps so I
>> can end up with something simpler too. Like:
>>
>> iindex = InstantiatedIndex("path/to/my/index")
>>
>> I'm lazy so the intermediate hoops to jump through clutter my code.
>> Hehe.
>>
>> :)
>>
>> Darren
>>
>> On Sun, 2008-11-16 at 11:46 -0500, Mark Miller wrote:
>> > Can you start with an empty index? Then how about:
>> >
>> > // Adding these
>> >
>> >     iindex = InstantiatedIndex()
>> >     ireader = iindex.indexReaderFactory()
>> >     isearcher = IndexSearcher(ireader)
>> >
>> > If you want a copy from another IndexReader though, you have to get that reader from somewhere right?
>> >
>> > - Mark
>> >
>> >
>> >
>> > Darren Govoni wrote:
>> > > Hi Mark,
>> > >   Thanks for the tips. Here's what I will try (psuedo-code)
>> > >
>> > >     endirectory = RAMDirectory("index/dictionary.en")
>> > >     ensearcher = IndexSearcher(endirectory)
>> > >     // Adding these
>> > >     reader = ensearcher.getIndexReader()
>> > >     iindex = InstantiatedIndex(reader)
>> > >     ireader = iindex.indexReaderFactory()
>> > >     isearcher = IndexSearcher(ireader)
>> > >
>> > > Kind of round about way to get an InstantiatedIndex I guess,but maybe
>> > > there's a briefer way?
>> > >
>> > > Thank you.
>> > > Darren
>> > >
>> > > On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
>> > >
>> > >> Check out the docs at:
>> > >> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
>> > >>
>> > >> There is a performance graph there to check  out.
>> > >>
>> > >> The code should be fairly straightforward - you can make an
>> > >> InstantiatedIndex thats empty, or seed it with an IndexReader. Then you
>> > >> can make an InstantiatedReader or Writer, which take the
>> > >> InstantiatedIndex as a constructor arg.
>> > >>
>> > >> You should be able to just wrap that InstantiatedReader in a regular
>> > >> Searcher.
>> > >>
>> > >> Darren Govoni wrote:
>> > >>
>> > >>> Hi gang,
>> > >>>    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
>> > >>> its rather difficult to connect directory,reader,search,index etc just
>> > >>> reading the javadocs.
>> > >>>
>> > >>>     I have a (POI - plain old index) directory already and want to
>> > >>> create a faster InstantiatedIndex and IndexSearcher to query it like
>> > >>> before. What's the proper order to do this?
>> > >>>
>> > >>> Also, if anyone has any empirical data on the performance or reliability
>> > >>> of InstantiatedIndex, I'd be curious.
>> > >>>
>> > >>> Thanks for the tips!
>> > >>> Darren
>> > >>>
>> > >>>
>> > >>> ---------------------------------------------------------------------
>> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>>
>> > >>>
>> > >>>
>> > >> ---------------------------------------------------------------------
>> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >>
>> > >>
>> > >
>> > >
>> > > ---------------------------------------------------------------------
>> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > > For additional commands, e-mail: java-user-help@lucene.apache.org
>> > >
>> > >
>> >
>> >
>> > ---------------------------------------------------------------------
>> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> > For additional commands, e-mail: java-user-help@lucene.apache.org
>> >
>>
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help + first impression

Posted by Darren Govoni <da...@ontrenet.com>.
After I switched to InstantiatedIndex from RAMDirectory (but using the
reader from my RAMDirectory to create the InstantiatedIndex), I see a
less than 25% (.25) improvement in speed. Nowhere near the 100x (100.00)
speed mentioned in the documentation. Probably I am doing something
wrong. 

I am using too, a fuzzy query. e.g. "word:house~0.80" but I'd expect the
improvement to be because of physical representation (memory graph) and
mostly unaffected by the query. no?

Could there be some lazy loading going on in RAMDirectory that prevents
InstantiatedIndex from building out its graph and getting the expected
speed?

thanks to anyone who can verify this.


On Sun, 2008-11-16 at 12:37 -0500, Darren Govoni wrote:
> Yeah. That makes sense. Its not too hard to wrap those extra steps so I
> can end up with something simpler too. Like:
> 
> iindex = InstantiatedIndex("path/to/my/index")
> 
> I'm lazy so the intermediate hoops to jump through clutter my code.
> Hehe.
> 
> :)
> 
> Darren
> 
> On Sun, 2008-11-16 at 11:46 -0500, Mark Miller wrote:
> > Can you start with an empty index? Then how about:
> > 
> > // Adding these
> > 
> >     iindex = InstantiatedIndex()
> >     ireader = iindex.indexReaderFactory()
> >     isearcher = IndexSearcher(ireader)
> > 
> > If you want a copy from another IndexReader though, you have to get that reader from somewhere right?
> > 
> > - Mark 
> > 
> > 
> > 
> > Darren Govoni wrote:
> > > Hi Mark,
> > >   Thanks for the tips. Here's what I will try (psuedo-code)
> > >
> > >     endirectory = RAMDirectory("index/dictionary.en")
> > >     ensearcher = IndexSearcher(endirectory)
> > >     // Adding these
> > >     reader = ensearcher.getIndexReader()
> > >     iindex = InstantiatedIndex(reader)
> > >     ireader = iindex.indexReaderFactory()
> > >     isearcher = IndexSearcher(ireader)
> > >
> > > Kind of round about way to get an InstantiatedIndex I guess,but maybe
> > > there's a briefer way?
> > >
> > > Thank you.
> > > Darren
> > >
> > > On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
> > >   
> > >> Check out the docs at: 
> > >> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
> > >>
> > >> There is a performance graph there to check  out.
> > >>
> > >> The code should be fairly straightforward - you can make an 
> > >> InstantiatedIndex thats empty, or seed it with an IndexReader. Then you 
> > >> can make an InstantiatedReader or Writer, which take the 
> > >> InstantiatedIndex as a constructor arg.
> > >>
> > >> You should be able to just wrap that InstantiatedReader in a regular 
> > >> Searcher.
> > >>
> > >> Darren Govoni wrote:
> > >>     
> > >>> Hi gang,
> > >>>    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
> > >>> its rather difficult to connect directory,reader,search,index etc just
> > >>> reading the javadocs. 
> > >>>
> > >>>     I have a (POI - plain old index) directory already and want to
> > >>> create a faster InstantiatedIndex and IndexSearcher to query it like
> > >>> before. What's the proper order to do this? 
> > >>>
> > >>> Also, if anyone has any empirical data on the performance or reliability
> > >>> of InstantiatedIndex, I'd be curious.
> > >>>
> > >>> Thanks for the tips!
> > >>> Darren
> > >>>
> > >>>
> > >>> ---------------------------------------------------------------------
> > >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>>
> > >>>   
> > >>>       
> > >> ---------------------------------------------------------------------
> > >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > >> For additional commands, e-mail: java-user-help@lucene.apache.org
> > >>
> > >>     
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >   
> > 
> > 
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> > 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help

Posted by Darren Govoni <da...@ontrenet.com>.
Yeah. That makes sense. Its not too hard to wrap those extra steps so I
can end up with something simpler too. Like:

iindex = InstantiatedIndex("path/to/my/index")

I'm lazy so the intermediate hoops to jump through clutter my code.
Hehe.

:)

Darren

On Sun, 2008-11-16 at 11:46 -0500, Mark Miller wrote:
> Can you start with an empty index? Then how about:
> 
> // Adding these
> 
>     iindex = InstantiatedIndex()
>     ireader = iindex.indexReaderFactory()
>     isearcher = IndexSearcher(ireader)
> 
> If you want a copy from another IndexReader though, you have to get that reader from somewhere right?
> 
> - Mark 
> 
> 
> 
> Darren Govoni wrote:
> > Hi Mark,
> >   Thanks for the tips. Here's what I will try (psuedo-code)
> >
> >     endirectory = RAMDirectory("index/dictionary.en")
> >     ensearcher = IndexSearcher(endirectory)
> >     // Adding these
> >     reader = ensearcher.getIndexReader()
> >     iindex = InstantiatedIndex(reader)
> >     ireader = iindex.indexReaderFactory()
> >     isearcher = IndexSearcher(ireader)
> >
> > Kind of round about way to get an InstantiatedIndex I guess,but maybe
> > there's a briefer way?
> >
> > Thank you.
> > Darren
> >
> > On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
> >   
> >> Check out the docs at: 
> >> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
> >>
> >> There is a performance graph there to check  out.
> >>
> >> The code should be fairly straightforward - you can make an 
> >> InstantiatedIndex thats empty, or seed it with an IndexReader. Then you 
> >> can make an InstantiatedReader or Writer, which take the 
> >> InstantiatedIndex as a constructor arg.
> >>
> >> You should be able to just wrap that InstantiatedReader in a regular 
> >> Searcher.
> >>
> >> Darren Govoni wrote:
> >>     
> >>> Hi gang,
> >>>    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
> >>> its rather difficult to connect directory,reader,search,index etc just
> >>> reading the javadocs. 
> >>>
> >>>     I have a (POI - plain old index) directory already and want to
> >>> create a faster InstantiatedIndex and IndexSearcher to query it like
> >>> before. What's the proper order to do this? 
> >>>
> >>> Also, if anyone has any empirical data on the performance or reliability
> >>> of InstantiatedIndex, I'd be curious.
> >>>
> >>> Thanks for the tips!
> >>> Darren
> >>>
> >>>
> >>> ---------------------------------------------------------------------
> >>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >>> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>>
> >>>   
> >>>       
> >> ---------------------------------------------------------------------
> >> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> >> For additional commands, e-mail: java-user-help@lucene.apache.org
> >>
> >>     
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >   
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help

Posted by Mark Miller <ma...@gmail.com>.
Can you start with an empty index? Then how about:

// Adding these

    iindex = InstantiatedIndex()
    ireader = iindex.indexReaderFactory()
    isearcher = IndexSearcher(ireader)

If you want a copy from another IndexReader though, you have to get that reader from somewhere right?

- Mark 



Darren Govoni wrote:
> Hi Mark,
>   Thanks for the tips. Here's what I will try (psuedo-code)
>
>     endirectory = RAMDirectory("index/dictionary.en")
>     ensearcher = IndexSearcher(endirectory)
>     // Adding these
>     reader = ensearcher.getIndexReader()
>     iindex = InstantiatedIndex(reader)
>     ireader = iindex.indexReaderFactory()
>     isearcher = IndexSearcher(ireader)
>
> Kind of round about way to get an InstantiatedIndex I guess,but maybe
> there's a briefer way?
>
> Thank you.
> Darren
>
> On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
>   
>> Check out the docs at: 
>> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
>>
>> There is a performance graph there to check  out.
>>
>> The code should be fairly straightforward - you can make an 
>> InstantiatedIndex thats empty, or seed it with an IndexReader. Then you 
>> can make an InstantiatedReader or Writer, which take the 
>> InstantiatedIndex as a constructor arg.
>>
>> You should be able to just wrap that InstantiatedReader in a regular 
>> Searcher.
>>
>> Darren Govoni wrote:
>>     
>>> Hi gang,
>>>    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
>>> its rather difficult to connect directory,reader,search,index etc just
>>> reading the javadocs. 
>>>
>>>     I have a (POI - plain old index) directory already and want to
>>> create a faster InstantiatedIndex and IndexSearcher to query it like
>>> before. What's the proper order to do this? 
>>>
>>> Also, if anyone has any empirical data on the performance or reliability
>>> of InstantiatedIndex, I'd be curious.
>>>
>>> Thanks for the tips!
>>> Darren
>>>
>>>
>>> ---------------------------------------------------------------------
>>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>>
>>>   
>>>       
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>     
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help

Posted by Darren Govoni <da...@ontrenet.com>.
Hi Mark,
  Thanks for the tips. Here's what I will try (psuedo-code)

    endirectory = RAMDirectory("index/dictionary.en")
    ensearcher = IndexSearcher(endirectory)
    // Adding these
    reader = ensearcher.getIndexReader()
    iindex = InstantiatedIndex(reader)
    ireader = iindex.indexReaderFactory()
    isearcher = IndexSearcher(ireader)

Kind of round about way to get an InstantiatedIndex I guess,but maybe
there's a briefer way?

Thank you.
Darren

On Sun, 2008-11-16 at 10:50 -0500, Mark Miller wrote:
> Check out the docs at: 
> http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html
> 
> There is a performance graph there to check  out.
> 
> The code should be fairly straightforward - you can make an 
> InstantiatedIndex thats empty, or seed it with an IndexReader. Then you 
> can make an InstantiatedReader or Writer, which take the 
> InstantiatedIndex as a constructor arg.
> 
> You should be able to just wrap that InstantiatedReader in a regular 
> Searcher.
> 
> Darren Govoni wrote:
> > Hi gang,
> >    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
> > its rather difficult to connect directory,reader,search,index etc just
> > reading the javadocs. 
> >
> >     I have a (POI - plain old index) directory already and want to
> > create a faster InstantiatedIndex and IndexSearcher to query it like
> > before. What's the proper order to do this? 
> >
> > Also, if anyone has any empirical data on the performance or reliability
> > of InstantiatedIndex, I'd be curious.
> >
> > Thanks for the tips!
> > Darren
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >   
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
> 


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex help

Posted by Mark Miller <ma...@gmail.com>.
Check out the docs at: 
http://lucene.apache.org/java/2_4_0/api/contrib-instantiated/index.html

There is a performance graph there to check  out.

The code should be fairly straightforward - you can make an 
InstantiatedIndex thats empty, or seed it with an IndexReader. Then you 
can make an InstantiatedReader or Writer, which take the 
InstantiatedIndex as a constructor arg.

You should be able to just wrap that InstantiatedReader in a regular 
Searcher.

Darren Govoni wrote:
> Hi gang,
>    I am trying to trace the 2.4 API to create an InstantiatedIndex, but
> its rather difficult to connect directory,reader,search,index etc just
> reading the javadocs. 
>
>     I have a (POI - plain old index) directory already and want to
> create a faster InstantiatedIndex and IndexSearcher to query it like
> before. What's the proper order to do this? 
>
> Also, if anyone has any empirical data on the performance or reliability
> of InstantiatedIndex, I'd be curious.
>
> Thanks for the tips!
> Darren
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>   


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org