Posted to java-user@lucene.apache.org by David Causse <dc...@spotter.com> on 2008/11/19 11:00:35 UTC

InstantiatedIndex questions

Hi,

Here are some differences I noticed between InstantiatedIndex and
RAMDirectory:

- RAMDirectory seems to call reset() on token streams before their first
use, which makes it possible to initialise some objects before streaming
starts; InstantiatedIndex does not.
- I can serialize a RAMDirectory, but not an InstantiatedIndex, because
of: java.io.NotSerializableException:
org.apache.lucene.index.TermVectorOffsetInfo
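
The serialization failure in the second point can be reproduced without
Lucene. The following is only a minimal sketch using the JDK: PlainOffset,
SerializableOffset, Holder and SerializationCheck are hypothetical stand-ins
invented for illustration, not Lucene classes. It shows that a Serializable
container still fails as soon as one object it references is not
Serializable, and that the exception message names the offending class.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

// Hypothetical stand-in for a class that does not implement Serializable.
class PlainOffset {
    final int start;
    final int end;
    PlainOffset(int start, int end) { this.start = start; this.end = end; }
}

// Same data, but marked Serializable.
class SerializableOffset implements Serializable {
    private static final long serialVersionUID = 1L;
    final int start;
    final int end;
    SerializableOffset(int start, int end) { this.start = start; this.end = end; }
}

// Hypothetical container: Serializable itself, yet only as serializable
// as the objects it references.
class Holder implements Serializable {
    private static final long serialVersionUID = 1L;
    final Object offset;
    Holder(Object offset) { this.offset = offset; }
}

class SerializationCheck {
    // Returns null if the object graph serializes, otherwise the message of
    // the NotSerializableException (by default the offending class name).
    static String firstNonSerializable(Object root) {
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            out.writeObject(root);
            return null;
        } catch (NotSerializableException e) {
            return e.getMessage();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling SerializationCheck.firstNonSerializable(new Holder(new PlainOffset(0, 5)))
reports the PlainOffset class, while the same call with a SerializableOffset
returns null.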

Do you consider these to be problems or intended behaviour?

Thank you.

David.

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: InstantiatedIndex questions

Posted by David Causse <dc...@spotter.com>.
Hi Karl,

The reset() issue is not a big problem; I can adapt our TokenStreams.
As for serialization: we need to share very small indexes (200 docs
max) across a cluster, so we need to serialize something.
I was planning to use Java serialization, perhaps with some compression
of the resulting byte[], and since InstantiatedIndex is Serializable I
was hoping to benefit from the performance gain of your implementation
in our context.
I will fix my working copy as you suggested.
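
The plan above (Java serialization plus compression of the resulting
byte[]) could look roughly like the sketch below. It uses only JDK classes;
CompressedSerializer and its two methods are hypothetical names chosen for
illustration, and GZIP is just one possible compression choice. Any
Serializable object graph can be passed in, which would include an index
object once every class it references is Serializable.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: serializes an object graph into a GZIP-compressed
// byte[] and restores it again.
class CompressedSerializer {

    static byte[] toBytes(Serializable value) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        // Closing the ObjectOutputStream also closes the GZIP stream,
        // which writes the GZIP trailer into the buffer.
        try (ObjectOutputStream out =
                 new ObjectOutputStream(new GZIPOutputStream(buffer))) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    static Object fromBytes(byte[] bytes) {
        try (ObjectInputStream in = new ObjectInputStream(
                 new GZIPInputStream(new ByteArrayInputStream(bytes)))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            // The receiving node does not have the serialized class.
            throw new IllegalStateException(e);
        }
    }
}
```

On the receiving node, fromBytes(toBytes(x)) yields a copy of x, provided
both sides run compatible versions of the serialized classes.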

Thank you.

David.



Re: InstantiatedIndex questions

Posted by karl wettin <ka...@gmail.com>.
Hi David,

Thanks for the report! I suppose you mean IndexWriter vs.
InstantiatedIndexWriter? These discrepancies are definitely considered
problems. I've created a new issue in the tracker:
http://issues.apache.org/jira/browse/LUCENE-1462

Why are you trying to serialize the InstantiatedIndex? Could you
perhaps use FSDirectory and IndexWriter instead, and each time you
update that index, replace your InstantiatedIndex with a new one built
with the InstantiatedIndex constructor that takes an IndexReader?

I'm afraid I'm rather busy at the moment, but I'll try to fix it ASAP.
It should, however, be rather easy to fix if you just want to solve the
specific problem: reset all pre-tokenized streams before they are
tokenized in InstantiatedIndexWriter#addDocument, and make
TermVectorOffsetInfo implement Serializable.
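
The first half of that fix, resetting a stream before consuming it, can be
sketched without Lucene. Everything below is a simplified, hypothetical
stand-in (SketchTokenStream, ListTokenStream and SketchIndexWriter are not
Lucene classes and do not mirror the real signatures); it only illustrates
why a writer that skips reset() breaks streams that initialise their state
there.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical, simplified token stream: next() returns null at the end.
abstract class SketchTokenStream {
    abstract void reset();
    abstract String next();
}

// A stream that sets up its state in reset(), as described in the report.
class ListTokenStream extends SketchTokenStream {
    private final List<String> tokens;
    private Iterator<String> cursor; // stays null until reset() is called

    ListTokenStream(String... tokens) {
        this.tokens = Arrays.asList(tokens);
    }

    @Override
    void reset() {
        cursor = tokens.iterator();
    }

    @Override
    String next() {
        if (cursor == null) {
            throw new IllegalStateException("reset() was not called");
        }
        return cursor.hasNext() ? cursor.next() : null;
    }
}

// Hypothetical writer: resets every stream before consuming it.
class SketchIndexWriter {
    static List<String> consume(SketchTokenStream stream) {
        stream.reset(); // the step the report found missing
        List<String> terms = new ArrayList<>();
        for (String t = stream.next(); t != null; t = stream.next()) {
            terms.add(t);
        }
        return terms;
    }
}
```

With the reset() call in consume(), a ListTokenStream yields all of its
tokens; without it, the first next() call throws IllegalStateException.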


     karl
