You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by David Causse <dc...@spotter.com> on 2008/11/19 11:00:35 UTC
InstatiatedIndex questions
Hi,
Here are some differences I noticed between InstanciatedIndex and
RAMDirectory :
- RAMDirectory seems to do a reset on tokenStreams the first time, this
permits to initialise some objects before starting streaming,
InstanciatedIndex does not.
- I can Serialize a RAMDirectory but I cannot on a InstantiatedIndex
because of : java.io.NotSerializableException:
org.apache.lucene.index.TermVectorOffsetInfo
Do you consider this as problems or normal features?
Thank you.
David.
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: InstatiatedIndex questions
Posted by David Causse <dc...@spotter.com>.
Hi Karl,
The reset() problem is not very problematic I can adapt our TokenStreams.
For the Serialization : as we need to share very small indexes (200 docs
max) in a cluster we need to serialize something.
I was planning to use the Java Serialization with maybe some compression
on the resulting byte[] and as InstantiatedIndex is Serializable I was
hoping to use the perf gain of your implementation in our context.
I will fix my working copy as you suggested.
Thank you.
David.
karl wettin a écrit :
> Hi David,
>
> thanks for the report! I suppose you speak of IndexWriter vs
> InstantiatedIndexWriter? These are definitely considered discrepancy
> problems. I've created a new issue in the tracker:
> http://issues.apache.org/jira/browse/LUCENE-1462
>
> For what reason do you try to serialize the InstantatedIndex? Could
> you perhaps use FSDirectory and IndexWriter instead, and then each
> time you update that index you replace your InstantiatedIndex with a
> new one constructed using the IndexReader argumented constructor of
> InstantiatedIndex?
>
> I'm afraid that I'm rather busy at the moment but I'll try to fix it
> ASAP. It should however be rather easy to fix if you just want to
> solve the specific problem: reset all pre-tokenized streams before
> they are tokenized in InstantiatedIndexWriter#addDocument and make
> TermVectorOffsetInfo implement Serializable.
>
>
> karl
>
> On Wed, Nov 19, 2008 at 11:00 AM, David Causse <dc...@spotter.com> wrote:
>
>> Hi,
>>
>> Here are some differences I noticed between InstanciatedIndex and
>> RAMDirectory :
>>
>> - RAMDirectory seems to do a reset on tokenStreams the first time, this
>> permits to initialise some objects before starting streaming,
>> InstanciatedIndex does not.
>> - I can Serialize a RAMDirectory but I cannot on a InstantiatedIndex because
>> of : java.io.NotSerializableException:
>> org.apache.lucene.index.TermVectorOffsetInfo
>>
>> Do you consider this as problems or normal features?
>>
>> Thank you.
>>
>> David.
>>
>> ---------------------------------------------------------------------
>> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>> For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: InstatiatedIndex questions
Posted by karl wettin <ka...@gmail.com>.
Hi David,
thanks for the report! I suppose you speak of IndexWriter vs
InstantiatedIndexWriter? These are definitely considered discrepancy
problems. I've created a new issue in the tracker:
http://issues.apache.org/jira/browse/LUCENE-1462
For what reason do you try to serialize the InstantatedIndex? Could
you perhaps use FSDirectory and IndexWriter instead, and then each
time you update that index you replace your InstantiatedIndex with a
new one constructed using the IndexReader argumented constructor of
InstantiatedIndex?
I'm afraid that I'm rather busy at the moment but I'll try to fix it
ASAP. It should however be rather easy to fix if you just want to
solve the specific problem: reset all pre-tokenized streams before
they are tokenized in InstantiatedIndexWriter#addDocument and make
TermVectorOffsetInfo implement Serializable.
karl
On Wed, Nov 19, 2008 at 11:00 AM, David Causse <dc...@spotter.com> wrote:
> Hi,
>
> Here are some differences I noticed between InstanciatedIndex and
> RAMDirectory :
>
> - RAMDirectory seems to do a reset on tokenStreams the first time, this
> permits to initialise some objects before starting streaming,
> InstanciatedIndex does not.
> - I can Serialize a RAMDirectory but I cannot on a InstantiatedIndex because
> of : java.io.NotSerializableException:
> org.apache.lucene.index.TermVectorOffsetInfo
>
> Do you consider this as problems or normal features?
>
> Thank you.
>
> David.
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org