Posted to java-user@lucene.apache.org by Prasenjit Mukherjee <pr...@aol.com> on 2006/03/29 06:57:55 UTC
Data structure of a Lucene Index
It seems to me that Lucene doesn't use a B-tree for its index storage.
Is there any paper or article that explains the theory behind the data
structure of a single index (segment)? I am not referring to the merge
algorithm; I am curious about the storage structure of a single
optimized Lucene index.
Any pointers are greatly appreciated.
--Prasen
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Data structure of a Lucene Index
Posted by Prasenjit Mukherjee <pr...@aol.com>.
I think Doug's paper (specifically the "Seek versus Transfer" section) is
the closest I could get. A slightly more detailed explanation can be found
in Yates' book on Information Retrieval. I agree with Dmitry: a
detailed explanation (or even pointers to some existing article) would
be beneficial to all of us.
--prasen
------------------------------------------------------------
I talked about this a bit in a presentation at Haifa last year:
http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf
See the section on "Seek versus Transfer".
Doug
Dmitry Goldenberg wrote:
>Ideally, I'd love to see an article explaining both in detail: the index structure as well as the merge algorithm...
>
>________________________________
>
>From: Prasenjit Mukherjee [mailto:prasenjitm@aol.com]
>Sent: Tue 3/28/2006 11:57 PM
>To: java-user@lucene.apache.org
>Subject: Data structure of a Lucene Index
>
>
>
>It seems to me that lucene doesn't use B-tree for its indexing storage.
>Any paper/article which explains the theory behind data-structure of
>single index(segment). I am not referring to the merge algorithm, I am
>curious to know the storage structure of a single optimized lucene index.
>
>Any pointer is greatly appreciated.
>--Prasen
RE: Data structure of a Lucene Index
Posted by Dmitry Goldenberg <dm...@weblayers.com>.
Ideally, I'd love to see an article explaining both in detail: the index structure as well as the merge algorithm...
________________________________
From: Prasenjit Mukherjee [mailto:prasenjitm@aol.com]
Sent: Tue 3/28/2006 11:57 PM
To: java-user@lucene.apache.org
Subject: Data structure of a Lucene Index
It seems to me that lucene doesn't use B-tree for its indexing storage.
Any paper/article which explains the theory behind data-structure of
single index(segment). I am not referring to the merge algorithm, I am
curious to know the storage structure of a single optimized lucene index.
Any pointer is greatly appreciated.
--Prasen
Re: Data structure of a Lucene Index
Posted by Prasenjit Mukherjee <pr...@aol.com>.
I have already gone through the file formats document. What I was looking
for is the underlying theory behind the chosen file formats. I am sure
those formats were decided on the basis of some theoretical principles.
--prasen
erik@ehatchersolutions.com wrote:
>
> On Mar 28, 2006, at 11:57 PM, Prasenjit Mukherjee wrote:
>
>> It seems to me that lucene doesn't use B-tree for its indexing
>> storage. Any paper/article which explains the theory behind data-
>> structure of single index(segment). I am not referring to the
>> merge algorithm, I am curious to know the storage structure of a
>> single optimized lucene index.
>>
>> Any pointer is greatly appreciated.
>
>
> How about this for starters?
>
> http://lucene.apache.org/java/docs/fileformats.html
>
Re: Data structure of a Lucene Index
Posted by Erik Hatcher <er...@ehatchersolutions.com>.
On Mar 28, 2006, at 11:57 PM, Prasenjit Mukherjee wrote:
> It seems to me that lucene doesn't use B-tree for its indexing
> storage. Any paper/article which explains the theory behind data-
> structure of single index(segment). I am not referring to the
> merge algorithm, I am curious to know the storage structure of a
> single optimized lucene index.
>
> Any pointer is greatly appreciated.
How about this for starters?
http://lucene.apache.org/java/docs/fileformats.html
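
For readers who want a concrete picture: the file formats document describes postings stored as variable-length integers (VInt), with document numbers written as gaps from the previous document rather than as absolute values. The following is only a minimal sketch of that idea, assuming plain gap encoding. The class and method names (PostingsSketch, encode, decode) are made up for illustration and are not Lucene classes, and the real .frq layout also folds term frequencies into the same stream.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of gap-encoded postings; not Lucene source code.
class PostingsSketch {

    // Write a non-negative int as a variable-length integer: seven data bits
    // per byte, low-order group first, high bit set on all bytes but the last.
    static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    // Encode strictly increasing document numbers as VInt-coded gaps.
    static byte[] encode(int[] sortedDocIds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int previous = 0;
        for (int docId : sortedDocIds) {
            writeVInt(out, docId - previous);
            previous = docId;
        }
        return out.toByteArray();
    }

    // Decode by reading the gaps sequentially and summing them.
    static List<Integer> decode(byte[] bytes) {
        List<Integer> docIds = new ArrayList<>();
        int pos = 0;
        int previous = 0;
        while (pos < bytes.length) {
            int b = bytes[pos++];
            int gap = b & 0x7F;
            for (int shift = 7; (b & 0x80) != 0; shift += 7) {
                b = bytes[pos++];
                gap |= (b & 0x7F) << shift;
            }
            previous += gap;
            docIds.add(previous);
        }
        return docIds;
    }

    public static void main(String[] args) {
        int[] docs = {3, 7, 200, 100000};
        byte[] encoded = encode(docs);
        System.out.println(encoded.length + " bytes -> " + decode(encoded));
    }
}
```

Because a posting list in this form can only be read front to back, it suits sequential scans and merges rather than in-place updates, which is one way it differs from a B-tree page.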
Re[2]: Implemented subclasses of Similarity class in Lucene
Posted by Charlie <ch...@gmail.com>.
Hi Edgar,
Are there any technical reports explaining your design and
implementation of LM on Lucene? Or which source files exactly make up
the "LM extension"?
--
Best regards,
Charlie
---
Friday, May 26, 2006, 7:36:14 AM, you wrote:
> Hi Edgar,
> While doing the integration/updating for Lucene 1.9, could you be more
> open and clear about the design so that people can
> 1) understand it,
> 2) extend it?
> Just a recommendation.
> Cheers,
> Murat
> Edgar Meij wrote:
>> Hi Ganesh,
>>
>> We have developed a Language Modeling extension to Lucene at the
>> University of Amsterdam. It can be found here:
>>
>> http://ilps.science.uva.nl/Resources/#lm-lucen
>>
>> It was built around Lucene 1.4.3, so it isn't source compatible with
>> the latest Lucene version. We are currently working on
>> integrating/updating it to Lucene 1.9.
>>
>> Best,
>>
>> Edgar Meij
>>
>>
>> On 3/31/06, Ganesh Ramakrishnan
>> <ga...@yahoo.com> wrote:
>>
>>> Hi
>>>
>>> Is anyone aware of subclasses of the Similarity class in Lucene? Two
>>> subclasses are DefaultSimilarity and SimilarityDelegator. Are any
>>> other implemented subclasses of Similarity, developed by anyone else,
>>> available on the web? For example, a language-model-based similarity,
>>> an Okapi-BM similarity, or different TF-IDF weighting schemes for
>>> similarity.
>>>
>>> If so, can you point me to them?
>>>
>>> Thanks and regards,
>>> Ganesh.
>>>
Re: Implemented subclasses of Similarity class in Lucene
Posted by Murat Yakici <mu...@cis.strath.ac.uk>.
Hi Edgar,
While doing the integration/updating for Lucene 1.9, could you be more
open and clear about the design, so that people can
1) understand it,
2) extend it?
Just a recommendation.
Cheers,
Murat
Edgar Meij wrote:
> Hi Ganesh,
>
> We have developed a Language Modeling extension to Lucene at the
> University of Amsterdam. It can be found here:
>
> http://ilps.science.uva.nl/Resources/#lm-lucen
>
> It was built around Lucene 1.4.3, so it isn't source compatible with
> the latest Lucene version. We are currently working on
> integrating/updating it to Lucene 1.9.
>
> Best,
>
> Edgar Meij
>
>
> On 3/31/06, Ganesh Ramakrishnan <ga...@yahoo.com> wrote:
>
>> Hi
>>
>> Is anyone aware of subclasses of the Similarity class in Lucene? Two
>> subclasses are DefaultSimilarity and SimilarityDelegator. Are any
>> other implemented subclasses of Similarity, developed by anyone else,
>> available on the web? For example, a language-model-based similarity,
>> an Okapi-BM similarity, or different TF-IDF weighting schemes for
>> similarity.
>>
>> If so, can you point me to them?
>>
>> Thanks and regards,
>> Ganesh.
Re: Implemented subclasses of Similarity class in Lucene
Posted by Edgar Meij <ed...@gmail.com>.
Hi Ganesh,
We have developed a Language Modeling extension to Lucene at the
University of Amsterdam. It can be found here:
http://ilps.science.uva.nl/Resources/#lm-lucen
It was built around Lucene 1.4.3, so it isn't source compatible with
the latest Lucene version. We are currently working on
integrating/updating it to Lucene 1.9.
Best,
Edgar Meij
On 3/31/06, Ganesh Ramakrishnan <ga...@yahoo.com> wrote:
> Hi
>
> Is anyone aware of subclasses of the Similarity class in Lucene? Two subclasses are DefaultSimilarity and SimilarityDelegator. Are any other implemented subclasses of Similarity, developed by anyone else, available on the web? For example, a language-model-based similarity, an Okapi-BM similarity, or different TF-IDF weighting schemes for similarity.
>
> If so, can you point me to them?
>
> Thanks and regards,
> Ganesh.
--
'An approximate answer to the right question is worth a great deal
more than a precise answer to the wrong question'
Implemented subclasses of Similarity class in Lucene
Posted by Ganesh Ramakrishnan <ga...@yahoo.com>.
Hi
Is anyone aware of subclasses of the Similarity class in Lucene? Two subclasses are DefaultSimilarity and SimilarityDelegator. Are any other implemented subclasses of Similarity, developed by anyone else, available on the web? For example, a language-model-based similarity, an Okapi-BM similarity, or different TF-IDF weighting schemes for similarity.
If so, can you point me to them?
Thanks and regards,
Ganesh.
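
To make the two weighting families in the question concrete, here is a standalone sketch that does not depend on Lucene. The tfIdf method mirrors only the tf and idf factors that DefaultSimilarity uses (square root of the frequency, and log(numDocs / (docFreq + 1)) + 1), not the complete Lucene scoring formula. The bm25 method is one common variant of the Okapi BM25 term weight. The class name ScoringSketch, the parameter names, and the k1 and b values used in main are assumptions chosen for illustration.

```java
// Hypothetical standalone comparison; it does not subclass Lucene's Similarity.
class ScoringSketch {

    // tf and idf factors in the style of DefaultSimilarity, multiplied together.
    static double tfIdf(int freq, int docFreq, int numDocs) {
        double tf = Math.sqrt(freq);
        double idf = Math.log((double) numDocs / (docFreq + 1)) + 1.0;
        return tf * idf;
    }

    // One common variant of the Okapi BM25 weight for a single term.
    // k1 controls term-frequency saturation, b controls length normalization.
    static double bm25(int freq, int docFreq, int numDocs,
                       double docLen, double avgDocLen, double k1, double b) {
        double idf = Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
        double norm = k1 * (1.0 - b + b * docLen / avgDocLen);
        return idf * freq * (k1 + 1.0) / (freq + norm);
    }

    public static void main(String[] args) {
        // Assumed toy statistics: 100 documents, term present in 9 of them.
        for (int freq = 1; freq <= 16; freq *= 2) {
            System.out.println("freq=" + freq
                + " tfIdf=" + tfIdf(freq, 9, 100)
                + " bm25=" + bm25(freq, 9, 100, 10.0, 10.0, 1.2, 0.75));
        }
    }
}
```

The loop in main shows the main behavioral difference: the tf-idf weight keeps growing with the square root of the frequency, while the BM25 weight levels off as the frequency increases.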
Re: Data structure of a Lucene Index
Posted by Doug Cutting <cu...@apache.org>.
I talked about this a bit in a presentation at Haifa last year:
http://www.haifa.ibm.com/Workshops/ir2005/papers/DougCutting-Haifa05.pdf
See the section on "Seek versus Transfer".
Doug
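
As a rough illustration of the general seek-versus-transfer trade-off named in that section title, the sketch below compares updating a structure in place, paying one disk seek per update, with rewriting the whole index sequentially as a merge does. It is a back-of-the-envelope model only: the class name and all figures (index size, seek time, transfer rate) are invented for the example and are not taken from the slides.

```java
// Back-of-the-envelope cost model; every figure used with it is an assumption.
class SeekVersusTransferSketch {

    // Seconds to apply `updates` in-place updates when each one costs a disk
    // seek (roughly the uncached worst case for a B-tree-style structure).
    static double seekBoundSeconds(long updates, double seekSeconds) {
        return updates * seekSeconds;
    }

    // Seconds to stream the whole index once for reading and once for
    // writing, as a sort-and-merge style rewrite would.
    static double mergeSeconds(long indexBytes, double bytesPerSecond) {
        return 2.0 * indexBytes / bytesPerSecond;
    }

    // Number of updates above which the sequential rewrite is cheaper.
    static double breakEvenUpdates(long indexBytes, double bytesPerSecond,
                                   double seekSeconds) {
        return mergeSeconds(indexBytes, bytesPerSecond) / seekSeconds;
    }

    public static void main(String[] args) {
        long indexBytes = 100_000_000_000L; // assumed 100 GB index
        double bytesPerSecond = 50e6;       // assumed 50 MB/s sequential transfer
        double seekSeconds = 0.010;         // assumed 10 ms per seek
        System.out.println("merge rewrite: "
            + mergeSeconds(indexBytes, bytesPerSecond) + " s");
        System.out.println("1M seek-bound updates: "
            + seekBoundSeconds(1_000_000L, seekSeconds) + " s");
        System.out.println("break-even updates: "
            + breakEvenUpdates(indexBytes, bytesPerSecond, seekSeconds));
    }
}
```

Under these assumed numbers, a batch larger than the break-even count is cheaper to apply by merging than by seeking to each entry, which is the kind of reasoning that favors write-once segments over an update-in-place tree.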
Prasenjit Mukherjee wrote:
> It seems to me that lucene doesn't use B-tree for its indexing storage.
> Any paper/article which explains the theory behind data-structure of
> single index(segment). I am not referring to the merge algorithm, I am
> curious to know the storage structure of a single optimized lucene index.
>
> Any pointer is greatly appreciated.
> --Prasen