You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Mathias Lux <ma...@juggle.at> on 2006/08/08 08:56:51 UTC

Linear search using reader vs. scorer implementation

Hi!

I'm working in my spare time on Lire, a content based image retrieval
library (searching for similar looking images in other words, see
http://www.semanticmetadata.net/lire) based on Lucene.

As the cbir features are medium sized integer vectors I put them into
fields, and read them with the IndexReader within a linear search
through the whole index for matching: As soon as a query is issued
(which is also an integer vector) I go through every doc, get the
corresponding feature vector and calculate the distance (L1 or L2,
depending on the type of feature).

Do you have any idea if and how I could implement a linear search (L1/L2
distance on integer vectors) using scorers, so that filters and other
features can be used?

regards,
  Mathias

ps. Yes I know that this is in general easy to implement within a
database, which I have done for oracle, mysql and derby .... but people
do want the lucene implementation and believe it or not: Lucene is super
 fast for linear search -> THX to the lucene team ;)


-- 
    '   '    '
      '   '    '     Mathias Lux
 o/          '  \o   mathias@juggle.at
 /-'            -\   skype://dermotte, icq # 1988617
/\               /\  http://www.SemanticMetadata.net

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: Linear search using reader vs. scorer implementation

Posted by Paul Borgermans <pa...@gmail.com>.
Hi Mathias

I delved a bit further in the lucene docs and the book "Lucene in
action" (Ch 6.1, pp194-201): an alternative approach may be the use of
a custom sort with a dedicated implementation of the
SortComparatorSource, taking as arguments the array of vectors for
which a "distance" needs to be calculated.

As a custom sort operates on a single field, the three Lire vectors
should be encoded in a single fields as well (as far as I understand
the API).

The query could then be the MatchAllDocsQuery query (or for example a
keyword query for having a subset) combined with other filters which
limit the result to sort on even further.

I do not know the impact on performance though, this is still pure
theoretical and based on my (currently limited) understanding of
Lucene and Lire.

Best regards

--paul

On 8/8/06, Mathias Lux <ma...@juggle.at> wrote:
> Hi!
>
> I'm working in my spare time on Lire, a content based image retrieval
> library (searching for similar looking images in other words, see
> http://www.semanticmetadata.net/lire) based on Lucene.
>
> As the cbir features are medium sized integer vectors I put them into
> fields, and read them with the IndexReader within a linear search
> through the whole index for matching: As soon as a query is issued
> (which is also an integer vector) I go through every doc, get the
> corresponding feature vector and calculate the distance (L1 or L2,
> depending on the type of feature).
>
> Do you have any idea if and how I could implement a linear search (L1/L2
> distance on integer vectors) using scorers, so that filters and other
> features can be used?
>
> regards,
>   Mathias
>
> ps. Yes I know that this is in general easy to implement within a
> database, which I have done for oracle, mysql and derby .... but people
> do want the lucene implementation and believe it or not: Lucene is super
>  fast for linear search -> THX to the lucene team ;)
>
>
> --
>     '   '    '
>       '   '    '     Mathias Lux
>  o/          '  \o   mathias@juggle.at
>  /-'            -\   skype://dermotte, icq # 1988617
> /\               /\  http://www.SemanticMetadata.net
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
http://walhalla.wordpress.com

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org