Posted to java-user@lucene.apache.org by "Vaijanath N. Rao" <va...@aol.com> on 2008/04/26 11:01:45 UTC

Lucene Indexing structure

Hi Lucene-user and Lucene-dev,

I want to use Lucene as a backend for image search (content-based
image retrieval).

Indexing Mechanism:
a) Get the image properties: Texture Tamura (TT), Texture Edge
Histogram (TE), Color Coherence Vector (CCV), Color Histogram (CH)
and Color Correlogram (CC).
b) Convert each of these vectors into a String and index it into Lucene
as a field. Thus each image (a document in Lucene terms) consists of 6
fields: image name, TT field, TE field, CCV field, CH field and CC field.

Searching Mechanism:
a) Convert the search image into the above 5 properties.
b) For every field, and for every value within the field, construct the
query. For example, say the user wants to search only on Color
Histogram similarity and the query image has 3 1 4 5 as its CH
value; the query will look like:
    query = "CH:3 CH:1 CH:4 CH:5"
c) For the results returned, convert all the field values back into
floats, do the distance computation and re-rank the documents, with the
smallest distance at the top and the largest distance at the bottom.
For example:
    For the above query, assume that the output has two documents,
    one having CH as "1 3 5 4" and the other having CH as "3 1 5 4".
    The distance computation will rank the second document higher
    than the first.
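
Step (c) can be sketched roughly as follows. This is only an
illustration: the post does not say which distance metric is used, so
Euclidean distance is assumed here, and the helper names (parse_vector,
euclidean, rerank) are hypothetical and not part of Lucene.

```python
import math

def parse_vector(text):
    """Turn a stored field value such as "3 1 4 5" into a list of floats."""
    return [float(tok) for tok in text.split()]

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rerank(query_text, hits):
    """Sort (doc_id, field_text) hits by ascending distance to the query."""
    q = parse_vector(query_text)
    scored = [(euclidean(q, parse_vector(text)), doc_id) for doc_id, text in hits]
    scored.sort()  # smallest distance first
    return [doc_id for _, doc_id in scored]

# The two CH values from the example above, in the order Lucene returned them.
hits = [("doc1", "1 3 5 4"), ("doc2", "3 1 5 4")]
print(rerank("3 1 4 5", hits))  # doc2 is closer: sqrt(2) versus sqrt(10)
```

Note that the re-ranking only sees the documents Lucene returned, which
is exactly the limitation described next.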

Obviously there is something wrong with the above approach (to get
the correct documents we need to fetch all the documents and then do the
required distance calculation), but that's due to my lack of knowledge
of Lucene and Lucene's index storage.

What I want to know is how to improve upon the existing architecture,
other than making the number of fields in Lucene equal to the total
number of features * the size of each feature.

Any other pointers are welcome. Is there any range tree
implementation within Lucene which I can use for this operation?

--Thanks and Regards
Vaijanath N. Rao

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org

Re: Lucene Indexing structure

Posted by Grant Ingersoll <gs...@apache.org>.

Would a Function Query (ValueSourceQuery, see the  
org.apache.lucene.search.function package) work in this case?

-Grant

On May 4, 2008, at 9:35 AM, Vaijanath N. Rao wrote:

> Hi Chris,
>
> Sorry for the cross-posting and also for not making the problem
> clear. Let me try to explain the problem at hand.
>
> I am trying to write a CBIR (Content Based Image Retrieval) framework
> using Lucene. Just as a text document has entities such as title,
> description, author and so on, I am decomposing each image and
> extracting features like color histogram, texture and other
> important attributes, and indexing them in Lucene in such a way
> that each of these attributes is a field. I convert the float
> values to strings for every feature that I have extracted from the
> image.
>
> While searching for similar images I extract the same set of features
> from the query image and then query Lucene to get all those images
> which have at least one of the features; then I do the re-ranking
> according to the difference between the features. Once the re-ranking
> is done I return the result.
>
> Here is where I need help: I need to know an optimal way to store
> the values, so that searching takes less time and I don't have to
> re-rank. Is there any way I can compare an array of values rather
> than a single value? What I essentially need is a query of the type:
> give me all those features which are less than K distance from the
> current feature.
>
> --Thanks and Regards
> Vaijanath

--------------------------
Grant Ingersoll

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ

Re: Lucene Indexing structure

Posted by "Vaijanath N. Rao" <va...@aol.com>.

Hi Chris,

Sorry for the cross-posting and also for not making the problem clear.
Let me try to explain the problem at hand.

I am trying to write a CBIR (Content Based Image Retrieval) framework
using Lucene. Just as a text document has entities such as title,
description, author and so on, I am decomposing each image and
extracting features like color histogram, texture and other important
attributes, and indexing them in Lucene in such a way that each of these
attributes is a field. I convert the float values to strings for every
feature that I have extracted from the image.

While searching for similar images I extract the same set of features
from the query image and then query Lucene to get all those images which
have at least one of the features; then I do the re-ranking according to
the difference between the features. Once the re-ranking is done I
return the result.

Here is where I need help: I need to know an optimal way to store the
values, so that searching takes less time and I don't have to re-rank.
Is there any way I can compare an array of values rather than a single
value? What I essentially need is a query of the type: give me all those
features which are less than K distance from the current feature.
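
The "less than K distance" query can be sketched as a plain linear scan.
This is a minimal illustration, not a Lucene feature: it assumes
Euclidean distance and an in-memory dict of already-parsed vectors, and
the names within_box and within_k are hypothetical. The per-coordinate
box test is the kind of check a range structure could answer cheaply
before the exact distance is computed.

```python
import math

def within_box(q, v, k):
    """Necessary condition: every coordinate differs from the query by at most k."""
    return all(abs(x - y) <= k for x, y in zip(q, v))

def within_k(query, vectors, k):
    """Return (doc_id, distance) pairs with Euclidean distance < k, nearest first."""
    result = []
    for doc_id, v in vectors.items():
        if not within_box(query, v, k):
            continue  # cheap rejection, no full distance needed
        d = math.sqrt(sum((x - y) ** 2 for x, y in zip(query, v)))
        if d < k:
            result.append((doc_id, d))
    result.sort(key=lambda pair: pair[1])
    return result

# Hypothetical CH vectors; doc3 is far away in every coordinate.
vectors = {
    "doc1": [1.0, 3.0, 5.0, 4.0],
    "doc2": [3.0, 1.0, 5.0, 4.0],
    "doc3": [9.0, 9.0, 9.0, 9.0],
}
print(within_k([3.0, 1.0, 4.0, 5.0], vectors, k=2.0))
```

With k=2.0 only doc2 is returned: doc3 fails the box test, and doc1
passes the box test but is rejected by the exact distance.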

--Thanks and Regards
Vaijanath

Re: Lucene Indexing structure

Posted by Chris Hostetter <ho...@fucit.org>.

: Hi Lucene-user and Lucene-dev,

Please do not cross post -- java-user is the suitable place for your 
question.

: Obviously there is something wrong with the above approach (to get the
: correct documents we need to fetch all the documents and then do the required
: distance calculation), but that's due to my lack of knowledge of Lucene and
: Lucene's index storage.
: 
: What I want to know is how to improve upon the existing architecture, other
: than making the number of fields in Lucene equal to the total number of
: features * the size of each feature.

I suspect one of the reasons you haven't gotten much of a response yet is 
that people may not understand your problem statement -- I know nothing of 
Image Processing and even after googling "Color Histogram" I don't really 
understand how the examples you gave represent Color Histograms, or what 
it would mean to search on it with your example input.

Perhaps you could describe in more detail what exactly some sample 
data looks like and why certain objects should match certain queries (and, 
just as importantly, why other objects shouldn't match), and give examples 
of when one object is a "better" match than another object for each example 
query.

Don't worry about Lucene Document/Field/QueryParser specifics -- just 
explain the concepts you are dealing with.



-Hoss



Re: Lucene Indexing structure

Posted by Glen Newton <gl...@gmail.com>.

Vaijanath,

I think I would do things in a different fashion:
Lucene's default distance metric is based on tf/idf and the cosine
model, i.e. the frequencies of items. I believe the values that you
are adding as Fields are the values in n-space for each of these
image-based attributes. I don't believe Lucene's default ranking will
work for this.
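
A small sketch of why term-based matching cannot separate the two
example documents from the original post. Note the assumptions:
token_overlap is a crude, hypothetical stand-in for order-insensitive
term matching, not Lucene's actual scoring formula, and L1 distance is
chosen only for illustration.

```python
from collections import Counter

def token_overlap(query_text, doc_text):
    """Count shared tokens, ignoring position (stand-in for term matching)."""
    q, d = Counter(query_text.split()), Counter(doc_text.split())
    return sum((q & d).values())

def l1_distance(query_text, doc_text):
    """Sum of absolute per-position differences between the two vectors."""
    q = [float(t) for t in query_text.split()]
    d = [float(t) for t in doc_text.split()]
    return sum(abs(x - y) for x, y in zip(q, d))

query = "3 1 4 5"
for doc in ("1 3 5 4", "3 1 5 4"):
    # Same token overlap for both documents, different positional distance.
    print(doc, token_overlap(query, doc), l1_distance(query, doc))
```

Both documents contain the same four tokens as the query, so a
bag-of-terms view treats them alike, while the positional distance
differs.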

You need to alter Lucene so that it understands that the Fields you
are adding represent the n-space values and not tokens, and alter
Lucene so that it uses this n-space to determine distance.

I am not a Lucene internals expert, but I think you need to write a
custom Similarity[1] class for use in your IndexSearcher[2] and
IndexWriter[3]. I think you might also need a custom analyser that
understands that you are putting in actual numbers, not tokens, and
that you use both when building the index and when querying it.

There are probably things I am missing and there may be a better way
to do this....

-Glen

[1]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html
[2]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)
[3]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#getSimilarity()
