Posted to java-user@lucene.apache.org by "Vaijanath N. Rao" <va...@aol.com> on 2008/04/26 11:01:45 UTC
Lucene Indexing structure
Hi Lucene-user and Lucene-dev,
I want to use Lucene as a backend for image search (content-based
image retrieval).
Indexing mechanism:
a) Extract the image properties: Texture Tamura (TT), Texture Edge
Histogram (TE), Color Coherence Vector (CCV), Color Histogram (CH),
and Color Correlogram (CC).
b) Convert each of these vectors into a string and index it in Lucene
as a field. Each image (a document, in Lucene terms) thus consists of
6 fields: image name, TT, TE, CCV, CH, and CC.
Searching mechanism:
a) Convert the query image into the same 5 properties.
b) For every field, and for every value within the field, construct
the query. For example, suppose the user wants to search on Color
Histogram similarity only and the query image has 3 1 4 5 as its CH
value. The query will look like this:
query = "CH:3 CH:1 CH:4 CH:5"
c) For the results returned, convert all the field values back into
floats, compute the distance to the query, and re-rank the documents
so that those with the smallest distance come first and those with
the largest come last.
For example:
Assume the query above returns two documents, one with CH "1 3 5 4"
and the other with CH "3 1 5 4". The distance computation will rank
the second document higher than the first.
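The re-ranking step in (c) can be sketched roughly as follows. This is
a minimal illustration in plain Python, not Lucene code, and it assumes
Euclidean distance, which the post does not actually name. The function
names (parse_feature, euclidean, rerank) and the document names are
hypothetical.

```python
import math

def parse_feature(value):
    """Turn a stored field string such as "3 1 4 5" back into floats."""
    return [float(token) for token in value.split()]

def euclidean(a, b):
    """Euclidean (L2) distance between two equal-length vectors.

    Assumption: the post does not say which distance measure is used.
    """
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def rerank(query_value, hits):
    """Sort (name, field_string) hits by ascending distance to the query."""
    query = parse_feature(query_value)
    scored = [(euclidean(query, parse_feature(value)), name)
              for name, value in hits]
    return [name for _, name in sorted(scored)]

# The two CH values from the example; "doc1" and "doc2" are made-up names.
hits = [("doc1", "1 3 5 4"), ("doc2", "3 1 5 4")]
print(rerank("3 1 4 5", hits))  # ['doc2', 'doc1']
```

Under that assumption the second document is at distance sqrt(2) from
the query and the first at sqrt(10), which matches the ordering
described in the example.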
Obviously there is something wrong with this approach (to get the
correct documents we need to fetch all the documents and then do the
distance calculation), but that is due to my lack of knowledge of
Lucene and Lucene's index storage.
What I want to know is how to improve on the existing architecture
other than by making the number of fields in Lucene equal to the total
number of features * the size of each feature.
Any other pointers are welcome. Is there any range tree
implementation within Lucene that I can use for this operation?
--Thanks and Regards
Vaijanath N. Rao
---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org
Re: Lucene Indexing structure
Posted by Grant Ingersoll <gs...@apache.org>.
Would a Function Query (ValueSourceQuery, see the
org.apache.lucene.search.function package) work in this case?
-Grant
On May 4, 2008, at 9:35 AM, Vaijanath N. Rao wrote:
> Here is where I need help: I need to know an optimal way to store
> the values, so that searching takes less time and I don't have to
> re-rank. Is there any way I can compare an array of values rather
> than a single value? What I essentially need is a query of the form
> "give me all those features that are less than distance K from the
> current feature".
--------------------------
Grant Ingersoll
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
Re: Lucene Indexing structure
Posted by "Vaijanath N. Rao" <va...@aol.com>.
Hi Chris,
Sorry for the cross-posting, and for not making the problem clear.
Let me try to explain the problem at hand.
I am trying to write a CBIR (content-based image retrieval) framework
using Lucene. Just as each document has entities such as title,
description, author, and so on, I decompose each image, extract
features such as color histogram, texture, and other important
attributes, and index them in Lucene so that each attribute is a
field. I convert the float values of every extracted feature to
strings.
When searching for similar images, I extract the same set of features
from the query image and then query Lucene for all the images that
have at least one of the feature values. Then I re-rank them according
to the difference between the features. Once the re-ranking is done, I
return the result.
Here is where I need help: I need to know an optimal way to store the
values, so that searching takes less time and I don't have to re-rank.
Is there any way I can compare an array of values rather than a single
value? What I essentially need is a query of the form "give me all
those features that are less than distance K from the current
feature".
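To make the "less than distance K" query concrete, here is a small
sketch of what such a query has to compute. It is plain Python over an
in-memory dict, not a Lucene query, and it again assumes Euclidean
distance. The function name within_k and the image names are
hypothetical. The per-dimension check is only a cheap pre-filter: if
any single coordinate differs from the query by K or more, the full
distance cannot be below K.

```python
import math

def within_k(query, features, k):
    """Return (distance, name) pairs with distance < k, nearest first.

    `features` maps an image name to its feature vector; this in-memory
    layout is an assumption made for illustration only.
    """
    results = []
    for name, vector in features.items():
        # Cheap per-dimension rejection: one coordinate that is k or more
        # away already forces the Euclidean distance to be at least k.
        if any(abs(q - v) >= k for q, v in zip(query, vector)):
            continue
        distance = math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))
        if distance < k:
            results.append((distance, name))
    return sorted(results)

# Hypothetical color-histogram vectors.
features = {
    "a": [3.0, 1.0, 5.0, 4.0],
    "b": [1.0, 3.0, 5.0, 4.0],
    "c": [9.0, 9.0, 9.0, 9.0],
}
for distance, name in within_k([3.0, 1.0, 4.0, 5.0], features, k=2.0):
    print(name, round(distance, 3))
```

The results come back already ordered by distance, so no separate
re-ranking pass is needed. The sketch still visits every vector, though,
so it shows only the computation, not an index structure that avoids
the full scan.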
--Thanks and Regards
Vaijanath
Re: Lucene Indexing structure
Posted by Chris Hostetter <ho...@fucit.org>.
: Hi Lucene-user and Lucene-dev,
Please do not cross post -- java-user is the suitable place for your
question.
: Obviously there is something wrong with the above approach (as to get the
: correct document we need to get all the documents and then do the required
: distance calculation), but that's due to lack of my knowledge of Lucene and
: Lucene's index storage.
:
: What I want to know is how to improve upon the existing architecture other than
: making number of fields in Lucene equal to the total number of
: feature*size of each feature.
I suspect one of the reasons you haven't gotten much of a response yet is
that people may not understand your problem statement -- I know nothing of
Image Processing and even after googling "Color Histogram" I don't really
understand how the examples you gave represent Color Histograms, or what
it would mean to search on it with your example input.
Perhaps you could describe in more detail what exactly some sample
data looks like and why certain objects should match certain queries (and,
just as importantly, why other objects shouldn't match), and give examples
of why one object is a "better" match than another object for each example
query.
Don't worry about Lucene Document/Field/QueryParser specifics -- just
explain the concepts you are dealing with.
-Hoss
Re: Lucene Indexing structure
Posted by Glen Newton <gl...@gmail.com>.
Vaijanath,
I think I would do things in a different fashion:
Lucene's default ranking is based on tf/idf and the cosine
model, i.e. the frequencies of terms. I believe the values that you
are adding as Fields are coordinates in n-space for each of these
image-based attributes. I don't believe Lucene's default ranking will
work for this.
You need to alter Lucene so that it understands that the Fields you
are adding represent the n-space values and not tokens, and alter
Lucene so that it uses this n-space to determine distance.
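A small illustration of the mismatch, using the CH values from the
original post. The token_overlap function below is a crude,
hypothetical stand-in for term matching (it is not Lucene's actual
scoring formula), and Euclidean distance is an assumption. The point is
only that counting shared tokens ignores position, while distance in
n-space does not.

```python
import math
from collections import Counter

def token_overlap(query_value, doc_value):
    """Count shared tokens, ignoring their position in the vector.

    A crude stand-in for term matching, not Lucene's scoring formula.
    """
    q = Counter(query_value.split())
    d = Counter(doc_value.split())
    return sum((q & d).values())

def euclidean(query_value, doc_value):
    """Euclidean distance between two space-separated numeric strings."""
    q = [float(t) for t in query_value.split()]
    d = [float(t) for t in doc_value.split()]
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(q, d)))

query = "3 1 4 5"
for doc in ("1 3 5 4", "3 1 5 4"):
    print(doc, token_overlap(query, doc), round(euclidean(query, doc), 3))
```

Both documents share all four tokens with the query, so token matching
cannot tell them apart, yet their distances to the query differ
(sqrt(10) versus sqrt(2)).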
I am not a Lucene internals expert, but I think you need to write a
custom Similarity[1] class for use in your IndexSearcher[2] and
IndexWriter[3]. You might also need a custom analyser, used both when
building the index and when querying it, that understands that you
are putting in actual numbers, not tokens.
There are probably things I am missing and there may be a better way
to do this....
-Glen
[1]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Similarity.html
[2]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/search/Searcher.html#setSimilarity(org.apache.lucene.search.Similarity)
[3]http://lucene.apache.org/java/2_2_0/api/org/apache/lucene/index/IndexWriter.html#getSimilarity()