You are viewing a plain text version of this content. The canonical link for it is here.
Posted to java-user@lucene.apache.org by Bernd Mueller <be...@scai.fhg.de> on 2008/06/10 14:13:23 UTC

indexing images of a document

Hello,

I have XML-documents containing image information. These images should 
be indexed with the document by having one additional field with the 
image stuff. Could anyone please give me some hints how I can manage this?

Regards,
Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: indexing images of a document

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Bernd,

On 06/10/2008 at 2:40 PM, Bernd Mueller wrote:
> Actually it is not about searching the binary data. I just
> wanna have the whole document (together with the "image stuff")
> to be stored in the index. So when I got a hit of a search on a
> document that I can extract the document with its images (and
> the images' meta information).
> 
> Nice would be some data structure to be stored in the
> "image"-field of the document (maybe a HashSet?). So I
> could extract the found documents with all their image information
> from the index for easy visualization purpose of the search results.

Lucene field values are flat - the kind of complex object serialization and de-serialization you're talking about is not directly supported.

That said, if you manage the packing and unpacking yourself, Lucene can help.  For example, if you serialize the image data and its metadata as a YAML file, with the binary data encoded as Base64 or something similar, you should be able to store and retrieve it as a String.  Or maybe you could store a .jar file in a binary field - that would probably be simplest.

Steve

> Steven A Rowe wrote:
> > Hi Bernd,
> > 
> > It's still not clear what you want to do.  What will a search look like?
> > 
> > On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
> > > I will try to explain what I mean with image stuff. An image in
> > > xml-documents is usually an url to the location where the image is
> > > stored. Additionally, such an image tag contains information like
> > > offset, width and height. I am thinking about storing the images
> > > (binary data) and meta-information (offset, width, height) in one
> > > field instead of using these tagging information as values for a field
> > > in the index.
> > 
> > You seem to be saying that you want to store the binary image along
> > with its metadata all in one field - is this correct?  Why do you want
> > to do this?
> > 
> > > I guess, your last statement "Lucene doesn't index binary data..."
> > > indicates that this isn't possible...
> > 
> > While Lucene will not *index* binary data, it can *store* it.
> > 
> > Steve
> > 
> > > Erick Erickson wrote:
> > > > You add as many fields to the document while indexing as you need to
> > > > correctly contain your "image stuff".
> > > > 
> > > > If this answer seems cryptic, it's as clear as your problem statement
> > > > <G>. To give you a meaningful answer, we need a much clearer problem
> > > > statement. What is "image stuff", what format is it in, and what do
> > > > you want to do with it?
> > > > 
> > > > Lucene doesn't index binary data...
> > > > 
> > > > Best
> > > > Erick
> > > > 
> > > > On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
> > > > <be...@scai.fhg.de> wrote:
> > > > 
> > > > > Hello,
> > > > > 
> > > > > I have XML-documents containing image information. These images
> > > > > should be indexed with the document by having one additional field
> > > > > with the image stuff. Could anyone please give me some hints how I
> > > > > can manage this?
> > > > > 
> > > > > Regards,
> > > > > Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: indexing images of a document

Posted by Bernd Mueller <be...@scai.fraunhofer.de>.
Hi Steve,


Actually it is not about searching the binary data. I just wanna have the
whole document (together with the "image stuff") to be stored in the
index. So when I got a hit of a search on a document that I can extract
the document with its images (and the images' meta information).

Nice would be some data structure to be stored in the "image"-field of the
document (maybe a HashSet?). So I could extract the found documents with
all their image information from the index for easy visualization purpose
of the search results.


Regards,

Bernd


Steven A Rowe wrote:
> Hi Bernd,
>
> It's still not clear what you want to do.  What will a search look like?
>
> On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
>> I will try to explain what I mean with image stuff. An image in
>> xml-documents is usually an url to the location where the image is
>> stored. Additionally, such an image tag contains information like
>> offset, width and height. I am thinking about storing the
>> images (binary data) and meta-information (offset, width, height) in one
>> field instead of using these tagging information as values for a field
>> in the index.
>
> You seem to be saying that you want to store the binary image along with
> its metadata all in one field - is this correct?  Why do you want to do
> this?
>
>> I guess, your last statement "Lucene doesn't index binary data..."
>> indicates that this isn't possible...
>
> While Lucene will not *index* binary data, it can *store* it.
>
> Steve
>
>> Erick Erickson wrote:
>> > You add as many fields to the document while indexing as you need to
>> > correctly contain your "image stuff".
>> >
>> > If this answer seems cryptic, it's as clear as your problem statement
>> > <G>. To give you a meaningful answer, we need a much clearer
>> > problem statement. What is "image stuff", what format is it in, and
>> > what do you want to do with it?
>> >
>> > Lucene doesn't index binary data...
>> >
>> > Best
>> > Erick
>> >
>> > On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
>> > <be...@scai.fhg.de> wrote:
>> >
>> > > Hello,
>> > >
>> > > I have XML-documents containing image information. These images
>> should
>> > > be indexed with the document by having one additional field with the
>> > > image stuff. Could anyone please give me some hints how I can manage
>> > > this?
>> > >
>> > > Regards,
>> > > Bernd
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
Diplom Informatiker (FH)
Bernd Mueller
Fraunhofer SCAI
Schloß Birlinghoven
53754 Sankt Augustin

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


RE: indexing images of a document

Posted by Steven A Rowe <sa...@syr.edu>.
Hi Bernd,

It's still not clear what you want to do.  What will a search look like?

On 06/10/2008 at 8:36 AM, Bernd Mueller wrote:
> I will try to explain what I mean with image stuff. An image in
> xml-documents is usually an url to the location where the image is
> stored. Additionally, such an image tag contains information like
> offset, width and height. I am thinking about storing the
> images (binary data) and meta-information (offset, width, height) in one
> field instead of using these tagging information as values for a field
> in the index.

You seem to be saying that you want to store the binary image along with its metadata all in one field - is this correct?  Why do you want to do this?

> I guess, your last statement "Lucene doesn't index binary data..."
> indicates that this isn't possible...

While Lucene will not *index* binary data, it can *store* it.

Steve

> Erick Erickson wrote:
> > You add as many fields to the document while indexing as you need to
> > correctly contain your "image stuff".
> > 
> > If this answer seems cryptic, it's as clear as your problem statement
> > <G>. To give you a meaningful answer, we need a much clearer
> > problem statement. What is "image stuff", what format is it in, and
> > what do you want to do with it?
> > 
> > Lucene doesn't index binary data...
> > 
> > Best
> > Erick
> > 
> > On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller
> > <be...@scai.fhg.de> wrote:
> > 
> > > Hello,
> > > 
> > > I have XML-documents containing image information. These images should
> > > be indexed with the document by having one additional field with the
> > > image stuff. Could anyone please give me some hints how I can manage
> > > this?
> > > 
> > > Regards,
> > > Bernd

---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: indexing images of a document

Posted by Bernd Mueller <be...@scai.fhg.de>.
Hi Erick,


Thanks for your fast reply.

I will try to explain what I mean with image stuff. An image in 
xml-documents is usually an url to the location where the image is 
stored. Additionally, such an image tag contains information like 
offset, width and height. I am thinking about storing the images (binary 
data) and meta-information (offset, width, height) in one field instead 
of using these tagging information as values for a field in the index.

I guess, your last statement "Lucene doesn't index binary data..." 
indicates that this isn't possible...


Regards,

Bernd


Erick Erickson wrote:

>You add as many fields to the document while indexing as you need to
>correctly contain your "image stuff".
>
>If this answer seems cryptic, it's as clear as your problem statement
><G>. To give you a meaningful answer, we need a much clearer
>problem statement. What is "image stuff", what format is it in, and
>what do you want to do with it?
>
>Lucene doesn't index binary data...
>
>Best
>Erick
>
>On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller <be...@scai.fhg.de>
>wrote:
>
>  
>
>>Hello,
>>
>>I have XML-documents containing image information. These images should be
>>indexed with the document by having one additional field with the image
>>stuff. Could anyone please give me some hints how I can manage this?
>>
>>Regards,
>>Bernd
>>
>>---------------------------------------------------------------------
>>To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
>>For additional commands, e-mail: java-user-help@lucene.apache.org
>>
>>
>>    
>>
>
>  
>


---------------------------------------------------------------------
To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
For additional commands, e-mail: java-user-help@lucene.apache.org


Re: indexing images of a document

Posted by Erick Erickson <er...@gmail.com>.
You add as many fields to the document while indexing as you need to
correctly contain your "image stuff".

If this answer seems cryptic, it's as clear as your problem statement
<G>. To give you a meaningful answer, we need a much clearer
problem statement. What is "image stuff", what format is it in, and
what do you want to do with it?

Lucene doesn't index binary data...

Best
Erick

On Tue, Jun 10, 2008 at 8:13 AM, Bernd Mueller <be...@scai.fhg.de>
wrote:

> Hello,
>
> I have XML-documents containing image information. These images should be
> indexed with the document by having one additional field with the image
> stuff. Could anyone please give me some hints how I can manage this?
>
> Regards,
> Bernd
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>