Posted to java-user@lucene.apache.org by MilleBii <mi...@gmail.com> on 2009/07/03 21:32:10 UTC

Storing a serialized object ?

I want to store in the index a data structure and load it back at search
time.

Is it safe to serialize the Java object, store it, and load it back later?
Presumably I need to store it as binary, right?

Otherwise I need to write my own store and load methods, which would be a
waste of time.

-- 
-MilleBii-

Re: Storing a serialized object ?

Posted by MilleBii <mi...@gmail.com>.
Well,

During the indexing phase (I'm actually running Nutch), I'm also extracting
data about my pages, including some text fragments.
So I'd like to store the resulting objects in the Lucene index and reload
them at search time for further manipulation.
I was wondering which way is the simplest.


2009/7/4 Simon Willnauer <si...@googlemail.com>

> Hi there,
>
> On Fri, Jul 3, 2009 at 9:32 PM, MilleBii<mi...@gmail.com> wrote:
> > I want to store in the index a data structure and load it back at search
> > time.
> >
> > Is it safe to serialize the java object store it and load it back later ?
> It won't be particularly fast nor efficient but it is gonna work.
> > Presumably I need to store it binary, right ?
> That is one way, or you do it base64 encoded in a text field if don't
> care about space at all. :)
> I agree with Eric, you should explain your usecase a little more to
> get a more detailed answer if it make sense or not.
>
> simon
> >
> > Otherwise I need to create my own store & load methods, waste of time.
> >
> > --
> > -MilleBii-
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
-MilleBii-

Re: Storing a serialized object ?

Posted by MilleBii <mi...@gmail.com>.
OK, thanks for the tip on Java object serialization performance.
Most of what I have to store/retrieve is straightforward, so I can do it by
hand.
What pushed me towards object serialization is that I want to store/retrieve
text fragments of undefined content.


2009/7/4 Simon Willnauer <si...@googlemail.com>

> On Sat, Jul 4, 2009 at 10:15 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
> >> That is one way, or you do it base64 encoded in a text field if don't
> >> care about space at all. :)
> just for clarification:
> one way Java Object Serialization - is not efficient at all It takes a
> lot of space and performance is crap.
> other way BASE64 encoded - might take even more space and time but
> uses string field
>
> >
> > Lucene also have binary fields for storing. Searching on such fields does
> > not make sense, so its ok to not be able to index them (how should that
> > work).
> >
> > I have this use case, too. Sometimes it is senseful to store arbitrary
> > objects as stored fields in the index and use then e.g. when displaying
> > search results.
> This usecase is totally valid I just doubt that storing a java object
> in there make a lot of sense (By using Java Object Serialization) as
> it is so damn slow. Many efficient serialization methods are around to
> do that way faster in a compact way.
>
> simon
> >
> > Uwe
> >
> >
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
-MilleBii-

Re: Storing a serialized object ?

Posted by Simon Willnauer <si...@googlemail.com>.
On Sat, Jul 4, 2009 at 10:15 AM, Uwe Schindler<uw...@thetaphi.de> wrote:
>> That is one way, or you do it base64 encoded in a text field if don't
>> care about space at all. :)
Just for clarification:
One way, Java Object Serialization, is not efficient at all: it takes a
lot of space and performance is crap.
The other way, Base64 encoding, might take even more space and time, but
it uses a string field.
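
To make the Base64 option concrete, here is a minimal sketch, assuming Java 8
or later for java.util.Base64 (it was not available when this thread was
written). The class and method names are made up for illustration and nothing
here touches Lucene; the resulting string is what would go into a stored text
field. Base64 emits 4 characters for every 3 input bytes, which is where the
extra space goes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper, not part of Lucene: converts raw bytes to a
// Base64 string that could be kept in a stored text field, and back.
class Base64FieldSketch {

    // Encode raw bytes as Base64 text (with '=' padding).
    static String encode(byte[] raw) {
        return Base64.getEncoder().encodeToString(raw);
    }

    // Decode Base64 text back to the original bytes.
    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] raw = "some serialized bytes".getBytes(StandardCharsets.UTF_8);
        String text = encode(raw);
        // Shows the size overhead of the text representation.
        System.out.println(raw.length + " raw bytes -> "
                + text.length() + " Base64 characters");
    }
}
```

A binary stored field avoids this overhead, so Base64 is mainly of interest
when only string fields are available.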

>
> Lucene also have binary fields for storing. Searching on such fields does
> not make sense, so its ok to not be able to index them (how should that
> work).
>
> I have this use case, too. Sometimes it is senseful to store arbitrary
> objects as stored fields in the index and use then e.g. when displaying
> search results.
This use case is totally valid; I just doubt that storing a Java object
in there (using Java Object Serialization) makes a lot of sense, as
it is so damn slow. Many efficient serialization methods are around
that do this much faster and more compactly.
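
As one illustration of a more compact alternative, here is a hand-written
encoding built on java.io.DataOutputStream. This is only a sketch: the
PageData class and its fields are hypothetical stand-ins for whatever is
extracted per page, and it is not a method anyone in this thread proposed.
Unlike Java Object Serialization, it writes no class descriptors, only the
values themselves.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CompactCodecSketch {

    // Hypothetical per-document data: a score plus free-form text fragments.
    static final class PageData {
        final int score;
        final List<String> fragments;

        PageData(int score, List<String> fragments) {
            this.score = score;
            this.fragments = fragments;
        }
    }

    // Layout: int score, int fragment count, then each fragment via writeUTF.
    static byte[] encode(PageData data) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(data.score);
            out.writeInt(data.fragments.size());
            for (String fragment : data.fragments) {
                // writeUTF is limited to 65535 encoded bytes per string.
                out.writeUTF(fragment);
            }
        }
        return buffer.toByteArray();
    }

    // Reads the fields back in exactly the order encode() wrote them.
    static PageData decode(byte[] bytes) throws IOException {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes))) {
            int score = in.readInt();
            int count = in.readInt();
            List<String> fragments = new ArrayList<>(count);
            for (int i = 0; i < count; i++) {
                fragments.add(in.readUTF());
            }
            return new PageData(score, fragments);
        }
    }

    public static void main(String[] args) throws IOException {
        PageData original = new PageData(7, Arrays.asList("first fragment", "second fragment"));
        byte[] bytes = encode(original);
        PageData copy = decode(bytes);
        System.out.println(bytes.length + " bytes, fragments=" + copy.fragments);
    }
}
```

The price is that the layout has to be maintained by hand: any change to the
fields means changing encode() and decode() together.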

simon
>
> Uwe
>
>


Re: Storing a serialized object ?

Posted by MilleBii <mi...@gmail.com>.
OK, thx guys. I see the different options more clearly now.

2009/7/4 Uwe Schindler <uw...@thetaphi.de>

> Then see my other mail about Java Serialization. It works (but not so
> fast),
> but is the simpliest way to do it.
>
> I do not use the serialized fields during searching, I store them only for
> usage in some special maintenance tasks on the indexed documents. So it's
> the same use-case.
>
> For this use case serialization speed is enough.
>
> -----
> Uwe Schindler
> H.-H.-Meier-Allee 63, D-28213 Bremen
> http://www.thetaphi.de
> eMail: uwe@thetaphi.de
>
> > -----Original Message-----
> > From: MilleBii [mailto:millebii@gmail.com]
> > Sent: Saturday, July 04, 2009 10:26 AM
> > To: java-user@lucene.apache.org
> > Subject: Re: Storing a serialized object ?
> >
> > Right I'm not indexing such fields, they are actually a kind of document
> > property of my own
> >
> > 2009/7/4 Uwe Schindler <uw...@thetaphi.de>
> >
> > > > That is one way, or you do it base64 encoded in a text field if don't
> > > > care about space at all. :)
> > >
> > > Lucene also have binary fields for storing. Searching on such fields
> > does
> > > not make sense, so its ok to not be able to index them (how should that
> > > work).
> > >
> > > I have this use case, too. Sometimes it is senseful to store arbitrary
> > > objects as stored fields in the index and use then e.g. when displaying
> > > search results.
> > >
> > > Uwe
> > >
> > >
> > > ---------------------------------------------------------------------
> > > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > > For additional commands, e-mail: java-user-help@lucene.apache.org
> > >
> > >
> >
> >
> > --
> > -MilleBii-
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
-MilleBii-

RE: Storing a serialized object ?

Posted by Uwe Schindler <uw...@thetaphi.de>.
Then see my other mail about Java Serialization. It works (but not so fast),
but it is the simplest way to do it.

I do not use the serialized fields during searching, I store them only for
usage in some special maintenance tasks on the indexed documents. So it's
the same use-case.

For this use case, serialization speed is good enough.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: MilleBii [mailto:millebii@gmail.com]
> Sent: Saturday, July 04, 2009 10:26 AM
> To: java-user@lucene.apache.org
> Subject: Re: Storing a serialized object ?
> 
> Right I'm not indexing such fields, they are actually a kind of document
> property of my own
> 
> 2009/7/4 Uwe Schindler <uw...@thetaphi.de>
> 
> > > That is one way, or you do it base64 encoded in a text field if don't
> > > care about space at all. :)
> >
> > Lucene also have binary fields for storing. Searching on such fields
> does
> > not make sense, so its ok to not be able to index them (how should that
> > work).
> >
> > I have this use case, too. Sometimes it is senseful to store arbitrary
> > objects as stored fields in the index and use then e.g. when displaying
> > search results.
> >
> > Uwe
> >
> >
> > ---------------------------------------------------------------------
> > To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> > For additional commands, e-mail: java-user-help@lucene.apache.org
> >
> >
> 
> 
> --
> -MilleBii-



Re: Storing a serialized object ?

Posted by MilleBii <mi...@gmail.com>.
Right, I'm not indexing such fields; they are actually a kind of document
property of my own.

2009/7/4 Uwe Schindler <uw...@thetaphi.de>

> > That is one way, or you do it base64 encoded in a text field if don't
> > care about space at all. :)
>
> Lucene also have binary fields for storing. Searching on such fields does
> not make sense, so its ok to not be able to index them (how should that
> work).
>
> I have this use case, too. Sometimes it is senseful to store arbitrary
> objects as stored fields in the index and use then e.g. when displaying
> search results.
>
> Uwe
>
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: java-user-unsubscribe@lucene.apache.org
> For additional commands, e-mail: java-user-help@lucene.apache.org
>
>


-- 
-MilleBii-

RE: Storing a serialized object ?

Posted by Uwe Schindler <uw...@thetaphi.de>.
> That is one way, or you do it base64 encoded in a text field if don't
> care about space at all. :)

Lucene also has binary fields for storing. Searching on such fields does
not make sense, so it's OK not to be able to index them (how would that
work?).

I have this use case, too. Sometimes it makes sense to store arbitrary
objects as stored fields in the index and use them, e.g., when displaying
search results.

Uwe



Re: Storing a serialized object ?

Posted by Simon Willnauer <si...@googlemail.com>.
Hi there,

On Fri, Jul 3, 2009 at 9:32 PM, MilleBii<mi...@gmail.com> wrote:
> I want to store in the index a data structure and load it back at search
> time.
>
> Is it safe to serialize the java object store it and load it back later ?
It won't be particularly fast or efficient, but it is gonna work.
> Presumably I need to store it binary, right ?
That is one way, or you do it Base64 encoded in a text field if you don't
care about space at all. :)
I agree with Erick, you should explain your use case a little more to
get a more detailed answer on whether it makes sense or not.

simon
>
> Otherwise I need to create my own store & load methods, waste of time.
>
> --
> -MilleBii-
>


Re: Storing a serialized object ?

Posted by Amin Mohammed-Coleman <am...@gmail.com>.
Hi
I think you might want to look at Hibernate Search. You can use projections,
which basically store instance fields in the index. It does not store the
object in a serialised form in the index; it holds a reference (id) to the
persistent entity.


Cheers
Amin

On Sat, Jul 4, 2009 at 2:39 AM, Erick Erickson <er...@gmail.com>wrote:

> Hmmmm. I'm having trouble understanding what you want
> to accomplish and why you think storing a java object is appropriate
> to do in a Lucene index.
>
> Perhaps you could expand on your use case here.
>
> Best
> Erick
>
> On Fri, Jul 3, 2009 at 3:32 PM, MilleBii <mi...@gmail.com> wrote:
>
> > I want to store in the index a data structure and load it back at search
> > time.
> >
> > Is it safe to serialize the java object store it and load it back later ?
> > Presumably I need to store it binary, right ?
> >
> > Otherwise I need to create my own store & load methods, waste of time.
> >
> > --
> > -MilleBii-
> >
>

Re: Storing a serialized object ?

Posted by Erick Erickson <er...@gmail.com>.
Hmmmm. I'm having trouble understanding what you want
to accomplish and why you think storing a java object is appropriate
to do in a Lucene index.

Perhaps you could expand on your use case here.

Best
Erick

On Fri, Jul 3, 2009 at 3:32 PM, MilleBii <mi...@gmail.com> wrote:

> I want to store in the index a data structure and load it back at search
> time.
>
> Is it safe to serialize the java object store it and load it back later ?
> Presumably I need to store it binary, right ?
>
> Otherwise I need to create my own store & load methods, waste of time.
>
> --
> -MilleBii-
>

RE: Storing a serialized object ?

Posted by Uwe Schindler <uw...@thetaphi.de>.
You can easily add a serialized object as a stored field to a document: just
serialize the object to a byte[] array and store this in the index, e.g.:

// Serialize the object into an in-memory byte buffer.
ByteArrayOutputStream serData = new ByteArrayOutputStream();
ObjectOutputStream out = new ObjectOutputStream(serData);
try {
	out.writeObject(dataStringContents);
} finally {
	out.close();
	serData.close();
}
// Add the bytes as a stored (compressed), non-indexed binary field.
doc.add(new Field("fieldname", serData.toByteArray(), Field.Store.COMPRESS));

After you have done a Lucene search, you can retrieve the object from a
Document instance like this:

byte[] serData = ldoc.getBinaryValue("fieldname");
if (serData != null) {
	// Field is present: deserialize the stored bytes back into an object.
	ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(serData));
	try {
		bla = in.readObject();
	} finally {
		in.close();
	}
}

But this is only stored content; you cannot search inside the object,
because a) it is only stored, not indexed, and b) Lucene does not know what
terms are in it.
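
The serialization round trip itself can be tried without Lucene. The
following is a minimal sketch using only java.io; the class and method names
are made up for illustration. The byte[] returned by toBytes() is what would
be handed to the binary stored field, and fromBytes() is what would be applied
to the bytes read back from the document.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical helper, not part of Lucene.
class SerializedFieldSketch {

    // Serialize any Serializable value to a byte array.
    static byte[] toBytes(Serializable value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(value);
        }
        return buffer.toByteArray();
    }

    // Deserialize a byte array; a missing field (null) maps to null.
    static Object fromBytes(byte[] data) throws IOException, ClassNotFoundException {
        if (data == null) {
            return null;
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] stored = toBytes("a text fragment of undefined content");
        Object loaded = fromBytes(stored);
        System.out.println(stored.length + " bytes stored, loaded: " + loaded);
    }
}
```

Note that the class of the stored object must still be on the classpath, in a
compatible version, when the bytes are read back.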

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: MilleBii [mailto:millebii@gmail.com]
> Sent: Friday, July 03, 2009 9:32 PM
> To: java-user@lucene.apache.org
> Subject: Storing a serialized object ?
> 
> I want to store in the index a data structure and load it back at search
> time.
> 
> Is it safe to serialize the java object store it and load it back later ?
> Presumably I need to store it binary, right ?
> 
> Otherwise I need to create my own store & load methods, waste of time.
> 
> --
> -MilleBii-

