Posted to dev@directory.apache.org by Ole Ersoy <ol...@yahoo.com> on 2006/09/08 15:03:12 UTC

Streaming / Serializing Big Objects

I accidentally deleted the original message...

The myfaces file upload component can be configured to
serialize objects larger than a specified size.

If that sounds useful, I can extract some code...

Cheers,
- Ole


Re: Streaming / Serializing Big Objects

Posted by Ole Ersoy <ol...@yahoo.com>.
Cool - 

OK, suppose we had a StateManager.

The StateManager has a decode method that reads a
persisted file and recreates the directory tree.

The StateManager's encode method takes a list of
references to directory tree objects, concatenates
the string representations of all of those objects,
and, once the concatenation is done, writes the
resulting string to a file.
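
Roughly, in code (StateManager and the one-object-per-line
layout are invented here, just to make the idea concrete):

import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Writer;
import java.util.ArrayList;
import java.util.List;

public class StateManager {

    // decode: read the persisted file back in, one serialized
    // tree object per line, so the tree can be recreated.
    public List<String> decode(File state) throws IOException {
        List<String> entries = new ArrayList<String>();
        BufferedReader in = new BufferedReader(new FileReader(state));
        try {
            String line;
            while ((line = in.readLine()) != null) {
                entries.add(line);
            }
        } finally {
            in.close();
        }
        return entries;
    }

    // encode: concatenate the string form of every tree object,
    // then write the whole thing to the file in one shot.
    public void encode(List<?> treeObjects, File state) throws IOException {
        StringBuilder buf = new StringBuilder();
        for (Object o : treeObjects) {
            buf.append(o).append('\n');
        }
        Writer out = new FileWriter(state);
        try {
            out.write(buf.toString());
        } finally {
            out.close();
        }
    }
}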

Am I getting any warmer?

I read a little about Prevayler.  As I understand it,
it serializes all the Java objects that need to be
persisted as soon as it becomes aware of them, and
then keeps them updated as the objects mutate.  So if
the application crashes, on reboot it reads the
persistent files and is back up.  To make reboot more
efficient, the persistent files can be managed on a
clean shutdown like I described above with the
StateManager, which I think is what you are
describing.

The reason I mention this is that as the directory
tree mutates, we would not want to persist the entire
tree on every mutation, right?  So we would have to
either use relational persistence or write a single
file containing just the mutation.

That would put us in more of an rsync-like mode,
where if the server crashes, we load the original
directory tree file plus any mutation files.

If the directory shuts down cleanly, we encode all
the directory objects to one file and delete all the
"temporary" mutation files.

Incidentally, EMF can be used for any type of
serialization: a concatenated file like the one I
just described, XML, relational persistence, etc.
One of the benefits of EMF is that if for whatever
reason someone wanted to serialize to XML,
implementing a function to do so would be very
straightforward.  If someone wanted to serialize to
a relational source, that's easy too.
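
For instance, pushing the tree root out as XMI is only a few
lines with the standard EMF resource API (an untested sketch;
tree.xmi and directoryTreeRoot are placeholders):

import java.util.Collections;
import org.eclipse.emf.common.util.URI;
import org.eclipse.emf.ecore.EObject;
import org.eclipse.emf.ecore.resource.Resource;
import org.eclipse.emf.ecore.resource.ResourceSet;
import org.eclipse.emf.ecore.resource.impl.ResourceSetImpl;
import org.eclipse.emf.ecore.xmi.impl.XMIResourceFactoryImpl;

public class EmfSaveExample {

    public static void save(EObject directoryTreeRoot) throws Exception {
        ResourceSet rs = new ResourceSetImpl();
        // Register the XMI serializer for *.xmi resources.
        rs.getResourceFactoryRegistry().getExtensionToFactoryMap()
          .put("xmi", new XMIResourceFactoryImpl());
        Resource resource = rs.createResource(URI.createFileURI("tree.xmi"));
        resource.getContents().add(directoryTreeRoot);
        // Swapping the registered factory is what changes the format.
        resource.save(Collections.EMPTY_MAP);
    }
}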

There's also the Object Constraint Language from the
EMF Technology project, which can be used to query
the EMF model... and I would think it would be very
useful for creating directory-like queries and coding
the query API.

There's a recently published article on the Eclipse
site on how to use it.

Cheers,
- Ole


--- Emmanuel Lecharny <el...@gmail.com> wrote:

=== message truncated ===



Re: Streaming / Serializing Big Objects

Posted by Emmanuel Lecharny <el...@gmail.com>.
Ole,

just keep in mind that we are talking about byte[] or String, not complex
Java objects :)

What we need is a simple mechanism that will allow the server to stream those
two kinds of objects. The main issue, if we stream to disk, is to avoid
creating zillions of small files. We need a storage which will be able to
store those blobs in a single file, even if it's 10 GB large.
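
Something in this spirit, just to fix ideas (BlobFile is an invented name,
and a real version would also need an index of the (offset, length) pairs):

import java.io.IOException;
import java.io.RandomAccessFile;

public class BlobFile {

    private final RandomAccessFile file;

    public BlobFile(String path) throws IOException {
        file = new RandomAccessFile(path, "rw");
    }

    // Appends one blob to the single big file and returns its offset;
    // the caller keeps (offset, blob.length) to find it again.
    public synchronized long append(byte[] blob) throws IOException {
        long offset = file.length();
        file.seek(offset);
        file.write(blob);
        return offset;
    }

    // Reads a stored blob back.
    public synchronized byte[] read(long offset, int length) throws IOException {
        byte[] blob = new byte[length];
        file.seek(offset);
        file.readFully(blob);
        return blob;
    }
}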

Another point is that we can't do XML: it's overkill. You would have
structures like:
<jpegPhoto name="MyFace.jpg">
  Ar45tYU...Rt==  (2Mbytes of base64 data)
</jpegPhoto>

Don't over(ab)use XML ;)

(OK, I know: compared to the disk access, it's at least 2 orders of
magnitude faster, but the less CPU we eat, the more can be used by other
threads.)

Any idea is welcome, and maybe we can start a page on Confluence with those
ideas. ATM, we are just in a

Emmanuel.

On 9/8/06, Ole Ersoy <ol...@yahoo.com> wrote:
=== message truncated ===



-- 
Cordialement,
Emmanuel Lécharny

Re: Streaming / Serializing Big Objects

Posted by Ole Ersoy <ol...@yahoo.com>.
1- Decoder
So if the decoded request object is above the
configured threshold, then ADS would need to persist
it per the configured persistence mechanism
(Prevayler, ...); otherwise we store it in memory.
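
If Prevayler were the configured mechanism, the spill path could
look roughly like this (written from memory of the Prevayler API,
so treat it as a sketch; StoreValue and the 1K threshold are made up):

import java.util.ArrayList;
import java.util.Date;
import org.prevayler.Prevayler;
import org.prevayler.PrevaylerFactory;
import org.prevayler.Transaction;

public class SpillExample {

    // One journaled transaction: add a large decoded value to the
    // prevalent system.
    static class StoreValue implements Transaction {
        private final byte[] value;
        StoreValue(byte[] value) { this.value = value; }
        public void executeOn(Object prevalentSystem, Date executionTime) {
            ((ArrayList) prevalentSystem).add(value);
        }
    }

    public static void main(String[] args) throws Exception {
        int threshold = 1024;
        Prevayler prevayler =
            PrevaylerFactory.createPrevayler(new ArrayList(), "spill-dir");
        byte[] decoded = new byte[4096]; // pretend the decoder produced this
        if (decoded.length > threshold) {
            prevayler.execute(new StoreValue(decoded)); // journaled to disk
        } // else: keep it in memory
    }
}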

The myfaces upload component looks at its size
threshold and serializes the uploaded file if it's
above the specified threshold.  I'm sure it just uses
Java serialization straight up, but the component can
naturally be hooked up to any integration/persistence
layer.

Suppose the whole directory tree was stored using the
Eclipse EMF API.

Then the decoder would map the request object
directly to an EMF object, and EMF's persistence
mechanism could be invoked to persist to XML or to
straight-up object serialization, the Service Data
Objects API could be invoked to serialize to
databases, etc.  Web Services could be invoked; it's
a pretty sexy API, with a lot of possibilities.

When it comes to streaming images, resources, etc., I
would think the Tomcat APIs should be really good for
that....

--- Emmanuel Lecharny <el...@gmail.com> wrote:

=== message truncated ===



Re: Streaming / Serializing Big Objects

Posted by Emmanuel Lecharny <el...@gmail.com>.
Here is what we have to do to stream large objects:

1- Decoder:
When we read the user request, we decode it from ASN.1 BER to a byte[] or to
a String, depending on the object type. But basically, we get a byte[].
Either way, we have two concerns:
 A- If the length of this object - which is always known - is above a certain
size (let's say 1K), then we must store the object somewhere else than in
memory. To do so, we must have a storage which can handle Strings, byte[]
and StreamedObject[]. This has an impact on all messages (we can't just
work on some attributes, we have to be generic). So this is a huge
refactoring, with accessors for those objects, and especially a Stream.read()
accessor.
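
To make A- concrete, the generic accessor could look something like
this (StreamedValue is an invented name; only the in-memory case is
shown, big values would be backed by the disk storage instead):

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public abstract class StreamedValue {

    // Every value, small or big, is read through the same accessor.
    public abstract InputStream read() throws IOException;

    public abstract long length();

    // Small values simply stay in memory.
    public static StreamedValue wrap(final byte[] bytes) {
        return new StreamedValue() {
            public InputStream read() {
                return new ByteArrayInputStream(bytes);
            }
            public long length() {
                return bytes.length;
            }
        };
    }
}
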
 B- If we have to store a String (even a big one), we have to convert the
byte[] to a String. If the String is big, then we must find a way to apply
the byte[] -> String UTF-8 conversion from a stream, and stream back the
result. Not so easy...
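
For B-, the JDK already does the hard part: an InputStreamReader
decodes UTF-8 incrementally, even when a multi-byte character is
split across two buffers. A minimal sketch:

import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.Writer;

public class Utf8Streams {

    // Converts a stream of UTF-8 bytes to characters chunk by chunk,
    // without ever materializing the whole String.
    public static void copy(InputStream utf8Bytes, Writer out)
        throws IOException {
        Reader reader = new InputStreamReader(utf8Bytes, "UTF-8");
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            out.write(buf, 0, n); // stream the result back out
        }
    }
}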

2- Database storage:
Well, we have now decoded a request, and we have to store the value. The
backend is not Stream-ready at all. It should be able to handle a Stream and
store data without having to allocate a huge bunch of byte[].
Another problem is the reverse operation: we read an entry from the backend,
and we want streamed data to remain streamed. Again, a huge modification.
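
The write path I have in mind would take the Stream itself, something
like this (invented names, just a sketch):

import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamingBackend {

    // Copies the value into the backend storage in fixed-size chunks,
    // so at most one 8K buffer is ever held in memory.
    public void store(InputStream value, OutputStream backendStorage)
        throws IOException {
        byte[] buf = new byte[8192];
        int n;
        while ((n = value.read(buf)) != -1) {
            backendStorage.write(buf, 0, n);
        }
    }
}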

3- Encoder:
Now, let's suppose that we successfully got some data from the backend, and
let's suppose that those data are streamed. We want to send them back to the
client without having to create a big byte[]. That means we must be able to
ask MINA to send chunks of data until we are done with the streamed data.
ATM, what we do is write a full PDU - the result of the encode() method -
and MINA sends it all. Here, the mechanism will be totally different: we
should tell MINA to send some data as soon as we have a block of bytes ready
(if we send 1500-byte blocks, then we may have to call MINA many times for a
jpegPhoto).
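
The chunked write loop itself could be as simple as this (MINA 1.x-era
API written from memory, so double-check the ByteBuffer calls):

import java.io.IOException;
import java.io.InputStream;
import org.apache.mina.common.ByteBuffer;
import org.apache.mina.common.IoSession;

public class ChunkedEncoder {

    // Pushes a streamed value to the client 1500 bytes at a time,
    // instead of encoding one giant PDU.
    public void writeChunks(IoSession session, InputStream value)
        throws IOException {
        byte[] chunk = new byte[1500];
        int n;
        while ((n = value.read(chunk)) != -1) {
            ByteBuffer buf = ByteBuffer.allocate(n);
            buf.put(chunk, 0, n);
            buf.flip();
            session.write(buf); // one MINA write per block of bytes
        }
    }
}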

I may have forgotten some issues, so please tell me! Regarding reusing an
existing piece of code, I have to say: "well, why not?". Right now, I think
we should think seriously about the points I mentioned, maybe on a
Confluence page. Streaming will take at least 2 weeks to write... Any
already-written piece of code that can help is OK :)

Emmanuel

On 9/8/06, Ole Ersoy <ol...@yahoo.com> wrote:
=== message truncated ===



-- 
Cordialement,
Emmanuel Lécharny