Posted to dev@directory.apache.org by "Trustin Lee (JIRA)" <di...@incubator.apache.org> on 2005/06/24 16:11:11 UTC
[jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Standarzied serialization and deserialization of Name, Attribute, and Attributes.
---------------------------------------------------------------------------------
Key: DIREVE-170
URL: http://issues.apache.org/jira/browse/DIREVE-170
Project: Directory Server
Type: Improvement
Versions: 0.9
Reporter: Trustin Lee
Assigned to: Trustin Lee
We should provide a standardized, high-performance serialization/deserialization mechanism as a library.
Using Java serialization has a couple of cons:
* Slow
* Resulting data is big
* Serialized data might become unreadable after class signature changes
It will also help users create their own ContextPartition implementations easily.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Tuesday 28 June 2005 12:47, Niclas Hedhman wrote:
> The codebase URLs are the third item written out, which of course can
> be very large.
I want to correct myself. The above is not true for serialization itself, only
in combination with MarshalledObject, which is what RMI uses.
Sorry for the incorrect info.
Cheers
Niclas
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Trustin Lee <tr...@gmail.com>.
2005/6/29, Niclas Hedhman <ni...@hedhman.org>:
>
> On Tuesday 28 June 2005 23:39, Trustin Lee wrote:
> > If performance is not a problem, we can just go with object serialization,
> > but currently our performance is not really good, and it is being caused by
> > large extra I/O from object serialization.
>
> Ok. If you feel it is an urgent thing, then I won't have a problem with it.
> I just thought I needed to highlight what I know in this field, so you don't
> waste your time. I have seen quite a few people trying to "roll their own"
> and none of them gaining anything significant (i.e. more than 10-20%
> performance boost).
We'll of course run performance benchmarks before simply dropping object
serialization. :)
Thank you so much for your advice; it was really helpful.
Trustin
--
what we call human nature is actually human habit
--
http://gleamynode.net/
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Tuesday 28 June 2005 23:39, Trustin Lee wrote:
> If performance is not a problem, we can just go with object serialization,
> but currently our performance is not really good, and it is being caused by
> large extra I/O from object serialization.
Ok. If you feel it is an urgent thing, then I won't have a problem with it.
I just thought I needed to highlight what I know in this field, so you don't
waste your time. I have seen quite a few people trying to "roll their own"
and none of them gaining anything significant (i.e. more than 10-20%
performance boost).
Perhaps this is the exception. Good luck.
Cheers
Niclas
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Trustin Lee <tr...@gmail.com>.
Hi,
2005/6/29, Emmanuel Lecharny <el...@gmail.com>:
>
> On Wed, 2005-06-29 at 00:39 +0900, Trustin Lee wrote:
>
> > If performance is not a problem, we can just go with object
> > serialization, but currently our performance is not really good, and
> > it is being caused by large extra I/O from object serialization.
>
> Trustin, if I understand your suggestion correctly, maybe we can use a kind
> of ASN.1 codec to do the job. It's fast and tight.
You're correct. We could use an ASN.1 codec because ApacheDS is closely
related to LDAP. Perhaps we could reuse parts of the LDAP protocol codec?
Trustin
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Emmanuel Lecharny <el...@gmail.com>.
On Wed, 2005-06-29 at 00:39 +0900, Trustin Lee wrote:
> If performance is not a problem, we can just go with object
> serialization, but currently our performance is not really good, and
> it is being caused by large extra I/O from object serialization.
Trustin, if I understand your suggestion correctly, maybe we can use a kind
of ASN.1 codec to do the job. It's fast and tight.
wdyt?
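To make the ASN.1 suggestion concrete, here is a minimal sketch of BER-style tag-length-value encoding for a single OCTET STRING (universal tag 0x04, definite length). The class and method names are hypothetical and this is not the ApacheDS LDAP codec; it only illustrates why such an encoding is tight: a short value costs two bytes of overhead.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; not the ApacheDS LDAP protocol codec.
public class BerSketch {
    private static final int OCTET_STRING = 0x04; // universal tag for OCTET STRING

    // Encode a value as: tag byte, definite length, contents.
    public static byte[] encodeOctetString(byte[] value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(OCTET_STRING);
        writeLength(out, value.length);
        out.write(value, 0, value.length);
        return out.toByteArray();
    }

    // Short form (one byte) for lengths below 128, long form otherwise.
    private static void writeLength(ByteArrayOutputStream out, int length) {
        if (length < 0x80) {
            out.write(length);
            return;
        }
        int numBytes = (32 - Integer.numberOfLeadingZeros(length) + 7) / 8;
        out.write(0x80 | numBytes);
        for (int i = numBytes - 1; i >= 0; i--) {
            out.write((length >>> (8 * i)) & 0xFF);
        }
    }

    // Decode a value produced by encodeOctetString.
    public static byte[] decodeOctetString(byte[] encoded) {
        if ((encoded[0] & 0xFF) != OCTET_STRING) {
            throw new IllegalArgumentException("not an OCTET STRING");
        }
        int pos = 1;
        int first = encoded[pos++] & 0xFF;
        int length = first;
        if (first >= 0x80) {
            int numBytes = first & 0x7F;
            length = 0;
            for (int i = 0; i < numBytes; i++) {
                length = (length << 8) | (encoded[pos++] & 0xFF);
            }
        }
        return Arrays.copyOfRange(encoded, pos, pos + length);
    }

    public static void main(String[] args) {
        byte[] encoded = encodeOctetString("1".getBytes(StandardCharsets.UTF_8));
        System.out.println(encoded.length + " bytes");
    }
}
```

A real LDAP attribute would nest such values inside SEQUENCE and SET structures, which this sketch leaves out.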
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Trustin Lee <tr...@gmail.com>.
Hi,
2005/6/28, Niclas Hedhman <ni...@hedhman.org>:
>
> On Tuesday 28 June 2005 08:23, Trustin Lee wrote:
>
> > The biggest problem is the class descriptors written by
> > ObjectOutputStream. They are sometimes even bigger than the actual object
> > data. We can override some protected methods to store the descriptors
> > somewhere else, but that makes the serialized data dependent on the
> > descriptor database.
> > I even saw a case where an SMS message object serialized to 2 kB of data
> > because its class descriptor took up 1.4 kB.
>
> Hmmmm... What tests have you actually run?
> You can't do without the FQ classnames of the classes involved. They are
> written in 'clear text' once for each class, then referenced with an index
> (int IIRC). Whether or not you need the field names is your call, but it
> sounds like a decent system to not depend on knowing the exact ordering.
> The codebase URLs are the third item written out, which of course can be
> very large.
>
> import java.io.*;
>
> public class Test
> {
>     static public void main( String[] args )
>         throws Exception
>     {
>         FileOutputStream fos = new FileOutputStream( "abc.ser" );
>         ObjectOutputStream oos = new ObjectOutputStream( fos );
>         Abc abc = new Abc();
>         oos.writeObject( abc );
>         oos.close();
>     }
>
>     private static class Abc implements Serializable
>     {
>         String abc = "1";
>         String def = "2";
>     }
> }
>
> Typical case??? Well, it results in 75 bytes.
Yes, 75 bytes for only two single-character strings is huge. :)
> > What if the name of the class changes?
>
> I assume this is a rhetorical question, since I am sure you know the answer.
> I am interested to know how you are going to handle that in your own
> serialization framework.
We don't write the type name, because we know what type will come in the
stream.
> > And if we implement readObject and
> > writeObject by ourselves, why do we use ObjectOutputStream?
>
> Because you don't need to worry about complex classes, and diving into the
> hierarchies of instances, which you would for both "rolling your own" as well
> as Externalizable.
Right. I thought Attributes, Attribute, and Name were simple enough that we
could forget about complex object graphs. But attribute values should be able
to contain any Java object, so I'm thinking about allowing arbitrary Java
objects only there.
> > Moreover, it
> > adds extra metadata that indicates each field's type, which increases the
> > size of serialized data. If we implement readObject and writeObject
> > manually, there's no need to include that metadata IMHO.
>
> Serialization writes the field names to the stream, so that it can restore
> the fields even if they were re-ordered in the class. I think you have
> observed that when you use writeObject(), the field names are still written
> to the stream. I don't know the answer to that, since the deserialization
> cannot possibly know what to do with it.
You're right.
> > My aim is to create a compact and fast codec for LDAP-specific entities
> > (LdapName, Attribute, Attributes) that is Java-independent, so that it can
> > be used to create another protocol based on ApacheDS or to store data in a
> > Java-independent way.
>
> If they are flat, i.e. basically strings or collections of strings, then I
> agree that serialization is not necessarily any added value. But are you
> not allowed to store any arbitrary Object in attributes?
Attribute values can actually be any Java object, so I'm going to use
object serialization only for that case. But the most frequently used types,
such as String and byte[], will have to be handled specially to gain maximum
performance.
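The split described above (special handling for String and byte[], object serialization only as a fallback) could look roughly like the following sketch. The tag values, the length-prefixed layout, and the class name are assumptions made for illustration, not a format anyone in this thread has agreed on.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical value codec: one tag byte, a 4-byte length, then the body.
public class ValueCodecSketch {
    private static final int TAG_STRING = 0; // UTF-8 bytes
    private static final int TAG_BYTES = 1;  // raw bytes
    private static final int TAG_OBJECT = 2; // Java object serialization fallback

    public static byte[] encode(Object value) throws IOException {
        int tag;
        byte[] body;
        if (value instanceof String) {
            tag = TAG_STRING;
            body = ((String) value).getBytes(StandardCharsets.UTF_8);
        } else if (value instanceof byte[]) {
            tag = TAG_BYTES;
            body = (byte[]) value;
        } else {
            // Anything else pays the full cost of object serialization.
            tag = TAG_OBJECT;
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(value);
            }
            body = buf.toByteArray();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        DataOutputStream data = new DataOutputStream(out);
        data.writeByte(tag);
        data.writeInt(body.length);
        data.write(body);
        data.flush();
        return out.toByteArray();
    }

    public static Object decode(byte[] encoded) throws IOException, ClassNotFoundException {
        DataInputStream data = new DataInputStream(new ByteArrayInputStream(encoded));
        int tag = data.readByte();
        byte[] body = new byte[data.readInt()];
        data.readFully(body);
        switch (tag) {
            case TAG_STRING:
                return new String(body, StandardCharsets.UTF_8);
            case TAG_BYTES:
                return body;
            case TAG_OBJECT:
                try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(body))) {
                    return ois.readObject();
                }
            default:
                throw new IOException("Unknown tag: " + tag);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(encode("1").length + " bytes for a one-character string");
    }
}
```

In this layout a one-character string takes six bytes (tag, length, one UTF-8 byte), and no class name or field metadata is written for the two common types.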
LDAP entries are usually stored in B+Tree implementations, so we have to
create a new ObjectInputStream or ObjectOutputStream each time we read or
write an object. This is a major performance penalty: it usually means
additional memory allocation and copying, and it causes the class descriptors
to be written again for every object. (In a regular stream this is not a
problem, because descriptors are reused, but it becomes a problem in an
environment like this.) Plus, the size of an entry affects the performance of
the backing storage when massive operations are performed. Making the
serialized data smaller gives a performance gain because the database can
hold more items per page.
If performance is not a problem, we can just go with object serialization,
but currently our performance is not really good, and it is being caused by
large extra I/O from object serialization.
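The per-record overhead described above can be measured with a small sketch that serializes the same kind of object either through a fresh ObjectOutputStream per record (the B+Tree situation) or through one shared stream. The Entry class and the method names are hypothetical stand-ins, not ApacheDS types.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class DescriptorOverheadSketch {
    // Hypothetical stand-in for a stored entry.
    static class Entry implements Serializable {
        private static final long serialVersionUID = 1L;
        String cn = "1";
        String sn = "2";
    }

    // One new stream per record: the stream header and the class descriptor
    // are written again for every record.
    public static int bytesWithFreshStreams(int records) throws IOException {
        int total = 0;
        for (int i = 0; i < records; i++) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
                oos.writeObject(new Entry());
            }
            total += buf.size();
        }
        return total;
    }

    // One stream for all records: the class descriptor is written once and
    // later records refer back to it.
    public static int bytesWithSharedStream(int records) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            for (int i = 0; i < records; i++) {
                oos.writeObject(new Entry());
            }
        }
        return buf.size();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("fresh streams: " + bytesWithFreshStreams(100) + " bytes");
        System.out.println("shared stream: " + bytesWithSharedStream(100) + " bytes");
    }
}
```

The shared-stream figure flatters plain serialization slightly here, because the two string literals are the same instances in every Entry and are also written as back references.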
Trustin
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Tuesday 28 June 2005 08:23, Trustin Lee wrote:
> The biggest problem is the class descriptors written by
> ObjectOutputStream. They are sometimes even bigger than the actual object
> data. We can override some protected methods to store the descriptors
> somewhere else, but that makes the serialized data dependent on the
> descriptor database.
> I even saw a case where an SMS message object serialized to 2 kB of data
> because its class descriptor took up 1.4 kB.
Hmmmm... What tests have you actually run?
You can't do without the FQ classnames of the classes involved. They are
written in 'clear text' once for each class, then referenced with an index
(int IIRC). Whether or not you need the field names is your call, but it
sounds like a decent system to not depend on knowing the exact ordering.
The codebase URLs are the third item written out, which of course can be
very large.
import java.io.*;

public class Test
{
    static public void main( String[] args )
        throws Exception
    {
        // Serialize one small object to a file; the file size is the result.
        FileOutputStream fos = new FileOutputStream( "abc.ser" );
        ObjectOutputStream oos = new ObjectOutputStream( fos );
        Abc abc = new Abc();
        oos.writeObject( abc );
        oos.close();
    }

    // Two single-character String fields.
    private static class Abc implements Serializable
    {
        String abc = "1";
        String def = "2";
    }
}
Typical case??? Well, it results in 75 bytes.
> What if the name of the class changes?
I assume this is a rhetorical question, since I am sure you know the answer. I
am interested to know how you are going to handle that in your own
serialization framework.
> And if we implement readObject and
> writeObject by ourselves, why do we use ObjectOutputStream?
Because you don't need to worry about complex classes, and diving into the
hierarchies of instances, which you would for both "rolling your own" as well
as Externalizable.
> Moreover, it
> adds extra metadata that indicates each field's type, which increases the
> size of serialized data. If we implement readObject and writeObject
> manually, there's no need to include that metadata IMHO.
Serialization writes the field names to the stream, so that it can restore the
fields even if they were re-ordered in the class. I think you have observed
that when you use writeObject(), the field names are still written to the
stream. I don't know the answer to that, since the deserialization cannot
possibly know what to do with it.
> My aim is to create a compact and fast codec for LDAP-specific entities
> (LdapName, Attribute, Attributes) that is Java-independent, so that it can
> be used to create another protocol based on ApacheDS or to store data in a
> Java-independent way.
If they are flat, i.e. basically strings or collections of strings, then I
agree that serialization is not necessarily any added value. But are you not
allowed to store any arbitrary Object in attributes?
Cheers
Niclas
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Trustin Lee <tr...@gmail.com>.
Hi Nick,
2005/6/25, Niclas Hedhman <ni...@hedhman.org>:
>
> On Friday 24 June 2005 22:11, Trustin Lee (JIRA) wrote:
>
> Couple of notes;
>
> > Using Java serialization has couple of cons:
> >
> > * Slow
>
> That is a very relative term. Compared to other generic serialization
> mechanisms I don't think it fares much better or worse.
Yes, using the Externalizable interface will speed this up, but it requires a
public no-argument constructor.
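For reference, a minimal sketch of the Externalizable approach mentioned above. The Pair class is hypothetical; the point is that the class writes its own fields and that deserialization needs the public no-argument constructor.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

public class ExternalizableSketch {
    // Hypothetical value class that controls its own wire format.
    public static class Pair implements Externalizable {
        private String abc;
        private String def;

        // Required: deserialization instantiates the class through this constructor.
        public Pair() {
        }

        public Pair(String abc, String def) {
            this.abc = abc;
            this.def = def;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            // Only the field values are written; no field names or types.
            out.writeUTF(abc);
            out.writeUTF(def);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Values must be read in the same order they were written.
            abc = in.readUTF();
            def = in.readUTF();
        }

        public String getAbc() {
            return abc;
        }

        public String getDef() {
            return def;
        }
    }

    // Serialize to memory and read the object back.
    public static Pair roundTrip(Pair pair) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(pair);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Pair) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        Pair copy = roundTrip(new Pair("1", "2"));
        System.out.println(copy.getAbc() + " " + copy.getDef());
    }
}
```

The class name is still written to the stream, so this reduces the descriptor overhead rather than removing it.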
> > * Resulting data is big
>
> Again, very relative term. There is only one way to make it smaller than it
> already is, and that is to introduce sensible defaults for primitives, which
> are not serialized if it can be avoided. This tends to only work for
> JavaBeans, and the java.beans.XMLEncoder/XMLDecoder is utilizing this
> strategy.
The biggest problem is the class descriptors written by ObjectOutputStream.
They are sometimes even bigger than the actual object data. We can override
some protected methods to store the descriptors somewhere else, but that makes
the serialized data dependent on the descriptor database.
I even saw a case where an SMS message object serialized to 2 kB of data
because its class descriptor took up 1.4 kB.
> > * Serialized data might be unable to read due to class signature changes
>
> This is not true. If you do your homework properly, you can make fairly
> extensive changes without becoming incompatible, both forward and backward.
>
> Also, making proper Serializable code is not only a matter of adding
> java.io.Serializable to every class. That will most likely not work well. It
> takes quite some thinking to figure out the purpose, patterns and usages of
> transient and/or the use of readObject/writeObject.
What if the name of the class changes? And if we implement readObject and
writeObject ourselves, why do we use ObjectOutputStream at all? Moreover, it
adds extra metadata that indicates each field's type, which increases the size
of the serialized data. If we implement readObject and writeObject manually,
there's no need to include that metadata IMHO.
> I must say I don't know where you are using Serialization in DS, so I don't
> know where this has bearing, and whether it is worthwhile bothering about.
> At the end of the day, many of the same issues will take center stage even
> with your own package.
My aim is to create a compact and fast codec for LDAP-specific entities
(LdapName, Attribute, Attributes) that is Java-independent, so that it can be
used to create another protocol based on ApacheDS or to store data in a
Java-independent way.
Thanks,
Trustin
Re: [jira] Created: (DIREVE-170) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by Niclas Hedhman <ni...@hedhman.org>.
On Friday 24 June 2005 22:11, Trustin Lee (JIRA) wrote:
Couple of notes;
> Using Java serialization has couple of cons:
>
> * Slow
That is a very relative term. Compared to other generic serialization
mechanisms I don't think it fares much better or worse.
> * Resulting data is big
Again, very relative term. There is only one way to make it smaller than it
already is, and that is to introduce sensible defaults for primitives, which
are not serialized if it can be avoided. This tends to only work for
JavaBeans, and the java.beans.XMLEncoder/XMLDecoder is utilizing this
strategy.
Now, in MOST of the cases I have looked into where people claim "data is big"
(often related to "slow"), the code is so tightly coupled that more objects
are serialized than expected. Instead of a small vector of objects, the entire
application gets serialized because of some listeners, for instance.
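A small sketch of the listener problem described above, with hypothetical classes: the only difference between the two models is whether the listener list is transient, and the 10,000-byte array stands in for application state reachable from a listener.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;

public class ListenerGraphSketch {
    // Hypothetical listener that keeps a reference to a lot of application state.
    static class BigListener implements Serializable {
        private static final long serialVersionUID = 1L;
        byte[] applicationState = new byte[10_000];
    }

    // The listener list is an ordinary field, so it is serialized with the model.
    static class LeakyModel implements Serializable {
        private static final long serialVersionUID = 1L;
        String value = "1";
        List<BigListener> listeners = new ArrayList<>();
    }

    // Same data, but the listener list is transient and stays out of the stream.
    static class LeanModel implements Serializable {
        private static final long serialVersionUID = 1L;
        String value = "1";
        transient List<BigListener> listeners = new ArrayList<>();
    }

    public static int serializedSize(Object object) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(object);
        }
        return buf.size();
    }

    public static int leakySize() throws IOException {
        LeakyModel model = new LeakyModel();
        model.listeners.add(new BigListener());
        return serializedSize(model);
    }

    public static int leanSize() throws IOException {
        LeanModel model = new LeanModel();
        model.listeners.add(new BigListener());
        return serializedSize(model);
    }

    public static void main(String[] args) throws IOException {
        System.out.println("listeners serialized: " + leakySize() + " bytes");
        System.out.println("listeners transient:  " + leanSize() + " bytes");
    }
}
```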
> * Serialized data might be unable to read due to class signature changes
This is not true. If you do your homework properly, you can make fairly
extensive changes without becoming incompatible, both forward and backward.
Also, making proper Serializable code is not only a matter of adding
java.io.Serializable to every class. That will most likely not work well. It
takes quite some thinking to figure out the purpose, patterns and usages of
transient and/or the use of readObject/writeObject.
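As a rough illustration of that homework, the sketch below combines the usual ingredients: an explicit serialVersionUID so that compatible class changes keep the same stream identity, a transient field for derived state, and a private readObject that rebuilds it. The Account class is hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.Locale;

public class CompatSketch {
    public static class Account implements Serializable {
        // Pinned explicitly; without it the JVM derives a value from the class
        // shape, and adding a field or method would make old data unreadable.
        private static final long serialVersionUID = 1L;

        private final String name;
        // Derived state: never written to the stream.
        private transient String upperName;

        public Account(String name) {
            this.name = name;
            this.upperName = name.toUpperCase(Locale.ROOT);
        }

        // Called by ObjectInputStream; restores the fields, then rebuilds the cache.
        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            in.defaultReadObject();
            upperName = name.toUpperCase(Locale.ROOT);
        }

        public String getName() {
            return name;
        }

        public String getUpperName() {
            return upperName;
        }
    }

    // Serialize to memory and read the object back.
    public static Account roundTrip(Account account) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(buf)) {
            oos.writeObject(account);
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(buf.toByteArray()))) {
            return (Account) ois.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip(new Account("niclas")).getUpperName());
    }
}
```

This does not cover a renamed class, which is the case raised elsewhere in the thread.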
I must say I don't know where you are using Serialization in DS, so I don't
know where this has bearing, and whether it is worthwhile bothering about.
At the end of the day, many of the same issues will take center stage even
with your own package.
Cheers
Niclas
[jira] Commented: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/DIRSERVER-478?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12482804 ]
Emmanuel Lecharny commented on DIRSERVER-478:
---------------------------------------------
Attribute(s) now use their own serialization, and it's much faster!
We should now implement other specific serializations, for DN and for each attributeType too. The idea would be to create a "serializer" for each AT, like we have normalizers, etc.
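One possible shape for the per-attributeType "serializer" idea, as a sketch only: the interface, the registry, and the fallback to raw bytes are all hypothetical and do not describe what ApacheDS actually implements.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SerializerRegistrySketch {
    // Hypothetical counterpart to a normalizer: one serializer per attribute type.
    public interface ValueSerializer {
        byte[] serialize(Object value);

        Object deserialize(byte[] data);
    }

    // For attribute types whose values are strings.
    public static final ValueSerializer UTF8_STRING = new ValueSerializer() {
        @Override
        public byte[] serialize(Object value) {
            return ((String) value).getBytes(StandardCharsets.UTF_8);
        }

        @Override
        public Object deserialize(byte[] data) {
            return new String(data, StandardCharsets.UTF_8);
        }
    };

    // For binary attribute types; also used when nothing is registered.
    public static final ValueSerializer BINARY = new ValueSerializer() {
        @Override
        public byte[] serialize(Object value) {
            return ((byte[]) value).clone();
        }

        @Override
        public Object deserialize(byte[] data) {
            return data.clone();
        }
    };

    private final Map<String, ValueSerializer> byAttributeType = new HashMap<>();

    public void register(String attributeType, ValueSerializer serializer) {
        byAttributeType.put(key(attributeType), serializer);
    }

    public ValueSerializer lookup(String attributeType) {
        ValueSerializer serializer = byAttributeType.get(key(attributeType));
        return serializer != null ? serializer : BINARY;
    }

    // Attribute type names are compared case-insensitively.
    private static String key(String attributeType) {
        return attributeType.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        SerializerRegistrySketch registry = new SerializerRegistrySketch();
        registry.register("cn", UTF8_STRING);
        byte[] stored = registry.lookup("CN").serialize("Trustin Lee");
        System.out.println(registry.lookup("cn").deserialize(stored));
    }
}
```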
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: https://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Reporter: Trustin Lee
> Assigned To: Emmanuel Lecharny
> Priority: Minor
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Updated: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Alex Karasulu (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/DIRSERVER-478?page=all ]
Alex Karasulu updated DIRSERVER-478:
------------------------------------
Component: core
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: http://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Type: Improvement
> Components: core
> Reporter: Trustin Lee
> Assignee: Trustin Lee
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
[jira] Closed: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Alex Karasulu (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/DIRSERVER-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Alex Karasulu closed DIRSERVER-478.
-----------------------------------
Resolution: Fixed
Done a while back with server entries.
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: https://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.5.0
> Reporter: Trustin Lee
> Assignee: Emmanuel Lecharny
> Priority: Minor
> Fix For: 1.5.3
>
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
[jira] Updated: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/DIRSERVER-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Lecharny updated DIRSERVER-478:
----------------------------------------
Fix Version/s: (was: 1.5.2)
1.5.3
Partially done, but not finished. Has to be tested and improved.
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: https://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.5.0
> Reporter: Trustin Lee
> Assignee: Emmanuel Lecharny
> Priority: Minor
> Fix For: 1.5.3
>
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
[jira] Updated: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Trustin Lee (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/DIRSERVER-478?page=all ]
Trustin Lee updated DIRSERVER-478:
----------------------------------
Priority: Minor (was: Major)
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: http://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Reporter: Trustin Lee
> Assigned To: Trustin Lee
> Priority: Minor
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
[jira] Updated: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/DIRSERVER-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Lecharny updated DIRSERVER-478:
----------------------------------------
Affects Version/s: 1.5.0
Fix Version/s: 1.5.2
defined Affect/Fix versions
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: https://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Affects Versions: 1.5.0
> Reporter: Trustin Lee
> Assignee: Emmanuel Lecharny
> Priority: Minor
> Fix For: 1.5.2
>
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.
[jira] Assigned: (DIRSERVER-478) Standarzied serialization and deserialization of Name, Attribute, and Attributes.
Posted by "Emmanuel Lecharny (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/DIRSERVER-478?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Emmanuel Lecharny reassigned DIRSERVER-478:
-------------------------------------------
Assignee: Emmanuel Lecharny (was: Trustin Lee)
> Standarzied serialization and deserialization of Name, Attribute, and Attributes.
> ---------------------------------------------------------------------------------
>
> Key: DIRSERVER-478
> URL: https://issues.apache.org/jira/browse/DIRSERVER-478
> Project: Directory ApacheDS
> Issue Type: Improvement
> Components: core
> Reporter: Trustin Lee
> Assigned To: Emmanuel Lecharny
> Priority: Minor
>
> We should provide standardized high-performance serialization/deserialization mechanism as a library.
> Using Java serialization has couple of cons:
> * Slow
> * Resulting data is big
> * Serialized data might be unable to read due to class signature changes
> It will also help users to create their own ContextPartition implementation easily.