Posted to dev@thrift.apache.org by "Bryan Duxbury (JIRA)" <ji...@apache.org> on 2012/06/22 23:41:43 UTC

[jira] [Updated] (THRIFT-1630) Equivalent objects that contain sets and maps can serialize differently

     [ https://issues.apache.org/jira/browse/THRIFT-1630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bryan Duxbury updated THRIFT-1630:
----------------------------------

    Description: 
There's a subtle issue with comparing the serialized bytes of Thrift objects that contain maps or sets in Java. Even though the objects that go into sets (or serve as map keys) have consistent hashcodes, if they are inserted in a different order, the iteration order of the collection can also be different. Since serialization occurs in iteration order, objects that are .equals() in memory can serialize to different bytes.
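
To make the problem concrete, here is a minimal sketch that does not use Thrift at all. The serialize() helper is a hypothetical stand-in for any serializer that writes elements in iteration order, and the class and method names are made up for illustration. "Aa" and "BB" are chosen because they have the same String hashCode, so they land in the same hash bucket.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

public class IterationOrderDemo {
    // Hypothetical stand-in for a serializer (NOT Thrift's API):
    // writes the size, then each element in iteration order.
    static byte[] serialize(Set<String> set) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(set.size());
            for (String s : set) {
                out.writeUTF(s);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Builds a HashSet by inserting the two elements in the given order.
    static Set<String> setOf(String first, String second) {
        Set<String> set = new HashSet<>();
        set.add(first);
        set.add(second);
        return set;
    }

    public static void main(String[] args) {
        Set<String> a = setOf("Aa", "BB");
        Set<String> b = setOf("BB", "Aa");
        // The sets hold the same elements, so equals() is true.
        System.out.println("equals:     " + a.equals(b));
        // Whether the bytes match depends on the HashSet implementation's
        // iteration order, which the Java spec does not guarantee.
        System.out.println("same bytes: " + Arrays.equals(serialize(a), serialize(b)));
    }
}
```

On the OpenJDK HashMap implementation I'd expect the second line to report false, because colliding entries are chained in insertion order, but that is implementation behavior and not something the collections API promises.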

In most cases this isn't an issue. However, in cases where the user is doing raw byte comparison (e.g., in Hadoop), it is a big issue.

One solution is to just switch the internal Set/Map implementations to their sorted versions (TreeSet/TreeMap). However, these implementations are about 3x slower than their Hash counterparts, and I can certainly foresee situations in which that would upset a lot of users. I propose we add a compiler switch that toggles the Map/Set implementation between sorted and unsorted so that users can select which they prefer.
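
As a rough sketch of what such a toggle could mean for generated code: the newSet() factory and its boolean flag below are purely hypothetical, not an existing Thrift compiler option or generated API. The point is only that a TreeSet iterates in sorted order regardless of insertion order, so anything serialized in iteration order comes out the same.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class SortedToggleDemo {
    // Hypothetical factory: "sorted" plays the role of the proposed compiler switch.
    static <T extends Comparable<T>> Set<T> newSet(boolean sorted) {
        return sorted ? new TreeSet<T>() : new HashSet<T>();
    }

    // Inserts the two elements in the given order and returns the iteration order.
    static List<String> iterationOrder(boolean sorted, String first, String second) {
        Set<String> set = newSet(sorted);
        set.add(first);
        set.add(second);
        return new ArrayList<>(set);
    }

    public static void main(String[] args) {
        // With the sorted implementation, insertion order no longer matters.
        System.out.println(iterationOrder(true, "Aa", "BB"));
        System.out.println(iterationOrder(true, "BB", "Aa"));
    }
}
```

With sorted=true both calls iterate as Aa, then BB, since TreeSet uses the natural String ordering. With sorted=false the order is whatever HashSet happens to produce, which is the faster but non-canonical behavior users would be opting into.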


  was: allow users to indicate that they'd like sorted sets/maps in their types, meaning, for example, that they'd be backed by TreeSet/TreeMap in Java.

        Summary: Equivalent objects that contain sets and maps can serialize differently  (was: add support for sorted sets/maps)
    
> Equivalent objects that contain sets and maps can serialize differently
> -----------------------------------------------------------------------
>
>                 Key: THRIFT-1630
>                 URL: https://issues.apache.org/jira/browse/THRIFT-1630
>             Project: Thrift
>          Issue Type: New Feature
>          Components: Java - Compiler
>            Reporter: Chris Mullins
>            Assignee: Bryan Duxbury
