Posted to issues@geode.apache.org by "ASF subversion and git services (JIRA)" <ji...@apache.org> on 2019/08/01 17:24:00 UTC
[jira] [Commented] (GEODE-6973) getExistingIdForType should not compare all entries in idToType region

    [ https://issues.apache.org/jira/browse/GEODE-6973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16898210#comment-16898210 ] 

ASF subversion and git services commented on GEODE-6973:
--------------------------------------------------------

Commit 181e3a4a465aa9f5e06f39cd1285c94f9bc78600 in geode's branch refs/heads/develop from Xiaojian Zhou
[ https://gitbox.apache.org/repos/asf?p=geode.git;h=181e3a4 ]

GEODE-6973: Use cachelistener to synchronize typeToId with IdToType r… (#3853)

* GEODE-6973: Use a cache listener to keep typeToId synchronized with the
            IdToType region in PeerTypeRegistration, so that creating a new JSON
            pdxType no longer needs a lookup in the IdToType region. This speeds
            up creating new pdxTypes.

    Co-authored-by: Xiaojian Zhou <gz...@pivotal.io>
    Co-authored-by: Donal Evans <do...@pivotal.io>
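
The idea in the commit message can be illustrated with a minimal sketch in plain Java. It is not the Geode implementation: IdToTypeRegionSketch, TypeRegistrySketch, defineType and receiveRemoteType are hypothetical names, a ConcurrentHashMap with a callback stands in for the IdToType region and its cache listener, strings stand in for pdxTypes, and `synchronized` stands in for the d-lock.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.BiConsumer;

// Hypothetical stand-in for the IdToType region: it fires a callback after each
// new entry, loosely mirroring a cache listener's "after create" event.
class IdToTypeRegionSketch {
  private final Map<Integer, String> entries = new ConcurrentHashMap<>();
  private final BiConsumer<Integer, String> afterCreate;

  IdToTypeRegionSketch(BiConsumer<Integer, String> afterCreate) {
    this.afterCreate = afterCreate;
  }

  void create(int id, String type) {
    // Notify the listener only when the entry is really new.
    if (entries.putIfAbsent(id, type) == null) {
      afterCreate.accept(id, type);
    }
  }
}

// Hypothetical registry: the reverse map is updated only by the region callback,
// so it mirrors the region without ever scanning it.
class TypeRegistrySketch {
  private final Map<String, Integer> typeToId = new ConcurrentHashMap<>();
  private final IdToTypeRegionSketch idToType =
      new IdToTypeRegionSketch((id, type) -> typeToId.put(type, id));
  private int nextLocalId = 1;

  // "synchronized" is an assumption standing in for the distributed lock.
  synchronized int defineType(String type) {
    Integer existing = typeToId.get(type); // one hash lookup, no region iteration
    if (existing != null) {
      return existing;
    }
    int id = nextLocalId++;
    idToType.create(id, type); // the callback updates typeToId
    return id;
  }

  // Simulates a type that another member registered and that arrives in the region.
  void receiveRemoteType(int id, String type) {
    idToType.create(id, type);
  }

  public static void main(String[] args) {
    TypeRegistrySketch registry = new TypeRegistrySketch();
    int local = registry.defineType("{name,age}");
    registry.receiveRemoteType(100, "{city}");
    System.out.println(registry.defineType("{name,age}") == local); // true
    System.out.println(registry.defineType("{city}"));              // 100
  }
}
```

Because every new region entry passes through the callback, a miss in typeToId can be treated as "not in the region" in this sketch, which is what removes the need for the per-entry comparison.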


> getExistingIdForType should not compare all entries in idToType region
> ----------------------------------------------------------------------
>
>                 Key: GEODE-6973
>                 URL: https://issues.apache.org/jira/browse/GEODE-6973
>             Project: Geode
>          Issue Type: Bug
>            Reporter: xiaojian zhou
>            Assignee: xiaojian zhou
>            Priority: Major
>              Labels: GeodeCommons
>          Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We found that PeerTypeRegistration's getExistingIdForType() iterates through the idToType region's entries to check whether the incoming newType is already there.
> If the idToType region contains 20K or 100K entries, this hurts put throughput (customers did notice the performance degradation when there are many pdxTypes).
> To make things worse, the comparison covers the whole object, field by field. If the JSON object (which will be converted to a pdxType) contains 30 fields, the comparison has to compare up to 30 fields. If the idToType region contains 20K entries, a new pdxType requires up to 20K x 30 = 600K string comparisons before it is registered.
> We found that each server maintains a typeToId map, which is used to check whether the pdxType exists. If it exists, the type id is returned without checking the idToType region. The total number of pdxTypes does not impact put performance if the pdxType exists.
> The typeToId map is maintained under a d-lock: each time we add a new pdxType, it is put into the map while the d-lock is still held. So we believe the map should have the same content as the region. If we cannot find the pdxType in the map, it should not be in the region, and we can skip the iteration over the region (which is the root cause of the performance issue).
> Another issue in the current code: each time a new type arrives, the map is recreated. This is unnecessary and contributes to the slowness too.
> We should only create the map during initialize().
> Here are the tests we want to introduce:
> 1) a junit test to prove that reordering fields in a big JSON file will not cause significant hashcode conflicts (<1%)
> 2) a junit test to prove that adding an index to a field in a big JSON file will hardly cause hashcode conflicts.
> These 2 tests are to prove that hashcode conflicts are not the root cause of linear probing for PDXTypeId.
> 3) a junit test to prove that, for the cases where hashcode conflicts are caused by reordered fields, there will be no hashcode conflicts when using SORT_JSON_FIELD_NAMES_PROPERTY=true.
> 4) a dunit test to prove that SORT_JSON_FIELD_NAMES_PROPERTY=true or false does not impact the performance of adding a new pdxType.
> 5) a dunit test to create a new pdxType from 2 peer servers at the same time. The test is to prove that the d-lock takes effect: one server creates the pdxType, and the other server should find that the pdxType exists.
> Do this test both from the servers directly and from clients.
> 6) Create 2 different objects which end up with the same hashcode (we can get the 2 objects from test 1), and try to put the 2 objects to create new pdxTypes. The 2nd one should also create a new type. It should not be treated as "found an existing pdxType".
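
The property that test 6 checks can be sketched in plain Java, independent of Geode. The class HashCollisionSketch and its method getOrCreateId are hypothetical, and the strings "Aa" and "BB" (which share the hash code 2112 in Java) stand in for two different pdxTypes with colliding hashcodes. A hash map keyed on the type falls back to equals() when hashcodes match, so the second object is still treated as new.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a typeToId map keyed by a type whose hashCode may collide.
class HashCollisionSketch {
  private final Map<String, Integer> typeToId = new HashMap<>();
  private int nextId = 1;

  // Returns the existing id for an equal type, or assigns a fresh id.
  int getOrCreateId(String type) {
    return typeToId.computeIfAbsent(type, t -> nextId++);
  }

  public static void main(String[] args) {
    HashCollisionSketch registry = new HashCollisionSketch();
    String first = "Aa";
    String second = "BB";

    // Same hashcode, different content.
    System.out.println(first.hashCode() == second.hashCode()); // true

    int firstId = registry.getOrCreateId(first);
    int secondId = registry.getOrCreateId(second);

    // The collision does not make the second type look like an existing one.
    System.out.println(firstId != secondId); // true

    // An equal type is still found without assigning a new id.
    System.out.println(registry.getOrCreateId("Aa") == firstId); // true
  }
}
```

In this sketch a collision costs one extra equals() call inside the shared bucket, rather than the full scan of 20K entries described in the issue.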



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)