You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by spmallette <gi...@git.apache.org> on 2017/05/25 19:45:21 UTC

[GitHub] tinkerpop pull request #613: TINKERPOP-1676 (tp32) GraphSON Performance

GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/613

    TINKERPOP-1676 (tp32) GraphSON Performance

    https://issues.apache.org/jira/browse/TINKERPOP-1676
    
    Made some pretty good improvements to performance. I was mostly focused on the serialization of complete graph elements so using TinkerGraph as the basis for benchmark seemed like a good way to do that.  I subgraphed a portion of the Grateful Dead graph:
    
    ```text
    gremlin> subgraph = g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next()
    ==>tinkergraph[vertices:338 edges:7047]
    ```
    
    I wanted a comparison with Gryo so I got that first:
    
    ```text
    gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper()
    ==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s)
    ......3>   gryo.writeObject(output, subgraph) 
    ......4>   output.flush()
    ......5>   def input = new org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray())
    ......6>   gryo.readObject(input, TinkerGraph.class) 
    ......7> }
    ==>55.96791679999999
    ```
    
    then compared head of tp32:
    
    ```text
    gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>723.1702541799999
    ```
    
    pretty bad - now GraphSON with changes from this branch:
    
    ```text
    gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>67.0219281
    ```
    
    Not too far behind gryo now!
    
    Running full integration tests now and will update this PR when it's done successfully. 
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:03:17Z

    TINKERPOP-1676 Got rid of stream() usage
    
    Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage.

commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:57:12Z

    TINKERPOP-1676 Improved speed and memory usage of GraphSON
    
    This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable.

commit 02b007366b434918fe1181f99d80689c1c03684b
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T00:13:46Z

    TINKERPOP-1676 Performance enhancement to graphson serialization
    
    Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven.

commit e5d2d2bc000654fd026521219cf30bac265139a3
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T17:38:16Z

    TINKERPOP-1676 Removed properties from graphson serialization of Path
    
    Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed.

commit 049c979720a84ea9744725fda5f8d2b371ab1751
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T17:29:45Z

    TINKERPOP-1676 More optimizations to GraphSON serializers
    
    Added some tests where there were previously gaps.

commit f32d725a080d39ef7bf93c68d8080939f622cb37
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T18:03:29Z

    TINKERPOP-1676 Cleaned up serialization with detached
    
    The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tinkerpop issue #613: TINKERPOP-1676 (tp32) GraphSON Performance

Posted by okram <gi...@git.apache.org>.
Github user okram commented on the issue:

    https://github.com/apache/tinkerpop/pull/613
  
    VOTE +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tinkerpop issue #613: TINKERPOP-1676 (tp32) GraphSON Performance

Posted by dkuppitz <gi...@git.apache.org>.
Github user dkuppitz commented on the issue:

    https://github.com/apache/tinkerpop/pull/613
  
    Not sure if this should really go into `tp32` as it's kinda breaking (though it breaks something that was broken :)), but if others agree, then
    
    VOTE: +1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tinkerpop issue #613: TINKERPOP-1676 (tp32) GraphSON Performance

Posted by spmallette <gi...@git.apache.org>.
Github user spmallette commented on the issue:

    https://github.com/apache/tinkerpop/pull/613
  
    yeah - never sure what to do in those situations. i'm having some pause as i think about this PR this week. let's consider it and merge it as-is and i'll see what i might do to allow the "old way" to still hang around so that if someone is depending on it, they can still do it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] tinkerpop pull request #613: TINKERPOP-1676 (tp32) GraphSON Performance

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/tinkerpop/pull/613


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---