You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@tinkerpop.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2017/05/25 19:46:04 UTC

[jira] [Commented] (TINKERPOP-1676) Improve GraphSON 2.0 Performance

    [ https://issues.apache.org/jira/browse/TINKERPOP-1676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16025273#comment-16025273 ] 

ASF GitHub Bot commented on TINKERPOP-1676:
-------------------------------------------

GitHub user spmallette opened a pull request:

    https://github.com/apache/tinkerpop/pull/613

    TINKERPOP-1676 (tp32) GraphSON Performance

    https://issues.apache.org/jira/browse/TINKERPOP-1676
    
    Made some pretty good improvements to performance. I was mostly focused on the serialization of complete graph elements so using TinkerGraph as the basis for benchmark seemed like a good way to do that.  I subgraphed a portion of the Grateful Dead graph:
    
    ```text
    gremlin> subgraph = g.E().hasLabel('followedBy').subgraph('subGraph').cap('subGraph').next()
    ==>tinkergraph[vertices:338 edges:7047]
    ```
    
    I wanted a comparison with Gryo so I got that first:
    
    ```text
    gremlin> gryo = graph.io(IoCore.gryo()).mapper().create().createMapper()
    ==>org.apache.tinkerpop.shaded.kryo.Kryo@5edf2821
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   def output = new org.apache.tinkerpop.shaded.kryo.io.Output(s)
    ......3>   gryo.writeObject(output, subgraph) 
    ......4>   output.flush()
    ......5>   def input = new org.apache.tinkerpop.shaded.kryo.io.Input(s.toByteArray())
    ......6>   gryo.readObject(input, TinkerGraph.class) 
    ......7> }
    ==>55.96791679999999
    ```
    
    then compared head of tp32:
    
    ```text
    gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@185f7840
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>723.1702541799999
    ```
    
    pretty bad - now GraphSON with changes from this branch:
    
    ```text
    gremlin> graphson = GraphSONMapper.build().version(GraphSONVersion.V2_0).addCustomModule(GraphSONXModuleV2d0.build().create(false)).addRegistry(TinkerIoRegistryV2d0.instance()).create().createMapper()
    ==>org.apache.tinkerpop.shaded.jackson.databind.ObjectMapper@563a89b5
    gremlin> clock { 
    ......1>   def s = new java.io.ByteArrayOutputStream()
    ......2>   graphson.writeValue(s, subgraph)
    ......3>   graphson.readValue(new java.io.ByteArrayInputStream(s.toByteArray()), TinkerGraph.class)
    ......4> }
    ==>67.0219281
    ```
    
    Not too far behind gryo now!
    
    Running full integration tests now and will update this PR when it's done successfully. 
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/tinkerpop TINKERPOP-1676

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/tinkerpop/pull/613.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #613
    
----
commit 46e6e9767b6818f33b56366ee15d80cec5a908e0
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:03:17Z

    TINKERPOP-1676 Got rid of stream() usage
    
    Can't believe we still had stream() in here. Will it ever all be gone from these performance sensitive places?! Anyway, removed that and deprecated a constructor on DetachedEdge that was using Pair for no really good reason. No need to create extra Pair objects for that. They just sorta get thrown away after usage.

commit 251f5b7e34e8ebd9a8bc36802e633be8c91eeb5e
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-23T20:57:12Z

    TINKERPOP-1676 Improved speed and memory usage of GraphSON
    
    This change is specific to TinkerGraph and the serialization of vertices/edges/properties. Removed a "middle layer" of JSON to Object serialization which was coercing JSON to Map first and then converting that to the graph elements. Also made deserializers cacheable.

commit 02b007366b434918fe1181f99d80689c1c03684b
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T00:13:46Z

    TINKERPOP-1676 Performance enhancement to graphson serialization
    
    Focuses on speeding up serialization of graph elements. Prevent use of generic maps and stream graphson data directly into "detached" elements. Not using intermediate maps cut down on memory usage and a bunch of jackson reflection calls (still not sure what they were for and why they were not made to be more efficient). It did mean some ugly changes to "detached" stuff. Will need to maybe refactor some more, but the basic premise seems to be proven.

commit e5d2d2bc000654fd026521219cf30bac265139a3
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-24T17:38:16Z

    TINKERPOP-1676 Removed properties from graphson serialization of Path
    
    Path should not have properties on elements (if they are present). That is inconsistent with gryo and was unintentially allowed.

commit 049c979720a84ea9744725fda5f8d2b371ab1751
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T17:29:45Z

    TINKERPOP-1676 More optimizations to GraphSON serializers
    
    Added some tests where there were previously gaps.

commit f32d725a080d39ef7bf93c68d8080939f622cb37
Author: Stephen Mallette <sp...@genoprime.com>
Date:   2017-05-25T18:03:29Z

    TINKERPOP-1676 Cleaned up serialization with detached
    
    The DetachedUtil was just there to test out if the performance changes would help and it did it's job nicely, but it was kinda ugly and hung some methods out there in weird way. Cleaned that up with some builder pattern on the detached classes themselves.

----


> Improve GraphSON 2.0 Performance
> --------------------------------
>
>                 Key: TINKERPOP-1676
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1676
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io
>    Affects Versions: 3.2.4
>            Reporter: stephen mallette
>            Assignee: stephen mallette
>              Labels: breaking
>
> Not sure what will happen here. Just wanted an issue where I could experiment a bit with and gather feedback from others for collaboration in case others had ideas. Right now, GraphSON reads are considerably slower than Gryo and writes appear to be faster, so the focus will be on how to get reads more in line with what Gryo can do. 



--
This message was sent by Atlassian JIRA
(v6.3.15#6346)