Posted to dev@tinkerpop.apache.org by "Marko A. Rodriguez (JIRA)" <ji...@apache.org> on 2016/06/17 22:05:05 UTC

[jira] [Commented] (TINKERPOP-1343) A more efficient StarGraph serialization representation.

    [ https://issues.apache.org/jira/browse/TINKERPOP-1343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15337074#comment-15337074 ] 

Marko A. Rodriguez commented on TINKERPOP-1343:
-----------------------------------------------

If we go with #1 above and know that {{xxxIdClass}} is a {{long}} or an {{int}}, we should use variable-width encoding, which Kryo offers "out of the box."
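
To illustrate why variable-width encoding helps for numeric ids, here is a minimal JDK-only sketch of a base-128 varint, the general technique behind such encodings. It is not Kryo's implementation and its byte layout is not claimed to match Kryo's; the class and method names ({{VarLongSketch}}, {{encode}}, {{decode}}) are hypothetical and exist only for this example.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical illustration only; not Kryo's code or wire format.
final class VarLongSketch {

    // Encode a long as a base-128 varint: 7 payload bits per byte,
    // with the high bit set on every byte except the last one.
    public static byte[] encode(long value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7FL) != 0) {
            out.write((int) ((value & 0x7F) | 0x80));
            value >>>= 7; // unsigned shift so negative values terminate
        }
        out.write((int) value);
        return out.toByteArray();
    }

    // Decode the bytes produced by encode(...) back into a long.
    public static long decode(byte[] bytes) {
        long result = 0;
        int shift = 0;
        for (byte b : bytes) {
            result |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        return result;
    }

    public static void main(String[] args) {
        // Small ids need far fewer than the 8 bytes of a fixed-width long.
        for (long id : new long[]{1L, 300L, 1_000_000L, Long.MAX_VALUE}) {
            byte[] bytes = encode(id);
            System.out.println(id + " -> " + bytes.length
                    + " byte(s), round-trips: " + (decode(bytes) == id));
        }
    }
}
```

In this plain scheme, small non-negative ids shrink to one to three bytes, while negative values grow to ten bytes, so the benefit depends on the id distribution of the graph.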

> A more efficient StarGraph serialization representation.
> --------------------------------------------------------
>
>                 Key: TINKERPOP-1343
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1343
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: process
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>              Labels: breaking
>
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a vertex, its properties, its incident edges, and their properties. In essence, one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. This is bad because we have to write the class with each id. It would be better if the {{StarGraph}} had metadata like {{vertexIdClass}}, {{vertexPropertyIdClass}}, and {{edgeIdClass}}. Now for every vertex we are serializing three classes, but the benefit is that every id class is now known and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, otherVertexId]\*]\*}} and {{[ propertyKey[ vertexProperty[ id,propertyValue]\*]\*}}, respectively. This ensures we don't write so many strings, as all edges/vertex properties are grouped by label. However, we do NOT do this for edge properties or vertex property properties. We simply write out the {{Map<Object,Map<String,Object>>}}, which is {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose between grouping by edgeId or by propertyKey, we should keep it as it is, but create a "meta map" that allows us to represent all property keys in a compact space, e.g., {{int}}. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}, where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized with the {{StarGraph}}.
> There are a few other tickets around optimizing {{StarGraph}} here:
> https://issues.apache.org/jira/browse/TINKERPOP-1128 (making {{GraphFilters}} more efficient)
> https://issues.apache.org/jira/browse/TINKERPOP-1122 (pointless bits and {{StarGraph}} should never auto-generate IDs as the ID space is distributed).
> https://issues.apache.org/jira/browse/TINKERPOP-1287 (related to heap usage and clock cycles -- not serialization).
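
The "meta map" idea from point 2 of the quoted description can be sketched with plain JDK collections. This is only an illustration under assumptions: the class {{PropertyKeyTableSketch}} and its methods are hypothetical, it does not touch {{StarGraph}} or Kryo, and it shows only the key-to-int mapping, not how the result would be written to a stream.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of TinkerPop.
final class PropertyKeyTableSketch {
    // The "meta map": each distinct property key string gets a small int id.
    private final Map<String, Integer> idsByKey = new HashMap<>();
    private final List<String> keysById = new ArrayList<>();

    // Return the id for a key, assigning the next free id on first sight.
    public int idFor(String key) {
        Integer id = idsByKey.get(key);
        if (id == null) {
            id = keysById.size();
            idsByKey.put(key, id);
            keysById.add(key);
        }
        return id;
    }

    public String keyFor(int id) {
        return keysById.get(id);
    }

    // Map<EdgeId,Map<PropertyKey,PropertyValue>> -> Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>
    public Map<Object, Map<Integer, Object>> compact(Map<Object, Map<String, Object>> byEdge) {
        Map<Object, Map<Integer, Object>> result = new HashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : byEdge.entrySet()) {
            Map<Integer, Object> props = new HashMap<>();
            for (Map.Entry<String, Object> p : edge.getValue().entrySet()) {
                props.put(idFor(p.getKey()), p.getValue());
            }
            result.put(edge.getKey(), props);
        }
        return result;
    }

    // Inverse of compact(...), using the ids assigned so far.
    public Map<Object, Map<String, Object>> expand(Map<Object, Map<Integer, Object>> compacted) {
        Map<Object, Map<String, Object>> result = new HashMap<>();
        for (Map.Entry<Object, Map<Integer, Object>> edge : compacted.entrySet()) {
            Map<String, Object> props = new HashMap<>();
            for (Map.Entry<Integer, Object> p : edge.getValue().entrySet()) {
                props.put(keyFor(p.getKey()), p.getValue());
            }
            result.put(edge.getKey(), props);
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> e1 = new HashMap<>();
        e1.put("weight", 0.5);
        e1.put("since", 2010);
        Map<String, Object> e2 = new HashMap<>();
        e2.put("weight", 1.0);

        Map<Object, Map<String, Object>> edgeProperties = new HashMap<>();
        edgeProperties.put(7L, e1);
        edgeProperties.put(8L, e2);

        PropertyKeyTableSketch table = new PropertyKeyTableSketch();
        Map<Object, Map<Integer, Object>> compacted = table.compact(edgeProperties);
        // Each key string now appears once in the table instead of once per edge.
        System.out.println("round-trips: " + table.expand(compacted).equals(edgeProperties));
    }
}
```

The grouping by edge id stays as it is; only the repeated key strings are replaced by ints, and the table itself would have to be serialized once alongside the data.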


