Posted to dev@tinkerpop.apache.org by "stephen mallette (JIRA)" <ji...@apache.org> on 2018/10/31 15:53:00 UTC

[jira] [Commented] (TINKERPOP-1346) Gryo 4.0

    [ https://issues.apache.org/jira/browse/TINKERPOP-1346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16670284#comment-16670284 ] 

stephen mallette commented on TINKERPOP-1346:
---------------------------------------------

I wonder if we can stop thinking about a potential Gryo 4.0, given the current discussions around TINKERPOP-1942.

> Gryo 4.0
> --------
>
>                 Key: TINKERPOP-1346
>                 URL: https://issues.apache.org/jira/browse/TINKERPOP-1346
>             Project: TinkerPop
>          Issue Type: Improvement
>          Components: io, structure
>    Affects Versions: 3.2.0-incubating
>            Reporter: Marko A. Rodriguez
>            Priority: Major
>              Labels: breaking
>
> *Reference*
> Right now, to send a {{ReferenceEdge}} message, we serialize the form as:
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassObject[Edge ID] + KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID] + KryoClassInteger[ReferenceVertex] + KryoClassObject[Vertex ID]
> {code}
> Assuming {{Long}} Element ids, the math says:
> {code:java}
> 48 bytes = 4 bytes + (4 bytes + 8 bytes [long]) + 4 bytes + (4 bytes + 8 bytes [long]) + 4 bytes + (4 bytes + 8 bytes [long])
> {code}
> We could get this smaller by not relying on Kryo's {{FieldSerializer}}.
> {code:java}
> KryoClassInteger[ReferenceEdge] + KryoClassInteger[VertexIDClass] + KryoClassObject[Edge ID] + KryoObject[Vertex ID] + KryoObject[Vertex ID]
> {code}
> The math says:
> {code:java}
> 36 bytes = 4 bytes + 4 bytes + (4 bytes + 8 bytes [long]) + 8 bytes [long] + 8 bytes [long]
> {code}
> Similar techniques would apply to {{ReferenceVertexProperty}} and {{ReferenceProperty}}.
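The byte counts above can be checked with a small sketch. It does not use Kryo: the class ids (REFERENCE_EDGE, REFERENCE_VERTEX, LONG_CLASS) are hypothetical stand-ins, and it takes the issue's assumption of a fixed 4-byte class id at face value, so real Kryo output may differ in size.

```java
import java.nio.ByteBuffer;

final class ReferenceEdgeSizes {
    // Hypothetical class ids standing in for Kryo registration ids.
    static final int REFERENCE_EDGE = 1;
    static final int REFERENCE_VERTEX = 2;
    static final int LONG_CLASS = 3;

    // Current layout: every element and every id is preceded by a class id.
    static int currentLayoutSize(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS).putLong(edgeId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(outVertexId);
        buf.putInt(REFERENCE_VERTEX);
        buf.putInt(LONG_CLASS).putLong(inVertexId);
        return buf.position();
    }

    // Proposed layout: the vertex id class is written once, then both
    // vertex ids follow as raw values without a class prefix.
    static int proposedLayoutSize(long edgeId, long outVertexId, long inVertexId) {
        ByteBuffer buf = ByteBuffer.allocate(64);
        buf.putInt(REFERENCE_EDGE);
        buf.putInt(LONG_CLASS);                 // vertex id class, once
        buf.putInt(LONG_CLASS).putLong(edgeId); // edge id still carries its class
        buf.putLong(outVertexId);
        buf.putLong(inVertexId);
        return buf.position();
    }

    public static void main(String[] args) {
        System.out.println(currentLayoutSize(1L, 2L, 3L));
        System.out.println(proposedLayoutSize(1L, 2L, 3L));
    }
}
```

Under these assumptions the two methods return 48 and 36, matching the arithmetic in the description.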
> *StarGraph*
> Right now we serialize the vertex first, then its edges, then its properties. We should instead do vertex, properties, edges. Why? Whether a vertex is filtered is decided by analyzing its label/id/properties, so once we know the vertex is to be filtered, we can skip its edges entirely. Right now, we may do all the work of deserializing edges only to realize that the {{GraphFilter}} says the vertex is filtered. Those are pointless clock cycles, especially when edge sets can be massive.
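A minimal sketch of why the ordering matters, assuming a toy record layout (vertex id, one long-valued property, then the edge block) that is not Gryo's actual wire format. The class and method names are hypothetical. With properties ahead of edges, the reader can apply the filter and return before touching any edge bytes.

```java
import java.nio.ByteBuffer;
import java.util.function.LongPredicate;

final class StarRecordOrdering {
    // Writes: vertex id, a single property value, edge count, adjacent vertex ids.
    static ByteBuffer write(long vertexId, long propertyValue, long[] adjacentIds) {
        ByteBuffer buf = ByteBuffer.allocate(8 + 8 + 4 + 8 * adjacentIds.length);
        buf.putLong(vertexId);
        buf.putLong(propertyValue);
        buf.putInt(adjacentIds.length);
        for (long id : adjacentIds) {
            buf.putLong(id);
        }
        buf.flip();
        return buf;
    }

    // Reads the record and returns how many edges were actually deserialized.
    static int read(ByteBuffer buf, LongPredicate keepVertex) {
        long vertexId = buf.getLong();
        long propertyValue = buf.getLong();
        if (!keepVertex.test(propertyValue)) {
            return 0; // vertex is filtered: the edge block is never decoded
        }
        int edgeCount = buf.getInt();
        int decoded = 0;
        for (int i = 0; i < edgeCount; i++) {
            buf.getLong(); // adjacent vertex id
            decoded++;
        }
        return decoded;
    }
}
```

For a filtered vertex, `read` returns 0 and leaves the whole edge block unread in the buffer. With edges written before properties, the reader would have to decode every edge before it could evaluate the filter.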
> {{StarGraph}} is used by the Hadoop {{GraphComputers}} and represents a vertex, its properties, its incident edges, and their properties. In essence, one "row of an adjacency list."
> Here are some ideas on how to make the next version of the serialization format more efficient.
> 1. For all Element ids, we currently use {{kryo.readClassAndObject(...)}}. This is bad because we have to write the class with each id. It would be better if the {{StarGraph}} had metadata like {{vertexIdClass}}, {{vertexPropertyIdClass}}, and {{edgeIdClass}}. For every vertex we would then serialize three classes, but the benefit is that every id class is known up front and we can use {{kryo.readObject(..., xxxIdClass)}}.
> 2. Edges and VertexProperties are written out as {{[ edgeLabel[ edge[ id, otherVertexId]*]*}} and {{[ propertyKey[ vertexProperty[ id,propertyValue]*]*}}, respectively. Because all edges/vertex properties are grouped by label, we avoid writing the same strings over and over. However, we do NOT do this for edge properties or for vertex property properties. We simply write out the {{Map<Object,Map<String,Object>>}}, which is {{Map<EdgeId,Map<PropertyKey,PropertyValue>>}}. Since we have to choose between grouping by edgeId or by propertyKey, we should keep it as it is, but create a "meta map" that represents every property key as a compact id, e.g., an {{int}}. Thus, {{Map<EdgeId,Map<PropertyKeyIntegerId,PropertyValue>>}}, where we also have a {{Map<PropertyKeyIntegerId,String>}} that is serialized with the {{StarGraph}}.
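The "meta map" from idea 2 could look roughly like the sketch below. The class PropertyKeyTable and its methods are hypothetical names for illustration, not part of TinkerPop or Kryo. It assigns each distinct property key an int on first use and rewrites the per-edge property maps to use those ints.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class PropertyKeyTable {
    private final Map<String, Integer> idsByKey = new LinkedHashMap<>();
    private final List<String> keysById = new ArrayList<>();

    // Returns the int id for a property key, assigning the next free id on first use.
    int idFor(String key) {
        Integer id = idsByKey.get(key);
        if (id == null) {
            id = keysById.size();
            idsByKey.put(key, id);
            keysById.add(key);
        }
        return id;
    }

    // Reverse lookup, i.e. the Map<PropertyKeyIntegerId,String> side.
    String keyFor(int id) {
        return keysById.get(id);
    }

    int size() {
        return keysById.size();
    }

    // Rewrites Map<EdgeId,Map<PropertyKey,PropertyValue>> so that each
    // property key string is replaced by its int id.
    Map<Object, Map<Integer, Object>> encode(Map<Object, Map<String, Object>> edgeProperties) {
        Map<Object, Map<Integer, Object>> encoded = new LinkedHashMap<>();
        for (Map.Entry<Object, Map<String, Object>> edge : edgeProperties.entrySet()) {
            Map<Integer, Object> properties = new LinkedHashMap<>();
            for (Map.Entry<String, Object> property : edge.getValue().entrySet()) {
                properties.put(idFor(property.getKey()), property.getValue());
            }
            encoded.put(edge.getKey(), properties);
        }
        return encoded;
    }
}
```

Each key string would then be written once in the table serialized with the StarGraph, and every edge property would carry only its int id. Whether that is a net saving depends on how often keys repeat across edges.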
> StarGraph also has a Long identifier. This makes no sense, as each StarGraph in the full Graph will then have similar ids! Moreover, what is referencing what when the adjacent vertices are just arbitrary long ids?! We should require that StarGraph be provided ids for vertices (and perhaps edges). That way we ensure there are no inconsistencies and we save 64 bits.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)