Posted to dev@s2graph.apache.org by DO YUNG YOON <sh...@gmail.com> on 2016/11/24 03:44:36 UTC

[DISCUSS] Support Apache TinkerPop and Gremlin

Hi folks.

After the discussion at ApacheCon Big Data Europe (Seville), I was wondering
whether it is possible to change S2Graph's core library to implement tp3's
interfaces directly rather than providing a layer atop the existing codebase.

I have updated the corresponding issue
<https://issues.apache.org/jira/browse/S2GRAPH-72> and created 2 sub-tasks (
S2GRAPH-129 <https://issues.apache.org/jira/browse/S2GRAPH-129> ,
S2GRAPH-130 <https://issues.apache.org/jira/browse/S2GRAPH-130> ) to try
out this idea.

@committers, please review PR99
<https://github.com/apache/incubator-s2graph/pull/99> and PR100
<https://github.com/apache/incubator-s2graph/pull/100> so we can proceed to
actually implement all of tp3's interfaces. I intentionally left the actual
implementation out because it may change after this discussion.

Apart from that, here are a few things I want to discuss regarding support
for Apache TinkerPop and Gremlin.

1. Data type of property values. Currently S2Graph only supports the types
available in JSON. Is this OK? Are we going to support any other types? If
so, what needs to be done to support other data types for property values?

2. No notion of VertexProperty. Properties are the same on Vertex and Edge
in S2Graph, so we have to decide what our S2VertexProperty would be. Are we
going to support this, or just say we can't provide it (at least for now)?

3. Vertex Id: S2Graph uses ServiceColumn + UserProvidedId as its internal
vertex id. We need to decide how we are going to map ServiceColumn onto
tp3's Vertex. Are we going to serialize/deserialize ServiceColumn into tp3's
Vertex label or not? Beyond ServiceColumn, I also want to discuss what
S2Graph is going to provide through tp3's interfaces and how.
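
To make item 1 a bit more concrete, here is a minimal sketch of what
"only types available in JSON" could mean as a check on property values.
The function name and the whitelist of scalar types are assumptions made
for illustration only; they are not S2Graph's actual type table or API.

```python
import json

# Assumption: "types available in JSON" is read here as JSON scalars
# (string, number, boolean, null). This whitelist is illustrative only.
JSON_SCALAR_TYPES = (str, int, float, bool, type(None))


def is_json_property_value(value):
    """Return True if value is a scalar that survives a strict JSON round trip."""
    if not isinstance(value, JSON_SCALAR_TYPES):
        return False
    try:
        # allow_nan=False rejects NaN and Infinity, which are not valid JSON.
        return json.loads(json.dumps(value, allow_nan=False)) == value
    except (TypeError, ValueError):
        return False
```

Under this reading, values such as raw bytes or arbitrary objects would
need either a new type mapping or an explicit encoding before they could be
stored as property values.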

Please feel free to comment not only on the above but also on anything
regarding tp3 support in general.

Thanks.

Re: [DISCUSS] Support Apache TinkerPop and Gremlin

Posted by DO YUNG YOON <sh...@gmail.com>.
I have been working on this issue for a while, and I have finally opened a
PR that I believe goes in the right direction (
https://github.com/apache/incubator-s2graph/pull/112).
Please review PR112 and give any feedback.
Here are some important notes on this PR.

1. Data type of property value.

Check out
https://github.com/apache/incubator-s2graph/pull/112/files#diff-8caf8eace8a4d2a42e1b0279d531d286

Basically, we currently support only the data types that S2Graph already
supported. Support for more data types is possible, but it would be handled
in a separate issue later if necessary.

2. No notion of VertexProperty. Properties are the same on Vertex and Edge
in S2Graph, so we have to decide what our S2VertexProperty would be. Are we
going to support this, or just say we can't provide it (at least for now)?

Check out
https://github.com/apache/incubator-s2graph/pull/112/files#diff-b64b1af513f07d8e34fb498c7618cf67

Currently, only Cardinality.single vertex properties are supported.

3. Vertex Id: S2Graph uses ServiceColumn + UserProvidedId as its internal
vertex id. We need to decide how we are going to map ServiceColumn onto
tp3's Vertex. Are we going to serialize/deserialize ServiceColumn into tp3's
Vertex label or not? Beyond ServiceColumn, I also want to discuss what
S2Graph is going to provide through tp3's interfaces and how.

Check out
https://github.com/apache/incubator-s2graph/pull/112/files#diff-95ac55266df22a798b8f3ac2d9298ead

It basically specifies how S2Graph's VertexId/EdgeId are serialized to, and
deserialized from, the value returned by tp3's id() method.
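
As a rough illustration of the id mapping in item 3, the sketch below
round-trips a composite vertex id (ServiceColumn + user-provided id) through
a single string, which is the kind of value an id() method could return. The
field names, the split of ServiceColumn into service and column names, and
the JSON-array encoding are all assumptions for illustration; the encoding
actually used in PR112 may differ.

```python
import json
from typing import NamedTuple, Union


class VertexId(NamedTuple):
    service: str              # hypothetical: service name of the ServiceColumn
    column: str               # hypothetical: column name of the ServiceColumn
    user_id: Union[str, int]  # the user-provided id


def serialize_vertex_id(vid: VertexId) -> str:
    """Encode the composite id as a compact JSON array string."""
    return json.dumps([vid.service, vid.column, vid.user_id],
                      separators=(",", ":"))


def deserialize_vertex_id(text: str) -> VertexId:
    """Decode a string produced by serialize_vertex_id back into a VertexId."""
    service, column, user_id = json.loads(text)
    return VertexId(service, column, user_id)
```

Using a JSON array rather than a delimiter-joined string keeps the integer
id 42 and the string id "42" distinct after a round trip, and avoids
ambiguity when a name itself contains the delimiter.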

Also, here is how to run the tp3 test suite.

Just run the JUnit tests in
org.apache.s2graph.core.tinkerpop.structure.S2GraphStructureStandardTest and
org.apache.s2graph.core.tinkerpop.process.S2GraphProcessStandardTest
without any setup.

There are lots of tests, so it will take some time.

One thing I found useful for debugging is setting the environment variable
GREMLIN_TESTS to a test class name such as
org.apache.tinkerpop.gremlin.structure.GraphTest; then only that one test
will be run.

Also, there are a few OptOuts on S2Graph.

Most of them are there because I think it is not currently possible to pass
those test cases. They are based solely on my own knowledge, so please ask
if any of them seems inappropriate.

Even though I believe that PR112 is a valid implementation of the tp3
interfaces, many more things remain.

- TraversalStrategy: we do not have any provider optimizations yet (
http://tinkerpop.apache.org/docs/current/reference/#traversalstrategy).
I think there are a few optimizations we can provide.

ex) `g.V(vid/v).outE` will look up the vertex by vid/v and then return all
adjacent edges starting from this vertex. The current implementation in the
PR uses Await once to wait for the I/O request to the storage backend for
V(vid/v), and then Awaits the S2Vertex.edges method on the fetched vertex.
This requires 2 I/O requests and 2 Awaits.
In S2Graph this query can be reduced to first creating the vertex to fetch
in memory and then firing a single I/O request to the storage backend, which
I think is more efficient.

The above is a very limited example, but I just want to know what others
think.

- Global index: check out http://markmail.org/message/2vn2bwrwh5zbeie4.
While I was working on this issue, I noticed that S2Graph does not have an
index provider layer for global indexes.

ex) `g.V().has("name", "marko")`. The current implementation does not have a
global index provider, so it will fetch all vertices and then check whether
each one has the property name equal to 'marko'.
Check out
http://tinkerpop.apache.org/docs/current/reference/#traversalstrategy.
Basically, we need a layer that takes a traversal and then rewrites it to
use the global index. How to build a global index is discussed in
http://markmail.org/message/2vn2bwrwh5zbeie4.

- GremlinPlugin (https://issues.apache.org/jira/browse/S2GRAPH-148)
For users to try out S2Graph through the TinkerPop APIs on Gremlin Console
and Gremlin Server, I believe we should provide `S2GraphGremlinPlugin`.

- OLAP (GraphComputer) support
I have not gone through the GraphComputer parts yet (
http://tinkerpop.apache.org/docs/current/reference/#graphcomputer), but I
think S2Graph can benefit from tp3's OLAP framework.
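
Going back to the `g.V(vid/v).outE` example under TraversalStrategy, the
sketch below illustrates the difference in I/O count between step-by-step
evaluation and the folded form. It is a toy model only: `CountingStorage`
and both functions are hypothetical names, and the in-memory dictionary
stands in for the real storage backend.

```python
class CountingStorage:
    """In-memory stand-in for a storage backend that counts I/O requests."""

    def __init__(self, edges_by_vertex):
        self.edges_by_vertex = edges_by_vertex
        self.requests = 0

    def fetch_vertex(self, vid):
        self.requests += 1
        return vid if vid in self.edges_by_vertex else None

    def fetch_out_edges(self, vid):
        self.requests += 1
        return list(self.edges_by_vertex.get(vid, []))


def out_edges_naive(storage, vid):
    # Step-by-step evaluation: V(vid) waits on one request, outE on another.
    vertex = storage.fetch_vertex(vid)
    if vertex is None:
        return []
    return storage.fetch_out_edges(vertex)


def out_edges_folded(storage, vid):
    # Folded evaluation: treat the id as an in-memory vertex reference and
    # issue a single request for its outgoing edges.
    return storage.fetch_out_edges(vid)
```

Both functions return the same edges for an existing vertex, but the naive
version touches the storage twice and the folded version once, which is the
saving a provider TraversalStrategy would aim for.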

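Similarly, for the `g.V().has("name", "marko")` example under Global index,
here is a small sketch contrasting a full scan with a lookup in a prebuilt
index. The data, function names, and the dictionary-based index are all
illustrative assumptions; a real global index would live in an external
index provider, not in process memory.

```python
from collections import defaultdict

# Toy vertex table: vertex id -> properties (illustrative data only).
vertices = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    3: {"name": "josh", "age": 32},
}


def has_by_scan(vertices, key, value):
    """Without a global index: visit every vertex and filter on the property."""
    return sorted(vid for vid, props in vertices.items()
                  if props.get(key) == value)


def build_index(vertices, key):
    """Build a value -> vertex ids mapping for one property key."""
    index = defaultdict(set)
    for vid, props in vertices.items():
        if key in props:
            index[props[key]].add(vid)
    return index


def has_by_index(index, value):
    """With a global index: look up matching vertex ids directly."""
    return sorted(index.get(value, ()))
```

The scan cost grows with the total number of vertices, while the indexed
lookup depends only on the number of matches, which is why the traversal
needs to be rewritten to hit the index first.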

All of the above needs help from the community, which is very limited
currently.
Please feel free to open issues/discussions on the above or any other things
we should think about.

Best Regards.
DOYUNG YOON
