Posted to dev@s2graph.apache.org by DO YUNG YOON <sh...@gmail.com> on 2017/07/03 13:11:38 UTC

[DISCUSSION] second release schedule and scope

Hi folks.

It has been a while since our first release.
It seems that demand for implementing the TinkerPop interface has been high,
but we have not finished it. I have been working on
https://issues.apache.org/jira/browse/S2GRAPH-136 since April, and
recently merged it into master.

I think gremlin-core is tested, but the following is what I think we have to
improve so that TinkerPop users can try out S2Graph easily.

1. provide provider optimizations; we have none currently.
2. the full-text search predicate is not currently supported (as @echarles
pointed out).
3. provide a Gremlin plugin.
4. make sure the TinkerPop stack works correctly.

Any help on the above issues would be highly appreciated (help on any other
issue would also be highly appreciated).

What I want to discuss here is the schedule and the scope of our second
release.

I suggest we focus on TinkerPop integration for our second release. It
would be best if we could address the above issues by the end of this month,
but I doubt that is possible.

I suggest we fix our release date for late this month, then work on the
above issues with high priority. If we can address them all, great; if
we can't, we release with as much as we can deliver in time and
move the rest to the next release, and so on.

I would like to hear what other folks think about the focus and schedule of
our second release, and I am happy to volunteer as release manager this time
if there are no other volunteers.

If there are other issues that anyone thinks should be included in the next
release, please list them on this thread.

Thanks

DO YUNG YOON

Re: [DISCUSSION] second release schedule and scope

Posted by DO YUNG YOON <sh...@gmail.com>.
It seems that everyone agrees on the second release, so I am going to work on
building release candidates this weekend.

Thanks for testing and giving feedback.

On Thu, Aug 3, 2017 at 12:34 PM Hwansung Yu <de...@gmail.com> wrote:

> I checked out PR 115 (including daewon's commit) and ran the tests.
> All tests passed.
>
> I then tried running the Gremlin example, and it worked well.
> As you thought, this is enough to release.
>
> You did a good job. This PR will be a great contribution to our community
> as well.
>
> Best Regards.
>
> On Wed, Aug 2, 2017 at 3:27 PM, daewon <da...@apache.org> wrote:
>
> > I agree with the above comments and I agree with the release.
> >
> > On Tue, Aug 1, 2017 at 7:45 PM DO YUNG YOON <sh...@gmail.com> wrote:
> >
> > > > Updates on our second release scope and schedule.
> > > >
> > > > Since Hwansung suggested resolving the TinkerPop-related issues before the
> > > > second release, I have been working on S2GRAPH-151 and S2GRAPH-148.
> > > >
> > > > Currently, S2GRAPH-151 is partially done (2 out of 4 subtasks are done)
> > > > and S2GRAPH-148 has a PR ready.
> > > >
> > > > Please review https://github.com/apache/incubator-s2graph/pull/115 .
> > > >
> > > > At this point, I think we are ready for the second release.
> > > >
> > > > The following are the issues I raised first.
> > > >
> > > > 1. provide provider optimization, we have none currently.
> > > > - S2GRAPH-153 has an S2GraphStep optimization that looks up EdgeId/VertexId
> > > > from an IndexProvider such as Lucene.
> > > > - Other optimizations can be added in subsequent releases.
> > > >
> > > > 2. full text search predicate is not currently supported (as @echarles
> > > > pointed out)
> > > > - S2GRAPH-153 resolves this by using Lucene as the IndexProvider.
> > > > - g.V().has("name", "*steamshon*") will first find EdgeId/VertexId from the
> > > > IndexProvider, then look up Storage for the actual Edge/Vertex.
> > > > - The IndexProvider interface is currently not optimized for queries that
> > > > hit a large number of documents, but this can be improved later.
> > > >
> > > > 3. provide gremlin plugin
> > > > - S2GRAPH-148 provides a subproject called s2graph-gremlin, which contains
> > > > S2GraphGremlinPlugin.
> > > > - After merging https://github.com/apache/incubator-s2graph/pull/115,
> > > > users can use gremlin-console to try out S2Graph.
> > > >
> > > > 4. make sure tinkerpop stack works correctly.
> > > > - S2GRAPH-148 makes sure gremlin-console works properly.
> > > > - However, I found it too tedious to use Scala code in
> > > > gremlin-console (Groovy), so I think creating a Java client would improve
> > > > usability, but this can also be done later.
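
The two-step lookup described under item 2 above (ask the index for ids, then fetch the elements from storage) can be sketched roughly as follows. This is only an illustration under stated assumptions: the in-memory maps, `fetchIds` and `fetchVertices` are hypothetical stand-ins and are not the S2Graph IndexProvider or Storage APIs, and the wildcard pattern is reduced to a plain substring match rather than Lucene query semantics.

```scala
// Sketch only: the maps and method names below are hypothetical stand-ins,
// not the S2Graph IndexProvider or Storage APIs.
object IndexThenStorageSketch {
  final case class Vertex(id: String, props: Map[String, String])

  // Phase 1: the index maps a property value to the ids of matching vertices.
  // A pattern such as "*steamshon*" is reduced to a substring match here.
  def fetchIds(index: Map[String, Set[String]], pattern: String): Set[String] = {
    val needle = pattern.stripPrefix("*").stripSuffix("*")
    index.collect { case (value, ids) if value.contains(needle) => ids }.flatten.toSet
  }

  // Phase 2: resolve the ids against storage; ids missing from storage are dropped.
  def fetchVertices(storage: Map[String, Vertex], ids: Set[String]): Seq[Vertex] =
    ids.toSeq.sorted.flatMap(id => storage.get(id).toList)

  def main(args: Array[String]): Unit = {
    val storage = Map(
      "v1" -> Vertex("v1", Map("name" -> "steamshon")),
      "v2" -> Vertex("v2", Map("name" -> "hercules"))
    )
    val nameIndex = Map("steamshon" -> Set("v1"), "hercules" -> Set("v2"))
    val hits = fetchVertices(storage, fetchIds(nameIndex, "*steamshon*"))
    hits.foreach(v => println(v.id))
  }
}
```

Because phase 2 issues one storage read per id, a query that hits many documents becomes expensive, which is the limitation noted in the last bullet of item 2.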
> > >
> > > > In summary, I have resolved the TinkerPop-related issues, not completely,
> > > > but enough for others to try it out.
> > > >
> > > > I suggest we build our second release candidates at this point if there
> > > > are no objections.
> > > > I want to hear what others think.
> > >
> > >
> > >
> > > On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <sh...@gmail.com> wrote:
> > >
> > > > > Thanks for your feedback. Here are my questions.
> > > > >
> > > > > 1. Release schedule:
> > > > > - Do you think we should wait until all issues with TinkerPop support
> > > > > are resolved?
> > > > >
> > > > > What do others think about the release schedule?
> > > > >
> > > > > Should we wait until all TinkerPop-related issues are resolved?
> > > > > Can you list the "must resolve" issues for our second release?
> > > > > The reason I mentioned the index is that I think it is the only blocker
> > > > > issue on the list for the next release.
> > > > >
> > > > > 2. Full-Text search:
> > > > > - There would be 2 types of index support, with variations
> > > > > (mixed/composite):
> > > > >   - Graph-Index: S2Graph does not have this type of index.
> > > > >     - Composite-Index
> > > > >     - Mixed-Index
> > > > >   - Vertex-Centric-Index: S2Graph does have this type of index.
> > > > >
> > > > >
> > > > > Since they are two different types of index, it is inevitable that we
> > > > > provide them as separate options.
> > > > >
> > > > > I doubt there will be confusion between graph-index and
> > > > > vertex-centric-index, and we can always clarify it in the documentation.
> > > > >
> > > > > If we agree that a graph index layer is necessary, then let's develop the
> > > > > feature first, then see whether there is confusion and decide what to do
> > > > > to clarify it. I think you agree that graph-index is a necessary addition
> > > > > to the project (tell me if you don't).
> > > > >
> > > > > Continuing with more details on the index topic.
> > > > >
> > > > > The following is what Titan provides, and I think it would be nice if we
> > > > > could provide this in S2Graph, so let me briefly explain. (I suggest
> > > > > reading through
> > > > > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are not
> > > > > familiar with the notation.)
> > > >
> > > > 1. composite
> > > >
> > > > Composite indexes retrieve vertices or edges by one or a (fixed)
> > > > composition of multiple keys.
> > > >
> > > > > This example shows how a user can create a composite index in Titan.
> > > >
> > > > ```
> > > > mgmt.buildIndex('byNameAndAgeComposite',
> > > > Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
> > > > mgmt.commit()
> > > > ```
> > > >
> > > > > The following traversal then benefits from the `byNameAndAgeComposite` index.
> > > >
> > > > ```
> > > > g.V().has('age', 30).has('name', 'hercules')
> > > > ```
> > > >
> > > > > We can use HBase to store this index by creating the row key as
> > > > > ("age", 30, "name", "hercules").
> > > > >
> > > > > ```
> > > > > g.V().has('name', 'hercules').has('age', 30)
> > > > > ```
> > > > >
> > > > > To answer the above traversal with the same index, it seems we need to
> > > > > sort the property keys (with their values) in the composite index.
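
A minimal sketch of that sorting idea, assuming a plain string row key. The `rowKey` helper and the `key=value|...` layout are hypothetical; a real HBase row key would be a byte array with a delimiter-safe encoding.

```scala
// Sketch only: rowKey and its "key=value|key=value" layout are hypothetical.
object CompositeRowKeySketch {
  // Sort the equality conditions by property key so that
  // has('age', 30).has('name', 'hercules') and the reversed order
  // both produce the same row key.
  def rowKey(conditions: Seq[(String, Any)]): String =
    conditions.sortBy(_._1).map { case (k, v) => s"$k=$v" }.mkString("|")

  def main(args: Array[String]): Unit = {
    val a = rowKey(Seq("age" -> 30, "name" -> "hercules"))
    val b = rowKey(Seq("name" -> "hercules", "age" -> 30))
    println(a)      // age=30|name=hercules
    println(a == b) // true
  }
}
```

Sorting at both write time and query time means the order of the `has()` steps in a traversal no longer matters for the lookup.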
> > > >
> > > > > We can also make partial composite indexes such as below.
> > > >
> > > > ```
> > > > ("age", 30)
> > > > ("name", "hercules")
> > > > ```
> > > >
> > > > > I am not sure if this is necessary. Users can explicitly create the above
> > > > > as separate indexes such as 'byName' and 'byAge'.
> > > > >
> > > > > One more suggestion is to provide an option to partition an index, since
> > > > > there could be lots of vertices/edges that have a specific value. For
> > > > > example, a 'byCountryGender' index can contain lots of vertices/edges, and
> > > > > it is problematic to store them all in the same HBase region. We need to
> > > > > auto-partition these into a user-specified number of partitions using a
> > > > > prefix salt. This is an optimization step, so it can be revisited once we
> > > > > have the functionality working.
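
The prefix-salt idea could look roughly like this. All names here (`saltOf`, `writeKey`, `scanPrefixes`) and the two-digit salt layout are assumptions made for illustration, not an existing S2Graph or HBase API; the two-digit format also assumes at most 100 partitions.

```scala
// Sketch only: hypothetical helpers illustrating salted index keys.
object SaltedIndexKeySketch {
  // Derive a salt bucket from the element id, so entries that share an index
  // value (for example country=KR, gender=F) spread over several key ranges.
  def saltOf(elementId: String, numPartitions: Int): Int =
    java.lang.Math.floorMod(elementId.hashCode, numPartitions)

  // Row key layout: <salt>|<indexValue>|<elementId>
  def writeKey(indexValue: String, elementId: String, numPartitions: Int): String =
    f"${saltOf(elementId, numPartitions)}%02d|$indexValue|$elementId"

  // Reads have to fan out: one prefix scan per salt bucket.
  def scanPrefixes(indexValue: String, numPartitions: Int): Seq[String] =
    (0 until numPartitions).map(salt => f"$salt%02d|$indexValue|")

  def main(args: Array[String]): Unit = {
    println(writeKey("KR|F", "vertex-42", 4))
    scanPrefixes("KR|F", 4).foreach(println)
  }
}
```

The trade-off is that writes are spread across regions, while every read of one index value turns into as many scans as there are partitions, so the partition count should stay small.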
> > > >
> > > > > Note that a composite index only supports equality comparison, so the
> > > > > following traversal can't take advantage of the index.
> > > >
> > > > ```
> > > > g.V().has('name', 'hercules').has('age', inside(20, 50))
> > > > ```
> > > >
> > > > 2. mixed
> > > >
> > > > > Mixed indexes retrieve vertices or edges by any combination of previously
> > > > > added property keys. Full-text search can be powered by a mixed index, but
> > > > > it may be slower than a composite index since it involves an external
> > > > > index backend search (Lucene, Solr, Elasticsearch, ...).
> > > > >
> > > > > This example shows how a user can create a mixed index in Titan.
> > > > >
> > > > > ```
> > > > > mgmt.buildIndex('nameAndAge', Vertex.class).addKey(name,
> > > > > Mapping.TEXT.getParameter()).addKey(age,
> > > > > Mapping.TEXT.getParameter()).buildMixedIndex("search")
> > > > > ```
> > > > > Users can decide whether the search engine index (named "search") uses a
> > > > > tokenizer by specifying the Mapping (STRING or TEXT; the default, TEXT,
> > > > > provides full-text search).
> > > > >
> > > > > The following traversals then benefit from the `nameAndAge` index.
> > > > >
> > > > > ```
> > > > > g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
> > > > > g.V().has('name', textContains('hercules'))
> > > > > g.V().has('age', lt(50))
> > > > > ```
> > > >
> > > > > We can use Elasticsearch/Lucene/Solr as the index backend for this type
> > > > > of index, and the actual work can be split as follows.
> > > > >
> > > > > If there is no objection, I will create an index task and list the
> > > > > subtasks below under it.
> > > > >
> > > > > One possible task list is as follows.
> > > > >
> > > > > 1. Management Client:
> > > > > - add an option to specify the index type when creating a
> > > > > ServiceColumn/Label.
> > > > > 2. Storage:
> > > > > - add a method to build mutations for the storage backend when a set of
> > > > > vertices/edges is given.
> > > > > - add a method to call the index backend with the built mutations.
> > > > > 3. Serializer/Deserializer:
> > > > > - serializer: when an edge/vertex is given, build an SKeyValue that can be
> > > > > used by storage methods.
> > > > > - deserializer: when a byte array is given, build a Vertex/Edge that can
> > > > > be used by storage methods.
> > > > > 4. ProviderOptimization
> > > > > - TinkerPop asks the provider to translate a given traversal into
> > > > > implementation-specific functions.
> > > > > - with my limited knowledge so far I am not sure if this is necessary, but
> > > > > we need to check once S2Graph internally provides composite/mixed indexes.
> > > > >
> > > > > Any feedback would be appreciated.
> > > >
> > > >
> > > > > On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <de...@gmail.com> wrote:
> > > > >
> > > > >> Sorry for the late reply.
> > > > >>
> > > > >> I think implementing TinkerPop is important both for the functionality
> > > > >> of S2Graph and for growing the community.
> > > > >> I agree with your suggestion to concentrate on TinkerPop implementation
> > > > >> issues in the second release.
> > > > >> In my opinion, the time to release is when the TinkerPop implementation
> > > > >> issues are cleaned up.
> > > > >>
> > > > >> And with regard to full-text search...
> > > > >> If full-text search is supported, we expect the constraint that a
> > > > >> traversal is possible only if the starting vertex is known to disappear.
> > > > >> If supported, it would be better to leave it as a separate option to
> > > > >> avoid confusion with existing indexes.
> > > >>
> > > > >> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sh...@gmail.com> wrote:
> > > > >>
> > > > >> > I guess there is no objection to my suggestion, so I am going to try to
> > > > >> > list the issues in more detail while preparing the 0.2.0 release for
> > > > >> > late this month.
> > > > >> >
> > > > >> > Before listing the above issues as tasks on JIRA, I want to discuss
> > > > >> > indexing in more detail.
> > > > >> >
> > > > >> > The following is my understanding of the indexes needed to support
> > > > >> > TinkerPop fully and efficiently.
> > > > >> > - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
> > > > >> >
> > > > >> > 1. graph index: traversal starting from a list of vertices or edges that
> > > > >> > are identified by their properties.
> > > > >> >
> > > > >> > 2. vertex-centric index: traversal through vertices with many incident
> > > > >> > edges.
> > > > >> >
> > > > >> > I believe S2Graph already has a vertex-centric index, but it does not
> > > > >> > have a graph index layer, so the full-text predicate and range search
> > > > >> > features in TinkerPop run very inefficiently.
> > > > >> >
> > > > >> > For example, the following traversals run a full scan.
> > > > >> >
> > > > >> > - g.V().has('name', 'hercules')
> > > > >> > - g.E().has('reason', textContains('loves'))
> > > > >> >
> > > > >> > To support the full TinkerPop feature set efficiently, we need to add a
> > > > >> > graph index layer, and I want to discuss how we are going to achieve
> > > > >> > this. Using an external search engine, as suggested here
> > > > >> > (http://markmail.org/message/2vn2bwrwh5zbeie4), totally makes sense to
> > > > >> > me.
> > > > >> >
> > > > >> > I suggest we design the index management interface first, since a graph
> > > > >> > index has never existed in S2Graph before. The decision about the index
> > > > >> > storage backend and implementation can then be discussed in more detail
> > > > >> > (the other way around is also possible).
> > > >> >
> > > > >> > The following is how a user currently creates an index in S2Graph.
> > > > >> >
> > > > >> > Management.createServiceColumn(
> > > > >> >     serviceName = serviceName, columnName = "person", columnType = "integer",
> > > > >> >     props = Seq(
> > > > >> >         Prop("name", "-", "string"),
> > > > >> >         Prop("age", "0", "integer"),
> > > > >> >         Prop("location", "-", "string")
> > > > >> >     )
> > > > >> > )
> > > > >> >
> > > > >> > management.createLabel(
> > > > >> >     label = "bought",
> > > > >> >     srcServiceName = serviceName, srcColumnName = "person", srcColumnType = "integer",
> > > > >> >     tgtServiceName = serviceName, tgtColumnName = "product", tgtColumnType = "integer",
> > > > >> >     isDirected = true,
> > > > >> >     serviceName = serviceName,
> > > > >> >     indices = Seq(
> > > > >> >         Index("PK", Seq("amount", "created_at"))
> > > > >> >     ),
> > > > >> >     props = Seq(
> > > > >> >         Prop("amount", "0.0", "double"),
> > > > >> >         Prop("created_at", "2000-01-01", "string")
> > > > >> >     ),
> > > > >> >     consistencyLevel = "strong"
> > > > >> > )
> > > > >> >
> > > > >> > How are we going to let users create a graph index? Should we add extra
> > > > >> > parameters to existing methods, or provide separate methods?
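
One way to picture the "separate methods" option is sketched below. Everything in it (`IndexRegistry`, `buildGraphIndex`, the `Composite`/`Mixed` types) is hypothetical and only illustrates the shape such an interface might take; it is not part of the S2Graph Management API.

```scala
// Sketch only: a hypothetical, in-memory graph-index registry.
object GraphIndexManagementSketch {
  sealed trait IndexType
  case object Composite extends IndexType
  case object Mixed extends IndexType

  final case class GraphIndex(name: String, keys: Seq[String], indexType: IndexType)

  // A separate entry point leaves createServiceColumn/createLabel untouched.
  final class IndexRegistry {
    private var indices = Map.empty[String, GraphIndex]

    def buildGraphIndex(name: String, keys: Seq[String],
                        indexType: IndexType): Either[String, GraphIndex] =
      if (keys.isEmpty) Left(s"index '$name' needs at least one property key")
      else if (indices.contains(name)) Left(s"index '$name' already exists")
      else {
        val idx = GraphIndex(name, keys, indexType)
        indices += (name -> idx)
        Right(idx)
      }

    def find(name: String): Option[GraphIndex] = indices.get(name)
  }

  def main(args: Array[String]): Unit = {
    val registry = new IndexRegistry
    println(registry.buildGraphIndex("byNameAndAge", Seq("name", "age"), Composite))
    println(registry.buildGraphIndex("byNameAndAge", Seq("name"), Mixed)) // Left(...)
  }
}
```

Keeping graph indexes behind their own method also keeps them visibly distinct from the vertex-centric `indices` parameter of `createLabel`.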
> > > >> >
> > > >> >

Re: [DISCUSSION] second release schedule and scope

Posted by Hwansung Yu <de...@gmail.com>.
I checked out PR 115 (including daewon's commit) and ran the tests.
All tests passed.

I then tried running the Gremlin example, and it worked well.
As you thought, this is enough to release.

You did a good job. This PR will be a great contribution to our community
as well.

Best Regards.


Re: [DISCUSSION] second release schedule and scope

Posted by daewon <da...@apache.org>.
I agree with the above comments and I agree with the release.

On Tue, Aug 1, 2017 at 7:45 PM DO YUNG YOON <sh...@gmail.com> wrote:

> Updates on our second release scope and schedule.
>
> Since Hwansung suggest to resolve tinkerpop related issue before second
> release, I was working on the S2GRAPH-151, S2GRAPH-148.
>
> Currently, S2GRAPH-151 is partially done(2 out of 4 subtasks are done) and
> S2GRAPH-148 has PR ready.
>
> Please review https://github.com/apache/incubator-s2graph/pull/115 .
>
> As this point, I think we are ready for second release.
>
> Followings are issues I raised first.
>
> 1. provide provider optimization, we have none currently.
> - S2GRAPH-153 has S2GraphStep optimization that lookup EdgeId/VertexId from
> IndexProvider such as Lucene.
> - Other optimization can be added on consecutive releases.
>
> 2. full text search predicate is not currently supported(as @echarles
> pointed out)
> - S2GRAPH-153 resolve this by using lucene as IndexProvider.
> - g.V().has("name", "*steamshon*") will try to find EdgeId/VertexId from
> IndexProvider then actually lookup Storage for Edge/Vertex.
> - IndexProvider interface currently not optimized for large amount of
> documents hit, but this can be improved later.
>
> 3. provide gremlin plugin
> - S2GRAPH-148 provide subproject call s2graph-gremlin which contains
> S2GraphGremlinPlugin.
> - After merging https://github.com/apache/incubator-s2graph/pull/115,
> users
> can use gremlin-console to try out S2Graph.
>
> 4. make sure tinkerpop stack works correctly.
> - S2GRAPH-148 make sure gremlin-conole is working properly.
> - However, I found out it is too tedius to use scala code in
> gremlin-console(groovy), so I think creating java client can improve
> usability, but this also can be done later.
>
> In summary, I have resolved tinkerpop related issues, not totally, but just
> enough for others to try out.
>
> I suggest to build our second release candidates at this point if there is
> no objection.
> I want to hear what others think.
>
>
>
> On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <sh...@gmail.com> wrote:
>
> > Thanks for your feedback. Here is my questions.
> >
> > 1. Release schedule:
> > - Do you think we should wait until all issues with tinkerpop support
> > resolved after?
> >
> > What others think about the release schedule?
> >
> > Should we wait until all of tinkerpop related issues resolving?
> > Can you guys list up "must resolve" issues on our second release?
> > The reason I mentioned index is I think it is the only one blocker issue
> > from list for next release.
> >
> > 2. Full-Text search:
> > - There would be 2 types of index support with variation(mixed/composite)
> > - Graph-Index: s2graph do not have this type of index.
> > - Composite-Index
> > - Mixed-Index
> > - Vertex-Centric-Index: s2graph do have this type of index.
> >
> >
> > Since they are two different type of index, it is inevitable to provide
> > them as separate option.
> >
> > I doubt there could be confusion between graph-index and
> > vertex-centric-index and always clarify it on documentation.
> >
> > If we agree that graph index layer is necessary, then develop the
> features
> > first, then see if there could be confusion and decide what to do to
> > clarify it. I think you agree that graph-index is necessary addition on
> > project(tell me if you don't).
> >
> > Continue on more details on index topic.
> >
> > Following is what titan provide and I think it would be nice if we can
> > provide this in S2Graph so let me briefly explain. (I suggest read
> through
> > http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are not
> > familiar with notations)
> >
> > 1. composite
> >
> > Composite indexes retrieve vertices or edges by one or a (fixed)
> > composition of multiple keys.
> >
> > this example is how user can create composite index on titan.
> >
> > ```
> > mgmt.buildIndex('byNameAndAgeComposite',
> > Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
> > mgmt.commit()
> > ```
> >
> > then following traversal take benefit from `byNameComposite` index.
> >
> > ```
> > g.V().has('age', 30).has('name', 'hercules')
> > ```
> >
> > We can use HBase to store this index by creating row key as ("age", 30,
> > "name", "hercules").
> >
> > ```
> > g.V().has('name', 'hercules').has('age', 30)
> > ```
> >
> > To answer above traveral, it seems to sort property key and value in
> > composite index.
> >
> > we can also make partial composite index such as below.
> >
> > ```
> > ("age", 30)
> > ("name", "hercules")
> > ```
> >
> > I am not sure if this is necessary. user can explicitly create above as
> > seperate index such as 'byName', 'byAge'.
> >
> > One more suggestion is provide option to partition index, since there
> > could be lots of vertices/edges that has specific value. for example,
> > 'byCountryGender' index can contains lots of vertices/edges and it is
> > problematic to store vertices/edges on same HBase region. we need to
> > auto-partition theses into user specified number of partition by prefix
> > salt. This is optimization step so can be revisited once we have
> > functionality working.
> >
> > Note that composite index is only for comparing equality so following
> > traversal can't take advantage of index.
> >
> > ```
> > g.V().has('name', 'hercules').has('age', inside(20, 50))
> > ```
> >
> > 2. mixed
> >
> > Mixed indexes retrieve vertices or edges by any combination of previously
> > added property keys. full text search can be powered by mixed index, but
> it
> > may slower than composite index since it include external index backend
> > search(lucene, solr, elasticsearch, ...).
> >
> > this example is how user can create mixed index on titan.
> >
> > ```
> >
> > mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name,Mapping.TEXT.getParameter()).addKey(age,Mapping.TEXT.getParameter()).buildMixedIndex("search")
> > ```
> > user can decide use tokenizer when search engine index(named search) by
> > specifing Mapping(String or TEXT, default TEXT provide full text search).
> >
> > then following traversal take benefit from `nameAndAge` index.
> >
> > ```
> > g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
> > g.V().has('name', textContains('hercules'))
> > g.V().has('age', lt(50))
> > ```
> >
> > we can use elasticsearch/lucene/solr as index backend for this type of
> > index and actual tasks can be splitted by as following.
> >
> > If there is no objection, then I will create index task and list above
> > subtasks under it.
> >
> > One possible tasks list can be described as following.
> >
> > 1. Management Client:
> > - add option to speficy index type on creating ServiceColumn/Label.
> > 2. Storage:
> > - add method to build mutation for storage backend when set of
> > vertexs/edges are given.
> > - add method to call index backend with built mutation.
> > 3. Serializer/Deserializer:
> > - serializer: when a edge/vertex is given, build SKeyValue which can be
> > used by storage methods.
> > - deserializer: when byte array is given, build a Vertex/Edge that can be
> > used by storage methods.
> > 4. ProviderOptimization
> > - tinkerpop ask provider to translate given traversal into implementation
> > specific functions.
> > - not sure if this is necessary with my limited knowledge so far, but
> need
> > to check once S2Graph internal provide composite/mixed index.
> >
> > Any feedback would be appreciated.
> >
> >
> > On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <de...@gmail.com>
> wrote:
> >
> >> Sorry for late reply.
> >>
> >> I think it is important to implement Tinkerpop in terms of functionality
> >> of
> >> S2Graph and for the activation of the community.
> >> I agree with your suggestion to concentrate on tinkerpop implementation
> >> issues in the second release.
> >> In my opinion, the time of release is when the tinkerpop implementation
> >> issue is cleaned up.
> >>
> >> And with regard to full text search...
> >> If full-text search is supported, we expect that constraints that were
> >> able
> >> to traversal will disappear only if the vertex is known.
> >> If supported, it would be better to leave it as a separate option to
> avoid
> >> confusion with existing indexes.
> >>
> >> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sh...@gmail.com> wrote:
> >>
> >> > I guess there is no objection on my suggestion, so I am going to try
> >> list
> >> > up issues in more detail while preparing 0.2.0 release on late this
> >> month.
> >> >
> >> > Before list up above issues as task on jira, I want to discuss index
> in
> >> > more details.
> >> >
> >> > Following is my understanding on index to support tinkerpop fully and
> >> > efficiently
> >> > - reference:
> http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
> >> >
> >> > 1. graph index: traversal from a list of vertices or edges that are
> >> > identified by their properties
> >> >
> >> > 2. vertex-centric index: traversal through vertices with many incident
> >> > edges.
> >> >
> >> > I believe s2graph has vertex-centric index already, but it does not
> have
> >> > graph index layer so full text predicate, and range search features in
> >> > tinkerpop runs very inefficiently.
> >> >
> >> > For example, following traversal run full scan.
> >> >
> >> > - g.V().has('name', 'hercules')
> >> > - g.E().has('reason', textContains('loves'))
> >> >
> >> > To support full tinkerpop features efficiently, we need to add graph
> >> index
> >> > layer and I want to discuss how we are going to achieve this. like
> >> > suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) using
> >> > external
> >> > search engine totally make sense to me.
> >> >
> >> > I suggest to design index management interface first, since graph
> index
> >> has
> >> > never exist in S2Graph previously. then decision about index storage
> >> > backend, implementation can be discussed in more detail(the other way
> >> > around could also possible).
> >> >
> >> > Following is how user create index in s2graph currently.
> >> >
> >> > Management.createServiceColumn(
> >> > serviceName = serviceName, columnName = "person", columnType =
> >> "integer",
> >> >     props = Seq(
> >> >     Prop("name", "-", "string"),
> >> >     Prop("age", "0", "integer"),
> >> >     Prop("location", "-", "string")
> >> >     )
> >> > )
> >> >
> >> > management.createLabel(
> >> > label = "bought",
> >> >     srcServiceName = serviceName, srcColumnName = "person",
> >> srcColumnType =
> >> > "integer",
> >> >     tgtServiceName = serviceName, tgtColumnName = "product",
> >> tgtColumnType
> >> > = "integer", idDirected = true,
> >> >     serviceName = serviceName,
> >> >     indices = Seq(
> >> >     Index("PK", Seq("amount", "created_at")
> >> >     ),
> >> >     props = Seq(
> >> >     Prop("amount", "0.0", "double"),
> >> >     Prop("created_at", "2000-01-01", "string")
> >> >     ),
> >> >     consistencyLevel = "strong"
> >> > )
> >> >
> >> > How we going to let user to create graph-index? Should we add extra
> >> > parameters on existing methods, or provide separate methods?
> >> >
> >> >
> >> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <sh...@gmail.com>
> wrote:
> >> >
> >> > > Hi folks.
> >> > >
> >> > > It's been for a while we released our first release.
> >> > > It seems that needs for implementing tinkerpop interface has been
> >> high,
> >> > > but we have not finished it. I have been working on
> >> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April, then
> >> > > recently merged it into master.
> >> > >
> >> > > I think Gremlin-core is tested, but following is what I think we
> have
> >> to
> >> > > improve for tinkerpop users to try out s2graph easily.
> >> > >
> >> > > 1. provide provider optimization, we have none currently.
> >> > > 2. full text search predicate is not currently supported(as
> @echarles
> >> > > pointed out)
> >> > > 3. provide gremlin plugin
> >> > > 4. make sure tinkerpop stack works correctly.
> >> > >
> >> > > Any help on above issues would be highly appreciated(help on any
> other
> >> > > issue would be also highly appreciated).
> >> > >
> >> > > By the way, What I want to discuss is the schedule and what will be
> >> > > included on our second release.
> >> > >
> >> > > I suggest to focus on integrate with tinkerpop on our second
> release.
> >> It
> >> > > would be best if we can address above issues by this month, but I
> >> doubt
> >> > if
> >> > > it is possible.
> >> > >
> >> > > I am suggesting fix our release date on late this month, then focus
> on
> >> > > above issues with high priority. if we can address them all, great,
> >> but
> >> > if
> >> > > we can't, then release with version as much as we can deliver in
> time,
> >> > then
> >> > > move them on next next release so on.
> >> > >
> >> > > Want to hear what other folks think about focus and schedule on our
> >> > second
> >> > > release, and happy to volunteer as release manager for this time if
> >> there
> >> > > are no other volunteer.
> >> > >
> >> > > If there are other issues which anyone think to be included on next
> >> > > release, please list them on this thread.
> >> > >
> >> > > Thanks
> >> > >
> >> > > DO YUNG YOON
> >> > >
> >> > >
> >> >
> >>
> >
>

Re: [DISCUSSION] second release schedule and scope

Posted by DO YUNG YOON <sh...@gmail.com>.
Updates on our second release scope and schedule.

Since Hwansung suggested resolving the tinkerpop related issues before the
second release, I have been working on S2GRAPH-151 and S2GRAPH-148.

Currently, S2GRAPH-151 is partially done (2 out of 4 subtasks are done) and
S2GRAPH-148 has a PR ready.

Please review https://github.com/apache/incubator-s2graph/pull/115 .

At this point, I think we are ready for the second release.

The following are the issues I raised first.

1. provide provider optimization, we have none currently.
- S2GRAPH-153 has an S2GraphStep optimization that looks up EdgeId/VertexId
from an IndexProvider such as Lucene.
- Other optimizations can be added in subsequent releases.

2. full text search predicate is not currently supported (as @echarles
pointed out)
- S2GRAPH-153 resolves this by using Lucene as the IndexProvider.
- g.V().has("name", "*steamshon*") will first find the EdgeId/VertexId from
the IndexProvider, then look up the actual Edge/Vertex in Storage.
- The IndexProvider interface is currently not optimized for a large number
of document hits, but this can be improved later.

3. provide gremlin plugin
- S2GRAPH-148 provides a subproject called s2graph-gremlin, which contains
S2GraphGremlinPlugin.
- After merging https://github.com/apache/incubator-s2graph/pull/115, users
can use gremlin-console to try out S2Graph.

4. make sure tinkerpop stack works correctly.
- S2GRAPH-148 makes sure gremlin-console is working properly.
- However, I found that it is too tedious to use Scala code in
gremlin-console (Groovy), so I think creating a Java client would improve
usability, but this can also be done later.
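
To make the two-step lookup in item 2 concrete, here is a rough sketch in
Python. It is not S2Graph code: INDEX, STORAGE, lookup_ids and
fetch_vertices are hypothetical stand-ins for the IndexProvider (Lucene) and
the Storage backend, and the wildcard matching is only an assumption about
how "*steamshon*" would be interpreted.

```python
import fnmatch

# Hypothetical in-memory stand-ins for the index backend and the storage.
INDEX = {  # property name -> indexed value -> ids of matching vertices
    "name": {"steamshon": {1}, "hercules": {2}, "team steamshon kr": {3}},
}
STORAGE = {  # vertex id -> stored properties
    1: {"name": "steamshon"},
    2: {"name": "hercules"},
    3: {"name": "team steamshon kr"},
}

def lookup_ids(prop, pattern):
    """Step 1: ask the index for the ids whose value matches the pattern."""
    ids = set()
    for value, vertex_ids in INDEX.get(prop, {}).items():
        if fnmatch.fnmatchcase(value, pattern):
            ids |= vertex_ids
    return ids

def fetch_vertices(ids):
    """Step 2: fetch the actual vertices from storage by id."""
    return [dict(id=i, **STORAGE[i]) for i in sorted(ids) if i in STORAGE]

def has(prop, pattern):
    """Index lookup first, storage lookup second; no full scan of STORAGE."""
    return fetch_vertices(lookup_ids(prop, pattern))
```

The point is only that the storage is touched for the ids returned by the
index, which is also why a very large number of index hits is the expensive
case mentioned above.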

In summary, I have resolved the tinkerpop related issues, not completely,
but just enough for others to try out.

I suggest building our second release candidates at this point if there is
no objection.
I want to hear what others think.



On Sun, Jul 9, 2017 at 10:26 AM DO YUNG YOON <sh...@gmail.com> wrote:

> Thanks for your feedback. Here is my questions.
>
> 1. Release schedule:
> - Do you think we should wait until all issues with tinkerpop support
> resolved after?
>
> What others think about the release schedule?
>
> Should we wait until all of tinkerpop related issues resolving?
> Can you guys list up "must resolve" issues on our second release?
> The reason I mentioned index is I think it is the only one blocker issue
> from list for next release.
>
> 2. Full-Text search:
> - There would be 2 types of index support with variation(mixed/composite)
> - Graph-Index: s2graph do not have this type of index.
> - Composite-Index
> - Mixed-Index
> - Vertex-Centric-Index: s2graph do have this type of index.
>
>
> Since they are two different type of index, it is inevitable to provide
> them as separate option.
>
> I doubt there could be confusion between graph-index and
> vertex-centric-index and always clarify it on documentation.
>
> If we agree that graph index layer is necessary, then develop the features
> first, then see if there could be confusion and decide what to do to
> clarify it. I think you agree that graph-index is necessary addition on
> project(tell me if you don't).
>
> Continue on more details on index topic.
>
> Following is what titan provide and I think it would be nice if we can
> provide this in S2Graph so let me briefly explain. (I suggest read through
> http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are not
> familiar with notations)
>
> 1. composite
>
> Composite indexes retrieve vertices or edges by one or a (fixed)
> composition of multiple keys.
>
> this example is how user can create composite index on titan.
>
> ```
> mgmt.buildIndex('byNameAndAgeComposite',
> Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
> mgmt.commit()
> ```
>
> then following traversal take benefit from `byNameComposite` index.
>
> ```
> g.V().has('age', 30).has('name', 'hercules')
> ```
>
> We can use HBase to store this index by creating row key as ("age", 30,
> "name", "hercules").
>
> ```
> g.V().has('name', 'hercules').has('age', 30)
> ```
>
> To answer above traveral, it seems to sort property key and value in
> composite index.
>
> we can also make partial composite index such as below.
>
> ```
> ("age", 30)
> ("name", "hercules")
> ```
>
> I am not sure if this is necessary. user can explicitly create above as
> seperate index such as 'byName', 'byAge'.
>
> One more suggestion is provide option to partition index, since there
> could be lots of vertices/edges that has specific value. for example,
> 'byCountryGender' index can contains lots of vertices/edges and it is
> problematic to store vertices/edges on same HBase region. we need to
> auto-partition theses into user specified number of partition by prefix
> salt. This is optimization step so can be revisited once we have
> functionality working.
>
> Note that composite index is only for comparing equality so following
> traversal can't take advantage of index.
>
> ```
> g.V().has('name', 'hercules').has('age', inside(20, 50))
> ```
>
> 2. mixed
>
> Mixed indexes retrieve vertices or edges by any combination of previously
> added property keys. full text search can be powered by mixed index, but it
> may slower than composite index since it include external index backend
> search(lucene, solr, elasticsearch, ...).
>
> this example is how user can create mixed index on titan.
>
> ```
>
> mgmt.buildIndex('nameAndAge',Vertex.class).addKey(name,Mapping.TEXT.getParameter()).addKey(age,Mapping.TEXT.getParameter()).buildMixedIndex("search")
> ```
> user can decide use tokenizer when search engine index(named search) by
> specifing Mapping(String or TEXT, default TEXT provide full text search).
>
> then following traversal take benefit from `nameAndAge` index.
>
> ```
> g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
> g.V().has('name', textContains('hercules'))
> g.V().has('age', lt(50))
> ```
>
> we can use elasticsearch/lucene/solr as index backend for this type of
> index and actual tasks can be splitted by as following.
>
> If there is no objection, then I will create index task and list above
> subtasks under it.
>
> One possible tasks list can be described as following.
>
> 1. Management Client:
> - add option to speficy index type on creating ServiceColumn/Label.
> 2. Storage:
> - add method to build mutation for storage backend when set of
> vertexs/edges are given.
> - add method to call index backend with built mutation.
> 3. Serializer/Deserializer:
> - serializer: when a edge/vertex is given, build SKeyValue which can be
> used by storage methods.
> - deserializer: when byte array is given, build a Vertex/Edge that can be
> used by storage methods.
> 4. ProviderOptimization
> - tinkerpop ask provider to translate given traversal into implementation
> specific functions.
> - not sure if this is necessary with my limited knowledge so far, but need
> to check once S2Graph internal provide composite/mixed index.
>
> Any feedback would be appreciated.
>
>
> On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <de...@gmail.com> wrote:
>
>> Sorry for late reply.
>>
>> I think it is important to implement Tinkerpop in terms of functionality
>> of
>> S2Graph and for the activation of the community.
>> I agree with your suggestion to concentrate on tinkerpop implementation
>> issues in the second release.
>> In my opinion, the time of release is when the tinkerpop implementation
>> issue is cleaned up.
>>
>> And with regard to full text search...
>> If full-text search is supported, we expect that constraints that were
>> able
>> to traversal will disappear only if the vertex is known.
>> If supported, it would be better to leave it as a separate option to avoid
>> confusion with existing indexes.
>>
>> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sh...@gmail.com> wrote:
>>
>> > I guess there is no objection on my suggestion, so I am going to try
>> list
>> > up issues in more detail while preparing 0.2.0 release on late this
>> month.
>> >
>> > Before list up above issues as task on jira, I want to discuss index in
>> > more details.
>> >
>> > Following is my understanding on index to support tinkerpop fully and
>> > efficiently
>> > - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
>> >
>> > 1. graph index: traversal from a list of vertices or edges that are
>> > identified by their properties
>> >
>> > 2. vertex-centric index: traversal through vertices with many incident
>> > edges.
>> >
>> > I believe s2graph has vertex-centric index already, but it does not have
>> > graph index layer so full text predicate, and range search features in
>> > tinkerpop runs very inefficiently.
>> >
>> > For example, following traversal run full scan.
>> >
>> > - g.V().has('name', 'hercules')
>> > - g.E().has('reason', textContains('loves'))
>> >
>> > To support full tinkerpop features efficiently, we need to add graph
>> index
>> > layer and I want to discuss how we are going to achieve this. like
>> > suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) using
>> > external
>> > search engine totally make sense to me.
>> >
>> > I suggest to design index management interface first, since graph index
>> has
>> > never exist in S2Graph previously. then decision about index storage
>> > backend, implementation can be discussed in more detail(the other way
>> > around could also possible).
>> >
>> > Following is how user create index in s2graph currently.
>> >
>> > Management.createServiceColumn(
>> > serviceName = serviceName, columnName = "person", columnType =
>> "integer",
>> >     props = Seq(
>> >     Prop("name", "-", "string"),
>> >     Prop("age", "0", "integer"),
>> >     Prop("location", "-", "string")
>> >     )
>> > )
>> >
>> > management.createLabel(
>> > label = "bought",
>> >     srcServiceName = serviceName, srcColumnName = "person",
>> srcColumnType =
>> > "integer",
>> >     tgtServiceName = serviceName, tgtColumnName = "product",
>> tgtColumnType
>> > = "integer", idDirected = true,
>> >     serviceName = serviceName,
>> >     indices = Seq(
>> >     Index("PK", Seq("amount", "created_at")
>> >     ),
>> >     props = Seq(
>> >     Prop("amount", "0.0", "double"),
>> >     Prop("created_at", "2000-01-01", "string")
>> >     ),
>> >     consistencyLevel = "strong"
>> > )
>> >
>> > How we going to let user to create graph-index? Should we add extra
>> > parameters on existing methods, or provide separate methods?
>> >
>> >
>> > On Mon, Jul 3, 2017 at 10:11 PM DO YUNG YOON <sh...@gmail.com> wrote:
>> >
>> > > Hi folks.
>> > >
>> > > It's been for a while we released our first release.
>> > > It seems that needs for implementing tinkerpop interface has been
>> high,
>> > > but we have not finished it. I have been working on
>> > > https://issues.apache.org/jira/browse/S2GRAPH-136 since April, then
>> > > recently merged it into master.
>> > >
>> > > I think Gremlin-core is tested, but following is what I think we have
>> to
>> > > improve for tinkerpop users to try out s2graph easily.
>> > >
>> > > 1. provide provider optimization, we have none currently.
>> > > 2. full text search predicate is not currently supported(as @echarles
>> > > pointed out)
>> > > 3. provide gremlin plugin
>> > > 4. make sure tinkerpop stack works correctly.
>> > >
>> > > Any help on above issues would be highly appreciated(help on any other
>> > > issue would be also highly appreciated).
>> > >
>> > > By the way, What I want to discuss is the schedule and what will be
>> > > included on our second release.
>> > >
>> > > I suggest to focus on integrate with tinkerpop on our second release.
>> It
>> > > would be best if we can address above issues by this month, but I
>> doubt
>> > if
>> > > it is possible.
>> > >
>> > > I am suggesting fix our release date on late this month, then focus on
>> > > above issues with high priority. if we can address them all, great,
>> but
>> > if
>> > > we can't, then release with version as much as we can deliver in time,
>> > then
>> > > move them on next next release so on.
>> > >
>> > > Want to hear what other folks think about focus and schedule on our
>> > second
>> > > release, and happy to volunteer as release manager for this time if
>> there
>> > > are no other volunteer.
>> > >
>> > > If there are other issues which anyone think to be included on next
>> > > release, please list them on this thread.
>> > >
>> > > Thanks
>> > >
>> > > DO YUNG YOON
>> > >
>> > >
>> >
>>
>

Re: [DISCUSSION] second release schedule and scope

Posted by DO YUNG YOON <sh...@gmail.com>.
Thanks for your feedback. Here are my questions.

1. Release schedule:
- Do you think we should wait until all issues with tinkerpop support are
resolved?

What do others think about the release schedule?

Should we wait until all of the tinkerpop related issues are resolved?
Can you list the "must resolve" issues for our second release?
The reason I mentioned the index is that I think it is the only blocker
issue on the list for the next release.

2. Full-Text search:
- There would be 2 types of index support, with variations (mixed/composite)
- Graph-Index: s2graph does not have this type of index.
  - Composite-Index
  - Mixed-Index
- Vertex-Centric-Index: s2graph does have this type of index.


Since they are two different types of index, it is inevitable to provide
them as separate options.

I doubt there would be confusion between graph-index and
vertex-centric-index, and we can always clarify it in the documentation.

If we agree that a graph index layer is necessary, then let's develop the
features first, then see whether there is confusion and decide what to do to
clarify it. I think you agree that graph-index is a necessary addition to
the project (tell me if you don't).

Continuing with more details on the index topic.

The following is what Titan provides, and I think it would be nice if we
could provide this in S2Graph, so let me briefly explain. (I suggest reading
through http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html if you are
not familiar with the notation.)

1. composite

Composite indexes retrieve vertices or edges by one or a (fixed)
composition of multiple keys.

This example shows how a user can create a composite index in Titan.

```
mgmt.buildIndex('byNameAndAgeComposite',
Vertex.class).addKey(name).addKey(age).buildCompositeIndex()
mgmt.commit()
```

Then the following traversal benefits from the `byNameAndAgeComposite` index.

```
g.V().has('age', 30).has('name', 'hercules')
```

We can use HBase to store this index by creating a row key such as
("age", 30, "name", "hercules").

```
g.V().has('name', 'hercules').has('age', 30)
```

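To answer the above traversal as well, where the has() steps come in the
opposite order, it seems we need to sort the property keys (together with
their values) when building the row key of the composite index.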
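
As a rough sketch of that idea (Python, not S2Graph code; the function name
composite_row_key, the "\x00" separator and the repr() value encoding are
only assumptions for illustration), sorting the property keys before
concatenating them makes both traversals map to the same row key:

```python
def composite_row_key(index_name, conditions):
    """Build a row key from equality conditions, ordered by property key.

    conditions maps property key -> value, e.g. {"name": "hercules", "age": 30}.
    Sorting the keys makes the result independent of the order in which the
    has() steps were written in the traversal.
    """
    parts = [index_name]
    for key in sorted(conditions):
        parts.append(key)
        parts.append(repr(conditions[key]))  # illustrative value encoding only
    return "\x00".join(parts).encode("utf-8")

# has('age', 30).has('name', 'hercules') vs. has('name', ...).has('age', ...)
key_a = composite_row_key("byNameAndAgeComposite", {"age": 30, "name": "hercules"})
key_b = composite_row_key("byNameAndAgeComposite", {"name": "hercules", "age": 30})
```

A real implementation would need a typed, order-preserving byte encoding
for the values instead of repr().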

We can also make partial composite indexes such as below.

```
("age", 30)
("name", "hercules")
```

I am not sure if this is necessary. The user can explicitly create the above
as separate indexes such as 'byName' and 'byAge'.

One more suggestion is to provide an option to partition an index, since
there could be lots of vertices/edges that have a specific value. For
example, a 'byCountryGender' index can contain lots of vertices/edges per
value, and it is problematic to store them all on the same HBase region. We
need to auto-partition these into a user-specified number of partitions by a
prefix salt. This is an optimization step, so it can be revisited once we
have the functionality working.
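
A minimal sketch of the prefix salt, again in Python and with made-up names
(salted_row_key, scan_prefixes) and a made-up key layout; it assumes at most
256 partitions so that the salt fits in one byte:

```python
import zlib

def salted_row_key(index_key: bytes, element_id: bytes, num_partitions: int) -> bytes:
    """Prefix the index row key with a salt derived from the element id.

    Entries with the same index value (e.g. the same country/gender) get
    spread over num_partitions key ranges instead of one hot region.
    """
    salt = zlib.crc32(element_id) % num_partitions
    return bytes([salt]) + index_key + b"\x00" + element_id

def scan_prefixes(index_key: bytes, num_partitions: int) -> list:
    """A reader has to scan every partition prefix and merge the results."""
    return [bytes([salt]) + index_key + b"\x00" for salt in range(num_partitions)]
```

The trade-off is that writes are spread out, but every read on the index
value becomes num_partitions scans instead of one.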

Note that a composite index only supports equality comparison, so the
following traversal can't take advantage of the index.

```
g.V().has('name', 'hercules').has('age', inside(20, 50))
```
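
The rule can be written down as a small check (a Python sketch, not S2Graph
or Titan code; the name composite_index_applies and the (operator, value)
representation of a has() condition are assumptions): the index is usable
only if every indexed key appears in the query with an equality condition.

```python
def composite_index_applies(index_keys, conditions):
    """Return True if a composite index on index_keys can serve the query.

    conditions maps property key -> (operator, value), where operator is
    "eq" for has(key, value) and something else (e.g. "inside") otherwise.
    """
    return all(
        key in conditions and conditions[key][0] == "eq"
        for key in index_keys
    )

by_name_and_age = ("name", "age")
equality_query = {"name": ("eq", "hercules"), "age": ("eq", 30)}
range_query = {"name": ("eq", "hercules"), "age": ("inside", (20, 50))}
```

Here equality_query can use the index while range_query can not, which is
the case the mixed index below is meant to cover.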

2. mixed

Mixed indexes retrieve vertices or edges by any combination of previously
added property keys. Full text search can be powered by a mixed index, but it
may be slower than a composite index since it involves searching an external
index backend (Lucene, Solr, Elasticsearch, ...).

This example shows how a user can create a mixed index in Titan.

```
mgmt.buildIndex('nameAndAge', Vertex.class).
    addKey(name, Mapping.TEXT.getParameter()).
    addKey(age).
    buildMixedIndex("search")
```
The user can decide whether the search engine index (named "search")
tokenizes a string value by specifying a Mapping (STRING or TEXT; the
default, TEXT, provides full text search).

Then the following traversals benefit from the `nameAndAge` index.

```
g.V().has('name', textContains('hercules')).has('age', inside(20, 50))
g.V().has('name', textContains('hercules'))
g.V().has('age', lt(50))
```

We can use Elasticsearch/Lucene/Solr as the index backend for this type of
index, and the actual work can be split as follows.

If there is no objection, I will create an index task and list the subtasks
below under it.

One possible task list is as follows.

1. Management Client:
- add an option to specify the index type when creating a
ServiceColumn/Label.
2. Storage:
- add a method to build mutations for the storage backend when a set of
vertices/edges is given.
- add a method to call the index backend with the built mutations.
3. Serializer/Deserializer:
- serializer: when an edge/vertex is given, build an SKeyValue which can be
used by storage methods.
- deserializer: when a byte array is given, build a Vertex/Edge that can be
used by storage methods.
4. ProviderOptimization
- tinkerpop asks the provider to translate a given traversal into
implementation specific functions.
- not sure if this is necessary with my limited knowledge so far, but we need
to check once S2Graph internally provides composite/mixed indexes.

Any feedback would be appreciated.


On Sat, Jul 8, 2017 at 11:48 AM Hwansung Yu <de...@gmail.com> wrote:

> Sorry for late reply.
>
> I think it is important to implement Tinkerpop in terms of functionality of
> S2Graph and for the activation of the community.
> I agree with your suggestion to concentrate on tinkerpop implementation
> issues in the second release.
> In my opinion, the time of release is when the tinkerpop implementation
> issue is cleaned up.
>
> And with regard to full text search...
> If full-text search is supported, we expect that constraints that were able
> to traversal will disappear only if the vertex is known.
> If supported, it would be better to leave it as a separate option to avoid
> confusion with existing indexes.
>
> On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sh...@gmail.com> wrote:
>
> > I guess there is no objection on my suggestion, so I am going to try list
> > up issues in more detail while preparing 0.2.0 release on late this
> month.
> >
> > Before list up above issues as task on jira, I want to discuss index in
> > more details.
> >
> > Following is my understanding on index to support tinkerpop fully and
> > efficiently
> > - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
> >
> > 1. graph index: traversal from a list of vertices or edges that are
> > identified by their properties
> >
> > 2. vertex-centric index: traversal through vertices with many incident
> > edges.
> >
> > I believe s2graph has vertex-centric index already, but it does not have
> > graph index layer so full text predicate, and range search features in
> > tinkerpop runs very inefficiently.
> >
> > For example, following traversal run full scan.
> >
> > - g.V().has('name', 'hercules')
> > - g.E().has('reason', textContains('loves'))
> >
> > To support full tinkerpop features efficiently, we need to add graph
> index
> > layer and I want to discuss how we are going to achieve this. like
> > suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) using
> > external
> > search engine totally make sense to me.
> >
> > I suggest to design index management interface first, since graph index
> has
> > never exist in S2Graph previously. then decision about index storage
> > backend, implementation can be discussed in more detail(the other way
> > around could also possible).
> >
> > Following is how user create index in s2graph currently.
> >
> > Management.createServiceColumn(
> > serviceName = serviceName, columnName = "person", columnType = "integer",
> >     props = Seq(
> >     Prop("name", "-", "string"),
> >     Prop("age", "0", "integer"),
> >     Prop("location", "-", "string")
> >     )
> > )
> >
> > management.createLabel(
> > label = "bought",
> >     srcServiceName = serviceName, srcColumnName = "person",
> srcColumnType =
> > "integer",
> >     tgtServiceName = serviceName, tgtColumnName = "product",
> tgtColumnType
> > = "integer", idDirected = true,
> >     serviceName = serviceName,
> >     indices = Seq(
> >     Index("PK", Seq("amount", "created_at")
> >     ),
> >     props = Seq(
> >     Prop("amount", "0.0", "double"),
> >     Prop("created_at", "2000-01-01", "string")
> >     ),
> >     consistencyLevel = "strong"
> > )
> >
> > How we going to let user to create graph-index? Should we add extra
> > parameters on existing methods, or provide separate methods?

Re: [DISCUSSION] second release schedule and scope

Posted by Hwansung Yu <de...@gmail.com>.
Sorry for the late reply.

I think implementing Tinkerpop is important, both for the functionality of
S2Graph and for energizing the community.
I agree with your suggestion to concentrate on the tinkerpop implementation
issues in the second release.
In my opinion, the release should happen once the tinkerpop implementation
issues are cleaned up.

And with regard to full-text search...
If full-text search is supported, I expect that the constraint that traversal
is possible only when the starting vertex is known will disappear.
If it is supported, it would be better to offer it as a separate option to
avoid confusion with the existing indexes.

On Sat, Jul 8, 2017 at 9:10 AM, DO YUNG YOON <sh...@gmail.com> wrote:

> I guess there is no objection on my suggestion, so I am going to try list
> up issues in more detail while preparing 0.2.0 release on late this month.
>
> Before list up above issues as task on jira, I want to discuss index in
> more details.
>
> Following is my understanding on index to support tinkerpop fully and
> efficiently
> - reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html
>
> 1. graph index: traversal from a list of vertices or edges that are
> identified by their properties
>
> 2. vertex-centric index: traversal through vertices with many incident
> edges.
>
> I believe s2graph has vertex-centric index already, but it does not have
> graph index layer so full text predicate, and range search features in
> tinkerpop runs very inefficiently.
>
> For example, following traversal run full scan.
>
> - g.V().has('name', 'hercules')
> - g.E().has('reason', textContains('loves'))
>
> To support full tinkerpop features efficiently, we need to add graph index
> layer and I want to discuss how we are going to achieve this. like
> suggested here(http://markmail.org/message/2vn2bwrwh5zbeie4) using
> external
> search engine totally make sense to me.
>
> I suggest to design index management interface first, since graph index has
> never exist in S2Graph previously. then decision about index storage
> backend, implementation can be discussed in more detail(the other way
> around could also possible).
>
> Following is how user create index in s2graph currently.
>
> Management.createServiceColumn(
> serviceName = serviceName, columnName = "person", columnType = "integer",
>     props = Seq(
>     Prop("name", "-", "string"),
>     Prop("age", "0", "integer"),
>     Prop("location", "-", "string")
>     )
> )
>
> management.createLabel(
> label = "bought",
>     srcServiceName = serviceName, srcColumnName = "person", srcColumnType =
> "integer",
>     tgtServiceName = serviceName, tgtColumnName = "product", tgtColumnType
> = "integer", idDirected = true,
>     serviceName = serviceName,
>     indices = Seq(
>     Index("PK", Seq("amount", "created_at")
>     ),
>     props = Seq(
>     Prop("amount", "0.0", "double"),
>     Prop("created_at", "2000-01-01", "string")
>     ),
>     consistencyLevel = "strong"
> )
>
> How we going to let user to create graph-index? Should we add extra
> parameters on existing methods, or provide separate methods?

Re: [DISCUSSION] second release schedule and scope

Posted by DO YUNG YOON <sh...@gmail.com>.
I guess there is no objection to my suggestion, so I am going to try to list
the issues in more detail while preparing the 0.2.0 release for late this month.

Before listing the above issues as tasks on jira, I want to discuss indexes in
more detail.

Following is my understanding of the indexes needed to support tinkerpop fully
and efficiently
- reference: http://s3.thinkaurelius.com/docs/titan/1.0.0/indexes.html

1. graph index: starts a traversal from a list of vertices or edges that are
identified by their properties.

2. vertex-centric index: speeds up traversal through vertices with many
incident edges.

I believe s2graph already has a vertex-centric index, but it does not have a
graph index layer, so the full-text predicate and range search features in
tinkerpop run very inefficiently.

For example, the following traversals run a full scan.

- g.V().has('name', 'hercules')
- g.E().has('reason', textContains('loves'))
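
To illustrate the difference a graph index makes for traversals like the two
above, here is a small sketch using plain Scala collections. It is not S2Graph
or tinkerpop code, and the names (V, fullScan, buildIndex, indexed) are made up
for the example.

```scala
// Illustrative only: plain Scala collections, not S2Graph or tinkerpop code.
object ScanVsIndex {
  final case class V(id: Int, name: String)

  // Without a graph index: every vertex is visited (a full scan).
  def fullScan(vs: Seq[V], name: String): Seq[Int] =
    vs.filter(_.name == name).map(_.id)

  // With a graph index: a property-to-ids map is maintained at write time...
  def buildIndex(vs: Seq[V]): Map[String, Seq[Int]] =
    vs.groupBy(_.name).map { case (n, group) => n -> group.map(_.id) }

  // ...so a property lookup becomes a single map access at read time.
  def indexed(index: Map[String, Seq[Int]], name: String): Seq[Int] =
    index.getOrElse(name, Seq.empty)
}
```

Both functions return the same ids; the difference is that the indexed lookup
does not touch vertices that do not match.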

To support the full tinkerpop feature set efficiently, we need to add a graph
index layer, and I want to discuss how we are going to achieve this. Using an
external search engine, as suggested here
(http://markmail.org/message/2vn2bwrwh5zbeie4), totally makes sense to me.

I suggest designing the index management interface first, since a graph index
has never existed in S2Graph. The decision about the index storage backend and
its implementation can then be discussed in more detail (the other way
around is also possible).

Following is how a user currently creates an index in s2graph.

Management.createServiceColumn(
    serviceName = serviceName, columnName = "person", columnType = "integer",
    props = Seq(
        Prop("name", "-", "string"),
        Prop("age", "0", "integer"),
        Prop("location", "-", "string")
    )
)

management.createLabel(
    label = "bought",
    srcServiceName = serviceName, srcColumnName = "person", srcColumnType = "integer",
    tgtServiceName = serviceName, tgtColumnName = "product", tgtColumnType = "integer",
    isDirected = true,
    serviceName = serviceName,
    indices = Seq(
        Index("PK", Seq("amount", "created_at"))
    ),
    props = Seq(
        Prop("amount", "0.0", "double"),
        Prop("created_at", "2000-01-01", "string")
    ),
    consistencyLevel = "strong"
)

How are we going to let users create a graph index? Should we add extra
parameters to the existing methods, or provide separate methods?
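
To make the two options in the question above easier to compare, here is a
rough sketch of what each could look like. The signatures are hypothetical
(GraphIndex, graphIndices and buildGraphIndex do not exist in S2Graph), and
Prop is redefined locally only so the snippet compiles on its own.

```scala
// Hypothetical API shapes for discussion; neither option exists in S2Graph today.
object GraphIndexApiOptions {
  final case class Prop(name: String, defaultValue: String, dataType: String)
  final case class GraphIndex(name: String, propNames: Seq[String], indexType: String)

  // Option A: an extra, defaulted parameter on the existing creation method.
  // Existing callers keep working because graphIndices defaults to Nil.
  def createServiceColumn(serviceName: String, columnName: String, columnType: String,
                          props: Seq[Prop],
                          graphIndices: Seq[GraphIndex] = Nil): Seq[GraphIndex] =
    graphIndices

  // Option B: a separate method that can be called after the schema already exists.
  def buildGraphIndex(serviceName: String, columnName: String,
                      index: GraphIndex): GraphIndex =
    index
}
```

Option A keeps schema and index definitions in one call, while option B would
also allow adding an index to a ServiceColumn or Label that was created
earlier.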

