Posted to dev@s2graph.apache.org by Hyunsung Jo <hy...@gmail.com> on 2016/04/01 07:52:22 UTC

Re: [DISCUSS] Project Road Map

Doyoung,

Thank you for sharing the document!


Luke and Alexander,

Do you have any concerns regarding supporting multiple storage engines?

As far as I understand, although S2Graph began exclusively on top of HBase,
it has always had other storage engines in mind.
Perhaps this is somewhat unclear in the proposal, but I see hints of the
plan for additional storage backends in statements such as:
S2Graph <https://wiki.apache.org/incubator/S2Graph> provides a scalable
distributed graph database engine over *a key/value store such as HBase*.
This is also why some of the earliest JIRA tickets (S2GRAPH-1, 51) cover
this topic. (Now that I think of it, we should have had this discussion
prior to opening the tickets, but better late than never!)
Thanks to the recent refactoring (S2GRAPH-17) that Doyoung mentioned, I think
the latest storage-related code is abstract and general enough to try out
integrations with storage engines other than HBase.
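
For illustration only, the separation between storage-independent graph
logic and a pluggable backend might look like the Python sketch below. All
names here (KeyValueStorage, InMemoryStorage, EdgeStore, and the
"src|label|dst" key layout) are hypothetical; they are not S2Graph's actual
classes or its HBase schema. The in-memory backend merely stands in for
HBase or any other key/value store.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class KeyValueStorage(ABC):
    """Hypothetical backend interface (an assumption, not S2Graph's API)."""

    @abstractmethod
    def put(self, key: bytes, value: bytes) -> None: ...

    @abstractmethod
    def get(self, key: bytes) -> Optional[bytes]: ...

    @abstractmethod
    def scan(self, prefix: bytes) -> List[Tuple[bytes, bytes]]: ...


class InMemoryStorage(KeyValueStorage):
    """Toy backend standing in for HBase or another key/value store."""

    def __init__(self) -> None:
        self._data: Dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        return self._data.get(key)

    def scan(self, prefix: bytes) -> List[Tuple[bytes, bytes]]:
        # Return matching rows in key order, as a sorted key/value store would.
        return sorted((k, v) for k, v in self._data.items() if k.startswith(prefix))


class EdgeStore:
    """Storage-independent graph logic: it only talks to the interface."""

    def __init__(self, storage: KeyValueStorage) -> None:
        self._storage = storage

    def add_edge(self, src: str, label: str, dst: str) -> None:
        # Illustrative key layout; ids containing "|" are not handled.
        self._storage.put(f"{src}|{label}|{dst}".encode(), b"")

    def neighbors(self, src: str, label: str) -> List[str]:
        prefix = f"{src}|{label}|".encode()
        return [key.decode().split("|")[2] for key, _ in self._storage.scan(prefix)]
```

Supporting another storage engine would then mean writing one more
KeyValueStorage subclass, while EdgeStore stays unchanged.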

Thanks,
Jo

On Fri, Mar 25, 2016 at 11:06 PM DO YUNG YOON <sh...@gmail.com> wrote:

> Hi Luke and Alexander.
>
> Thanks for asking. Here is the reason I listed storage engines.
>
> S2Graph has been using HBase as its primary storage engine. I think there is no
> reason we need to change this.
>
> However, I also think there is no reason we should only support HBase.
>
> We realized that a lot of the code can be independent of the storage backend, so we
> abstracted away the storage-dependent code in S2GRAPH-17. After this
> refactoring, it becomes easy for others who want to use a storage engine
> other than HBase to connect to the storage of their choice.
>
> Personally, I think it would be better to give users more options.
>
> For example, Titan (http://thinkaurelius.github.io/titan/) supports various
> storage backends (Cassandra, HBase, BerkeleyDB; the primary one seems to be Cassandra), and
> I think S2Graph can also support these options, with HBase as the primary.
>
> What do others think about this?
>
> Also, regarding the Query Graphical User Interface, I have no idea which
> existing projects could be used. If it is possible to reuse existing
> projects, then I would prefer to use them.
>
> Please point me to such existing projects (I would love to try
> Zeppelin, though).
>
> Thanks.
> Doyung Yoon
>
> On Thu, Mar 24, 2016 at 8:54 AM Luke Han <lu...@gmail.com> wrote:
>
> > Hi,
> >     For storage engines, are we trying to extend to others besides
> > HBase?
> >
> > Thanks.
> > Luke
> >
> >
> > Best Regards!
> > ---------------------
> >
> > Luke Han
> >
> > On Wed, Mar 23, 2016 at 10:31 PM, Kim, Min-Seok <ms...@gmail.com>
> > wrote:
> >
> > > updated and added
> > >
> > >    - Batch Jobs (S2Lambda)
> > >       - Kafka to HDFS (WAL)
> > >       - Streaming Cooccurrence across labels (user-user/item-item similarity)
> > >       - OLAP operations on (WAL or KAFKA)
> > >    - A/B Testing capabilities
> > >    - Multi-armed Bandit to select the best query
> > >
> > >
> > > I think A/B and MAB can be components themselves; they could also be merged into
> > > other components.
> > >
> > > Thanks
> > >
> > >
> > >
> > >
> > >
> > > On Tue, Mar 22, 2016 at 7:57 AM, DO YUNG YOON <sh...@gmail.com> wrote:
> > >
> > > > Hi folks.
> > > >
> > > > I just want to open up discussion on our project roadmap.
> > > >
> > > > https://docs.google.com/document/d/1QSEf628QHrLmky16cJN_wIv0H_cfi1E9NGLqKMpdsNE/edit?usp=sharing
> > > >
> > > > What I wrote in the linked document is purely a draft (I just listed everything I could think
> > > > of for now), so please feel free to change it as necessary.
> > > > Some of the items might be easy and some would take time, so I also want to
> > > > ask what others think about the first release.
> > > > Personally, I think it would be great if we could list our road map,
> > > > then decide priorities, and then talk about our first release.
> > > > Also, I think that once we decide on the road map, its items can be used as
> > > > components in JIRA. Currently there are no components, so it is hard to guess
> > > > what each issue is about.
> > > >
> > > > Looking forward to hearing what others think.
> > > >
> > >
> >
>

Re: [DISCUSS] Project Road Map

Posted by DO YUNG YOON <sh...@gmail.com>.

I am also a fan of Neo4J’s web interface, and I agree with Jong Wook's
opinion that we should think about the UI from scratch.

I thought that simply visualizing the graph structure could be done with some simple
JavaScript work (maybe d3.js or something similar), since an S2Graph query has a
"returnTree" feature that returns the entire tree that the query traversed (in BFS
order).
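
To make the idea concrete, here is a minimal Python sketch of a
breadth-first traversal that records the tree it walked, in a nested form
that a front end such as d3.js could render. The adjacency map, the
function name bfs_tree, and the "id"/"children" output shape are
assumptions for illustration only; this is not S2Graph's actual
"returnTree" response format.

```python
from collections import deque
from typing import Dict, List


def bfs_tree(adjacency: Dict[str, List[str]], root: str, max_depth: int) -> dict:
    """Traverse breadth-first from root and return the visited tree as nested dicts."""
    tree = {"id": root, "children": []}
    visited = {root}
    queue = deque([(tree, 0)])  # (node in the output tree, its depth)
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested number of steps
        for neighbor in adjacency.get(node["id"], []):
            if neighbor in visited:
                continue  # each vertex appears once, under its first discoverer
            visited.add(neighbor)
            child = {"id": neighbor, "children": []}
            node["children"].append(child)
            queue.append((child, depth + 1))
    return tree
```

The nested result can be serialized with json.dumps and handed to a
hierarchy layout on the JavaScript side.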

I think most folks have had a chance to look at where we are heading
in the big picture (please give feedback if anything is unclear).

I want to start discussing what we will prioritize from the above list and, more
importantly, what our focus will be for our first release.
Here is my opinion:

Low Latency, High Throughput, Serialize Schema, BFS Query, Documentation,
Project Homepage.

These parts are the basics for storing/selecting data as Edges/Vertices and
traversing them, which I think are the most important features.
What do you guys think?



On Tue, Apr 12, 2016 at 11:27 PM Alexander Bezzubov <bz...@apache.org> wrote:

> Sure, it's totally up to you guys!
>
> I was only suggesting a way to save some development effort on a query UI.
>
> Somehow, having the "best selling point" of a graph storage system be an
> embedded webapp sounds a bit, well, surprising to me :) But that is just
> feedback, no back-seat driving here.
>
> I can imagine, though, that having a handy tool to visualize graph structure can be
> very useful, e.g. for debugging.
>
> Great to see such an ambitious roadmap for the project!
>
> --
> Alex
>
>
>

Re: [DISCUSS] Project Road Map

Posted by Alexander Bezzubov <bz...@apache.org>.

Sure, it's totally up to you guys!

I was only suggesting a way to save some development effort on a query UI.

Somehow, having the "best selling point" of a graph storage system be an
embedded webapp sounds a bit, well, surprising to me :) But that is just
feedback, no back-seat driving here.

I can imagine, though, that having a handy tool to visualize graph structure can be
very useful, e.g. for debugging.

Great to see such an ambitious roadmap for the project!

--
Alex



On Tue, Apr 12, 2016 at 7:39 PM, Jong Wook Kim <jo...@nyu.edu> wrote:

> I know that Zeppelin is very good at interactively plotting data from
> dataframes as pie charts, histograms, line graphs, etc.
>
> But I'm not sure that visualizing graph structures with it is any easier
> than starting from scratch.
>
> I have been very pleased with Neo4J’s web interface, which is embedded in
> its server. Using Zeppelin as the primary visualizer might overcomplicate
> things, as it would require configuring the whole Zeppelin distribution as a
> subproject of S2Graph. There would also be a lot more JVM processes to
> manage: one for the Zeppelin server and one interpreter process for every
> notebook.
>
> I’m not trying to be NIH or anything, but looking at the embedded web UIs of
> Neo4J, Apache Spark, and RethinkDB, I think it would be nicer to design
> the UI from scratch; it could be the best selling point of S2Graph.
>
>
> Best,
> Jong Wook
>
>

Re: [DISCUSS] Project Road Map

Posted by Jong Wook Kim <jo...@nyu.edu>.

I know that Zeppelin is very good at interactively plotting data from dataframes as pie charts, histograms, line graphs, etc.

But I'm not sure that visualizing graph structures with it is any easier than starting from scratch.

I have been very pleased with Neo4J’s web interface, which is embedded in its server. Using Zeppelin as the primary visualizer might overcomplicate things, as it would require configuring the whole Zeppelin distribution as a subproject of S2Graph. There would also be a lot more JVM processes to manage: one for the Zeppelin server and one interpreter process for every notebook.

I’m not trying to be NIH or anything, but looking at the embedded web UIs of Neo4J, Apache Spark, and RethinkDB, I think it would be nicer to design the UI from scratch; it could be the best selling point of S2Graph.


Best,
Jong Wook


> On Apr 12, 2016, at 6:08 AM, Alexander Bezzubov <bz...@apache.org> wrote:
> 
> Sounds like a great plan to me as well. Thank you for sharing the details; please
> keep it up and keep the mailing list posted!
> 
> As for the "Query Graphical User Interface", I would suggest trying Zeppelin out
> and just providing an interpreter implementation [1] for the query
> language you choose, as it's very simple and a nice GUI comes for free.
> 
> 1. http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/development/writingzeppelininterpreter.html
> 
> --
> Alex
> 
> 

Re: [DISCUSS] Project Road Map

Posted by Alexander Bezzubov <bz...@apache.org>.

Sounds like a great plan to me as well. Thank you for sharing the details; please
keep it up and keep the mailing list posted!

As for the "Query Graphical User Interface", I would suggest trying Zeppelin out
and just providing an interpreter implementation [1] for the query
language you choose, as it's very simple and a nice GUI comes for free.

 1. http://zeppelin.incubator.apache.org/docs/0.6.0-incubating-SNAPSHOT/development/writingzeppelininterpreter.html

--
Alex


On Tue, Apr 5, 2016 at 11:30 PM, Luke Han <lu...@gmail.com> wrote:

> Hi Doyung and Jo,
>     Actually, I have no concerns about supporting storage engines other than
> HBase. Refactoring the existing design to support more engines will make
> the project suitable for more use cases.
>
>     But the issue here is that the community did not know why, until you
> guys started to discuss it on the mailing list and replied above. Please keep
> moving on and bring more discussion to the mailing list.
>
>     Thanks.
> Luke
>
>
>
> Best Regards!
> ---------------------
>
> Luke Han
>
> On Fri, Apr 1, 2016 at 1:52 PM, Hyunsung Jo <hy...@gmail.com> wrote:
>
> > Doyoung,
> >
> > Thank you for sharing the document!
> >
> >
> > Luke and Alexander,
> >
> > Do you have any concerns regarding supporting multiple storage engines?
> >
> > As far as I understand, although S2Graph began exclusively on top of HBase,
> > it has always had other storage engines in mind.
> > Perhaps this is somewhat unclear in the proposal, but I see hints of the
> > plan for additional storage backends in statements such as:
> > S2Graph <https://wiki.apache.org/incubator/S2Graph> provides a scalable
> > distributed graph database engine over *a key/value store such as HBase*.
> > This is also why some of the earliest JIRA tickets (S2GRAPH-1, 51) cover
> > this topic. (Now that I think of it, we should have had this discussion
> > prior to opening the tickets, but better late than never!)
> > Thanks to the recent refactoring (S2GRAPH-17) that Doyoung mentioned, I think
> > the latest storage-related code is abstract and general enough to try out
> > integrations with storage engines other than HBase.
> >
> > Thanks,
> > Jo
> >
> > On Fri, Mar 25, 2016 at 11:06 PM DO YUNG YOON <sh...@gmail.com> wrote:
> >
> > > Hi Luke and Alexander.
> > >
> > > Thanks for asking. Here is the reason I listed storage engines.
> > >
> > > S2Graph has been using HBase as its primary storage engine. I think there is no
> > > reason we need to change this.
> > >
> > > However, I also think there is no reason we should only support HBase.
> > >
> > > We realized that a lot of the code can be independent of the storage backend, so we
> > > abstracted away the storage-dependent code in S2GRAPH-17. After this
> > > refactoring, it becomes easy for others who want to use a storage engine
> > > other than HBase to connect to the storage of their choice.
> > >
> > > Personally, I think it would be better to give users more options.
> > >
> > > For example, Titan (http://thinkaurelius.github.io/titan/) supports various
> > > storage backends (Cassandra, HBase, BerkeleyDB; the primary one seems to be Cassandra), and
> > > I think S2Graph can also support these options, with HBase as the primary.
> > >
> > > What do others think about this?
> > >
> > > Also, regarding the Query Graphical User Interface, I have no idea which
> > > existing projects could be used. If it is possible to reuse existing
> > > projects, then I would prefer to use them.
> > >
> > > Please point me to such existing projects (I would love to try
> > > Zeppelin, though).
> > >
> > > Thanks.
> > > Doyung Yoon
> > >
> > > On Thu, Mar 24, 2016 at 8:54 AM Luke Han <lu...@gmail.com> wrote:
> > >
> > > > Hi,
> > > >     For storage engines, are we trying to extend to others besides
> > > > HBase?
> > > >
> > > > Thanks.
> > > > Luke
> > > >
> > > >
> > > > Best Regards!
> > > > ---------------------
> > > >
> > > > Luke Han
> > > >
> > > > On Wed, Mar 23, 2016 at 10:31 PM, Kim, Min-Seok <mskim.org@gmail.com>
> > > > wrote:
> > > >
> > > > > updated and added
> > > > >
> > > > >
> > > > >    - Batch Jobs (S2Lambda)
> > > > >       - Kafka to HDFS (WAL)
> > > > >       - Streaming Cooccurrence across labels (user-user/item-item similarity)
> > > > >       - OLAP operations on (WAL or KAFKA)
> > > > >    - A/B Testing capabilities
> > > > >    - Multi-armed Bandit to select the best query
> > > > >
> > > > >
> > > > > I think A/B testing and MAB can be components themselves, or they
> > > > > could be merged into other components.
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Tue, Mar 22, 2016 at 7:57 AM, DO YUNG YOON <sh...@gmail.com> wrote:
> > > > >
> > > > > > Hi folks.
> > > > > >
> > > > > > I just want to open up discussion on our project roadmap.
> > > > > >
> > > > > > https://docs.google.com/document/d/1QSEf628QHrLmky16cJN_wIv0H_cfi1E9NGLqKMpdsNE/edit?usp=sharing
> > > > > >
> > > > > > What I wrote in the linked document is only a draft (I just listed
> > > > > > everything I can think of for now), so please feel free to change it
> > > > > > as necessary.
> > > > > > Some of the items might be easy and some would take time, so I also
> > > > > > want to ask what others think about the first release.
> > > > > > Personally, I think it would be great if we could lay out our road
> > > > > > map, then decide priorities, and then talk about our first release.
> > > > > > Also, I think once we decide on the road map, its items can be used
> > > > > > as components in JIRA. Currently there are no components, so it is
> > > > > > hard to guess what each issue is about.
> > > > > >
> > > > > > Looking forward to hearing what others think.
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: [DISCUSS] Project Road Map

Posted by Luke Han <lu...@gmail.com>.

Hi Doyung and Jo,
    Actually, I have no concerns about supporting storage engines other than
HBase. Refactoring the existing design to support more engines will make the
project suitable for a wider range of uses.

    The issue is that the community did not know the reasoning until you
started discussing it on the mailing list in the replies above. Please keep
going and bring more of these discussions to the mailing list.

    Thanks.
Luke



Best Regards!
---------------------

Luke Han

On Fri, Apr 1, 2016 at 1:52 PM, Hyunsung Jo <hy...@gmail.com> wrote:

> Doyoung,
>
> Thank you for sharing the document!
>
>
> Luke and Alexander,
>
> Do you have any concerns regarding supporting multiple storage engines?
>
> As far as I understand, although S2Graph began exclusively on top of HBase,
> it always had other storage engines in mind.
> Perhaps this is somewhat unclear in the proposal, but I see hints of the
> plan for additional storages in statements such as -
> S2Graph <https://wiki.apache.org/incubator/S2Graph> provides a scalable
> distributed graph database engine over *a key/value store such as HBase*.
> This is also why some of the earliest JIRA tickets (S2GRAPH-1, 51) cover
> this topic. (Now that I think of it, we should have had this discussion
> prior to opening the tickets, but better late than never!)
> Thanks to the recent refactoring (S2GRAPH-17) as Doyoung mentioned, I think
> the latest storage-related code is abstract + general enough to try out
> integrations with storages other than HBase.
>
> Thanks,
> Jo
>
> On Fri, Mar 25, 2016 at 11:06 PM DO YUNG YOON <sh...@gmail.com> wrote:
>
> > Hi Luke and Alexander.
> >
> > Thanks for asking. Here is the reason I listed storage engines.
> >
> > S2Graph has used HBase as its primary storage engine. I think there is
> > no reason we need to change this.
> >
> > However, I also think there is no reason we should only support HBase.
> >
> > We realized that a lot of the code can be independent of the storage
> > backend, so we abstracted away the storage-dependent code in S2GRAPH-17.
> > After this refactoring, it is now easy for others who want to use a
> > storage engine other than HBase to connect their storage of choice.
> >
> > Personally, I think it would be better to give users more options.
> >
> > For example, http://thinkaurelius.github.io/titan/ supports various
> > storage backends (Cassandra, HBase, BerkeleyDB; the primary one seems to
> > be Cassandra), and I think S2Graph can also support these options, with
> > HBase as the primary.
> >
> > What do others think about this?
> >
> > Also, regarding the Query Graphical User Interface, I do not know which
> > existing projects can be used. If it is possible to reuse existing
> > projects, then I would prefer to use them.
> >
> > Please let me know which existing projects these are (I would love to try
> > Zeppelin, though).
> >
> > Thanks.
> > Doyung Yoon
> >
> > On Thu, Mar 24, 2016 at 8:54 AM Luke Han <lu...@gmail.com> wrote:
> >
> > > Hi
> > >     For Storage Engines, are we trying to extend to engines other than
> > > HBase?
> > >
> > > Thanks.
> > > Luke
> > >
> > >
> > > Best Regards!
> > > ---------------------
> > >
> > > Luke Han
> > >
> > > On Wed, Mar 23, 2016 at 10:31 PM, Kim, Min-Seok <ms...@gmail.com>
> > > wrote:
> > >
> > > > updated and added
> > > >
> > > >
> > > >    - Batch Jobs (S2Lambda)
> > > >       - Kafka to HDFS (WAL)
> > > >       - Streaming Cooccurrence across labels (user-user/item-item similarity)
> > > >       - OLAP operations on (WAL or KAFKA)
> > > >    - A/B Testing capabilities
> > > >    - Multi-armed Bandit to select the best query
> > > >
> > > >
> > > > I think A/B testing and MAB can be components themselves, or they
> > > > could be merged into other components.
> > > >
> > > > Thanks
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > On Tue, Mar 22, 2016 at 7:57 AM, DO YUNG YOON <sh...@gmail.com> wrote:
> > > >
> > > > > Hi folks.
> > > > >
> > > > > I just want to open up discussion on our project roadmap.
> > > > >
> > > > > https://docs.google.com/document/d/1QSEf628QHrLmky16cJN_wIv0H_cfi1E9NGLqKMpdsNE/edit?usp=sharing
> > > > >
> > > > > What I wrote in the linked document is only a draft (I just listed
> > > > > everything I can think of for now), so please feel free to change it
> > > > > as necessary.
> > > > > Some of the items might be easy and some would take time, so I also
> > > > > want to ask what others think about the first release.
> > > > > Personally, I think it would be great if we could lay out our road
> > > > > map, then decide priorities, and then talk about our first release.
> > > > > Also, I think once we decide on the road map, its items can be used
> > > > > as components in JIRA. Currently there are no components, so it is
> > > > > hard to guess what each issue is about.
> > > > >
> > > > > Looking forward to hearing what others think.
> > > > >
> > > >
> > >
> >
>