Posted to dev@tinkerpop.apache.org by Ran Magen <rm...@gmail.com> on 2015/05/18 20:07:41 UTC

elastic-gremlin

Hey guys,
Just wanted to let you know about a TP3 implementation we're working on.
It's based on elastic-search, enabling very good scalability and indexing
capabilities.
You can find the code here <https://github.com/rmagen/elastic-gremlin>.

This is still very much a work in progress (still more features and
optimizations planned, and some bugs to fix), but we're already using it
with very big graphs.

I would appreciate any feedback!
Cheers,

Re: elastic-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.

I've not had a chance to think about it, but I now see the issue you
opened.  It was probably good that you added that for tracking:

https://issues.apache.org/jira/browse/TINKERPOP3-701
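
For readers following the thread, here is a minimal sketch of the "before"
callback idea discussed below: user validation code is consulted before a
mutation and can cancel it. Every name in it (MutationValidator,
ValidatingStore, beforeAddVertex) is hypothetical and made up for
illustration; none of it is TinkerPop's Mutating or EventStrategy API, and a
plain list stands in for the graph.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ValidationSketch {

    /** Hypothetical callback invoked BEFORE a vertex is added (not a TinkerPop type). */
    public interface MutationValidator {
        /** Return false to cancel the mutation. */
        boolean beforeAddVertex(String label, Map<String, Object> properties);
    }

    /** Hypothetical stand-in for a graph; it only remembers vertex labels. */
    public static class ValidatingStore {
        private final List<MutationValidator> validators = new ArrayList<>();
        private final List<String> vertices = new ArrayList<>();

        public void register(MutationValidator validator) {
            validators.add(validator);
        }

        /** Runs every validator first; the vertex is stored only if all accept it. */
        public boolean addVertex(String label, Map<String, Object> properties) {
            for (MutationValidator validator : validators) {
                if (!validator.beforeAddVertex(label, properties)) {
                    return false; // cancelled before anything was written
                }
            }
            vertices.add(label);
            return true;
        }

        public int size() {
            return vertices.size();
        }
    }

    public static void main(String[] args) {
        ValidatingStore store = new ValidatingStore();
        // Example rule (an assumption for the demo): every vertex needs a "name".
        store.register((label, props) -> props.containsKey("name"));

        Map<String, Object> named = new HashMap<>();
        named.put("name", "marko");

        System.out.println(store.addVertex("person", named));           // accepted
        System.out.println(store.addVertex("person", new HashMap<>())); // cancelled
        System.out.println(store.size());
    }
}
```

The point of the sketch is only the ordering: the check runs before the
write, so a rejected mutation never reaches the store. How such a hook would
fit into real traversal steps is exactly the open question in the thread.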

On Sat, May 23, 2015 at 4:25 PM, Ran Magen <rm...@gmail.com> wrote:

> >i may have messed up the Mutating interface design a bit.  looking at it
> now, i feel like it could be less coupled to the EventStrategy related
> features.  I'll take a look at it to see if I can make it "better" before
> GA.  I don't think my changes should affect vendors or the test suites, so
> if it turns out to be that way i'll give it a shot.
>
> Any progress? Should I open a ticket for this?
>
> On Wed, 20 May 2015 at 22:17 Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > > I guess today these features don't work because the Suite classes
> > > initialize the tests
> >
> > right - because we have the custom test suites, the tests are determined
> > more dynamically, so your ability to right-click/run is kinda lost. :/
> >
> > On Wed, May 20, 2015 at 2:47 PM, Ran Magen <rm...@gmail.com> wrote:
> >
> > > >I don't have a better idea than the environment variable.  you should
> be
> > > able to use the debugger though.  works for me in intellij when i've
> > looked
> > > at a problem in titan.  i'm not sure if it only works because i have
> the
> > > tinkerpop source on my system, but i can step through tinkerpop source
> > > and titan source interchangeably.  i don't think i did anything
> specific
> > > to enable that.
> > >
> > > I wasn't clear. I use IntelliJ, and it has simple shortcuts to run
> > > tests: right-clicking on a test method/class and clicking run,
> > > rerunning only failed tests, etc. This could really help in cases
> > > where I need to debug a test and put a breakpoint somewhere in the
> > > code. If other tests run before it, the breakpoints will usually get
> > > hit lots of times. I guess today these features don't work because the
> > > Suite classes initialize the tests. I don't know enough about JUnit to
> > > offer solutions; I thought you might have some.
> > >
> > > >perhaps you could provide links to relevant code.  i'm sorry to say
> that
> > > most times the answer to this kind of stuff isn't obvious.
> > >
> > > Okay, I'll get some example code.
> > >
> > > >i may have messed up the Mutating interface design a bit. looking at
> > > it now, i feel like it could be less coupled to the EventStrategy
> related
> > > features.  I'll take a look at it to see if I can make it "better"
> before
> > > GA.
> > >
> > > Great, that would be a big help!
> > >
> > > >we don't have much on bulk insertion in the API. perhaps you should
> > create
> > > an issue for discussion
> > >
> > > https://issues.apache.org/jira/browse/TINKERPOP3-694
> > >
> > >
> > > Thanks again for all the help
> > >
> > > On Wed, 20 May 2015 at 19:53 Stephen Mallette <sp...@gmail.com>
> > > wrote:
> > >
> > > > >
> > > > > The Process coverage seems good. I believe most of the failures are
> > due
> > > > to
> > > > > the fact that I only support string IDs (I think not all tests call
> > the
> > > > > convertId method).
> > > >
> > > >
> > > > hmmm - thought we had rooted all of those out via work with pieter
> > martin
> > > > on sqlg.  please let me know which ones still aren't making those
> > calls.
> > > >
> > > >
> > > > > It would also be great if we could easily run specific tests or
> > > > > classes using JUnit. At the moment it's cumbersome to run a class
> > > > > of tests (updating the environment variable each time), and
> > > > > impossible to debug a specific test easily (or at least I haven't
> > > > > found a way).
> > > > >
> > > >
> > > > I don't have a better idea than the environment variable.  you should
> > be
> > > > able to use the debugger though.  works for me in intellij when i've
> > > looked
> > > > at a problem in titan.  i'm not sure if it only works because i have
> > the
> > > > tinkerpop source on my system, but i can step through tinkerpop
> source
> > > and
> > > > titan source interchangeably.  i don't think i did anything specific
> to
> > > > enable that.
> > > >
> > > >
> > > > >    1. We made a custom VertexStep that aggregates traversers and
> > > > >    their has steps, to minimize the number of queries issued. It
> > > > >    messed up a few things, but we got the basic usage working in M9
> > > > >    (I guess you fixed some stuff for Titan, which does the same
> > > > >    thing). The problem now is that it doesn't work on inner
> > > > >    traversals. For example, Repeat gives out only 1 traverser every
> > > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > > >
> > > >
> > > > perhaps you could provide links to relevant code.  i'm sorry to say
> > that
> > > > most times the answer to this kind of stuff isn't obvious.
> > > >
> > > >
> > > > >    2. We want to implement a validation strategy. Sort of like
> > > > >    EventStrategy, but it will notify before a mutation, and will
> > enable
> > > > the
> > > > >    user's validation code to cancel a mutation if it doesn't pass
> its
> > > > > checks.
> > > > >    The problem is that there are no "before" callbacks for the
> > Mutating
> > > > >    interface.
> > > > >
> > > >
> > > > i may have messed up the Mutating interface design a bit.  looking at
> > it
> > > > now, i feel like it could be less coupled to the EventStrategy
> related
> > > > features.  I'll take a look at it to see if I can make it "better"
> > before
> > > > GA.  I don't think my changes should affect vendors or the test
> suites,
> > > so
> > > > if it turns out to be that way i'll give it a shot.
> > > >
> > > >
> > > > >    3. Adding in bulk - we added our own functions for bulk
> > > > >    inserts, since we didn't find anything to support it in the
> > > > >    API. The thing is, we need this ability as part of the
> > > > >    traversal, so we can utilize the validation strategy (if we can
> > > > >    get that working). We thought about inheriting from the Add
> > > > >    steps, but they're final. It'd be great to have something like
> > > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > > > >    make it bulk load the vertices.
> > > >
> > > >
> > > > we're trying to avoid problems with improper inheritance which messes
> > > with
> > > > traversal strategies - hence steps are typically "final".   we don't
> > have
> > > > much on bulk insertion in the API.  perhaps you should create an
> issue
> > > for
> > > > discussion.
> > > >
> > > > On Wed, May 20, 2015 at 11:08 AM, Ran Magen <rm...@gmail.com>
> wrote:
> > > >
> > > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > >
> > > > > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored,
> > > > > 320 passed
> > > > > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed,
> > > > > 321 ignored, 394 passed
> > > > > The Process coverage seems good. I believe most of the failures
> > > > > are due to the fact that I only support string IDs (I think not
> > > > > all tests call the convertId method), and to some new stuff in M9
> > > > > that I haven't gotten around to fixing yet. But I'll make sure to
> > > > > open tickets for anything I find.
> > > > > It would also be great if we could easily run specific tests or
> > > > > classes using JUnit. At the moment it's cumbersome to run a class
> > > > > of tests (updating the environment variable each time), and
> > > > > impossible to debug a specific test easily (or at least I haven't
> > > > > found a way).
> > > > >
> > > > > > we'd be interested in hearing about your issues.
> > > > >
> > > > >    1. We made a custom VertexStep that aggregates traversers and
> > > > >    their has steps, to minimize the number of queries issued. It
> > > > >    messed up a few things, but we got the basic usage working in M9
> > > > >    (I guess you fixed some stuff for Titan, which does the same
> > > > >    thing). The problem now is that it doesn't work on inner
> > > > >    traversals. For example, Repeat gives out only 1 traverser every
> > > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > > >    2. We want to implement a validation strategy. Sort of like
> > > > >    EventStrategy, but it will notify before a mutation, and will
> > > > >    enable the user's validation code to cancel a mutation if it
> > > > >    doesn't pass its checks. The problem is that there are no
> > > > >    "before" callbacks for the Mutating interface. We also thought
> > > > >    the strategy could just add a validation step before each
> > > > >    mutating step, but that had its own issues. Also, the validation
> > > > >    strategy won't work on stuff like graph.addVertex(), but I guess
> > > > >    we can make sure people only use the traversal.
> > > > >    3. Adding in bulk - we added our own functions for bulk
> > > > >    inserts, since we didn't find anything to support it in the
> > > > >    API. The thing is, we need this ability as part of the
> > > > >    traversal, so we can utilize the validation strategy (if we can
> > > > >    get that working). We thought about inheriting from the Add
> > > > >    steps, but they're final. It'd be great to have something like
> > > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > > > >    make it bulk load the vertices.
> > > > >
> > > > > Thank you for your help!
> > > > >
> > > > >
> > > > > On Tue, 19 May 2015 at 01:37 Stephen Mallette
> > > > > <spmallette@gmail.com> wrote:
> > > > >
> > > > > > Thanks for sharing all that additional information.
> > > > > >
> > > > > > > The biggest issue I had was implementing custom steps.
> > > > > >
> > > > > > I think we have a bit of a hole in the docs around that kind of
> > > > > > stuff at the moment.  You have to be careful with custom steps
> > > > > > because the TraversalStrategy implementations might not behave
> > > > > > nicely if they come across steps they don't know about.  We've
> > > > > > been trying to understand the right set of recommendations to
> > > > > > give around that issue, which is most of the reason we probably
> > > > > > don't have docs developed yet.  If you'd like to elaborate as
> > > > > > you offered, we'd be interested in hearing about your issues.
> > > > > >
> > > > > > > The Test Suite is awesome!
> > > > > >
> > > > > > That is excellent to hear.  Not many people have to interact
> > > > > > with the test suite directly, but it is a super critical part of
> > > > > > the TinkerPop Ecosystem - if those who have to use it aren't
> > > > > > satisfied with it, I'd consider that a big problem.
> > > > > >
> > > > > > > Just a thought, it would be great if failing tests would print
> > > > > > > some kind of "DEBUG" logs from the steps (or something like the
> > > > > > > profile step's output), so it's easier to figure out what step
> > > > > > > isn't working properly and why.
> > > > > >
> > > > > > Still trying to figure that out (i.e. what's the most useful way
> to
> > > > > "DEBUG"
> > > > > > things).  We don't do logging in gremlin-core so there isn't much
> > to
> > > > > output
> > > > > > there.  I'm hoping that this ticket will be useful in this area:
> > > > > >
> > > > > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > > > > >
> > > > > > I did give a look at your implementation code.  I noticed that
> you
> > > only
> > > > > had
> > > > > > to @OptOut of a couple of tests - not bad, though I'm not sure
> how
> > > much
> > > > > of
> > > > > > the test suite fires under your ElasticFeatures implementation.
> We
> > > > tried
> > > > > > to write tests to allow maximum coverage given the most common
> > > feature
> > > > > set
> > > > > > - hopefully you receive good coverage under that model.  Can you
> > > share
> > > > > what
> > > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > > >
> > > > > > Speaking of ElasticFeatures, you might want to make this a static
> > > > > > reference:
> > > > > >
> > > > > > https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > > > > >
> > > > > > and try to generally reduce anonymous object creation within
> > > > > > ElasticFeatures itself.  You don't want to create a new instance
> > > > > > of that stuff for every feature check - we do internal feature
> > > > > > checking in different parts of the stack and it could create a
> > > > > > lot of unnecessary objects for you.
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <rm...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hey Stephen,
> > > > > > >
> > > > > > > ElasticGraph can be seen as an alternative to Titan - a big
> > > > > > > scaled-out graph with indices (currently we only have OLTP, but
> > > > > > > will add OLAP soon).
> > > > > > > We're a company that started out a project using Titan, but it
> > > > > > > lacked some capabilities we needed:
> > > > > > >
> > > > > > >    - Speed, especially with regards to using text/number/geo
> > > > > > >    indices. Our benchmarks showed that ES could function much
> > > > > > >    faster than the performance we were getting from Titan.
> > > > > > >    - Partitioning the data - useful for optimizing indexed
> > > > > > >    queries on ES (Titan also uses ES, but doesn't include these
> > > > > > >    optimizations). Plus, it allows you to manage the data for
> > > > > > >    your specific needs. For example, if you have a graph with
> > > > > > >    real-time events coming in, and you want to periodically
> > > > > > >    delete all the old events, you can partition the data by
> > > > > > >    time.
> > > > > > >    - The spatial capabilities didn't support all the features
> > > > > > >    we needed.
> > > > > > >    - Titan's future was in question
> > > > > > >    <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
> > > > > > >    - And a bunch of other small issues.
> > > > > > >
> > > > > > > We thought about contributing to Titan to add these
> > > > > > > capabilities, but Titan's architecture (which separates the
> > > > > > > indexing backend from the "main" store) made it difficult. Plus
> > > > > > > Titan has a big codebase supporting many different BEs. In the
> > > > > > > end we figured it would just be simpler to implement TP
> > > > > > > directly on ES. It also spares us from maintaining an extra
> > > > > > > hbase/cassandra cluster.
> > > > > > > We figured more people might have stumbled across these issues,
> > > > > > > so we're sharing the code.
> > > > > > >
> > > > > > > Numbers - we've gotten up to a few billion at this point in our
> > > > > > > tests, but I'm pretty confident in its ability to scale
> > > > > > > further.
> > > > > > >
> > > > > > > As for developing for TP, it's been mostly great :) The
> > > > > > > architecture is very powerful, and gremlin 3 is turning out to
> > > > > > > be a great querying language. And most importantly, it's fast
> > > > > > > to implement.
> > > > > > > The biggest issue I had was implementing custom steps. Apart
> > > > > > > from GraphStep (which has a simple example in TinkerGraph), the
> > > > > > > other steps are pretty hard to figure out. For example, we
> > > > > > > implemented a VertexStep that batches up traversers and their
> > > > > > > has steps to query them together, and had many issues (I can
> > > > > > > elaborate if you want). We actually still have a pretty big
> > > > > > > issue I'll raise in another thread.
> > > > > > >
> > > > > > > The Test Suite is awesome! It would be practically impossible
> > > > > > > to implement TP so fast and easily without it. Just a thought,
> > > > > > > it would be great if failing tests would print some kind of
> > > > > > > "DEBUG" logs from the steps (or something like the profile
> > > > > > > step's output), so it's easier to figure out what step isn't
> > > > > > > working properly and why.
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette
> > > > > > > <spmallette@gmail.com> wrote:
> > > > > > >
> > > > > > > > Thanks for sharing your project. Looks like you've
> > > > > > > > implemented both the structure and process suites in
> > > > > > > > ElasticGraph up to the latest M9 release candidate - very
> > > > > > > > nice.
> > > > > > > >
> > > > > > > > Where would you say that this implementation fits?  Are there
> > > > > > > > specific use cases where you would want to use ElasticGraph
> > > > > > > > over other implementations? When you say that "we're already
> > > > > > > > using it with very big graphs" can you qualify that a bit
> > > > > > > > (millions of edges, billions of edges, etc.)?
> > > > > > > >
> > > > > > > > Finally, more specifically related to TinkerPop, did you
> > > > > > > > encounter any challenges in implementing the APIs or the Test
> > > > > > > > Suite itself?
> > > > > > > >
> > > > > > > >
> > > > > > > >
> > > > > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rmagen@gmail.com>
> > > > > > > > wrote:
> > > > > > > >
> > > > > > > > > Hey guys,
> > > > > > > > > Just wanted to let you know about a TP3 implementation
> > > > > > > > > we're working on.
> > > > > > > > > It's based on elastic-search, enabling very good
> > > > > > > > > scalability and indexing capabilities.
> > > > > > > > > You can find the code here
> > > > > > > > > <https://github.com/rmagen/elastic-gremlin>.
> > > > > > > > >
> > > > > > > > > This is still very much a work in progress (still more
> > > > > > > > > features and optimizations planned, and some bugs to fix),
> > > > > > > > > but we're already using it with very big graphs.
> > > > > > > > >
> > > > > > > > > I would appreciate any feedback!
> > > > > > > > > Cheers,
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>
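
The "static reference" advice quoted above can be sketched roughly as
follows. This is only an illustration under stated assumptions: Features
here is a simplified, hypothetical interface (not TinkerPop's Graph.Features),
the class names are invented, and the two feature values are placeholders
rather than ElasticGraph's actual settings.

```java
public class FeaturesSketch {

    /** Hypothetical, simplified feature descriptor (not TinkerPop's Graph.Features). */
    public interface Features {
        boolean supportsTransactions();
        boolean supportsNumericIds();
    }

    /** A named, stateless implementation instead of anonymous classes built per call. */
    public static final class ElasticFeaturesSketch implements Features {
        // Placeholder values chosen for the demo only.
        @Override public boolean supportsTransactions() { return false; }
        @Override public boolean supportsNumericIds() { return false; }
    }

    /** Hypothetical stand-in for the graph class. */
    public static class GraphSketch {
        // Created once when the class is loaded and shared by every graph instance.
        private static final Features FEATURES = new ElasticFeaturesSketch();

        /** Returns the shared instance, so a feature check allocates nothing. */
        public Features features() {
            return FEATURES;
        }
    }

    public static void main(String[] args) {
        GraphSketch a = new GraphSketch();
        GraphSketch b = new GraphSketch();
        // Both graphs hand out the very same Features object.
        System.out.println(a.features() == b.features()); // prints "true"
    }
}
```

Because the descriptor holds no state, sharing one instance is safe, and
repeated feature checks elsewhere in the stack no longer create garbage.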

Re: elastic-gremlin

Posted by Ran Magen <rm...@gmail.com>.

>i may have messed up the Mutating interface design a bit.  looking at it
now, i feel like it could be less coupled to the EventStrategy related
features.  I'll take a look at it to see if I can make it "better" before
GA.  I don't think my changes should affect vendors or the test suites, so
if it turns out to be that way i'll give it a shot.

Any progress? Should I open a ticket for this?

On Wed, 20 May 2015 at 22:17 Stephen Mallette <sp...@gmail.com> wrote:

> > I guess today these features don't work because the Suite classes
> > initialize the tests
>
> right - because we have the custom test suites, the tests are determined
> more dynamically, so your ability to right-click/run is kinda lost. :/
>
> On Wed, May 20, 2015 at 2:47 PM, Ran Magen <rm...@gmail.com> wrote:
>
> > >I don't have a better idea than the environment variable.  you should be
> > able to use the debugger though.  works for me in intellij when i've
> looked
> > at a problem in titan.  i'm not sure if it only works because i have the
> > tinkerpop source on my system, but i can step through tinkerpop source
> > and titan source interchangeably.  i don't think i did anything specific
> > to enable that.
> >
> > I wasn't clear. I use IntelliJ, and it has simple shortcuts to run tests:
> > right-clicking on a test method/class and clicking run, rerunning only
> > failed tests, etc. This could really help in cases where I need to debug
> > a test and put a breakpoint somewhere in the code. If other tests run
> > before it, the breakpoints will usually get hit lots of times. I guess
> > today these features don't work because the Suite classes initialize the
> > tests. I don't know enough about JUnit to offer solutions; I thought you
> > might have some.
> >
> > >perhaps you could provide links to relevant code.  i'm sorry to say that
> > most times the answer to this kind of stuff isn't obvious.
> >
> > Okay, I'll get some example code.
> >
> > >i may have messed up the Mutating interface design a bit. looking at
> > it now, i feel like it could be less coupled to the EventStrategy related
> > features.  I'll take a look at it to see if I can make it "better" before
> > GA.
> >
> > Great, that would be a big help!
> >
> > >we don't have much on bulk insertion in the API. perhaps you should
> create
> > an issue for discussion
> >
> > https://issues.apache.org/jira/browse/TINKERPOP3-694
> >
> >
> > Thanks again for all the help
> >
> > On Wed, 20 May 2015 at 19:53 Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > >
> > > > The Process coverage seems good. I believe most of the failures are
> due
> > > to
> > > > the fact that I only support string IDs (I think not all tests call
> the
> > > > convertId method).
> > >
> > >
> > > hmmm - thought we had rooted all of those out via work with pieter
> martin
> > > on sqlg.  please let me know which ones still aren't making those
> calls.
> > >
> > >
> > > > It would also be great if we could easily run specific tests or
> > > > classes using JUnit. At the moment it's cumbersome to run a class of
> > > > tests (updating the environment variable each time), and impossible
> > > > to debug a specific test easily (or at least I haven't found a way).
> > > >
> > >
> > > I don't have a better idea than the environment variable.  you should
> be
> > > able to use the debugger though.  works for me in intellij when i've
> > looked
> > > at a problem in titan.  i'm not sure if it only works because i have
> the
> > > tinkerpop source on my system, but i can step through tinkerpop source
> > and
> > > titan source interchangeably.  i don't think i did anything specific to
> > > enable that.
> > >
> > >
> > > >    1. We made a custom VertexStep that aggregates traversers and
> > > >    their has steps, to minimize the number of queries issued. It
> > > >    messed up a few things, but we got the basic usage working in M9
> > > >    (I guess you fixed some stuff for Titan, which does the same
> > > >    thing). The problem now is that it doesn't work on inner
> > > >    traversals. For example, Repeat gives out only 1 traverser every
> > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > >
> > >
> > > perhaps you could provide links to relevant code.  i'm sorry to say
> that
> > > most times the answer to this kind of stuff isn't obvious.
> > >
> > >
> > > >    2. We want to implement a validation strategy. Sort of like
> > > >    EventStrategy, but it will notify before a mutation, and will
> enable
> > > the
> > > >    user's validation code to cancel a mutation if it doesn't pass its
> > > > checks.
> > > >    The problem is that there are no "before" callbacks for the
> Mutating
> > > >    interface.
> > > >
> > >
> > > i may have messed up the Mutating interface design a bit.  looking at
> it
> > > now, i feel like it could be less coupled to the EventStrategy related
> > > features.  I'll take a look at it to see if I can make it "better"
> before
> > > GA.  I don't think my changes should affect vendors or the test suites,
> > so
> > > if it turns out to be that way i'll give it a shot.
> > >
> > >
> > > >    3. Adding in bulk - we added our own functions for bulk inserts,
> > > >    since we didn't find anything to support it in the API. The thing
> > > >    is, we need this ability as part of the traversal, so we can
> > > >    utilize the validation strategy (if we can get that working). We
> > > >    thought about inheriting from the Add steps, but they're final.
> > > >    It'd be great to have something like
> > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > > >    make it bulk load the vertices.
> > >
> > >
> > > we're trying to avoid problems with improper inheritance which messes
> > with
> > > traversal strategies - hence steps are typically "final".   we don't
> have
> > > much on bulk insertion in the API.  perhaps you should create an issue
> > for
> > > discussion.
> > >
> > > On Wed, May 20, 2015 at 11:08 AM, Ran Magen <rm...@gmail.com> wrote:
> > >
> > > > > percentage of the tests fire for you given ElasticFeatures?
> > > >
> > > > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored,
> > > > 320 passed
> > > > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed,
> > > > 321 ignored, 394 passed
> > > > The Process coverage seems good. I believe most of the failures are
> > > > due to the fact that I only support string IDs (I think not all
> > > > tests call the convertId method), and to some new stuff in M9 that I
> > > > haven't gotten around to fixing yet. But I'll make sure to open
> > > > tickets for anything I find.
> > > > It would also be great if we could easily run specific tests or
> > > > classes using JUnit. At the moment it's cumbersome to run a class of
> > > > tests (updating the environment variable each time), and impossible
> > > > to debug a specific test easily (or at least I haven't found a way).
> > > >
> > > > > we'd be interested in hearing about your issues.
> > > >
> > > >    1. We made a custom VertexStep that aggregates traversers and
> > > >    their has steps, to minimize the number of queries issued. It
> > > >    messed up a few things, but we got the basic usage working in M9
> > > >    (I guess you fixed some stuff for Titan, which does the same
> > > >    thing). The problem now is that it doesn't work on inner
> > > >    traversals. For example, Repeat gives out only 1 traverser every
> > > >    time. Do you have any suggestions? Am I doing something wrong?
> > > >    2. We want to implement a validation strategy. Sort of like
> > > >    EventStrategy, but it will notify before a mutation, and will
> > > >    enable the user's validation code to cancel a mutation if it
> > > >    doesn't pass its checks. The problem is that there are no "before"
> > > >    callbacks for the Mutating interface. We also thought the strategy
> > > >    could just add a validation step before each mutating step, but
> > > >    that had its own issues. Also, the validation strategy won't work
> > > >    on stuff like graph.addVertex(), but I guess we can make sure
> > > >    people only use the traversal.
> > > >    3. Adding in bulk - we added our own functions for bulk inserts,
> > > >    since we didn't find anything to support it in the API. The thing
> > > >    is, we need this ability as part of the traversal, so we can
> > > >    utilize the validation strategy (if we can get that working). We
> > > >    thought about inheriting from the Add steps, but they're final.
> > > >    It'd be great to have something like
> > > >    __.inject(vertices).as('x').addV('x'), and have the ability to
> > > >    make it bulk load the vertices.
> > > >
> > > > Thank you for your help!
> > > >
> > > >
> > > > On Tue, 19 May 2015 at 01:37 Stephen Mallette <sp...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for sharing all that additional information.
> > > > >
> > > > > > The biggest issue I had was implementing custom steps.
> > > > >
> > > > > I think we have a bit of a hole in the docs around that kind of
> > > > > stuff at the moment.  You have to be careful with custom steps
> > > > > because the TraversalStrategy implementations might not behave
> > > > > nicely if they come across steps they don't know about.  We've
> > > > > been trying to understand the right set of recommendations to give
> > > > > around that issue, which is most of the reason we probably don't
> > > > > have docs developed yet.  If you'd like to elaborate as you
> > > > > offered, we'd be interested in hearing about your issues.
> > > > >
> > > > > > The Test Suite is awesome!
> > > > >
> > > > > That is excellent to hear.  Not many people have to interact with
> > > > > the test suite directly, but it is a super critical part of the
> > > > > TinkerPop Ecosystem - if those who have to use it aren't satisfied
> > > > > with it, I'd consider that a big problem.
> > > > >
> > > > > > Just a thought, it would be great if failing tests would print
> > > > > > some kind of "DEBUG" logs from the steps (or something like the
> > > > > > profile step's output), so it's easier to figure out what step
> > > > > > isn't working properly and why.
> > > > >
> > > > > Still trying to figure that out (i.e. what's the most useful way to
> > > > "DEBUG"
> > > > > things).  We don't do logging in gremlin-core so there isn't much
> to
> > > > output
> > > > > there.  I'm hoping that this ticket will be useful in this area:
> > > > >
> > > > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > > > >
> > > > > I did give a look at your implementation code.  I noticed that you
> > only
> > > > had
> > > > > to @OptOut of a couple of tests - not bad, though I'm not sure how
> > much
> > > > of
> > > > > the test suite fires under your ElasticFeatures implementation.  We
> > > tried
> > > > > to write tests to allow maximum coverage given the most common
> > feature
> > > > set
> > > > > - hopefully you receive good coverage under that model.  Can you
> > share
> > > > what
> > > > > percentage of the tests fire for you given ElasticFeatures?
> > > > >
> > > > > Speaking of ElasticFeatures, you might want to make this a static
> > > > > reference:
> > > > >
> > > > > https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > > > >
> > > > > and try to generally reduce anonymous object creation within
> > > > > ElasticFeatures itself.  You don't want to create a new instance of
> > > > > that stuff for every feature check - we do internal feature
> > > > > checking in different parts of the stack and it could create a lot
> > > > > of unnecessary objects for you.
> > > > >
> > > > >
> > > > >
> > > > >
> > > > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <rm...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hey Stephen,
> > > > > >
> > > > > > ElasticGraph can be seen as an alternative to Titan - a big
> > > > > > scaled-out graph with indices (currently we only have OLTP, but
> > > > > > will add OLAP soon).
> > > > > > We're a company that started out a project using Titan, but it
> > > > > > lacked some capabilities we needed:
> > > > > >
> > > > > >    - Speed, especially with regards to using text/number/geo
> > indices.
> > > > Our
> > > > > >    benchmarks showed that ES could function much faster than the
> > > > > > performance
> > > > > >    we were getting from Titan.
> > > > > >    - Partitioning the data - useful for optimizing indexed
> queries
> > on
> > > > ES
> > > > > >    (Titan also uses ES, but doesn't include these optimizations).
> > > Plus,
> > > > > it
> > > > > >    allows you to manage the data for your specific needs. For
> > example
> > > > if
> > > > > > you
> > > > > >    have a graph with real-time events coming in, and you want to
> > > > > > periodically
> > > > > >    delete all the old events, you can partition the data by time.
> > > > > >    - The spatial capabilities didn't support all the features we
> > > > needed.
> > > > > >    - Titan's future was in question
> > > > > >    <
> > > > > >
> > > > >
> > > >
> > >
> >
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> > > > > > >
> > > > > >    .
> > > > > >    - And a bunch of other small issues.
> > > > > >
> > > > > > > We thought about contributing to Titan to add these capabilities,
> > but
> > > > > > Titan's architecture (which separates the indexing backend from
> the
> > > > > "main"
> > > > > > store) made it difficult. Plus Titan has a big codebase
> supporting
> > > many
> > > > > > different BEs. At the end we figured it would just be simpler to
> > > > implement
> > > > > > > TP directly on ES. It also spares us from maintaining an extra
> > > > > > hbase/cassandra cluster.
> > > > > > We figured more people might have stumbled across these issues,
> so
> > > > we're
> > > > > > sharing the code.
> > > > > >
> > > > > > Numbers - we've gotten up to a few billions at this point in our
> > > tests,
> > > > > but
> > > > > > I'm pretty confident on its ability to scale further.
> > > > > >
> > > > > > As for developing for TP, it's been mostly great :) The
> > architecture
> > > is
> > > > > > very powerful, and gremlin 3 is turning out to be a great
> querying
> > > > > > language. And most importantly, it's fast to implement it.
> > > > > > The biggest issue I had was implementing custom steps. Apart from
> > > > > GraphStep
> > > > > > (which has a simple example in TinkerGraph), the other steps are
> > > pretty
> > > > > > hard to figure out. For example we implemented a VertexStep that
> > > > batches
> > > > > up
> > > > > > traversers and their has steps to query them together, and had
> many
> > > > > issues
> > > > > > (I can elaborate if you want). We actually still have a pretty
> big
> > > > issue
> > > > > > I'll raise in another thread.
> > > > > >
> > > > > > The Test Suite is awesome! It would be practically impossible to
> > > > > implement
> > > > > > TP so fast and easily without it. Just a thought, it would be
> great
> > > if
> > > > > > failing tests would print some kind of "DEBUG" logs from the
> steps
> > > (or
> > > > > > something like the profile step's output), so it's easier to
> figure
> > > out
> > > > > > what step isn't working properly and why .
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <
> > spmallette@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Thanks for sharing your project. Looks like you've implemented
> > both
> > > > the
> > > > > > > structure and process suites in ElasticGraph up to the latest
> M9
> > > > > release
> > > > > > > candidate - very nice.
> > > > > > >
> > > > > > > Where would you say that this implementation fits?  Are there
> > > > specific
> > > > > > uses
> > > > > > > cases where you would want to use ElasticGraph over other
> > > > > > implementations?
> > > > > > > When you say that "we're already using it with very big graphs"
> > can
> > > > you
> > > > > > > qualify that a bit (millions of edge, billions of edges, etc.)?
> > > > > > >
> > > > > > > Finally, more specifically related to TinkerPop, did you
> > encounter
> > > > any
> > > > > > > challenges in implementing the APIs or the Test Suite itself?
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com>
> > > wrote:
> > > > > > >
> > > > > > > > Hey guys,
> > > > > > > > Just wanted to let you know about a TP3 implementation we're
> > > > working
> > > > > > on.
> > > > > > > > It's based on elastic-search, enabling very good scalability
> > and
> > > > > > indexing
> > > > > > > > capabilities.
> > > > > > > > You can find the code here <
> > > > > https://github.com/rmagen/elastic-gremlin
> > > > > > >.
> > > > > > > >
> > > > > > > > This is still very much a work in progress (still more
> features
> > > and
> > > > > > > > optimizations planned, and some bugs to fix), but we're
> already
> > > > using
> > > > > > it
> > > > > > > > with very big graphs.
> > > > > > > >
> > > > > > > > I would appreciate any feedback!
> > > > > > > > Cheers,
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: elastic-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.

>  I guess today these features don't work because the Suite classes
initialize the tests

right - because we have the custom test suites, the tests are determined more
dynamically, so your ability to right-click/run is kinda lost. :/
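
For anyone following along, here is a minimal sketch of the kind of environment-variable filter being discussed. It is not the actual suite code: the class name, the method, and the variable name HYPOTHETICAL_TEST_FILTER are all made up for illustration, and the real suites may select tests differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class TestFilterSketch {

    /**
     * Keeps only the test classes whose simple name appears in the
     * comma-separated override string. A null or blank override keeps
     * every test, which mirrors "no filter set".
     */
    public static List<String> selectTests(List<String> allTests, String override) {
        if (override == null || override.trim().isEmpty()) {
            return new ArrayList<>(allTests);
        }
        Set<String> wanted = Arrays.stream(override.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())
                .collect(Collectors.toSet());
        // Preserve the suite's original ordering while filtering.
        return allTests.stream()
                .filter(wanted::contains)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> suite = Arrays.asList("VertexTest", "EdgeTest", "GraphTest");
        // HYPOTHETICAL_TEST_FILTER is an invented name, not a real TinkerPop setting.
        String override = System.getenv("HYPOTHETICAL_TEST_FILTER");
        System.out.println(selectTests(suite, override));
    }
}
```

Because the list of tests is computed when the suite starts, an IDE that expects a fixed set of test methods on a class has nothing to hook its right-click/run action onto.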

On Wed, May 20, 2015 at 2:47 PM, Ran Magen <rm...@gmail.com> wrote:

> >I don't have a better idea than the environment variable.  you should be
> able to use the debugger though.  works for me in intellij when i've looked
> at a problem in titan.  i'm not sure if it only works because i have the
> tinkerpop source on my system, but i can step through tinkerpop source
> and titan source interchangeably.  i don't think i did anything specific
> to enable that.
>
> I wasn't clear. I use IntelliJ, and it has simple shortcuts to run tests:
> right-clicking on a test method/class and clicking run, rerunning only
> failed tests, etc. This could really help in cases where I need to debug a
> test and put a breakpoint somewhere in the code. If other tests run
> first, the breakpoint will usually get hit lots of times. I guess today
> these features don't work because the Suite classes initialize the tests. I
> don't know enough about JUnit to offer solutions, but thought you might.
>
> >perhaps you could provide links to relevant code.  i'm sorry to say that
> most times the answer to this kind of stuff isn't obvious.
>
> Okay, I'll get some example code.
>
> >i may have messed up the Mutating interface design a bit. looking at
> it now, i feel like it could be less coupled to the EventStrategy related
> features.  I'll take a look at it to see if I can make it "better" before
> GA.
>
> Great, that would be a big help!
>
> >we don't have much on bulk insertion in the API. perhaps you should create
> an issue for discussion
>
> https://issues.apache.org/jira/browse/TINKERPOP3-694
>
>
> Thanks again for all the help
>
> On Wed, 20 May 2015 at 19:53 Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > >
> > > The Process coverage seems good. I believe most of the failures are due
> > to
> > > the fact that I only support string IDs (I think not all tests call the
> > > convertId method).
> >
> >
> > hmmm - thought we had rooted all of those out via work with pieter martin
> > on sqlg.  please let me know which ones still aren't making those calls.
> >
> >
> > > It would also be great if we could easily run specific tests or classes
> > > using JUnit. At the moment it's cumbersome to run a class of tests
> > > (updating the environment variable each time), and impossible to
> debug a
> > > specific test easily (or at least I haven't found a way).
> > >
> >
> > I don't have a better idea than the environment variable.  you should be
> > able to use the debugger though.  works for me in intellij when i've
> looked
> > at a problem in titan.  i'm not sure if it only works because i have the
> > tinkerpop source on my system, but i can step through tinkerpop source
> and
> > titan source interchangeably.  i don't think i did anything specific to
> > enable that.
> >
> >
> > >    1. We made a custom VertexStep that aggregates traversers, and has
> > >    steps, to minimize the amount of queries issued. It messed up a few
> > > things,
> > >    but we got the basic usage working in M9 (guess you fixed some stuff
> > for
> > >    Titan, which do the same thing). The problem now is that it doesn't
> > > work on
> > >    inner traversals. For example, Repeat gives out only 1 traverser
> every
> > >    time. Do you have any suggestions? Am I doing something wrong?
> > >
> >
> > perhaps you could provide links to relevant code.  i'm sorry to say that
> > most times the answer to this kind of stuff isn't obvious.
> >
> >
> > >    2. We want to implement a validation strategy. Sort of like
> > >    EventStrategy, but it will notify before a mutation, and will enable
> > the
> > >    user's validation code to cancel a mutation if it doesn't pass its
> > > checks.
> > >    The problem is that there are no "before" callbacks for the Mutating
> > >    interface.
> > >
> >
> > i may have messed up the Mutating interface design a bit.  looking at it
> > now, i feel like it could be less coupled to the EventStrategy related
> > features.  I'll take a look at it to see if I can make it "better" before
> > GA.  I don't think my changes should affect vendors or the test suites,
> so
> > if it turns out to be that way i'll give it a shot.
> >
> >
> > >    3. Adding in bulk - we added our own functions for bulk inserts,
> since
> > >    we didn't find anything to support it in the API. The thing is we
> need
> > > this
> > >    ability as part of the traversal, so we can utilize the validation
> > > strategy
> > >    (if we can get that working). We thought about inheriting from the
> Add
> > >    steps, but they're final. It'd be great to have something like
> > >    __.inject(vertices).as('x').addV('x'), and have the ability to make
> it
> > > bulk
> > >    load the vertices.
> >
> >
> > we're trying to avoid problems with improper inheritance which messes
> with
> > traversal strategies - hence steps are typically "final".   we don't have
> > much on bulk insertion in the API.  perhaps you should create an issue
> for
> > discussion.
> >
> > On Wed, May 20, 2015 at 11:08 AM, Ran Magen <rm...@gmail.com> wrote:
> >
> > > > percentage of the tests fire for you given ElasticFeatures?
> > >
> > > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320
> > > passed
> > > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed, 321
> > > ignored, 394 passed
> > > The Process coverage seems good. I believe most of the failures are due
> > to
> > > the fact that I only support string IDs (I think not all tests call the
> > > convertId method). And some new stuff in M9 that I haven't gotten
> around
> > to
> > > fixing yet. But I'll make sure and open tickets for anything I find.
> > > It would also be great if we could easily run specific tests or classes
> > > > using JUnit. At the moment it's cumbersome to run a class of tests
> > > > (updating the environment variable each time), and impossible to
> debug a
> > > specific test easily (or at least I haven't found a way).
> > >
> > > > we'd be interested in hearing about your issues.
> > >
> > >    1. We made a custom VertexStep that aggregates traversers, and has
> > >    steps, to minimize the amount of queries issued. It messed up a few
> > > things,
> > >    but we got the basic usage working in M9 (guess you fixed some stuff
> > for
> > >    Titan, which do the same thing). The problem now is that it doesn't
> > > work on
> > >    inner traversals. For example, Repeat gives out only 1 traverser
> every
> > >    time. Do you have any suggestions? Am I doing something wrong?
> > >    2. We want to implement a validation strategy. Sort of like
> > >    EventStrategy, but it will notify before a mutation, and will enable
> > the
> > >    user's validation code to cancel a mutation if it doesn't pass its
> > > checks.
> > >    The problem is that there are no "before" callbacks for the Mutating
> > >    interface. We also thought the strategy could just add a validation
> > step
> > >    before each mutating step, but that had its own issues. Also, the
> > >    validation strategy won't work on stuff like graph.addVertex(), but
> I
> > > guess
> > >    we can make sure people only use the traversal.
> > >    3. Adding in bulk - we added our own functions for bulk inserts,
> since
> > >    we didn't find anything to support it in the API. The thing is we
> need
> > > this
> > >    ability as part of the traversal, so we can utilize the validation
> > > strategy
> > >    (if we can get that working). We thought about inheriting from the
> Add
> > > >    steps, but they're final. It'd be great to have something like
> > >    __.inject(vertices).as('x').addV('x'), and have the ability to make
> it
> > > bulk
> > >    load the vertices.
> > >
> > > Thank you for your help!
> > >
> > >
> > > On Tue, 19 May 2015 at 01:37 Stephen Mallette <sp...@gmail.com>
> > > wrote:
> > >
> > > > Thanks for sharing all that additional information.
> > > >
> > > > > The biggest issue I had was implementing custom steps.
> > > >
> > > > I think we have a bit of a hole in the docs around that kind of
> stuff
> > at
> > > > the moment.  You have to be careful with custom steps because the
> > > > TraversalStrategy implementations might not behave nicely if they
> come
> > > > across steps they don't know about.  We've been trying to understand
> > the
> > > > right set of recommendations to give around that issue which is most
> of
> > > the
> > > > reason we probably don't have docs developed yet.  If you'd like to
> > > > elaborate as you offered, we'd be interested in hearing about your
> > > issues.
> > > >
> > > > > The Test Suite is awesome!
> > > >
> > > > That is excellent to hear.  Not many people have to interact with the
> > > test
> > > > suite directly but it is super critical part of the TinkerPop
> > Ecosystem -
> > > > if those who have to use it aren't satisfied with it, I'd consider
> > that a
> > > > big problem.
> > > >
> > > > > Just a thought, it would be great if failing tests would print some
> > > kind
> > > > of "DEBUG" logs from the steps (or something like the profile step's
> > > > output), so it's easier to figure out what step isn't working
> properly
> > > and
> > > > why .
> > > >
> > > > Still trying to figure that out (i.e. what's the most useful way to
> > > "DEBUG"
> > > > things).  We don't do logging in gremlin-core so there isn't much to
> > > output
> > > > there.  I'm hoping that this ticket will be useful in this area:
> > > >
> > > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > > >
> > > > I did give a look at your implementation code.  I noticed that you
> only
> > > had
> > > > to @OptOut of a couple of tests - not bad, though I'm not sure how
> much
> > > of
> > > > the test suite fires under your ElasticFeatures implementation.  We
> > tried
> > > > to write tests to allow maximum coverage given the most common
> feature
> > > set
> > > > - hopefully you receive good coverage under that model.  Can you
> share
> > > what
> > > > percentage of the tests fire for you given ElasticFeatures?
> > > >
> > > > Speaking of ElasticFeatures, you might want to make this a static
> > > > reference:
> > > >
> > > >
> > > >
> > >
> >
> https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > > >
> > > > and try to generally reduce anonymous object creation within
> > > > ElasticFeatures itself.  You don't want to create a new instance of
> > that
> > > > stuff for every feature check - we do internal feature checking in
> > > > different part of the stack and it could create a lot
> > > > of unnecessary objects for you.
> > > >
> > > >
> > > >
> > > >
> > > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <rm...@gmail.com> wrote:
> > > >
> > > > > Hey Stephen,
> > > > >
> > > > > ElasticGraph can be seen as an alternative to Titan - a big
> > scaled-out
> > > > > > graph with indices (currently we only have OLTP, but will add
> OLAP
> > > > soon).
> > > > > We're a company that started out a project using Titan, but it
> lacked
> > > > some
> > > > > capabilities we needed:
> > > > >
> > > > >    - Speed, especially with regards to using text/number/geo
> indices.
> > > Our
> > > > >    benchmarks showed that ES could function much faster than the
> > > > > performance
> > > > >    we were getting from Titan.
> > > > >    - Partitioning the data - useful for optimizing indexed queries
> on
> > > ES
> > > > >    (Titan also uses ES, but doesn't include these optimizations).
> > Plus,
> > > > it
> > > > >    allows you to manage the data for your specific needs. For
> example
> > > if
> > > > > you
> > > > >    have a graph with real-time events coming in, and you want to
> > > > > periodically
> > > > >    delete all the old events, you can partition the data by time.
> > > > >    - The spatial capabilities didn't support all the features we
> > > needed.
> > > > >    - Titan's future was in question
> > > > >    <
> > > > >
> > > >
> > >
> >
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> > > > > >
> > > > >    .
> > > > >    - And a bunch of other small issues.
> > > > >
> > > > > > We thought about contributing to Titan to add these capabilities,
> but
> > > > > Titan's architecture (which separates the indexing backend from the
> > > > "main"
> > > > > store) made it difficult. Plus Titan has a big codebase supporting
> > many
> > > > > different BEs. At the end we figured it would just be simpler to
> > > implement
> > > > > > TP directly on ES. It also spares us from maintaining an extra
> > > > > hbase/cassandra cluster.
> > > > > We figured more people might have stumbled across these issues, so
> > > we're
> > > > > sharing the code.
> > > > >
> > > > > Numbers - we've gotten up to a few billions at this point in our
> > tests,
> > > > but
> > > > > I'm pretty confident on its ability to scale further.
> > > > >
> > > > > As for developing for TP, it's been mostly great :) The
> architecture
> > is
> > > > > very powerful, and gremlin 3 is turning out to be a great querying
> > > > > language. And most importantly, it's fast to implement it.
> > > > > The biggest issue I had was implementing custom steps. Apart from
> > > > GraphStep
> > > > > (which has a simple example in TinkerGraph), the other steps are
> > pretty
> > > > > hard to figure out. For example we implemented a VertexStep that
> > > batches
> > > > up
> > > > > traversers and their has steps to query them together, and had many
> > > > issues
> > > > > (I can elaborate if you want). We actually still have a pretty big
> > > issue
> > > > > I'll raise in another thread.
> > > > >
> > > > > The Test Suite is awesome! It would be practically impossible to
> > > > implement
> > > > > TP so fast and easily without it. Just a thought, it would be great
> > if
> > > > > failing tests would print some kind of "DEBUG" logs from the steps
> > (or
> > > > > something like the profile step's output), so it's easier to figure
> > out
> > > > > what step isn't working properly and why .
> > > > >
> > > > >
> > > > >
> > > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <
> spmallette@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Thanks for sharing your project. Looks like you've implemented
> both
> > > the
> > > > > > structure and process suites in ElasticGraph up to the latest M9
> > > > release
> > > > > > candidate - very nice.
> > > > > >
> > > > > > Where would you say that this implementation fits?  Are there
> > > specific
> > > > > uses
> > > > > > cases where you would want to use ElasticGraph over other
> > > > > implementations?
> > > > > > When you say that "we're already using it with very big graphs"
> can
> > > you
> > > > > > qualify that a bit (millions of edge, billions of edges, etc.)?
> > > > > >
> > > > > > Finally, more specifically related to TinkerPop, did you
> encounter
> > > any
> > > > > > challenges in implementing the APIs or the Test Suite itself?
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com>
> > wrote:
> > > > > >
> > > > > > > Hey guys,
> > > > > > > Just wanted to let you know about a TP3 implementation we're
> > > working
> > > > > on.
> > > > > > > It's based on elastic-search, enabling very good scalability
> and
> > > > > indexing
> > > > > > > capabilities.
> > > > > > > You can find the code here <
> > > > https://github.com/rmagen/elastic-gremlin
> > > > > >.
> > > > > > >
> > > > > > > This is still very much a work in progress (still more features
> > and
> > > > > > > optimizations planned, and some bugs to fix), but we're already
> > > using
> > > > > it
> > > > > > > with very big graphs.
> > > > > > >
> > > > > > > I would appreciate any feedback!
> > > > > > > Cheers,
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: elastic-gremlin

Posted by Ran Magen <rm...@gmail.com>.

>I don't have a better idea than the environment variable.  you should be
able to use the debugger though.  works for me in intellij when i've looked
at a problem in titan.  i'm not sure if it only works because i have the
tinkerpop source on my system, but i can step through tinkerpop source
and titan source interchangeably.  i don't think i did anything specific
to enable that.

I wasn't clear. I use IntelliJ, and it has simple shortcuts to run tests:
right-clicking on a test method/class and clicking run, rerunning only
failed tests, etc. This could really help in cases where I need to debug a
test and put a breakpoint somewhere in the code. If other tests run
first, the breakpoint will usually get hit lots of times. I guess today
these features don't work because the Suite classes initialize the tests. I
don't know enough about JUnit to offer solutions, but thought you might.

>perhaps you could provide links to relevant code.  i'm sorry to say that
most times the answer to this kind of stuff isn't obvious.

Okay, I'll get some example code.

>i may have messed up the Mutating interface design a bit. looking at
it now, i feel like it could be less coupled to the EventStrategy related
features.  I'll take a look at it to see if I can make it "better" before
GA.

Great, that would be a big help!
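
To make the request concrete, this is roughly the "before" hook we have in mind, as a minimal sketch. Nothing here is a TinkerPop type: ValidatingStore, addValidator and addVertex are hypothetical names, and a list of labels stands in for the graph. The only point is that validators run before the write and any one of them can cancel it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class BeforeMutationSketch {

    /** Hypothetical stand-in for a graph; stores vertex labels only. */
    public static class ValidatingStore {
        private final List<Predicate<String>> validators = new ArrayList<>();
        private final List<String> labels = new ArrayList<>();

        /** Registers user validation code to run before each mutation. */
        public void addValidator(Predicate<String> validator) {
            validators.add(validator);
        }

        /**
         * Runs every validator before mutating. If any validator rejects
         * the label, the add is cancelled and nothing is written.
         */
        public boolean addVertex(String label) {
            for (Predicate<String> validator : validators) {
                if (!validator.test(label)) {
                    return false; // vetoed before the mutation happened
                }
            }
            labels.add(label);
            return true;
        }

        public int size() {
            return labels.size();
        }
    }

    public static void main(String[] args) {
        ValidatingStore store = new ValidatingStore();
        store.addValidator(label -> !label.isEmpty());
        System.out.println(store.addVertex("person")); // accepted
        System.out.println(store.addVertex(""));       // rejected by the validator
        System.out.println(store.size());
    }
}
```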

>we don't have much on bulk insertion in the API. perhaps you should create
an issue for discussion

https://issues.apache.org/jira/browse/TINKERPOP3-694
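
As context for the ticket, a minimal sketch of the buffering idea behind bulk inserts, under stated assumptions: BulkBufferSketch and its sink callback are hypothetical names, not part of the TinkerPop API or of elastic-gremlin, and the sink stands in for whatever issues one bulk request to the backend.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BulkBufferSketch<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    /**
     * @param batchSize number of items to collect before writing a batch
     * @param sink      hypothetical callback that performs one bulk write
     */
    public BulkBufferSketch(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Queues an item and writes a batch once the buffer is full. */
    public void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Writes whatever is queued as a single batch; no-op when empty. */
    public void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending)); // hand over a copy, then reset
        pending.clear();
    }

    public static void main(String[] args) {
        BulkBufferSketch<String> buffer =
                new BulkBufferSketch<>(2, batch -> System.out.println("bulk write: " + batch));
        buffer.add("v1");
        buffer.add("v2"); // triggers a write of two items
        buffer.add("v3");
        buffer.flush();   // writes the remaining item
    }
}
```

Doing this inside the traversal, rather than in our own side functions, is what would let a validation strategy see the vertices before they are written.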


Thanks again for all the help

On Wed, 20 May 2015 at 19:53 Stephen Mallette <sp...@gmail.com> wrote:

> >
> > The Process coverage seems good. I believe most of the failures are due
> to
> > the fact that I only support string IDs (I think not all tests call the
> > convertId method).
>
>
> hmmm - thought we had rooted all of those out via work with pieter martin
> on sqlg.  please let me know which ones still aren't making those calls.
>
>
> > It would also be great if we could easily run specific tests or classes
> > using JUnit. At the moment it's cumbersome to run a class of tests
> > (updating the environment variable each time), and impossible to debug a
> > specific test easily (or at least I haven't found a way).
> >
>
> I don't have a better idea than the environment variable.  you should be
> able to use the debugger though.  works for me in intellij when i've looked
> at a problem in titan.  i'm not sure if it only works because i have the
> tinkerpop source on my system, but i can step through tinkerpop source and
> titan source interchangeably.  i don't think i did anything specific to
> enable that.
>
>
> >    1. We made a custom VertexStep that aggregates traversers, and has
> >    steps, to minimize the amount of queries issued. It messed up a few
> > things,
> >    but we got the basic usage working in M9 (guess you fixed some stuff
> for
> >    Titan, which do the same thing). The problem now is that it doesn't
> > work on
> >    inner traversals. For example, Repeat gives out only 1 traverser every
> >    time. Do you have any suggestions? Am I doing something wrong?
> >
>
> perhaps you could provide links to relevant code.  i'm sorry to say that
> most times the answer to this kind of stuff isn't obvious.
>
>
> >    2. We want to implement a validation strategy. Sort of like
> >    EventStrategy, but it will notify before a mutation, and will enable
> the
> >    user's validation code to cancel a mutation if it doesn't pass its
> > checks.
> >    The problem is that there are no "before" callbacks for the Mutating
> >    interface.
> >
>
> i may have messed up the Mutating interface design a bit.  looking at it
> now, i feel like it could be less coupled to the EventStrategy related
> features.  I'll take a look at it to see if I can make it "better" before
> GA.  I don't think my changes should affect vendors or the test suites, so
> if it turns out to be that way i'll give it a shot.
>
>
> >    3. Adding in bulk - we added our own functions for bulk inserts, since
> >    we didn't find anything to support it in the API. The thing is we need
> > this
> >    ability as part of the traversal, so we can utilize the validation
> > strategy
> >    (if we can get that working). We thought about inheriting from the Add
> >    steps, but they're final. It'd be great to have something like
> >    __.inject(vertices).as('x').addV('x'), and have the ability to make it
> > bulk
> >    load the vertices.
>
>
> we're trying to avoid problems with improper inheritance which messes with
> traversal strategies - hence steps are typically "final".   we don't have
> much on bulk insertion in the API.  perhaps you should create an issue for
> discussion.
>
> On Wed, May 20, 2015 at 11:08 AM, Ran Magen <rm...@gmail.com> wrote:
>
> > > percentage of the tests fire for you given ElasticFeatures?
> >
> > ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320
> > passed
> > ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed, 321
> > ignored, 394 passed
> > The Process coverage seems good. I believe most of the failures are due
> to
> > the fact that I only support string IDs (I think not all tests call the
> > convertId method). And some new stuff in M9 that I haven't gotten around
> to
> > fixing yet. But I'll make sure and open tickets for anything I find.
> > It would also be great if we could easily run specific tests or classes
> > > using JUnit. At the moment it's cumbersome to run a class of tests
> > > (updating the environment variable each time), and impossible to debug a
> > specific test easily (or at least I haven't found a way).
> >
> > > we'd be interested in hearing about your issues.
> >
> >    1. We made a custom VertexStep that aggregates traversers, and has
> >    steps, to minimize the amount of queries issued. It messed up a few
> > things,
> >    but we got the basic usage working in M9 (guess you fixed some stuff
> for
> >    Titan, which do the same thing). The problem now is that it doesn't
> > work on
> >    inner traversals. For example, Repeat gives out only 1 traverser every
> >    time. Do you have any suggestions? Am I doing something wrong?
> >    2. We want to implement a validation strategy. Sort of like
> >    EventStrategy, but it will notify before a mutation, and will enable
> the
> >    user's validation code to cancel a mutation if it doesn't pass its
> > checks.
> >    The problem is that there are no "before" callbacks for the Mutating
> >    interface. We also thought the strategy could just add a validation
> step
> >    before each mutating step, but that had its own issues. Also, the
> >    validation strategy won't work on stuff like graph.addVertex(), but I
> > guess
> >    we can make sure people only use the traversal.
> >    3. Adding in bulk - we added our own functions for bulk inserts, since
> >    we didn't find anything to support it in the API. The thing is we need
> > this
> >    ability as part of the traversal, so we can utilize the validation
> > strategy
> >    (if we can get that working). We thought about inheriting from the Add
> > >    steps, but they're final. It'd be great to have something like
> >    __.inject(vertices).as('x').addV('x'), and have the ability to make it
> > bulk
> >    load the vertices.
> >
> > Thank you for your help!
> >
> >
> > On Tue, 19 May 2015 at 01:37 Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > Thanks for sharing all that additional information.
> > >
> > > > The biggest issue I had was implementing custom steps.
> > >
> > > I think we have a bit of a hole in the docs around that kind of stuff
> at
> > > the moment.  You have to be careful with custom steps because the
> > > TraversalStrategy implementations might not behave nicely if they come
> > > across steps they don't know about.  We've been trying to understand
> the
> > > right set of recommendations to give around that issue which is most of
> > the
> > > reason we probably don't have docs developed yet.  If you'd like to
> > > elaborate as you offered, we'd be interested in hearing about your
> > issues.
> > >
> > > > The Test Suite is awesome!
> > >
> > > That is excellent to hear.  Not many people have to interact with the
> > test
> > > suite directly but it is super critical part of the TinkerPop
> Ecosystem -
> > > if those who have to use it aren't satisfied with it, I'd consider
> that a
> > > big problem.
> > >
> > > > Just a thought, it would be great if failing tests would print some
> > kind
> > > of "DEBUG" logs from the steps (or something like the profile step's
> > > output), so it's easier to figure out what step isn't working properly
> > and
> > > why .
> > >
> > > Still trying to figure that out (i.e. what's the most useful way to
> > "DEBUG"
> > > things).  We don't do logging in gremlin-core so there isn't much to
> > output
> > > there.  I'm hoping that this ticket will be useful in this area:
> > >
> > > https://issues.apache.org/jira/browse/TINKERPOP3-679
> > >
> > > I did give a look at your implementation code.  I noticed that you only
> > had
> > > to @OptOut of a couple of tests - not bad, though I'm not sure how much
> > of
> > > the test suite fires under your ElasticFeatures implementation.  We
> tried
> > > to write tests to allow maximum coverage given the most common feature
> > set
> > > - hopefully you receive good coverage under that model.  Can you share
> > what
> > > percentage of the tests fire for you given ElasticFeatures?
> > >
> > > Speaking of ElasticFeatures, you might want to make this a static
> > > reference:
> > >
> > >
> > >
> >
> https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68
> > >
> > > and try to generally reduce anonymous object creation within
> > > ElasticFeatures itself.  You don't want to create a new instance of
> that
> > > > stuff for every feature check - we do internal feature checking in
> > > different part of the stack and it could create a lot
> > > of unnecessary objects for you.
> > >
> > >
> > >
> > >
> > > On Mon, May 18, 2015 at 5:13 PM, Ran Magen <rm...@gmail.com> wrote:
> > >
> > > > Hey Stephen,
> > > >
> > > > ElasticGraph can be seen as an alternative to Titan - a big
> scaled-out
> > > > > graph with indices (currently we only have OLTP, but will add OLAP
> > > soon).
> > > > We're a company that started out a project using Titan, but it lacked
> > > some
> > > > capabilities we needed:
> > > >
> > > >    - Speed, especially with regards to using text/number/geo indices.
> > Our
> > > >    benchmarks showed that ES could function much faster than the
> > > > performance
> > > >    we were getting from Titan.
> > > >    - Partitioning the data - useful for optimizing indexed queries on
> > ES
> > > >    (Titan also uses ES, but doesn't include these optimizations).
> Plus,
> > > it
> > > >    allows you to manage the data for your specific needs. For example
> > if
> > > > you
> > > >    have a graph with real-time events coming in, and you want to
> > > > periodically
> > > >    delete all the old events, you can partition the data by time.
> > > >    - The spatial capabilities didn't support all the features we
> > needed.
> > > >    - Titan's future was in question
> > > >    <
> > > >
> > >
> >
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> > > > >
> > > >    .
> > > >    - And a bunch of other small issues.
> > > >
> > > > We thought about contributing to Titan to add these capabilites, but
> > > > Titan's architecture (which separates the indexing backend from the
> > > "main"
> > > > store) made it difficult. Plus Titan has a big codebase supporting
> many
> > > > different BEs. At the end we figured it would just be simpler to
> > implenet
> > > > TP directly on ES. It also sparse us from maintaining an extra
> > > > hbase/cassandra cluster.
> > > > We figured more people might have stumbled across these issues, so
> > we're
> > > > sharing the code.
> > > >
> > > > Numbers - we've gotten up to a few billions at this point in our
> tests,
> > > but
> > > > I'm pretty confident on its ability to scale further.
> > > >
> > > > As for developing for TP, it's been mostly great :) The architecture
> is
> > > > very powerful, and gremlin 3 is turning out to be a great querying
> > > > language. And most importantly, it's fast to implement it.
> > > > The biggest issue I had was implementing custom steps. Apart from
> > > GraphStep
> > > > (which has a simple example in TinkerGraph), the other steps are
> pretty
> > > > hard to figure out. For example we implemented a VertexStep that
> > batches
> > > up
> > > > traversers and their has steps to query them together, and had many
> > > issues
> > > > (I can elaborate if you want). We actually still have a pretty big
> > issue
> > > > I'll raise in another thread.
> > > >
> > > > The Test Suite is awesome! It would be practically impossible to
> > > implement
> > > > TP so fast and easily without it. Just a thought, it would be great
> if
> > > > failing tests would print some kind of "DEBUG" logs from the steps
> (or
> > > > something like the profile step's output), so it's easier to figure
> out
> > > > what step isn't working properly and why .
> > > >
> > > >
> > > >
> > > > On Mon, 18 May 2015 at 21:23 Stephen Mallette <sp...@gmail.com>
> > > > wrote:
> > > >
> > > > > Thanks for sharing your project. Looks like you've implemented both
> > the
> > > > > structure and process suites in ElasticGraph up to the latest M9
> > > release
> > > > > candidate - very nice.
> > > > >
> > > > > Where would you say that this implementation fits?  Are there
> > specific
> > > > uses
> > > > > cases where you would want to use ElasticGraph over other
> > > > implementations?
> > > > > When you say that "we're already using it with very big graphs" can
> > you
> > > > > qualify that a bit (millions of edge, billions of edges, etc.)?
> > > > >
> > > > > Finally, more specifically related to TinkerPop, did you encounter
> > any
> > > > > challenges in implementing the APIs or the Test Suite itself?
> > > > >
> > > > >
> > > > >
> > > > > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com>
> wrote:
> > > > >
> > > > > > Hey guys,
> > > > > > Just wanted to let you know about a TP3 implementation we're
> > working
> > > > on.
> > > > > > It's based on elastic-search, enabling very good scalability and
> > > > indexing
> > > > > > capabilities.
> > > > > > You can find the code here <
> > > https://github.com/rmagen/elastic-gremlin
> > > > >.
> > > > > >
> > > > > > This is still very much a work in progress (still more features
> and
> > > > > > optimizations planned, and some bugs to fix), but we're already
> > using
> > > > it
> > > > > > with very big graphs.
> > > > > >
> > > > > > I would appreciate any feedback!
> > > > > > Cheers,
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: elastic-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.

>
> The Process coverage seems good. I believe most of the failures are due to
> the fact that I only support string IDs (I think not all tests call the
> convertId method).


hmmm - thought we had rooted all of those out via work with pieter martin
on sqlg.  please let me know which ones still aren't making those calls.


> It would also be great if we could easily run specific tests or classes
> using JUnit. At the moment it's cumbersome to run a class of tests
> (updating the environment variable each time), and impossible to debug a
> specific test easily (or at least I haven't found a way).
>

I don't have a better idea than the environment variable.  you should be
able to use the debugger though.  works for me in intellij when i've looked
at a problem in titan.  i'm not sure if it only works because i have the
tinkerpop source on my system, but i can step through tinkerpop source and
titan source interchangeably.  i don't think i did anything specific to
enable that.


>    1. We made a custom VertexStep that aggregates traversers and their has
>    steps, to minimize the number of queries issued. It messed up a few
>    things, but we got the basic usage working in M9 (guess you fixed some
>    stuff for Titan, which does the same thing). The problem now is that it
>    doesn't work on inner traversals. For example, Repeat gives out only 1
>    traverser every time. Do you have any suggestions? Am I doing something
>    wrong?
>

perhaps you could provide links to relevant code.  i'm sorry to say that
most times the answer to this kind of stuff isn't obvious.


>    2. We want to implement a validation strategy. Sort of like
>    EventStrategy, but it will notify before a mutation, and will enable the
>    user's validation code to cancel a mutation if it doesn't pass its
>    checks. The problem is that there are no "before" callbacks for the
>    Mutating interface.
>

i may have messed up the Mutating interface design a bit.  looking at it
now, i feel like it could be less coupled to the EventStrategy related
features.  I'll take a look at it to see if I can make it "better" before
GA.  I don't think my changes should affect vendors or the test suites, so
if it turns out to be that way i'll give it a shot.


>    3. Adding in bulk - we added our own functions for bulk inserts, since
>    we didn't find anything to support it in the API. The thing is we need
>    this ability as part of the traversal, so we can utilize the validation
>    strategy (if we can get that working). We thought about inheriting from
>    the Add steps, but they're final. It'd be great to have something like
>    __.inject(vertices).as('x').addV('x'), and have the ability to make it
>    bulk load the vertices.


we're trying to avoid problems with improper inheritance which messes with
traversal strategies - hence steps are typically "final".   we don't have
much on bulk insertion in the API.  perhaps you should create an issue for
discussion.
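
For that discussion, here is a rough sketch of the behaviour being asked for, written in plain Java outside of any TinkerPop API. BulkBuffer, its batchSize parameter and the sink callback are all hypothetical names made up for illustration; the sink stands in for whatever bulk request the backend offers. It only shows the buffering idea (collect elements, hand them over in batches, flush the remainder at the end), not how this would hook into a traversal step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in batches.
final class BulkBuffer<T> implements AutoCloseable {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> pending = new ArrayList<>();

    BulkBuffer(int batchSize, Consumer<List<T>> sink) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffer one item; flush automatically once the batch is full.
    void add(T item) {
        pending.add(item);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    // Hand a copy of the pending items to the sink, then clear the buffer.
    void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(pending));
        pending.clear();
    }

    // Closing flushes whatever is left, so no item is silently dropped.
    @Override
    public void close() {
        flush();
    }
}

class BulkBufferDemo {
    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        try (BulkBuffer<String> buffer =
                 new BulkBuffer<>(2, batch -> batchSizes.add(batch.size()))) {
            for (int i = 0; i < 5; i++) {
                buffer.add("v" + i);
            }
        }
        System.out.println(batchSizes);
    }
}
```

Using try-with-resources means the last partial batch is flushed even if the loop exits early, which is the part that is easy to forget with hand-rolled bulk code.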


Re: elastic-gremlin

Posted by Ran Magen <rm...@gmail.com>.

> percentage of the tests fire for you given ElasticFeatures?

ElasticGraphProcessStandardTest: 334 total, 4 failed, 10 ignored, 320 passed
ElasticGraphStructureStandardTest: 752 total, 22 error, 15 failed, 321
ignored, 394 passed
The Process coverage seems good. I believe most of the failures are due to
the fact that I only support string IDs (I think not all tests call the
convertId method), plus some new stuff in M9 that I haven't gotten around to
fixing yet. I'll make sure to open tickets for anything I find.
It would also be great if we could easily run specific tests or classes
using JUnit. At the moment it's cumbersome to run a class of tests
(updating the environment variable each time), and impossible to debug a
specific test easily (or at least I haven't found a way).

> we'd be interested in hearing about your issues.

   1. We made a custom VertexStep that aggregates traversers and their has
   steps, to minimize the number of queries issued. It messed up a few things,
   but we got the basic usage working in M9 (guess you fixed some stuff for
   Titan, which does the same thing). The problem now is that it doesn't work on
   inner traversals. For example, Repeat gives out only 1 traverser every
   time. Do you have any suggestions? Am I doing something wrong?
   2. We want to implement a validation strategy. Sort of like
   EventStrategy, but it will notify before a mutation, and will enable the
   user's validation code to cancel a mutation if it doesn't pass its checks.
   The problem is that there are no "before" callbacks for the Mutating
   interface. We also thought the strategy could just add a validation step
   before each mutating step, but that had its own issues. Also, the
   validation strategy won't work on stuff like graph.addVertex(), but I guess
   we can make sure people only use the traversal.
   3. Adding in bulk - we added our own functions for bulk inserts, since
   we didn't find anything to support it in the API. The thing is we need this
   ability as part of the traversal, so we can utilize the validation strategy
   (if we can get that working). We thought about inheriting from the Add
   steps, but they're final. It'd be great to have something like
   __.inject(vertices).as('x').addV('x'), and have the ability to make it bulk
   load the vertices.
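
To make point 2 a bit more concrete, here is a minimal sketch of the "before" callback idea in plain Java. None of these names exist in TinkerPop: BeforeAddVertex, ValidatingWriter and RecordingStore are hypothetical stand-ins, and this is not how EventStrategy or the Mutating interface work today. It only illustrates the behaviour we're after, where user code runs before the write and can cancel it by throwing.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical callback: invoked before a vertex is added; throws to cancel.
interface BeforeAddVertex {
    void validate(String label, Map<String, Object> properties);
}

// Stand-in for the graph; it only records what was actually written.
final class RecordingStore {
    final List<String> labels = new ArrayList<>();

    void addVertex(String label, Map<String, Object> properties) {
        labels.add(label);
    }
}

// Wraps the store and runs every validator before the mutation is applied.
final class ValidatingWriter {
    private final RecordingStore store;
    private final List<BeforeAddVertex> validators = new ArrayList<>();

    ValidatingWriter(RecordingStore store) {
        this.store = store;
    }

    void register(BeforeAddVertex validator) {
        validators.add(validator);
    }

    void addVertex(String label, Map<String, Object> properties) {
        for (BeforeAddVertex validator : validators) {
            // An exception thrown here aborts the write below.
            validator.validate(label, properties);
        }
        store.addVertex(label, properties);
    }
}

class ValidationDemo {
    public static void main(String[] args) {
        RecordingStore store = new RecordingStore();
        ValidatingWriter writer = new ValidatingWriter(store);
        writer.register((label, props) -> {
            if (!props.containsKey("name")) {
                throw new IllegalArgumentException(label + " needs a name");
            }
        });

        Map<String, Object> valid = new HashMap<>();
        valid.put("name", "marko");
        writer.addVertex("person", valid);

        try {
            writer.addVertex("person", new HashMap<>());
        } catch (IllegalArgumentException e) {
            System.out.println("cancelled: " + e.getMessage());
        }
        System.out.println(store.labels);
    }
}
```

As in the graph.addVertex() remark above, the check only covers writes that go through the wrapper; anything that talks to the store directly bypasses it.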

Thank you for your help!



Re: elastic-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.

Thanks for sharing all that additional information.

> The biggest issue I had was implementing custom steps.

I think we have a bit of a hole in the docs around that kind of stuff at
the moment.  You have to be careful with custom steps because the
TraversalStrategy implementations might not behave nicely if they come
across steps they don't know about.  We've been trying to understand the
right set of recommendations to give around that issue which is most of the
reason we probably don't have docs developed yet.  If you'd like to
elaborate as you offered, we'd be interested in hearing about your issues.

> The Test Suite is awesome!

That is excellent to hear.  Not many people have to interact with the test
suite directly but it is a super critical part of the TinkerPop Ecosystem -
if those who have to use it aren't satisfied with it, I'd consider that a
big problem.

> Just a thought, it would be great if failing tests would print some kind
> of "DEBUG" logs from the steps (or something like the profile step's
> output), so it's easier to figure out what step isn't working properly and
> why.

Still trying to figure that out (i.e. what's the most useful way to "DEBUG"
things).  We don't do logging in gremlin-core so there isn't much to output
there.  I'm hoping that this ticket will be useful in this area:

https://issues.apache.org/jira/browse/TINKERPOP3-679

I did give a look at your implementation code.  I noticed that you only had
to @OptOut of a couple of tests - not bad, though I'm not sure how much of
the test suite fires under your ElasticFeatures implementation.  We tried
to write tests to allow maximum coverage given the most common feature set
- hopefully you receive good coverage under that model.  Can you share what
percentage of the tests fire for you given ElasticFeatures?

Speaking of ElasticFeatures, you might want to make this a static reference:

https://github.com/rmagen/elastic-gremlin/blob/master/src/main/java/org/apache/tinkerpop/gremlin/elastic/structure/ElasticGraph.java#L68

and try to generally reduce anonymous object creation within
ElasticFeatures itself.  You don't want to create a new instance of that
stuff for every feature check - we do internal feature checking in
different parts of the stack and it could create a lot of unnecessary
objects for you.
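
Roughly what I mean, as a minimal sketch in plain Java. The Features and VertexFeatures interfaces here are simplified, hypothetical stand-ins rather than the real TinkerPop interfaces, and the class names are made up. The only point is that the graph hands back one shared, pre-built instance instead of allocating anonymous objects on every check.

```java
// Hypothetical, simplified stand-ins for the TinkerPop feature interfaces.
interface VertexFeatures {
    boolean supportsStringIds();
    boolean supportsNumericIds();
}

interface Features {
    VertexFeatures vertex();
}

// Built once; it holds no per-graph state, so it is safe to share.
final class ElasticFeaturesSketch implements Features {
    static final ElasticFeaturesSketch INSTANCE = new ElasticFeaturesSketch();

    // The nested feature object is also created once, not per call.
    private final VertexFeatures vertexFeatures = new VertexFeatures() {
        @Override public boolean supportsStringIds() { return true; }
        @Override public boolean supportsNumericIds() { return false; }
    };

    private ElasticFeaturesSketch() { }

    @Override
    public VertexFeatures vertex() {
        return vertexFeatures;
    }
}

final class ElasticGraphSketch {
    // Every feature check returns the same shared instance.
    Features features() {
        return ElasticFeaturesSketch.INSTANCE;
    }
}

class FeaturesDemo {
    public static void main(String[] args) {
        ElasticGraphSketch graph = new ElasticGraphSketch();
        // Both comparisons are reference checks: no new objects per call.
        System.out.println(graph.features() == graph.features());
        System.out.println(graph.features().vertex() == graph.features().vertex());
    }
}
```

This only works as long as the feature answers don't depend on per-graph configuration; if they ever do, caching one instance per graph would be the safer variant.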




On Mon, May 18, 2015 at 5:13 PM, Ran Magen <rm...@gmail.com> wrote:

> Hey Stephen,
>
> ElasticGraph can be seen as an alternative to Titan - a big scaled-out
> graph with indices (currentlywe we only have OLTP, but will add OLAP soon).
> We're a company that started out a project using Titan, but it lacked some
> capabilities we needed:
>
>    - Speed, especially with regards to using text/number/geo indices. Our
>    benchmarks showed that ES could function much faster than the
> performance
>    we were getting from Titan.
>    - Partitioning the data - useful for optimizing indexed queries on ES
>    (Titan also uses ES, but doesn't include these optimizations). Plus, it
>    allows you to manage the data for your specific needs. For example if
> you
>    have a graph with real-time events coming in, and you want to
> periodically
>    delete all the old events, you can partition the data by time.
>    - The spatial capabilities didn't support all the features we needed.
>    - Titan's future was in question
>    <
> http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/
> >
>    .
>    - And a bunch of other small issues.
>
> We thought about contributing to Titan to add these capabilites, but
> Titan's architecture (which separates the indexing backend from the "main"
> store) made it difficult. Plus Titan has a big codebase supporting many
> different BEs. At the end we figured it would just be simpler to implenet
> TP directly on ES. It also sparse us from maintaining an extra
> hbase/cassandra cluster.
> We figured more people might have stumbled across these issues, so we're
> sharing the code.
>
> Numbers - we've gotten up to a few billions at this point in our tests, but
> I'm pretty confident on its ability to scale further.
>
> As for developing for TP, it's been mostly great :) The architecture is
> very powerful, and gremlin 3 is turning out to be a great querying
> language. And most importantly, it's fast to implement it.
> The biggest issue I had was implementing custom steps. Apart from GraphStep
> (which has a simple example in TinkerGraph), the other steps are pretty
> hard to figure out. For example we implemented a VertexStep that batches up
> traversers and their has steps to query them together, and had many issues
> (I can elaborate if you want). We actually still have a pretty big issue
> I'll raise in another thread.
>
> The Test Suite is awesome! It would be practically impossible to implement
> TP so fast and easily without it. Just a thought, it would be great if
> failing tests would print some kind of "DEBUG" logs from the steps (or
> something like the profile step's output), so it's easier to figure out
> what step isn't working properly and why .
>
>
>
> On Mon, 18 May 2015 at 21:23 Stephen Mallette <sp...@gmail.com>
> wrote:
>
> > Thanks for sharing your project. Looks like you've implemented both the
> > structure and process suites in ElasticGraph up to the latest M9 release
> > candidate - very nice.
> >
> > Where would you say that this implementation fits?  Are there specific use
> > cases where you would want to use ElasticGraph over other implementations?
> > When you say that "we're already using it with very big graphs" can you
> > qualify that a bit (millions of edges, billions of edges, etc.)?
> >
> > Finally, more specifically related to TinkerPop, did you encounter any
> > challenges in implementing the APIs or the Test Suite itself?
> >
> >
> >
> > On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com> wrote:
> >
> > > Hey guys,
> > > Just wanted to let you know about a TP3 implementation we're working on.
> > > It's based on elastic-search, enabling very good scalability and indexing
> > > capabilities.
> > > You can find the code here <https://github.com/rmagen/elastic-gremlin>.
> > >
> > > This is still very much a work in progress (still more features and
> > > optimizations planned, and some bugs to fix), but we're already using it
> > > with very big graphs.
> > >
> > > I would appreciate any feedback!
> > > Cheers,
> > >
> >
>

Re: elastic-gremlin

Posted by Ran Magen <rm...@gmail.com>.

Hey Stephen,

ElasticGraph can be seen as an alternative to Titan - a big scaled-out
graph with indices (currently we only have OLTP, but will add OLAP soon).
We're a company that started out a project using Titan, but it lacked some
capabilities we needed:

   - Speed, especially with regard to using text/number/geo indices. Our
   benchmarks showed that ES could be much faster than the performance
   we were getting from Titan.
   - Partitioning the data - useful for optimizing indexed queries on ES
   (Titan also uses ES, but doesn't include these optimizations). Plus, it
   allows you to manage the data for your specific needs. For example if you
   have a graph with real-time events coming in, and you want to periodically
   delete all the old events, you can partition the data by time.
   - The spatial capabilities didn't support all the features we needed.
   - Titan's future was in question
   <http://www.zdnet.com/article/datastax-snaps-up-aurelius-and-its-titan-team-to-build-new-graph-database/>.
   - And a bunch of other small issues.
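
To make the time-partitioning point concrete, here is a rough sketch of the idea. It is not code from elastic-gremlin: the class, the method names and the "events-yyyy.MM.dd" naming scheme are all assumptions made up for illustration. Each day's events go to their own index, so old data can be dropped one whole index at a time instead of deleting individual documents.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of elastic-gremlin): maps days to index names.
final class TimePartitions {
    private static final DateTimeFormatter DAY = DateTimeFormatter.ofPattern("yyyy.MM.dd");

    // Name of the index holding the events of one day, e.g. "events-2015.05.18".
    static String indexFor(String prefix, LocalDate day) {
        return prefix + "-" + day.format(DAY);
    }

    // Names of the daily indices that fall before the retention window.
    // Each of these could be deleted as a whole index.
    static List<String> expired(String prefix, LocalDate oldestStored,
                                LocalDate today, int retentionDays) {
        List<String> names = new ArrayList<>();
        LocalDate cutoff = today.minusDays(retentionDays);
        for (LocalDate d = oldestStored; d.isBefore(cutoff); d = d.plusDays(1)) {
            names.add(indexFor(prefix, d));
        }
        return names;
    }
}
```

A query restricted to a time range would then only need to touch the indices whose dates overlap that range.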

We thought about contributing to Titan to add these capabilities, but
Titan's architecture (which separates the indexing backend from the "main"
store) made it difficult. Plus Titan has a big codebase supporting many
different BEs. In the end we figured it would just be simpler to implement
TP directly on ES. It also spares us from maintaining an extra
hbase/cassandra cluster.
We figured more people might have stumbled across these issues, so we're
sharing the code.

Numbers - we've gotten up to a few billion at this point in our tests, but
I'm pretty confident in its ability to scale further.

As for developing for TP, it's been mostly great :) The architecture is
very powerful, and gremlin 3 is turning out to be a great querying
language. And most importantly, it's fast to implement.
The biggest issue I had was implementing custom steps. Apart from GraphStep
(which has a simple example in TinkerGraph), the other steps are pretty
hard to figure out. For example we implemented a VertexStep that batches up
traversers and their has steps to query them together, and had many issues
(I can elaborate if you want). We actually still have a pretty big issue
I'll raise in another thread.
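
The batching idea, stripped of all TinkerPop types, looks roughly like the sketch below. This is only an illustration under assumptions, not the actual VertexStep: BatchedLookup, neighbours and the bulkQuery callback are hypothetical names, and the callback stands in for whatever single backend request fetches adjacent elements for many vertices at once.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: one backend request per batch of vertex ids
// instead of one request per traverser.
final class BatchedLookup {
    static Map<String, List<String>> neighbours(
            List<String> vertexIds, int batchSize,
            Function<List<String>, Map<String, List<String>>> bulkQuery) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        // LinkedHashMap keeps the results in the order the ids arrived.
        Map<String, List<String>> result = new LinkedHashMap<>();
        for (int i = 0; i < vertexIds.size(); i += batchSize) {
            List<String> batch =
                    vertexIds.subList(i, Math.min(i + batchSize, vertexIds.size()));
            // A single call answers the whole batch.
            Map<String, List<String>> found = bulkQuery.apply(batch);
            for (String id : batch) {
                // Ids the backend did not return get an empty neighbour list.
                result.put(id, found.getOrDefault(id, Collections.emptyList()));
            }
        }
        return result;
    }
}
```

In a real step, any has() filters would presumably be folded into the same bulk request, and the results fanned back out to the traversers they belong to.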

The Test Suite is awesome! It would be practically impossible to implement
TP so fast and easily without it. Just a thought, it would be great if
failing tests would print some kind of "DEBUG" logs from the steps (or
something like the profile step's output), so it's easier to figure out
what step isn't working properly and why.

On Mon, 18 May 2015 at 21:23 Stephen Mallette <sp...@gmail.com> wrote:

> Thanks for sharing your project. Looks like you've implemented both the
> structure and process suites in ElasticGraph up to the latest M9 release
> candidate - very nice.
>
> Where would you say that this implementation fits?  Are there specific use
> cases where you would want to use ElasticGraph over other implementations?
> When you say that "we're already using it with very big graphs" can you
> qualify that a bit (millions of edges, billions of edges, etc.)?
>
> Finally, more specifically related to TinkerPop, did you encounter any
> challenges in implementing the APIs or the Test Suite itself?
>
>
>
> On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com> wrote:
>
> > Hey guys,
> > Just wanted to let you know about a TP3 implementation we're working on.
> > It's based on elastic-search, enabling very good scalability and indexing
> > capabilities.
> > You can find the code here <https://github.com/rmagen/elastic-gremlin>.
> >
> > This is still very much a work in progress (still more features and
> > optimizations planned, and some bugs to fix), but we're already using it
> > with very big graphs.
> >
> > I would appreciate any feedback!
> > Cheers,
> >
>

Re: elastic-gremlin

Posted by Stephen Mallette <sp...@gmail.com>.

Thanks for sharing your project. Looks like you've implemented both the
structure and process suites in ElasticGraph up to the latest M9 release
candidate - very nice.

Where would you say that this implementation fits?  Are there specific use
cases where you would want to use ElasticGraph over other implementations?
When you say that "we're already using it with very big graphs" can you
qualify that a bit (millions of edges, billions of edges, etc.)?

Finally, more specifically related to TinkerPop, did you encounter any
challenges in implementing the APIs or the Test Suite itself?

On Mon, May 18, 2015 at 2:07 PM, Ran Magen <rm...@gmail.com> wrote:

> Hey guys,
> Just wanted to let you know about a TP3 implementation we're working on.
> It's based on elastic-search, enabling very good scalability and indexing
> capabilities.
> You can find the code here <https://github.com/rmagen/elastic-gremlin>.
>
> This is still very much a work in progress (still more features and
> optimizations planned, and some bugs to fix), but we're already using it
> with very big graphs.
>
> I would appreciate any feedback!
> Cheers,
>