Posted to dev@tinkerpop.apache.org by Matt Frantz <ma...@gmail.com> on 2015/04/22 23:03:16 UTC

OLAP + path

While working on TINKERPOP3-639, I've noticed a pattern in the tests that
are disabled with @IgnoreEngine(TraversalEngine.Type.COMPUTER), at least in
SelectTest and PathTest.

   - The only SelectTest tests that run in OLAP are those that lack
   TraverserRequirement.PATH (which, for `select`, is added by static analysis
   of the traversal).
   - The only PathTest tests that run in OLAP are those which traverse a
   single vertex.

I see this warning in the docs for using OLAP with path:

> Generating path information is expensive as the history of the traverser is
> stored into a Java list. With numerous traversers, there are numerous
> lists. Moreover, in an OLAP GraphComputer environment this becomes
> exceedingly prohibitive as there are traversers emanating from all vertices
> in the graph in parallel. In OLAP there are optimizations provided for
> traverser populations, but when paths are calculated (and each traverser is
> unique due to its history), then these optimizations are no longer possible.
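The last point in that warning can be illustrated with a small, self-contained Java sketch. This is hypothetical code, not TinkerPop's: `Traverser`, `bulk`, and `arrivals` are illustrative names only. Traversers sitting at the same vertex can be merged into one traverser carrying a bulk count, but once the path is part of a traverser's identity, traversers that reached the same vertex by different routes can no longer be merged.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BulkingSketch {
    // Hypothetical traverser: where it is now, plus the path that got it there
    // (an empty list when path tracking is off).
    record Traverser(String location, List<String> path) {}

    // Merge equal traversers into one entry with a bulk count. Record equality
    // covers both fields, so with path tracking the history affects merging.
    static Map<Traverser, Long> bulk(List<Traverser> traversers) {
        Map<Traverser, Long> bulked = new LinkedHashMap<>();
        for (Traverser t : traversers) {
            bulked.merge(t, 1L, Long::sum);
        }
        return bulked;
    }

    // Three traversers that all end at vertex "3" via different routes.
    static List<Traverser> arrivals(boolean trackPath) {
        List<List<String>> routes = List.of(
            List.of("1", "3"), List.of("2", "3"), List.of("4", "3"));
        List<Traverser> out = new ArrayList<>();
        for (List<String> route : routes) {
            String here = route.get(route.size() - 1);
            List<String> recorded = trackPath ? route : List.<String>of();
            out.add(new Traverser(here, recorded));
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(bulk(arrivals(false)).size()); // 1 (a single traverser with bulk 3)
        System.out.println(bulk(arrivals(true)).size());  // 3 (no merging possible)
    }
}
```

With three traversers the difference is trivial, but in OLAP there are traversers emanating from every vertex, so losing the merge multiplies the number of objects the engine must carry.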


What is the expectation of path functionality in OLAP?  Is this a work in
progress?

More generally, how are the various @Ignore and @IgnoreEngine annotations
maintained?  Some of them mean "this will never work" while others
mean "TODO".

Re: OLAP + path

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello Matt,

> It seems that eventually the OLAP engine should attempt to implement such
> traversals, even if it moves a lot of bits.  A given application may use
> properties sparingly, in which case the serialization mass of a full vertex
> might be reasonable.

This is possible, but not fully integrated yet. Stephen and I are currently working on the Attachable interface (and its corresponding uses). For most OLAP operations, you will want to use ReferenceXXX, as that is an extremely small message to pass. For operations that need more data about the element to propagate (e.g. vertex properties), DetachedXXX should be used. Determining which Attachable to use requires analyzing the traversal to see whether any path()/select()/??? step has a by() modulation that accesses element properties.
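The selection rule described above (ReferenceXXX by default, DetachedXXX when a path()/select() step has a by() that reads properties) can be sketched as a small static analysis. This is not TinkerPop's implementation: `ElementForm`, `Step`, `step`, and `chooseForm` are hypothetical, a traversal is modeled as a flat list of step names together with the property keys their by() modulations read, and the set of path-consuming steps is limited to path and select because the rest of the list is left open ("???") here.

```java
import java.util.List;
import java.util.Set;

public class AttachableChoiceSketch {
    // Hypothetical stand-ins for the two message forms discussed above:
    // REFERENCE ~ id only, DETACHED ~ id plus properties.
    enum ElementForm { REFERENCE, DETACHED }

    // Hypothetical step model: a step name plus the property keys that its
    // by() modulations read (empty when no by() reads a property).
    record Step(String name, List<String> byPropertyKeys) {}

    static Step step(String name, String... byPropertyKeys) {
        return new Step(name, List.of(byPropertyKeys));
    }

    // Assumption: only path and select consume path history in this sketch.
    static final Set<String> PATH_CONSUMERS = Set.of("path", "select");

    // If any path-consuming step reads element properties through by(),
    // the path must carry properties, so pick the detached form.
    static ElementForm chooseForm(List<Step> traversal) {
        for (Step s : traversal) {
            if (PATH_CONSUMERS.contains(s.name()) && !s.byPropertyKeys().isEmpty()) {
                return ElementForm.DETACHED;
            }
        }
        return ElementForm.REFERENCE;
    }

    public static void main(String[] args) {
        // Models g.V().out().path()
        System.out.println(chooseForm(List.of(step("V"), step("out"), step("path"))));
        // Models g.V().out().path().by("name")
        System.out.println(chooseForm(List.of(step("V"), step("out"), step("path", "name"))));
    }
}
```

A real analysis would also have to look inside nested traversals passed to by(), which this flat model ignores.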

> Another option is to create a LazyElement class which would implement
> property access by executing an OLTP traversal.

This is not always possible. HadoopGraph, for instance, can be queried via OLTP, but each query is a linear scan and prohibitively expensive. However, yes, in theory this is quite possible when the Host of the Attachable is the OLTP graph interface. This would enable such things as path().by(out().out().value("name")) in OLAP, whereas DetachedVertex would only provide you with properties.
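The LazyElement idea, with an OLTP-capable graph acting as the Host, might look roughly like the following sketch. `Host`, `LazyVertex`, `CountingHost`, and `demoHost` are hypothetical names for illustration, not TinkerPop classes. The counting host makes the access cost visible: each vertex whose properties are needed triggers one host lookup, which is the cost that becomes prohibitive when the host can only answer by linear scan.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class LazyElementSketch {
    // Hypothetical "host": resolves a vertex id to its properties, e.g. via an
    // OLTP query. For a graph that can only scan, each call is a full pass.
    interface Host {
        Map<String, Object> propertiesOf(Object id);
    }

    // Hypothetical lazy vertex: only the id travels; properties are fetched
    // from the host on first access and cached afterwards.
    static final class LazyVertex {
        private final Object id;
        private final Host host;
        private Map<String, Object> cache; // null until first property access

        LazyVertex(Object id, Host host) {
            this.id = id;
            this.host = host;
        }

        Object id() { return id; }

        Optional<Object> value(String key) {
            if (cache == null) {
                cache = host.propertiesOf(id); // one host round-trip per vertex
            }
            return Optional.ofNullable(cache.get(key));
        }
    }

    // In-memory host that counts lookups, to make the access cost visible.
    static final class CountingHost implements Host {
        private final Map<Object, Map<String, Object>> data = new HashMap<>();
        int lookups = 0;

        void put(Object id, Map<String, Object> properties) {
            data.put(id, properties);
        }

        @Override
        public Map<String, Object> propertiesOf(Object id) {
            lookups++;
            return data.getOrDefault(id, Map.of());
        }
    }

    static CountingHost demoHost() {
        CountingHost host = new CountingHost();
        host.put(1, Map.of("name", "alice"));
        host.put(2, Map.of("name", "bob"));
        return host;
    }

    public static void main(String[] args) {
        CountingHost host = demoHost();
        LazyVertex v = new LazyVertex(1, host);
        System.out.println(host.lookups);                // 0: nothing fetched yet
        System.out.println(v.value("name").orElse("?")); // alice
        System.out.println(v.value("age").isPresent());  // false, served from the cache
        System.out.println(host.lookups);                // 1
    }
}
```

Caching per vertex keeps repeated property reads cheap, but it does nothing about the first lookup, which is where a scan-only host pays.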

> Also, there might be an opportunity to recognize when only labelled
> elements are required to be retained.  Is that what PATH_ACCESS without
> PATH means?  In such a case, a traversal author may be selective about
> which steps are labelled based on their serialization mass.

Hmm. No, PATH_ACCESS was more about the SparsePath optimization. I just got rid of SparsePath, given the select() confusion in OLTP vs. OLAP, and as such PATH_ACCESS went away. It's just PATH now.

HTH,
Marko.

http://markorodriguez.com


Re: OLAP + path

Posted by Matt Frantz <Ma...@gmail.com>.
Ah, that explains the property-free vertices I was seeing.

It seems that eventually the OLAP engine should attempt to implement such
traversals, even if it moves a lot of bits.  A given application may use
properties sparingly, in which case the serialization mass of a full vertex
might be reasonable.

Another option is to create a LazyElement class which would implement
property access by executing an OLTP traversal.

Also, there might be an opportunity to recognize when only labelled
elements are required to be retained.  Is that what PATH_ACCESS without
PATH means?  In such a case, a traversal author may be selective about
which steps are labelled based on their serialization mass.
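The labelled-only idea above can be sketched as follows. This is hypothetical code, not TinkerPop's: `Entry`, `fullPath`, `labelledOnly`, and `demoHistory` are made-up names, and keeping only the most recent object per label is a simplifying assumption. It contrasts retaining every object a traverser touched with retaining only the objects at labelled steps.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class LabelledPathSketch {
    // Hypothetical path entry: the object reached at a step plus that step's labels.
    record Entry(Object object, Set<String> labels) {}

    // Full retention: every object the traverser touched is kept, in order.
    static List<Object> fullPath(List<Entry> history) {
        List<Object> out = new ArrayList<>();
        for (Entry e : history) {
            out.add(e.object());
        }
        return out;
    }

    // Labelled-only retention: unlabelled steps are dropped, and for each label
    // only the most recent object is kept (a simplifying assumption).
    static Map<String, Object> labelledOnly(List<Entry> history) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Entry e : history) {
            for (String label : e.labels()) {
                out.put(label, e.object());
            }
        }
        return out;
    }

    // Roughly g.V().as("a").out().out().as("b"): three steps, two labelled.
    static List<Entry> demoHistory() {
        return List.of(
            new Entry("v[1]", Set.of("a")),
            new Entry("v[2]", Set.of()),
            new Entry("v[3]", Set.of("b")));
    }

    public static void main(String[] args) {
        System.out.println(fullPath(demoHistory()));     // [v[1], v[2], v[3]]
        System.out.println(labelledOnly(demoHistory())); // {a=v[1], b=v[3]}
    }
}
```

Under this scheme, a traversal author who leaves heavy intermediate elements unlabelled keeps them out of the serialized traverser entirely.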

On Thu, Apr 23, 2015 at 8:55 AM, Marko Rodriguez <ok...@gmail.com>
wrote:

> Hello Matt,
>
> The reason some of the OLAP tests ignore path()-based (as well as
> select()-based) traversals is because of by().
>
>         If the test traversal is g.V.out.path, OLAP is happy.
>         If the test traversal is g.V.out.path.by('name'), OLAP is unhappy.
>
> Why? Because the path is "detached" which means that a path like
> [v[1],v[2],v[3]] just contains ReferenceVertices, not the actual vertices.
> A ReferenceVertex is just an id. Thus, it doesn't have a "name" property.
> In OLAP, it would be very expensive to have traverser paths contain all the
> properties of all the elements they touched -- a massive amount of data. In
> OLTP, it's easy, as it's a direct memory reference back to the original vertex.
>
> HTH,
> Marko.
>
> http://markorodriguez.com

Re: OLAP + path

Posted by Marko Rodriguez <ok...@gmail.com>.
Hello Matt,

Some of the OLAP tests ignore path()-based (as well as select()-based) traversals because of by().

	If the test traversal is g.V.out.path, OLAP is happy.
	If the test traversal is g.V.out.path.by('name'), OLAP is unhappy.

Why? Because the path is "detached", which means that a path like [v[1],v[2],v[3]] just contains ReferenceVertices, not the actual vertices. A ReferenceVertex is just an id. Thus, it doesn't have a "name" property. In OLAP, it would be very expensive to have traverser paths contain all the properties of all the elements they touched -- a massive amount of data. In OLTP, it's easy, as it's a direct memory reference back to the original vertex.
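A minimal sketch of why path().by('name') works over in-memory vertices but not over references. `Element`, `ReferenceVertex`, `MemoryVertex`, and `pathBy` are hypothetical stand-ins (the ReferenceVertex here is a toy record, not TinkerPop's class); the only behaviour modeled is that a reference carries an id and no properties. How TinkerPop itself reports the failure is not stated here, so the sketch simply throws.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Optional;

public class DetachedPathSketch {
    // Hypothetical element view: an id plus an optional property lookup.
    interface Element {
        Object id();
        Optional<Object> value(String key);
    }

    // Hypothetical reference: only the id is carried, so there are no properties.
    record ReferenceVertex(Object id) implements Element {
        public Optional<Object> value(String key) {
            return Optional.empty();
        }
    }

    // Hypothetical in-memory vertex, as in OLTP: properties are directly reachable.
    record MemoryVertex(Object id, Map<String, Object> properties) implements Element {
        public Optional<Object> value(String key) {
            return Optional.ofNullable(properties.get(key));
        }
    }

    // Rough analogue of path().by(key): map each path element to a property
    // value, throwing when an element cannot supply it.
    static List<Object> pathBy(List<? extends Element> path, String key) {
        List<Object> out = new ArrayList<>();
        for (Element e : path) {
            out.add(e.value(key).orElseThrow(
                () -> new IllegalStateException("no '" + key + "' on element " + e.id())));
        }
        return out;
    }

    static List<Element> demoOltpPath() {
        return List.of(new MemoryVertex(1, Map.of("name", "alice")),
                       new MemoryVertex(2, Map.of("name", "bob")));
    }

    static List<Element> demoOlapPath() {
        return List.of(new ReferenceVertex(1), new ReferenceVertex(2));
    }

    public static void main(String[] args) {
        System.out.println(pathBy(demoOltpPath(), "name")); // [alice, bob]
        try {
            pathBy(demoOlapPath(), "name");
        } catch (IllegalStateException ex) {
            System.out.println(ex.getMessage()); // no 'name' on element 1
        }
    }
}
```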

HTH,
Marko.

http://markorodriguez.com
