You are viewing a plain text version of this content. The canonical link for it is here.

Posted to dev@age.apache.org by GitBox <gi...@apache.org> on 2022/03/10 17:09:43 UTC

[GitHub] [incubator-age] jrgemignani commented on issue #195: Variable Length Edges feature does not scale well with the length of the path

jrgemignani commented on issue #195:
URL: https://github.com/apache/incubator-age/issues/195#issuecomment-1064297992

I think I need to clear up some misconceptions first -

The VLE is a path finding algorithm, its speed is directly related to the size and connectedness of the graph and what it is looking for. Remember, this is a graph, not a tree, so nodes can be reused in paths but not edges. This means, the longer the path, the less qualified the edges are in a path, the more time it will take in very connected graphs. Btw, other than the start and terminal vertices, the algorithm doesn't care about the interior vertices; it only cares about the edges.

If we look at the following queries:

MATCH p=(u)-[*]-(v) RETURN p
MATCH (u)-[*]-(v) RETURN u
MATCH ()-[*]-() RETURN COUNT(1);

All of these queries will basically do the same thing - search for all paths of any length from all vertices to all vertices. This means that for a graph of 1 million nodes, this algorithm will be invoked up to 1 trillion times (the number of VxV combinations). Now, add that you are asking for all of the paths between each combination. So, each additional path length can add to the work done. This will always be the case for finding paths.

As you can see, the best way to deal with this is to limit what the algorithm has to do. This means restricting where it needs to look and how deeply it needs to look. Just remember, just because you restrict the input vertices doesn't mean that you restrict the interior vertices. The algorithm doesn't care about them, it only cares about interior edges. Additionally, due to other implementation factors (PostgreSQL being a big one) the algorithm generally generates all valid (within the selection window) paths from a start vertex and leaves the match against the terminal vertex to PG.

I will add this -

There are 2 patches coming out in the next release that will improve performance of some queries by up to 10%. I wouldn't expect their impact to be that great but, that depends on the graph and the query. This is across the board, btw, not just the VLE.

Additionally, I am currently working on a patch to the VLE that will negate some of the overhead due to how PG calls the VLE function. This has the potential of greatly increasing the speed of a large subset of queries. However, it is unlikely to be available as part of the next release. But, it should be available to pull down from the current master in about 1-2 weeks.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscribe@age.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org