Posted to dev@tinkerpop.apache.org by Stephen Mallette <sp...@gmail.com> on 2018/06/11 13:12:40 UTC

[DISCUSS] text predicates

I found a CosmosDB issue on GitHub calling for support of text predicates

https://github.com/Azure/azure-documentdb-dotnet/issues/473

and it conveniently listed the text predicates for a number of different
graphs, so it made the job of compiling these pretty easy.

DSE Graph (tokenized search is for long multi-sentence type properties)
+ eq/neq
+ prefix
+ regex
+ token
+ tokenPrefix
+ tokenRegex
+ phrase
+ fuzzy
+ tokenFuzzy

https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/using/useSearchIndexes.html

JanusGraph
+ textContains
+ textContainsPrefix
+ textContainsRegex
+ textContainsFuzzy
+ eq/neq
+ textPrefix
+ textRegex
+ textFuzzy

http://docs.janusgraph.org/latest/index-parameters.html#text-search

Neo4j/Cypher
+ STARTS WITH
+ ENDS WITH
+ CONTAINS

http://www.jexp.de/blog/html/full-text-and-spatial-search-in-neo4j-3.html

OrientDB - basically just lucene syntax
+ LUCENE

https://orientdb.com/docs/last/Full-Text-Index.html

So - that's the list as best I can determine. JanusGraph and DSE Graph seem
to have the most complex sets of expressions. Neo4j/Cypher has the simplest,
most developer-friendly set, which probably covers most of the questions
we get out in the community. OrientDB gets vendor-specific in what they
do. Did I leave any out? Please update this thread if I did.

Not sure what we do with that now, but that's what is out there.

Re: [DISCUSS] text predicates

Posted by Daniel Kuppitz <dk...@apache.org>.

https://issues.apache.org/jira/browse/TINKERPOP-2041

I'm going to work on some simple predicates (startsWith, endsWith and contains). For now, I'd like to keep it really simple and only provide simple 1-arg overloads. Also, RegEx will not be included as there are still some back and forth discussions regarding different RegEx syntaxes.
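
As a rough illustration of what simple 1-arg predicates like these could look like, here is a minimal sketch in plain Java. The class name SimpleTextPredicates and the use of java.util.function.Predicate are assumptions made for the example; this is not the code tracked in TINKERPOP-2041, and the real predicates may differ in naming and null handling.

```java
import java.util.function.Predicate;

// Hypothetical helper, not part of TinkerPop: each factory takes a single
// String argument and returns a case-sensitive predicate over String values.
final class SimpleTextPredicates {
    private SimpleTextPredicates() {}

    // True when the tested value begins with the given prefix.
    static Predicate<String> startsWith(final String prefix) {
        return value -> value != null && value.startsWith(prefix);
    }

    // True when the tested value ends with the given suffix.
    static Predicate<String> endsWith(final String suffix) {
        return value -> value != null && value.endsWith(suffix);
    }

    // True when the tested value contains the given substring anywhere.
    static Predicate<String> contains(final String part) {
        return value -> value != null && value.contains(part);
    }
}
```

Keeping each factory to one argument means there is no flag for case-insensitivity or tokenization to agree on up front; those could be layered on later without changing these signatures.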

Cheers,
Daniel


On 2018/06/21 20:06:46, Stephen Mallette <sp...@gmail.com> wrote: 
> > Also, wouldn't you need to configure a serializer for
> > 'DseGraph.searchType'?
> 
> that's the nice part of with() - internally that falls into a standard Map
> in bytecode and no serialization hassles. it still leaves the graph
> provider to expose a "DseGraph" type of class, but at least there is
> nothing to configure anywhere
> 
> > Then all that's left is fuzzy.  I don't have an opinion on that yet.
> > Maybe it's more Search enums?
> 
> could be. or with() is the catch all case that providers can use for
> "everything else" they can come up with....for now

Re: [DISCUSS] text predicates

Posted by Stephen Mallette <sp...@gmail.com>.

> Also, wouldn't you need to configure a serializer for
> 'DseGraph.searchType'?

that's the nice part of with() - internally it falls into a standard Map
in bytecode, so there are no serialization hassles. it still leaves the graph
provider to expose a "DseGraph" type of class, but at least there is
nothing to configure anywhere

> Then all that's left is fuzzy.  I don't have an opinion on that yet.
> Maybe it's more Search enums?

could be. or with() is the catch all case that providers can use for
"everything else" they can come up with....for now



On Thu, Jun 21, 2018 at 3:52 PM Robert Dale <ro...@gmail.com> wrote:

> No, that makes it non-portable, provider-specific.  That is, I can't
> cut-n-paste that from one graph db to the next.  Also, wouldn't you need to
> configure a serializer for 'DseGraph.searchType'?
>
> I think we can start with a small, simple set.
>
> startsWith(String)
> startsWith(Search,String)
> contains(String)
> contains(Search, String)
> regex(String)
> regex(Search, String)
>
> Each takes Search, String.  Where Search is an enum of String (default),
> Text (tokenized).  String is the search term.
>
> The regex syntax may be provider-specific, but the traversal would be
> portable. If the provider doesn't override the step/predicate then it would
> use the default implementation.
>
> Then all that's left is fuzzy.  I don't have an opinion on that yet. Maybe
> it's more Search enums?
>
> Robert Dale

Re: [DISCUSS] text predicates

Posted by Robert Dale <ro...@gmail.com>.

No, that makes it non-portable, provider-specific.  That is, I can't
cut-n-paste that from one graph db to the next.  Also, wouldn't you need to
configure a serializer for 'DseGraph.searchType'?

I think we can start with a small, simple set.

startsWith(String)
startsWith(Search, String)
contains(String)
contains(Search, String)
regex(String)
regex(Search, String)

Each takes an optional Search and a String. Search is an enum of String
(default) or Text (tokenized); the String argument is the search term.

The regex syntax may be provider-specific, but the traversal would be
portable. If the provider doesn't override the step/predicate then it would
use the default implementation.
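
To make the proposed overloads concrete, here is a minimal sketch of a default implementation in plain Java. Everything in it is an assumption for illustration: the enum constants are written STRING and TEXT to follow Java naming, SearchPredicates is a made-up class, tokenization is a naive split on non-word characters, and regex uses java.util.regex with whole-value (or whole-token) matching. A provider with a real search index would presumably override all of this with its own analyzers.

```java
import java.util.Arrays;
import java.util.function.Predicate;
import java.util.regex.Pattern;

// Hypothetical names throughout; this is not TinkerPop API.
enum Search {
    STRING, // test the whole property value (the default)
    TEXT    // test each token of the property value
}

final class SearchPredicates {
    // Assumption: tokens are runs of word characters.
    private static final Pattern NON_WORD = Pattern.compile("\\W+");

    private SearchPredicates() {}

    // Applies the test to the whole value (STRING) or to any token (TEXT).
    private static Predicate<String> in(final Search search, final Predicate<String> test) {
        if (search == Search.STRING) {
            return value -> value != null && test.test(value);
        }
        return value -> value != null
                && Arrays.stream(NON_WORD.split(value))
                         .filter(token -> !token.isEmpty())
                         .anyMatch(test);
    }

    static Predicate<String> startsWith(final String term) {
        return startsWith(Search.STRING, term);
    }

    static Predicate<String> startsWith(final Search search, final String term) {
        return in(search, s -> s.startsWith(term));
    }

    static Predicate<String> contains(final String term) {
        return contains(Search.STRING, term);
    }

    static Predicate<String> contains(final Search search, final String term) {
        return in(search, s -> s.contains(term));
    }

    static Predicate<String> regex(final String expr) {
        return regex(Search.STRING, expr);
    }

    static Predicate<String> regex(final Search search, final String expr) {
        final Pattern pattern = Pattern.compile(expr);
        return in(search, s -> pattern.matcher(s).matches());
    }
}
```

With this shape, startsWith("bro") does not match "quick brown fox", while startsWith(Search.TEXT, "bro") does, because the second form tests each token separately.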

Then all that's left is fuzzy.  I don't have an opinion on that yet. Maybe
it's more Search enums?

Robert Dale


On Thu, Jun 21, 2018 at 3:04 PM Stephen Mallette <sp...@gmail.com>
wrote:

> Just thinking out loud here, but i wonder if we could keep our predicate
> list more or less as-is, but then use with() to modulate a has() to be
> provider specific:
>
> g.V().
>    has('longText', eq("a.*")).
>      with(DseGraph.searchType, tokenRegex)
>
> In other words, this would be the standard way that users would inform
> graph providers to handle special text search types. The upside is that
>
> 1. graph providers no longer have to hassle with serialization at all to
> implement this (which means users don't need special configuration of their
> servers/drivers).
> 2. we have a common way that all graph providers can take advantage of and
> thus users have one method for writing their gremlin (albeit with different
> with() and search syntax).
> 3. we can make this part of our reference implementation i think pretty
> easily for TinkerGraph with some basic java regex stuff.
> 4. stays backward compatible with existing graph provider predicates
>
> good idea?

Re: [DISCUSS] text predicates

Posted by Stephen Mallette <sp...@gmail.com>.

Just thinking out loud here, but i wonder if we could keep our predicate
list more or less as-is, but then use with() to modulate a has() to be
provider specific:

g.V().
   has('longText', eq("a.*")).
     with(DseGraph.searchType, tokenRegex)

In other words, this would be the standard way that users would inform
graph providers to handle special text search types. The upside is that

1. graph providers no longer have to hassle with serialization at all to
implement this (which means users don't need special configuration of their
servers/drivers).
2. we have a common way that all graph providers can take advantage of and
thus users have one method for writing their gremlin (albeit with different
with() and search syntax).
3. we can make this part of our reference implementation i think pretty
easily for TinkerGraph with some basic java regex stuff.
4. stays backward compatible with existing graph provider predicates
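
To show the idea behind points 1 and 3, here is a toy model in plain Java, using no TinkerPop classes. HasFilter, the "searchType" key and the "tokenRegex" value are stand-ins invented for the example; how a real provider would read the with() options is an assumption. The point is only that with() stores an ordinary key/value pair, and the provider decides how that changes the meaning of the predicate.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical model of a has() filter whose behaviour is modulated by with().
final class HasFilter {
    // Stand-in for a provider-exposed key such as DseGraph.searchType.
    static final String SEARCH_TYPE = "searchType";

    private final String key;
    private final String value;
    private final Map<String, Object> options = new HashMap<>();

    HasFilter(final String key, final String value) {
        this.key = key;
        this.value = value;
    }

    // with() only records a plain key/value pair, so there is nothing
    // provider-specific to serialize.
    HasFilter with(final String option, final Object setting) {
        options.put(option, setting);
        return this;
    }

    // What a provider might do: inspect the options and pick a strategy.
    boolean test(final Map<String, String> properties) {
        final String actual = properties.get(key);
        if (actual == null) {
            return false;
        }
        if ("tokenRegex".equals(options.get(SEARCH_TYPE))) {
            // Basic java regex applied to each token of the property value.
            final Pattern pattern = Pattern.compile(value);
            return Arrays.stream(actual.split("\\W+"))
                         .anyMatch(token -> pattern.matcher(token).matches());
        }
        // No option set: plain eq() semantics, which stays backward compatible.
        return actual.equals(value);
    }
}
```

Without the option, the filter behaves like a literal eq("a.*"). With it, the same value is read as a regex and matched against each token.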

good idea?
