Posted to dev@tinkerpop.apache.org by Daniel Kuppitz <dk...@apache.org> on 2018/09/19 20:45:33 UTC
Re: [DISCUSS] text predicates

https://issues.apache.org/jira/browse/TINKERPOP-2041

I'm going to work on some simple predicates (startsWith, endsWith and contains). For now, I'd like to keep it really simple and only provide 1-arg overloads. RegEx will not be included, as there is still some back-and-forth regarding the different RegEx syntaxes.
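
A minimal sketch of what such 1-arg text predicates could look like, in plain Java. The Text enum and the filter helper are hypothetical names used only for illustration; they are not part of TinkerPop's API, and the eventual implementation may differ:

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BiPredicate;
import java.util.stream.Collectors;

public class SimpleTextPredicates {

    // Hypothetical enum: each constant compares a property value
    // against a single search term (the "1-arg" form).
    enum Text implements BiPredicate<String, String> {
        startsWith {
            @Override
            public boolean test(final String value, final String term) {
                return value.startsWith(term);
            }
        },
        endsWith {
            @Override
            public boolean test(final String value, final String term) {
                return value.endsWith(term);
            }
        },
        contains {
            @Override
            public boolean test(final String value, final String term) {
                return value.contains(term);
            }
        }
    }

    // Hypothetical helper: keeps only the values matching the predicate.
    static List<String> filter(final List<String> values, final Text predicate, final String term) {
        return values.stream()
                .filter(value -> predicate.test(value, term))
                .collect(Collectors.toList());
    }

    public static void main(final String[] args) {
        final List<String> names = Arrays.asList("marko", "vadas", "josh", "peter");
        System.out.println(filter(names, Text.startsWith, "ma"));
        System.out.println(filter(names, Text.endsWith, "r"));
        System.out.println(filter(names, Text.contains, "a"));
    }
}
```

All three checks are case-sensitive and delegate to java.lang.String, so there is no RegEx syntax to agree on.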

Cheers,
Daniel


On 2018/06/21 20:06:46, Stephen Mallette <sp...@gmail.com> wrote: 
> > Also, wouldn't you need to configure a serializer for
> > 'DseGraph.searchType'?
> 
> that's the nice part of with() - internally that falls into a standard Map
> in bytecode and no serialization hassles. it still leaves the graph
> provider to expose a "DseGraph" type of class, but at least there is
> nothing to configure anywhere
> 
> > Then all that's left is fuzzy.  I don't have an opinion on that yet.
> > Maybe it's more Search enums?
> 
> could be. or with() is the catch all case that providers can use for
> "everything else" they can come up with....for now
> 
> 
> 
> On Thu, Jun 21, 2018 at 3:52 PM Robert Dale <ro...@gmail.com> wrote:
> 
> > No, that makes it non-portable, provider-specific.  That is, I can't
> > cut-n-paste that from one graph db to the next.  Also, wouldn't you need to
> > configure a serializer for 'DseGraph.searchType'?
> >
> > I think we can start with a small, simple set.
> >
> > startsWith(String)
> > startsWith(Search,String)
> > contains(String)
> > contains(Search, String)
> > regex(String)
> > regex(Search, String)
> >
> > The two-argument forms take (Search, String): Search is an enum of String
> > (default) or Text (tokenized), and String is the search term.
> >
> > The regex syntax may be provider-specific, but the traversal would be
> > portable. If the provider doesn't override the step/predicate then it would
> > use the default implementation.
> >
> > Then all that's left is fuzzy.  I don't have an opinion on that yet. Maybe
> > it's more Search enums?
> >
> > Robert Dale
> >
> >
> > On Thu, Jun 21, 2018 at 3:04 PM Stephen Mallette <sp...@gmail.com>
> > wrote:
> >
> > > Just thinking out loud here, but i wonder if we could keep our predicate
> > > list more or less as-is, but then use with() to modulate a has() to be
> > > provider specific:
> > >
> > > g.V().
> > >    has('longText', eq("a.*")).
> > >      with(DseGraph.searchType, tokenRegex)
> > >
> > > In other words, this would be the standard way that users would inform
> > > graph providers to handle special text search types. The upside is that
> > >
> > > 1. graph providers no longer have to hassle with serialization at all to
> > > implement this (which means users don't need special configuration of
> > > their servers/drivers).
> > > 2. we have a common way that all graph providers can take advantage of
> > > and thus users have one method for writing their gremlin (albeit with
> > > different with() and search syntax).
> > > 3. we can make this part of our reference implementation i think pretty
> > > easily for TinkerGraph with some basic java regex stuff.
> > > 4. stays backward compatible with existing graph provider predicates
> > >
> > > good idea?
> > >
> > >
> > >
> > > On Mon, Jun 11, 2018 at 9:12 AM Stephen Mallette <sp...@gmail.com>
> > > wrote:
> > >
> > > > I found a CosmosDB issue on github calling for support of text
> > > > predicates
> > > >
> > > > https://github.com/Azure/azure-documentdb-dotnet/issues/473
> > > >
> > > > and it conveniently listed the text predicates for a number of
> > > > different graphs, so it made the job of compiling these pretty easy.
> > > >
> > > > DSE Graph (tokenized search is for long multi-sentence type properties)
> > > > + eq/neq
> > > > + prefix
> > > > + regex
> > > > + token
> > > > + tokenPrefix
> > > > + tokenRegex
> > > > + phrase
> > > > + fuzzy
> > > > + tokenFuzzy
> > > >
> > > >
> > > > https://docs.datastax.com/en/dse/6.0/dse-dev/datastax_enterprise/graph/using/useSearchIndexes.html
> > > >
> > > > JanusGraph
> > > > + textContains
> > > > + textContainsPrefix
> > > > + textContainsRegex
> > > > + textContainsFuzzy
> > > > + eq/neq
> > > > + textPrefix
> > > > + textRegex
> > > > + textFuzzy
> > > >
> > > > http://docs.janusgraph.org/latest/index-parameters.html#text-search
> > > >
> > > > Neo4j/Cypher
> > > > + STARTS WITH
> > > > + ENDS WITH
> > > > + CONTAINS
> > > >
> > > > http://www.jexp.de/blog/html/full-text-and-spatial-search-in-neo4j-3.html
> > > >
> > > > OrientDB - basically just lucene syntax
> > > > + LUCENE
> > > >
> > > > https://orientdb.com/docs/last/Full-Text-Index.html
> > > >
> > > > So - that's the list as best I can determine. JanusGraph and DSE Graph
> > > > have the most complex set of expressions it seems. Neo4j/Cypher has the
> > > > easiest developer friendly looking set that probably covers most of the
> > > > questions we get out in the community. OrientDB gets vendor specific in
> > > > what they do.  Did I leave any out? Please update this thread if I did.
> > > >
> > > > Not sure what we do with that now, but that's what is out there.