You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2022/01/07 23:26:17 UTC

[GitHub] [tinkerpop] krlawrence opened a new pull request #1542: Add Text.regex text predicate

krlawrence opened a new pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542


   This Pull Request adds regular expressions in the form of `TextP.regex` to the Gremlin text predicates across the Java reference implementation  and the Pyhon, Javascript, and .Net, GLVs..
   
   ```
   gremlin> g.addV('City').property('name','Dallas')
   ==>v[2]
   gremlin> g.addV('City').property('name','Darwin')
   ==>v[4]
   
   gremlin> g.V().has('City','name',TextP.regex('^Da[r|l]')).values('name')
   ==>Dallas
   ==>Darwin
   ```
   
   ### Known limitations
   I consciously avoided trying to include a pattern cache with this implementation. Patterns are "compiled" each time they are used. This is something we could definitely add in a future PR but I also suspect implementers will want to use their own REGEX engines and processors anyway.
   
   ***
   
   Everything builds with `mvn clean install`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette merged pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette merged pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1069220769


   > I believe your issue was satisfied by documentation which you wrote was a way to settle it with: "Continue using TextP.regex() syntax but clarify the behaviour explicitly".
   > 
   > https://github.com/apache/tinkerpop/pull/1542/files#diff-aa69bbbf8224e408ea19bf3df1466e16e6309bcfb241f68b578032d9fbbbd48dR4204-R4214
   > 
   > https://github.com/apache/tinkerpop/pull/1542/files#diff-a523495c08b5b75fe43826198bab44206a235fc4af4ffc7a40bc684cf817fc71R484-R503
   
   Perhaps I should have clarified myself. I still wanted to have a discussion on option 1 vs. option 2 that I suggested in the comment https://github.com/apache/tinkerpop/pull/1542#discussion_r782965129 . I would have favoured option 2. Nevertheless, I would like to understand the advantages of choosing option 1 over option 2 here.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
krlawrence commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782543845



##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       I took "inspiration" from other Graphs and query languages. JanusGraph and Titan use the "regex" phrase as does SPARQL with its  `FILTER REGEX`. Cypher uses the Ruby style of `=~`  so neither a `like` or a `regex`. I considered `match` but did not go with it for the reasons discussed. We could go with `like` but `like` is not really a regex matcher in SQL it's a more basic syntax. There have been e-mails on this that date back to when the original TextP.startingWith work was done but I do not have links handy. There is also a Jira. I would still prefer to go with `regex` as that makes it very clear what the step is going to do to my mind.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782550154



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TextP.java
##########
@@ -106,4 +106,22 @@ public static TextP containing(final String value) {
     public static TextP notContaining(final String value) {
         return new TextP(Text.notContaining, value);
     }
+    
+    /**           
+     * Determines if String has a match with the given REGEX pattern. 
+     *
+     * @since 3.6.0
+     */
+    public static TextP regex(final String value) {
+        return new TextP(Text.regex, value);

Review comment:
       Per my comment above:
   
   `return new TextP(new RegexPedicate(value), value);`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r781714478



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);
+            Matcher matcher = pattern.matcher(value);
+            return matcher.find();   
+        }
+
+        /**
+         * The negative of {@code regex} is {@link #notRegex}.
+         */
+        @Override
+        public Text negate() {
+            return notRegex;
+        }
+    },
+    /**
+     * Evaluates if the first string does not have a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    notRegex {
+        @Override
+        public boolean test(final String value, final String regex) {
+	    Pattern pattern = Pattern.compile(regex);

Review comment:
       Why not just have notRegex call !Text.regex.test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r781714093



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       final 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1064288522


   > I believe all items have been addressed at this point for this PR. VOTE +1
   
   I missed this one. My concern about using `regex` as a syntax was not addressed. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782175976



##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       i'd have preferred `match()` as well but it conflicts with `match()`-step. 
   
   > Also, we don't need a special negation syntax, since the user can do
   
   The negate is there to satisfy the `Predicate.negate()` with a non-lambda for serialization purposes. we might add a comment thought to say that it typically isn't recommended to use it.
   
   `like()` is actually a nice alternative as it makes the `notLike()` easier. `like()` isn't self-explaining like `regex()` but perhaps that's not a problem. I'd be happy with `like()`. I suppose i'd also be in favor of `matches()` which is amenable to the negation.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782548981



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Actually it would still need to be a BiPredicate because that is what P expect, but at least it would already have the pattern compiled at runtime. The second arg would just be ignored.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
krlawrence commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782537221



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       The reason I left it this way is related to your other comment. As the code paths I believe are going to be almost identical, this sets us up to potentially cache Pattern objects in the future.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1072292046


   I don't have a lot of technical advantages of 1 over 2 of list, but I don't find 2 more compelling than 1 either to spend time reworking it. @krlawrence who did most of the PR indicated earlier in this discussion that he preferred the naming he chose and it is consistent with what the TinkerPop graph community has been used to seeing since 2015 with Titan to JanusGraph to DataStax Graph.  Personally, I'm fine to stick with it on those grounds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1072292046


   I don't have a lot of technical advantages of 1 over 2 of list, but I don't find 2 more compelling than 1 either to spend time reworking it. @krlawrence who did most of the PR indicated earlier in this discussion that he preferred the naming he chose and it is consistent with what the TinkerPop graph community has been used to seeing since 2015 with Titan to JanusGraph to DataStax Graph.  Personally, I'm fine to stick with it on those grounds.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1065034366


   I believe your issue was satisfied by documentation which you wrote was a way to settle it with: "Continue using TextP.regex() syntax but clarify the behaviour explicitly".
   
   https://github.com/apache/tinkerpop/pull/1542/files#diff-aa69bbbf8224e408ea19bf3df1466e16e6309bcfb241f68b578032d9fbbbd48dR4204-R4214
   
   https://github.com/apache/tinkerpop/pull/1542/files#diff-a523495c08b5b75fe43826198bab44206a235fc4af4ffc7a40bc684cf817fc71R484-R503


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r781714152



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);
+            Matcher matcher = pattern.matcher(value);
+            return matcher.find();   
+        }
+
+        /**
+         * The negative of {@code regex} is {@link #notRegex}.
+         */
+        @Override
+        public Text negate() {
+            return notRegex;
+        }
+    },
+    /**
+     * Evaluates if the first string does not have a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    notRegex {
+        @Override
+        public boolean test(final String value, final String regex) {
+	    Pattern pattern = Pattern.compile(regex);

Review comment:
       indent, final




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] spmallette commented on pull request #1542: TINKERPOP-2652 Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
spmallette commented on pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#issuecomment-1055331926


   I believe all items have been addressed at this point for this PR.  VOTE +1


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782965129



##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       My main concern with using `TextP.regex()` syntax is that it doesn't disambiguate between the following two definitions:
   1. Returns true if the entire input matches the regex pattern.
   2. Returns true if part of the string matches the regex pattern.
   
   Consider examples:
   
   If we use definition#1
   ```
   Input string "abc"
   RegEx "a"
   Result: false
   ```
   
   If we use definition#2
   ```
   Input string "abc"
   RegEx "a"
   Result: true
   ```
   
   SQL operator LIKE follows definition#1 above. 
   
   OpenCypher uses Java's regular expression [3] which follows definition#1 above.
   
   Sparql regex uses Xpath's fn:matches [1] which follows definition#2 above [2]
   
   In our case, we are using definition#1 similar to OC `~=` and SQL's `LIKE` operator whereas using `TextP.regex()` syntax indicates similarity to SPARQL which points to definition#2. That is the problem in using `TextP.regex()` as the syntax.
   
   Moving ahead, we could either:
   1. Continue using `TextP.regex()` syntax but clarify the behaviour explicitly that we are using definition#1 above and should not be confused with definition#2 used by other languages like SPARQL.
   2. Use `TextP.like()` syntax which is more consistent with user experience.
   
   I would favor option#2 for consistency with other languages and got providing gotchas for users of TinkerPop.
   
   [1] SPARQL uses XPath's fn:matches -> https://www.w3.org/TR/sparql11-query/#func-regex
   [2] *Unless the metacharacters ^ and $ are used as anchors, the string is considered to match the pattern if any substring matches the pattern.* Quoted from https://www.w3.org/TR/xpath-functions/#func-matches
   [3] https://neo4j.com/docs/cypher-manual/current/clauses/where/#query-where-regex

##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       Please add more clarification on whether the entire string should match the pattern or not, e.g. LIKE operator in SQL requires the entire string matches the pattern.
   
   Suggested change:
   
   "Does the regular expression match the entire incoming string"




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782134860



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Pattern compilation is expensive. I am wondering if we need to maintain a LRU cache. In most practical scenarios, it is highly unlikely that user sends a different pattern with every query.
   
   For reference, an benchmark is provided at https://www.baeldung.com/java-regex-pre-compile 

##########
File path: gremlin-test/features/filter/Has.feature
##########
@@ -514,6 +514,33 @@ Feature: Step - has()
       | v[ripple] |
       | v[peter] |
 
+ Scenario: g_V_hasXname_regexXrMarXX
+    Given the modern graph
+    And the traversal of
+      """
+      g.V().has("name", TextP.regex("^mar"))

Review comment:
       please add some tests using unicode characters as well to demonstrate that the implementation works for it.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Using
   
   Pattern.compile followed by Matcher matcher = pattern.matcher(value); is equivalent to using `String.matches(regex)`. We might was well use the later since it simplifies code.
   
   The Pattern + Matcher approach is better if we plan to re-use the compiled pattern.

##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       I have a question on the name of the predicate. Please feel free to point me to the email thread is a discussion has already taken place.
   
   While adding new syntax to Gremlin, we should try to maintain consistency with other query languages / programming languages. This will help reduce the learning curve for users of Gremlin.
   
   Keeping the above tenet in mind for adding new syntax, I would favour a more widely used syntax for regex match such as:
   
   TextP.like() -> used by SQL
   or
   TextP.match() -> used by Java / Python
   
   I would favor the former (matches()/match()).
   
   Also, we don't need a special negation syntax, since the user can do `P.not(TextP.regex())`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782543142



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       I agree with Divij that Pattern.compile is too costly to do on every solution flowing through the predicate. You don't need an LRU to fix the problem though, just use a new Predicate instance for each regex filter in the query instead of reusing a single Text.regex enum BiPredicate for everything. The RegexPredicate would implement Predicate instead of BiPredicate and take the pattern in the constructor and then just compile it once in the constructor, instead of once per solution at evaluation time. (Obviously this will only work for regex with a constant search pattern, but that should be most of the time.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r781714478



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);
+            Matcher matcher = pattern.matcher(value);
+            return matcher.find();   
+        }
+
+        /**
+         * The negative of {@code regex} is {@link #notRegex}.
+         */
+        @Override
+        public Text negate() {
+            return notRegex;
+        }
+    },
+    /**
+     * Evaluates if the first string does not have a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    notRegex {
+        @Override
+        public boolean test(final String value, final String regex) {
+	    Pattern pattern = Pattern.compile(regex);

Review comment:
       Why not just have notRegex call Text.regex.test




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782543142



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       I agree with Divij that Pattern.compile is too costly to do on every solution flowing through the predicate. You don't need an LRU to fix the problem though, just use a new Predicate instance for each regex filter in the query instead of reusing a single Text.regex enum BiPredicate for everything. The RegexPredicate would take the pattern in the constructor and then just compile it once in the constructor, instead of once per solution at evaluation time. (Obviously this will only work for regex with a constant search pattern, but that should be most of the time.)




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
krlawrence commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782558325



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Will do and good catch on the `notRegex` - not sure why I did that!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
krlawrence commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782536514



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       As I noted in the PR notes at the top, I did consider this and made a conscious decision to defer implementing an LRU (queue/cache) for a later PR. I would like to get the basic functionality in, see how people use it, and then circle back for any LRU type functionality. We use Caffeine in other parts of the project but I'm not sure if that would be too heavy a solution for this.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] krlawrence commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
krlawrence commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782544056



##########
File path: gremlin-test/features/filter/Has.feature
##########
@@ -514,6 +514,33 @@ Feature: Step - has()
       | v[ripple] |
       | v[peter] |
 
+ Scenario: g_V_hasXname_regexXrMarXX
+    Given the modern graph
+    And the traversal of
+      """
+      g.V().has("name", TextP.regex("^mar"))

Review comment:
       Will do




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] mikepersonick commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
mikepersonick commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782550154



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/TextP.java
##########
@@ -106,4 +106,22 @@ public static TextP containing(final String value) {
     public static TextP notContaining(final String value) {
         return new TextP(Text.notContaining, value);
     }
+    
+    /**           
+     * Determines if String has a match with the given REGEX pattern. 
+     *
+     * @since 3.6.0
+     */
+    public static TextP regex(final String value) {
+        return new TextP(Text.regex, value);

Review comment:
       Per my comment above:
   
   `return new TextP(new RegexPedicate(value), value);`
   
   Where RegexPredicate compiles the pattern in the constructor instead of in the test() method.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1542: Add TextP.regex text predicate

Posted by GitBox <gi...@apache.org>.
divijvaidya commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782969923



##########
File path: gremlin-core/src/test/java/org/apache/tinkerpop/gremlin/language/grammar/TraversalPredicateVisitorTest.java
##########
@@ -90,6 +90,9 @@
                 {"TextP.endingWith('hakuna')", TextP.endingWith("hakuna")},
                 {"TextP.notEndingWith('hakuna')", TextP.notEndingWith("hakuna")},
                 {"TextP.notStartingWith('hakuna')", TextP.notStartingWith("hakuna")},
+                {"TextP.regex('^h')", TextP.regex("^h")},

Review comment:
       Please add the following tests:
   1. a pattern starting with `~` because this is a reserved keyword in Gremlin and we want to ensure the implementation works correctly.
   2. a pattern containing the Java implementation specific flag such as case-insensitive search `(?i)` to demonstrate that our regex works with Java specific flags.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org