You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@tinkerpop.apache.org by GitBox <gi...@apache.org> on 2022/01/11 13:43:07 UTC

[GitHub] [tinkerpop] divijvaidya commented on a change in pull request #1542: Add TextP.regex text predicate

divijvaidya commented on a change in pull request #1542:
URL: https://github.com/apache/tinkerpop/pull/1542#discussion_r782134860



##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Pattern compilation is expensive. I am wondering if we need to maintain a LRU cache. In most practical scenarios, it is highly unlikely that user sends a different pattern with every query.
   
   For reference, an benchmark is provided at https://www.baeldung.com/java-regex-pre-compile 

##########
File path: gremlin-test/features/filter/Has.feature
##########
@@ -514,6 +514,33 @@ Feature: Step - has()
       | v[ripple] |
       | v[peter] |
 
+ Scenario: g_V_hasXname_regexXrMarXX
+    Given the modern graph
+    And the traversal of
+      """
+      g.V().has("name", TextP.regex("^mar"))

Review comment:
       please add some tests using unicode characters as well to demonstrate that the implementation works for it.

##########
File path: gremlin-core/src/main/java/org/apache/tinkerpop/gremlin/process/traversal/Text.java
##########
@@ -29,6 +31,48 @@
  */
 public enum Text implements BiPredicate<String, String> {
 
+    /**
+     * Evaluates if the first string has a regex match with the second (pattern).
+     *
+     * @since 3.6.0
+     */
+    regex {
+        @Override
+        public boolean test(final String value, final String regex) {
+            Pattern pattern = Pattern.compile(regex);

Review comment:
       Using
   
   Pattern.compile followed by Matcher matcher = pattern.matcher(value); is equivalent to using `String.matches(regex)`. We might was well use the later since it simplifies code.
   
   The Pattern + Matcher approach is better if we plan to re-use the compiled pattern.

##########
File path: docs/src/reference/the-traversal.asciidoc
##########
@@ -4099,8 +4100,12 @@ The provided predicates are outlined in the table below and are used in various
 | `TextP.notStartingWith(string)` | Does the incoming `String` not start with the provided `String`?
 | `TextP.notEndingWith(string)` | Does the incoming `String` not end with the provided `String`?
 | `TextP.notContaining(string)` | Does the incoming `String` not contain the provided `String`?
+| `TextP.regex(string)` | Does the incoming `String` match the regular expression in the provided `String`?

Review comment:
       I have a question on the name of the predicate. Please feel free to point me to the email thread is a discussion has already taken place.
   
   While adding new syntax to Gremlin, we should try to maintain consistency with other query languages / programming languages. This will help reduce the learning curve for users of Gremlin.
   
   Keeping the above tenet in mind for adding new syntax, I would favour a more widely used syntax for regex match such as:
   
   TextP.like() -> used by SQL
   or
   TextP.match() -> used by Java / Python
   
   I would favor the former (matches()/match()).
   
   Also, we don't need a special negation syntax, since the user can do `P.not(TextP.regex())`




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@tinkerpop.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org