You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@nuttx.apache.org by GitBox <gi...@apache.org> on 2022/04/14 09:12:05 UTC

[GitHub] [incubator-nuttx] anjiahao1 opened a new pull request, #6067: libc/string: Use Byte-Shift algorithm for very long needles

anjiahao1 opened a new pull request, #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067

   ## Testing
   pass ci


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] xiaoxiang781216 merged pull request #6067: libc/string: Use Byte-Shift algorithm for very long needles

Posted by GitBox <gi...@apache.org>.
xiaoxiang781216 merged PR #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] anjiahao1 commented on pull request #6067: libc/string: Use Byte-Shift algorithm for very long needles

Posted by GitBox <gi...@apache.org>.
anjiahao1 commented on PR #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067#issuecomment-1098990758

   > So this PR only change a license header?
   
   it have 3 commits,the last just change license 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] pkarashchenko commented on pull request #6067: libc/string: Use Byte-Shift algorithm for very long needles

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on PR #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067#issuecomment-1099028566

   > > So this PR only change a license header?
   > 
   > it have 3 commits,the last just change license
   
   yeah. seems like I clicked a wrong link initially.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] pkarashchenko commented on a diff in pull request #6067: libc/string: Use Byte-Shift algorithm for very long needles

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on code in PR #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067#discussion_r850311171


##########
libs/libc/string/lib_strstr.c:
##########
@@ -25,67 +34,198 @@
 #include <nuttx/config.h>
 
 #include <string.h>
+#include <stdbool.h>
+#include <arpa/inet.h>
+
+/****************************************************************************
+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#define LONG_INT_N_BYTES    sizeof(long)
 
 /****************************************************************************
  * Public Functions
  ****************************************************************************/
 
-#undef strstr /* See mm/README.txt */
-FAR char *strstr(FAR const char *str, FAR const char *substr)
+/* Finds the first occurrence of the sub-string needle in the
+ * string haystack. Returns NULL if needle was not found.
+ */
+
+FAR char *strstr(FAR const char *haystack, FAR const char *needle)
 {
-  FAR const char *candidate; /* Candidate in str with matching start character */
-  char ch;                   /* First character of the substring */
-  size_t len;                /* The length of the substring */
+  FAR const unsigned char *needle_cmp_end;
+  FAR const unsigned char *i_haystack;
+  const char needle_first = *needle;
+  FAR const unsigned char *i_needle;
+  unsigned long last_haystack_chars;
+  unsigned long last_needle_chars;
+  FAR const char *sub_start;
+  size_t needle_cmp_len;
+  bool identical = true;
+  unsigned long mask;
+  size_t compare_len;
+  size_t needle_len;
+
+  if (!*needle)
+    {
+      return (FAR char *)haystack;
+    }
+
+  /* Runs strchr() on the first section of the haystack as it has a lower
+   * algorithmic complexity for discarding the first non-matching characters.
+   */
+
+  haystack = strchr(haystack, needle_first);
+  if (!haystack) /* First character of needle is not in the haystack. */
+    {
+      return NULL;
+    }
 
-  /* Special case the empty substring */
+  /* First characters of haystack and needle are the same now. Both are
+   * guaranteed to be at least one character long.
+   * Now computes the sum of the first needle_len characters of haystack
+   * minus the sum of characters values of needle.
+   */
 
-  len = strlen(substr);
-  ch  = *substr;
+  i_haystack = (FAR const unsigned char *)haystack + 1;
+  i_needle = (FAR const unsigned char *)needle + 1;
 
-  if (!ch)
+  while (*i_haystack && *i_needle)
     {
-      /* We'll say that an empty substring matches at the beginning of
-       * the string
-       */
+      identical &= *i_haystack++ == *i_needle++;
+    }
 
-      return (FAR char *)str;
+  /* i_haystack now references the (needle_len + 1)-th character. */
+
+  if (*i_needle) /* haystack is smaller than needle. */
+    {
+      return NULL;
     }
+  else if (identical)
+    {
+      return (FAR char *) haystack;
+    }
+
+  needle_len = i_needle - (FAR const unsigned char *)needle;
+
+  /* Note:
+   * needle_len > 1, because we checked that it isn't zero, and if it
+   * is 1 then identical must be true because the first strchr() ensured
+   * that the first characters are identical
+   */
+
+  sub_start = haystack;
+  needle_cmp_len = (needle_len < LONG_INT_N_BYTES) ?
+                    needle_len : LONG_INT_N_BYTES;
+
+  needle_cmp_end = i_needle;
 
-  /* Search for the substring */
+  i_needle -= needle_cmp_len;
+  i_haystack -= needle_cmp_len;
+  last_needle_chars = 0;
+  last_haystack_chars = 0;
 
-  candidate = str;
-  for (; ; )
+  while (i_needle != needle_cmp_end)
     {
-      /* strchr() will return a pointer to the next occurrence of the
-       * character ch in the string
+      last_needle_chars <<= 8;
+      last_needle_chars ^= *i_needle++;
+      last_haystack_chars <<= 8;
+      last_haystack_chars ^= *i_haystack++;
+    }
+
+  /* At this point:
+   * needle is at least two characters long
+   * haystack is at least needle_len characters long (also at least two)
+   * the first characters of needle and haystack are identical
+   */
+
+  if (needle_len > LONG_INT_N_BYTES + 1)
+    {
+      /* we will call memcmp() only once we know that the LONG_INT_N_BYTES
+       * last chars are equal, so it will be enough to compare all but the
+       * last LONG_INT_N_BYTES characters
        */
 
-      candidate = strchr(candidate, ch);
-      if (!candidate || strlen(candidate) < len)
-        {
-          /* First character of the substring does not appear in the string
-           * or the remainder of the string is not long enough to contain the
-           * substring.
-           */
+      compare_len = needle_len - LONG_INT_N_BYTES;
 
-          return NULL;
-        }
+      /* iterate through the remainder of haystack while checking for
+       * identity of the last LONG_INT_N_BYTES, and checking the rest
+       * with memcmp()
+       */
 
-      /* Check if this is the beginning of a matching substring */
+      while (*i_haystack)
+        {
+          last_haystack_chars <<= 8;
+          last_haystack_chars ^= *i_haystack++;
+
+          sub_start++;
+          if (last_haystack_chars == last_needle_chars &&
+              memcmp(sub_start, needle, compare_len) == 0)
+            {
+              return (FAR char *)sub_start;
+            }
+        }
+    }
+  else if (needle_len == LONG_INT_N_BYTES + 1)
+    {
+      /* iterate through the remainder of haystack while checking for
+       * identity of the last LONG_INT_N_BYTES as well as the single
+       * additional character, which is the first one
+       */
 
-      if (strncmp(candidate, substr, len) == 0)
+      while (*i_haystack)
         {
-          return (FAR char *)candidate;
+          last_haystack_chars <<= 8;
+          last_haystack_chars ^= *i_haystack++;
+
+          sub_start++;
+          if (last_haystack_chars == last_needle_chars &&
+              *sub_start == needle_first)
+            {
+              return (FAR char *)sub_start;
+            }
         }
+    }
+  else if (needle_len == LONG_INT_N_BYTES)
+    {
+      /* iterate through the remainder of haystack while checking for
+       * identity of the last LONG_INT_N_BYTES characters, which
+       * should exactly match the entire needle
+       */
 
-      /* No, find the next candidate after this one */
+      while (*i_haystack)
+        {
+          last_haystack_chars <<= 8;
+          last_haystack_chars ^= *i_haystack++;
 
-      candidate++;
+          if (last_haystack_chars == last_needle_chars)
+            {
+              return (FAR char *)(i_haystack - needle_len);
+            }
+        }
     }
+  else /* needle_len < LONG_INT_N_BYTES */
+    {
+      mask = (((unsigned long) 1) << (needle_len * 8)) - 1;

Review Comment:
   ```suggestion
         mask = (1lu << (needle_len * 8)) - 1;
   ```



##########
libs/libc/string/lib_strstr.c:
##########
@@ -25,67 +34,198 @@
 #include <nuttx/config.h>
 
 #include <string.h>
+#include <stdbool.h>
+#include <arpa/inet.h>
+
+/****************************************************************************
+ * Pre-processor Definitions
+ ****************************************************************************/
+
+#define LONG_INT_N_BYTES    sizeof(long)
 
 /****************************************************************************
  * Public Functions
  ****************************************************************************/
 
-#undef strstr /* See mm/README.txt */
-FAR char *strstr(FAR const char *str, FAR const char *substr)
+/* Finds the first occurrence of the sub-string needle in the
+ * string haystack. Returns NULL if needle was not found.
+ */
+
+FAR char *strstr(FAR const char *haystack, FAR const char *needle)
 {
-  FAR const char *candidate; /* Candidate in str with matching start character */
-  char ch;                   /* First character of the substring */
-  size_t len;                /* The length of the substring */
+  FAR const unsigned char *needle_cmp_end;
+  FAR const unsigned char *i_haystack;
+  const char needle_first = *needle;
+  FAR const unsigned char *i_needle;
+  unsigned long last_haystack_chars;
+  unsigned long last_needle_chars;
+  FAR const char *sub_start;
+  size_t needle_cmp_len;
+  bool identical = true;
+  unsigned long mask;
+  size_t compare_len;
+  size_t needle_len;
+
+  if (!*needle)
+    {
+      return (FAR char *)haystack;
+    }
+
+  /* Runs strchr() on the first section of the haystack as it has a lower
+   * algorithmic complexity for discarding the first non-matching characters.
+   */
+
+  haystack = strchr(haystack, needle_first);
+  if (!haystack) /* First character of needle is not in the haystack. */
+    {
+      return NULL;
+    }
 
-  /* Special case the empty substring */
+  /* First characters of haystack and needle are the same now. Both are
+   * guaranteed to be at least one character long.
+   * Now computes the sum of the first needle_len characters of haystack
+   * minus the sum of characters values of needle.
+   */
 
-  len = strlen(substr);
-  ch  = *substr;
+  i_haystack = (FAR const unsigned char *)haystack + 1;
+  i_needle = (FAR const unsigned char *)needle + 1;
 
-  if (!ch)
+  while (*i_haystack && *i_needle)
     {
-      /* We'll say that an empty substring matches at the beginning of
-       * the string
-       */
+      identical &= *i_haystack++ == *i_needle++;
+    }
 
-      return (FAR char *)str;
+  /* i_haystack now references the (needle_len + 1)-th character. */
+
+  if (*i_needle) /* haystack is smaller than needle. */
+    {
+      return NULL;
     }
+  else if (identical)
+    {
+      return (FAR char *) haystack;

Review Comment:
   ```suggestion
         return (FAR char *)haystack;
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [incubator-nuttx] pkarashchenko commented on pull request #6067: libc/string: Use Byte-Shift algorithm for very long needles

Posted by GitBox <gi...@apache.org>.
pkarashchenko commented on PR #6067:
URL: https://github.com/apache/incubator-nuttx/pull/6067#issuecomment-1098985192

   So this PR only change a license header?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@nuttx.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org