You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@apr.apache.org by Stefan Sperling <st...@stsp.name> on 2011/12/06 13:36:30 UTC
[PATCH] charclass matching and input check for fnmatch
The patch below adds character class matching to fnmatch.
This code was originally written for OpenBSD by Todd C. Miller.
It also adds a sanity check on the input size which was suggested
by Miod Vallat. fnmatch(3) is used to match path names so patterns
or strings longer than PATH_MAX don't need to be processed.
I've committed APR's fnmatch implementation to OpenBSD, with these
changes merged in, see
http://www.openbsd.org/cgi-bin/cvsweb/src/lib/libc/gen/fnmatch.c
Thanks again Bill for providing this under a BSD licence.
It is likely that we'll apply some intending changes on the OpenBSD side.
Would these also be welcome at APR so we can stay in sync more easily?
[[[
Add character class matching to fnmatch().
Don't try to match patterns or strings longer than APR_PATH_MAX.
Both changes were obtained from OpenBSD.
Submitted by: Stefan Sperling <st...@apache.org>
]]]
Index: strings/charclass.h
===================================================================
--- strings/charclass.h (revision 0)
+++ strings/charclass.h (working copy)
@@ -0,0 +1,29 @@
+/*
+ * Public domain, 2008, Todd C. Miller <To...@courtesan.com>
+ *
+ * $OpenBSD: charclass.h,v 1.1 2008/10/01 23:04:13 millert Exp $
+ */
+
+/*
+ * POSIX character class support for fnmatch() and glob().
+ */
+static struct cclass {
+ const char *name;
+ int (*isctype)(int);
+} cclasses[] = {
+ { "alnum", isalnum },
+ { "alpha", isalpha },
+ { "blank", isblank },
+ { "cntrl", iscntrl },
+ { "digit", isdigit },
+ { "graph", isgraph },
+ { "lower", islower },
+ { "print", isprint },
+ { "punct", ispunct },
+ { "space", isspace },
+ { "upper", isupper },
+ { "xdigit", isxdigit },
+ { NULL, NULL }
+};
+
+#define NCCLASSES (sizeof(cclasses) / sizeof(cclasses[0]) - 1)
Index: strings/apr_fnmatch.c
===================================================================
--- strings/apr_fnmatch.c (revision 1210859)
+++ strings/apr_fnmatch.c (working copy)
@@ -14,6 +14,21 @@
* limitations under the License.
*/
+/*
+ * Copyright (c) 2008 Todd C. Miller <mi...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
/* Derived from The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008
* as described in;
@@ -63,7 +78,52 @@
# include <ctype.h>
#endif
+#include "charclass.h"
+#define RANGE_MATCH 1
+#define RANGE_NOMATCH 0
+#define RANGE_ERROR (-1)
+
+static int
+classmatch(const char *pattern, char test, int foldcase, const char **ep)
+{
+ struct cclass *cc;
+ const char *colon;
+ size_t len;
+ int rval = RANGE_NOMATCH;
+ const char * const mismatch = pattern;
+
+ if (*pattern != '[' || pattern[1] != ':') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+
+ pattern += 2;
+
+ if ((colon = strchr(pattern, ':')) == NULL || colon[1] != ']') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+ *ep = colon + 2;
+ len = (size_t)(colon - pattern);
+
+ if (foldcase && strncmp(pattern, "upper:]", 7) == 0)
+ pattern = "lower:]";
+ for (cc = cclasses; cc->name != NULL; cc++) {
+ if (!strncmp(pattern, cc->name, len) && cc->name[len] == '\0') {
+ if (cc->isctype((unsigned char)test))
+ rval = RANGE_MATCH;
+ break;
+ }
+ }
+ if (cc->name == NULL) {
+ /* invalid character class, treat as normal text */
+ *ep = mismatch;
+ rval = RANGE_ERROR;
+ }
+ return(rval);
+}
+
/* Most MBCS/collation/case issues handled here. Wildcard '*' is not handled.
* EOS '\0' and the FNM_PATHNAME '/' delimiters are not advanced over,
* however the "\/" sequence is advanced to '/'.
@@ -115,6 +175,13 @@ static APR_INLINE int fnmatch_ch(const char **patt
if (slash && (**pattern == '/'))
break;
+ /* Match character classes. */
+ if (classmatch(*pattern, **string, nocase, pattern)
+ == RANGE_MATCH) {
+ result = 0;
+ continue;
+ }
+
leadingclosebrace:
/* Look at only well-formed range patterns;
* "x-]" is not allowed unless escaped ("x-\]")
@@ -207,6 +274,10 @@ APR_DECLARE(int) apr_fnmatch(const char *pattern,
const char *mismatch = NULL;
int matchlen = 0;
+ if (strnlen(pattern, APR_PATH_MAX) == APR_PATH_MAX ||
+ strnlen(string, APR_PATH_MAX) == APR_PATH_MAX)
+ return (APR_FNM_NOMATCH);
+
if (*pattern == '*')
goto firstsegment;
Re: [PATCH] charclass matching and input check for fnmatch
Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 12/6/2011 4:50 AM, Stefan Sperling wrote:
> On Tue, Dec 06, 2011 at 01:36:30PM +0100, Stefan Sperling wrote:
>> It is likely that we'll apply some intending changes on the OpenBSD side.
>
> I meant of course *indentation* changes :)
The issue is that apr has its own style guide.
Organization, bracketing etc should stay the same.
Whitespace diffs are a noop, just diff --ignore-all-whitespace.
There's no need to force apr conventions upon bsd or visa versa.
Re: [PATCH] charclass matching and input check for fnmatch
Posted by Stefan Sperling <st...@stsp.name>.
On Tue, Dec 06, 2011 at 01:36:30PM +0100, Stefan Sperling wrote:
> It is likely that we'll apply some intending changes on the OpenBSD side.
I meant of course *indentation* changes :)
Re: [PATCH] charclass matching and input check for fnmatch
Posted by Stefan Sperling <st...@stsp.name>.
Oops, hit 'send' to early and forgot to put the charclass.h
copyright notice into LICENSE in the last diff I sent.
Index: strings/charclass.h
===================================================================
--- strings/charclass.h (revision 0)
+++ strings/charclass.h (working copy)
@@ -0,0 +1,29 @@
+/*
+ * Public domain, 2008, Todd C. Miller <To...@courtesan.com>
+ *
+ * $OpenBSD: charclass.h,v 1.1 2008/10/01 23:04:13 millert Exp $
+ */
+
+/*
+ * POSIX character class support for fnmatch() and glob().
+ */
+static struct cclass {
+ const char *name;
+ int (*isctype)(int);
+} cclasses[] = {
+ { "alnum", isalnum },
+ { "alpha", isalpha },
+ { "blank", isblank },
+ { "cntrl", iscntrl },
+ { "digit", isdigit },
+ { "graph", isgraph },
+ { "lower", islower },
+ { "print", isprint },
+ { "punct", ispunct },
+ { "space", isspace },
+ { "upper", isupper },
+ { "xdigit", isxdigit },
+ { NULL, NULL }
+};
+
+#define NCCLASSES (sizeof(cclasses) / sizeof(cclasses[0]) - 1)
Index: strings/apr_fnmatch.c
===================================================================
--- strings/apr_fnmatch.c (revision 1210859)
+++ strings/apr_fnmatch.c (working copy)
@@ -14,6 +14,21 @@
* limitations under the License.
*/
+/*
+ * Copyright (c) 2008 Todd C. Miller <mi...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
/* Derived from The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008
* as described in;
@@ -63,7 +78,52 @@
# include <ctype.h>
#endif
+#include "charclass.h"
+#define RANGE_MATCH 1
+#define RANGE_NOMATCH 0
+#define RANGE_ERROR (-1)
+
+static int
+classmatch(const char *pattern, char test, int foldcase, const char **ep)
+{
+ struct cclass *cc;
+ const char *colon;
+ size_t len;
+ int rval = RANGE_NOMATCH;
+ const char * const mismatch = pattern;
+
+ if (*pattern != '[' || pattern[1] != ':') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+
+ pattern += 2;
+
+ if ((colon = strchr(pattern, ':')) == NULL || colon[1] != ']') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+ *ep = colon + 2;
+ len = (size_t)(colon - pattern);
+
+ if (foldcase && strncmp(pattern, "upper:]", 7) == 0)
+ pattern = "lower:]";
+ for (cc = cclasses; cc->name != NULL; cc++) {
+ if (!strncmp(pattern, cc->name, len) && cc->name[len] == '\0') {
+ if (cc->isctype((unsigned char)test))
+ rval = RANGE_MATCH;
+ break;
+ }
+ }
+ if (cc->name == NULL) {
+ /* invalid character class, treat as normal text */
+ *ep = mismatch;
+ rval = RANGE_ERROR;
+ }
+ return(rval);
+}
+
/* Most MBCS/collation/case issues handled here. Wildcard '*' is not handled.
* EOS '\0' and the FNM_PATHNAME '/' delimiters are not advanced over,
* however the "\/" sequence is advanced to '/'.
@@ -115,6 +175,13 @@ static APR_INLINE int fnmatch_ch(const char **patt
if (slash && (**pattern == '/'))
break;
+ /* Match character classes. */
+ if (classmatch(*pattern, **string, nocase, pattern)
+ == RANGE_MATCH) {
+ result = 0;
+ continue;
+ }
+
leadingclosebrace:
/* Look at only well-formed range patterns;
* "x-]" is not allowed unless escaped ("x-\]")
Index: LICENSE
===================================================================
--- LICENSE (revision 1210859)
+++ LICENSE (working copy)
@@ -209,8 +209,8 @@ separate copyright notices and license terms. Your
code for these subcomponents is subject to the terms and conditions
of the following licenses.
-From strings/apr_fnmatch.c, include/apr_fnmatch.h, misc/unix/getopt.c,
-file_io/unix/mktemp.c, strings/apr_strings.c:
+From include/apr_fnmatch.h, misc/unix/getopt.c, file_io/unix/mktemp.c,
+strings/apr_strings.c:
/*
* Copyright (c) 1987, 1993, 1994
@@ -338,4 +338,22 @@ From strings/apr_snprintf.c:
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
+From strings/apr_fnmatch.c:
+ * Copyright (c) 2008 Todd C. Miller <mi...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+
+From strings/charclass.h:
+
+ Public domain, 2008, Todd C. Miller <To...@courtesan.com>
Re: [PATCH] charclass matching and input check for fnmatch
Posted by Stefan Sperling <st...@stsp.name>.
On Wed, Dec 07, 2011 at 03:38:05PM -0800, William A. Rowe Jr. wrote:
> On 12/7/2011 1:12 PM, Stefan Sperling wrote:
> > Can the classmatch bits go in without this strnlen check in the meantime?
>
> As long as the appropriate change is made to NOTICE, I am +1,
> the license appears group-A fully compatible.
You mean the LICENSE file, not NOTICE, right?
The ISC licence has no advertising clause.
And I just noted that LICENSE still lists the old Berkeley copyright
notice for apr_fnmatch.c, from before your rewrite.
Index: strings/charclass.h
===================================================================
--- strings/charclass.h (revision 0)
+++ strings/charclass.h (working copy)
@@ -0,0 +1,29 @@
+/*
+ * Public domain, 2008, Todd C. Miller <To...@courtesan.com>
+ *
+ * $OpenBSD: charclass.h,v 1.1 2008/10/01 23:04:13 millert Exp $
+ */
+
+/*
+ * POSIX character class support for fnmatch() and glob().
+ */
+static struct cclass {
+ const char *name;
+ int (*isctype)(int);
+} cclasses[] = {
+ { "alnum", isalnum },
+ { "alpha", isalpha },
+ { "blank", isblank },
+ { "cntrl", iscntrl },
+ { "digit", isdigit },
+ { "graph", isgraph },
+ { "lower", islower },
+ { "print", isprint },
+ { "punct", ispunct },
+ { "space", isspace },
+ { "upper", isupper },
+ { "xdigit", isxdigit },
+ { NULL, NULL }
+};
+
+#define NCCLASSES (sizeof(cclasses) / sizeof(cclasses[0]) - 1)
Index: strings/apr_fnmatch.c
===================================================================
--- strings/apr_fnmatch.c (revision 1210859)
+++ strings/apr_fnmatch.c (working copy)
@@ -14,6 +14,21 @@
* limitations under the License.
*/
+/*
+ * Copyright (c) 2008 Todd C. Miller <mi...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
+ */
/* Derived from The Open Group Base Specifications Issue 7, IEEE Std 1003.1-2008
* as described in;
@@ -63,7 +78,52 @@
# include <ctype.h>
#endif
+#include "charclass.h"
+#define RANGE_MATCH 1
+#define RANGE_NOMATCH 0
+#define RANGE_ERROR (-1)
+
+static int
+classmatch(const char *pattern, char test, int foldcase, const char **ep)
+{
+ struct cclass *cc;
+ const char *colon;
+ size_t len;
+ int rval = RANGE_NOMATCH;
+ const char * const mismatch = pattern;
+
+ if (*pattern != '[' || pattern[1] != ':') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+
+ pattern += 2;
+
+ if ((colon = strchr(pattern, ':')) == NULL || colon[1] != ']') {
+ *ep = mismatch;
+ return(RANGE_ERROR);
+ }
+ *ep = colon + 2;
+ len = (size_t)(colon - pattern);
+
+ if (foldcase && strncmp(pattern, "upper:]", 7) == 0)
+ pattern = "lower:]";
+ for (cc = cclasses; cc->name != NULL; cc++) {
+ if (!strncmp(pattern, cc->name, len) && cc->name[len] == '\0') {
+ if (cc->isctype((unsigned char)test))
+ rval = RANGE_MATCH;
+ break;
+ }
+ }
+ if (cc->name == NULL) {
+ /* invalid character class, treat as normal text */
+ *ep = mismatch;
+ rval = RANGE_ERROR;
+ }
+ return(rval);
+}
+
/* Most MBCS/collation/case issues handled here. Wildcard '*' is not handled.
* EOS '\0' and the FNM_PATHNAME '/' delimiters are not advanced over,
* however the "\/" sequence is advanced to '/'.
@@ -115,6 +175,13 @@ static APR_INLINE int fnmatch_ch(const char **patt
if (slash && (**pattern == '/'))
break;
+ /* Match character classes. */
+ if (classmatch(*pattern, **string, nocase, pattern)
+ == RANGE_MATCH) {
+ result = 0;
+ continue;
+ }
+
leadingclosebrace:
/* Look at only well-formed range patterns;
* "x-]" is not allowed unless escaped ("x-\]")
Index: LICENSE
===================================================================
--- LICENSE (revision 1210859)
+++ LICENSE (working copy)
@@ -209,8 +209,8 @@ separate copyright notices and license terms. Your
code for these subcomponents is subject to the terms and conditions
of the following licenses.
-From strings/apr_fnmatch.c, include/apr_fnmatch.h, misc/unix/getopt.c,
-file_io/unix/mktemp.c, strings/apr_strings.c:
+From include/apr_fnmatch.h, misc/unix/getopt.c, file_io/unix/mktemp.c,
+strings/apr_strings.c:
/*
* Copyright (c) 1987, 1993, 1994
@@ -338,4 +338,18 @@ From strings/apr_snprintf.c:
ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
POSSIBILITY OF SUCH DAMAGE.
+From strings/apr_fnmatch.c:
+ * Copyright (c) 2008 Todd C. Miller <mi...@openbsd.org>
+ *
+ * Permission to use, copy, modify, and distribute this software for any
+ * purpose with or without fee is hereby granted, provided that the above
+ * copyright notice and this permission notice appear in all copies.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES
+ * WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR
+ * ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES
+ * WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN
+ * ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF
+ * OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE.
Re: [PATCH] charclass matching and input check for fnmatch
Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 12/7/2011 1:12 PM, Stefan Sperling wrote:
> On Tue, Dec 06, 2011 at 12:48:26PM -0800, William A. Rowe Jr. wrote:
>> On 12/6/2011 4:36 AM, Stefan Sperling wrote:
>>> @@ -207,6 +274,10 @@ APR_DECLARE(int) apr_fnmatch(const char *pattern,
>>> const char *mismatch = NULL;
>>> int matchlen = 0;
>>>
>>> + if (strnlen(pattern, APR_PATH_MAX) == APR_PATH_MAX ||
>>> + strnlen(string, APR_PATH_MAX) == APR_PATH_MAX)
>>> + return (APR_FNM_NOMATCH);
>>> +
>>
>> I'm not certain if we can presume strnlen() without a feature
>> test? Also I'd benchmark this against comparing the offset to
>> a max offset in the code... numeric vs string pre-counting.
>
> Fair enough. I'll look into that.
>
> Can the classmatch bits go in without this strnlen check in the meantime?
As long as the appropriate change is made to NOTICE, I am +1,
the license appears group-A fully compatible.
Re: [PATCH] charclass matching and input check for fnmatch
Posted by Stefan Sperling <st...@stsp.name>.
On Tue, Dec 06, 2011 at 12:48:26PM -0800, William A. Rowe Jr. wrote:
> On 12/6/2011 4:36 AM, Stefan Sperling wrote:
> > @@ -207,6 +274,10 @@ APR_DECLARE(int) apr_fnmatch(const char *pattern,
> > const char *mismatch = NULL;
> > int matchlen = 0;
> >
> > + if (strnlen(pattern, APR_PATH_MAX) == APR_PATH_MAX ||
> > + strnlen(string, APR_PATH_MAX) == APR_PATH_MAX)
> > + return (APR_FNM_NOMATCH);
> > +
>
> I'm not certain if we can presume strnlen() without a feature
> test? Also I'd benchmark this against comparing the offset to
> a max offset in the code... numeric vs string pre-counting.
Fair enough. I'll look into that.
Can the classmatch bits go in without this strnlen check in the meantime?
Re: [PATCH] charclass matching and input check for fnmatch
Posted by "William A. Rowe Jr." <wr...@rowe-clan.net>.
On 12/6/2011 4:36 AM, Stefan Sperling wrote:
> @@ -207,6 +274,10 @@ APR_DECLARE(int) apr_fnmatch(const char *pattern,
> const char *mismatch = NULL;
> int matchlen = 0;
>
> + if (strnlen(pattern, APR_PATH_MAX) == APR_PATH_MAX ||
> + strnlen(string, APR_PATH_MAX) == APR_PATH_MAX)
> + return (APR_FNM_NOMATCH);
> +
I'm not certain if we can presume strnlen() without a feature
test? Also I'd benchmark this against comparing the offset to
a max offset in the code... numeric vs string pre-counting.