Posted to dev@couchdb.apache.org by "Jens Alfke (JIRA)" <ji...@apache.org> on 2014/09/11 18:22:35 UTC

[jira] [Created] (COUCHDB-2327) Add string/array prefix match option, for view queries

Jens Alfke created COUCHDB-2327:
-----------------------------------

Summary: Add string/array prefix match option, for view queries
Key: COUCHDB-2327
URL: https://issues.apache.org/jira/browse/COUCHDB-2327
Project: CouchDB
Issue Type: Improvement
Security Level: public (Regular issues)
Components: HTTP Interface
Reporter: Jens Alfke

View querying provides no clean way to match a string prefix. The only advice I've seen is to set startkey to the prefix, and endkey to the prefix with "some really high Unicode character" appended, which is a total kludge*.

There's a similar issue with matching an array prefix, e.g. "all keys that start with [2014, ...]". Here the solution is less kludgy (append a "{}" to the endkey) but it's still very unintuitive to people learning CouchDB. I've had to explain it to newbies many times.
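
For readers who haven't seen the two workarounds, here is a minimal Python sketch of how the key ranges are typically built today. The sentinel `\ufff0` is only an assumed stand-in for "some really high Unicode character", and the helper names are hypothetical, not part of any CouchDB client library.

```python
import json
from urllib.parse import urlencode

# Assumed placeholder for "some really high Unicode character".
HIGH_CHAR = "\ufff0"


def string_prefix_range(prefix):
    """Workaround for strings: endkey is the prefix plus a high character."""
    return prefix, prefix + HIGH_CHAR


def array_prefix_range(prefix):
    """Workaround for arrays: endkey is the prefix with an empty object appended."""
    return list(prefix), list(prefix) + [{}]


def view_query_string(startkey, endkey):
    """Encode the range as view query parameters (keys are sent as JSON)."""
    return urlencode({
        "startkey": json.dumps(startkey),
        "endkey": json.dumps(endkey),
    })


if __name__ == "__main__":
    print(view_query_string(*string_prefix_range("foo")))
    print(view_query_string(*array_prefix_range([2014])))
```

The resulting query string would be appended to a view URL; both ranges depend on the reader knowing how the view keys are ordered, which is the unintuitive part.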

I suggest adding an explicit query option to enable prefix matching. This doesn't need to mess with the actual query engine; all it has to do is modify the endkey by appending an appropriate Unicode character (in the string case) or an empty object (in the array case). If no `endkey` is given, the modified endkey is derived from the `startkey` instead.

I've already implemented a comparable feature for Couchbase Lite:
https://github.com/couchbase/couchbase-lite-ios/wiki/Query-Enhancements#prefix-matching

Note that I made the `prefix_match` parameter an integer, not a boolean. This is to support cases where you want to match a prefix of a _nested component_ of the key, for example "all keys in 2014 whose product name starts with 'f'", where the startkey would be [2014, "f"] and the prefix_match would be 2 to indicate that it's the nested string that should be prefix-matched, not the array. But in the common case you'd just set the value to 1 to indicate that the top-level key should be prefix-matched.
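
To make the proposed semantics concrete, here is a rough Python sketch of how a server could rewrite the endkey for a given `prefix_match` depth. This is an illustration under assumptions, not the Couchbase Lite code and not a proposed CouchDB patch: the function names are hypothetical, and the `\ufff0` sentinel is an assumed placeholder for "an appropriate Unicode character".

```python
# Assumed placeholder for "an appropriate Unicode character".
HIGH_CHAR = "\ufff0"


def key_for_prefix_match(key, depth):
    """Return the upper bound that prefix-matches `key` at nesting level `depth`."""
    if depth < 1:
        return key
    if depth == 1:
        if isinstance(key, str):
            return key + HIGH_CHAR      # string case: append a high character
        if isinstance(key, list):
            return key + [{}]           # array case: append an empty object
        return key                      # other key types are left unchanged
    # depth > 1: descend into the last component of an array key
    if isinstance(key, list) and key:
        return key[:-1] + [key_for_prefix_match(key[-1], depth - 1)]
    return key


def apply_prefix_match(startkey, endkey=None, prefix_match=1):
    """Return (startkey, endkey); the endkey falls back to the startkey if absent."""
    if prefix_match < 1:
        return startkey, endkey
    base = startkey if endkey is None else endkey
    return startkey, key_for_prefix_match(base, prefix_match)


if __name__ == "__main__":
    print(apply_prefix_match("foo"))
    print(apply_prefix_match([2014]))
    print(apply_prefix_match([2014, "f"], prefix_match=2))
```

With depth 2, the key [2014, "f"] keeps its array shape and only the nested string is extended, which matches the "all keys in 2014 whose product name starts with 'f'" example.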

* Why is adding "some high Unicode character" a kludge? Because Unicode is so complicated and so inconsistently implemented. Doing this immediately opens the possibility of weird Unicode issues in your development language's string type, in its HTTP client library, and in Erlang's equivalents on the server side. Not to mention the swamp that is the Unicode specification itself: for instance, I've seen advice to use a character like \uFFFE, which was correct until Unicode grew beyond 16 bits, and tended to work alright for a while after that, but will now fail with emoji characters (which are both very commonly used and well outside the 16-bit range). Actually, whether it fails depends on whether your string implementation operates on UTF-16 (very common) or true Unicode code points. Like I said, it's a kludge.
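
The UTF-16 versus code point difference can be shown with a small Python sketch. It compares the same three strings under the two raw orderings mentioned above; it does not model CouchDB's own view collation, and the helper name is hypothetical.

```python
def utf16_units(s):
    """Return the UTF-16 code units of `s` as a list of integers."""
    data = s.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


prefix = "f"
sentinel_end = prefix + "\ufffe"     # the "high character" endkey from the advice
emoji_key = prefix + "\U0001F600"    # U+1F600 is a surrogate pair in UTF-16

# Ordering by Unicode code point (Python's native str comparison).
in_range_by_code_point = prefix <= emoji_key <= sentinel_end

# Ordering by UTF-16 code unit, as a UTF-16-based string type would compare.
in_range_by_utf16 = (
    utf16_units(prefix) <= utf16_units(emoji_key) <= utf16_units(sentinel_end)
)

print(in_range_by_code_point)  # False: U+1F600 sorts after U+FFFE
print(in_range_by_utf16)       # True: the lead surrogate 0xD83D sorts before 0xFFFE
```

The same key falls inside or outside the range depending only on how the strings are compared, which is exactly why the sentinel approach is fragile.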

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)