Posted to notifications@couchdb.apache.org by GitBox <gi...@apache.org> on 2020/05/27 03:52:48 UTC

[GitHub] [couchdb] davisp edited a comment on issue #2895: In CouchDB Query regex with the caseless (?i) modifier is not case-insensitive for unicode strings

davisp edited a comment on issue #2895:
URL: https://github.com/apache/couchdb/issues/2895#issuecomment-634188167


   Excellent write-up. Unfortunately, the Erlang shell and its weirdo UTF-8 behavior caused you more pain than it helped. The issue is that Erlang displays binaries and interprets binaries differently than you might expect.
   
   For clarity, in UTF-8 `ö` is `195 182` and `Ö` is `195 150`.
   
   A quick test to see that in action is to print out the actual binary values and see how things have been interpreted (running on Erlang 22 locally):
   
   ```erlang
   Forms = [
       <<"xxxöxxx">>,
       <<"xxxöxxx"/utf8>>,
       <<"xxxÖxxx">>,
       <<"xxxÖxxx"/utf8>>,
       <<"(?i)ö">>,
       <<"(?i)ö"/utf8>>
   ],
   lists:foreach(fun(F) ->
       io:format("~w~n", [F])
   end, Forms).
   
   <<120,120,120,246,120,120,120>>
   <<120,120,120,195,182,120,120,120>>
   <<120,120,120,214,120,120,120>>
   <<120,120,120,195,150,120,120,120>>
   <<40,63,105,41,246>>
   <<40,63,105,41,195,182>>
   ```
   
   You'll notice that the `/utf8` flag merely tells the shell to encode the unicode characters typed into it as UTF-8. Without it, each character is stored as a single byte holding its code point (`ö` becomes `246`). So the table from above is just showing this, albeit in a roundabout fashion.
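
   To make the difference concrete, here is a small sketch (not part of the original report) that can be pasted into the Erlang shell. It uses integer code points instead of typed characters so the result does not depend on the terminal's encoding; the variable names are illustrative.

   ```erlang
   %% 246 is the code point of ö (U+00F6). Variable names are illustrative.
   Latin1 = <<246>>.        % one byte, same bytes as <<"ö">> in the output above
   Utf8 = <<246/utf8>>.     % two bytes, same bytes as <<"ö"/utf8>>

   %% These matches raise badmatch if the encodings are not as described.
   <<195,182>> = Utf8.
   1 = byte_size(Latin1).
   2 = byte_size(Utf8).

   %% The unicode module converts between code points and UTF-8 binaries.
   Utf8 = unicode:characters_to_binary([246]).
   [246] = unicode:characters_to_list(Utf8).
   ```

   The two binaries hold the same character but different bytes, which is why a byte-oriented comparison treats them as unrelated.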
   
   Internally, since we know everything has gone through Jiffy, we know everything is valid UTF-8. So it then becomes a question of passing the regex engine's UTF-8 (unicode) flag, and whether or not it's something we should do, since it could be a behavior change.
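
   For reference, Erlang's `re` module exposes this as the `unicode` option. Below is a minimal sketch of its effect, assuming both the pattern and the document value arrive as UTF-8 binaries. How Mango actually calls `re` is not shown in this thread, so treat this as an illustration rather than the real code path.

   ```erlang
   Subject = <<"xxx", 214/utf8, "xxx">>.   % "xxxÖxxx" as UTF-8
   Pattern = <<"(?i)", 246/utf8>>.         % "(?i)ö" as UTF-8

   %% Without the option the pattern is matched byte by byte, so the
   %% caseless modifier cannot relate 195,182 to 195,150.
   nomatch = re:run(Subject, Pattern, [{capture, none}]).

   %% With unicode the same binaries are treated as UTF-8 code points.
   match = re:run(Subject, Pattern, [unicode, {capture, none}]).
   ```

   Turning the option on for every `$regex` would change what existing patterns match, which is the behavior change in question.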
   
   However, that flag can also be set from within the pattern itself, so this works as expected:
   
   ```bash
   curl -u admin:password -X POST http://localhost:5984/test/_find -H "Content-Type: application/json" -d '
   {
     "selector": {
       "data": {
         "$regex": "(*UTF8)(?i)ö"
       }
     }
   }'
   {"docs":[
   {"_id":"1","_rev":"1-96be014c47e090c7705b66c4b646d6f6","data":"xxxöxxx"},
   {"_id":"2","_rev":"1-903d0f61be4c2eda02c1bd72d3ba92bc","data":"xxxÖxxx"}
   ],
   "bookmark": "g1AAAAAyeJzLYWBgYMpgSmHgKy5JLCrJTq2MT8lPzkzJBYozGoEkOGASEKEsAErJDRs",
   "warning": "No matching index found, create an index to optimize query time."}
   
   ```
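
   The same in-pattern switch can be tried directly against `re`, without going through `_find`. This is a sketch that assumes the PCRE-based `re` shipped with the Erlang 22 mentioned above; other releases may spell the verb differently. The variable names are illustrative.

   ```erlang
   Subject = <<"xxx", 214/utf8, "xxx">>.       % "xxxÖxxx" as UTF-8
   InPattern = <<"(*UTF8)(?i)", 246/utf8>>.    % "(*UTF8)(?i)ö" as UTF-8

   %% No unicode option is passed; the leading (*UTF8) turns UTF-8 mode on
   %% from inside the pattern.
   re:run(Subject, InPattern, [{capture, none}]).
   ```

   On that setup this should report a match, consistent with the `_find` response above returning both documents.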

