Posted to dev@couchdb.apache.org by Constantin Teodorescu <br...@gmail.com> on 2016/07/30 12:17:41 UTC

Mango full text search is immune to accented letters?

Is the Mango full-text indexer/search (or will it be) insensitive to
accented letters?

I'm planning to use it to search for "posta", but the documents may contain
"poştă"!
SQLite3 FTS4 is able to do that!

For the moment I'm using CouchDB 1.6 views with an explicit "flatten"
function in JavaScript to create a non-accented index:

  // Romanian accented letters, in both cedilla and comma-below forms
  var translate_re = /[ŞȘŢȚÎĂÂÁşșţțîăâá]/g,
      translate = {
        'Ş': 'S', 'ş': 's',
        'Ș': 'S', 'ș': 's',
        'Ţ': 'T', 'ţ': 't',
        'Ț': 'T', 'ț': 't',
        'Ă': 'A', 'ă': 'a',
        'Â': 'A', 'â': 'a',
        'Á': 'A', 'á': 'a',
        'Î': 'I', 'î': 'i'
      };

  // Replace each accented letter with its unaccented equivalent
  function makeSearchString(s) {
    return s.replace(translate_re, function (match) {
      return translate[match];
    });
  }
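
A more general variant of the same idea, as a rough sketch only: instead of
listing every letter, decompose the string with Unicode normalization and
drop the combining marks. The name foldAccents is made up for illustration,
and this assumes a runtime that provides String.prototype.normalize
(ES2015 or later), which the JavaScript engine bundled with CouchDB 1.6 may
not, so it is probably more useful on the client side for folding the
search term than inside a view.

```javascript
// Hypothetical helper (not part of CouchDB): strips diacritics by
// decomposing each character and removing the combining marks.
// Assumes String.prototype.normalize is available (ES2015+).
function foldAccents(s) {
  // NFD turns e.g. "ă" into "a" followed by a combining breve (U+0306);
  // the regex then removes everything in the combining-marks block.
  return s.normalize('NFD').replace(/[\u0300-\u036f]/g, '');
}

// The same folding has to be applied to the search term as well.
var folded = foldAccents('poştă');
```

Letters that have no canonical decomposition (for example "ø" or "đ") are
left untouched by this approach, so an explicit table like the one above is
still needed for them.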

Teo

Re: Mango full text search is immune to accented letters?

Posted by Robert Newson <rn...@apache.org>.
The backend of Mango full-text search is Lucene, which certainly handles accented characters. It all comes down to which analyser you are using.


Re: Mango full text search is immune to accented letters?

Posted by Constantin Teodorescu <br...@gmail.com>.
On Mon, Aug 1, 2016 at 7:50 PM, Tony Sun <to...@gmail.com> wrote:

> Hey Teo,
>
>  Were you able to get Mango text search working? Specifying the analyzer
> gets a little tricky.
>

No, Tony, but thanks for asking!
I realised that a separate process handles the Lucene search, just like in
the old "CouchDB Lucene"!
We used "CouchDB Lucene" 3 years ago with very good results, but I thought
that CouchDB 2.0 included an Erlang port of Lucene, like Riak did for
full-text search!

For the moment I have decided to wait a while longer for CouchDB 2.0 to
stabilise, including the how-to documentation for installing it! :-)

At the beginning, I was rather excited reading about Mango indexes.
Now I understand that "json" type indexes are basically views whose map
functions are probably defined in Erlang.
Today I discovered that I cannot use "sort" in a query if the column is not
present in the index fields ... another disappointment! :-(
I believed that when the query executor uses a Mango index, it would be able
to do a merge sort across shards, like any SQL database does! :-D
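
To make the "sort" restriction concrete, here is a minimal sketch. It
assumes the rule is exactly as described above (every field used in "sort"
must appear among the fields of the index); the real rules may be stricter.
The helper sortCoveredByIndex and the field names "type" and "created" are
made up for illustration and are not part of CouchDB.

```javascript
// Hypothetical client-side check, not a CouchDB API: returns true when
// every field named in a Mango "sort" array is present in indexFields.
function sortCoveredByIndex(indexFields, sort) {
  return sort.every(function (entry) {
    // A sort entry is either "field" or an object like { field: "asc" }
    var field = typeof entry === 'string' ? entry : Object.keys(entry)[0];
    return indexFields.indexOf(field) !== -1;
  });
}

// Illustrative "json" index definition and query (field names are assumed)
var indexDef = { index: { fields: ['type', 'created'] }, type: 'json' };
var query = { selector: { type: 'letter' }, sort: [{ created: 'asc' }] };

var ok = sortCoveredByIndex(indexDef.index.fields, query.sort);
```

With the index above, sorting on "created" passes the check, while sorting
on a field that was left out of the index would not.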

Teo

Re: Mango full text search is immune to accented letters?

Posted by Tony Sun <to...@gmail.com>.
Hey Teo,

   Were you able to get Mango text search working? Specifying the analyzer
gets a little tricky.
