
Posted to user@couchdb.apache.org by Ladislav Prskavec <la...@prskavec.net> on 2011/05/31 12:37:12 UTC

Problem with czech alphabet in View_Collation

I have a simple view:

function(doc) {
  if (doc.parentId == "0") {
    emit(doc.seoname, [doc.id, doc.seourl, doc.rank]);
  }
}

and in the results:

{"id":"S0001","key":"Brno Business School","value":["S0001","brno-business-school",5]},
{"id":"41000","key":"\u010desk\u00e1 zem\u011bd\u011blsk\u00e1 univerzita v Praze","value":["41000","ceska-zemedelska-univerzita-v-praze",5]},
{"id":"21000","key":"\u010cesk\u00e9 vysok\u00e9 u\u010den\u00ed technick\u00e9 v Praze","value":["21000","ceske-vysoke-uceni-technicke-v-praze",5]},
{"id":"7D000","key":"CEVRO Institut, o. p. s.","value":["7D000","cevro-institut-o-p-s",5]},
{"id":"S0003","key":"CMC Graduate School of Business o.p.s.","value":["S0003","cmc-graduate-school-of-business-o-p-s",5]},

My problem is that Č (\u010c/\u010d) sorts before C, but the Czech alphabet orders C before Č.
Is this a bug? ICU can solve this.

Thanks for help.
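For reference, the ordering above is what the default (root, untailored) Unicode Collation Algorithm produces: Č differs from C only at the secondary (accent) level, so "České" is compared with "CEVRO" essentially as "Ceske" and sorts first. Czech tailoring instead makes Č a separate letter that follows C. The sketch below is not CouchDB code. It assumes a JavaScript runtime whose standard Intl.Collator is backed by full ICU locale data (as in current Node.js builds), and it compares the "en" (root order) and "cs" collations on the keys from the post.

```javascript
// Keys copied from the view output in the original post.
const keys = [
  "Brno Business School",
  "česká zemědělská univerzita v Praze",
  "České vysoké učení technické v Praze",
  "CEVRO Institut, o. p. s.",
  "CMC Graduate School of Business o.p.s.",
];

// Sort a copy of the values using the collation rules of the given locale.
// "en" uses the untailored root order; "cs" adds the Czech tailoring,
// in which Č is a separate letter placed after C.
function sortWith(locale, values) {
  const collator = new Intl.Collator(locale);
  return [...values].sort(collator.compare);
}

const rootOrder = sortWith("en", keys);
const czechOrder = sortWith("cs", keys);

console.log(rootOrder.join("\n"));
console.log("---");
console.log(czechOrder.join("\n"));
```

With full ICU data, the "en" result should match the view output unchanged, while "cs" should move both Č entries after "CMC Graduate School of Business o.p.s.". If the runtime ships only partial locale data, "cs" may silently fall back to the root order.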

Re: Problem with czech alphabet in View_Collation

Posted by Paul Davis <pa...@gmail.com>.

Ticket here: https://issues.apache.org/jira/browse/COUCHDB-158

On Wed, Jun 1, 2011 at 11:12 AM, Paul Davis <pa...@gmail.com> wrote:
> I already did that once. And someone already ported app engine to
> CouchDB. And then no one cared. The patch itself isn't *too* horrid
> but does take some mucking around. Pretty sure there's already a
> ticket for it.

Re: Problem with czech alphabet in View_Collation

Posted by Paul Davis <pa...@gmail.com>.

I already did that once. And someone already ported App Engine to
CouchDB. And then no one cared. The patch itself isn't *too* horrid
but does take some mucking around. Pretty sure there's already a
ticket for it.

On Wed, Jun 1, 2011 at 3:32 AM, Robert Newson <ro...@gmail.com> wrote:
> That would be a neat enhancement, worth a ticket, imo.

Re: Problem with czech alphabet in View_Collation

Posted by Robert Newson <ro...@gmail.com>.

That would be a neat enhancement, worth a ticket, imo.

On 1 June 2011 03:42, Jason Smith <jh...@iriscouch.com> wrote:
> Perhaps this is the wrong place to do this, but a feature I'd like to
> see is specifying sort direction of keys in an array. This would give
> AFAIK 100% compatibility with the Google App Engine indexing system;
> thus in principle App Engine apps could be ported to CouchDB.
>
> emit([doc.first_name, doc.last_name, doc.age], 1);
>
> It is not possible for first names to sort ascending, last names to
> sort descending, and age to sort ascending again. The example may seem
> contrived but it happens all the time.

Re: Problem with czech alphabet in View_Collation

Posted by Jason Smith <jh...@iriscouch.com>.

On Wed, Jun 1, 2011 at 12:42 AM, Paul Davis <pa...@gmail.com> wrote:
> Now that we've moved to using NIF's I had been contemplating rewriting
> the ICU driver as a NIF to see if there were any performance
> differences. As part of that I would investigate the ability to pass
> in these tailoring bits to allow people to do fancier ICU collation
> that's been requested a couple times.

Perhaps this is the wrong place to do this, but a feature I'd like to
see is specifying sort direction of keys in an array. This would give
AFAIK 100% compatibility with the Google App Engine indexing system;
thus in principle App Engine apps could be ported to CouchDB.

emit([doc.first_name, doc.last_name, doc.age], 1);

It is not possible for first names to sort ascending, last names to
sort descending, and age to sort ascending again. The example may seem
contrived but it happens all the time.

-- 
Iris Couch
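Jason's request can be made concrete as a comparator over array keys with a per-element direction. The sketch below is hypothetical: CouchDB has no such option, and compareKeys, its directions argument, and the sample rows are illustrative names only. It assumes each array position holds values of a single type (numbers or strings) and uses JavaScript's Intl.Collator for strings as a stand-in for CouchDB's ICU collation.

```javascript
// Hypothetical model of per-element sort direction for keys such as
// [first_name, last_name, age]; this is not a CouchDB API.
const collator = new Intl.Collator("en");

// Compare two values of the same type: numbers numerically,
// everything else as strings via the ICU-backed collator.
function compareValues(x, y) {
  if (typeof x === "number" && typeof y === "number") {
    return x - y;
  }
  return collator.compare(String(x), String(y));
}

// directions[i] is "asc" or "desc" for element i of the key array.
function compareKeys(a, b, directions) {
  const n = Math.min(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const c = compareValues(a[i], b[i]);
    if (c !== 0) {
      return directions[i] === "desc" ? -c : c;
    }
  }
  // A key that is a prefix of another sorts first, as in array collation.
  return a.length - b.length;
}

// Illustrative sample rows: [first_name, last_name, age].
const rows = [
  ["Ann", "Smith", 30],
  ["Bob", "Adams", 40],
  ["Ann", "Brown", 25],
  ["Ann", "Smith", 20],
];

// First names ascending, last names descending, age ascending.
const sorted = [...rows].sort((a, b) => compareKeys(a, b, ["asc", "desc", "asc"]));
console.log(sorted);
```

In a view as it works today, a numeric element can be made to sort descending by emitting its negation (for example -doc.age), but there is no equally simple transform for strings, which is exactly the descending last-name case in Jason's example.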

Re: Problem with czech alphabet in View_Collation

Posted by Paul Davis <pa...@gmail.com>.

2011/5/31 Robert Newson <ro...@gmail.com>:
> We already use ICU for collation and the keys you mentioned are
> correctly ordered in UCA order. What's missing is support for custom
> tailoring rules, I think.
>
> B.

Now that we've moved to using NIFs I had been contemplating rewriting
the ICU driver as a NIF to see if there were any performance
differences. As part of that I would investigate the ability to pass
in these tailoring bits to allow people to do fancier ICU collation
that's been requested a couple times.

Re: Problem with czech alphabet in View_Collation

Posted by Robert Newson <ro...@gmail.com>.

We already use ICU for collation and the keys you mentioned are
correctly ordered in UCA order. What's missing is support for custom
tailoring rules, I think.

B.

2011/5/31 Ladislav Thon <la...@gmail.com>:
> IIRC, this isn't supported right now, but will be (might be? :-) ) in the
> future. See this thread:
> http://www.mail-archive.com/user@couchdb.apache.org/msg10606.html for
> previous discussion.
>
> LT

Re: Problem with czech alphabet in View_Collation

Posted by Ladislav Thon <la...@gmail.com>.

IIRC, this isn't supported right now, but will be (might be? :-) ) in the
future. See this thread:
http://www.mail-archive.com/user@couchdb.apache.org/msg10606.html for
previous discussion.

LT

2011/5/31 Ladislav Prskavec <la...@prskavec.net>

> I have simple view:
>
> function(doc) {
> if (doc.parentId == "0") {
> emit(doc.seoname, [doc.id, doc.seourl, doc.rank]);
>  }
> }
>
> and in results:
>
> {"id":"S0001","key":"Brno Business
> School","value":["S0001","brno-business-school",5]},
> {"id":"41000","key":"\u010desk\u00e1 zem\u011bd\u011blsk\u00e1 univerzita v
> Praze","value":["41000","ceska-zemedelska-univerzita-v-praze",5]},
> {"id":"21000","key":"\u010cesk\u00e9 vysok\u00e9 u\u010den\u00ed
> technick\u00e9 v
> Praze","value":["21000","ceske-vysoke-uceni-technicke-v-praze",5]},
> {"id":"7D000","key":"CEVRO Institut, o. p.
> s.","value":["7D000","cevro-institut-o-p-s",5]}, {"id":"S0003","key":"CMC
> Graduate School of Business
> o.p.s.","value":["S0003","cmc-graduate-school-of-business-o-p-s",5]},
>
> I have problem with \u010 (Č) is before C, but in alphabet we have C, Č
> order.
> It's bug? ICU can solve this.
>
> Thanks for help.