Posted to users@jackrabbit.apache.org by Jörg von Frantzius <jo...@aperto.de> on 2013/02/13 18:39:33 UTC

String normalization or collation?

Hi,

we have the problem that our SQL2 queries with ORDER BY clauses on 
String properties sort German umlauts to the end of the results, while 
umlauts should be sorted like their base characters without the 
diacritic (e.g. "ö" = "o").
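A likely explanation for the umlauts ending up last is a plain code-point comparison: "Ö" is U+00D6 (214) and "ö" is U+00F6 (246), both larger than any unaccented ASCII letter. The minimal Java sketch below reproduces that ordering with String.compareTo. It assumes, without confirmation from the thread, that the repository's string sort behaves the same way; the class name CodePointOrder is made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CodePointOrder {
    // Sorts with String.compareTo, which compares UTF-16 code units one by one.
    // O with umlaut (U+00D6 = 214) is larger than every ASCII letter,
    // so a word starting with it lands after "Z..." (U+005A = 90).
    static List<String> naturalOrder(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(null); // null comparator means natural (code-unit) ordering
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "\u00d6l", "Apfel", "Pfau");
        // Prints Apfel, Pfau, Zebra and only then the umlaut word.
        System.out.println(naturalOrder(words));
        System.out.println((int) 'Z' + " < " + (int) '\u00d6'); // prints "90 < 214"
    }
}
```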

According to https://issues.apache.org/jira/browse/JCR-3443, with 
Jackrabbit 2.5.3 it should be possible to use a normalize() function for 
ordering in XPath queries, but not in SQL2.

We are now considering changing the Jackrabbit configuration, in 
particular setting the "analyzer" parameter of the SearchIndex to a 
custom subclass of org.apache.lucene.analysis.standard.StandardAnalyzer 
that applies an ASCIIFoldingFilter 
(https://lucene.apache.org/core/old_versioned_docs/versions/3_0_3/api/core/org/apache/lucene/analysis/ASCIIFoldingFilter.html).
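To make the proposal concrete, here is roughly what an ASCII-folding step does to each token before it is indexed. This is a minimal Java sketch using java.text.Normalizer (decompose to NFD, then strip combining marks), not Lucene's ASCIIFoldingFilter itself, and the fold() helper is hypothetical. The real filter maps many more characters; it also turns "ß" into "ss", which the sketch handles explicitly because "ß" has no Unicode decomposition.

```java
import java.text.Normalizer;

public class AsciiFoldSketch {
    // Hypothetical helper approximating ASCII folding for German tokens.
    // Lucene's ASCIIFoldingFilter covers far more characters; this only shows the idea.
    static String fold(String token) {
        // NFD splits o-umlaut into "o" followed by a combining diaeresis (U+0308).
        String decomposed = Normalizer.normalize(token, Normalizer.Form.NFD);
        // Drop all combining marks (Unicode general category M).
        String withoutMarks = decomposed.replaceAll("\\p{M}", "");
        // Sharp s (U+00DF) has no decomposition, so expand it explicitly.
        return withoutMarks.replace("\u00df", "ss");
    }

    public static void main(String[] args) {
        System.out.println(fold("Gr\u00f6\u00dfe")); // prints "Grosse"
        System.out.println(fold("\u00d6l"));         // prints "Ol"
    }
}
```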

Does anybody happen to have an opinion on whether that could be a 
viable approach?

Thanks in advance for any answers, and regards,
Jörg

-- 
*Dipl. inf. Jörg von Frantzius, System Architect*

Email mailto:joerg.frantzius@aperto.de
Phone +49 30 283921-318
Fax +49 30 283921-29

Aperto AG - In der Pianofabrik
Chausseestraße 5, D-10115 Berlin-Mitte
http://www.aperto.de
http://www.facebook.com/aperto
https://www.xing.com/companies/apertoag

HRB 77049, AG Berlin Charlottenburg
Executive Board: Dirk Buddensiek (Chairman), Kai Großmann, Stephan Haagen
Supervisory Board: Bernd Hardes (Chairman)

Re: String normalization or collation?

Posted by Jörg von Frantzius <jo...@aperto.de>.
Hi Cédric,

thanks for your answer here and in JIRA.

Out of curiosity, does that mean that the comparator works on values 
from the underlying database rather than from the Lucene index?

For anybody interested and reading along, I have opened 
https://issues.apache.org/jira/browse/JCR-3522 for porting the 
NORMALIZE() function to SQL2.

Regards,
Jörg

On 13.02.2013 18:46, Cédric Damioli wrote:
> Hi Jörg,
>
> Your approach won't work with the current implementation, because the 
> ORDER BY clauses are not "analyzed", so changing the analyzer won't 
> have any effect.
> It should be possible to add a NORMALIZE() function to the SQL/SQL2 
> grammar as well, reusing the NormalizeSortComparator introduced by 
> JCR-3443.
>
> Regards,
> Cédric



Re: String normalization or collation?

Posted by Cédric Damioli <cd...@apache.org>.
Hi Jörg,

Your approach won't work with the current implementation, because the ORDER BY
clauses are not "analyzed", so changing the analyzer won't have any effect.
It should be possible to add a NORMALIZE() function to the SQL/SQL2
grammar as well, reusing the NormalizeSortComparator introduced by JCR-3443.
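The internals of NormalizeSortComparator from JCR-3443 are not shown in this thread. As a point of comparison for the "collation" half of the subject line, the following minimal Java sketch sorts with the JDK's java.text.Collator for Locale.GERMAN at PRIMARY strength. This is a swapped-in technique (standard JDK collation), not Jackrabbit's comparator, and the class and method names are hypothetical.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

public class GermanCollationSketch {
    // Sorts with the JDK's German Collator instead of String.compareTo.
    // PRIMARY strength compares base letters only, so accents and case
    // do not push a word to the end of the list.
    static List<String> collatedOrder(List<String> words) {
        Collator collator = Collator.getInstance(Locale.GERMAN);
        collator.setStrength(Collator.PRIMARY);
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(collator); // Collator implements Comparator<Object>
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("Zebra", "\u00d6l", "Apfel", "Pfau");
        // The umlaut word now sorts between "Apfel" and "Pfau".
        System.out.println(collatedOrder(words));
    }
}
```

PRIMARY strength also ignores case and treats "Öl" and "Ol" as equal; since List.sort is stable, such ties keep their input order. A stricter strength (SECONDARY) would still group "Ö" with "O" but order the accented form after the plain one when the words are otherwise identical.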

Regards,
Cédric

-- 
Cédric Damioli
Ametys CMS
http://www.ametys.org
http://www.anyware-services.com

Re: String normalization or collation?

Posted by Hendry_Betts <he...@greenskycredit.com>.
Jörg,

Since the issue you are having with Jackrabbit is related to indexing,
it seems to me that it would be better answered by looking into Lucene.
Jackrabbit Core uses Lucene Core (though the team is not very explicit
about which version is used). Your idea may be useful, but I think the
solution lies deeper, in the configuration of Lucene Core.

I know it sounds like I am passing you from one project to another, but
the dependency on Lucene would lead me to look at Lucene directly.
Perhaps exposing more of the Lucene configuration in Jackrabbit Core
would be the greatest benefit the team could give its community.



--
View this message in context: http://jackrabbit.510166.n4.nabble.com/String-normalization-or-collation-tp4657764p4657765.html
Sent from the Jackrabbit - Users mailing list archive at Nabble.com.