Posted to users@jackrabbit.apache.org by Antonio Facciorusso <A....@westpole.it> on 2019/04/17 10:05:28 UTC

Can't find Japanese words ending with numbers

Dear all,

I'm using Jackrabbit 2.16.1 and Lucene 3.6.2.

I have a node of type "mynodetype" with a property named "description" whose value is "横浜第2センタ". I run full-text searches using "jcr:contains" in this form:

jcr:contains(., '<value>*')

This query returns 0 results:
"//element(*,mynodetype)[(jcr:contains(., '横浜第2*'))]"

whereas each of the following works correctly and returns at least one result:

"//element(*,mynodetype)[(jcr:contains(., '横浜第2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., '横浜第*'))]"
"//element(*,mynodetype)[(jcr:contains(., '2センタ*'))]"
"//element(*,mynodetype)[(jcr:contains(., 'センタ*'))]"

I tried using both the default analyzer and the Japanese one (https://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/analysis/ja/JapaneseAnalyzer.html).

This is the content of my indexingConfiguration.xml file:

<?xml version="1.0"?>
<!DOCTYPE configuration SYSTEM "http://jackrabbit.apache.org/dtd/indexing-configuration-1.1.dtd">
<configuration xmlns:nt="http://www.jcp.org/jcr/nt/1.0">
    <index-rule nodeType="entity">
        <!-- simple properties -->
        <property isRegexp="true">.*:[^_]+</property>
        <!-- resources_data_xxx -->
        <property isRegexp="true">.*:resources_data_[^_]+</property>
        <!-- resources_xxx (with xxx != 'data') -->
        <property isRegexp="true">.*:resources_data[^_]+</property>
        <property isRegexp="true">.*:resources_(?!data)[^_]+</property>
        <!-- resourcesxyz_xxx -->
        <property isRegexp="true">.*:resources[^_]+_[^_]+</property>
        <!-- all other xxx_yyy (with xxx != resources) -->
        <property isRegexp="true">.*:(?!resources)[^_]+_[^_]+</property>
    </index-rule>
</configuration>
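To check which of the index-rule patterns above actually cover a given property (for example "description"), the patterns can be exercised with java.util.regex, which supports the negative lookaheads they use. This is a minimal sketch under one assumption: each pattern is matched against the whole prefixed property name, such as `my:description`, where the `my` prefix is hypothetical. Jackrabbit may instead match the namespace and local-name parts separately, so treat the output as a sanity check, not as Jackrabbit's own matching logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class IndexRuleCheck {

    // Patterns copied from the <property isRegexp="true"> elements above.
    static final String[] PATTERNS = {
        ".*:[^_]+",
        ".*:resources_data_[^_]+",
        ".*:resources_data[^_]+",
        ".*:resources_(?!data)[^_]+",
        ".*:resources[^_]+_[^_]+",
        ".*:(?!resources)[^_]+_[^_]+"
    };

    /** Returns every pattern that matches the whole property name (assumed prefixed form). */
    static List<String> matchingPatterns(String propertyName) {
        List<String> hits = new ArrayList<>();
        for (String p : PATTERNS) {
            // Pattern.matches requires the entire input to match, like an anchored regex.
            if (Pattern.matches(p, propertyName)) {
                hits.add(p);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // "my" is a hypothetical namespace prefix used only for illustration.
        String[] names = {"my:description", "my:resources_data_foo", "my:title_en"};
        for (String name : names) {
            System.out.println(name + " -> " + matchingPatterns(name));
        }
    }
}
```

Under the stated assumption, `my:description` is matched only by `.*:[^_]+`, so a property named "description" would be picked up by the first rule.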

Should I use a different configuration/analyzer? Is it a bug?

Thank you.

Best regards,
Antonio.

Antonio Facciorusso
WebRainbow(r) Software Analyst & Developer

P +39 051 8550 562
M +39 335 1219330
E A.Facciorusso@westpole.it
W https://westpole.webex.com/meet/A.Facciorusso
A Via Ettore Cristoni, 84 - 40033 Casalecchio di Reno



This email for the D.lgs.196/2003 (Privacy Code) and European Regulation 679/2016/UE (GDPR) may contain confidential and/or privileged information for the exclusive use of the intended recipient. Any review or distribution by others is strictly prohibited. If you are not the intended recipient, you must not use, copy, disclose or take any action based on this message or any information here. If you have received this email in error, please contact us (email:privacy@westpole.it) by reply email and delete all copies. Legal privilege is not waived because you have read this email. Thank you for your cooperation.


Please consider the environment before printing this email


RE: Can't find Japanese words ending with numbers

Posted by Gareth Harper <ga...@mandp.com>.
Thank you.

-----Original Message-----
From: Uwe Schindler <uw...@thetaphi.de> 
Sent: 17 April 2019 12:18
To: general@lucene.apache.org
Subject: RE: Can't find Japanese words ending with numbers

Please check here, you have to do it on your own:
http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg


________________________________________________________________________
This e-mail has been scanned for all viruses by Claranet. The service is powered by MessageLabs. For more information on a proactive anti-virus service working around the clock, around the globe, visit:
http://www.claranet.co.uk
________________________________________________________________________

________________________________________________________________________
This e-mail has been scanned for all viruses by Star Internet. The
service is powered by MessageLabs - For more information on a proactive
anti-virus service working around the clock, around the globe, visit:
http://www.star.net.uk
________________________________________________________________________

RE: Can't find Japanese words ending with numbers

Posted by Uwe Schindler <uw...@thetaphi.de>.
Please check here, you have to do it on your own:
http://lucene.apache.org/core/discussion.html#java-user-list-java-userluceneapacheorg

-----
Uwe Schindler
Achterdiek 19, D-28357 Bremen
https://www.thetaphi.de
eMail: uwe@thetaphi.de

> -----Original Message-----
> From: Gareth Harper <ga...@mandp.com>
> Sent: Wednesday, April 17, 2019 12:45 PM
> To: general@lucene.apache.org
> Subject: RE: Can't find Japanese words ending with numbers
> 
> Could someone please take me off this mailing list.


RE: Can't find Japanese words ending with numbers

Posted by Gareth Harper <ga...@mandp.com>.
Could someone please take me off this mailing list.

-----Original Message-----
From: Antonio Facciorusso <A....@westpole.it> 
Sent: 17 April 2019 11:05
To: users@jackrabbit.apache.org; general@lucene.apache.org
Subject: Can't find Japanese words ending with numbers


Re: Can't find Japanese words ending with numbers

Posted by Piotr Tajduś <pi...@skg.pl>.
Hi,

I am using Oak now, so I'm not sure whether it behaves the same with the Lucene version in Jackrabbit 2.16, but try adding quotation marks, e.g.:

"//element(*,mynodetype)[(jcr:contains(., '\"横浜第2*\"'))]"
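The suggestion above wraps the search term in double quotes inside the jcr:contains expression; the backslashes are Java string-literal escapes, not part of the query. Below is a minimal sketch of a hypothetical helper (`containsQuery` is not a Jackrabbit API) that builds either form of the statement and escapes any single quote in the term by doubling it, as XPath string literals delimited by ' require. It assumes the term itself contains no double quotes or backslashes, and whether the quoted form matches on Jackrabbit 2.16 is not confirmed in this thread.

```java
class ContainsQueryBuilder {

    /**
     * Hypothetical helper that builds a statement such as
     * //element(*,mynodetype)[(jcr:contains(., '"横浜第2*"'))]
     * Assumes the term contains no double quotes or backslashes.
     */
    static String containsQuery(String nodeType, String term, boolean asPhrase) {
        // Optionally wrap the term in double quotes, as suggested in the reply.
        String expr = asPhrase ? "\"" + term + "\"" : term;
        // Inside an XPath string literal delimited by ', a literal ' is written as ''.
        String literal = "'" + expr.replace("'", "''") + "'";
        return "//element(*," + nodeType + ")[(jcr:contains(., " + literal + "))]";
    }

    public static void main(String[] args) {
        // Plain form that returned 0 results in the original report.
        System.out.println(containsQuery("mynodetype", "横浜第2*", false));
        // Quoted form suggested in the reply.
        System.out.println(containsQuery("mynodetype", "横浜第2*", true));
    }
}
```

The resulting string would then be passed to `QueryManager.createQuery(statement, Query.XPATH)` in the usual way.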


Best regards,
Piotr

