Posted to solr-user@lucene.apache.org by al...@aim.com on 2014/02/13 02:12:13 UTC

change character correspondence in icu lib

Hello,

I use
icu4j-49.1.jar and
lucene-analyzers-icu-4.6-SNAPSHOT.jar

for one of the fields, configured with this filter:

<filter class="solr.ICUFoldingFilterFactory" />

I need to change the letter that one of the accented characters is folded to, so I edited this file:

lucene/analysis/icu/src/data/utr30/DiacriticFolding.txt

then recompiled Solr and Lucene and replaced the above jars with the new ones, but nothing changed in how keywords are indexed or parsed.

Any ideas where the appropriate change must be made?

Thanks.
Alex.




Re: change character correspondence in icu lib

Posted by al...@aim.com.
I found out that the generated files are unchanged. I think this is because these targets in the build file

  <target name="gen-utr30-data-files" depends="compile-tools">
    <java
        classname="org.apache.lucene.analysis.icu.GenerateUTR30DataFiles"
        dir="${utr30.data.dir}"
        fork="true"
        failonerror="true">
      <classpath>
        <path refid="icujar"/>
        <pathelement location="${build.dir}/classes/tools"/>
      </classpath>
    </java>
  </target>

  <property name="gennorm2.src.files"
      value="nfc.txt nfkc.txt nfkc_cf.txt BasicFoldings.txt DiacriticFolding.txt DingbatFolding.txt HanRadicalFolding.txt NativeDigitFolding.txt"/>
  <property name="gennorm2.tmp" value="${build.dir}/gennorm2/utr30.tmp"/>
  <property name="gennorm2.dst" value="${resources.dir}/org/apache/lucene/analysis/icu/utr30.nrm"/>
  <target name="gennorm2" depends="gen-utr30-data-files">
    <echo>Note that the gennorm2 and icupkg tools must be on your PATH. These tools
are part of the ICU4C package. See http://site.icu-project.org/ </echo>
    <mkdir dir="${build.dir}/gennorm2"/>
    <exec executable="gennorm2" failonerror="true">
      <arg value="-v"/>
      <arg value="-s"/>
      <arg value="${utr30.data.dir}"/>
      <arg line="${gennorm2.src.files}"/>
      <arg value="-o"/>
      <arg value="${gennorm2.tmp}"/>
    </exec>
    <!-- now convert binary file to big-endian -->
    <exec executable="icupkg" failonerror="true">
      <arg value="-tb"/>
      <arg value="${gennorm2.tmp}"/>
      <arg value="${gennorm2.dst}"/>
    </exec>
    <delete file="${gennorm2.tmp}"/>
  </target>

are not executed, and the resource files are downloaded from the internet instead.
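The echo in the gennorm2 target says that gennorm2 and icupkg (both from ICU4C) must be on the PATH before that target can do anything. A quick way to confirm this on the build machine is a small check like the one below. It is only a sketch: the class name IcuToolCheck is hypothetical, and it simply looks for an executable file named gennorm2 or icupkg (plus a .exe variant on Windows) in each PATH directory, the same way a shell would.

```java
import java.io.File;
import java.nio.file.Files;
import java.nio.file.InvalidPathException;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Optional;

public class IcuToolCheck {
    /** Searches a PATH-style string for an executable file named `tool`. */
    static Optional<Path> findOnPath(String pathVar, String tool) {
        if (pathVar == null) {
            return Optional.empty();
        }
        boolean windows = System.getProperty("os.name", "").toLowerCase().startsWith("windows");
        String[] names = windows ? new String[] {tool + ".exe", tool} : new String[] {tool};
        for (String dir : pathVar.split(File.pathSeparator)) {
            if (dir.isEmpty()) {
                continue; // skip empty PATH entries such as "a::b"
            }
            for (String name : names) {
                try {
                    Path candidate = Paths.get(dir, name);
                    if (Files.isRegularFile(candidate) && Files.isExecutable(candidate)) {
                        return Optional.of(candidate);
                    }
                } catch (InvalidPathException e) {
                    // malformed PATH entry: ignore it and keep searching
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        String path = System.getenv("PATH");
        // The two tools the gennorm2 target shells out to.
        for (String tool : new String[] {"gennorm2", "icupkg"}) {
            System.out.println(tool + ": "
                    + findOnPath(path, tool).map(Path::toString).orElse("NOT FOUND on PATH"));
        }
    }
}
```

If either tool is reported as missing, the gennorm2 target cannot regenerate utr30.nrm, no matter what is changed in DiacriticFolding.txt.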

Any ideas how to fix this issue?

Thanks.
Alex.

 

 

 

 


Re: change character correspondence in icu lib

Posted by Alexandre Rafalovitch <ar...@gmail.com>.
Not a direct answer, but the usual next question is: are you
absolutely sure you are using the right jars? Try renaming them and
restarting Solr. If it complains, you have the right ones. If not...
Also, unzip those jars and check whether your change made it all the way
through the build pipeline.
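The "unzip and check" step can also be done without unpacking anything by hand. Below is a minimal sketch, assuming the compiled folding data is packaged as org/apache/lucene/analysis/icu/utr30.nrm (the destination of the gennorm2 target shown earlier in the thread) and that ICUFoldingFilter reads its data from that file. The class name Utr30JarCheck is hypothetical. It prints the SHA-256 of that entry for each jar given on the command line.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class Utr30JarCheck {
    // Assumed location of the compiled UTR#30 folding data inside the jar,
    // taken from the gennorm2.dst property in the ICU analyzers build file.
    static final String ENTRY = "org/apache/lucene/analysis/icu/utr30.nrm";

    /** Returns the hex SHA-256 of a jar entry, or null if the entry is missing. */
    static String sha256OfEntry(Path jar, String entryName)
            throws IOException, NoSuchAlgorithmException {
        try (JarFile jf = new JarFile(jar.toFile())) {
            JarEntry entry = jf.getJarEntry(entryName);
            if (entry == null) {
                return null;
            }
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            try (InputStream in = jf.getInputStream(entry)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    md.update(buf, 0, n);
                }
            }
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest()) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        }
    }

    public static void main(String[] args) throws Exception {
        // Usage: java Utr30JarCheck old-analyzers-icu.jar new-analyzers-icu.jar
        for (String arg : args) {
            String digest = sha256OfEntry(Paths.get(arg), ENTRY);
            System.out.println(arg + ": " + (digest == null ? "no " + ENTRY : digest));
        }
    }
}
```

If the original and the rebuilt jar print the same digest, the edited DiacriticFolding.txt never reached the binary data, so the problem lies in the build step rather than in Solr's configuration.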

Regards,
   Alex.

Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)

