Posted to solr-user@lucene.apache.org by OliverS <ol...@unibas.ch> on 2012/03/29 18:06:14 UTC

pattern error in PatternReplaceCharFilterFactory

Hello

I am trying to filter out characters per Unicode block or script before
tokenization, so I use "PatternReplaceCharFilterFactory". In the end, I want
to filter out all non-CJK characters, basically Latin, Greek, Arabic and
Hebrew scripts.
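
As a rough illustration of the intended filtering, here is a minimal plain-Java sketch that runs outside Solr. The class name NonCjkStripper and the sample string are hypothetical, and the list of blocks is only a guess at what "Latin, Greek, Arabic and Hebrew" would need to cover. Note that the Basic Latin block also contains spaces, digits and ASCII punctuation, so those are removed as well.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: removes characters by Unicode block.
final class NonCjkStripper {
    // Assumed set of unwanted blocks; extend it if more Latin, Greek,
    // Arabic or Hebrew blocks need to be covered.
    private static final Pattern UNWANTED = Pattern.compile(
        "[\\p{InBasicLatin}\\p{InLatin-1Supplement}\\p{InGreek}\\p{InArabic}\\p{InHebrew}]");

    static String strip(String input) {
        return UNWANTED.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // Latin "abc", a space, two CJK ideographs, then one Greek,
        // one Hebrew and one Arabic letter (written as Unicode escapes).
        String sample = "abc \u4E2D\u6587\u03B1\u05E9\u0639";
        System.out.println(strip(sample));
    }
}
```

With the sample above, only the two CJK ideographs should remain.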

The problem is, PatternReplaceCharFilterFactory does not fully support the
block or script pattern notation. Example:
<charFilter class="solr.PatternReplaceCharFilterFactory"
            pattern="\p{InBasic_Latin}"
            replacement=""
            replace="all"
/>
This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
These throw an exception, from the log:
***
Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Plugin init failure for [schema.xml]
analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
org.apache.solr.analysis.PatternReplaceCharFilterFactory
***

I am running the latest 4.0 nightly (version 4.0.0.2012.03.09.11.46.05)

Can anybody help? Or, might this be a java issue?

Thanks a lot
Oliver

--
View this message in context: http://lucene.472066.n3.nabble.com/pattern-error-in-PatternReplaceCharFilterFactory-tp3868174p3868174.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: pattern error in PatternReplaceCharFilterFactory

Posted by Chris Hostetter <ho...@fucit.org>.
: It seems to be an unrecognisable pattern, this is from the log, last
: paragraph says "unknown character block name". The java version is
: "1.6.0_31":

Did you read the rest of my reply, about testing whether Java recognizes 
your block name independently of Solr? That error is coming directly from 
the Java regex engine...

: Caused by: java.util.regex.PatternSyntaxException: Unknown character block
: name {Latin-1_Supplement} near index 23
: \p{InLatin-1_Supplement}
:                        ^
:         at java.util.regex.Pattern.error(Pattern.java:1713)
:         at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)

Why are you using an "_" at all? Isn't "\p{InLatin-1 Supplement}" (or 
"\p{InLatin-1Supplement}") what you mean? Either of those works for me, and 
both match the javadocs for which block names are supported in the JVM...

http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#ubc
>> The block names supported by Pattern are the valid block names accepted 
>> and defined by UnicodeBlock.forName.

http://docs.oracle.com/javase/6/docs/api/java/lang/Character.UnicodeBlock.html#forName%28java.lang.String%29
>> This method accepts block names in the following forms:
>> 
>>   1. Canonical block names as defined by the Unicode Standard. For
>>   example, the standard defines a "Basic Latin" block. Therefore, this
>>   method accepts "Basic Latin" as a valid block name. The documentation
>>   of each UnicodeBlock provides the canonical name.
>>   2. Canonical block names with all spaces removed. For example,
>>   "BasicLatin" is a valid block name for the "Basic Latin" block.
>>   ...
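
The name forms quoted above can be tried directly against Character.UnicodeBlock.forName, without going through the regex engine. The following is a minimal sketch; the class name BlockNameForms is hypothetical and the list of spellings is only illustrative. The underscore-only spelling is included on the assumption that it matches the constant identifier LATIN_1_SUPPLEMENT, which would also explain why "\p{InBasic_Latin}" was accepted.

```java
// Hypothetical check, not from the thread: which spellings does forName accept?
final class BlockNameForms {
    static boolean accepted(String name) {
        try {
            Character.UnicodeBlock.forName(name);
            return true;
        } catch (IllegalArgumentException e) {
            // forName rejects names that match none of its supported forms.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] names = {
            "Latin-1 Supplement",  // canonical block name
            "Latin-1Supplement",   // canonical name with spaces removed
            "Latin_1_Supplement",  // spelled like the constant LATIN_1_SUPPLEMENT
            "Latin-1_Supplement",  // mixed spelling from the failing pattern
        };
        for (String name : names) {
            System.out.println(name + " -> " + (accepted(name) ? "accepted" : "rejected"));
        }
    }
}
```

Only the mixed hyphen-and-underscore spelling should be rejected.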



-Hoss

Re: pattern error in PatternReplaceCharFilterFactory

Posted by OliverS <ol...@unibas.ch>.
Hi

It seems to be an unrecognisable pattern, this is from the log, last
paragraph says "unknown character block name". The java version is
"1.6.0_31":

***
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] fieldType:Plugin init failure for [schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
        at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:357)
        at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:106)
        at org.apache.solr.core.CoreContainer.create(CoreContainer.java:756)
        at org.apache.solr.core.CoreContainer.load(CoreContainer.java:473)
        at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:296)
        at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:99)
        at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
        at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
        at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
        at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
        at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
        at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
        at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
        at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
        at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
        at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
        at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
        at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
        at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
        at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
        at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
        at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
        at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
        at org.apache.catalina.core.StandardService.start(StandardService.java:525)
        at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
        at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
        at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for [schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
        at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:290)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
        at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
        ... 33 more
Caused by: java.lang.RuntimeException: Configuration Error: 'pattern' can not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
        at org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:54)
        at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:278)
        at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:268)
        at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:149)
        ... 37 more
Caused by: java.util.regex.PatternSyntaxException: Unknown character block name {Latin-1_Supplement} near index 23
\p{InLatin-1_Supplement}
                       ^
        at java.util.regex.Pattern.error(Pattern.java:1713)
        at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)
        at java.util.regex.Pattern.family(Pattern.java:2408)
        at java.util.regex.Pattern.sequence(Pattern.java:1831)
        at java.util.regex.Pattern.expr(Pattern.java:1752)
        at java.util.regex.Pattern.compile(Pattern.java:1460)
        at java.util.regex.Pattern.<init>(Pattern.java:1133)
        at java.util.regex.Pattern.compile(Pattern.java:823)
        at org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:52)
        ... 40 more
***


Re: pattern error in PatternReplaceCharFilterFactory

Posted by Chris Hostetter <ho...@fucit.org>.
: This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
: These throw an exception, from the log:
: ***
: Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
: SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
: [schema.xml] fieldType:Plugin init failure for [schema.xml]
: analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
: org.apache.solr.analysis.PatternReplaceCharFilterFactory

Immediately below that there should have been more details on the error 
generated by the Java regex engine when trying to parse your pattern 
(something like "Caused by: ..."), which is fairly crucial to understanding 
what might be going wrong.

: Can anybody help? Or, might this be a java issue?

I suspect it's a Java issue ... you didn't mention which version of Java 
you are using, and I don't know which Java versions correspond to which 
Unicode versions in terms of the block names they support, but is it 
possible some of those patterns are only legal in a newer version of Java 
than you have?

Have you tried running a simple little Java main() to verify that those 
patterns are legal in your JVM?

import java.util.regex.Pattern;

// Usage: java PatTest '<regex>' '<input>'
public final class PatTest {
  public static void main(String[] args) throws Exception {
    String pat = args[0];
    String input = args[1];
    // Throws PatternSyntaxException if this JVM rejects the pattern.
    Pattern p = Pattern.compile(pat);
    System.out.println(input + " does " +
                       (p.matcher(input).matches() ? "" : "NOT ") +
                       "match " + pat);
  }
}
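
A non-interactive variant of the same check may be handy when several candidate patterns need to be compared in one run. This is only a sketch: the class name PatternProbe and the candidate list are hypothetical, and the "\p{IsLatin}" script form is included on the assumption that the JVM is Java 7 or later, where script names were added to the regex engine.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical probe: reports which candidate patterns this JVM can compile.
final class PatternProbe {
    // Returns null if the pattern compiles, otherwise the engine's description.
    static String problem(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "\\p{InBasic_Latin}",
            "\\p{InLatin-1_Supplement}",
            "\\p{InLatin-1Supplement}",
            "\\p{Latin}",
            "\\p{IsLatin}",  // script syntax, assumed to need Java 7+
        };
        for (String regex : candidates) {
            String problem = problem(regex);
            System.out.println(regex + " : " + (problem == null ? "OK" : problem));
        }
    }
}
```

Running it under the same JVM that hosts Solr shows whether a rejected pattern is a JVM limitation rather than a Solr one.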


-Hoss