Posted to solr-user@lucene.apache.org by OliverS <ol...@unibas.ch> on 2012/03/29 18:06:14 UTC
pattern error in PatternReplaceCharFilterFactory
Hello
I am trying to filter out characters per Unicode block or script before
tokenization, so I use "PatternReplaceCharFilterFactory". In the end, I want
to filter out all non-CJK characters, basically the Latin, Greek, Arabic and
Hebrew scripts.
The problem is, PatternReplaceCharFilterFactory does not fully support the
block or script pattern notation. Example:
<charFilter class="solr.PatternReplaceCharFilterFactory"
pattern="\p{InBasic_Latin}"
replacement=""
replace="all"
/>
This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
These throw an exception, from the log:
***
Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Plugin init failure for [schema.xml]
analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
org.apache.solr.analysis.PatternReplaceCharFilterFactory
***
I am running the latest 4.0 nightly (version 4.0.0.2012.03.09.11.46.05)
Can anybody help? Or, might this be a java issue?
Thanks a lot
Oliver
--
View this message in context: http://lucene.472066.n3.nabble.com/pattern-error-in-PatternReplaceCharFilterFactory-tp3868174p3868174.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: pattern error in PatternReplaceCharFilterFactory
Posted by Chris Hostetter <ho...@fucit.org>.
: It seems to be an unrecognisable pattern, this is from the log, last
: paragraph says "unknown character block name". The java version is
: "1.6.0_31":
Did you read the rest of my reply? about testing if java recognizes your
block name independent of Solr ... because that error is coming directly
from the java regex engine...
: Caused by: java.util.regex.PatternSyntaxException: Unknown character block
: name {Latin-1_Supplement} near index 23
: \p{InLatin-1_Supplement}
:                        ^
: at java.util.regex.Pattern.error(Pattern.java:1713)
: at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)
Why are you using an "_" at all? Isn't "\p{InLatin-1 Supplement}" (or
"\p{InLatin-1Supplement}") what you mean? Either of those works for me, and
both match the javadocs for what block names are supported in the JVM...
http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html#ubc
>> The block names supported by Pattern are the valid block names accepted
>> and defined by UnicodeBlock.forName.
http://docs.oracle.com/javase/6/docs/api/java/lang/Character.UnicodeBlock.html#forName%28java.lang.String%29
>> This method accepts block names in the following forms:
>>
>> 1. Canonical block names as defined by the Unicode Standard. For
>> example, the standard defines a "Basic Latin" block. Therefore, this
>> method accepts "Basic Latin" as a valid block name. The documentation
>> of each UnicodeBlock provides the canonical name.
>> 2. Canonical block names with all spaces removed. For example,
>> "BasicLatin" is a valid block name for the "Basic Latin" block.
>> ...
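To see which of these forms a given JVM accepts, the lookup can be run directly, outside both Solr and the regex engine. This is only a sketch: the class name BlockNameCheck and the helper isKnownBlock are made up for illustration, and the identifier-style spelling "Latin_1_Supplement" is included on the assumption that UnicodeBlock constant names such as LATIN_1_SUPPLEMENT are also accepted. It relies on UnicodeBlock.forName throwing IllegalArgumentException for a name it does not know.

```java
final class BlockNameCheck {

    /** Returns true if this JVM's UnicodeBlock.forName accepts the name. */
    static boolean isKnownBlock(String name) {
        try {
            Character.UnicodeBlock.forName(name);
            return true;
        } catch (IllegalArgumentException e) {
            // forName signals an unknown block name with this exception.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "Latin-1 Supplement",  // canonical name
            "Latin-1Supplement",   // canonical name with spaces removed
            "Latin_1_Supplement",  // identifier-style spelling (assumption)
            "Latin-1_Supplement",  // spelling used in the failing pattern
            "Basic_Latin"          // spelling used in the working pattern
        };
        for (String name : candidates) {
            System.out.println(name + " -> "
                + (isKnownBlock(name) ? "accepted" : "rejected"));
        }
    }
}
```

Whatever name is accepted here should also be usable after the "In" prefix in a pattern, since the javadoc quoted above says Pattern defers to UnicodeBlock.forName.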
-Hoss
Re: pattern error in PatternReplaceCharFilterFactory
Posted by OliverS <ol...@unibas.ch>.
Hi
It seems to be an unrecognisable pattern, this is from the log, last
paragraph says "unknown character block name". The java version is
"1.6.0_31":
***
SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] fieldType:Plugin init failure for [schema.xml]
analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
org.apache.solr.analysis.PatternReplaceCharFilterFactory
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
at org.apache.solr.schema.IndexSchema.readSchema(IndexSchema.java:357)
at org.apache.solr.schema.IndexSchema.<init>(IndexSchema.java:106)
at org.apache.solr.core.CoreContainer.create(CoreContainer.java:756)
at org.apache.solr.core.CoreContainer.load(CoreContainer.java:473)
at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:296)
at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:99)
at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:295)
at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:422)
at org.apache.catalina.core.ApplicationFilterConfig.<init>(ApplicationFilterConfig.java:115)
at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4072)
at org.apache.catalina.core.StandardContext.start(StandardContext.java:4726)
at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:799)
at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:779)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:601)
at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:943)
at org.apache.catalina.startup.HostConfig.deployWARs(HostConfig.java:778)
at org.apache.catalina.startup.HostConfig.deployApps(HostConfig.java:504)
at org.apache.catalina.startup.HostConfig.start(HostConfig.java:1317)
at org.apache.catalina.startup.HostConfig.lifecycleEvent(HostConfig.java:324)
at org.apache.catalina.util.LifecycleSupport.fireLifecycleEvent(LifecycleSupport.java:142)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1065)
at org.apache.catalina.core.StandardHost.start(StandardHost.java:840)
at org.apache.catalina.core.ContainerBase.start(ContainerBase.java:1057)
at org.apache.catalina.core.StandardEngine.start(StandardEngine.java:463)
at org.apache.catalina.core.StandardService.start(StandardService.java:525)
at org.apache.catalina.core.StandardServer.start(StandardServer.java:754)
at org.apache.catalina.startup.Catalina.start(Catalina.java:595)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.catalina.startup.Bootstrap.start(Bootstrap.java:289)
at org.apache.catalina.startup.Bootstrap.main(Bootstrap.java:414)
Caused by: org.apache.solr.common.SolrException: Plugin init failure for
[schema.xml] analyzer/charFilter:Configuration Error: 'pattern' can not be
parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:167)
at org.apache.solr.schema.FieldTypePluginLoader.readAnalyzer(FieldTypePluginLoader.java:290)
at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:95)
at org.apache.solr.schema.FieldTypePluginLoader.create(FieldTypePluginLoader.java:43)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:141)
... 33 more
Caused by: java.lang.RuntimeException: Configuration Error: 'pattern' can
not be parsed in org.apache.solr.analysis.PatternReplaceCharFilterFactory
at org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:54)
at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:278)
at org.apache.solr.schema.FieldTypePluginLoader$1.init(FieldTypePluginLoader.java:268)
at org.apache.solr.util.plugin.AbstractPluginLoader.load(AbstractPluginLoader.java:149)
... 37 more
Caused by: java.util.regex.PatternSyntaxException: Unknown character block
name {Latin-1_Supplement} near index 23
\p{InLatin-1_Supplement}
                       ^
at java.util.regex.Pattern.error(Pattern.java:1713)
at java.util.regex.Pattern.unicodeBlockPropertyFor(Pattern.java:2424)
at java.util.regex.Pattern.family(Pattern.java:2408)
at java.util.regex.Pattern.sequence(Pattern.java:1831)
at java.util.regex.Pattern.expr(Pattern.java:1752)
at java.util.regex.Pattern.compile(Pattern.java:1460)
at java.util.regex.Pattern.<init>(Pattern.java:1133)
at java.util.regex.Pattern.compile(Pattern.java:823)
at org.apache.solr.analysis.PatternReplaceCharFilterFactory.init(PatternReplaceCharFilterFactory.java:52)
... 40 more
***
--
View this message in context: http://lucene.472066.n3.nabble.com/pattern-error-in-PatternReplaceCharFilterFactory-tp3868174p3876986.html
Re: pattern error in PatternReplaceCharFilterFactory
Posted by Chris Hostetter <ho...@fucit.org>.
: This works. Other patterns tried were: \p{InLatin-1_Supplement} or \p{Latin}
: These throw an exception, from the log:
: ***
: Mar 29, 2012 5:56:45 PM org.apache.solr.common.SolrException log
: SEVERE: null:org.apache.solr.common.SolrException: Plugin init failure for
: [schema.xml] fieldType:Plugin init failure for [schema.xml]
: analyzer/charFilter:Configuration Error: 'pattern' can not be parsed in
: org.apache.solr.analysis.PatternReplaceCharFilterFactory
Immediately below that there should have been more details on what error
was generated by the Java regex engine when trying to parse your pattern
(something like "caused by: ..."), which is fairly crucial to understanding
what might be going wrong.
: Can anybody help? Or, might this be a java issue?
I suspect it's a java issue ... you didn't mention which version of java
you are using, and i don't know which java versions correspond to which
unicode versions in terms of the block names they support, but is it
possible some of those patterns are only legal in a newer version of java
than you have?
have you tried running a simple little java main() to verify that those
patterns are legal in your JVM?
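import java.util.regex.Pattern;

// Compiles args[0] as a regex and reports whether it matches all of args[1].
public final class PatTest {
    public static void main(String[] args) throws Exception {
        String pat = args[0];
        String input = args[1];
        // Throws PatternSyntaxException if this JVM rejects the pattern.
        Pattern p = Pattern.compile(pat);
        System.out.println(input + " does " +
            (p.matcher(input).matches() ? "" : "NOT ") +
            "match " + pat);
    }
}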
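If the JVM rejects a pattern outright, the class above just stops with a stack trace. A slightly more forgiving sketch, assuming nothing beyond java.util.regex, tries several patterns in one run and prints the regex engine's own description for each one it rejects. The class name PatternProbe and the helper compileError are hypothetical, and the \p{IsLatin} entry is only there on the assumption that script names need the "Is" prefix and a Java 7 or later runtime.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

final class PatternProbe {

    /** Returns null if the regex compiles, else the engine's error description. */
    static String compileError(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            return e.getDescription();
        }
    }

    public static void main(String[] args) {
        // Patterns from this thread, plus one script-style guess (assumption).
        String[] defaults = {
            "\\p{InBasic_Latin}",
            "\\p{InLatin-1_Supplement}",
            "\\p{InLatin-1Supplement}",
            "\\p{Latin}",
            "\\p{IsLatin}"
        };
        // Patterns given on the command line replace the defaults.
        String[] patterns = args.length > 0 ? args : defaults;

        System.out.println("java.version = " + System.getProperty("java.version"));
        for (String regex : patterns) {
            String err = compileError(regex);
            System.out.println(regex + " : "
                + (err == null ? "compiles" : "rejected (" + err + ")"));
        }
    }
}
```

Running it with the same java binary that starts the servlet container matters, since the result depends on the runtime and not on Solr.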
-Hoss