Posted to solr-user@lucene.apache.org by Peter Kiraly <pk...@tesuji.eu> on 2009/09/02 12:00:30 UTC
date field type problem
Hi Solr users,
I have a lot of dates from a library catalog in a format that is
not compatible with solr.DateField. I wrote a new <fieldType>
definition inside the solrconfig.xml, which creates
e.g. 1991-01-01T00:00:01Z from the input string '[c1991.]'.
It works fine when I try it with typical values
in http://localhost:8983/solr/admin/analysis.jsp,
but it always throws an exception when I try to index
the records.
<fieldType name="trickyDate" class="solr.DateField"
           sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="sh..?wa \d\d? " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="june (\d\d), " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="september (\d\d), " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(\D)" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(\d{4})\d*$" replacement="$1-01-01T00:00:01"
            replace="all"/>
  </analyzer>
</fieldType>
It is quite possible that I have misunderstood something. What I
would like to do is 'normalize' the input data somehow, and I thought
it would be more effective to do this on the Solr side than in the client.
Do you have any advice on how I should continue?
Péter
Re: date field type problem
Posted by Peter Kiraly <pk...@tesuji.eu>.
Hi,
the exception I received:
SEVERE: org.apache.solr.common.SolrException: Error while creating field 'date_df{type=trickyDate,properties=indexed,stored,omitNorms,omitTf,multiValued,sortMissingLast}' from value 'c1991.'
    at org.apache.solr.schema.FieldType.createField(FieldType.java:190)
    at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
    at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:244)
    at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
    at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
    at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
    at org.mortbay.jetty.Server.handle(Server.java:285)
    at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
    at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
    at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
    at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
    at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
    at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
    at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'c1991.'
    at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
    at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
    at org.apache.solr.schema.FieldType.createField(FieldType.java:188)
    ... 27 more
My expectation is that a field type behaves like this:
0) I specify a field type as the storage type
1) I give it a string
2) the tokenizers and filters parse the string into a given form
3) Solr handles the result as the given type
For example:
0) I set the field type to "solr.DateField"
1) the input string is "1991."
2) the analyzer creates "1991-01-01T00:00:00Z"
3) and as this is the normal input form of the date type, Solr
indexes it.
It seems that it is the input string ("1991.") that must match
solr.DateField's expectation, and not the analyzer output
("1991-01-01T00:00:00Z").
So the question is: is there a solution in which I can
"preprocess" the inputs inside Solr, or is this doable only on the
client side?
Péter
>From: "Grant Ingersoll" <gs...@apache.org>
>Subject: Re: date field type problem
>What's the exception?
Re: date field type problem
Posted by Grant Ingersoll <gs...@apache.org>.
What's the exception?
--------------------------
Grant Ingersoll
http://www.lucidimagination.com/
Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)
using Solr/Lucene:
http://www.lucidimagination.com/search
Re: date field type problem
Posted by Chris Hostetter <ho...@fucit.org>.
: solr.DateField compatible format. I wrote a new <fieldType>
: definition inside the solrconfig.xml, which creates
: eg. 1991-01-01T00:00:01Z from the input '[c1991.]' string.
<analyzer> is only supported when the class of the <fieldType> is
TextField ... it would be nice if it worked with any other field type (I
think it would mainly just require removing an instanceof check somewhere),
but since analyzers only work on the *indexed* value, it wouldn't help with
cleaning up the *stored* value.
At the moment, general data cleanup like this (which affects both the stored
and the indexed value) can only be done using an UpdateProcessor.
-Hoss
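The cleanup discussed in this thread can be sketched in plain Java, independent of Solr. The class below is only an illustration: the name DateNormalizer and its normalize method are hypothetical, the regular expressions are copied from the <fieldType> definition in the original post, and the output suffix "-01-01T00:00:00Z" is an assumption based on the example value given in the follow-up message. Such a method could presumably be called from client code before the documents are posted, or from the processAdd method of a custom UpdateProcessor; the Solr wiring itself is not shown here.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper mirroring the filter chain from the original post. */
final class DateNormalizer {
    // Four leading digits are taken as the year; any further digits are ignored.
    private static final Pattern YEAR = Pattern.compile("^(\\d{4})\\d*$");

    private DateNormalizer() {}

    /**
     * Returns a date string such as "1991-01-01T00:00:00Z" for inputs like
     * "[c1991.]", or null when no four-digit year can be extracted.
     */
    static String normalize(String raw) {
        if (raw == null) {
            return null;
        }
        // Same order as the analyzer: lowercase, then trim.
        String s = raw.toLowerCase(Locale.ROOT).trim();
        // Strip the prefixes handled by the first three pattern filters.
        s = s.replaceFirst("sh..?wa \\d\\d? ", "");
        s = s.replaceFirst("june (\\d\\d), ", "");
        s = s.replaceFirst("september (\\d\\d), ", "");
        // Drop every non-digit character.
        s = s.replaceAll("\\D", "");
        Matcher m = YEAR.matcher(s);
        // Assumption: January 1st, midnight UTC stands in for the whole year.
        return m.matches() ? m.group(1) + "-01-01T00:00:00Z" : null;
    }
}
```

Because the value is rewritten before Solr sees it, both the stored and the indexed value would be the normalized date, which is the point made above about analyzers affecting only the indexed value. Inputs without a usable year yield null, so the caller can decide whether to skip the field or reject the record.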