Posted to solr-user@lucene.apache.org by Peter Kiraly <pk...@tesuji.eu> on 2009/09/02 12:00:30 UTC

date field type problem

Hi Solr users,

I have a lot of dates from a library catalog in a format that is
not solr.DateField compatible. I wrote a new <fieldType>
definition inside the solrconfig.xml, which creates
e.g. 1991-01-01T00:00:01Z from the input string '[c1991.]'.
It works fine when I try it with typical values
in http://localhost:8983/solr/admin/analysis.jsp,
but it always throws an exception when I try to index
the records.

<fieldType name="trickyDate" class="solr.DateField"
  sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="sh..?wa \d\d? " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="june (\d\d), " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="september (\d\d), " replacement="" replace="first"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="(\D)" replacement="" replace="all"/>
    <filter class="solr.PatternReplaceFilterFactory"
      pattern="^(\d{4})\d*$" replacement="$1-01-01T00:00:01Z"
      replace="all"/>
  </analyzer>
</fieldType>

It is more than possible that I misunderstand something. What I
would like to do is 'normalize' the input data somehow, and I thought
that doing it on the Solr side would be more efficient than in the client.

Do you have any advice on how I might continue?

Péter


Re: date field type problem

Posted by Peter Kiraly <pk...@tesuji.eu>.
Hi,

the exception I received:

SEVERE: org.apache.solr.common.SolrException: Error while creating field 'date_df{type=trickyDate,properties=indexed,stored,omitNorms,omitTf,multiValued,sortMissingLast}' from value 'c1991.'
        at org.apache.solr.schema.FieldType.createField(FieldType.java:190)
        at org.apache.solr.schema.SchemaField.createField(SchemaField.java:94)
        at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:244)
        at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:59)
        at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
        at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1333)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
        at org.mortbay.jetty.Server.handle(Server.java:285)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:208)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
        at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
Caused by: org.apache.solr.common.SolrException: Invalid Date String:'c1991.'
        at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
        at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
        at org.apache.solr.schema.FieldType.createField(FieldType.java:188)
        ... 27 more

My expectation is that a field type behaves like this:
0) I specify a field type as the storage type
1) I give it a string
2) the tokenizers and filters parse it into a given form
3) Solr handles the result as the given type

For example:
0) I set the field type to "solr.DateField"
1) the input string is "1991."
2) the analyzer creates "1991-01-01T00:00:00Z"
3) and since this is the normal input form of the date type, Solr
   indexes it.

It seems that it is the input string ("1991.") that must match
solr.DateField's expectations, and not the analyzer's output
("1991-01-01T00:00:00Z").
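
The stack trace above is consistent with this reading: DateField.toInternal
is called from FieldType.createField with the raw value 'c1991.', before any
analysis. As a rough illustration only, the following sketch uses the JDK's
java.time.Instant.parse as a stand-in for that kind of strict check. The
class and method names are hypothetical, and Solr's own parser is not
identical (it also accepts date math, for instance).

```java
import java.time.Instant;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of Solr: a stand-in for a strict date check
// applied to the raw field value.
class DateInputCheck {

    // Returns true only if the value parses as an ISO-8601 UTC instant,
    // e.g. "1991-01-01T00:00:00Z".
    static boolean isCanonicalDate(String value) {
        try {
            Instant.parse(value);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The raw catalog value is checked as-is, so it fails.
        System.out.println(isCanonicalDate("c1991."));
        // The form the analyzer chain was meant to produce passes.
        System.out.println(isCanonicalDate("1991-01-01T00:00:00Z"));
    }
}
```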

So the question is: is there a solution in which I can
"preprocess" the inputs inside Solr, or is this doable only on the
client side?

Péter

>From: "Grant Ingersoll" <gs...@apache.org>
>Subject: Re: date field type problem


>What's the exception?


Re: date field type problem

Posted by Grant Ingersoll <gs...@apache.org>.
What's the exception?

--------------------------
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:
http://www.lucidimagination.com/search


Re: date field type problem

Posted by Chris Hostetter <ho...@fucit.org>.
: solr.DateField compatible format. I wrote a new <fieldType>
: definition inside the solrconfig.xml, which creates
: eg. 1991-01-01T00:00:01Z from the input '[c1991.]' string.

<analyzer> is only supported when the class of the <fieldType> is
TextField. It would be nice if it worked with any other field type (I
think it would mainly just require removing an instanceof check somewhere),
but since analyzers only work on the *indexed* value, it wouldn't help with
cleaning up the *stored* value.

At the moment, general data cleanup like this (that affects both the stored
and the indexed value) can only be done using an UpdateProcessor.
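
As a minimal sketch of what such cleanup logic might look like, the class
below reproduces the filter chain from the original <fieldType> as plain
Java string operations. The class and method names are hypothetical and no
Solr API is used. The same method could in principle be called from a custom
UpdateProcessor or from the indexing client before the document is sent. The
trailing "Z" is an assumption based on the target format
1991-01-01T00:00:01Z stated in the original post.

```java
import java.util.Locale;
import java.util.Optional;

// Hypothetical helper, not part of Solr: applies the same steps as the
// trickyDate analyzer chain, but to the raw value before it is indexed.
class TrickyDateNormalizer {

    // Returns the normalized date string, or empty if no 4-digit year is found.
    static Optional<String> normalize(String raw) {
        // LowerCaseFilterFactory + TrimFilterFactory
        String s = raw.toLowerCase(Locale.ROOT).trim();
        // PatternReplaceFilterFactory steps with replace="first"
        s = s.replaceFirst("sh..?wa \\d\\d? ", "");
        s = s.replaceFirst("june (\\d\\d), ", "");
        s = s.replaceFirst("september (\\d\\d), ", "");
        // Remove every remaining non-digit character (replace="all")
        s = s.replaceAll("\\D", "");
        // Keep the first four digits as the year and drop any extra digits
        if (!s.matches("\\d{4}\\d*")) {
            return Optional.empty();
        }
        return Optional.of(s.substring(0, 4) + "-01-01T00:00:01Z");
    }

    public static void main(String[] args) {
        System.out.println(normalize("[c1991.]").orElse("(no year found)"));
        System.out.println(normalize("June 12, 1985").orElse("(no year found)"));
        System.out.println(normalize("n.d.").orElse("(no year found)"));
    }
}
```

Values without a recognizable year come back empty here, so the caller can
decide whether to skip the field or reject the document.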


-Hoss