Posted to solr-user@lucene.apache.org by Mark Allan <ma...@ed.ac.uk> on 2009/04/16 09:50:30 UTC

Invalid_Date_String on posting XML to the index

Hi all,

I'm encountering a problem when I try to add records with a date field  
to the index.

The records I'm adding have inconsistent date precision: usually
YYYYMMDD, but some have only a year and month, and others only a year.
I'm trying to get around this by using solr.PatternReplaceFilterFactory
filters to modify the field before indexing.  This seems to work fine if
the class is solr.TextField: a date such as 1953 is converted to
1953-01-01T00:00:00.000Z and then inserted into the index.

However, if I want to have the field as an actual date field (for  
doing range searches etc) I get the following error when I post the  
XML file.

	SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953

The corresponding stack trace from the solr server is:

Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
	at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
	at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
	at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
	at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
	at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
	at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
	at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
	at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
	at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
	at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
	at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
	at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
	at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
	at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
	at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
	at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
	at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
	at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
	at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
	at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
	at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
	at org.mortbay.jetty.Server.handle(Server.java:285)
	at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
	at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
	at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
	at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
	at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
	at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
	at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

My schema.xml file looks something like this:

...
    <fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
		<analyzer>
			<filter class="solr.TrimFilterFactory" />
			<tokenizer class="solr.KeywordTokenizerFactory"/>
			<filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all" />
			<filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all" />
			<filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
		</analyzer>
    </fieldType>
...
<field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>
...
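
To make the intent of the three PatternReplaceFilterFactory rules in the
schema above easier to follow, here is a rough Python trace of the same
regular expressions. It only mirrors the patterns and replacements as
posted (with Java's $1 group references written as \1); the names RULES
and apply_schema_patterns are made up for illustration, and this is a
sketch of what the chain would produce on a text field, not of how Solr
evaluates it internally.

```python
import re

# Patterns and replacements copied from the schema above,
# with Java's $1-style group references written as \1 for Python.
RULES = [
    (r"^(\d{4})$", r"\1.01.01"),
    (r"^(\d{4})\.(\d{2})$", r"\1.\2.01"),
    (r"^(\d{4})\.(\d{2})\.(\d{2})$", r"\1-\2-\3T00:00:00.000Z"),
]

def apply_schema_patterns(value):
    """Trim the value, then apply each rule in schema order (hypothetical helper)."""
    value = value.strip()  # stands in for TrimFilterFactory
    for pattern, replacement in RULES:
        value = re.sub(pattern, replacement, value)
    return value

for raw in ["1953", "1953.04", "1953.04.16"]:
    print(raw, "->", apply_schema_patterns(raw))
```

Note that these patterns expect dot-separated input, so an undotted
value such as 19530416 matches none of them and passes through
unchanged.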


My thinking is that Solr is trying to add the field directly as '1953'
before applying the pattern filters, so the value is not in the right
format for indexing.  Does that sound like a reasonable assumption, or
am I missing something which is causing it to go wrong?  Can anyone
help please?

I was originally storing the date in YYMMDD format as a text field and  
searching with wildcards, but that strikes me as somewhat  
inefficient.  I could go back to doing that if necessary, but I'd  
rather do it the right way if I can.

Many thanks for your help.

Mark
PS. Apologies if this message comes through twice - I sent it  
yesterday afternoon but it hasn't turned up on the mailing list yet,  
so I'm trying again.

-- 
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.


Re: Invalid_Date_String on posting XML to the index

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan <ma...@ed.ac.uk> wrote:

>
> Hi, thanks for your prompt reply.  I'm a bit confused though - the only way
> to do this is a two-step process?
>
> I have to write code to munge the XML into another document which is
> exactly the same except for the format of the Date field, and then import
> that second file?  Isn't that the whole purpose of having an analyzer with
> the solr.PatternReplaceFilterFactory filters?  What's odd is that the
> pattern replacement works if I store the field as text but not as a date.
>  Are you sure this isn't a bug?
>

Analyzers are applied only to the indexed value, not to the stored value. A
value which is added to a DateField is converted to the same internal format
(for both indexing and storing purposes) and then added to the index. The
DateField#toInternal method is what attempts to parse the string into a date,
and it fails when the field is created, before any analyzer runs.

There is another option. You could create a class which extends DateField
and overrides toInternal(String) to do the conversion. You can specify this
class in the schema.xml instead of DateField.

-- 
Regards,
Shalin Shekhar Mangar.

Re: Invalid_Date_String on posting XML to the index

Posted by Mark Allan <ma...@ed.ac.uk>.
On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote:

> On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan <ma...@ed.ac.uk> wrote:
>
>> My thinking is that Solr is trying to add the field directly as '1953'
>> before doing the text factory stuff and is therefore not in the right
>> format for indexing.  Does that sound like a reasonable assumption and
>> am I missing something which is causing it to go wrong?  Can anyone
>> help please?
>
> That is correct. You'll need to do the date creation in your own code so
> that you send a well-formed date to Solr.


Hi, thanks for your prompt reply.  I'm a bit confused though - the  
only way to do this is a two-step process?

I have to write code to munge the XML into another document which is  
exactly the same except for the format of the Date field, and then  
import that second file?  Isn't that the whole purpose of having an  
analyzer with the solr.PatternReplaceFilterFactory filters?  What's  
odd is that the pattern replacement works if I store the field as text  
but not as a date.  Are you sure this isn't a bug?

Mark


Re: Invalid_Date_String on posting XML to the index

Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan <ma...@ed.ac.uk> wrote:

>
> My thinking is that Solr is trying to add the field directly as '1953'
> before doing the text factory stuff and is therefore not in the right format
> for indexing.  Does that sound like a reasonable assumption and am I missing
> something which is causing it to go wrong?  Can anyone help please?


That is correct. You'll need to do the date creation in your own code so
that you send a well-formed date to Solr.
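
As a rough illustration of doing the date creation before posting, the
sketch below pads partial dates and rewrites the DateRecorded field in a
Solr add document. It assumes the raw values are digits only or
dot-separated (YYYY, YYYYMM, YYYYMMDD, YYYY.MM, YYYY.MM.DD) and that a
missing month or day should default to 01; to_solr_date and
rewrite_dates are hypothetical names, not part of Solr.

```python
import re
import xml.etree.ElementTree as ET
from datetime import date

def to_solr_date(raw):
    """Pad a partial date (assumed YYYY, YYYYMM or YYYYMMDD, dots optional)
    to the YYYY-MM-DDThh:mm:ssZ form that a Solr date field expects."""
    digits = raw.strip().replace(".", "")
    match = re.fullmatch(r"(\d{4})(\d{2})?(\d{2})?", digits)
    if match is None:
        raise ValueError("unrecognised date: %r" % raw)
    # Missing month or day defaults to "01" (an assumption, see above).
    year, month, day = match.groups(default="01")
    # date() raises ValueError for impossible dates such as month 13.
    return date(int(year), int(month), int(day)).isoformat() + "T00:00:00Z"

def rewrite_dates(add_xml, field_name="DateRecorded"):
    """Return the <add> XML with every matching <field> value normalised."""
    root = ET.fromstring(add_xml)
    for field in root.iter("field"):
        if field.get("name") == field_name and field.text:
            field.text = to_solr_date(field.text)
    return ET.tostring(root, encoding="unicode")

sample = ('<add><doc><field name="id">1</field>'
          '<field name="DateRecorded">1953</field></doc></add>')
print(rewrite_dates(sample))
```

The rewritten XML could then be posted as before; whether this runs as a
separate pass over the file or inside whatever generates the XML is a
matter of convenience.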

-- 
Regards,
Shalin Shekhar Mangar.