Posted to solr-user@lucene.apache.org by Mark Allan <ma...@ed.ac.uk> on 2009/04/16 09:50:30 UTC
Invalid_Date_String on posting XML to the index
Hi all,
I'm encountering a problem when I try to add records with a date field
to the index.
The records I'm adding have very little date precision, usually
YYYYMMDD but some only have year and month, others only have a year.
I'm trying to get around this by using solr.PatternReplaceFilterFactory
filters to modify the field before indexing. This seems to work fine if the
class is solr.TextField: a date is converted from e.g. 1953 to
1953-01-01T00:00:00.000Z and then inserted into the index.
However, if I want to have the field as an actual date field (for
doing range searches etc) I get the following error when I post the
XML file.
SimplePostTool: FATAL: Solr returned an error: Invalid_Date_String1953
The corresponding stack trace from the solr server is:
Apr 15, 2009 4:27:26 PM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'1953'
at org.apache.solr.schema.DateField.parseMath(DateField.java:167)
at org.apache.solr.schema.DateField.toInternal(DateField.java:138)
at org.apache.solr.schema.FieldType.createField(FieldType.java:179)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:93)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:243)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:58)
at org.apache.solr.handler.XmlUpdateRequestHandler.processUpdate(XmlUpdateRequestHandler.java:196)
at org.apache.solr.handler.XmlUpdateRequestHandler.handleRequestBody(XmlUpdateRequestHandler.java:123)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1204)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:303)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:232)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)
at org.mortbay.jetty.Server.handle(Server.java:285)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)
My schema.xml file looks something like this:
...
<fieldType name="dateFormatter" class="solr.DateField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <filter class="solr.TrimFilterFactory" />
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})$" replacement="$1.01.01" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})$" replacement="$1.$2.01" replace="all" />
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(\d{4})\.(\d{2})\.(\d{2})$" replacement="$1-$2-$3T00:00:00.000Z" replace="all" />
  </analyzer>
</fieldType>
...
<field name="DateRecorded" type="dateFormatter" indexed="true" stored="true" multiValued="false"/>
...
My thinking is that Solr is trying to add the field directly as '1953'
before doing the text factory stuff and is therefore not in the right
format for indexing. Does that sound like a reasonable assumption and
am I missing something which is causing it to go wrong? Can anyone
help please?
I was originally storing the date in YYMMDD format as a text field and
searching with wildcards, but that strikes me as somewhat
inefficient. I could go back to doing that if necessary, but I'd
rather do it the right way if I can.
Many thanks for your help.
Mark
PS. Apologies if this message comes through twice - I sent it
yesterday afternoon but it hasn't turned up on the mailing list yet,
so I'm trying again.
--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Re: Invalid_Date_String on posting XML to the index
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Apr 16, 2009 at 1:45 PM, Mark Allan <ma...@ed.ac.uk> wrote:
>
> Hi, thanks for your prompt reply. I'm a bit confused though - the only way
> to do this is a two-step process?
>
> I have to write code to munge the XML into another document which is
> exactly the same except for the format of the Date field, and then import
> that second file? Isn't that the whole purpose of having an analyzer with
> the solr.PatternReplaceFilterFactory filters? What's odd is that the
> pattern replacement works if I store the field as text but not as a date.
> Are you sure this isn't a bug?
>
Analyzers are applied only for the indexed value but not the stored value. A
value which is added to DateField is converted to the same internal format
(for both indexing and storing purposes) and then added to the index. The
DateField#toInternal method is the one which is attempting to parse the
string into a date and failing when the field is created.
There is another option. You could create a class which extends DateField
and overrides toInternal(String) to do the conversion. You can specify this
class in the schema.xml instead of DateField.
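As a rough illustration of that suggestion (a sketch, not something posted in the thread), the conversion could live in a small helper that the overridden toInternal(String) calls before delegating to DateField. The class name PartialDateNormalizer and the method normalize are hypothetical, and the dot-separated input forms (YYYY, YYYY.MM, YYYY.MM.DD) are assumed from the patterns in the schema above. The helper has no Solr dependency so it can be tried on its own; the subclass wiring is shown only as a comment.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical helper: pads partial dates (YYYY, YYYY.MM, YYYY.MM.DD)
 * to the full form Solr's DateField accepts, e.g. 1953-01-01T00:00:00.000Z.
 *
 * Assumed wiring inside a class that extends DateField (needs Solr on the classpath):
 *   public String toInternal(String val) {
 *       return super.toInternal(PartialDateNormalizer.normalize(val));
 *   }
 */
public final class PartialDateNormalizer {
    // Year, then optional month, then optional day, separated by dots
    // (mirrors the three PatternReplaceFilterFactory patterns in the schema).
    private static final Pattern PARTIAL =
        Pattern.compile("^(\\d{4})(?:\\.(\\d{2})(?:\\.(\\d{2}))?)?$");

    private PartialDateNormalizer() {}

    public static String normalize(String raw) {
        String value = raw.trim();
        Matcher m = PARTIAL.matcher(value);
        if (!m.matches()) {
            // Not a partial date: pass it through and let DateField accept or reject it.
            return value;
        }
        // Missing month or day defaults to 01, as in the schema replacements.
        String month = m.group(2) != null ? m.group(2) : "01";
        String day = m.group(3) != null ? m.group(3) : "01";
        return m.group(1) + "-" + month + "-" + day + "T00:00:00.000Z";
    }
}
```

The subclass would then be named in schema.xml in place of solr.DateField. Values that are already well-formed dates are returned unchanged, so DateField's own parsing still applies to them.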
--
Regards,
Shalin Shekhar Mangar.
Re: Invalid_Date_String on posting XML to the index
Posted by Mark Allan <ma...@ed.ac.uk>.
On 16 Apr 2009, at 9:00 am, Shalin Shekhar Mangar wrote:
> On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan <ma...@ed.ac.uk> wrote:
>
>> My thinking is that Solr is trying to add the field directly as '1953'
>> before doing the text factory stuff and is therefore not in the right format
>> for indexing. Does that sound like a reasonable assumption and am I missing
>> something which is causing it to go wrong? Can anyone help please?
>
> That is correct. You'll need to do the date creation in your own code so
> that you send a well-formed date to Solr.
Hi, thanks for your prompt reply. I'm a bit confused though - the
only way to do this is a two-step process?
I have to write code to munge the XML into another document which is
exactly the same except for the format of the Date field, and then
import that second file? Isn't that the whole purpose of having an
analyzer with the solr.PatternReplaceFilterFactory filters? What's
odd is that the pattern replacement works if I store the field as text
but not as a date. Are you sure this isn't a bug?
Mark
Re: Invalid_Date_String on posting XML to the index
Posted by Shalin Shekhar Mangar <sh...@gmail.com>.
On Thu, Apr 16, 2009 at 1:20 PM, Mark Allan <ma...@ed.ac.uk> wrote:
>
> My thinking is that Solr is trying to add the field directly as '1953'
> before doing the text factory stuff and is therefore not in the right format
> for indexing. Does that sound like a reasonable assumption and am I missing
> something which is causing it to go wrong? Can anyone help please?
That is correct. You'll need to do the date creation in your own code so
that you send a well-formed date to Solr.
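One way to read "in your own code" is a small pre-processing pass over the XML before it is posted. The following is only a sketch under assumptions: the class DateFieldRewriter and its methods are hypothetical, the input is assumed to be a standard Solr <add> document with the DateRecorded field from the schema in the original message, and the partial dates are assumed to be dot-separated (YYYY, YYYY.MM, YYYY.MM.DD). A regular expression over XML is brittle, so a real XML parser would be the safer choice for anything beyond simple, flat documents.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical pre-processor: pads partial DateRecorded values in a Solr add document. */
public final class DateFieldRewriter {
    // Matches <field name="DateRecorded">VALUE</field>; group 2 is the trimmed raw value.
    private static final Pattern FIELD = Pattern.compile(
        "(<field\\s+name=\"DateRecorded\"\\s*>)\\s*([^<]*?)\\s*(</field>)");

    // Year, then optional month, then optional day, separated by dots.
    private static final Pattern PARTIAL =
        Pattern.compile("(\\d{4})(?:\\.(\\d{2})(?:\\.(\\d{2}))?)?");

    private DateFieldRewriter() {}

    /** Pads a partial date to full precision; other values are returned unchanged. */
    static String pad(String value) {
        Matcher m = PARTIAL.matcher(value);
        if (!m.matches()) {
            return value;
        }
        String month = m.group(2) != null ? m.group(2) : "01";
        String day = m.group(3) != null ? m.group(3) : "01";
        return m.group(1) + "-" + month + "-" + day + "T00:00:00.000Z";
    }

    /** Returns a copy of the XML with every DateRecorded value padded. */
    public static String rewrite(String xml) {
        Matcher m = FIELD.matcher(xml);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String replacement = m.group(1) + pad(m.group(2)) + m.group(3);
            // quoteReplacement guards against '$' or '\' in the replacement text.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

The rewritten string can then be saved and posted in place of the original file. Other fields are left untouched.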
--
Regards,
Shalin Shekhar Mangar.