Posted to user@tika.apache.org by Swapna Vuppala <Sw...@arup.com> on 2011/11/11 06:22:04 UTC
Problem indexing msg files
Hi,
I am using Tika to index Outlook .msg files. It has been working very well for me, but I am facing a problem while indexing some .msg files. The indexing fails with the Solr exception below:
SEVERE: org.apache.solr.common.SolrException: Invalid Date String:' Fri, 14 Oct 2011 12:35:51 +0100'
at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
at org.apache.solr.schema.TrieField.createField(TrieField.java:387)
at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:137)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:142)
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:222)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)
Can someone please suggest a way to overcome this issue and index these .msg files successfully?
Thanks and Regards,
Swapna.
Re: Problem indexing msg files
Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 11 Nov 2011, Swapna Vuppala wrote:
> Am using Tika to index .msg files of Outlook. It has been working very
> good for me but am facing problem while indexing some .msg files. The
> indexing fails with the below Solr exception
>
> SEVERE: org.apache.solr.common.SolrException: Invalid Date String:' Fri, 14 Oct 2011 12:35:51 +0100'
> at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
> at org.apache.solr.schema.TrieField.createField(TrieField.java:387)
> at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
> at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
> at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
This looks like a Solr issue. Tika tries to return an ISO-8601
date string where it can, but sometimes all it gets from the file is an
arbitrary string, which it passes through as-is.
Downstream apps will need to handle this.
Nick
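
As an illustration of the "downstream apps will need to handle this" advice, here is a minimal sketch of normalizing such a value before it reaches a Solr date field. It assumes the client has access to the metadata value before indexing, that the unparsed string is in the RFC 1123 mail-date form seen in the exception, and that java.time (Java 8 or later) is available. The class and method names (SolrDateNormalizer, toSolrDate) are hypothetical and are not part of Tika or Solr.

```java
import java.time.Instant;
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper, not part of Tika or Solr.
final class SolrDateNormalizer {

    private SolrDateNormalizer() {}

    /**
     * Converts an RFC 1123 style mail date (for example
     * " Fri, 14 Oct 2011 12:35:51 +0100") into the UTC ISO-8601 form
     * that Solr date fields accept. Returns an empty Optional when the
     * value is null or cannot be parsed, so the caller can decide
     * whether to drop the field or store it as plain text.
     */
    static Optional<String> toSolrDate(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // The value in the exception has a leading space, so trim first.
        String trimmed = raw.trim();
        try {
            Instant instant = OffsetDateTime
                    .parse(trimmed, DateTimeFormatter.RFC_1123_DATE_TIME)
                    .toInstant();
            // ISO_INSTANT always formats in UTC with a trailing 'Z'.
            return Optional.of(DateTimeFormatter.ISO_INSTANT.format(instant));
        } catch (DateTimeParseException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // Prints Optional[2011-10-14T11:35:51Z]
        System.out.println(toSolrDate(" Fri, 14 Oct 2011 12:35:51 +0100"));
        // Prints Optional.empty
        System.out.println(toSolrDate("not a date"));
    }
}
```

Returning an empty Optional rather than throwing leaves the policy for unparseable values with the caller; which metadata field carries the date, and how it is passed to Solr, depends on the setup and is not shown here.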