Posted to user@tika.apache.org by Swapna Vuppala <Sw...@arup.com> on 2011/11/11 06:22:04 UTC

Problem indexing msg files

Hi,

I am using Tika to index Outlook .msg files. It has been working very well for me, but I am facing a problem while indexing some .msg files. The indexing fails with the Solr exception below:

SEVERE: org.apache.solr.common.SolrException: Invalid Date String:' Fri, 14 Oct 2011 12:35:51 +0100'
                at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
                at org.apache.solr.schema.TrieField.createField(TrieField.java:387)
                at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
                at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
                at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)
                at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:276)
                at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
                at org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:115)
                at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:137)
                at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:142)
                at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:222)
                at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:67)
                at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
                at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:241)
                at org.apache.solr.core.SolrCore.execute(SolrCore.java:1368)

Can someone please suggest a way to overcome this issue and index these .msg files successfully?

Thanks and Regards,
Swapna.

Re: Problem indexing msg files

Posted by Nick Burch <ni...@alfresco.com>.
On Fri, 11 Nov 2011, Swapna Vuppala wrote:
> Am using Tika to index .msg files of Outlook. It has been working very 
> good for me but am facing problem while indexing some .msg files. The 
> indexing fails with the below Solr exception
>
> SEVERE: org.apache.solr.common.SolrException: Invalid Date String:' Fri, 14 Oct 2011 12:35:51 +0100'
>                at org.apache.solr.schema.DateField.parseMath(DateField.java:165)
>                at org.apache.solr.schema.TrieField.createField(TrieField.java:387)
>                at org.apache.solr.schema.TrieDateField.createField(TrieDateField.java:120)
>                at org.apache.solr.schema.SchemaField.createField(SchemaField.java:104)
>                at org.apache.solr.update.DocumentBuilder.addField(DocumentBuilder.java:203)

This looks like a Solr issue. Tika will try to return an ISO-8601
date string where it can, but sometimes all it gets is an arbitrary
string. Downstream apps will need to handle this.
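
As a rough sketch of what handling this downstream could look like, the
snippet below normalizes a date value before it is sent to Solr. It
assumes the unparsed value is an RFC 822/1123 style mail date like the
one in the exception above, and that the target field wants a UTC
ISO-8601 instant. SolrDateNormalizer and toSolrDate are hypothetical
names for illustration, not part of Tika or Solr.

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.Optional;

// Hypothetical helper for illustration; not a Tika or Solr class.
class SolrDateNormalizer {

    // Returns the date as a UTC ISO-8601 instant, or Optional.empty()
    // if the value matches neither of the two formats tried here.
    static Optional<String> toSolrDate(String raw) {
        if (raw == null) {
            return Optional.empty();
        }
        // The failing value in the log has a leading space, so trim first.
        String value = raw.trim();

        // Case 1: the value is already ISO-8601 with an offset or 'Z'.
        try {
            return Optional.of(OffsetDateTime.parse(value).toInstant().toString());
        } catch (DateTimeParseException ignored) {
            // Not ISO-8601; fall through to the mail-style format.
        }

        // Case 2: a mail-style date such as "Fri, 14 Oct 2011 12:35:51 +0100".
        try {
            OffsetDateTime parsed =
                    OffsetDateTime.parse(value, DateTimeFormatter.RFC_1123_DATE_TIME);
            return Optional.of(parsed.toInstant().toString());
        } catch (DateTimeParseException e) {
            // Unknown format: let the caller decide whether to skip the field.
            return Optional.empty();
        }
    }
}
```

For the value from the log, toSolrDate(" Fri, 14 Oct 2011 12:35:51 +0100")
yields 2011-10-14T11:35:51Z. Returning an empty Optional for anything
else leaves it to the caller to drop the date field or index it as
plain text instead of failing the whole document.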

Nick