Posted to solr-user@lucene.apache.org by Alexander Cougarman <ac...@bwc.org> on 2012/08/23 11:27:04 UTC

Can't extract Outlook message files

Hi. We're trying to use the following curl command to perform an "extract only" of a *.MSG file, but it blows up with an HTTP 500 error:

   curl "http://localhost:8983/solr/update/extract?extractOnly=true" -F "myfile=@900002.msg"

If we do this, it works fine:

  curl "http://localhost:8983/solr/update/extract?literal.id=doc1&commit=true" -F "myfile=@900002.msg"

We've tried a variety of MSG files and they all produce the same error; they all have content in them. What are we doing wrong?

Here's the exception the extractOnly=true command generates:

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 500 null

org.apache.solr.common.SolrException
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        ... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
        at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
        at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
        at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
        at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        ... 26 more
</title>
</head>
<body><h2>HTTP ERROR 500</h2>
<p>Problem accessing /solr/update/extract. Reason:
<pre>    null

org.apache.solr.common.SolrException
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:233)
        at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:58)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
        at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:244)
        at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
        at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
        at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
        at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
        at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
        at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
        at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
        at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
        at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
        at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
        at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
        at org.mortbay.jetty.Server.handle(Server.java:326)
        at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
        at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
        at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
        at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
        at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
        at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
        at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.microsoft.OfficeParser@aaf063
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
        at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:227)
        ... 23 more
Caused by: java.lang.IllegalStateException: Internal: Internal error: element state is zero.
        at org.apache.xml.serialize.BaseMarkupSerializer.leaveElementState(Unknown Source)
        at org.apache.xml.serialize.XMLSerializer.endElementIO(Unknown Source)
        at org.apache.xml.serialize.XMLSerializer.endElement(Unknown Source)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.SecureContentHandler.endElement(SecureContentHandler.java:256)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.ContentHandlerDecorator.endElement(ContentHandlerDecorator.java:136)
        at org.apache.tika.sax.SafeContentHandler.endElement(SafeContentHandler.java:273)
        at org.apache.tika.sax.XHTMLContentHandler.endDocument(XHTMLContentHandler.java:213)
        at org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:178)
        at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:242)
        ... 26 more
</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
</body>
</html>


Sincerely,
Alex 
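
The error page above repeats the same trace in both <title> and <pre>, so the interesting part (the "Caused by:" lines) is easy to miss. A minimal sketch of one way to save the response and print only the HTTP status and the distinct root causes; the function names `root_causes` and `extract_only` and the `SOLR_URL` variable are hypothetical, and the default URL simply mirrors the one used in this thread:

```shell
#!/bin/sh
# Print each distinct "Caused by:" line from a saved Solr/Jetty error page.
# Jetty repeats the trace in <title> and <pre>, so duplicates are dropped.
root_causes() {
    # $1: path to the saved response body
    grep '^Caused by:' "$1" | awk '!seen[$0]++'
}

# Post a file in extract-only mode, save the body, and report the status.
# SOLR_URL is a hypothetical override; the default matches this thread.
extract_only() {
    # $1: file to upload, $2: where to save the response body
    url="${SOLR_URL:-http://localhost:8983/solr/update/extract}"
    status=$(curl -s -o "$2" -w '%{http_code}' "${url}?extractOnly=true" -F "myfile=@$1")
    echo "HTTP $status"
    # On anything other than 200, show the root causes from the saved body.
    [ "$status" = "200" ] || root_causes "$2"
}

# Usage (needs a running Solr instance):
#   extract_only 900002.msg response.html
```

Nothing runs until `extract_only` is called, so the script can be sourced safely; `root_causes` can also be pointed at a response that was saved earlier.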



RE: Can't extract Outlook message files

Posted by Alexander Cougarman <ac...@bwc.org>.
This appears to be an issue with "extractOnly=true" on Solr 3.6.1: we upgraded to 4.0 Beta 2 and the problem went away. Posting this in case anyone else runs into it.

Sincerely,
Alex 
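
For anyone who cannot upgrade, one thing that may be worth trying is sketched below. This is an untested assumption, not something confirmed in this thread: the trace fails inside the XML serializer, and the extracting handler also accepts an `extractFormat=text` parameter alongside `extractOnly=true`, which asks for plain text instead of XML. Whether that avoids the error on 3.6.1 has not been verified here. The helper names `extract_url` and `try_extract` are hypothetical.

```shell
#!/bin/sh
# Build an extract-only URL for the extracting request handler.
# $1: base handler URL, $2: extractFormat value ("xml" or "text")
extract_url() {
    printf '%s?extractOnly=true&extractFormat=%s\n' "$1" "$2"
}

# Hypothetical helper: post one file and print the response to stdout.
# The handler URL is the one used in this thread.
try_extract() {
    # $1: file to upload, $2: extractFormat value
    curl -s "$(extract_url "http://localhost:8983/solr/update/extract" "$2")" -F "myfile=@$1"
}

# Usage (needs a running Solr instance):
#   try_extract 900002.msg text
```

Keeping the URL construction in its own function makes it easy to compare the "xml" and "text" variants against the same file.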



Re: Can't extract Outlook message files

Posted by Erick Erickson <er...@gmail.com>.
Hmmm, it kind of looks like your file doesn't have an "id" field, but
that's just a guess based on your statement that providing an ID
"works just fine". Does it work if you take the <uniqueKey> definition
out of your schema.xml (you'll also have to remove the
required="true" attribute from the id field)?

But this is a wild shot in the dark....

Best
Erick
