Posted to dev@tika.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2023/01/30 14:37:00 UTC

[jira] [Commented] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

    [ https://issues.apache.org/jira/browse/TIKA-3961?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17682120#comment-17682120 ] 

Tim Allison commented on TIKA-3961:
-----------------------------------

I'm not able to reproduce this, at least on Linux. I'm wondering why you're getting a different exception than I am? You get a NoSuchAlgorithmException from POI, while I get Tika's EncryptedDocumentException.

I'll break out my Windows laptop and see if I can reproduce it there.  

I'm wondering if there's something weird going on with the \r on Windows? Your trace uses \r\n line endings where mine uses \n.

{noformat}curl -X PUT -H "Content-Disposition: attachment; filename=something.docx" --upload-file testWORD_protected_passtika.docx http://localhost:9998/rmeta
{noformat}
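For anyone reproducing this outside curl, the same request can be sketched with Java's built-in java.net.http client (Java 11+). This is a minimal sketch that assumes tika-server is listening on localhost:9998, as in the curl command. The class and method names (RmetaRequest, build) are hypothetical. The filename sent in Content-Disposition is what comes back as resourceName. Printing the raw body lets you compare the keys byte-for-byte with whatever a client logs.

```java
import java.io.FileNotFoundException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Path;

// Hypothetical helper; mirrors: curl -X PUT -H "Content-Disposition: ..." --upload-file FILE .../rmeta
class RmetaRequest {
    // Assumption: tika-server on its default port, as in the curl command above.
    static final URI RMETA = URI.create("http://localhost:9998/rmeta");

    // Builds a PUT whose body is the file's bytes; the Content-Disposition filename
    // is what Tika reports back as "resourceName".
    static HttpRequest build(Path file, String filename) throws FileNotFoundException {
        return HttpRequest.newBuilder(RMETA)
                .header("Content-Disposition", "attachment; filename=" + filename)
                .PUT(HttpRequest.BodyPublishers.ofFile(file))
                .build();
    }

    public static void main(String[] args) throws Exception {
        HttpRequest req = build(Path.of("testWORD_protected_passtika.docx"), "something.docx");
        HttpResponse<String> resp = HttpClient.newHttpClient()
                .send(req, HttpResponse.BodyHandlers.ofString());
        // Print the raw JSON so metadata keys can be checked exactly as the server sent them.
        System.out.println(resp.body());
    }
}
```

With the test file in the working directory, `java RmetaRequest.java` runs it via the single-file source launcher.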

On Linux, I get back:

{noformat}
[{"X-TIKA:EXCEPTION:container_exception":"org.apache.tika.exception.EncryptedDocumentException: Unable to process: document is encrypted\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:262)\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:163)\n\tat org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)\n\tat java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)\n\tat java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.base/java.lang.reflect.Method.invoke(Method.java:568)\n\tat org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(AbstractInvoker.java:179)\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\n\tat 
org.apache.cxf.phase.PhaseInterceptorChain.doIntercept(PhaseInterceptorChain.java:307)\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:190)\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\n\tat org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\n\tat org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\n\tat org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\n\tat org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(ChannelEndPoint.java:104)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\n\tat 
java.base/java.lang.Thread.run(Thread.java:833)\n","resourceName":"something.docx"}]
{noformat}

> When a parser exception happens, the "resourceName" key becomes "esourceName"
> -----------------------------------------------------------------------------
>
>                 Key: TIKA-3961
>                 URL: https://issues.apache.org/jira/browse/TIKA-3961
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 2.4.1
>         Environment: Windows 10.   Tika 2.4.1.  Tika server.   
>            Reporter: Josh Burchard
>            Priority: Major
>
> Test env: Windows 10
> Tika 2.4.1, tika server
>  
> In my config I've specified:
>     <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
>       <params>
>         <include>
>           <field>X-TIKA:content</field>
>           <field>dc:creator</field>
>           <field>dc:title</field>
>           <field>resourceName</field>
>           <field>X-TIKA:EXCEPTION:container_exception</field>
>         </include>
>       </params>
>     </metadataFilter>
>  
> For a password-protected docx file, Tika returns the following (see the bold text at the bottom):
> [{"X-TIKA:EXCEPTION:container_exception":"org.apache.poi.EncryptedDocumentException: java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions[7B14:0002-7080] java:274)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:223)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.hashInput(AgileDecryptor.java:196)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.verifyPasswrd(AgileDecryptor.java:102)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:261)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositParser.java:298)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\r\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWraper.java:163)\r\n\tat org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\r\n\tat org.apache.tika.server.cor.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\r\n\tat sun.reflect.GeneratedMethodAcessor7.invoke(Unknown Source)\r\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(bstractInvoker.java:179)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\r\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\r\n\tat 
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\r\n\tat org.apache.cxf.phase.PhaseInterceptrChain.doIntercept(PhaseInterceptorChain.java:307)\r\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\r\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\r\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.andle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\r\n\tat org.eclipse.jetty.server.handler.ScpedHandler.nextScope(ScopedHandler.java:190)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.hndler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\r\n\tat org.eclipse.jetty.servr.HttpChannel.lambda$handle$1(HttpChannel.java:487)\r\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFilable(HttpConnection.java:277)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\r\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\r\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(hannelEndPoint.java:104)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\r\n\tat org.eclipse.jetty.util.thread.trategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\r\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.jaa:409)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\r\n\tat java.lang.Thread.run(Thread.java:827)\r\nCaused by: java.ecurity.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat javax.crypto.Cipher.getInstance(Cipher.java:543)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:258)\r\n\t... 51 more\r\n",{*}"esourceName":"encrypted.docx"{*}}]
>  
> If I disable returning the exception metadata, then resourceName is returned correctly:
> [8D84:0002-60C4] 01/26/2023 05:45:58 PM DEBUG_TIKA write_callback - ptr = t:
> [\{"resourceName":"encrypted.docx"}]
>  
> I believe this is reproducible with any password-protected docx file.
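The reporter's configuration keeps only the fields named under include. The following stdlib-only sketch makes those include-list semantics concrete. It is an illustration, not Tika's IncludeFieldMetadataFilter code, and IncludeListSketch and filter are hypothetical names. The sketch assumes a key is kept only on an exact, case-sensitive match, so a key spelled differently from the list, such as esourceName, would not survive filtering.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, NOT Tika's IncludeFieldMetadataFilter: it mimics
// include-list semantics on a plain map of metadata key -> value.
class IncludeListSketch {
    static Map<String, String> filter(Map<String, String> metadata, Set<String> include) {
        Map<String, String> kept = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            // Exact, case-sensitive match: any key not spelled as listed is dropped.
            if (include.contains(e.getKey())) {
                kept.put(e.getKey(), e.getValue());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Same field names as the reporter's <include> list.
        Set<String> include = Set.of("X-TIKA:content", "dc:creator", "dc:title",
                "resourceName", "X-TIKA:EXCEPTION:container_exception");
        Map<String, String> md = new LinkedHashMap<>();
        md.put("resourceName", "encrypted.docx");
        md.put("esourceName", "encrypted.docx");
        System.out.println(filter(md, include)); // prints {resourceName=encrypted.docx}
    }
}
```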



--
This message was sent by Atlassian Jira
(v8.20.10#820010)