Posted to dev@tika.apache.org by "Josh Burchard (Jira)" <ji...@apache.org> on 2023/01/26 23:00:00 UTC

[jira] [Created] (TIKA-3961) When a parser exception happens, the "resourceName" key becomes "esourceName"

Josh Burchard created TIKA-3961:
-----------------------------------

             Summary: When a parser exception happens, the "resourceName" key becomes "esourceName"
                 Key: TIKA-3961
                 URL: https://issues.apache.org/jira/browse/TIKA-3961
             Project: Tika
          Issue Type: Bug
          Components: core
    Affects Versions: 2.4.1
         Environment: Windows 10.   Tika 2.4.1.  Tika server.   
            Reporter: Josh Burchard


Test env: Windows 10
Tika 2.4.1, tika server

 

In my config I've specified:
    <metadataFilter class="org.apache.tika.metadata.filter.IncludeFieldMetadataFilter">
      <params>
        <include>
          <field>X-TIKA:content</field>
          <field>dc:creator</field>
          <field>dc:title</field>
          <field>resourceName</field>
          <field>X-TIKA:EXCEPTION:container_exception</field>
        </include>
      </params>
    </metadataFilter>
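
For reference, the intent of the include list above can be sketched in a few lines of Python. This is only an illustration of the behaviour I expect from the filter, not Tika's implementation; filter_metadata, the "some:otherField" key and the sample values are made up for the example.

```python
# Illustrative only: models what an include-field filter is expected to do.
# This is NOT Tika code; names and sample values are hypothetical.
INCLUDE_FIELDS = {
    "X-TIKA:content",
    "dc:creator",
    "dc:title",
    "resourceName",
    "X-TIKA:EXCEPTION:container_exception",
}


def filter_metadata(metadata, include=INCLUDE_FIELDS):
    """Return a copy of metadata containing only the included keys."""
    return {key: value for key, value in metadata.items() if key in include}


sample = {
    "resourceName": "encrypted.docx",
    "some:otherField": "dropped by the filter",  # hypothetical key
    "X-TIKA:EXCEPTION:container_exception": "org.apache.poi.EncryptedDocumentException: ...",
}
filtered = filter_metadata(sample)
```

With this model, resourceName should pass through with its name unchanged whether or not the exception field is present.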
 

For a password-protected docx file, Tika returns the following (see the bold text at the bottom):
[{"X-TIKA:EXCEPTION:container_exception":"org.apache.poi.EncryptedDocumentException: java.security.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions[7B14:0002-7080] java:274)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:223)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.hashInput(AgileDecryptor.java:196)\r\n\tat org.apache.poi.poifs.crypt.agile.AgileDecryptor.verifyPasswrd(AgileDecryptor.java:102)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:261)\r\n\tat org.apache.tika.parser.microsoft.OfficeParser.parse(OfficeParser.java:175)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositParser.java:298)\r\n\tat org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:298)\r\n\tat org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:167)\r\n\tat org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWraper.java:163)\r\n\tat org.apache.tika.server.core.resource.TikaResource.parse(TikaResource.java:352)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.parseMetadata(RecursiveMetadataResource.java:78)\r\n\tat org.apache.tika.server.cor.resource.RecursiveMetadataResource.parseMetadataToMetadataList(RecursiveMetadataResource.java:190)\r\n\tat org.apache.tika.server.core.resource.RecursiveMetadataResource.getMetadata(RecursiveMetadataResource.java:179)\r\n\tat sun.reflect.GeneratedMethodAcessor7.invoke(Unknown Source)\r\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\r\n\tat java.lang.reflect.Method.invoke(Method.java:498)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.performInvocation(bstractInvoker.java:179)\r\n\tat org.apache.cxf.service.invoker.AbstractInvoker.invoke(AbstractInvoker.java:96)\r\n\tat org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:201)\r\n\tat 
org.apache.cxf.jaxrs.JAXRSInvoker.invoke(JAXRSInvoker.java:104)r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor$1.run(ServiceInvokerInterceptor.java:59)\r\n\tat org.apache.cxf.interceptor.ServiceInvokerInterceptor.handleMessage(ServiceInvokerInterceptor.java:96)\r\n\tat org.apache.cxf.phase.PhaseInterceptrChain.doIntercept(PhaseInterceptorChain.java:307)\r\n\tat org.apache.cxf.transport.ChainInitiationObserver.onMessage(ChainInitiationObserver.java:121)\r\n\tat org.apache.cxf.transport.http.AbstractHTTPDestination.invoke(AbstractHTTPDestination.java:265)\\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPDestination.doService(JettyHTTPDestination.java:247)\r\n\tat org.apache.cxf.transport.http_jetty.JettyHTTPHandler.handle(JettyHTTPHandler.java:79)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.andle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:235)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)\r\n\tat org.eclipse.jetty.server.handler.ScpedHandler.nextScope(ScopedHandler.java:190)\r\n\tat org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)\r\n\tat org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)\r\n\tat org.eclipse.jetty.server.hndler.ContextHandlerCollection.handle(ContextHandlerCollection.java:191)\r\n\tat org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)\r\n\tat org.eclipse.jetty.server.Server.handle(Server.java:516)\r\n\tat org.eclipse.jetty.servr.HttpChannel.lambda$handle$1(HttpChannel.java:487)\r\n\tat org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)\r\n\tat org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:479)\r\n\tat org.eclipse.jetty.server.HttpConnection.onFilable(HttpConnection.java:277)\r\n\tat org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)\r\n\tat 
org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:105)\r\n\tat org.eclipse.jetty.io.ChannelEndPoint$1.run(hannelEndPoint.java:104)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.runTask(EatWhatYouKill.java:338)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.doProduce(EatWhatYouKill.java:315)\r\n\tat org.eclipse.jetty.util.thread.trategy.EatWhatYouKill.tryProduce(EatWhatYouKill.java:173)\r\n\tat org.eclipse.jetty.util.thread.strategy.EatWhatYouKill.run(EatWhatYouKill.java:131)\r\n\tat org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.jaa:409)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:883)\r\n\tat org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1034)\r\n\tat java.lang.Thread.run(Thread.java:827)\r\nCaused by: java.ecurity.NoSuchAlgorithmException: Cannot find any provider supporting AES/CBC/NoPadding\r\n\tat javax.crypto.Cipher.getInstance(Cipher.java:543)\r\n\tat org.apache.poi.poifs.crypt.CryptoFunctions.getCipher(CryptoFunctions.java:258)\r\n\t... 51 more\r\n",{*}"esourceName":"encrypted.docx"{*}}]

 

If I disable return of the exception metadata (X-TIKA:EXCEPTION:container_exception), then resourceName is returned correctly:
[8D84:0002-60C4] 01/26/2023 05:45:58 PM DEBUG_TIKA write_callback - ptr = t:
[{"resourceName":"encrypted.docx"}]

 

I believe this is reproducible with any password-protected docx file.
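
A quick way to confirm the symptom is to compare the keys in the JSON response against the include list. The sketch below assumes the response body is a JSON array of objects, as in the output above; unexpected_keys is a hypothetical helper, and the exception value is shortened to keep the example readable.

```python
import json

# Field names copied from the include list in the config.
EXPECTED = {
    "X-TIKA:content",
    "dc:creator",
    "dc:title",
    "resourceName",
    "X-TIKA:EXCEPTION:container_exception",
}


def unexpected_keys(body, expected=EXPECTED):
    """Return the keys present in the response that are not in the include list."""
    found = set()
    for entry in json.loads(body):
        found.update(entry.keys())
    return found - set(expected)


# Shortened versions of the two responses shown in this report.
bad = ('[{"X-TIKA:EXCEPTION:container_exception":'
       '"org.apache.poi.EncryptedDocumentException: ...",'
       '"esourceName":"encrypted.docx"}]')
good = '[{"resourceName":"encrypted.docx"}]'

print(unexpected_keys(bad))   # {'esourceName'}
print(unexpected_keys(good))  # set()
```

Any key reported here was either not requested or arrived with a changed name, which is the case for "esourceName".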



--
This message was sent by Atlassian Jira
(v8.20.10#820010)