Posted to dev@tika.apache.org by "Hudson (Jira)" <ji...@apache.org> on 2021/04/16 16:07:00 UTC

[jira] [Commented] (TIKA-3359) Extract swf from PDFs

    [ https://issues.apache.org/jira/browse/TIKA-3359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323918#comment-17323918 ] 

Hudson commented on TIKA-3359:
------------------------------

SUCCESS: Integrated in Jenkins build Tika » tika-main-jdk8 #201 (See [https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/201/])
TIKA-3359 -- extract rich media from PDFs (tallison: [https://github.com/apache/tika/commit/601cfff8762e0bf69a6e08f2cdf09590a6dc311b])
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java
* (add) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/test/resources/test-documents/testFlashInPDF.pdf
* (edit) tika-parsers/tika-parsers-classic/tika-parsers-classic-modules/tika-parser-pdf-module/src/main/java/org/apache/tika/parser/pdf/AbstractPDF2XHTML.java


> Extract swf from PDFs
> ---------------------
>
>                 Key: TIKA-3359
>                 URL: https://issues.apache.org/jira/browse/TIKA-3359
>             Project: Tika
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>             Fix For: 2.0.0
>
>
> On Twitter, @terminalboredom and Tyler Thorsted shared examples of PDF files with embedded Flash.  I ran tika-app with -z, and we're not extracting the embedded files.  I suspect they're in a structure we're not currently checking.
> https://twitter.com/CHLThor/status/1382888365767360513?s=20
> https://twitter.com/sonicstacey/status/1382956466332573701?s=20
> Many thanks to @beet_keeper for putting us in touch.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)