Posted to dev@pdfbox.apache.org by "Tim Allison (Jira)" <ji...@apache.org> on 2021/04/16 13:26:00 UTC

[jira] [Commented] (PDFBOX-5166) Implement RichMedia annotation

    [ https://issues.apache.org/jira/browse/PDFBOX-5166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17323809#comment-17323809 ] 

Tim Allison commented on PDFBOX-5166:
-------------------------------------

Completely unsurprisingly, [~tilman] has already shown how to extract these files on SO: https://stackoverflow.com/questions/45460027/what-is-the-best-way-to-extract-embedded-flash-file-from-a-pdf-using-the-pdfbox

If this is a "not going to fix", no problem!  I'm happy to put that code into Tika for now, and if a RichMedia annotation gets implemented in PDFBox, I can update our code accordingly.

> Implement RichMedia annotation
> ------------------------------
>
>                 Key: PDFBOX-5166
>                 URL: https://issues.apache.org/jira/browse/PDFBOX-5166
>             Project: PDFBox
>          Issue Type: Task
>            Reporter: Tim Allison
>            Priority: Major
>         Attachments: testFlashInPDF.pdf
>
>
> See TIKA-3359.  The attached file has an embedded Flash/swf file.  Tika is not currently extracting the embedded file.
> In the debugger, I can see the Annotation as a PDAnnotationUnknown.  In the COSDictionary, I can see the subtype is "RichMedia".  If someone has the time, it'd be great to implement this so that we can extract more attachments in Tika...  Obv, others may find it useful too. :D
> Many thanks to Tyler Thorsted for the test file and many thanks to @terminalboredom and @beet_keeper.
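Until PDFBox has a dedicated RichMedia class, one stopgap is to read the annotation at the COS level, as the description above suggests (the subtype is visible in the COSDictionary). This is a minimal sketch assuming PDFBox 2.0.x. It is not the code from the linked Stack Overflow answer and not a PDFBox RichMedia API. It assumes the assets sit under /RichMediaContent -> /Assets, a name tree whose values are file specifications with an /EF /F embedded stream, as in the PDF RichMedia annotation structure; check that against the spec for your files. The class name RichMediaExtractor is hypothetical.

```java
import java.io.File;
import java.io.IOException;
import java.util.LinkedHashMap;
import java.util.Map;

import org.apache.pdfbox.cos.COSArray;
import org.apache.pdfbox.cos.COSBase;
import org.apache.pdfbox.cos.COSDictionary;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.cos.COSString;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.common.filespecification.PDComplexFileSpecification;
import org.apache.pdfbox.pdmodel.common.filespecification.PDEmbeddedFile;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;

// Hypothetical helper (sketch): reads RichMedia assets at the COS level, assuming PDFBox 2.0.x.
public class RichMediaExtractor {

    private static final COSName RICH_MEDIA_CONTENT = COSName.getPDFName("RichMediaContent");
    private static final COSName ASSETS = COSName.getPDFName("Assets");
    private static final int MAX_DEPTH = 32; // guards against cyclic /Kids in malformed files

    /** Returns asset name -> embedded bytes for one annotation; empty if it is not RichMedia. */
    public static Map<String, byte[]> extractAssets(COSDictionary annot) throws IOException {
        Map<String, byte[]> result = new LinkedHashMap<>();
        if (!"RichMedia".equals(annot.getNameAsString(COSName.SUBTYPE))) {
            return result;
        }
        COSBase content = annot.getDictionaryObject(RICH_MEDIA_CONTENT);
        if (content instanceof COSDictionary) {
            COSBase assets = ((COSDictionary) content).getDictionaryObject(ASSETS);
            if (assets instanceof COSDictionary) {
                walkNameTree((COSDictionary) assets, result, 0);
            }
        }
        return result;
    }

    // /Assets is assumed to be a name tree: leaves hold /Names [name1 spec1 name2 spec2 ...],
    // intermediate nodes hold /Kids [node ...].
    private static void walkNameTree(COSDictionary node, Map<String, byte[]> out, int depth)
            throws IOException {
        if (depth > MAX_DEPTH) {
            return;
        }
        COSBase names = node.getDictionaryObject(COSName.NAMES);
        if (names instanceof COSArray) {
            COSArray pairs = (COSArray) names;
            for (int i = 0; i + 1 < pairs.size(); i += 2) {
                COSBase key = pairs.getObject(i); // getObject dereferences indirect objects
                COSBase value = pairs.getObject(i + 1);
                if (key instanceof COSString && value instanceof COSDictionary) {
                    // Only the /EF /F stream is read here; /UF is not consulted.
                    PDEmbeddedFile file =
                            new PDComplexFileSpecification((COSDictionary) value).getEmbeddedFile();
                    if (file != null) {
                        out.put(((COSString) key).getString(), file.toByteArray());
                    }
                }
            }
        }
        COSBase kids = node.getDictionaryObject(COSName.KIDS);
        if (kids instanceof COSArray) {
            COSArray kidArray = (COSArray) kids;
            for (int i = 0; i < kidArray.size(); i++) {
                COSBase kid = kidArray.getObject(i);
                if (kid instanceof COSDictionary) {
                    walkNameTree((COSDictionary) kid, out, depth + 1);
                }
            }
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: RichMediaExtractor <file.pdf>");
            return;
        }
        // PDFBox 2.0.x loading; PDFBox 3.x uses Loader.loadPDF(File) instead.
        try (PDDocument doc = PDDocument.load(new File(args[0]))) {
            int pageNo = 0;
            for (PDPage page : doc.getPages()) {
                pageNo++;
                for (PDAnnotation annotation : page.getAnnotations()) {
                    // COS-level check, so it does not matter that PDFBox wraps it as PDAnnotationUnknown.
                    for (Map.Entry<String, byte[]> e : extractAssets(annotation.getCOSObject()).entrySet()) {
                        System.out.println("page " + pageNo + ": " + e.getKey()
                                + " (" + e.getValue().length + " bytes)");
                    }
                }
            }
        }
    }
}
```

Working on the raw dictionary keeps the sketch independent of how PDFBox classifies the annotation, so it could be swapped for a typed class if RichMedia support is added later.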



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
