Posted to dev@tika.apache.org by "Jukka Zitting (JIRA)" <ji...@apache.org> on 2010/09/29 00:37:45 UTC

[jira] Commented: (TIKA-519) Display embedded images in the GUI Formatted Text pane where they occur in the document

    [ https://issues.apache.org/jira/browse/TIKA-519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12915948#action_12915948 ] 

Jukka Zitting commented on TIKA-519:
------------------------------------

Cool! It would be nice if we could turn this into a more generic feature that we could put into tika-core, but as a first step the patch looks great.

The only thing I'd like to see added is better tracking of the temporary files - they should get automatically removed once they are no longer needed. It would also be good to better avoid potential problems caused by file name collisions, either by mapping each embedded file to a new File.createTempFile(), or by replacing "tmpDir = t.getParentFile();" with "t.delete(); t.mkdir(); tmpDir = t;".
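
A minimal sketch of the temp-file handling suggested above. It is not taken from the attached patch, and EmbeddedFileStore is a hypothetical name. The sketch gives each document its own unique directory using the "t.delete(); t.mkdir(); tmpDir = t;" trick, writes each embedded resource to its own File.createTempFile() so that equal names cannot collide, and removes everything again in close().

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: temporary files for one displayed document. */
final class EmbeddedFileStore implements AutoCloseable {
    private final File tmpDir;
    private final List<File> created = new ArrayList<>();
    private final Map<String, File> byName = new HashMap<>();

    EmbeddedFileStore() throws IOException {
        // Reserve a unique name with createTempFile, then turn it into a directory.
        File t = File.createTempFile("tika-gui-", "");
        if (!t.delete() || !t.mkdir()) {
            throw new IOException("Could not create temporary directory " + t);
        }
        tmpDir = t;
        // Safety net if close() is never called. deleteOnExit runs in reverse
        // registration order, so the files registered later are deleted first.
        tmpDir.deleteOnExit();
    }

    /** Writes one embedded resource to its own, uniquely named file. */
    File store(String embeddedName, byte[] data) throws IOException {
        // Keep the extension (e.g. ".png") so image loaders recognise the type.
        String base = embeddedName.substring(embeddedName.lastIndexOf('/') + 1);
        int dot = base.lastIndexOf('.');
        String suffix = dot >= 0 ? base.substring(dot) : null;
        // createTempFile never hands out an existing name, so two resources
        // both called "image1.png" cannot overwrite each other.
        File f = File.createTempFile("embedded-", suffix, tmpDir);
        f.deleteOnExit();
        Files.write(f.toPath(), data);
        created.add(f);
        byName.put(embeddedName, f); // the latest file wins for lookups
        return f;
    }

    File lookup(String embeddedName) {
        return byName.get(embeddedName);
    }

    /** Removes every file written by this store, then the directory itself. */
    @Override
    public void close() {
        for (File f : created) {
            f.delete();
        }
        created.clear();
        byName.clear();
        tmpDir.delete();
    }
}
```

On Java 7 and later, Files.createTempDirectory() creates the directory in one step and avoids the short window between delete() and mkdir() in which another process could claim the name.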

Even better if we could entirely avoid temporary files and the extra URL mapping by customizing the HTMLEditorKit with a decorated HTMLFactory that maps the incoming <img> tags to corresponding ImageView objects that construct the requested Images directly from the embedded byte streams. But that might well be much more trouble than it's worth... :-)
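
A rough sketch of the HTMLFactory idea, under stated assumptions: the embedded bytes have already been collected into a map keyed by the name after the "embedded:" prefix, and EmbeddedImageKit, EmbeddedImageFactory and EmbeddedImageView are hypothetical names. Instead of subclassing ImageView (which resolves its own URL), it uses a small custom View that paints an image decoded with ImageIO. Anything ImageIO cannot read falls back to the stock factory.

```java
import java.awt.Graphics;
import java.awt.Rectangle;
import java.awt.Shape;
import java.awt.image.BufferedImage;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.util.Map;
import javax.imageio.ImageIO;
import javax.swing.text.AttributeSet;
import javax.swing.text.BadLocationException;
import javax.swing.text.Element;
import javax.swing.text.Position;
import javax.swing.text.StyleConstants;
import javax.swing.text.View;
import javax.swing.text.ViewFactory;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;

/** Paints an already decoded image for one <img> element. */
class EmbeddedImageView extends View {
    private final BufferedImage image;

    EmbeddedImageView(Element elem, BufferedImage image) {
        super(elem);
        this.image = image;
    }

    @Override
    public float getPreferredSpan(int axis) {
        return axis == X_AXIS ? image.getWidth() : image.getHeight();
    }

    @Override
    public void paint(Graphics g, Shape allocation) {
        Rectangle r = allocation.getBounds();
        g.drawImage(image, r.x, r.y, null);
    }

    @Override
    public Shape modelToView(int pos, Shape a, Position.Bias b) throws BadLocationException {
        if (pos < getStartOffset() || pos > getEndOffset()) {
            throw new BadLocationException("Position outside image view", pos);
        }
        Rectangle r = a.getBounds();
        if (pos == getEndOffset()) r.x += r.width; // caret after the image
        r.width = 0;
        return r;
    }

    @Override
    public int viewToModel(float x, float y, Shape a, Position.Bias[] biasReturn) {
        Rectangle r = a.getBounds();
        boolean leftHalf = x < r.x + r.width / 2f;
        biasReturn[0] = leftHalf ? Position.Bias.Forward : Position.Bias.Backward;
        return leftHalf ? getStartOffset() : getEndOffset();
    }
}

/** Decorates the default factory: "embedded:" images come from memory. */
class EmbeddedImageFactory extends HTMLEditorKit.HTMLFactory {
    private static final String PREFIX = "embedded:";
    private final Map<String, byte[]> embedded;

    EmbeddedImageFactory(Map<String, byte[]> embedded) {
        this.embedded = embedded;
    }

    @Override
    public View create(Element elem) {
        AttributeSet attrs = elem.getAttributes();
        Object src = attrs.getAttribute(HTML.Attribute.SRC);
        if (attrs.getAttribute(StyleConstants.NameAttribute) == HTML.Tag.IMG
                && src instanceof String && ((String) src).startsWith(PREFIX)) {
            byte[] bytes = embedded.get(((String) src).substring(PREFIX.length()));
            if (bytes != null) {
                try {
                    BufferedImage image = ImageIO.read(new ByteArrayInputStream(bytes));
                    if (image != null) { // null: no ImageIO reader for this format
                        return new EmbeddedImageView(elem, image);
                    }
                } catch (IOException e) {
                    // Corrupt data: fall through to the default view.
                }
            }
        }
        return super.create(elem);
    }
}

/** HTMLEditorKit that uses the decorated factory. */
class EmbeddedImageKit extends HTMLEditorKit {
    private final ViewFactory factory;

    EmbeddedImageKit(Map<String, byte[]> embedded) {
        factory = new EmbeddedImageFactory(embedded);
    }

    @Override
    public ViewFactory getViewFactory() {
        return factory;
    }
}
```

On a JEditorPane, calling setEditorKit(new EmbeddedImageKit(images)) followed by setText(html) should be enough to use it. ImageIO covers common formats such as PNG, JPEG and GIF, but not, for example, the WMF/EMF pictures found in some Office documents, so those would still need the temporary-file route or another decoder.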

PS. A minor nit:

    if(type != null && type.startsWith("image/"))
        return true;
    return false;

is (IMHO) better written as:

    return type != null && type.startsWith("image/");


> Display embedded images in the GUI Formatted Text pane where they occur in the document
> ---------------------------------------------------------------------------------------
>
>                 Key: TIKA-519
>                 URL: https://issues.apache.org/jira/browse/TIKA-519
>             Project: Tika
>          Issue Type: New Feature
>          Components: gui
>    Affects Versions: 0.7
>            Reporter: Nick Burch
>            Assignee: Nick Burch
>            Priority: Minor
>         Attachments: TikaGuiImages.patch
>
>
> Some parsers are now able to generate img tags in their HTML, at the spot where an embedded image lives.
> I think it would be nice to show these images in the graphical view of the GUI. The attached patch allows the GUI to spot when an embedded: img link is found, rewrite it as a URL pointing into the temporary directory, and also ask the recursing parser to capture the image.
> The result is that you can drop a suitable file (e.g. .docx) onto the GUI and see the image inline in the Formatted Text pane.
> Are people happy with the patch? (and the idea?)
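
The link-rewriting step described in the issue could be done by a SAX filter placed in front of the GUI's HTML handler. This is a hypothetical sketch, not the code in TikaGuiImages.patch: EmbeddedImgRewriter and the resolver function are assumed names, and the resolver is expected to return a replacement URL (for example a file: URL into the temporary directory) or null to leave the link unchanged. Inside Tika, org.apache.tika.sax.ContentHandlerDecorator would be the more natural base class; the JDK's XMLFilterImpl is used here only to keep the sketch dependency-free.

```java
import java.util.function.Function;
import org.xml.sax.Attributes;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.XMLFilterImpl;

/** Rewrites <img src="embedded:NAME"> to whatever URL the resolver returns. */
class EmbeddedImgRewriter extends XMLFilterImpl {
    private static final String PREFIX = "embedded:";
    private final Function<String, String> resolver;

    EmbeddedImgRewriter(ContentHandler downstream, Function<String, String> resolver) {
        setContentHandler(downstream); // events are forwarded to this handler
        this.resolver = resolver;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts)
            throws SAXException {
        String name = localName.isEmpty() ? qName : localName;
        int idx = atts.getIndex("src");
        if ("img".equalsIgnoreCase(name) && idx >= 0) {
            String src = atts.getValue(idx);
            if (src.startsWith(PREFIX)) {
                String replacement = resolver.apply(src.substring(PREFIX.length()));
                if (replacement != null) {
                    // Copy rather than mutate: the caller may reuse its Attributes.
                    AttributesImpl copy = new AttributesImpl(atts);
                    copy.setValue(idx, replacement);
                    atts = copy;
                }
            }
        }
        super.startElement(uri, localName, qName, atts);
    }
}
```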

-- 
This message is automatically generated by JIRA.