Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/11/10 17:02:14 UTC

[jira] Commented: (TIKA-482) Refactor image and jpeg parsers for access to MetadataExtractor API

    [ https://issues.apache.org/jira/browse/TIKA-482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12930631#action_12930631 ] 

Nick Burch commented on TIKA-482:
---------------------------------

Sorry for the delay in reviewing this. It is now committed to Tika in r1033546.

I agree with your code comments about MetadataFields being both useful and more widely usable than just for the image stuff. Maybe someone like Jukka could cast an eye over it, and decide whether it's better off being a core part of the Metadata object, or staying in a separate helper class, as it is now, but moved elsewhere.

I'll keep this bug open until we've decided where to put the class.

Thanks for all the work on the improvements though!

> Refactor image and jpeg parsers for access to MetadataExtractor API
> -------------------------------------------------------------------
>
>                 Key: TIKA-482
>                 URL: https://issues.apache.org/jira/browse/TIKA-482
>             Project: Tika
>          Issue Type: Improvement
>          Components: parser
>    Affects Versions: 0.7
>            Reporter: Staffan Olsson
>         Attachments: testJPEG_commented_pspcs2mac.jpg, testJPEG_commented_xnviewmp026.jpg, testTIFF.tif, TIKA-451-DublinCore_and_TIKA-482.patch, TIKA-482_exif_and_xmp.patch
>
>
> When I added support for more image metadata in TIKA-472, I realized
> the current design had some restrictions:
>  * I could not access the typed getters from Metadata Extractor, such
> as getDate (to format ISO dates) and getStringArray (for keywords).
>  * The handler function was called one field at a time, which prevents
> logic where one field depends on the value of another (there are, for
> example, record versions and fields that specify encoding).
> See attached patch. It refactors TiffExtractor to MetadataExtractorExtractor.
> The patch also includes the date fix, see https://issues.apache.org/jira/browse/TIKA-451#action_12898794
> We can later add more Extractors using other libraries, and map them to parsers based on format. For example, we already use ImageIO in ImageParser, so maybe there should be an ImageIOExtractor.
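
To illustrate the second restriction in the description above, here is a minimal sketch of why a per-field callback cannot handle a field whose interpretation depends on another one. Everything in it is hypothetical and for illustration only: the FieldHandler and RecordHandler interfaces, the "encoding" and "keywords" field names, and the raw byte map are assumptions, not the Tika or Metadata Extractor API.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types exist in Tika or Metadata Extractor.
class WholeRecordSketch {

    /** Per-field callback: sees one raw value and cannot look at sibling fields. */
    interface FieldHandler {
        void handle(String name, byte[] raw, Map<String, String> out);
    }

    /** Whole-record callback: receives every field, so one value can steer another. */
    interface RecordHandler {
        void handle(Map<String, byte[]> fields, Map<String, String> out);
    }

    // With only one field in hand, the handler has to guess the charset.
    static final FieldHandler PER_FIELD = (name, raw, out) ->
            out.put(name, new String(raw, StandardCharsets.ISO_8859_1));

    // With the whole record in hand, the handler reads "encoding" first,
    // regardless of the order in which the fields were stored.
    static final RecordHandler WHOLE_RECORD = (fields, out) -> {
        byte[] enc = fields.get("encoding");
        Charset charset = (enc == null)
                ? StandardCharsets.ISO_8859_1
                : Charset.forName(new String(enc, StandardCharsets.US_ASCII));
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            if (!"encoding".equals(e.getKey())) {
                out.put(e.getKey(), new String(e.getValue(), charset));
            }
        }
    };

    static Map<String, String> extractPerField(Map<String, byte[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, byte[]> e : fields.entrySet()) {
            PER_FIELD.handle(e.getKey(), e.getValue(), out);
        }
        return out;
    }

    static Map<String, String> extractWholeRecord(Map<String, byte[]> fields) {
        Map<String, String> out = new LinkedHashMap<>();
        WHOLE_RECORD.handle(fields, out);
        return out;
    }

    /** Sample record where the keyword bytes come before the field naming their encoding. */
    static Map<String, byte[]> sampleRecord() {
        Map<String, byte[]> fields = new LinkedHashMap<>();
        fields.put("keywords", "caf\u00e9".getBytes(StandardCharsets.UTF_8));
        fields.put("encoding", "UTF-8".getBytes(StandardCharsets.US_ASCII));
        return fields;
    }

    public static void main(String[] args) {
        System.out.println("per field:    " + extractPerField(sampleRecord()).get("keywords"));
        System.out.println("whole record: " + extractWholeRecord(sampleRecord()).get("keywords"));
    }
}
```

In this sketch the per-field path decodes the keyword bytes with a guessed charset, while the whole-record path can consult the encoding field before decoding anything. That is the kind of cross-field logic the description says the one-field-at-a-time handler rules out.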
