Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/06/16 16:38:22 UTC

[jira] Created: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

Image extractors use inconsistent metadata keys and formats for common features
-------------------------------------------------------------------------------

                 Key: TIKA-442
                 URL: https://issues.apache.org/jira/browse/TIKA-442
             Project: Tika
          Issue Type: Improvement
          Components: metadata, parser
    Affects Versions: 0.7
            Reporter: Nick Burch
            Priority: Minor


Currently Tika has a number of parsers for image formats, but the way they return their data is inconsistent. For example:

Jpeg: "Image Width" = "420 pixels", "Data Precision" = "8 bits"
Gif: "width" = "420"
Png: "width" = "420", "IHDR" = ".... bitDepth = 8 ....."
Bmp: "width" = "420", "BitsPerSample" = "8 8 8"

I think that the common keys, such as width and height, should be returned with a consistent key and value format. If someone would like to suggest the namespace for this (maybe under XMPDM), and whether to use the short or long form (e.g. 420 vs 420 pixels), then I'm happy to work up a patch for this.
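
To make the short-form vs long-form question concrete, here is a minimal sketch of reducing the long form to the short one. The class and method names are hypothetical and are not part of Tika; it only assumes the unit suffixes seen in the examples above ("pixels", "bits") and leaves anything else, such as "8 8 8", untouched.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not a Tika class: turns "420 pixels" into "420".
final class ImageValueNormalizer {
    // A single integer, optionally followed by one of the unit words
    // that appear in the examples above.
    private static final Pattern INT_WITH_UNIT =
            Pattern.compile("\\s*(\\d+)\\s*(?:pixels|bits)?\\s*");

    static String shortForm(String raw) {
        Matcher m = INT_WITH_UNIT.matcher(raw);
        // Values that are not a single integer (e.g. "8 8 8") are only trimmed.
        return m.matches() ? m.group(1) : raw.trim();
    }

    public static void main(String[] args) {
        System.out.println(shortForm("420 pixels")); // prints 420
        System.out.println(shortForm("8 bits"));     // prints 8
        System.out.println(shortForm("8 8 8"));      // prints 8 8 8
    }
}
```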

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Commented: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12880484#action_12880484 ] 

Jukka Zitting commented on TIKA-442:
------------------------------------

I added a new class o.a.t.metadata.TIFF for these constants. Feel free to add more entries from http://www.adobe.com/devnet/xmp/pdfs/XMPSpecificationPart2.pdf if needed.



[jira] Closed: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch closed TIKA-442.
---------------------------

    Fix Version/s: 0.8
       Resolution: Fixed

I've updated the parsers to use the new metadata entries, in r958581.



[jira] Commented: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879515#action_12879515 ] 

Jukka Zitting commented on TIKA-442:
------------------------------------

I'd go with XMP as much as possible. XMP leverages Exif for image metadata, and the most relevant fields are probably:

    tiff:ImageLength
    tiff:ImageWidth
    tiff:SamplesPerPixel
    tiff:BitsPerSample
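
As a rough illustration of what moving to these keys could look like, the sketch below renames the parser-specific keys from the issue description to the tiff: names. It is plain Java over a Map rather than Tika's Metadata class, the class and method names are hypothetical, and the alias table is an assumption: "height" and "Image Height" are added by analogy with the width keys, and treating "Data Precision" as tiff:BitsPerSample is a guess.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not a Tika class: maps per-parser keys to tiff: keys.
final class TiffKeyMapper {
    // Assumed aliases; only "Image Width", "width", "Data Precision" and
    // "BitsPerSample" appear in the issue description.
    private static final Map<String, String> ALIASES = Map.of(
            "Image Width", "tiff:ImageWidth",
            "width", "tiff:ImageWidth",
            "Image Height", "tiff:ImageLength",
            "height", "tiff:ImageLength",
            "Data Precision", "tiff:BitsPerSample",
            "BitsPerSample", "tiff:BitsPerSample");

    static Map<String, String> toCommonKeys(Map<String, String> raw) {
        Map<String, String> out = new LinkedHashMap<>();
        for (Map.Entry<String, String> e : raw.entrySet()) {
            // Unknown keys pass through unchanged.
            out.put(ALIASES.getOrDefault(e.getKey(), e.getKey()), e.getValue());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> gif = new LinkedHashMap<>();
        gif.put("width", "420");
        gif.put("height", "315");
        System.out.println(toCommonKeys(gif));
    }
}
```

Values are copied as-is here; unit suffixes such as "pixels" would still need to be handled separately.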




[jira] Commented: (TIKA-442) Image extractors use inconsistent metadata keys and formats for common features

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12879532#action_12879532 ] 

Nick Burch commented on TIKA-442:
---------------------------------

OK, I'll work up a patch that uses these keys, hopefully some time next week.

If you get a chance in the meantime, please do add those entries to the XMPDM class so they're ready :)
