Posted to dev@tika.apache.org by "Nick Burch (JIRA)" <ji...@apache.org> on 2010/01/19 17:42:54 UTC

[jira] Created: (TIKA-365) Extract more OpenDocument metadata

Extract more OpenDocument metadata
----------------------------------

                 Key: TIKA-365
                 URL: https://issues.apache.org/jira/browse/TIKA-365
             Project: Tika
          Issue Type: Improvement
          Components: metadata
    Affects Versions: 0.6
            Reporter: Nick Burch
            Priority: Minor


The attached patch adds support for a few more kinds of OpenDocument metadata. These are added to the metadata object much like the existing ones.

There's also some stubbed-out support for user-defined metadata, but it is disabled because it will require some work in core that I hope to add later. (Custom metadata is stored in lines like <meta:user-defined meta:name="Info 1">Text 1</meta:user-defined>, so we'll need to use an attribute to determine the name under which to save the metadata.)

Also included are several more tests for the OpenDocument parser, and one more test file to go with this.
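
To illustrate why the name has to come from an attribute, here is a minimal SAX sketch that reads <meta:user-defined> elements and keys each value by its meta:name attribute. It is not the code from the patch: the class and method names (UserDefinedMetaSketch, UserDefinedHandler, extract) are hypothetical, a plain Map stands in for Tika's metadata object, and the XML in main is a hand-written fragment rather than a full meta.xml.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not part of Tika or of the attached patch.
class UserDefinedMetaSketch {
    // Namespace URI that OpenDocument binds to the "meta:" prefix.
    static final String META_NS = "urn:oasis:names:tc:opendocument:xmlns:meta:1.0";

    /** Collects <meta:user-defined> text, keyed by the element's meta:name attribute. */
    static class UserDefinedHandler extends DefaultHandler {
        final Map<String, String> values = new LinkedHashMap<>();
        private String currentName; // null while outside a named user-defined element
        private final StringBuilder text = new StringBuilder();

        @Override
        public void startElement(String uri, String local, String qName, Attributes atts) {
            if (META_NS.equals(uri) && "user-defined".equals(local)) {
                // The metadata name is only known from the attribute, not the element name.
                currentName = atts.getValue(META_NS, "name");
                text.setLength(0);
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (currentName != null) {
                text.append(ch, start, length);
            }
        }

        @Override
        public void endElement(String uri, String local, String qName) {
            if (META_NS.equals(uri) && "user-defined".equals(local) && currentName != null) {
                // Store the value once the element is complete.
                values.put(currentName, text.toString());
                currentName = null;
            }
        }
    }

    /** Parses the given XML and returns the user-defined entries in document order. */
    static Map<String, String> extract(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            UserDefinedHandler handler = new UserDefinedHandler();
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), handler);
            return handler.values;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse metadata XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<office:meta"
                + " xmlns:office=\"urn:oasis:names:tc:opendocument:xmlns:office:1.0\""
                + " xmlns:meta=\"" + META_NS + "\">"
                + "<meta:user-defined meta:name=\"Info 1\">Text 1</meta:user-defined>"
                + "</office:meta>";
        System.out.println(extract(xml)); // prints {Info 1=Text 1}
    }
}
```

The value is stored in endElement rather than startElement because the text content is only complete once the element closes, while the name has to be remembered from the start tag.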

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.


[jira] Resolved: (TIKA-365) Extract more OpenDocument metadata

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jukka Zitting resolved TIKA-365.
--------------------------------

       Resolution: Fixed
    Fix Version/s: 0.7
         Assignee: Jukka Zitting

Thanks! Patch applied in revision 903187.

PS. There were some apparently forced line breaks in the patch that I needed to fix manually before the patch applied cleanly.




[jira] Updated: (TIKA-365) Extract more OpenDocument metadata

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-365:
----------------------------

    Description: 
The attached patch adds support for a few more kinds of OpenDocument metadata. These are added to the metadata object much like the existing ones.

There's also support for user-defined metadata. (Custom metadata is stored in lines like <meta:user-defined meta:name="Info 1">Text 1</meta:user-defined>.) There's a new MetadataHandler, AttributeDependantMetadataHandler, which can use the value of an attribute on the node to decide what to call the metadata when done with the node.

Also included are several more tests for the OpenDocument parser, and one more test file to go with this.

  was:
The attached patch adds support for a few more kinds of OpenDocument metadata. These are added to the metadata object much like the existing ones.

There's also some stubbed-out support for user-defined metadata, but it is disabled because it will require some work in core that I hope to add later. (Custom metadata is stored in lines like <meta:user-defined meta:name="Info 1">Text 1</meta:user-defined>, so we'll need to use an attribute to determine the name under which to save the metadata.)

Also included are several more tests for the OpenDocument parser, and one more test file to go with this.





[jira] Commented: (TIKA-365) Extract more OpenDocument metadata

Posted by "Ingo Renner (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837878#action_12837878 ] 

Ingo Renner commented on TIKA-365:
----------------------------------

Somehow I only get a very limited set of metadata on the command line (trunk export from a few minutes ago):

java -jar tika-app-0.7-SNAPSHOT.jar -m ~/tika-0.7/trunk/tika-parsers/src/test/resources/test-documents/testOpenOffice2.odf 
Content-Length: 10977
Content-Type: application/zip
resourceName: testOpenOffice2.odf

Is that a known issue / limitation or is there something wrong on my end? (OS X 10.6.2)





[jira] Updated: (TIKA-365) Extract more OpenDocument metadata

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-365:
----------------------------

    Attachment: testOpenOffice2.odf
                oo-metadata.patch

Patch and extra test file it needs




[jira] Updated: (TIKA-365) Extract more OpenDocument metadata

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-365:
----------------------------

    Attachment: oo-metadata.patch

Updated version of the patch with custom metadata support




[jira] Commented: (TIKA-365) Extract more OpenDocument metadata

Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12837886#action_12837886 ] 

Uwe Schindler commented on TIKA-365:
------------------------------------

The problem currently is that not all OpenDocument file extensions are in mime.types, so the file is detected only as a ZIP file. We need a generic OpenDocument matcher pattern that is more specific than the ZIP one, like the ones for the MS Office formats (old and -x formats). One idea is to look for content.xml or meta.xml in the pattern.
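
A rough sketch of that idea, using only java.util.zip: scan the ZIP entries and treat the file as OpenDocument-like if it has a top-level content.xml or meta.xml, which are the entry names the ODF package format uses. This is not how Tika's mime.types patterns work and it is not code from the patch. The class and method names (OpenDocumentSniffSketch, looksLikeOpenDocument, zipWith) are hypothetical, and the check would also match an unrelated ZIP that happens to contain a file with one of those names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical illustration only; not part of Tika.
class OpenDocumentSniffSketch {
    /**
     * Returns true if the ZIP stream has a top-level content.xml or meta.xml entry.
     * Note that the stream is consumed and closed by this method.
     */
    static boolean looksLikeOpenDocument(InputStream in) {
        try (ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                String name = entry.getName();
                if (name.equals("content.xml") || name.equals("meta.xml")) {
                    return true;
                }
            }
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Builds a small in-memory ZIP with the given entry names, for demonstration only. */
    static byte[] zipWith(String... names) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            for (String name : names) {
                zip.putNextEntry(new ZipEntry(name));
                zip.write("placeholder".getBytes(StandardCharsets.UTF_8));
                zip.closeEntry();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        byte[] odfLike = zipWith("mimetype", "content.xml", "meta.xml");
        byte[] plainZip = zipWith("readme.txt");
        System.out.println(looksLikeOpenDocument(new ByteArrayInputStream(odfLike)));  // prints true
        System.out.println(looksLikeOpenDocument(new ByteArrayInputStream(plainZip))); // prints false
    }
}
```

A stricter variant could also read the package's mimetype entry, but that goes beyond the pattern idea suggested in this comment.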




[jira] Updated: (TIKA-365) Extract more OpenDocument metadata

Posted by "Nick Burch (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Nick Burch updated TIKA-365:
----------------------------

    Attachment:     (was: oo-metadata.patch)

