Posted to dev@tika.apache.org by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/05/15 21:25:06 UTC
[jira] [Created] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Erik Peterson created TIKA-918:
----------------------------------
Summary: iWork Charts not being parsed in all products (Pages, Numbers, Keynote)
Key: TIKA-918
URL: https://issues.apache.org/jira/browse/TIKA-918
Project: Tika
Issue Type: Bug
Components: parser
Affects Versions: 1.0
Environment: Windows 7, 64 bit
Reporter: Erik Peterson
Priority: Minor
Chart titles, axes, and other textual information are all being ignored by the Tika parser.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404532#comment-13404532 ]
Jukka Zitting commented on TIKA-918:
------------------------------------
Do you have a test case that illustrates this problem?
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496357#comment-13496357 ]
Erik Peterson commented on TIKA-918:
------------------------------------
Sent.
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480662#comment-13480662 ]
Erik Peterson commented on TIKA-918:
------------------------------------
I'd have to get access to another Mac system; I've lost touch with the contact we were initially working with. If it's critical, I can email the file to someone else with access so they can trim it down.
[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Erik Peterson updated TIKA-918:
-------------------------------
Attachment: testNumbersTemplateCharts.numbers
Numbers file with charts embedded. Nothing about the charts is being parsed at this time.
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503502#comment-13503502 ]
Dave Meikle commented on TIKA-918:
----------------------------------
Erik - I am afraid I have not received it. What email did you send it to?
[jira] [Assigned] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless reassigned TIKA-918:
---------------------------------------
Assignee: Michael McCandless
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439044#comment-13439044 ]
Erik Peterson commented on TIKA-918:
------------------------------------
I can attach a sample document illustrating the issue.
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453082#comment-13453082 ]
Michael McCandless commented on TIKA-918:
-----------------------------------------
I committed the fix for Numbers docs; Erik, do you have a Keynote and a Pages example?
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456382#comment-13456382 ]
Michael McCandless commented on TIKA-918:
-----------------------------------------
Maybe try to whittle down the Keynote example? Ideally it'd be a minimal test case showing the issue.
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480684#comment-13480684 ]
Dave Meikle commented on TIKA-918:
----------------------------------
Erik - feel free to fire it to me and I can slim the file down.
[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Michael McCandless updated TIKA-918:
------------------------------------
Attachment: TIKA-918.patch
Patch that extracts the titles of charts from Numbers docs. Each chart comes out like this:
{noformat}
<div class="chart"><h1>Chart Title</h1></div>
{noformat}
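For anyone consuming this output downstream, here is a minimal sketch of pulling the chart titles back out of the XHTML. It assumes only the markup shape shown above (an h1 inside a div with class "chart"); the class name ChartTitleExtractor and the helper chart_titles are hypothetical names made up for this illustration, not part of Tika or the patch, and it uses only the Python standard library.

```python
from html.parser import HTMLParser


class ChartTitleExtractor(HTMLParser):
    """Collect the text of <h1> elements nested inside <div class="chart">.

    Hypothetical helper for illustration; it assumes the markup shape
    shown in the patch comment and nothing else about Tika's output.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._div_stack = []  # one bool per open <div>: is it a chart div?
        self._in_h1 = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            # A valueless class attribute is reported as None, hence `or ""`.
            classes = (dict(attrs).get("class") or "").split()
            self._div_stack.append("chart" in classes)
        elif tag == "h1" and any(self._div_stack):
            # Only start capturing when some enclosing div is a chart div.
            self._in_h1 = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "div" and self._div_stack:
            self._div_stack.pop()
        elif tag == "h1" and self._in_h1:
            self.titles.append("".join(self._buffer).strip())
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self._buffer.append(data)


def chart_titles(xhtml):
    """Return the chart titles found in an XHTML string, in document order."""
    parser = ChartTitleExtractor()
    parser.feed(xhtml)
    parser.close()
    return parser.titles


sample = '<div class="chart"><h1>Chart Title</h1></div>'
print(chart_titles(sample))  # prints ['Chart Title']
```

Headings outside a chart div are ignored, so ordinary h1 elements elsewhere in the output would not be mistaken for chart titles.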
[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all
products (Pages, Numbers, Keynote)
Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456021#comment-13456021 ]
Erik Peterson commented on TIKA-918:
------------------------------------
I do not have a Pages example, but I do have a Keynote example. However, it's over the file upload size limit (the file is > 10 MB).