Posted to dev@tika.apache.org by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/05/15 21:25:06 UTC

[jira] [Created] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Erik Peterson created TIKA-918:
----------------------------------

             Summary: iWork Charts not being parsed in all products (Pages, Numbers, Keynote)
                 Key: TIKA-918
                 URL: https://issues.apache.org/jira/browse/TIKA-918
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
         Environment: Windows 7, 64 bit
            Reporter: Erik Peterson
            Priority: Minor


Chart titles, axes, and other textual information are all being ignored by the Tika parser.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13404532#comment-13404532 ] 

Jukka Zitting commented on TIKA-918:
------------------------------------

Do you have a test case that illustrates this problem?
                

        

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13496357#comment-13496357 ] 

Erik Peterson commented on TIKA-918:
------------------------------------

sent.
                

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480662#comment-13480662 ] 

Erik Peterson commented on TIKA-918:
------------------------------------

I'd have to get access to another Mac system; I've lost touch with the contact we were initially working with.  If it's critical, I can email the file to someone else with Mac access who can trim it down.
                

[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Peterson updated TIKA-918:
-------------------------------

    Attachment: testNumbersTemplateCharts.numbers

Numbers file with charts embedded.  Nothing about the charts is being parsed at this time.
                

        

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13503502#comment-13503502 ] 

Dave Meikle commented on TIKA-918:
----------------------------------

Erik - I am afraid I have not received it. What email address did you send it to?
                

[jira] [Assigned] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned TIKA-918:
---------------------------------------

    Assignee: Michael McCandless
    

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13439044#comment-13439044 ] 

Erik Peterson commented on TIKA-918:
------------------------------------

I can attach a sample document illustrating the issue.
                

        

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13453082#comment-13453082 ] 

Michael McCandless commented on TIKA-918:
-----------------------------------------

I committed the fix for Numbers docs. Erik, do you have Keynote and Pages examples?
                

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456382#comment-13456382 ] 

Michael McCandless commented on TIKA-918:
-----------------------------------------

Maybe try to whittle down the Keynote example?  Ideally it'd be a minimal test case showing the issue.
                

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Dave Meikle (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13480684#comment-13480684 ] 

Dave Meikle commented on TIKA-918:
----------------------------------

Erik - feel free to fire it to me and I can slim the file down.
                

[jira] [Updated] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-918:
------------------------------------

    Attachment: TIKA-918.patch

Patch that extracts chart titles from Numbers docs.  The chart comes out like this:
{noformat}
<div class="chart"><h1>Chart Title</h1></div>
{noformat}
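
For anyone consuming this output, here is a minimal sketch of how the chart titles could be pulled back out of the XHTML. It is not part of the patch: the class and method names (ChartTitles, extractChartTitles) are hypothetical, it uses only the JDK's DOM parser rather than any Tika API, and it assumes the input is well-formed XML with the chart markup shaped exactly as shown above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ChartTitles {

    // Hypothetical helper (not a Tika API): collects the text of every <h1>
    // that is a direct child of a <div class="chart"> element.
    public static List<String> extractChartTitles(String xhtml) {
        List<String> titles = new ArrayList<>();
        try {
            // The default factory is not namespace-aware, so elements are
            // matched by their plain tag names ("div", "h1").
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

            NodeList divs = doc.getElementsByTagName("div");
            for (int i = 0; i < divs.getLength(); i++) {
                Element div = (Element) divs.item(i);
                if (!"chart".equals(div.getAttribute("class"))) {
                    continue; // not a chart container
                }
                for (Node child = div.getFirstChild(); child != null;
                        child = child.getNextSibling()) {
                    if (child instanceof Element && "h1".equals(child.getNodeName())) {
                        titles.add(child.getTextContent().trim());
                    }
                }
            }
        } catch (Exception e) {
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
        return titles;
    }

    public static void main(String[] args) {
        String sample = "<div class=\"chart\"><h1>Chart Title</h1></div>";
        System.out.println(extractChartTitles(sample)); // prints [Chart Title]
    }
}
```

Matching on the class attribute keeps ordinary headings elsewhere in the document out of the result. If the markup changes (for example, if axis labels are added later), the child-element check would need to be extended.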
                

[jira] [Commented] (TIKA-918) iWork Charts not being parsed in all products (Pages, Numbers, Keynote)

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13456021#comment-13456021 ] 

Erik Peterson commented on TIKA-918:
------------------------------------

I do not have a Pages example, but I do have a Keynote example.  However, it is over the file upload size limit (> 10 MB).
                