Posted to dev@tika.apache.org by "Erik Peterson (JIRA)" <ji...@apache.org> on 2012/05/15 21:47:07 UTC

[jira] [Created] (TIKA-923) iWork keynote Tables are not being parsed

Erik Peterson created TIKA-923:
----------------------------------

             Summary: iWork keynote Tables are not being parsed 
                 Key: TIKA-923
                 URL: https://issues.apache.org/jira/browse/TIKA-923
             Project: Tika
          Issue Type: Bug
          Components: parser
    Affects Versions: 1.0
         Environment: Windows 7, 64 bit
            Reporter: Erik Peterson
            Priority: Critical


iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

[jira] [Updated] (TIKA-923) iWork keynote content on master slides are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-923:
------------------------------------

    Summary: iWork keynote content on master slides are not being parsed   (was: iWork keynote Tables are not being parsed )
    
> iWork keynote content on master slides are not being parsed 
> ------------------------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: TIKA-923.patch, testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Assigned] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless reassigned TIKA-923:
---------------------------------------

    Assignee: Michael McCandless
    
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Commented] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280356#comment-13280356 ] 

Michael McCandless commented on TIKA-923:
-----------------------------------------

OK, I see: the table is defined on the master slide ... so we are not extracting user-created items from master slides ... I'll dig.
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Resolved] (TIKA-923) iWork keynote content on master slides are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless resolved TIKA-923.
-------------------------------------

       Resolution: Fixed
    Fix Version/s: 1.2
    
> iWork keynote content on master slides are not being parsed 
> ------------------------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>             Fix For: 1.2
>
>         Attachments: TIKA-923.patch, testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Updated] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Erik Peterson updated TIKA-923:
-------------------------------

    Attachment: testKeynoteTemplateTable.key

A table on slides 5, 6, and 7 is not being parsed.
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Updated] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-923:
------------------------------------

    Attachment: testTables.key

I created this simple test case (attached), but I see the text inside the table cells being correctly extracted on Tika's current trunk (rev 1340046).  I'll commit this as a test case...

Erik, can you attach an example Keynote table that doesn't extract correctly?  Thanks.
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Priority: Critical
>              Labels: iwork
>         Attachments: testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Commented] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Erik Peterson (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280377#comment-13280377 ] 

Erik Peterson commented on TIKA-923:
------------------------------------

I forgot to check the license box, so I now grant the ASF license for the attachment 'testKeynoteTemplateTable.key'.
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Commented] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
    [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13280373#comment-13280373 ] 

Michael McCandless commented on TIKA-923:
-----------------------------------------

Erik, it looks like you forgot to check the "grant ASF license" box... can you do that, so I can use this as a test file?  Thanks.
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.


        

[jira] [Updated] (TIKA-923) iWork keynote Tables are not being parsed

Posted by "Michael McCandless (JIRA)" <ji...@apache.org>.
     [ https://issues.apache.org/jira/browse/TIKA-923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael McCandless updated TIKA-923:
------------------------------------

    Attachment: TIKA-923.patch

Patch, also visiting master slides to extract content...
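
For readers who want a picture of what "also visiting master slides" means, here is a minimal sketch. It is not the contents of TIKA-923.patch and does not use Tika's classes. It assumes a simplified, hypothetical XML layout (the element names master-slide, slide, and cell are invented for illustration; the real Keynote XML is richer) and uses only the JDK's SAX parser. With visitMasters set to false, cells under a master slide are skipped, which mirrors the reported symptom; with true, they are collected too.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class MasterSlideTextSketch {

    // Hypothetical, simplified element names; not the real Keynote schema.
    static final String MASTER = "master-slide";
    static final String CELL = "cell";

    // Hypothetical sample document: one table cell on a master slide, one on a normal slide.
    static final String SAMPLE =
            "<presentation>"
            + "<master-slide><cell>Master table cell</cell></master-slide>"
            + "<slide><cell>Slide table cell</cell></slide>"
            + "</presentation>";

    /** Collects the text of every cell; cells under master slides only if visitMasters is true. */
    static List<String> extractCells(String xml, boolean visitMasters) throws Exception {
        List<String> cells = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private int masterDepth = 0;        // > 0 while inside a master slide
            private StringBuilder text = null;  // non-null while inside a cell we keep

            @Override
            public void startElement(String uri, String local, String qName, Attributes atts) {
                if (MASTER.equals(qName)) {
                    masterDepth++;
                } else if (CELL.equals(qName) && (visitMasters || masterDepth == 0)) {
                    text = new StringBuilder();
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (text != null) {
                    text.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String local, String qName) {
                if (MASTER.equals(qName)) {
                    masterDepth--;
                } else if (CELL.equals(qName) && text != null) {
                    cells.add(text.toString().trim());
                    text = null;
                }
            }
        };
        // The default factory is not namespace-aware, so qName carries the raw element name.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return cells;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(extractCells(SAMPLE, false));
        System.out.println(extractCells(SAMPLE, true));
    }
}
```

The flag exists only to contrast the two behaviours side by side; a real handler would simply stop ignoring master-slide content.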
                
> iWork keynote Tables are not being parsed 
> ------------------------------------------
>
>                 Key: TIKA-923
>                 URL: https://issues.apache.org/jira/browse/TIKA-923
>             Project: Tika
>          Issue Type: Bug
>          Components: parser
>    Affects Versions: 1.0
>         Environment: Windows 7, 64 bit
>            Reporter: Erik Peterson
>            Assignee: Michael McCandless
>            Priority: Critical
>              Labels: iwork
>         Attachments: TIKA-923.patch, testKeynoteTemplateTable.key, testTables.key
>
>
> iWork Keynote slides can contain tables, however these are being dropped entirely by the Tika parser.
