Posted to dev@tika.apache.org by "Karl Heinz Marbaise (JIRA)" <ji...@apache.org> on 2009/05/21 22:53:45 UTC
[jira] Created: (TIKA-231) Difference between Web-Site and real working code
Difference between Web-Site and real working code
-------------------------------------------------
Key: TIKA-231
URL: https://issues.apache.org/jira/browse/TIKA-231
Project: Tika
Issue Type: Bug
Components: documentation
Affects Versions: 0.3
Environment: All
Reporter: Karl Heinz Marbaise
Priority: Minor
The official web site says that OpenOffice files are not scanned (or, to be more accurate, lists them as "TODO"), but if I scan a tar.gz / zip archive containing OpenOffice files, their contents are extracted. So I think the web site should be updated to document the actual state of the code.
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715273#action_12715273 ]
Uwe Schindler commented on TIKA-231:
------------------------------------
This is incorrect in your commit:
"The older sxc, sxw formats (OpenOffice 1.0) are not supported."
They are supported!
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Karl Heinz Marbaise (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711802#action_12711802 ]
Karl Heinz Marbaise commented on TIKA-231:
------------------------------------------
I have observed that I can also parse OpenOffice .odp files and get a result, so this should be documented as well.
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12711819#action_12711819 ]
Uwe Schindler commented on TIKA-231:
------------------------------------
Yes, ODP and other StarOffice/OpenDocument files have worked since TIKA-172; even basic formatting and tables are extracted and written to the XHTML SAX stream.
[jira] Resolved: (TIKA-231) Difference between Web-Site and real working code
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jukka Zitting resolved TIKA-231.
--------------------------------
Resolution: Fixed
Assignee: Jukka Zitting
Patch applied in revision 780831. Thanks!
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Uwe Schindler (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712306#action_12712306 ]
Uwe Schindler commented on TIKA-231:
------------------------------------
sxw and related files from OpenOffice 1.0 are supported (that is, the pre-release form of OpenDocument with the older Sun-specific namespaces). The mapping is done using a SAX filter that rewrites the outdated namespaces to the new ones.
The only problem currently is mime-types.conf, which only detects sxw (the other signatures should be added soon). My idea would be to use an internal catch-all MIME type (like the one for office) for all OpenDocument types. When I am back home, I will prepare a patch.
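For readers unfamiliar with the approach, the namespace-rewriting SAX filter described in this comment can be sketched roughly as follows. This is only an illustration using the JDK's own SAX classes, not the code that is actually in Tika: the class name NamespaceRewriteSketch, the helper elementNames, and the two-entry mapping table are all assumptions made for the example, and a real filter would need to cover the full set of old namespaces.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.XMLReader;
import org.xml.sax.helpers.AttributesImpl;
import org.xml.sax.helpers.DefaultHandler;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch: not Tika's implementation.
class NamespaceRewriteSketch {

    // Illustrative subset of old (OpenOffice 1.0) -> new (OpenDocument) namespaces.
    static final Map<String, String> MAPPING = new HashMap<>();
    static {
        MAPPING.put("http://openoffice.org/2000/office",
                "urn:oasis:names:tc:opendocument:xmlns:office:1.0");
        MAPPING.put("http://openoffice.org/2000/text",
                "urn:oasis:names:tc:opendocument:xmlns:text:1.0");
    }

    // Returns the replacement namespace, or the input if no mapping is known.
    static String map(String uri) {
        return MAPPING.getOrDefault(uri, uri);
    }

    // SAX filter that rewrites namespace URIs before passing events downstream.
    static class RewritingFilter extends XMLFilterImpl {
        RewritingFilter(XMLReader parent) {
            super(parent);
        }

        @Override
        public void startPrefixMapping(String prefix, String uri) throws SAXException {
            super.startPrefixMapping(prefix, map(uri));
        }

        @Override
        public void startElement(String uri, String localName, String qName,
                Attributes atts) throws SAXException {
            // Attributes can be namespaced too, so copy them with mapped URIs.
            AttributesImpl copy = new AttributesImpl();
            for (int i = 0; i < atts.getLength(); i++) {
                copy.addAttribute(map(atts.getURI(i)), atts.getLocalName(i),
                        atts.getQName(i), atts.getType(i), atts.getValue(i));
            }
            super.startElement(map(uri), localName, qName, copy);
        }

        @Override
        public void endElement(String uri, String localName, String qName)
                throws SAXException {
            super.endElement(map(uri), localName, qName);
        }
    }

    // Parses the XML through the filter and lists elements as {namespace}localName.
    static List<String> elementNames(String xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        XMLReader reader = factory.newSAXParser().getXMLReader();
        RewritingFilter filter = new RewritingFilter(reader);
        List<String> seen = new ArrayList<>();
        filter.setContentHandler(new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                    Attributes atts) {
                seen.add("{" + uri + "}" + localName);
            }
        });
        filter.parse(new InputSource(new StringReader(xml)));
        return seen;
    }

    public static void main(String[] args) throws Exception {
        String old = "<office:document xmlns:office=\"http://openoffice.org/2000/office\""
                + " xmlns:text=\"http://openoffice.org/2000/text\">"
                + "<text:p>hello</text:p></office:document>";
        for (String name : elementNames(old)) {
            System.out.println(name);
        }
    }
}
```

The point of doing this as a filter is that the downstream content handler only ever sees the new namespaces, so a single OpenDocument handler can serve both generations of the format.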
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12715278#action_12715278 ]
Jukka Zitting commented on TIKA-231:
------------------------------------
Thanks, I updated the documentation accordingly.
[jira] Commented: (TIKA-231) Difference between Web-Site and real working code
Posted by "Jukka Zitting (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12712280#action_12712280 ]
Jukka Zitting commented on TIKA-231:
------------------------------------
Good point. Do you have a patch for this? The site sources are in src/site/apt within Tika trunk.
[jira] Updated: (TIKA-231) Difference between Web-Site and real working code
Posted by "Karl Heinz Marbaise (JIRA)" <ji...@apache.org>.
[ https://issues.apache.org/jira/browse/TIKA-231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Karl Heinz Marbaise updated TIKA-231:
-------------------------------------
Attachment: TIKA-231.patch
Please take a look at the text and review it.