Posted to dev@poi.apache.org by bu...@apache.org on 2013/03/29 20:06:07 UTC
[Bug 54771] New: Read text from Cover Page, Table of Contents and
Bibliography
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
Bug ID: 54771
Summary: Read text from Cover Page, Table of Contents and
Bibliography
Product: POI
Version: 3.9-dev
Hardware: All
OS: All
Status: NEW
Severity: enhancement
Priority: P2
Component: XWPF
Assignee: dev@poi.apache.org
Reporter: vikas.garg@blackboard.com
Classification: Unclassified
Currently, XWPFWordExtractor.getText() does not read text from the Cover Page,
Table of Contents, or Bibliography parts of a docx file. Are there any plans
to add support for extracting text from these parts? If so, will it be in the
next release? Or is there another API available to do this?
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
--- Comment #3 from Tim Allison <ta...@mitre.org> ---
Created attachment 31704
--> https://issues.apache.org/bugzilla/attachment.cgi?id=31704&action=edit
rough draft of patch
Rough draft of patch attached. I need to clean up a few things before I commit
(end of the week?). All feedback welcome. Thank you!
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
--- Comment #4 from Nick Burch <ap...@gagravarr.org> ---
At first glance, the patch looks promising.
Any chance you could also look at updating appendTableText in XWPFWordExtractor
with logic similar to that in your updated unit test?
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
--- Comment #2 from Tim Allison <ta...@mitre.org> ---
Vladimir Glina just submitted test docs over on TIKA-1317. This issue is
related to POI-54849, which handled most SDTs (structured document tags) but
apparently didn't cover this case. I'll try to fix this soon.
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
--- Comment #1 from Nick Burch <ap...@gagravarr.org> ---
Apache Tika might be a better bet: it uses Apache POI internally but pulls out
a richer set of text and styling.
Otherwise, please submit a patch to enhance XWPFWordExtractor if it isn't doing
everything required!
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
--- Comment #5 from Tim Allison <ta...@mitre.org> ---
Thank you, Nick!
There's a slight difference between the test's extractSDTs and the way that
XWPFWordExtractor works. The extractor's goal is to return all text
recursively from an XWPFSDTCell's content object, so that is what the
extractor calls. The test instead walks recursively through all objects to
gather the SDTs, so that we can check both the number of SDTs and the text
within them.
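To make the distinction above concrete, here is a minimal sketch in plain Java. It deliberately does not use the POI classes: Node, Paragraph, Sdt, extractText and collectSdts are hypothetical stand-ins invented for illustration, and the actual patch may be structured differently. It only contrasts the two kinds of recursion: flattening all text in document order (extractor style) versus gathering every SDT, including nested ones, so they can be counted (test style).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of a document body; these are NOT Apache POI types.
final class SdtTextSketch {

    /** A body element: either a paragraph or an SDT that wraps other elements. */
    interface Node {}

    static final class Paragraph implements Node {
        final String text;
        Paragraph(String text) { this.text = text; }
    }

    static final class Sdt implements Node {
        final List<Node> content;
        Sdt(Node... content) { this.content = Arrays.asList(content); }
    }

    /** Extractor style: return all text found under the nodes, in document order. */
    static String extractText(List<Node> nodes) {
        StringBuilder sb = new StringBuilder();
        appendText(nodes, sb);
        return sb.toString();
    }

    private static void appendText(List<Node> nodes, StringBuilder sb) {
        for (Node n : nodes) {
            if (n instanceof Paragraph) {
                sb.append(((Paragraph) n).text).append('\n');
            } else if (n instanceof Sdt) {
                // Descend into the SDT's content instead of skipping it.
                appendText(((Sdt) n).content, sb);
            }
        }
    }

    /** Test style: gather every SDT, nested ones included, so they can be counted. */
    static List<Sdt> collectSdts(List<Node> nodes) {
        List<Sdt> found = new ArrayList<>();
        collect(nodes, found);
        return found;
    }

    private static void collect(List<Node> nodes, List<Sdt> found) {
        for (Node n : nodes) {
            if (n instanceof Sdt) {
                Sdt sdt = (Sdt) n;
                found.add(sdt);
                collect(sdt.content, found);
            }
        }
    }

    public static void main(String[] args) {
        List<Node> body = Arrays.asList(
            new Paragraph("Intro"),
            new Sdt(new Paragraph("Cover title"), new Sdt(new Paragraph("Author"))),
            new Sdt(new Paragraph("Contents")));
        System.out.print(extractText(body));                    // four lines of text
        System.out.println(collectSdts(body).size() + " SDTs"); // prints "3 SDTs"
    }
}
```

In this toy model the extractor-style method never needs the list of SDTs, and the test-style method never needs the flattened text, which is why the two traversals can reasonably differ.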
[Bug 54771] Read text from Cover Page, Table of Contents and
Bibliography
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=54771
Tim Allison <ta...@mitre.org> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|---                         |FIXED
--- Comment #6 from Tim Allison <ta...@mitre.org> ---
Fixed in r1602960.
Thank you, Vikas, for submitting this issue.
Thank you, Vladimir, for submitting test docs on TIKA-1317.
Thank you, Nick, for your review.