Posted to dev@poi.apache.org by bu...@apache.org on 2008/08/08 12:10:20 UTC
DO NOT REPLY [Bug 45594] New: poi-3.5-beta1-20080718.jar - content
from the document properties of a 2003 doc document is not fully extracted.
For example, the text contained in the property "company" is not extracted.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45594
Summary: poi-3.5-beta1-20080718.jar - content from the document
properties of a 2003 doc document is not fully
extracted. For example, the text contained in the
property "company" is not extracted.
Product: POI
Version: unspecified
Platform: PC
OS/Version: Windows Server 2003
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
AssignedTo: dev@poi.apache.org
ReportedBy: xtrimxtrim@yahoo.fr
The text contained in the document properties of a Word 2003 document is not
extracted for all properties.
In particular, the values of the properties "Category", "Company" and "Manager"
are not extracted.
(See Document Properties > Summary tab.)
Note that the same issue appears with Excel 2003 and PowerPoint 2003
documents.
The JUnit test class and the document used for testing are attached.
We expected to extract the words "testdoc" and "test phrase".
Notes on the attached documents:
- the document "classic_SummaryProperties_Company" contains the words "testdoc"
and "test phrase" in the value of the document property "Company".
- "TestUnitPoi35Filter.java" is the JUnit class.
--
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
DO NOT REPLY [Bug 45594] poi-3.5-beta1-20080718.jar - content from
the document properties of a 2003 doc document is not fully extracted. For
example, the text contained in the property "company" is not extracted.
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45594
--- Comment #2 from xtrim <xt...@yahoo.fr> 2008-08-08 03:18:38 PST ---
Created an attachment (id=22410)
--> (https://issues.apache.org/bugzilla/attachment.cgi?id=22410)
Contains JUnit test and document used for testing.
DO NOT REPLY [Bug 45594] poi-3.5-beta1-20080718.jar - content from
the document properties of a 2003 doc document is not fully extracted. For
example, the text contained in the property "company" is not extracted.
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45594
Nick Burch <ni...@torchbox.com> changed:
What       |Removed   |Added
----------------------------------------------------------------------------
Status     |NEW       |ASSIGNED
--- Comment #1 from Nick Burch <ni...@torchbox.com> 2008-08-08 03:16:17 PST ---
HPSF properties are not yet extracted.
I'll look into adding a generic extractor for them; it should be generally
useful.
DO NOT REPLY [Bug 45594] poi-3.5-beta1-20080718.jar - content from
the document properties of a 2003 doc document is not fully extracted. For
example, the text contained in the property "company" is not extracted.
Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=45594
Nick Burch <ni...@torchbox.com> changed:
What       |Removed   |Added
----------------------------------------------------------------------------
Status     |ASSIGNED  |RESOLVED
Resolution |          |FIXED
--- Comment #3 from Nick Burch <ni...@torchbox.com> 2008-08-12 13:56:53 PST ---
Now supported in svn. You'll need to call getMetadataTextExtractor() on your
existing extractor, and that'll give you a text extractor that'll parse out
your metadata.
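
For readers following the thread, the call pattern described above can be
sketched roughly as follows. This is only an illustration that runs without POI
on the classpath: TextExtractor, DocumentExtractor, PropertiesExtractor and
extractAll are hypothetical stand-ins, not POI classes, and the "Name = value"
output format is an assumption. Only the method name getMetadataTextExtractor()
comes from the comment above. With POI itself, the same two calls would
presumably be made on the existing extractor (for example a WordExtractor).

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MetadataExtractionSketch {

    /** Hypothetical stand-in for a text extractor; NOT a POI type. */
    interface TextExtractor {
        String getText();
        TextExtractor getMetadataTextExtractor();
    }

    /** Hypothetical extractor that renders properties as "Name = value" lines. */
    static final class PropertiesExtractor implements TextExtractor {
        private final Map<String, String> properties;

        PropertiesExtractor(Map<String, String> properties) {
            this.properties = properties;
        }

        @Override
        public String getText() {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : properties.entrySet()) {
                sb.append(e.getKey()).append(" = ").append(e.getValue()).append('\n');
            }
            return sb.toString();
        }

        @Override
        public TextExtractor getMetadataTextExtractor() {
            // A metadata extractor has no further metadata of its own.
            throw new UnsupportedOperationException("already a metadata extractor");
        }
    }

    /** Hypothetical document extractor whose getText() returns the body only. */
    static final class DocumentExtractor implements TextExtractor {
        private final String body;
        private final Map<String, String> properties;

        DocumentExtractor(String body, Map<String, String> properties) {
            this.body = body;
            this.properties = properties;
        }

        @Override
        public String getText() {
            return body; // properties such as "Company" are not included here
        }

        @Override
        public TextExtractor getMetadataTextExtractor() {
            return new PropertiesExtractor(properties);
        }
    }

    /** Concatenates the body text with the text of the metadata extractor. */
    static String extractAll(TextExtractor extractor) {
        return extractor.getText() + "\n"
                + extractor.getMetadataTextExtractor().getText();
    }

    public static void main(String[] args) {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("Company", "testdoc test phrase");
        TextExtractor doc = new DocumentExtractor("Body text only.", props);

        System.out.println(doc.getText().contains("testdoc"));       // false
        System.out.println(extractAll(doc).contains("test phrase")); // true
    }
}
```

The point of the sketch is that the body extractor alone still does not return
property values; a filter that needs words such as "testdoc" from "Company"
has to query the metadata extractor as a second step and combine both texts.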