Posted to dev@poi.apache.org by bu...@apache.org on 2008/08/08 12:10:20 UTC

DO NOT REPLY [Bug 45594] New: poi-3.5-beta1-20080718.jar - content from the document properties of a 2003 doc document is not fully extracted. For example, the text contained in the property "company" is not extracted.

https://issues.apache.org/bugzilla/show_bug.cgi?id=45594

           Summary: poi-3.5-beta1-20080718.jar - content from the document
                    properties of a 2003 doc document is not fully
                    extracted. For example, the text contained in the
                    property "company" is not extracted.
           Product: POI
           Version: unspecified
          Platform: PC
        OS/Version: Windows Server 2003
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
        AssignedTo: dev@poi.apache.org
        ReportedBy: xtrimxtrim@yahoo.fr


The text contained in the document properties of a Word 2003 document is not
extracted for all properties.
In particular, the values of the properties "Category", "Company" and "Manager"
are not extracted (see Document Properties > Summary tab).
The same issue appears with Excel 2003 and PowerPoint 2003
documents.

The attachments contain the JUnit test class and the document used for testing.
We expected the extracted text to contain the words "testdoc" and "test phrase".

Notes on the attached documents:

- the document "classic_SummaryProperties_Company" contains the words "testdoc"
and "test phrase" in the value of the document property "Company".

"TestUnitPoi35Filter.java" is the JUnit class.


-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org







--- Comment #2 from xtrim <xt...@yahoo.fr>  2008-08-08 03:18:38 PST ---
Created an attachment (id=22410)
 --> (https://issues.apache.org/bugzilla/attachment.cgi?id=22410)
Contains JUnit test and document used for testing.







Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




--- Comment #1 from Nick Burch <ni...@torchbox.com>  2008-08-08 03:16:17 PST ---
HPSF properties are not yet extracted.

I'll look into adding a generic extractor for them; it should be generally
useful.
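Until such a generic extractor exists, the three properties named in the report can be read directly through HPSF. The following is a minimal sketch, assuming a recent POI release (5.x) with the HWPF (poi-scratchpad) module on the classpath; it is not the extractor described in this comment. It uses `HWPFDocument.getDocumentSummaryInformation()` and the `getCategory()`, `getCompany()` and `getManager()` accessors of `DocumentSummaryInformation`. The class name `SummaryPropertiesSketch` and the helper `describe` are hypothetical.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hpsf.DocumentSummaryInformation;
import org.apache.poi.hwpf.HWPFDocument;

// Hypothetical sketch (assumes POI 5.x with poi-scratchpad on the classpath):
// reads the "Category", "Company" and "Manager" values directly from the
// document's property set instead of from the body text.
public class SummaryPropertiesSketch {

    // Hypothetical helper: formats one property, marking unset values explicitly.
    static String describe(String name, String value) {
        return name + ": " + (value == null ? "(not set)" : value);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java SummaryPropertiesSketch <file.doc>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             HWPFDocument doc = new HWPFDocument(in)) {
            // These properties are stored in the \005DocumentSummaryInformation
            // stream, separately from the document body.
            DocumentSummaryInformation dsi = doc.getDocumentSummaryInformation();
            if (dsi == null) {
                System.out.println("No DocumentSummaryInformation stream found");
                return;
            }
            System.out.println(describe("Category", dsi.getCategory()));
            System.out.println(describe("Company", dsi.getCompany()));
            System.out.println(describe("Manager", dsi.getManager()));
        }
    }
}
```

Excel and PowerPoint files would need the corresponding HSSF or HSLF document class instead of `HWPFDocument`; the property accessors are the same.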






Nick Burch <ni...@torchbox.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|                            |FIXED




--- Comment #3 from Nick Burch <ni...@torchbox.com>  2008-08-12 13:56:53 PST ---
Now supported in svn. You'll need to call getMetadataTextExtractor() on your
existing extractor; that will give you a text extractor that parses out
your metadata.
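The usage described in this comment might look like the following minimal sketch. It assumes a recent POI release (5.x) with poi-scratchpad on the classpath and a Word 97-2003 `.doc` file as input. `WordExtractor.getText()` returns the body text, while `getMetadataTextExtractor().getText()` returns the text of the document's property sets. The class name `MetadataExtractionSketch` and the `containsAllTerms` helper are hypothetical, and the sketch does not assume any particular layout for the metadata text.

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

import org.apache.poi.hwpf.extractor.WordExtractor;

// Hypothetical sketch (assumes POI 5.x with poi-scratchpad on the classpath):
// compares the body text with the metadata text for the terms from the report.
public class MetadataExtractionSketch {

    // Hypothetical helper mirroring the report's expectation:
    // true only if every term appears somewhere in the text.
    static boolean containsAllTerms(String text, String... terms) {
        for (String term : terms) {
            if (!text.contains(term)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java MetadataExtractionSketch <file.doc>");
            return;
        }
        try (InputStream in = new FileInputStream(args[0]);
             WordExtractor extractor = new WordExtractor(in)) {
            // Body text; per the bug report, properties such as "Company" are not part of it.
            String body = extractor.getText();
            // Text of the document properties, via the metadata extractor.
            String metadata = extractor.getMetadataTextExtractor().getText();

            System.out.println("Body contains terms: "
                    + containsAllTerms(body, "testdoc", "test phrase"));
            System.out.println("Metadata contains terms: "
                    + containsAllTerms(metadata, "testdoc", "test phrase"));
        }
    }
}
```

Keeping the body and metadata text separate lets a caller decide whether to index properties alongside the content or store them apart.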

