Posted to dev@poi.apache.org by bu...@apache.org on 2017/01/10 17:18:45 UTC
[Bug 60570] New: Add rudimentary EMF read-only capability
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
Bug ID: 60570
Summary: Add rudimentary EMF read-only capability
Product: POI
Version: 3.16-dev
Hardware: PC
Status: NEW
Severity: normal
Priority: P2
Component: POI Overall
Assignee: dev@poi.apache.org
Reporter: tallison@mitre.org
Target Milestone: ---
Created attachment 34605
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34605&action=edit
initial patch
It would be useful to start building up some EMF parsing functionality. EMFs
can contain text as well as fully embedded documents.
A full EMF parser would take some time; let's focus on extraction first.
--
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org
[Bug 60570] Add rudimentary EMF read-only capability
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
Tim Allison <ta...@mitre.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
OS| |All
--- Comment #3 from Tim Allison <ta...@mitre.org> ---
I made everything @Internal and dumped this all in scratchpad. Let me know
what you think.
Obviously, I have to strip out the static calls in the test files...
<face_palm/>
[Bug 60570] Add rudimentary EMF read-only capability
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
--- Comment #1 from Tim Allison <ta...@mitre.org> ---
Created attachment 34606
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34606&action=edit
test file 1
[Bug 60570] Add rudimentary EMF read-only capability
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
Tim Allison <ta...@mitre.org> changed:
What |Removed |Added
----------------------------------------------------------------------------
Resolution|--- |FIXED
Status|NEW |RESOLVED
--- Comment #4 from Tim Allison <ta...@mitre.org> ---
Committed in r1779493.
This patch adds the capability to perform a rudimentary parse of EMF and
EMFPlus records, with the goal of extracting embedded PDFs (and other binary
files) as well as WMFs.
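To illustrate what a rudimentary record-level parse involves, here is a
minimal sketch in plain Java. It is not the code from the patch and does not
use any POI classes. It relies only on the EMF record layout (each record
starts with a 4-byte little-endian type followed by a 4-byte size that
includes those 8 bytes, and an EMR_COMMENT record carries a 4-byte DataSize
before its payload). The names EmfRecordWalker, RecordInfo, walk and
commentData are hypothetical, and treating comment records as the place to
look for embedded binaries is an assumption made for this example.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of POI: walks the record list of an EMF file.
class EmfRecordWalker {
    static final int EMR_HEADER = 1;   // first record of every EMF
    static final int EMR_EOF = 14;     // last record of every EMF
    static final int EMR_COMMENT = 70; // carries private/embedded data

    /** Type, byte offset and declared size of one record. */
    static final class RecordInfo {
        final int type;
        final int offset;
        final int size;

        RecordInfo(int type, int offset, int size) {
            this.type = type;
            this.offset = offset;
            this.size = size;
        }
    }

    /** Lists the records found in the given bytes, stopping after EMR_EOF. */
    static List<RecordInfo> walk(byte[] emf) {
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        List<RecordInfo> records = new ArrayList<>();
        int offset = 0;
        while (offset + 8 <= emf.length) {
            int type = buf.getInt(offset);
            long size = buf.getInt(offset + 4) & 0xFFFFFFFFL; // unsigned
            // A record is at least 8 bytes, 4-byte aligned, and must fit.
            if (size < 8 || size % 4 != 0 || offset + size > emf.length) {
                throw new IllegalArgumentException(
                        "Bad record size " + size + " at offset " + offset);
            }
            records.add(new RecordInfo(type, offset, (int) size));
            offset += (int) size;
            if (type == EMR_EOF) {
                break;
            }
        }
        return records;
    }

    /** Returns the payload that follows the DataSize field of a comment. */
    static byte[] commentData(byte[] emf, RecordInfo r) {
        if (r.type != EMR_COMMENT || r.size < 12) {
            throw new IllegalArgumentException("Not a comment record");
        }
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        long dataSize = buf.getInt(r.offset + 8) & 0xFFFFFFFFL;
        if (dataSize > r.size - 12) {
            throw new IllegalArgumentException("Comment data overruns record");
        }
        int start = r.offset + 12;
        return Arrays.copyOfRange(emf, start, start + (int) dataSize);
    }
}
```

A caller would read the file into a byte array, call walk, and then inspect
the payload of each EMR_COMMENT record (for example, checking whether it
starts with a known file signature). The sketch validates sizes but makes no
attempt to interpret EMFPlus records or text records.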
This offers a start towards text extraction, although more work remains,
including:
1) parsing and tracking the fonts to handle EXTTEXTOUTA and POLYTEXTOUTA
2) implementing the POLYTEXTOUT records (I couldn't find examples)
I developed this code with EMFs and WMFs extracted from Common Crawl and
govdocs1. I only included unit tests for EMFs/WMFs that I could extract from
POI's test files and/or Tika's test files.
If we're OK with adding Common Crawl and/or govdocs1 documents to our unit
test suite, I can add more unit tests.
[Bug 60570] Add rudimentary EMF read-only capability
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
Dominik Stadler <do...@gmx.at> changed:
What |Removed |Added
----------------------------------------------------------------------------
Severity|normal |enhancement
[Bug 60570] Add rudimentary EMF read-only capability
Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570
--- Comment #2 from Tim Allison <ta...@mitre.org> ---
Created attachment 34607
--> https://bz.apache.org/bugzilla/attachment.cgi?id=34607&action=edit
test file 2