Posted to dev@poi.apache.org by bu...@apache.org on 2017/01/10 17:18:45 UTC

[Bug 60570] New: Add rudimentary EMF read-only capability

https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

            Bug ID: 60570
           Summary: Add rudimentary EMF read-only capability
           Product: POI
           Version: 3.16-dev
          Hardware: PC
            Status: NEW
          Severity: normal
          Priority: P2
         Component: POI Overall
          Assignee: dev@poi.apache.org
          Reporter: tallison@mitre.org
  Target Milestone: ---

Created attachment 34605
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34605&action=edit
initial patch

It would be useful to start building up some EMF parsing functionality. EMFs
can contain text as well as fully embedded documents.

A full EMF parser would take some time; let's focus on extraction first.

-- 
You are receiving this mail because:
You are the assignee for the bug.
---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 60570] Add rudimentary EMF read-only capability

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

Tim Allison <ta...@mitre.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All

--- Comment #3 from Tim Allison <ta...@mitre.org> ---
I made everything @Internal and dumped this all in scratchpad.  Let me know
what you think.

Obviously, I have to strip out the static calls in the test files...
<face_palm/>



[Bug 60570] Add rudimentary EMF read-only capability

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

--- Comment #1 from Tim Allison <ta...@mitre.org> ---
Created attachment 34606
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34606&action=edit
test file 1



[Bug 60570] Add rudimentary EMF read-only capability

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

Tim Allison <ta...@mitre.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |FIXED
             Status|NEW                         |RESOLVED

--- Comment #4 from Tim Allison <ta...@mitre.org> ---
r1779493

This patch adds the capability to perform a rudimentary parse of EMF and
EMFPlus records, with the goal of extracting embedded PDFs (and other binary
files) as well as WMFs.

This offers a start towards text extraction, although more work remains,
including:
1) parsing and tracking the fonts to handle ExtTextOutA and PolyTextOutA
2) implementation of the PolyTextOut records (I couldn't find examples)
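
To make the "rudimentary parse" concrete, here is a minimal sketch of walking
EMF records and pulling out the payload of comment records, which is where
EMFPlus data and embedded files typically live. This is not the code in the
patch: the class and method names (EmfRecordWalker, walk, commentData) are
hypothetical and use only the JDK. It assumes the record layout from the
MS-EMF specification: every record starts with a little-endian 4-byte type
and a 4-byte size that includes those 8 bytes, and an EMR_COMMENT body starts
with a 4-byte DataSize.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not part of POI.
class EmfRecordWalker {
    // Record type values as defined in the MS-EMF specification.
    static final int EMR_HEADER = 1;
    static final int EMR_EOF = 14;
    static final int EMR_COMMENT = 70;

    /** One record: its type and the bytes following the 8-byte type/size prefix. */
    static final class Record {
        final int type;
        final byte[] body;

        Record(int type, byte[] body) {
            this.type = type;
            this.body = body;
        }
    }

    /** Splits an in-memory EMF into records, stopping after EMR_EOF. */
    static List<Record> walk(byte[] emf) {
        ByteBuffer buf = ByteBuffer.wrap(emf).order(ByteOrder.LITTLE_ENDIAN);
        List<Record> records = new ArrayList<>();
        while (buf.remaining() >= 8) {
            int start = buf.position();
            int type = buf.getInt();
            long size = buf.getInt() & 0xFFFFFFFFL; // unsigned 32-bit
            // The size covers the whole record, so it can never be < 8
            // or run past the end of the data.
            if (size < 8 || size > emf.length - start) {
                throw new IllegalArgumentException(
                        "bad record size " + size + " at offset " + start);
            }
            byte[] body = new byte[(int) size - 8];
            buf.get(body);
            records.add(new Record(type, body));
            if (type == EMR_EOF) {
                break;
            }
        }
        return records;
    }

    /** Returns the DataSize-delimited payload of an EMR_COMMENT record. */
    static byte[] commentData(Record r) {
        if (r.type != EMR_COMMENT || r.body.length < 4) {
            throw new IllegalArgumentException("not a valid comment record");
        }
        ByteBuffer b = ByteBuffer.wrap(r.body).order(ByteOrder.LITTLE_ENDIAN);
        long dataSize = b.getInt() & 0xFFFFFFFFL;
        if (dataSize > b.remaining()) {
            throw new IllegalArgumentException("comment data overruns record");
        }
        byte[] data = new byte[(int) dataSize];
        b.get(data);
        return data;
    }
}
```

A caller would read the file into a byte[], call walk(), and inspect
commentData() for each EMR_COMMENT record; a payload that starts with the
ASCII bytes "EMF+" carries EMFPlus records. How a given embedded PDF or WMF is
identified inside a payload depends on the producer and is not covered here.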

I developed this code with EMFs and WMFs extracted from Common Crawl and
govdocs1. I only included unit tests for EMFs/WMFs that I could extract from
POI's test files and/or Tika's test files.

If we're OK with adding Common Crawl and/or govdocs1 docs to our unit test
suite, I can add more unit tests.



[Bug 60570] Add rudimentary EMF read-only capability

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Severity|normal                      |enhancement



[Bug 60570] Add rudimentary EMF read-only capability

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=60570

--- Comment #2 from Tim Allison <ta...@mitre.org> ---
Created attachment 34607
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=34607&action=edit
test file 2
