Posted to dev@poi.apache.org by bu...@apache.org on 2016/01/14 10:15:35 UTC

[Bug 58858] New: hidden characters not removed

https://bz.apache.org/bugzilla/show_bug.cgi?id=58858

            Bug ID: 58858
           Summary: hidden characters not removed
           Product: POI
           Version: unspecified
          Hardware: PC
            Status: NEW
          Severity: critical
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: sebastian.a.aguirre@gmail.com

Created attachment 33442
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=33442&action=edit
sample doc file to test

After reading the file and converting it to a String, the hidden characters are
still present in the extracted text; they are not removed.
The same happens with XWPF.

To read the file I'm using a very simple method:

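import java.io.FileInputStream;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.hwpf.extractor.WordExtractor;

String toReturn;
try (FileInputStream fis = new FileInputStream("file.doc")) {
    HWPFDocument doc = new HWPFDocument(fis);
    WordExtractor ex = new WordExtractor(doc);
    // Reported problem: the returned text still contains the hidden characters
    toReturn = ex.getText();
}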

The same thing happens with XWPF, again with very simple code:

// fis is a FileInputStream opened on the .docx version of the file
XWPFDocument doc = new XWPFDocument(fis);
XWPFWordExtractor ex = new XWPFWordExtractor(doc);
String toReturn = ex.getText();

I'm attaching a file you can use as sample.
In Word you can show/hide the hidden characters with Ctrl+Shift+8.
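
To make the expected behaviour concrete, here is a minimal, self-contained sketch of what I would expect the extractors to do: concatenate only the runs that are not marked hidden. The Run class and the visibleText method are hypothetical names used for illustration and are not part of the POI API. With HWPF the hidden flag could presumably be taken from CharacterRun.isVanished(), but that mapping is an assumption and is not exercised here.

```java
import java.util.Arrays;
import java.util.List;

public class HiddenTextFilter {

    /** Hypothetical stand-in for one formatted run of text in a document. */
    static final class Run {
        final String text;
        final boolean hidden; // true when the run carries the "hidden" font attribute

        Run(String text, boolean hidden) {
            this.text = text;
            this.hidden = hidden;
        }
    }

    /** Concatenates the text of every run that is not marked hidden. */
    static String visibleText(List<Run> runs) {
        StringBuilder sb = new StringBuilder();
        for (Run run : runs) {
            if (!run.hidden) {
                sb.append(run.text);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Run> runs = Arrays.asList(
                new Run("Visible start. ", false),
                new Run("secret note ", true),
                new Run("Visible end.", false));
        // Only the two visible runs are printed
        System.out.println(visibleText(runs));
    }
}
```

With the sample runs above, the hidden middle run is dropped and only the two visible runs end up in the result, which is the output I would expect from getText().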

Thanks.

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 58858] hidden characters not removed

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58858

Dominik Stadler <do...@gmx.at> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 OS|                            |All



[Bug 58858] hidden characters not removed

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=58858

Hamza Gobir <hg...@googlemail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |hgobir@googlemail.com
