Posted to dev@poi.apache.org by bu...@apache.org on 2015/04/22 10:43:59 UTC

[Bug 57847] New: doc to html conversion does not create bullet points

https://bz.apache.org/bugzilla/show_bug.cgi?id=57847

            Bug ID: 57847
           Summary: doc to html conversion does not create bullet points
           Product: POI
           Version: 3.10-FINAL
          Hardware: PC
                OS: Mac OS X 10.4
            Status: NEW
          Severity: major
          Priority: P2
         Component: HWPF
          Assignee: dev@poi.apache.org
          Reporter: madhavabk@gmail.com

Created attachment 32676
  --> https://bz.apache.org/bugzilla/attachment.cgi?id=32676&action=edit
Document has some style and bullets to check

When a document is converted to HTML using WordToHtmlConverter, the bullet
points are not emitted as UL/LI elements in the HTML copy.
Because of this, when the generated HTML is pasted into TinyMCE etc., it does
not look like bullet points. Also, the bullet marker chosen gets changed in the
HTML version.
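For illustration, the output the reporter expects can be sketched in plain Java: each run of consecutive list paragraphs is wrapped in a single `<ul>` with one `<li>` per item, while ordinary paragraphs stay `<p>`. This is a minimal sketch under stated assumptions, not POI's converter: the `Para` class, its `inList` flag and the `toHtml` helper are hypothetical stand-ins for what HWPF knows about a paragraph, nested levels (ilvl) are ignored, and text is not HTML-escaped.

```java
import java.util.List;

public class ListGroupingSketch {

    // Hypothetical stand-in for a converted paragraph: its text and whether
    // Word marked it as a list item. This is not a POI type.
    static final class Para {
        final String text;
        final boolean inList;

        Para(String text, boolean inList) {
            this.text = text;
            this.inList = inList;
        }
    }

    // Wraps each run of consecutive list paragraphs in one <ul>, emitting an
    // <li> per item; other paragraphs become <p>. Nesting and HTML escaping
    // are deliberately left out of this sketch.
    static String toHtml(List<Para> paras) {
        StringBuilder html = new StringBuilder();
        boolean listOpen = false;
        for (Para p : paras) {
            if (p.inList && !listOpen) {
                html.append("<ul>");
                listOpen = true;
            } else if (!p.inList && listOpen) {
                html.append("</ul>");
                listOpen = false;
            }
            html.append(p.inList ? "<li>" : "<p>")
                .append(p.text)
                .append(p.inList ? "</li>" : "</p>");
        }
        if (listOpen) {
            html.append("</ul>"); // close a list that runs to the end
        }
        return html.toString();
    }

    public static void main(String[] args) {
        List<Para> paras = List.of(
                new Para("Intro", false),
                new Para("First", true),
                new Para("Second", true),
                new Para("Outro", false));
        System.out.println(toHtml(paras));
        // prints <p>Intro</p><ul><li>First</li><li>Second</li></ul><p>Outro</p>
    }
}
```

Tracking a single "list open" flag keeps the grouping linear; a real converter would also need to open nested `<ul>` elements when the list level increases.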

-- 
You are receiving this mail because:
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


[Bug 57847] doc to html conversion does not create bullet points

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57847

--- Comment #1 from Madhava Kulkarni <ma...@gmail.com> ---
Basically, the following code converts unordered lists into paragraph
elements. Also, somehow it picks up the wrong bullet element/character.

AbstractWordConverter.java:1094

String label = AbstractWordUtils.getBulletText(
        numberingState, hwpfList,
        (char) paragraph.getIlvl() );
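As for the wrong bullet character, one plausible cause (an assumption, not confirmed in this thread) is that Word stores bullets from symbol fonts such as Symbol or Wingdings as Private Use Area code points in the U+F000 to U+F0FF range, e.g. U+F0B7 for the round Symbol bullet. Without that font, the character renders as a box or a different glyph in HTML. The sketch below replaces such characters in a bullet label with U+2022 BULLET; the `normalizeBullet` helper is hypothetical and not part of POI, and the mapping is lossy because it ignores which font the glyph came from.

```java
public class BulletCharSketch {

    // Assumed range for symbol-font glyphs remapped into the Private Use Area
    // (U+F000 + font code). This is an assumption of the sketch.
    private static final char PUA_SYMBOL_START = '\uF000';
    private static final char PUA_SYMBOL_END = '\uF0FF';

    // Hypothetical helper: replaces every symbol-font PUA character in a
    // bullet label with U+2022 BULLET and leaves all other characters
    // (e.g. "o" or numbered labels such as "1.") unchanged.
    static String normalizeBullet(String label) {
        StringBuilder out = new StringBuilder(label.length());
        for (int i = 0; i < label.length(); i++) {
            char c = label.charAt(i);
            if (c >= PUA_SYMBOL_START && c <= PUA_SYMBOL_END) {
                out.append('\u2022');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print code points in hex to avoid console-encoding issues.
        System.out.println(Integer.toHexString(normalizeBullet("\uF0B7").charAt(0))); // prints 2022
        System.out.println(normalizeBullet("1.")); // prints 1.
    }
}
```

Mapping everything to one bullet trades glyph fidelity for a marker that displays everywhere; a more faithful approach would look up the glyph per font.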


[Bug 57847] doc to html conversion does not create bullet points

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57847

--- Comment #3 from Madhava Kulkarni <ma...@gmail.com> ---
Tika did not work here either, and it exposed another Tika bug instead.
I ran it like this:
java -jar tika-app-1.8.jar --html ~/Documents/sample.doc  > test.html
With this command, the bullets were removed altogether.


[Bug 57847] doc to html conversion does not create bullet points

Posted by bu...@apache.org.
https://bz.apache.org/bugzilla/show_bug.cgi?id=57847

Nick Burch <ap...@gagravarr.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |NEEDINFO

--- Comment #2 from Nick Burch <ap...@gagravarr.org> ---
For slightly complicated reasons, we have two different .doc -> .html
converters, one in the POI codebase (WordToHtmlConverter) and one in the Tika
codebase (org.apache.tika.parser.microsoft.WordExtractor).

If possible, it'd be great if you could try the same file with Apache Tika
and see whether it manages to get the lists out. (Grab the tika-app jar and run
it with --html for a quick way to check.)

If Apache Tika does it right, we can hopefully bring over the logic to the
AbstractWordConverter family of converters. If not, we can look to fix it in
both at the same time!
