Posted to dev@poi.apache.org by bu...@apache.org on 2011/06/09 23:01:28 UTC

DO NOT REPLY [Bug 51351] New: New Doc to FO extractor

https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

             Bug #: 51351
           Summary: New Doc to FO extractor
           Product: POI
           Version: unspecified
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: HWPF
        AssignedTo: dev@poi.apache.org
        ReportedBy: vlsergey@gmail.com
    Classification: Unclassified


There is an extractor (converter?) from DOC to FO in the hdf package, but it is
not based on HWPF code, and it supports neither images nor customization.

The patch proposes a new extractor:
 - it is based on HWPF code
 - it builds the FO document through DOM, not by string concatenation
 - with a correct implementation of ImageHandler it can even convert MathType
equations (i.e. extract their WMF and let your ImageHandler do everything)

Some things are not tested yet (for example, images, shapes or nested tables),
but the current code already creates nice PDF documents (with Apache FOP).
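
To illustrate the DOM-versus-string-building point, here is a minimal sketch
that builds a tiny XSL-FO document with the standard JAXP DOM API. It is not
code from the patch: the class FoDomSketch and its methods are hypothetical,
and the page layout is reduced to the bare minimum. The benefit shown is that
special characters in the text are escaped by the serializer rather than by
hand.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

/** Hypothetical demo class; not part of the POI patch. */
class FoDomSketch {
    static final String FO_NS = "http://www.w3.org/1999/XSL/Format";

    /** Builds a one-page XSL-FO document containing a single text block. */
    public static Document buildFo(String text) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().newDocument();

            Element root = doc.createElementNS(FO_NS, "fo:root");
            doc.appendChild(root);

            // page layout: one simple page master with a body region
            Element layout = doc.createElementNS(FO_NS, "fo:layout-master-set");
            root.appendChild(layout);
            Element pageMaster = doc.createElementNS(FO_NS, "fo:simple-page-master");
            pageMaster.setAttribute("master-name", "page");
            layout.appendChild(pageMaster);
            pageMaster.appendChild(doc.createElementNS(FO_NS, "fo:region-body"));

            // content: one page sequence flowing into the body region
            Element sequence = doc.createElementNS(FO_NS, "fo:page-sequence");
            sequence.setAttribute("master-reference", "page");
            root.appendChild(sequence);
            Element flow = doc.createElementNS(FO_NS, "fo:flow");
            flow.setAttribute("flow-name", "xsl-region-body");
            sequence.appendChild(flow);

            Element block = doc.createElementNS(FO_NS, "fo:block");
            // raw text goes in as-is; escaping happens at serialization time
            block.appendChild(doc.createTextNode(text));
            flow.appendChild(block);
            return doc;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Serializes the DOM tree to an XML string. */
    public static String serialize(Document doc) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.INDENT, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(serialize(buildFo("Tom & Jerry <draft>")));
    }
}
```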

-- 
Configure bugmail: https://issues.apache.org/bugzilla/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@poi.apache.org
For additional commands, e-mail: dev-help@poi.apache.org


DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #11 from Yegor Kozlov <ye...@dinom.ru> 2011-06-16 08:03:03 UTC ---
> 
> Graphic handling won't be part of the extractor code. It's a lot of additional
> code AND additional libraries like Apache Batik or even ImageMagick calls.
> Also, file creation and cleanup would have to be coded.
> 
> So there is an empty processImage() method that should be implemented in a
> subclass if anyone wants images to be included in XSL-FO.
> createExternalGraphic() and setImageProperties() are helper methods for those
> people.

I see, but we can provide default support for png/jpeg with minimal effort! I
added the following code and it worked for me:


    // needs java.io.File, java.io.FileOutputStream and java.io.IOException
    protected void processImage(Element currentBlock, boolean inlined,
            Picture picture) {

        byte[] bytes = picture.getContent();
        String mimeType = picture.getMimeType();
        // handle only JPEG and PNG by default; null-safe comparison
        if ("image/jpeg".equals(mimeType) || "image/png".equals(mimeType)) {
            File file = new File(picture.suggestFullFileName());

            try {
                // dump the image into the working directory
                FileOutputStream out = new FileOutputStream(file);
                try {
                    out.write(bytes);
                } finally {
                    out.close(); // close the stream even if the write fails
                }

                Element graphics = createExternalGraphic(
                        file.toURI().toASCIIString());
                WordToFoUtils.setPictureProperties(picture, graphics);
                currentBlock.appendChild(graphics);

            } catch (IOException e) {
                e.printStackTrace();
            }
        }

    }

I agree that handling other mimetypes is not trivial and may involve
third-party libraries, but jpeg and png are the most common and should be
supported by default.

Does that make sense to you?

Regards,
Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #7 from Yegor Kozlov <ye...@dinom.ru> 2011-06-14 09:14:02 UTC ---
Patch applied in r1135414. I added a Java main() to WordToFoExtractor; this is
how I tested your code:

Usage: WordToFoExtractor <inputFile.doc> <saveTo.fo>

Except for formatting and indentation, the output is identical to the XML in
the uploaded archive. 

> 
> Personally I intend to continue supporting this code until it is ready for
> production usage, and maybe implement an xls-to-fo extractor. My goal is to
> create a one-way converter with maximum readability, i.e. without losing text
> but maybe without some formatting.
> 

Great! It will be a really valuable contribution.

An xls-to-fo converter has been requested several times on the mailing lists.
We already have ToCSV and ToHtml apps in poi-examples, and xls-to-fo can
borrow/share code from them.

> Should I also note that doc-to-html should be easy to implement? ;)
> 

That would be nice to have too. 

P.S. I'm leaving this ticket open for updates. Close it if you prefer to upload
new patches in a new ticket.

Regards,
Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #18 from Sergey Vladimirov <vl...@gmail.com> 2011-06-24 10:08:52 UTC ---
Created attachment 27204
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27204
Workaround for NPE in Bug 33519

(In reply to comment #17)
> I resolved an old Bug 33519 which complained that HWPF failed to open a
> document and added the problematic file to our test collection. As a result,
> TestWordToFoExtractorSuite started to fail on Bug33519.doc with an NPE:

Workaround in the proposed patch.



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27143|0                           |1
        is obsolete|                            |

--- Comment #5 from Sergey Vladimirov <vl...@gmail.com> 2011-06-13 23:09:58 UTC ---
Created attachment 27153
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27153
Updated patch

Fixed "keep" attributes;
Removed ImageHandler interface (handle images by extending extractor)



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #14 from Sergey Vladimirov <vl...@gmail.com> 2011-06-20 09:07:28 UTC ---
Created attachment 27178
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27178
Additional test docs



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #9 from Yegor Kozlov <ye...@dinom.ru> 2011-06-15 11:49:17 UTC ---
Applied in r1136001

It looks like you did not attach the most recent version of your code:
createExternalGraphic is never called and the fo:external-graphic element is
missing from the output. No problem, I expect it in the next patches.

Regards,
Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27204|0                           |1
        is obsolete|                            |

--- Comment #19 from Sergey Vladimirov <vl...@gmail.com> 2011-06-25 14:02:23 UTC ---
Created attachment 27205
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27205
Workaround for NPE in Bug 33519



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #17 from Yegor Kozlov <ye...@dinom.ru> 2011-06-24 08:51:47 UTC ---
I made a small change in TestWordToFoExtractorSuite and added an option to
exclude certain files from the suite.

I resolved an old Bug 33519 which complained that HWPF failed to open a
document and added the problematic file to our test collection. As a result,
TestWordToFoExtractorSuite started to fail on Bug33519.doc with an NPE:

java.lang.NullPointerException
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processCharacters(WordToFoExtractor.java:255)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processParagraph(WordToFoExtractor.java:492)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processSectionParagraphes(WordToFoExtractor.java:571)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processSection(WordToFoExtractor.java:519)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processDocument(WordToFoExtractor.java:332)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.process(WordToFoExtractor.java:167)

Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Platform|PC                          |All
            Version|unspecified                 |3.8-dev
         OS/Version|Linux                       |All



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27198|0                           |1
        is obsolete|                            |
  Attachment #27205|0                           |1
        is obsolete|                            |

--- Comment #21 from Sergey Vladimirov <vl...@gmail.com> 2011-06-27 09:15:22 UTC ---
Created attachment 27207
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27207
Latest patch

Okay, here is the latest patch. All tests pass.

The problem raised by Bug 33519 is not solved; there is just a workaround. It
seems that CHPX are NOT SORTED, so it is not correct for the Range class to
assume that they are. _charStart and _charEnd should therefore be removed, and
all code linked to those fields should be rewritten.

It's a big task, so I would like some kind of confirmation that my assumption
about the missing CHPX order is correct.
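
One way to check such an assumption is to scan the runs of a failing document
and test whether their start offsets are non-decreasing. The sketch below is
not HWPF code: Run is a hypothetical stand-in for a CHPX with plain start/end
character offsets, and RunOrderCheck is an invented helper class.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper; Run stands in for a CHPX and is not the HWPF class. */
final class RunOrderCheck {

    /** A character range [start, end) carrying some formatting. */
    static final class Run {
        final int start;
        final int end;

        Run(int start, int end) {
            this.start = start;
            this.end = end;
        }
    }

    /** Returns true if the runs are ordered by non-decreasing start offset. */
    static boolean isSortedByStart(List<Run> runs) {
        for (int i = 1; i < runs.size(); i++) {
            if (runs.get(i - 1).start > runs.get(i).start) {
                return false;
            }
        }
        return true;
    }

    /** Returns a copy ordered by start offset; the input list is left untouched. */
    static List<Run> sortedCopy(List<Run> runs) {
        List<Run> copy = new ArrayList<>(runs);
        copy.sort(Comparator.comparingInt((Run r) -> r.start));
        return copy;
    }
}
```

If isSortedByStart() returns false for the runs read from a document, any code
that assumes sorted order (for example to locate a run by index arithmetic)
would need either a sorted copy or a different lookup strategy.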

In addition, this patch includes a doc-to-html extractor (in fact, that is the
main part of it).



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #12 from Sergey Vladimirov <vl...@gmail.com> 2011-06-16 09:39:26 UTC ---
Yegor,

Yes, the provided code works correctly for png and jpeg images. I assume FOP
can also handle BMP, TIFF and GIF, so they could be listed there as well. Maybe
even WMF, according to http://xmlgraphics.apache.org/fop/0.95/graphics.html
(but I wouldn't advise relying on it).

But the main question is about cleaning up the files afterwards. Where should
those image files be stored? Who should delete them, and when? What happens to
those files in case of an exception? (And for your code: what happens in case
of parallel processing?)

Either this part is handled by external code, or it can be handled by the
Extractor code. In the second case we will need some kind of close() or
cleanup() method to delete those files after FOP processing.
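
A rough sketch of what such a close()/cleanup() contract could look like,
assuming a Java 7+ runtime (java.nio.file) and one store per conversion. The
class TempImageStore and its methods are hypothetical and not part of POI.
Each instance writes into its own temporary directory, so parallel conversions
do not collide on file names.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; not part of POI. */
final class TempImageStore implements Closeable {
    private final Path dir;
    private final List<Path> files = new ArrayList<>();

    TempImageStore() {
        try {
            // a unique directory per instance avoids clashes between parallel runs
            dir = Files.createTempDirectory("doc2fo-images");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Writes the image bytes to a new file and returns its URI as a string. */
    synchronized String store(byte[] content, String suffix) {
        try {
            Path file = Files.createTempFile(dir, "img", suffix);
            Files.write(file, content);
            files.add(file);
            return file.toUri().toASCIIString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Deletes every stored file and then the directory itself. */
    @Override
    public synchronized void close() {
        try {
            for (Path file : files) {
                Files.deleteIfExists(file);
            }
            files.clear();
            Files.deleteIfExists(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Used in a try-with-resources block around both the conversion and the FOP run,
the files are removed even when an exception is thrown:

```java
try (TempImageStore images = new TempImageStore()) {
    String src = images.store(new byte[] {1, 2, 3}, ".png");
    // ... build the FO document with src, then run FOP ...
}
```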

Regards,
Sergey.

P.S.: I subscribed to the poi-user/poi-dev mailing lists, so we can move the
discussion there.



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #16 from Yegor Kozlov <ye...@dinom.ru> 2011-06-23 11:28:58 UTC ---
Applied in r1138836

Yegor

(In reply to comment #15)
> Created attachment 27198 [details]
> Patch to fix ListEntryNoListTable and MBD001D0B89 tests; additional tests



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #2 from Yegor Kozlov <ye...@dinom.ru> 2011-06-10 07:19:27 UTC ---
Thanks for the patch.

Can you upload some samples so we can test your code? It would be nice to see a
source .doc file, the produced XSL-FO and the resulting PDF. 

How do you run FOP, via command line? Please post the exact command.

Regards,
Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #4 from Sergey Vladimirov <vl...@gmail.com> 2011-06-13 21:42:36 UTC ---
Yegor,

I will test and upload an updated patch and examples using test doc files from
the POI collection of test files.

Currently FOP is called from an additional bunch of code, including image
handling. This is tightly linked to our internal system (for example, it
includes converting images from WMF to SVG format), so extracting an example is
tricky.

This is currently proof-of-concept code, but I believe it will be part of a
production system in several months. I also believe the functionality of the
new extractor is better than (or at least not worse than) the old
org.apache.poi.hdf.extractor package.

Personally I intend to continue supporting this code until it is ready for
production usage, and maybe implement an xls-to-fo extractor. My goal is to
create a one-way converter with maximum readability, i.e. without losing text
but maybe without some formatting.

Should I also note that doc-to-html should be easy to implement? ;)

Regards,
Sergey



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27155|0                           |1
        is obsolete|                            |

--- Comment #13 from Sergey Vladimirov <vl...@gmail.com> 2011-06-20 09:05:50 UTC ---
Created attachment 27177
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27177
new patch

Add hyperlinks support;
Add common fields support;
Split WordToFoExtractor and AbstractToFoExtractor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27177|0                           |1
        is obsolete|                            |

--- Comment #15 from Sergey Vladimirov <vl...@gmail.com> 2011-06-23 08:11:35 UTC ---
Created attachment 27198
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27198
Patch to fix ListEntryNoListTable and MBD001D0B89 tests; additional tests



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #20 from Yegor Kozlov <ye...@dinom.ru> 2011-06-26 10:27:08 UTC ---
It still fails on Bug33519.doc, but with a different exception:

java.lang.IllegalArgumentException: The end (1077) must not be before the start (1985)
    at org.apache.poi.hwpf.usermodel.Range.sanityCheckStartEnd(Range.java:243)
    at org.apache.poi.hwpf.usermodel.Range.<init>(Range.java:176)
    at org.apache.poi.hwpf.usermodel.CharacterRun.<init>(CharacterRun.java:97)
    at org.apache.poi.hwpf.usermodel.Range.getCharacterRun(Range.java:802)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processCharacters(WordToFoExtractor.java:243)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processParagraph(WordToFoExtractor.java:529)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processSectionParagraphes(WordToFoExtractor.java:608)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processSection(WordToFoExtractor.java:556)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.processDocument(WordToFoExtractor.java:341)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.process(WordToFoExtractor.java:167)
    at org.apache.poi.hwpf.extractor.WordToFoExtractor.main(WordToFoExtractor.java:142)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)


Shall I commit this patch or wait for the next one?

Yegor

(In reply to comment #19)
> Created attachment 27205 [details]
> Workaround for NPE in Bug 33519



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED

--- Comment #23 from Sergey Vladimirov <vl...@gmail.com> 2011-07-04 19:52:49 UTC ---
I'm closing this bug because all patches have been applied. New patches can be
committed directly to SVN (I have committer access now). Patches from other
users are still welcome (as new issues).



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27153|0                           |1
        is obsolete|                            |

--- Comment #8 from Sergey Vladimirov <vl...@gmail.com> 2011-06-14 14:58:59 UTC ---
Created attachment 27155
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27155
Add most of images handling, except cropping

Added all possible image handling except cropping: I can't find a way to
obtain this information, nor the pictures' SPRMs.

See testPictures.doc.pdf and testCroppedPictures.doc.pdf for examples.

http://www.sendspace.com/file/adie2l



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

Sergey Vladimirov <vl...@gmail.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
  Attachment #27207|0                           |1
        is obsolete|                            |

--- Comment #22 from Sergey Vladimirov <vl...@gmail.com> 2011-06-27 09:37:54 UTC ---
Created attachment 27208
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27208
Latest patch



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #1 from Sergey Vladimirov <vl...@gmail.com> 2011-06-09 21:01:59 UTC ---
Created attachment 27143
  --> https://issues.apache.org/bugzilla/attachment.cgi?id=27143
patch



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #6 from Sergey Vladimirov <vl...@gmail.com> 2011-06-13 23:18:00 UTC ---
POI doc sources, fo xml and PDF results
http://www.sendspace.com/file/sf4o24
2.5 Mb in size



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #3 from Yegor Kozlov <ye...@dinom.ru> 2011-06-11 15:58:55 UTC ---
How much effort are you going to invest in this utility? In its current form
it is a proof of concept rather than a full-featured converter.

I was able to generate XSL-FO for some files from our collection of test Word
documents. FOP 1.0 renders simple files, but stumbles on more complex ones:

Several times I've seen this:

Caused by: org.xml.sax.SAXParseException: Character reference "&#12" is an invalid XML character.
        at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
        at org.apache.xerces.jaxp.SAXParserImpl$JAXPSAXParser.parse(Unknown Source)
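
Character 12 is the form feed (U+000C), which XML 1.0 does not allow even as a
character reference. A minimal sketch of one way to avoid the error is to
filter text against the XML 1.0 Char production before it is added to the
document. The class XmlTextSanitizer is hypothetical, and replacing the
offending character with a space is only an assumption; a converter may prefer
to map a form feed to a page break instead.

```java
/** Hypothetical helper; not part of the POI patch. */
final class XmlTextSanitizer {

    /** Returns true if the code point matches the XML 1.0 "Char" production. */
    static boolean isValidXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    /** Replaces every character that XML 1.0 forbids with a single space. */
    static String sanitize(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);
            out.appendCodePoint(isValidXmlChar(cp) ? cp : ' ');
        }
        return out.toString();
    }
}
```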

and this:

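javax.xml.transform.TransformerException: org.apache.fop.fo.ValidationException: Border and padding for fo:region-body "xsl-region-body" should be '0' (See 6.4.14 in XSL 1.1); non-standard values are allowed if relaxed validation is enabled.  (See position 5:411)
       at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:302)
       at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:130)
       at org.apache.fop.cli.Main.startFOP(Main.java:174)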
       at org.apache.fop.cli.InputHandler.transformTo(InputHandler.java:302)
       at org.apache.fop.cli.InputHandler.renderTo(InputHandler.java:130)
       at org.apache.fop.cli.Main.startFOP(Main.java:174)

Also, I see warnings in the console:

Jun 11, 2011 7:19:10 PM org.apache.fop.events.LoggingEventListener processEvent
SEVERE: Invalid property value encountered in linefeed-treatment="false": org.apache.fop.fo.expr.PropertyException: file:/C:/temp/DiffFirstPageHeadFoot.xml:9:38: No conversion defined false; property:'linefeed-treatment' (See position 10:520)
Jun 11, 2011 7:19:10 PM org.apache.fop.events.LoggingEventListener processEvent
SEVERE: Invalid property value encountered in linefeed-treatment="false": org.apache.fop.fo.expr.PropertyException: file:/C:/temp/DiffFirstPageHeadFoot.xml:9:38: No conversion defined false; property:'linefeed-treatment' (See position 16:520)
Jun 11, 2011 7:19:10 PM org.apache.fop.events.LoggingEventListener processEvent
SEVERE: Invalid property value encountered in keep-together.within-page="true": org.apache.fop.fo.expr.PropertyException: file:/C:/temp/DiffFirstPageHeadFoot.xml:9:38: No conversion defined true; property:'keep-together.within-page' (See position 20:578)
Jun 11, 2011 7:19:10 PM org.apache.fop.events.LoggingEventListener processEvent
SEVERE: Invalid property value encountered in keep-with-next.within-page="true": org.apache.fop.fo.expr.PropertyException: file:/C:/temp/DiffFirstPageHeadFoot.xml:9:38: No conversion defined true; property:'keep-with-next.within-page' (See position 20:578)
Jun 11, 2011 7:19:10 PM org.apache.fop.events.LoggingEventListener processEvent
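
These messages come from writing "true"/"false" into properties that XSL-FO
defines with keywords: the keep-* components accept auto, always or an integer,
and linefeed-treatment accepts ignore, preserve, treat-as-space or
treat-as-zero-width-space. A small sketch of one possible mapping follows. The
class FoKeywords is hypothetical, and the chosen meaning of each boolean is an
assumption, not necessarily what the patch intended.

```java
/** Hypothetical helper mapping booleans to XSL-FO keyword values. */
final class FoKeywords {

    /** Value for keep-together.* and keep-with-next.* style components. */
    static String keep(boolean keep) {
        return keep ? "always" : "auto";
    }

    /** Value for linefeed-treatment; "treat-as-space" is the XSL-FO initial value. */
    static String linefeedTreatment(boolean preserve) {
        return preserve ? "preserve" : "treat-as-space";
    }
}
```

For example, element.setAttribute("keep-together.within-page",
FoKeywords.keep(true)) would emit "always" instead of "true".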

WordToFoExtractor is a very promising feature; the question is where it should
live: in poi-examples as a simple demo, or with the rest of the HWPF code as
well-tested code.

Regards,
Yegor



DO NOT REPLY [Bug 51351] New Doc to FO extractor

Posted by bu...@apache.org.
https://issues.apache.org/bugzilla/show_bug.cgi?id=51351

--- Comment #10 from Sergey Vladimirov <vl...@gmail.com> 2011-06-15 12:05:49 UTC ---
Yegor,

Graphic handling won't be part of the extractor code. It's a lot of additional
code AND additional libraries like Apache Batik or even ImageMagick calls.
Also, file creation and cleanup would have to be coded.

So there is an empty processImage() method that should be implemented in a
subclass if anyone wants images to be included in XSL-FO.
createExternalGraphic() and setImageProperties() are helper methods for those
people.
