Posted to fop-users@xmlgraphics.apache.org by Tatiyana <ta...@egr.msu.edu> on 2008/05/13 19:48:27 UTC

xml file with html code inside. How to make xml -> FOP -> pdf

Dear All,
Two months ago FOP was a completely new area for me, but with the great help of
this forum I got our web application to use a FOP transformation (xml -> xsl
-> pdf) to produce a PDF summary form for users. Users filled in information
fields and got a standard summary form in PDF format as one of the
results.
Now users are allowed to edit some of the text fields using the tiny-mce editor,
and because of that our XML file has ended up with some parts containing HTML
code. Could somebody please help me figure out what changes need to
be made in the XML or XSL files so that the HTML elements in the XML file are
recognized and shown properly in the PDF form? I am using fop-0.94, jdk1.5.0.
I am attaching part of the XSL file and the XML file. The Title and Description
elements of the Public Service node and the International Service node will
have HTML code inside.
Thank you.

http://www.nabble.com/file/p17214471/xsl_file.xsl.doc xsl_file.xsl.doc 
http://www.nabble.com/file/p17214471/xml_file.xml.doc xml_file.xml.doc 


-- 
View this message in context: http://www.nabble.com/xml-file-with-html-code-inside.-How-to-make--xml--%3E-FOP--%3E-pdf-tp17214471p17214471.html
Sent from the FOP - Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: xml file with html code inside. How to make xml -> FOP -> pdf

Posted by Andreas Delmelle <an...@telenet.be>.
On May 14, 2008, at 01:14, kindaian@gmail.com wrote:

FWIW: Small correction

<snip />
>
> In the example you provided the following will break xml:
>
>          <Title>Na zolotom kryl&apos;ce sideli</Title>

This is not true. The entity &apos; is also defined by XML. The  
intention is correct, though. There are many entities predefined for  
(X)HTML, while XML only has five (the bare essentials):

&apos;
&quot;
&lt;
&gt;
&amp;

The others, you can replace, as you did, or you can define them in a  
DTD and reference that from the XML.
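
To illustrate the point, here is a minimal Python sketch. Python, the helper
name parses() and the sample strings are assumptions for illustration only;
nothing here comes from the attached files. It uses the standard-library
parser to show that &apos; parses without any DTD, that an HTML-only entity
such as &nbsp; is rejected, and that declaring the entity in an internal DTD
subset makes it acceptable.

```python
import xml.etree.ElementTree as ET

# &apos; is one of the five entities predefined by XML, so no DTD is needed.
title = ET.fromstring("<Title>Na zolotom kryl&apos;ce sideli</Title>")


def parses(xml_text):
    """Return True if the standard-library parser accepts xml_text."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# &nbsp; is predefined for (X)HTML only, so plain XML does not know it ...
undeclared = "<Title>Na zolotom&nbsp;kryl&apos;ce</Title>"

# ... unless it is declared, here in an internal DTD subset.
declared = (
    '<!DOCTYPE Title [<!ENTITY nbsp "&#160;">]>'
    "<Title>Na zolotom&nbsp;kryl&apos;ce</Title>"
)

print(title.text)
print(parses(undeclared), parses(declared))
```

Whether the XML parser used in front of FOP resolves an external DTD the same
way depends on its configuration, so treat this only as a demonstration of the
rule itself.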


HTH!

Cheers

Andreas




Re: xml file with html code inside. How to make xml -> FOP -> pdf

Posted by "kindaian@gmail.com" <ki...@gmail.com>.
OK... I had a worse problem on a project I was involved in.

I had to clean up MS Office HTML to produce nice XML that FOP could
digest.

The issue was sorted out with tons of regexps, coupled with
tidy to convert the HTML into XML-compliant HTML, and then a
final phase of transforming it to FO (I opted to do that by hand, but XSLT
can also be used once you have compliant XML).
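
A rough idea of what a "by hand" transform to FO can look like, as a minimal
Python sketch and not the code from that project. The tag-to-FO mapping, the
function names convert() and fragment_to_fo() and the sample input are all
assumptions; the input must already be well-formed XML. An XSLT stylesheet
with one template per HTML tag would express the same mapping.

```python
import xml.etree.ElementTree as ET

FO_NS = "http://www.w3.org/1999/XSL/Format"
ET.register_namespace("fo", FO_NS)

# Hypothetical mapping from the HTML tags an editor may emit to FO objects.
MAPPING = {
    "p": ("block", {"space-after": "6pt"}),
    "b": ("inline", {"font-weight": "bold"}),
    "strong": ("inline", {"font-weight": "bold"}),
    "i": ("inline", {"font-style": "italic"}),
    "em": ("inline", {"font-style": "italic"}),
    "u": ("inline", {"text-decoration": "underline"}),
    "br": ("block", {}),  # an empty fo:block is a common way to force a break
}


def convert(node):
    """Recursively copy node into the FO namespace, keeping text and tails."""
    # Unknown tags fall back to an unstyled fo:inline so their text survives.
    name, attrs = MAPPING.get(node.tag, ("inline", {}))
    out = ET.Element("{%s}%s" % (FO_NS, name), dict(attrs))
    out.text = node.text
    for child in node:
        converted = convert(child)
        converted.tail = child.tail
        out.append(converted)
    return out


def fragment_to_fo(xml_text):
    """Turn one field (e.g. a Description element) into a serialized fo:block."""
    wrapper = convert(ET.fromstring(xml_text))
    wrapper.tag = "{%s}block" % FO_NS  # the field element itself becomes a block
    return ET.tostring(wrapper, encoding="unicode")


print(fragment_to_fo(
    "<Description>Plain <b>bold</b> and <i>italic</i> text</Description>"
))
```

The resulting fo:block still has to be placed inside an fo:flow of a complete
FO document before FOP can render it.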

I opted not to use XSLT because of the ton of exceptions that the MS Office
trash inserted into the HTML. My cleanup system ran to more than 500
lines of code, full of regexps, replaces and the like.

My main problems were:

a) tags interleaved in the wrong order, like <b><i></b></i>
(tidy)
b) tags without a matching start/end, like <br> and <li> (replaces)
c) incomprehensible stuff: all the CSS and meta tags MS Office likes so
much (regexps and replaces)
d) strange characters: Unicode entities that are not declared, and characters
that break XML, such as < and & (regexps and replaces)
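
Items b) and d) can be sketched with a few replaces. This is a minimal Python
illustration under stated assumptions, not the cleanup code from that project
(which was PHP): the function name html_fragment_to_xml(), the list of void
tags and the sample fragment are made up for the example. It does not repair
misnested tags (item a), which is the job tidy did, and it does not touch a
stray < in text.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

XML_ENTITIES = {"amp", "lt", "gt", "quot", "apos"}
VOID_TAGS = "br|hr|img"  # assumption: the only unclosed tags the editor emits


def html_fragment_to_xml(fragment):
    """Apply a few regexp replaces so an HTML fragment parses as XML."""
    # 1. Escape bare ampersands that do not start an entity or character reference.
    fragment = re.sub(
        r"&(?!#\d+;|#x[0-9A-Fa-f]+;|[A-Za-z][A-Za-z0-9]*;)", "&amp;", fragment
    )

    # 2. Replace HTML-only named entities with numeric character references.
    #    Names that Python's HTML table does not know are left untouched.
    def numeric(match):
        name = match.group(1)
        if name in XML_ENTITIES or name not in name2codepoint:
            return match.group(0)
        return "&#%d;" % name2codepoint[name]

    fragment = re.sub(r"&([A-Za-z][A-Za-z0-9]*);", numeric, fragment)

    # 3. Self-close void tags: <br> and <br /> both become <br/>.
    fragment = re.sub(
        r"<(%s)\b([^<>]*?)\s*/?>" % VOID_TAGS, r"<\1\2/>", fragment,
        flags=re.IGNORECASE,
    )
    return fragment


cleaned = html_fragment_to_xml("<p>Tom & Jerry&nbsp;&eacute;<br>next &lt; line</p>")
ET.fromstring(cleaned)  # raises ParseError if the result is still not well-formed
print(cleaned)
```

Regexps like these are fragile on arbitrary HTML, which is why restricting
what the editor may produce and running a real tidy pass first is the safer
order.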

To conclude: yes, it was a work-intensive task, but in the end I had
clean HTML, with the formatting I needed to keep and with what I didn't
need removed. And when you have 5k pages of text to review, you don't
go and tell the client "correct the text and try again"... You do
something to clean up the problems (the great majority of them, at least).

In the example you provided the following will break xml:

          <Title>Na zolotom kryl&apos;ce sideli</Title>

Most of the regexps and replaces I used were there to replace things like
that with XML-compliant equivalents (and/or to define them as entities).

So, good luck with the project... and I hope I was of some help.
LF

P.S. It was all done with php/tidy/javabridge/tomcat/fop, by the way. If you
want a piece of advice: tweak the editor to allow only the markup you need and
use tidy for the rest; you can probably sort out all the problems
that way.



