Posted to user@commons.apache.org by Bill Keese <bi...@tech.beacon-it.co.jp> on 2004/05/21 02:34:17 UTC

[digester] reading embedded HTML (or other mixed text)

Is there any way to tell digester to read in the entire content of an
element (including text and sub-elements) as a single String? For
example, if I persist e-mail to XML, I'd like to use digester to read
the e-mail address list, etc., but the HTML content of the mail should
be read verbatim.

<email>
<to>bill</to>
<subject>test</subject>
<content> Hello world! <i>This text is italic</i> and <b>this text is
bold</b> This is plain text.</content>
</email>

====>

import java.util.List;

class Email {
    List to, cc, bcc;
    String subject;
    String content; // HTML content
}



---------------------------------------------------------------------
To unsubscribe, e-mail: commons-user-unsubscribe@jakarta.apache.org
For additional commands, e-mail: commons-user-help@jakarta.apache.org


Re: [digester] reading embedded HTML (or other mixed text)

Posted by Bill Keese <bi...@tech.beacon-it.co.jp>.
>HTML is not valid XML...you could wrap your HTML in CDATA
>tags in the input document...Alternatively, you could use
>XHTML, which most browsers support. In this
>case, you could then use NodeCreateRule.
>
Yup, I should have said "XHTML". The point was that the content is
free-form (arbitrary levels of nesting of tags, mixed content, etc.), so
it isn't suitable for parsing by normal pattern-matching Digester rules.
But CDATA or NodeCreateRule seem to do the trick.

Thanks!



Re: [digester] reading embedded HTML (or other mixed text)

Posted by Simon Kitching <si...@ecnetwork.co.nz>.
On Fri, 2004-05-21 at 12:34, Bill Keese wrote:
> Is there any way to tell digester to read in the entire content of an
> element (including text and sub-elements) as a single String? For
> example, if I persist e-mail to XML, I'd like to use digester to read
> the e-mail address list, etc., but the HTML content of the mail should
> be read verbatim.
> 

Hi Bill,

HTML is not, in general, well-formed XML. Digester uses a standard XML
parser to parse the input, so it cannot process an input document that
is not well-formed XML.

As Jose has said in a separate reply, you could wrap your HTML in a CDATA
section in the input document. The XML parser will then see the contents
of that CDATA section as just a text string, and so will Digester.
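
To illustrate the point about CDATA, here is a minimal sketch that uses only the JDK's SAX parser (not Digester itself) to show that markup inside a CDATA section arrives as ordinary character data rather than as element events. The class name CdataDemo and the method readContent are made up for this example; a Digester rule that collects body text should see the same characters, since Digester sits on top of a SAX parser.

```java
import java.io.StringReader;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical demo class: collects the character data of <content>.
public class CdataDemo {

    public static String readContent(String xml) throws Exception {
        final StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            private boolean inContent = false;

            @Override
            public void startElement(String uri, String localName,
                                     String qName, Attributes atts) {
                // The default factory is not namespace-aware, so qName
                // carries the plain element name.
                if ("content".equals(qName)) {
                    inContent = true;
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("content".equals(qName)) {
                    inContent = false;
                }
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // CDATA contents are delivered here, tags and all.
                if (inContent) {
                    text.append(ch, start, length);
                }
            }
        };
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<email><to>bill</to><subject>test</subject>"
                + "<content><![CDATA[Hello world! <i>This text is italic</i>]]>"
                + "</content></email>";
        System.out.println(readContent(xml));
    }
}
```

The <i> tag inside the CDATA section never triggers startElement; it is simply part of the string handed to characters().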

Alternatively, you could use XHTML, which most browsers support. In that
case, you could use NodeCreateRule, which builds a DOM node from the
matched element and its children.
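
Since NodeCreateRule hands back a DOM node rather than a String, a further step is needed if the bean property is a String. The sketch below covers only that step, using the JDK's DOM and transformer APIs; it does not call Digester, and the class name InnerMarkup and the method innerXml are hypothetical. It assumes you already hold the Element for <content>, however it was obtained.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper: turns the children of a DOM element back into markup.
public class InnerMarkup {

    public static String innerXml(Element element) throws Exception {
        // An identity transformer copies the DOM to the output unchanged.
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        // Serialize each child in turn so the wrapping element is left out.
        for (Node child = element.getFirstChild(); child != null;
                child = child.getNextSibling()) {
            t.transform(new DOMSource(child), new StreamResult(out));
        }
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<email><content>Hello world! <i>This text is italic</i>"
                + " and <b>this text is bold</b></content></email>";
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
        Element content = (Element) doc.getElementsByTagName("content").item(0);
        System.out.println(innerXml(content));
    }
}
```

Note that the serializer may normalize the markup (for example, writing empty elements as <br/> and escaping & in text), so the result is equivalent XHTML rather than a guaranteed byte-for-byte copy of the input.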

Regards,

Simon 




Re: [digester] reading embedded HTML (or other mixed text)

Posted by José Antonio Pérez Testa <ja...@indra.es>.
Try surrounding the content with <![CDATA[ ... ]]>

<email>
<to>bill</to>
<subject>test</subject>
<content><![CDATA[ Hello world! <i>This text is italic</i> and <b>this text is
bold</b> This is plain text.]]></content>
</email>
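
One thing to watch when writing the file: a CDATA section ends at the first "]]>", so mail content that happens to contain that sequence would break the document. A common workaround is to split the sequence across two adjacent CDATA sections. The helper below is a minimal sketch of that idea; the class name CdataWriter and the method wrapInCdata are made up for illustration and are not part of Digester.

```java
// Hypothetical helper for the side that persists the e-mail to XML.
public class CdataWriter {

    /** Wraps text in CDATA, splitting any "]]>" so it cannot close the section early. */
    public static String wrapInCdata(String text) {
        // "]]>" becomes "]]" + end of section + start of new section + ">".
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    public static void main(String[] args) {
        String html = "Hello world! <i>This text is italic</i>";
        System.out.println("<content>" + wrapInCdata(html) + "</content>");
    }
}
```

A parser reports adjacent CDATA sections as consecutive character data, so the reader side should still get the original text back in one piece.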

Bill Keese wrote:

>Is there any way to tell digester to read in the entire content of an
>element (including text and sub-elements) as a single String? For
>example, if I persist e-mail to XML, I'd like to use digester to read
>the e-mail address list, etc., but the HTML content of the mail should
>be read verbatim.
>
><email>
><to>bill</to>
><subject>test</subject>
><content> Hello world! <i>This text is italic</i> and <b>this text is
>bold</b> This is plain text.</content>
></email>
>
>====>
>
>Class email {
>List to, cc, bcc;
>String subject;
>String content; // HTML content
>};
