Posted to users@cocoon.apache.org by Derek Hohls <DH...@csir.co.za> on 2008/01/09 08:52:17 UTC

REPOST: HTML Tags from a database field

(POSTED ON: 11 Dec 2007 - but no reply... possibly due to Christmas?!)
 
I am using the SQLTransformer from Cocoon 2.1.8 to extract data
from text fields in a database.  At present, <tags> in the field content
are returned by the transformer (i.e. before any downstream alteration)
in escaped form, as &lt;tags&gt;.

I know XSP has an option to extract data in XML format; see:
http://marc.info/?l=xml-cocoon-users&m=99821370522888&w=2
but how do I do this with the SQLTransformer?
 
Thanks
Derek


-- 
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard. 
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.

This message has been scanned for viruses and dangerous content by MailScanner, 
and is believed to be clean.  MailScanner thanks Transtec Computers for their support.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org


Re: REPOST: HTML Tags from a database field

Posted by Tobia <to...@linux.it>.
Derek Hohls wrote:
> Tobia wrote:
>> I use the HTMLTransformer for that purpose, with a couple of XSLT  
>> stylesheets before and after it.  The first one writes <unescape>  
>> tags around each piece of escaped  HTML.  Then the HTMLTransformer  
>> is instructed to only transform text nodes inside those tags.   
>> Finally a second piece of XSLT gets rid of the <unescape> tags  
>> along with the fake <body> and <html> inserted by the transformer,  
>> and does a bit of magic with &amp; and &lt; characters
>>
>
> Whew!  Sounds like a lot of work for what I thought would be a  
> simple config issue.   Any chance you could make the stylesheets  
> available - maybe upload them to the wiki?

It's not much work really.

First identify the text nodes that need unescaping and wrap them in  
<unescape> tags.
For me that was all text nodes under a field tag, containing either an  
"&" or a "<":

<xsl:template match="sql:row/*[not(self::sql:rowset)]">
   <xsl:choose>
     <xsl:when test="contains(., '&amp;') or contains(., '&lt;')">
       <unescape>
         <xsl:copy-of select="node()"/>
       </unescape>
     </xsl:when>
     <xsl:otherwise>
       <xsl:copy-of select="node()"/>
     </xsl:otherwise>
   </xsl:choose>
</xsl:template>

<xsl:template match="@*|node()" priority="-1">
   <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>


Then put it in the pipeline followed by the HTML transformer:

<map:transform src="unescape-pre.xsl"/>
<map:transform type="html">
   <map:parameter name="tags" value="unescape"/>
</map:transform>
<map:transform src="unescape-post.xsl"/>


The latter XSLT cleans up unnecessary elements:

<xsl:template match="unescape">
   <xsl:apply-templates select="html/body/node()"/>
</xsl:template>

<xsl:template match="@*|node()" priority="-1">
   <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
</xsl:template>


In the last step I also needed to replace "&" with "&amp;" and "<"  
with "&lt;" (Java string notation) in all text nodes under <unescape>,  
by calling a utility Java function from within XSLT. But that's not an  
elegant solution and is probably related to my particular pipeline  
setup, so I won't bother you with the details unless you have the same  
problem.
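
As a rough illustration of the re-escaping step described above, here is a
minimal Java sketch. It is not the utility function actually used in that
pipeline, which was not posted; the class and method names (EscapeSketch,
escapeText) are made up for the example, and how such a method is bound as
an XSLT extension function depends on the XSLT processor and is not shown.

```java
// Hypothetical helper, NOT the function from the original pipeline:
// re-escapes "&" and "<" in a text value so that it can be written out
// as XML character data.
final class EscapeSketch {
    private EscapeSketch() {}

    static String escapeText(String text) {
        StringBuilder out = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&':
                    out.append("&amp;");
                    break;
                case '<':
                    out.append("&lt;");
                    break;
                default:
                    out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(escapeText("Fish & chips < 5"));
    }
}
```

With this sketch, escapeText("Fish & chips < 5") returns
"Fish &amp; chips &lt; 5".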


Tobia



Re: REPOST: HTML Tags from a database field

Posted by Derek Hohls <DH...@csir.co.za>.
I hope that really was a joke!!

>>> On 2008/01/31 at 07:17, in message <47...@gmx.de>, Joerg Heinicke <jo...@gmx.de> wrote:
On 12.01.2008 19:06, Grzegorz Kossakowski wrote:

> I just looked at the code of SQLTransformer and found this, which may be rather interesting to you:
>         protected void serializeData(String value)
>         throws SQLException, SAXException {
>             if (value != null) {
>                 value = value.trim();
>                 // Could this be XML ?
>                 if (value.length() > 0 && value.charAt(0) == '<') {
>                     try {
>                         stream(value);
>                     } catch (Exception ignored) {
>                         // FIXME: bad coding "catch(Exception)"
>                         // If an exception occured the data was not (valid) xml
>                         data(value);
>                     }
>                 } else {
>                     data(value);
>                 }
>             }
>         }

The SQLTransformer has always been suspicious to me! ;)

Joerg



Re: REPOST: HTML Tags from a database field

Posted by Joerg Heinicke <jo...@gmx.de>.
On 12.01.2008 19:06, Grzegorz Kossakowski wrote:

> I just looked at the code of SQLTransformer and found this, which may be rather interesting to you:
>         protected void serializeData(String value)
>         throws SQLException, SAXException {
>             if (value != null) {
>                 value = value.trim();
>                 // Could this be XML ?
>                 if (value.length() > 0 && value.charAt(0) == '<') {
>                     try {
>                         stream(value);
>                     } catch (Exception ignored) {
>                         // FIXME: bad coding "catch(Exception)"
>                         // If an exception occured the data was not (valid) xml
>                         data(value);
>                     }
>                 } else {
>                     data(value);
>                 }
>             }
>         }

The SQLTransformer has always been suspicious to me! ;)

Joerg



Re: REPOST: HTML Tags from a database field

Posted by Derek Hohls <DH...@csir.co.za>.
Grzegorz

Sadly my Java skills are not up to this task.  I guess I will have to go 
for the quick-and-nasty approach for now (which I would have to do
anyway, as upgrading Cocoon to the ultra-latest version is not an option
on the current project).

Thanks
Derek

>>> On 2008/01/16 at 10:51, in message <47...@apache.org>, Grzegorz Kossakowski <gk...@apache.org> wrote:
Derek Hohls wrote:
> Grzegorz
>  
> That could be the issue - the database field may *contain* XML tags
> but does not necessarily *start* with one.  Maybe I could wrap the
> whole field in a "<div>" tag before posting it, although that seems a
> little clumsy - I agree it would be easier to specifically request a field
> as XML, with the option to add a wrapper, at request time.
>  
> Any chance this approach can be "patched" as part of the 2.1.x series?

Of course it's possible to patch the 2.1.x series, but the question is: who will prepare the patch? :)
If you make one, I could commit it to 2.1.x if it's good enough.

-- 
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/ 



Re: REPOST: HTML Tags from a database field

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Derek Hohls wrote:
> Grzegorz
>  
> That could be the issue - the database field may *contain* XML tags
> but does not necessarily *start* with one.  Maybe I could wrap the
> whole field in a "<div>" tag before posting it, although that seems a
> little clumsy - I agree it would be easier to specifically request a field
> as XML, with the option to add a wrapper, at request time.
>  
> Any chance this approach can be "patched" as part of the 2.1.x series?

Of course it's possible to patch the 2.1.x series, but the question is: who will prepare the patch? :)
If you make one, I could commit it to 2.1.x if it's good enough.

-- 
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/



Re: REPOST: HTML Tags from a database field

Posted by Derek Hohls <DH...@csir.co.za>.
Grzegorz
 
That could be the issue - the database field may *contain* XML tags
but does not necessarily *start* with one.  Maybe I could wrap the
whole field in a "<div>" tag before posting it, although that seems a 
little clumsy - I agree it would be easier to specifically request a field
as XML, with the option to add a wrapper, at request time.
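
A minimal Java sketch of the wrapping idea above, assuming the field content
is otherwise well-formed XML. It uses the JDK's DOM parser rather than
anything from Cocoon, and the names WrapFieldSketch, wrapInDiv and
parseWrapped are hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Element;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; none of this is Cocoon API.
final class WrapFieldSketch {
    private WrapFieldSketch() {}

    /** Gives mixed text-and-tags content a single root element. */
    static String wrapInDiv(String field) {
        return "<div>" + field + "</div>";
    }

    /** Parses the wrapped field and returns the <div> root element. */
    static Element parseWrapped(String field) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            return builder
                    .parse(new InputSource(new StringReader(wrapInDiv(field))))
                    .getDocumentElement();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException(
                    "Field is not well-formed XML even when wrapped", e);
        }
    }

    public static void main(String[] args) {
        Element div = parseWrapped("Some text with <b>bold</b> and <i>italic</i> words");
        System.out.println(div.getTagName());
        System.out.println(div.getElementsByTagName("b").getLength());
    }
}
```

Wrapping only supplies the missing root element. It does not help when the
field holds HTML that is not well-formed XML, such as an unclosed <br> or a
bare "&"; in this sketch parseWrapped throws IllegalArgumentException for
such input.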
 
Any chance this approach can be "patched" as part of the 2.1.x series?
 
Derek

>>> Grzegorz Kossakowski <gk...@apache.org> 2008/01/13 02:06 AM >>>
Derek Hohls wrote:
> Tobia
>  
> Whew!  Sounds like a lot of work for what I thought would be a
> simple config issue.   Any chance you could make the stylesheets
> available - maybe upload them to the wiki?

I just looked at the code of SQLTransformer and found this, which may be rather interesting to you:
        protected void serializeData(String value)
        throws SQLException, SAXException {
            if (value != null) {
                value = value.trim();
                // Could this be XML ?
                if (value.length() > 0 && value.charAt(0) == '<') {
                    try {
                        stream(value);
                    } catch (Exception ignored) {
                        // FIXME: bad coding "catch(Exception)"
                        // If an exception occured the data was not (valid) xml
                        data(value);
                    }
                } else {
                    data(value);
                }
            }
        }

As you can see, SQLTransformer checks for the possibility that the value it got is XML. The check is
quite silly but should work. The question is: does the XML stored in your database start with "<" as
its first character, or with something else?

BTW, I'm quite surprised that there seems to be no way to explicitly tell the SQLTransformer to
handle some columns as serialized XML content that needs parsing...

-- 
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/ 



Re: REPOST: HTML Tags from a database field

Posted by Grzegorz Kossakowski <gk...@apache.org>.
Derek Hohls wrote:
> Tobia
>  
> Whew!  Sounds like a lot of work for what I thought would be a
> simple config issue.   Any chance you could make the stylesheets
> available - maybe upload them to the wiki?

I just looked at the code of SQLTransformer and found this, which may be rather interesting to you:
        protected void serializeData(String value)
        throws SQLException, SAXException {
            if (value != null) {
                value = value.trim();
                // Could this be XML ?
                if (value.length() > 0 && value.charAt(0) == '<') {
                    try {
                        stream(value);
                    } catch (Exception ignored) {
                        // FIXME: bad coding "catch(Exception)"
                        // If an exception occured the data was not (valid) xml
                        data(value);
                    }
                } else {
                    data(value);
                }
            }
        }

As you can see, SQLTransformer checks for the possibility that the value it got is XML. The check is
quite silly but should work. The question is: does the XML stored in your database start with "<" as
its first character, or with something else?

BTW, I'm quite surprised that there seems to be no way to explicitly tell the SQLTransformer to
handle some columns as serialized XML content that needs parsing...
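
To make the consequence of that check concrete, the following standalone Java
sketch mimics its decision logic. It is an approximation only: it uses the
JDK's DocumentBuilder as a stand-in for the transformer's own stream() call,
and the names XmlSniffSketch, wouldStreamAsXml and isWellFormed are
hypothetical.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical stand-in for the decision made in serializeData(); not Cocoon code.
final class XmlSniffSketch {
    private XmlSniffSketch() {}

    /**
     * Returns true if the value would take the "stream as XML" branch,
     * false if it would fall back to being emitted as plain text.
     */
    static boolean wouldStreamAsXml(String value) {
        if (value == null) {
            return false; // the quoted method emits nothing at all for null
        }
        String trimmed = value.trim();
        // Same first-character test as in the quoted method.
        if (trimmed.isEmpty() || trimmed.charAt(0) != '<') {
            return false;
        }
        return isWellFormed(trimmed);
    }

    /** Assumption: a generic XML parse stands in for stream(value). */
    static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors without printing to stderr.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(wouldStreamAsXml("<p>Hello</p>"));
        System.out.println(wouldStreamAsXml("Hello <b>world</b>"));
    }
}
```

Under these assumptions "<p>Hello</p>" is treated as XML, whereas
"Hello <b>world</b>" is not, because its first character is not "<". Such a
value goes down the data(value) branch as plain text, which would be
consistent with the escaped &lt;tags&gt; reported at the start of the thread.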

-- 
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/



Re: REPOST: HTML Tags from a database field

Posted by Derek Hohls <DH...@csir.co.za>.
Tobia
 
Whew!  Sounds like a lot of work for what I thought would be a
simple config issue.   Any chance you could make the stylesheets
available - maybe upload them to the wiki?
 
Thanks
Derek

>>> Tobia <to...@linux.it> 2008/01/09 05:17 PM >>>
Derek Hohls wrote:
> I am using the SQLTransformer from Cocoon 2.1.8 to extract data from  
> text fields in a database.  At present, <tags> are being returned by  
> the transformer (i.e. before any downstream alteration) as  
> &lt;tags&gt;


I use the HTMLTransformer for that purpose, with a couple of XSLT  
stylesheets before and after it.

The first one writes <unescape> tags around each piece of escaped  
HTML.  Then the HTMLTransformer is instructed to only transform text  
nodes inside those tags (by passing it a <parameter name="tags"  
value="unescape"/>.)  Finally a second piece of XSLT gets rid of the  
<unescape> tags along with the fake <body> and <html> inserted by the  
transformer, and does a bit of magic with &amp; and &lt; characters.


Tobia






Re: REPOST: HTML Tags from a database field

Posted by Tobia <to...@linux.it>.
Derek Hohls wrote:
> I am using the SQLTransformer from Cocoon 2.1.8 to extract data from  
> text fields in a database.  At present, <tags> are being returned by  
> the transformer (i.e. before any downstream alteration) as  
> &lt;tags&gt;


I use the HTMLTransformer for that purpose, with a couple of XSLT  
stylesheets before and after it.

The first one writes <unescape> tags around each piece of escaped  
HTML.  Then the HTMLTransformer is instructed to only transform text  
nodes inside those tags (by passing it a <parameter name="tags"  
value="unescape"/>.)  Finally a second piece of XSLT gets rid of the  
<unescape> tags along with the fake <body> and <html> inserted by the  
transformer, and does a bit of magic with &amp; and &lt; characters.


Tobia
