Posted to users@cocoon.apache.org by Derek Hohls <DH...@csir.co.za> on 2008/01/09 08:52:17 UTC
REPOST: HTML Tags from a database field
(POSTED ON: 11 Dec 2007 - but no reply... possibly due to Christmas?!)
I am using the SQLTransformer from Cocoon 2.1.8 to extract data
from text fields in a database. At present, <tags> are being returned
by the transformer (i.e. before any downstream alteration) as &lt;tags&gt;.
I know XSP has an option to extract data in XML format; see:
http://marc.info/?l=xml-cocoon-users&m=99821370522888&w=2
but how do I do this with the SQLTransformer?
Thanks
Derek
--
This message is subject to the CSIR's copyright terms and conditions, e-mail legal notice, and implemented Open Document Format (ODF) standard.
The full disclaimer details can be found at http://www.csir.co.za/disclaimer.html.
This message has been scanned for viruses and dangerous content by MailScanner,
and is believed to be clean. MailScanner thanks Transtec Computers for their support.
---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscribe@cocoon.apache.org
For additional commands, e-mail: users-help@cocoon.apache.org
Re: REPOST: HTML Tags from a database field
Posted by Tobia <to...@linux.it>.
Derek Hohls wrote:
> Tobia wrote:
>> I use the HTMLTransformer for that purpose, with a couple of XSLT
>> stylesheets before and after it. The first one writes <unescape>
>> tags around each piece of escaped HTML. Then the HTMLTransformer
>> is instructed to only transform text nodes inside those tags.
>> Finally a second piece of XSLT gets rid of the <unescape> tags
>> along with the fake <body> and <html> inserted by the transformer,
>> and does a bit of magic with & and < characters
>>
>
> Whew! Sounds like a lot of work for what I thought would be a
> simple config issue. Any chance you could make the stylesheets
> available - maybe upload them to the wiki?
It's not much work really.
First identify the text nodes that need unescaping and wrap them in
<unescape> tags.
For me that was all text nodes under a field tag, containing either an
"&" or a "<":
    <xsl:template match="sql:row/*[not(self::sql:rowset)]">
      <xsl:choose>
        <xsl:when test="contains(., '&amp;') or contains(., '&lt;')">
          <unescape>
            <xsl:copy-of select="node()"/>
          </unescape>
        </xsl:when>
        <xsl:otherwise>
          <xsl:copy-of select="node()"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

    <xsl:template match="@*|node()" priority="-1">
      <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>
Then put it in the pipeline followed by the HTML transformer:
    <map:transform src="unescape-pre.xsl"/>
    <map:transform type="html">
      <map:parameter name="tags" value="unescape"/>
    </map:transform>
    <map:transform src="unescape-post.xsl"/>
The latter XSLT cleans up unnecessary elements:
    <xsl:template match="unescape">
      <xsl:apply-templates select="html/body/node()"/>
    </xsl:template>

    <xsl:template match="@*|node()" priority="-1">
      <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>
    </xsl:template>
In the last step I also needed to replace "&amp;" with "&" and "&lt;"
with "<" (Java string notation) in all text nodes under <unescape>,
by calling a utility Java function from within XSLT. But that's not an
elegant solution and is probably related to my particular pipeline
setup, so I won't bother you with the details unless you have the same
problem.
Tobia
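
The utility function mentioned in the message above is not shown in the thread. What follows is only a minimal sketch of what such a helper might look like, assuming the replacement meant is "&amp;" to "&" and "&lt;" to "<". The class name EntityUnescape and the method name unescape are hypothetical, and how the function is called from XSLT depends on the XSLT processor in use.

```java
// Hypothetical helper; the class and method names are illustrative assumptions,
// not code from the thread or from Cocoon.
final class EntityUnescape {
    private EntityUnescape() {}

    /** Replaces the escaped forms "&lt;" and "&amp;" with the literal characters. */
    static String unescape(String text) {
        if (text == null) {
            return null;
        }
        // Handle "&lt;" first and "&amp;" last, so that a doubly escaped
        // "&amp;lt;" is unescaped exactly once (to "&lt;") rather than twice.
        return text.replace("&lt;", "<").replace("&amp;", "&");
    }
}
```

Doing the "&amp;" replacement last is a deliberate choice: with the opposite order, a doubly escaped "&amp;lt;" would collapse all the way to "<".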
Re: REPOST: HTML Tags from a database field
Posted by Derek Hohls <DH...@csir.co.za>.
I hope that really was a joke!!
>>> On 2008/01/31 at 07:17, in message <47...@gmx.de>, Joerg Heinicke <jo...@gmx.de> wrote:
Re: REPOST: HTML Tags from a database field
Posted by Joerg Heinicke <jo...@gmx.de>.
On 12.01.2008 19:06, Grzegorz Kossakowski wrote:
> I just looked at the code of SQLTransformer and found this, which is rather interesting to you:
>     protected void serializeData(String value)
>     throws SQLException, SAXException {
>         if (value != null) {
>             value = value.trim();
>             // Could this be XML ?
>             if (value.length() > 0 && value.charAt(0) == '<') {
>                 try {
>                     stream(value);
>                 } catch (Exception ignored) {
>                     // FIXME: bad coding "catch(Exception)"
>                     // If an exception occured the data was not (valid) xml
>                     data(value);
>                 }
>             } else {
>                 data(value);
>             }
>         }
>     }
The SQLTransformer has always been suspicious to me! ;)
Joerg
Re: REPOST: HTML Tags from a database field
Posted by Derek Hohls <DH...@csir.co.za>.
Grzegorz
Sadly my Java skills are not up to this task. I guess I will have to go
for the quick-and-nasty approach for now (which I would have to do
anyway, as upgrading Cocoon to the ultra-latest version is not an option
on the current project).
Thanks
Derek
>>> On 2008/01/16 at 10:51, in message <47...@apache.org>, Grzegorz Kossakowski <gk...@apache.org> wrote:
Re: REPOST: HTML Tags from a database field
Posted by Grzegorz Kossakowski <gk...@apache.org>.
Derek Hohls wrote:
> Grzegorz
>
> That could be the issue - the database field may *contain* XML tags
> but does not necessarily *start* with one. Maybe I could wrap the
> whole field in a "<div>" tag before posting it, although that seems a
> little clumsy - I agree it would be easier to specifically request a field
> as XML, with the option to add a wrapper, at request time.
>
> Any chance this approach can be "patched" as part of the 2.1.x series?
Of course it's possible to patch the 2.1.x series, but the question is: who will prepare the patch? :)
If you make one I could commit it to 2.1.x if it's good enough.
--
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/
Re: REPOST: HTML Tags from a database field
Posted by Derek Hohls <DH...@csir.co.za>.
Grzegorz
That could be the issue - the database field may *contain* XML tags
but does not necessarily *start* with one. Maybe I could wrap the
whole field in a "<div>" tag before posting it, although that seems a
little clumsy - I agree it would be easier to specifically request a field
as XML, with the option to add a wrapper, at request time.
Any chance this approach can be "patched" as part of the 2.1.x series?
Derek
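
The wrapper idea in the message above can be tried outside Cocoon. This is a minimal sketch, assuming the stored text is otherwise well-formed markup; the class FieldWrapper and its methods are hypothetical names, and the check uses the JDK's standard XML parser rather than anything from the SQLTransformer itself.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper; the names are illustrative assumptions, not Cocoon API.
final class FieldWrapper {
    private FieldWrapper() {}

    /** Wraps a fragment in one root element, so the result starts with '<'. */
    static String wrapInDiv(String fragment) {
        return "<div>" + fragment + "</div>";
    }

    /** Returns true if the text parses as a single well-formed XML document. */
    static boolean isWellFormed(String text) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them.
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(text)));
            return true;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            return false;
        }
    }
}
```

For example, isWellFormed("Some <b>bold</b> text") is false, because the text does not begin with an element, while isWellFormed(FieldWrapper.wrapInDiv("Some <b>bold</b> text")) is true. Wrapping does not help with markup that is not well-formed XML in the first place, such as an unclosed <br>.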
>>> Grzegorz Kossakowski <gk...@apache.org> 2008/01/13 02:06 AM >>>
Re: REPOST: HTML Tags from a database field
Posted by Grzegorz Kossakowski <gk...@apache.org>.
Derek Hohls wrote:
> Tobia
>
> Whew! Sounds like a lot of work for what I thought would be a
> simple config issue. Any chance you could make the stylesheets
> available - maybe upload them to the wiki?
I just looked at the SQLTransformer code and found this, which should be rather interesting to you:
    protected void serializeData(String value)
    throws SQLException, SAXException {
        if (value != null) {
            value = value.trim();
            // Could this be XML ?
            if (value.length() > 0 && value.charAt(0) == '<') {
                try {
                    stream(value);
                } catch (Exception ignored) {
                    // FIXME: bad coding "catch(Exception)"
                    // If an exception occured the data was not (valid) xml
                    data(value);
                }
            } else {
                data(value);
            }
        }
    }
As you can see, the SQLTransformer checks for the possibility that the value it got is XML. The check
is quite silly but should work. The question is: does the XML stored in the database start with "<" as
its first character, or with something else?
BTW, I'm quite surprised that there seems to be no way to explicitly tell the SQLTransformer to
handle some columns as serialized XML content that needs parsing...
--
Grzegorz Kossakowski
Committer and PMC Member of Apache Cocoon
http://reflectingonthevicissitudes.wordpress.com/
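
The first-character test in the method quoted above can be isolated to see which column values would even reach the parsing branch. This is only a sketch of that one condition; the class FirstCharCheck and the method wouldTryToParse are hypothetical names, and the real stream() and data() calls are not reproduced.

```java
// Hypothetical illustration of the condition in the quoted serializeData();
// the names here are assumptions and not part of Cocoon.
final class FirstCharCheck {
    private FirstCharCheck() {}

    /** Mirrors the quoted test: trim the value, then look only at its first character. */
    static boolean wouldTryToParse(String value) {
        if (value == null) {
            return false;
        }
        String trimmed = value.trim();
        return trimmed.length() > 0 && trimmed.charAt(0) == '<';
    }
}
```

Under this condition "<p>Hello</p>" takes the parsing branch, while "Hello <b>world</b>" does not, even though it contains tags.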
Re: REPOST: HTML Tags from a database field
Posted by Derek Hohls <DH...@csir.co.za>.
Tobia
Whew! Sounds like a lot of work for what I thought would be a
simple config issue. Any chance you could make the stylesheets
available - maybe upload them to the wiki?
Thanks
Derek
>>> Tobia <to...@linux.it> 2008/01/09 05:17 PM >>>
Re: REPOST: HTML Tags from a database field
Posted by Tobia <to...@linux.it>.
Derek Hohls wrote:
> I am using the SQLTransformer from Cocoon 2.1.8 to extract data from
> text fields in a database. At present, <tags> are being returned by
> the transformer (i.e. before any downstream alteration) as
> &lt;tags&gt;
I use the HTMLTransformer for that purpose, with a couple of XSLT
stylesheets before and after it.
The first one writes <unescape> tags around each piece of escaped
HTML. Then the HTMLTransformer is instructed to only transform text
nodes inside those tags (by passing it a <parameter name="tags"
value="unescape"/>.) Finally a second piece of XSLT gets rid of the
<unescape> tags along with the fake <body> and <html> inserted by the
transformer, and does a bit of magic with & and < characters.
Tobia