Posted to fop-users@xmlgraphics.apache.org by maxmus <ar...@web.de> on 2008/06/10 15:59:40 UTC

counting items with nonempty content (except html)

Hello,
I have the following situation:

I want to count all items in an XML file with the following properties:
attribute1 has a certain value and attribute2 is not empty.

<xsl:variable name="num" select="count(//*[@attribute1 = 'value1' and
normalize-space(@attribute2)!=''])"/>
...
<xsl:if test="$num &gt; 0"><xsl:value-of select="@attribute2"/></xsl:if>
..

The code works fine, but there is one exception:
if attribute2 contains only HTML tags (e.g.
attribute2="&lt;b&gt;&lt;/b&gt;"),
then normalize-space(@attribute2) != '' is true,
but <xsl:value-of select="@attribute2"/> of course gives "".

For several reasons, I have to use this structure and I can't change the XML
file.
So is there any possibility to count all tags with the property
"attribute1 has a certain value and attribute2 does not RETURN empty (if
displayed)."?

best regards,
maxmus



-- 
View this message in context: http://www.nabble.com/counting-items-with-nonempty-content-%28except-html%29-tp17756165p17756165.html
Sent from the FOP - Users mailing list archive at Nabble.com.


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org


Re: counting items with nonempty content (except html)

Posted by Abel Braaksma <ab...@xs4all.nl>.
Hi Maxmus,

answer below:


maxmus wrote:
> Hello,
> I have the following situation:
>
> I want to count all items in an XML file with the following properties:
> attribute1 has a certain value and attribute2 is not empty.
>
> <xsl:variable name="num" select="count(//*[@attribute1 = 'value1' and
> normalize-space(@attribute2)!=''])"/>
> ...
> <xsl:if test="$num &gt; 0"><xsl:value-of select="@attribute2"/></xsl:if>
> ..
>
> The code works fine, but there is one exception:
> if attribute2 contains only HTML tags (e.g.
> attribute2="&lt;b&gt;&lt;/b&gt;"),
> then normalize-space(@attribute2) != '' is true,
> but <xsl:value-of select="@attribute2"/> of course gives "".
>   

No, this is not "of course". In fact, it is the opposite. The 
@attribute2 of your source contains a character string, not HTML tags 
(note that XSLT cannot "see" tags, it only sees nodes, and the value of 
an attribute never contains nodes).

Obviously, the character string seems to contain, once unescaped, a part 
of HTML (or XML for that matter). So, the string is full, i.e., non-empty.
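To see this for yourself, here is a minimal sketch of a stylesheet that prints the length and the value of each matching attribute2. It assumes an input element such as <item attribute1="value1" attribute2="&lt;b&gt;&lt;/b&gt;"/>; the element name "item" is only an illustration, not something from your file.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Text output, so the characters are written exactly as XSLT sees them -->
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="//*[@attribute1 = 'value1']">
      <!-- string-length() counts the characters of the unescaped value -->
      <xsl:value-of select="string-length(@attribute2)"/>
      <xsl:text>: </xsl:text>
      <!-- The value is a plain string; no element nodes are involved -->
      <xsl:value-of select="@attribute2"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

For the sample element above, this should print "7: <b></b>": seven characters, none of them whitespace, which is why your normalize-space() test succeeds. If the value only looks empty further down the line, that is because whatever displays it treats the string as markup.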

> For several reasons, I have to use this structure and I can't change the XML
> file.
>   

That's a bummer, because putting HTML stringized into attributes is bad 
design. It leaves you no way to check for validity of the content and 
makes parsing hard, if not very hard or sometimes next to impossible (as 
you are finding out).

> So is there any possibility to count all tags with the property
> "attribute1 has a certain value and attribute2 does not RETURN empty (if
> displayed)."?

No. You're talking of the transformation phase here, not the formatting 
phase. There's no way the transformation phase (which comes way before 
the formatting phase) can know up front that a particular character 
string would display empty.

And what is empty? Is that a bunch of <br /><br />? Or is that 
<p>&#160;</p>? Or is that <img src="notfoundsource" />, or is that <div 
style="display:none">Some text here</div>?

Though it might seem a trivial question ("give me all elements that 
won't be visible") it is very hard to resolve, even if you would have 
the power of JavaScript and browser extensions at your fingertips, which 
you don't, because you are using XSLT.

What you might do, though I can't really recommend it, is parse the 
content of the HTML strings with an XSLT extension (for one, Saxon has 
saxon:parse(), which parses a string containing well-formed XML) and 
then do your math on the resulting node set. But then it would still be 
far from trivial.
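If no extension is available, a much cruder heuristic is possible in plain XSLT 1.0. Note that this is string stripping, not parsing: the sketch below drops everything between a '<' and the next '>' and counts the elements whose remaining text is not just whitespace. The template name "strip-tags" and the variable name "marks" are made up for this example, and it covers only the tags-only case: it treats &#160;, images and display:none as non-empty, and it breaks on a '>' inside an attribute value.

```xml
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Hypothetical helper: removes every "<...>" run from the string $s -->
  <xsl:template name="strip-tags">
    <xsl:param name="s"/>
    <xsl:choose>
      <xsl:when test="contains($s, '&lt;') and
                      contains(substring-after($s, '&lt;'), '&gt;')">
        <!-- Keep the text before the tag, then recurse on what follows it -->
        <xsl:value-of select="substring-before($s, '&lt;')"/>
        <xsl:call-template name="strip-tags">
          <xsl:with-param name="s"
              select="substring-after(substring-after($s, '&lt;'), '&gt;')"/>
        </xsl:call-template>
      </xsl:when>
      <xsl:otherwise>
        <!-- No complete tag left: emit the remainder unchanged -->
        <xsl:value-of select="$s"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

  <xsl:template match="/">
    <!-- One 'x' per element whose attribute2 still has text after stripping -->
    <xsl:variable name="marks">
      <xsl:for-each select="//*[@attribute1 = 'value1']">
        <xsl:variable name="text">
          <xsl:call-template name="strip-tags">
            <xsl:with-param name="s" select="@attribute2"/>
          </xsl:call-template>
        </xsl:variable>
        <xsl:if test="normalize-space($text) != ''">x</xsl:if>
      </xsl:for-each>
    </xsl:variable>

    <!-- The number of marks is the count you were after -->
    <xsl:variable name="num" select="string-length($marks)"/>
    <xsl:value-of select="$num"/>
  </xsl:template>
</xsl:stylesheet>
```

The detour through a string of marks is needed because XSLT 1.0 cannot call a named template from inside a count() predicate. An element with attribute2="&lt;b&gt;&lt;/b&gt;" is not counted here, while one with attribute2="&lt;b&gt;text&lt;/b&gt;" is.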

Bottom line is: you are trying to do something seemingly simple which is 
notoriously hard to solve and XSLT is the wrong place to do that. It 
should be changed at the source and not later on.

Btw: what's your use case that you want to strip stringized HTML that 
would 'return' empty, i.e., render invisible or spaces only? If you 
render it, it won't be visible, if you strip it, it won't be visible, so 
what's the catch?

Good luck,
-- Abel
