Posted to j-users@xalan.apache.org by Claus Kick <cl...@googlemail.com> on 2009/09/21 10:01:30 UTC

Catch a Character ...

Hello everyone,

I am trying to catch a special character with the following style sheet:

<xsl:param name="specChar" select="'\u201C'" />

    <xsl:output indent="yes" method="xml"/>
    <xsl:strip-space elements="*"/>

    <xsl:template
match="CATALOOM-OPENENGINE/PRODUCTS/PRODUCT/PRODUCTREVISION">
    <xsl:variable name="primKey2">
            <xsl:value-of select="substring-before(@primarykey, '/')"/>
    </xsl:variable>

     <xsl:for-each select="FEATURE/VALUE">
        <xsl:variable name="cdata">
            <xsl:value-of select="FEATURE/VALUE/text()"/>
        </xsl:variable>

        <xsl:if test="contains($cdata, $specChar)">
            <xsl:text>Found: </xsl:text>
            <xsl:value-of select="$primKey2" />
            <xsl:text>&#10;</xsl:text>
        </xsl:if>

    </xsl:for-each>

    </xsl:template>


I am not getting any hits, though there should be a couple of thousand.
A few questions:

Is there anything obviously wrong with this stylesheet?
The XML looks like this:

<PRODUCTREVISION>
<FEATURE><VALUE>...</VALUE></FEATURE>
</PRODUCTREVISION>

How do I escape a Unicode character inside a string in a stylesheet?

Re: Catch a Character ...

Posted by Claus Kick <cl...@googlemail.com>.
2009/9/22 Michael Ludwig <ml...@as-guides.com>

> Claus Kick wrote:
>
>> 2009/9/21 Michael Ludwig <ml...@as-guides.com>
>>
>>> Claus Kick wrote:
>>>
>>>> <xsl:param name="specChar" select="'\u201C'" />
>>>>
>>> That's the Java syntax. Doesn't work in XML. Use a numerical
>>> character reference as per the XML spec.
>>>
>>>  <xsl:param name="specChar" select="'&#x201C;'" /> in hex, or
>>>  <xsl:param name="specChar" select="'&#8220;'" /> in decimal
>>>
>>
>> OK, I completely forgot about that. That actually was the issue ...
>>
>
> Good! (BTW, this list doesn't set the Reply-To header to the list,
> which I think it should really do.)
>
>> Ok, thank you so much for your pointers, I have actually quite a few
>> transformations to work on, so this will indeed help me deepen my
>> knowledge!
>>
>
> Okay then, here are some more pointers :-) It helps to get familiarized
> with the weird XML and XSLT terminology. As for XML:
>
> * numerical character reference - as above
> * entity reference - &lt; (built-in), &myEnt; (user-defined) - same
>  syntax, but not exactly the same thing
> * entities (XML/DTD)
>  * general entity
>    * external [general] parsed entity (EGPE)
>    * external [general] unparsed entity
>  * parameter entity
> * internal subset (DTD)
> * external subset (DTD)
>
> You can read up on those in the XML recommendation (specification). The
> terminology is a bit weird. The thing to keep in mind is that the stuff
> is easier than the terminology. As for XSLT:
>
> * attribute value template (AVT)
> * result tree fragment (RTF)
> * node set
> * literal result element
> * match pattern
> * node test
>
> See this page [1] on Dave Pawson's site, which is a great resource for
> XSLT. Also, see Jeni Tennison's site [2], which has very nice tutorials.
> Also, see a recent thread on XSL-List [3] for more pointers.
>
> Finally, Xalan is a 1.0 processor. XSLT 2.0 is much more powerful than
> 1.0. Personally, I find it quite okay to get started with 1.0, which is
> a much smaller language, and therefore easier to learn. But 1.0 has its
> limits, and when reaching those, it's good to know about (a) EXSLT [4],
> (b) extension functions (for example, JavaScript in Xalan), (c) the
> possibility to upgrade to 2.0 by switching to Saxon.
>
> [1] http://www.dpawson.co.uk/xsl/xslvocab.html
> [2] http://www.jenitennison.com/xslt/
> [3] http://markmail.org/thread/myu2h7quwbh4rjdi - How did you learn XSL?
> [4] http://exslt.org/
>
>
Hello Michael,

thanks for reminding me (yet again - sigh) to include the group.

Thanks for your help. Regarding Xalan or not: we use Xalan in a huge
number of places (data storage/exchange platform), and I dread even
thinking about switching.
Right now, there is simply no way I could ensure that nothing breaks.

Re: Catch a Character ...

Posted by Michael Ludwig <ml...@as-guides.com>.
Claus Kick wrote:
> 2009/9/21 Michael Ludwig <ml...@as-guides.com>
>> Claus Kick wrote:
>>
>>> <xsl:param name="specChar" select="'\u201C'" />
>>>
>> That's the Java syntax. Doesn't work in XML. Use a numerical
>> character reference as per the XML spec.
>>
>>  <xsl:param name="specChar" select="'&#x201C;'" /> in hex, or
>>  <xsl:param name="specChar" select="'&#8220;'" /> in decimal
>
> OK, I completely forgot about that. That actually was the issue ...

Good! (BTW, this list doesn't set the Reply-To header to the list,
which I think it should really do.)

> Ok, thank you so much for your pointers, I have actually quite a few
> transformations to work on, so this will indeed help me deepen my
> knowledge!

Okay then, here are some more pointers :-) It helps to get familiarized
with the weird XML and XSLT terminology. As for XML:

* numerical character reference - as above
* entity reference - &lt; (built-in), &myEnt; (user-defined) - same
   syntax, but not exactly the same thing
* entities (XML/DTD)
   * general entity
     * external [general] parsed entity (EGPE)
     * external [general] unparsed entity
   * parameter entity
* internal subset (DTD)
* external subset (DTD)
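
To make the XML terms above a little more concrete, here is a small
sketch of a document that uses several of them. The element name "note"
and the entity names are made up for illustration, and external entities
are left out because they would need extra files.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!-- Everything between [ and ] is the internal subset of the DTD. -->
  <!ELEMENT note (#PCDATA)>

  <!-- Internal general entity, referenced in content as &myEnt; -->
  <!ENTITY myEnt "user-defined replacement text">

  <!-- Parameter entity: only usable inside the DTD, referenced with % -->
  <!ENTITY % moreDecls "<!ENTITY other 'declared via a parameter entity'>">
  %moreDecls;
]>
<!-- Content uses: a numerical character reference, a built-in entity
     reference, and two user-defined entity references. -->
<note>&#x201C; &lt; &myEnt; &other;</note>
```

The character reference names a code point directly, while the entity
references are looked up in declarations (or are predefined, like &lt;).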

You can read up on those in the XML recommendation (specification). The
terminology is a bit weird. The thing to keep in mind is that the stuff
is easier than the terminology. As for XSLT:

* attribute value template (AVT)
* result tree fragment (RTF)
* node set
* literal result element
* match pattern
* node test
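
As a rough illustration of the XSLT terms above, the sketch below labels
each of them in a tiny stylesheet. The input names (PRODUCTREVISION,
FEATURE, VALUE, @primarykey) come from this thread; the output element
"revision" and the variable "label" are hypothetical.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- "PRODUCTREVISION" is a match pattern. -->
  <xsl:template match="PRODUCTREVISION">

    <!-- A variable with content instead of select holds a
         result tree fragment (RTF). -->
    <xsl:variable name="label">Values: </xsl:variable>

    <!-- <revision> is a literal result element.
         {@primarykey} is an attribute value template (AVT). -->
    <revision key="{@primarykey}">
      <xsl:value-of select="$label"/>
      <!-- FEATURE/VALUE selects a node set; FEATURE and VALUE are
           name tests, and text() or node() would be node tests too. -->
      <xsl:value-of select="count(FEATURE/VALUE)"/>
    </revision>
  </xsl:template>

</xsl:stylesheet>
```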

See this page [1] on Dave Pawson's site, which is a great resource for
XSLT. Also, see Jeni Tennison's site [2], which has very nice tutorials.
Also, see a recent thread on XSL-List [3] for more pointers.

Finally, Xalan is a 1.0 processor. XSLT 2.0 is much more powerful than
1.0. Personally, I find it quite okay to get started with 1.0, which is
a much smaller language, and therefore easier to learn. But 1.0 has its
limits, and when reaching those, it's good to know about (a) EXSLT [4],
(b) extension functions (for example, JavaScript in Xalan), (c) the
possibility to upgrade to 2.0 by switching to Saxon.
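
For (a), a typical first use of EXSLT is exsl:node-set(), which turns a
result tree fragment back into a node set that path expressions can
walk. The sketch below assumes the processor implements the EXSLT common
module (Xalan-J documents support for it, but please check your
version); the variable "chars" and element "c" are made up.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:exsl="http://exslt.org/common"
    exclude-result-prefixes="exsl">
  <xsl:output method="text"/>

  <!-- A variable with content is a result tree fragment in XSLT 1.0. -->
  <xsl:variable name="chars">
    <c>&#x201C;</c>
    <c>&#x201D;</c>
  </xsl:variable>

  <xsl:template match="/">
    <!-- $chars/c is an error in plain XSLT 1.0; converting the RTF
         with exsl:node-set() makes the c elements selectable. -->
    <xsl:value-of select="count(exsl:node-set($chars)/c)"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```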

[1] http://www.dpawson.co.uk/xsl/xslvocab.html
[2] http://www.jenitennison.com/xslt/
[3] http://markmail.org/thread/myu2h7quwbh4rjdi - How did you learn XSL?
[4] http://exslt.org/

Cheers,

-- 
Michael Ludwig

Re: Catch a Character ...

Posted by Michael Ludwig <ml...@as-guides.com>.
Claus Kick wrote:
>
> I am trying to catch a special character with the following style sheet:
>
> <xsl:param name="specChar" select="'\u201C'" />

That's the Java syntax. Doesn't work in XML. Use a numerical character
reference as per the XML spec.

   <xsl:param name="specChar" select="'&#x201C;'" /> in hex, or
   <xsl:param name="specChar" select="'&#8220;'" /> in decimal
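
A minimal sketch of why the original parameter never matched: XPath has
no backslash escapes, so '\u201C' is just a six-character string, while
the character reference is resolved by the XML parser before the XSLT
processor sees the expression. The stylesheet runs against any input
document.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The XML parser turns the reference into the character U+201C. -->
  <xsl:param name="specChar" select="'&#x201C;'"/>

  <xsl:template match="/">
    <!-- One character: U+201C. -->
    <xsl:value-of select="string-length($specChar)"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Six characters: backslash, u, 2, 0, 1, C. -->
    <xsl:value-of select="string-length('\u201C')"/>
    <xsl:text>&#10;</xsl:text>
    <!-- Hex and decimal references name the same character. -->
    <xsl:value-of select="'&#x201C;' = '&#8220;'"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

This should print 1, 6 and true on separate lines.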

>     <xsl:output indent="yes" method="xml"/>
>     <xsl:strip-space elements="*"/>
>
>     <xsl:template
> match="CATALOOM-OPENENGINE/PRODUCTS/PRODUCT/PRODUCTREVISION">

Not knowing your input, I can't be sure, but simply doing
match="PRODUCTREVISION" would probably be specific enough.

>     <xsl:variable name="primKey2">
>             <xsl:value-of select="substring-before(@primarykey, '/')"/>
>     </xsl:variable>

That's a very bad way of getting the value. Instead, use:

   <xsl:variable name="primKey2"
     select="substring-before(@primarykey, '/')"/>

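Your version creates a so-called "result tree fragment" (RTF): a small
tree holding a text node, rather than a plain string. That is less
efficient, and in XSLT 1.0 an RTF cannot be navigated with path
expressions without an extension function.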
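
One place where the difference becomes visible is a boolean test. In
XSLT 1.0 an RTF is treated like a node set containing a single root
node, so it tests true even when its string value is empty. The sketch
below assumes a PRODUCTREVISION whose primarykey contains no '/', so
that substring-before() returns the empty string; the variable names
are made up.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="PRODUCTREVISION">
    <!-- Content form: the variable holds an RTF. -->
    <xsl:variable name="asRtf">
      <xsl:value-of select="substring-before(@primarykey, '/')"/>
    </xsl:variable>

    <!-- select form: the variable holds a plain string. -->
    <xsl:variable name="asString"
      select="substring-before(@primarykey, '/')"/>

    <!-- True even for an empty value, because the RTF root exists. -->
    <xsl:if test="$asRtf">RTF variable tests true&#10;</xsl:if>

    <!-- False for an empty value, because '' converts to false. -->
    <xsl:if test="$asString">string variable tests true&#10;</xsl:if>
  </xsl:template>

</xsl:stylesheet>
```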

>      <xsl:for-each select="FEATURE/VALUE">
>         <xsl:variable name="cdata">
>             <xsl:value-of select="FEATURE/VALUE/text()"/>
>         </xsl:variable>

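Same story here. In addition, avoid using the text() node test to get
the string value. In XSLT 1.0, value-of outputs only the first node of
a node set, so text() can silently drop text that follows a comment or
child element:

   <xsl:value-of select="FEATURE/VALUE"/>

But are you sure your input is FEATURE/VALUE/FEATURE/VALUE? Inside the
for-each, the context node is already a VALUE element, so that path
looks for a FEATURE child of VALUE. To get the current VALUE, use
select=".".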
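
Putting the remarks in this message together, the stylesheet might look
like the sketch below. It assumes the input really is
PRODUCTREVISION/FEATURE/VALUE as in your sample, that @primarykey
contains a '/', and that plain text output is acceptable (your original
used method="xml").

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:strip-space elements="*"/>

  <!-- Character reference instead of the Java-style escape. -->
  <xsl:param name="specChar" select="'&#x201C;'"/>

  <xsl:template match="PRODUCTREVISION">
    <!-- select form: a string, not a result tree fragment. -->
    <xsl:variable name="primKey2"
      select="substring-before(@primarykey, '/')"/>

    <xsl:for-each select="FEATURE/VALUE">
      <!-- Inside for-each the context node is the VALUE element,
           so "." is its string value. -->
      <xsl:if test="contains(., $specChar)">
        <xsl:text>Found: </xsl:text>
        <xsl:value-of select="$primKey2"/>
        <xsl:text>&#10;</xsl:text>
      </xsl:if>
    </xsl:for-each>
  </xsl:template>

  <!-- Keep the built-in template rules from copying other text. -->
  <xsl:template match="text()"/>

</xsl:stylesheet>
```

If one line per PRODUCTREVISION is enough, the for-each could be
replaced by a single test such as
FEATURE/VALUE[contains(., $specChar)].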

> How do I escape a Unicode character inside a string in a
> stylesheet?

As shown above. Or simply type the character as a literal, if the
stylesheet is saved in a Unicode encoding and your input device and
display support that character.
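
A sketch of the literal variant, assuming the stylesheet file is
actually saved as UTF-8 so that it matches the encoding declaration:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- The character is typed directly; the parser decodes it using
       the encoding declared in the XML declaration. -->
  <xsl:param name="specChar" select="'“'"/>

  <xsl:template match="/">
    <!-- Same character as the numerical reference, so this is true
         when the file encoding matches the declaration. -->
    <xsl:value-of select="$specChar = '&#x201C;'"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

If the file is saved in a different encoding than it declares, the
comparison can fail or the parser can reject the file, which is why the
character reference is the safer choice.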

You can learn XSLT by reading XSL-List at Mulberrytech or any good book
on XSLT, such as, for starters, the Pocket Guide by Evan Lenz.

-- 
Michael Ludwig