Posted to fop-users@xmlgraphics.apache.org by Theresa Jayne Forster <th...@inbrand.co.uk> on 2011/09/08 13:48:23 UTC

Problem with foreign characters,

I have a minor issue and would like some help, if possible.

Before I start, a few pointers:

1)      I cannot change the Java code or the version of FOP (modified 0.23).

2)      I already have a partial solution in place.

3)      I am just looking for a way to get the information I need.

I have code which scrapes a web page, extracts the text and turns it into a
downloadable PDF.

Some characters, such as é, do not display correctly, so I am replacing them
in a template.

I need to find out what the characters are coming in as, so that I can
convert them in the replacement.

For instance, the é character comes in as the character codes &#195;&#169;.

How can I find the incoming character codes for all the other characters
(or convert them on the fly within XSL)?
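One way to get these codes without collecting them by hand, sketched in Java under an assumption: the search strings in the template below contain &#8364; and &#8220;, which are what bytes 0x80 and 0x93 mean in windows-1252, so the page appears to be UTF-8 that is being decoded as windows-1252 somewhere upstream. If that holds, the garbled form of any character can be computed: encode it as UTF-8, decode those bytes as windows-1252, and print the result as numeric character references. The class name MojibakeTable is made up for this example; it runs as a standalone program and does not touch the application's own Java code.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

/**
 * Prints, for each character, the numeric character references it turns into
 * when its UTF-8 bytes are decoded as windows-1252.
 * Assumption: that is the misreading happening upstream of the stylesheet.
 */
public class MojibakeTable {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    /** Encodes s as UTF-8, decodes the bytes as windows-1252, returns "&#N;" refs. */
    public static String garbledRefs(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, CP1252);
        StringBuilder refs = new StringBuilder();
        misread.codePoints().forEach(cp -> refs.append("&#").append(cp).append(';'));
        return refs.toString();
    }

    public static void main(String[] args) {
        // Characters to look up; pass your own as the first argument.
        // Unicode escapes keep this independent of the source-file encoding.
        String chars = args.length > 0 ? args[0]
                : "\u00e9\u00e8\u00e0\u00e7\u00f6\u00fc\u00d6\u2013\u2019\u00a0\ufeff";
        chars.codePoints().forEach(cp -> {
            String c = new String(Character.toChars(cp));
            System.out.println("U+" + String.format("%04X", cp)
                    + " (&#" + cp + ";) arrives as " + garbledRefs(c));
        });
    }
}
```

Run with no arguments, the first line it prints is "U+00E9 (&#233;) arrives as &#195;&#169;", matching the sequence above; the right-hand side of each line is the search-for string for the template. Bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D have no windows-1252 mapping, so the table is unreliable for characters such as Á or Ý whose UTF-8 form contains them.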

My template currently is as follows:

<xsl:template name="loose_nasty_entities">
  <xsl:param name="thisstring" select="."/>
  <!-- "replace" is a named template (parameters str, search-for and
       replace-with) defined elsewhere in the stylesheet; not shown here.
       Each search string appears to be the UTF-8 bytes of one character,
       with every byte read as a separate windows-1252 character. -->

  <!-- en dash (U+2013, UTF-8 bytes E2 80 93) -->
  <xsl:variable name="thisstring1">
    <xsl:call-template name="replace">
      <xsl:with-param name="str" select="$thisstring"/>
      <xsl:with-param name="search-for" select="'&#226;&#8364;&#8220;'"/>
      <xsl:with-param name="replace-with" select="'-'"/>
    </xsl:call-template>
  </xsl:variable>

  <!-- byte-order mark (U+FEFF, UTF-8 bytes EF BB BF) -->
  <xsl:variable name="thisstring2">
    <xsl:call-template name="replace">
      <xsl:with-param name="str" select="$thisstring1"/>
      <xsl:with-param name="search-for" select="'&#239;&#187;&#191;'"/>
      <xsl:with-param name="replace-with" select="''"/>
    </xsl:call-template>
  </xsl:variable>

  <!-- stray lead byte C2 of U+00A0 to U+00BF, e.g. a no-break space (C2 A0) -->
  <xsl:variable name="thisstring3">
    <xsl:call-template name="replace">
      <xsl:with-param name="str" select="$thisstring2"/>
      <xsl:with-param name="search-for" select="'&#194;'"/>
      <xsl:with-param name="replace-with" select="''"/>
    </xsl:call-template>
  </xsl:variable>

  <!-- e acute (U+00E9, UTF-8 bytes C3 A9) -->
  <xsl:variable name="thisstring4">
    <xsl:call-template name="replace">
      <xsl:with-param name="str" select="$thisstring3"/>
      <xsl:with-param name="search-for" select="'&#195;&#169;'"/>
      <xsl:with-param name="replace-with" select="'é'"/>
    </xsl:call-template>
  </xsl:variable>

  <!-- O umlaut (U+00D6, UTF-8 bytes C3 96) -->
  <xsl:variable name="thisstring5">
    <xsl:call-template name="replace">
      <xsl:with-param name="str" select="$thisstring4"/>
      <xsl:with-param name="search-for" select="'&#195;&#8211;'"/>
      <xsl:with-param name="replace-with" select="'&#214;'"/>
    </xsl:call-template>
  </xsl:variable>

  <xsl:value-of select="$thisstring5"/>
</xsl:template>

Kindest regards

Theresa Forster
Senior Software Developer

RE: Problem with foreign characters,

Posted by Theresa Jayne Forster <th...@inbrand.co.uk>.
Well, what happens is that my XSLT pulls in an HTML web page via TagSoup,
so I have no visibility of it until it reaches me in the XSL...
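Even without changing the application's Java code, a small standalone tool may give some of that visibility by fetching the page directly and reporting what the server declares and what the raw bytes look like. A minimal sketch in Java, assuming the page can be fetched from the machine where it is run; the class name PageEncodingCheck is made up for this example, and it does not look at a charset declared in an HTML meta tag, nor at how TagSoup is configured in the application.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Standalone check of what a web page actually sends (hypothetical helper). */
public class PageEncodingCheck {

    /** True if the bytes start with the UTF-8 signature EF BB BF. */
    public static boolean hasUtf8Bom(byte[] b) {
        return b.length >= 3
                && (b[0] & 0xFF) == 0xEF
                && (b[1] & 0xFF) == 0xBB
                && (b[2] & 0xFF) == 0xBF;
    }

    /** True if the bytes decode as UTF-8 with no malformed sequences. */
    public static boolean isValidUtf8(byte[] b) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(b));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java PageEncodingCheck <page-url>");
            return;
        }
        URLConnection conn = URI.create(args[0]).toURL().openConnection();
        byte[] body;
        try (InputStream in = conn.getInputStream()) {
            body = in.readAllBytes(); // raw bytes, no decoding applied
        }
        // What the server claims, e.g. "text/html; charset=utf-8" (may be null).
        System.out.println("Content-Type header: " + conn.getContentType());
        System.out.println("Starts with UTF-8 BOM: " + hasUtf8Bom(body));
        System.out.println("Valid UTF-8 throughout: " + isValidUtf8(body));
    }
}
```

If the body starts with a BOM or is valid UTF-8 while the header declares no charset (or a different one), that would fit the idea that the page is being decoded with the wrong charset. A page that is pure ASCII is also valid UTF-8, so a "true" on its own proves little.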


Kindest regards


Theresa Forster
Senior Software Developer
-----Original Message-----
From: Pascal Sancho [mailto:pascal.sancho@takoma.fr] 
Sent: 08 September 2011 14:02
To: fop-users@xmlgraphics.apache.org
Subject: Re: Problem with foreign characters,


---------------------------------------------------------------------
To unsubscribe, e-mail: fop-users-unsubscribe@xmlgraphics.apache.org
For additional commands, e-mail: fop-users-help@xmlgraphics.apache.org





Re: Problem with foreign characters,

Posted by Pascal Sancho <pa...@takoma.fr>.
Hi Theresa,

&#195;&#169; is a UTF-8 sequence (0xC3 0xA9) that encodes é (e acute);
&#239;&#187;&#191; is a UTF-8 sequence (0xEF 0xBB 0xBF) that encodes the
byte-order mark (BOM), also known as the UTF-8 signature. In both cases each
byte of the sequence is being read as a separate character.

You should look at how character encoding is handled in your application;
that seems to be where the issue lies.

That said, I can think of two ways to convert a string in XSLT:
 either in pure XSLT, using a recursive template (see below),
 or using an extension function or embedded script (see [1] for Xalan).

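<xsl:template match="text()">
  <xsl:call-template name="text"/>
</xsl:template>

<xsl:template name="text">
  <xsl:param name="str" select="."/>
  <xsl:param name="find" select="'&#xa0;'"/>
  <xsl:param name="replace" select="'&#x20;'"/>
  <xsl:choose>
    <!-- Guard against an empty $find, which would recurse forever. -->
    <xsl:when test="$find != '' and contains($str,$find)">
      <xsl:value-of select="substring-before($str,$find)"/>
      <xsl:value-of select="$replace"/>
      <!-- Recurse on the rest of the string, passing $find and $replace
           along so that non-default values are not lost. -->
      <xsl:call-template name="text">
        <xsl:with-param name="str" select="substring-after($str,$find)"/>
        <xsl:with-param name="find" select="$find"/>
        <xsl:with-param name="replace" select="$replace"/>
      </xsl:call-template>
    </xsl:when>
    <xsl:otherwise>
      <xsl:value-of select="$str"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>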

[1] http://xml.apache.org/xalan-j/extensions.html
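The extension route in [1] could look roughly like the Java sketch below. It assumes Xalan-J is the XSLT processor in use, that an extra class can be put on its classpath (which may itself count as changing the Java code), and that the text is UTF-8 misread as windows-1252. The class and method names Mojibake.repair are hypothetical. Rather than listing sequences one by one, it reverses the misreading: each character is turned back into the windows-1252 byte it came from, and the bytes are decoded as UTF-8. If either step fails, the text is returned unchanged.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical Xalan extension helper that undoes UTF-8 read as windows-1252. */
public final class Mojibake {

    private static final Charset CP1252 = Charset.forName("windows-1252");

    private Mojibake() {
    }

    public static String repair(String s) {
        try {
            // Turn each character back into the byte it was misread from
            // (e.g. "\u00c3" -> 0xC3, "\u00a9" -> 0xA9); fail on anything unmappable.
            ByteBuffer bytes = CP1252.newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(s));
            // Decode those bytes as the UTF-8 they really were; fail on bad sequences.
            String fixed = StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(bytes)
                    .toString();
            // Drop any byte-order mark (U+FEFF) carried over from the page.
            return fixed.replace("\uFEFF", "");
        } catch (CharacterCodingException e) {
            // Not mojibake of this kind: leave the text as it was.
            return s;
        }
    }
}
```

From the stylesheet it would be called through Xalan's Java extension namespace, roughly select="java:Mojibake.repair(string(.))" with xmlns:java="http://xml.apache.org/xalan/java" declared; this has not been checked against the modified FOP build in question. Text that mixes correct accented characters with garbled ones fails the round trip and is passed through unchanged.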


-- 
Pascal
