Posted to user@ant.apache.org by Jerry Chimey <je...@yahoo.com> on 2008/11/16 23:48:12 UTC
ANT replaceregexp problem
Hi,
I am seeing a weird problem when using replaceregexp in Ant.
Basically, non-English characters are being changed even though the regular expression does not match them.
The following is my original input:
-------------<replace.xml before> ------
<?xml version="1.0" encoding="UTF-8"?>
<web-app>
<url>file://localhost/$server_root$/deployed/archive/wcm.contentviewer.1001/ilwwcm-localrendering-portlet.war</url>
<title>â€بدون شكل عامâ€</title>
</web-app>
------------------------------------------------
My intention is to update ONLY the content of the <url> element. This is the code:
<target name="testRegex">
  <replaceregexp file="replace.xml"
                 match="file://localhost/\$server_root\$/deployed/archive/[a-zA-Z.0-9]+/"
                 replace="file://localhost/$server_root$/installableApps/"
                 byline="yes"/>
</target>
After I executed this target, I got the following result:
-----------------<replace.xml after>------------
<?xml version="1.0" encoding="UTF-8"?>
<web-app>
<url>file://localhost/$server_root$/installableApps/ilwwcm-localrendering-portlet.war</url>
<title>�بدون شكل عام�</title>
</web-app>
----------------------------------------------
The URL was replaced correctly; however, some characters in the <title> element were replaced with '?'.
I am using Ant 1.7. I also tried Jakarta ORO (jakarta-oro-2.0.8) as the regular expression implementation, but got the same result.
C:\>ant -Dant.regexp.regexpimpl=org.apache.tools.ant.util.regexp.JakartaOroRegexp -f test.xml testRegex
Buildfile: test.xml
testRegex:
BUILD SUCCESSFUL
Total time: 0 seconds
Has anyone seen this problem, and does anyone have an idea how to fix it?
Thanks
Jerry
Re: ANT replaceregexp problem
Posted by Jerry Chimey <je...@yahoo.com>.
Hi, Brian:
You are right. After setting file.encoding, it works for now. I agree with you that regular-expression replacement is not a good approach for XML.
I am going to try XMLTask or XSLT for this type of work.
Thanks again.
Jerry
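The XSLT route Jerry mentions could look roughly like the stylesheet below. This is a sketch, not something posted in the thread: the file name fix-url.xsl is hypothetical, and it assumes (as in the sample) that <url> is a direct child of a namespace-free <web-app> root and that the URL starts with file://localhost/$server_root$/deployed/archive/ followed by exactly one directory name. Because the file is parsed and serialized as XML, the declared UTF-8 encoding is honoured rather than the JVM default.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- fix-url.xsl (hypothetical name): copies replace.xml unchanged except
     for the text of /web-app/url, where the
     "deployed/archive/<one directory>/" part becomes "installableApps/".
     XSLT 1.0 has no regular expressions, so string functions are used. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="UTF-8"/>

  <!-- Identity template: copy every node and attribute as-is. -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- More specific pattern, so it takes priority over the identity template. -->
  <xsl:template match="/web-app/url/text()">
    <xsl:variable name="prefix"
        select="'file://localhost/$server_root$/deployed/archive/'"/>
    <xsl:choose>
      <xsl:when test="starts-with(., $prefix)">
        <!-- e.g. "wcm.contentviewer.1001/ilwwcm-localrendering-portlet.war" -->
        <xsl:variable name="rest" select="substring-after(., $prefix)"/>
        <!-- drop the first directory name and its trailing slash -->
        <xsl:value-of select="concat('file://localhost/$server_root$/installableApps/',
                                     substring-after($rest, '/'))"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- URLs in any other form are left untouched -->
        <xsl:value-of select="."/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

It could be run with Ant's xslt task, e.g. <xslt in="replace.xml" out="replace-new.xml" style="fix-url.xsl"/>, writing to a different file than the input. Unlike the original regular expression, it does not check which characters the directory name contains, and an identity transform preserves content but not necessarily byte-for-byte formatting (a DOCTYPE declaration, for instance, is dropped unless xsl:output declares one).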
________________________________
From: Brian Agnew <br...@oopsconsultancy.com>
To: Ant Users List <us...@ant.apache.org>
Sent: Monday, November 17, 2008 4:02:08 AM
Subject: Re: ANT replaceregexp problem
Your XML file specifies the character encoding being used (UTF-8), but I'm
guessing that the replaceregexp task won't understand this, since it's not
XML-aware. So the file is probably being read and written back using
another encoding, most likely your platform's default encoding.
Try setting -Dfile.encoding=UTF-8 for the JVM that runs Ant (Java also
accepts the alias utf8).
Note that if you want to do XML replacement and you need to preserve
encodings, XMLTask may be a better solution.
Brian
On Sun, November 16, 2008 22:48, Jerry Chimey wrote:
--
Brian Agnew http://www.oopsconsultancy.com
OOPS Consultancy Ltd
Tel: +44 (0)7720 397526
Fax: +44 (0)20 8682 0012
---------------------------------------------------------------------
To unsubscribe, e-mail: user-unsubscribe@ant.apache.org
For additional commands, e-mail: user-help@ant.apache.org
Re: ANT replaceregexp problem
Posted by Brian Agnew <br...@oopsconsultancy.com>.
Your XML file specifies the character encoding being used (UTF-8), but I'm
guessing that the replaceregexp task won't understand this, since it's not
XML-aware. So the file is probably being read and written back using
another encoding, most likely your platform's default encoding.
Try setting -Dfile.encoding=UTF-8 for the JVM that runs Ant (Java also
accepts the alias utf8).
Note that if you want to do XML replacement and you need to preserve
encodings, XMLTask may be a better solution.
Brian
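A narrower fix that keeps replaceregexp is sketched below. The task has an encoding attribute (since Ant 1.6) that sets the encoding used to read and rewrite the file; without it the JVM default is used. The C:\> prompt suggests Windows, where that default is typically windows-1252. Assuming that is the case here, it would be consistent with only some characters being damaged: windows-1252 leaves a few byte values undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a UTF-8 character whose encoding contains one of them cannot survive the read-and-write round trip and comes back as '?', while the Arabic letters in the title happen to use only defined bytes.

```xml
<!-- Sketch only: the original target plus the encoding attribute
     (supported by replaceregexp since Ant 1.6), so replace.xml is
     read and written as UTF-8 instead of the JVM default encoding. -->
<target name="testRegex">
  <replaceregexp file="replace.xml"
                 match="file://localhost/\$server_root\$/deployed/archive/[a-zA-Z.0-9]+/"
                 replace="file://localhost/$server_root$/installableApps/"
                 byline="yes"
                 encoding="UTF-8"/>
</target>
```

Setting file.encoding for the whole JVM (for example with ANT_OPTS=-Dfile.encoding=UTF-8), as reported working in this thread, also avoids the problem but changes the default for every task in the build; the attribute limits the change to this one file.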