Posted to dev@nutch.apache.org by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org> on 2005/12/20 00:59:30 UTC
[jira] Created: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM
ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM
--------------------------------------------------------------------------
Key: NUTCH-145
URL: http://issues.apache.org/jira/browse/NUTCH-145
Project: Nutch
Type: Bug
Components: web gui
Versions: 0.8-dev
Environment: Windows XP, Cygwin, Eclipse, JDK 1.4.1
Reporter: KuroSaka TeruHiko
Priority: Minor
When I ran the ant build from within Eclipse, it failed on src/web/include/zh/header.xml and src/web/pages/zh/*.xml with the error "document does not have a root element" (translated from the Japanese message).
On closer inspection, these files begin with an invisible UTF-8 BOM (byte order mark): EF BB BF in hex, or \357\273\277 in octal.
Perhaps the JDK 1.4.x UTF-8 converter does not handle the BOM in UTF-8 files. (Note that the BOM was originally intended for the UTF-16 and UTF-32 encodings, to self-identify their endianness, but Microsoft started using a UTF-8-ized BOM as a character encoding signature.)
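For illustration only, here is a minimal Python sketch of how a leading UTF-8 BOM could be detected and removed from a file's raw bytes. The helper name strip_utf8_bom and the sample document are hypothetical; this is not necessarily how the attached fixed files were produced.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# The hex and octal spellings quoted in the report name the same three bytes.
assert b"\xef\xbb\xbf" == b"\357\273\277" == codecs.BOM_UTF8

# Hypothetical sample: an XML document saved with a BOM in front of it.
sample = codecs.BOM_UTF8 + b'<?xml version="1.0" encoding="UTF-8"?><page/>'
cleaned = strip_utf8_bom(sample)
print(cleaned[:5])  # b'<?xml'
```

Working on bytes rather than decoded text keeps the rest of the file untouched, so only the three signature bytes are dropped.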
I also noticed that they use the MS-DOS style end-of-line sequence, CR followed by LF, unlike the other ??/*.xml files, which use UNIX style EOL.
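As a rough sketch under the same assumptions (Python, hypothetical helper names has_crlf and to_unix_eol), the CR LF sequences could be detected and converted to UNIX style line endings like this:

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the data contains any MS-DOS style CR LF sequence."""
    return b"\r\n" in data


def to_unix_eol(data: bytes) -> bytes:
    """Replace every CR LF pair with a single LF; other bytes are unchanged."""
    return data.replace(b"\r\n", b"\n")


# Hypothetical sample with MS-DOS style line endings.
dos_text = b"<page>\r\n</page>\r\n"
unix_text = to_unix_eol(dos_text)
print(has_crlf(dos_text), has_crlf(unix_text))  # True False
```

The conversion operates on bytes, so it does not depend on how the file's character encoding is interpreted.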
Fixed files are available.
--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators:
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see:
http://www.atlassian.com/software/jira
[jira] Updated: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM
Posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]
KuroSaka TeruHiko updated NUTCH-145:
------------------------------------
Attachment: NUTCH-145-fix.zip
header.xml should go to src/web/include/zh/header.xml, and the other *.xml files should go to src/web/pages/zh/.
[jira] Updated: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
Posted by "KuroSaka TeruHiko (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]
KuroSaka TeruHiko updated NUTCH-145:
------------------------------------
Summary: build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM (was: ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM)
[jira] Resolved: (NUTCH-145) build of war file fails on Chinese (zh) .xml files due to UTF-8 BOM
Posted by "Sami Siren (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-145?page=all ]
Sami Siren resolved NUTCH-145:
------------------------------
Fix Version: 0.8-dev
Resolution: Fixed
Assign To: Sami Siren
This is now committed, thanks.
[jira] Commented: (NUTCH-145) ant build of the war fie fails on Chinese (zh) .xml files due to UTF-8 BOM
Posted by "Stefan Groschupf (JIRA)" <ji...@apache.org>.
[ http://issues.apache.org/jira/browse/NUTCH-145?page=comments#action_12360876 ]
Stefan Groschupf commented on NUTCH-145:
----------------------------------------
Patch files are always welcome, even if it takes some time for them to be committed. :)
However, just create a patch file like the one below and attach it to this issue:
NutchSourceHome: svn diff . > ../patchFileName.txt