Posted to users@subversion.apache.org by David Weintraub <qa...@gmail.com> on 2009/10/01 17:54:31 UTC
Ensuring File Encoding
We are beginning to have problems with file encoding. We want to ensure that
all files we commit are in fact encoded in UTF-8. I would like to add this
check to my pre-commit hook and reject any commit containing files (well,
text files) that aren't encoded in UTF-8. But I am not 100% sure how
to test a file's encoding.
How can I test to see if a file is encoded in UTF-8?
--
David Weintraub
qazwart@gmail.com
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402633
To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].
Re: Ensuring File Encoding
Posted by "B. Smith-Mannschott" <bs...@gmail.com>.
2009/10/1 David Weintraub <qa...@gmail.com>:
> We are beginning to have problems with file encoding. We want to ensure that all files we commit are in fact encoded in UTF-8. I would like to add this check to my pre-commit hook and reject any commit containing files (well, text files) that aren't encoded in UTF-8. But I am not 100% sure how to test a file's encoding.
>
> How can I test to see if a file is encoded in UTF-8?
I just do something like this. It works well enough in practice, since not
all possible byte sequences are valid UTF-8.
def looks_like_utf8(data):
    """Attempt to decode data under the assumption that it is
    UTF-8. Return False if this raises a UnicodeDecodeError, otherwise
    return True."""
    try:
        data.decode("UTF-8")
    except UnicodeDecodeError:
        return False
    else:
        return True

def looks_like_utf8_file(path):
    # Read the raw bytes; open() replaces the Python 2-only file() builtin.
    with open(path, "rb") as f:
        return looks_like_utf8(f.read())
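For the pre-commit case in the original question, the same decode test could be applied to the contents of the pending transaction rather than to files on disk. The following is only a rough sketch under a few assumptions: it shells out to `svnlook changed` and `svnlook cat` with `-t TXN`, it decides what counts as a text file from a made-up `TEXT_SUFFIXES` list (checking `svn:mime-type` would be another option), and the names `is_utf8`, `paths_to_check`, `find_bad_files`, and the `run` parameter are hypothetical, not part of Subversion.

```python
import subprocess
import sys

# Hypothetical list of suffixes treated as text; adjust for your repository.
TEXT_SUFFIXES = (".txt", ".py", ".java", ".xml", ".properties")


def is_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def paths_to_check(changed_output):
    """Pick added/modified text files out of `svnlook changed` output."""
    paths = []
    for line in changed_output.splitlines():
        if not line.strip():
            continue
        # The status flags occupy the first columns; the path starts at column 4.
        status, path = line[:4], line[4:]
        # Skip deletions, property-only changes, and directories.
        if status[0] in "D_" or path.endswith("/"):
            continue
        if path.lower().endswith(TEXT_SUFFIXES):
            paths.append(path)
    return paths


def find_bad_files(repos, txn, run=subprocess.check_output):
    """Return changed text files in the transaction that are not valid UTF-8.

    `run` is injectable (an assumption of this sketch) so the logic can be
    exercised without a real repository.
    """
    changed = run(["svnlook", "changed", "-t", txn, repos])
    bad = []
    for path in paths_to_check(changed.decode("utf-8", "replace")):
        content = run(["svnlook", "cat", "-t", txn, repos, path])
        if not is_utf8(content):
            bad.append(path)
    return bad


def main(argv):
    """Entry point for a pre-commit hook: argv[1] is REPOS, argv[2] is TXN."""
    bad = find_bad_files(argv[1], argv[2])
    for path in bad:
        # Subversion relays the hook's stderr to the committing client.
        sys.stderr.write("Not valid UTF-8: %s\n" % path)
    return 1 if bad else 0

# A hook script would finish with: sys.exit(main(sys.argv))
```

A non-zero exit status from the hook rejects the commit. Note that plain ASCII files always pass, since ASCII is a subset of UTF-8, and that a file in some other encoding can occasionally happen to decode as UTF-8, so this is a heuristic rather than a proof.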
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402661
Re: Ensuring File Encoding
Posted by "B. Smith-Mannschott" <bs...@gmail.com>.
G*D D**N F***$^#&^! gmail. see attachment.
------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402662