Posted to users@subversion.apache.org by David Weintraub <qa...@gmail.com> on 2009/10/01 17:54:31 UTC

Ensuring File Encoding

We are beginning to have problems with file encoding. We want to ensure that
all files we commit are in fact encoded in UTF-8. I would like to add this
check to my pre-commit hook and reject any commit that contains files
(well, text files) that aren't encoded in UTF-8. But I am not 100% sure how
to test a file's encoding.

How can I test to see if a file is encoded in UTF-8?

-- 
David Weintraub
qazwart@gmail.com

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402633

To unsubscribe from this discussion, e-mail: [users-unsubscribe@subversion.tigris.org].

Re: Ensuring File Encoding

Posted by "B. Smith-Mannschott" <bs...@gmail.com>.
2009/10/1 David Weintraub <qa...@gmail.com>:
> We are beginning to have problems with file encoding. We want to ensure that all files we commit are in fact encoded in UTF-8. I would like to add this check to my pre-commit hook and reject any commit that contains files (well, text files) that aren't encoded in UTF-8. But I am not 100% sure how to test a file's encoding.
>
> How can I test to see if a file is encoded in UTF-8?

I just do something like this. It works well enough in practice, since not
all possible byte sequences are valid UTF-8.

def looks_like_utf8(data):
    """Attempt to decode data (a byte string) under the assumption that
    it is UTF-8. Return False if this raises a UnicodeDecodeError,
    otherwise return True."""
    try:
        data.decode("UTF-8")
    except UnicodeDecodeError:
        return False
    else:
        return True

def looks_like_utf8_file(path):
    # Read the raw bytes; the with block closes the file promptly.
    with open(path, "rb") as f:
        return looks_like_utf8(f.read())
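
For the pre-commit hook itself, a minimal sketch might look like the
following. It is only an illustration under a few assumptions: the hook is
called with the repository path and transaction name, svnlook is on the
PATH, and every added or modified file is checked (restricting the check to
text files, for example by extension or svn:mime-type, is left out). The
names changed_files, is_utf8, svnlook and main are made up for this example.

```python
import subprocess
import sys


def changed_files(svnlook_changed_output):
    """Return paths of added or modified files from `svnlook changed` output.

    Each line carries status flags in the first columns and the path from
    column 4 on. Deleted items and directories (trailing "/") are skipped.
    """
    paths = []
    for line in svnlook_changed_output.splitlines():
        status, path = line[:4], line[4:]
        if not path or path.endswith("/") or status.startswith("D"):
            continue
        paths.append(path)
    return paths


def is_utf8(data):
    """Same decode test as looks_like_utf8, repeated so this stands alone."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def svnlook(subcommand, repos, txn, *args):
    """Run svnlook against the pending transaction; return raw stdout bytes."""
    cmd = ["svnlook", subcommand, "-t", txn, repos] + list(args)
    return subprocess.check_output(cmd)


def main(repos, txn):
    """Return 1 (reject the commit) if any changed file is not valid UTF-8."""
    listing = svnlook("changed", repos, txn).decode("utf-8", "replace")
    bad = [p for p in changed_files(listing)
           if not is_utf8(svnlook("cat", repos, txn, p))]
    for path in bad:
        # Subversion passes the hook's stderr back to the committing client.
        sys.stderr.write("Not valid UTF-8: %s\n" % path)
    return 1 if bad else 0
```

The hook script would end with sys.exit(main(sys.argv[1], sys.argv[2])).
Keep in mind that the decode test is necessary but not sufficient: plain
ASCII always passes, and a file in another encoding can pass by chance if
its bytes happen to form valid UTF-8 sequences.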

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402661


Re: Ensuring File Encoding

Posted by "B. Smith-Mannschott" <bs...@gmail.com>.
2009/10/1 B Smith-Mannschott <bs...@gmail.com>:
>
>
> 2009/10/1 David Weintraub <qa...@gmail.com>:
>> We are beginning to have problems with file encoding. We want to ensure that all files we commit are in fact encoded in UTF-8. I would like to add this check to my pre-commit hook and reject any commit that contains files (well, text files) that aren't encoded in UTF-8. But I am not 100% sure how to test a file's encoding.
>>
>> How can I test to see if a file is encoded in UTF-8?
>
> I just do something like this. It works well enough in practice, since not all possible byte sequences are valid UTF-8.
>
> def looks_like_utf8(bytes):
> """Attempt to decode bytes under the assumption that they are
> UTF-8. Return False if this throws a UnicodeDecodeError, otherwise
> return True."""
> try:
> bytes.decode("UTF-8")
> except UnicodeDecodeError:
> return False
> else:
> return True
>
> def looks_like_utf8_file(path):
> return looks_like_utf8(file(path, "rb").read())

Argh, Gmail stripped the indentation from the code quoted above. See attachment.

------------------------------------------------------
http://subversion.tigris.org/ds/viewMessage.do?dsForumId=1065&dsMessageId=2402662
