Posted to dev@allura.apache.org by Dave Brondsema <da...@brondsema.net> on 2020/02/11 22:54:19 UTC

[allura:tickets] #8350 non-unicode filenames in hg



---

** [tickets:#8350] non-unicode filenames in hg**

**Status:** in-progress
**Milestone:** unreleased
**Created:** Tue Feb 11, 2020 10:54 PM UTC by Dave Brondsema
**Last Updated:** Tue Feb 11, 2020 10:54 PM UTC
**Owner:** Dave Brondsema


With a non-Unicode filename, this error is thrown:

```
  File "/src/forgehg/forgehg/model/hg.py", line 324, in refresh_commit_info
    fake_tree = self._tree_from_changectx(obj)
  File "/src/timermiddleware/timermiddleware/__init__.py", line 120, in wrapper
    return self.run_and_log(func, inst, *args, **kwargs)
  File "/src/timermiddleware/timermiddleware/__init__.py", line 152, in run_and_log
    retval = func(*args, **kwargs)
  File "/src/forgehg/forgehg/model/hg.py", line 453, in _tree_from_changectx
    root.set_blob(filepath, oid)
  File "/src/allura/Allura/allura/model/repository.py", line 1847, in set_blob
    path = six.ensure_text(path)
  File "/var/local/env-allura/lib/python2.7/site-packages/six.py", line 904, in ensure_text
    return s.decode(encoding, errors)
  File "/var/local/env-allura/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 5: invalid continuation byte
```


---

Sent from forge-allura.apache.org because dev@allura.apache.org is subscribed to https://forge-allura.apache.org/p/allura/tickets/

To unsubscribe from further messages, a project admin can change settings at https://forge-allura.apache.org/p/allura/admin/tickets/options.  Or, if this is a mailing list, you can unsubscribe from the mailing list.

[allura:tickets] #8350 non-unicode filenames in hg

Posted by Dave Brondsema <da...@brondsema.net>.
- **status**: in-progress --> review
- **Comment**:

Fixed on db/8350. This illustrates how it handles a name with a different encoding:

```
>>> 'data/\xCA\xEE\xEF\xE8\xFF scene.txt'.decode('utf-8')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/var/local/env-allura/lib64/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xca in position 5: invalid continuation byte
>>> h.really_unicode('data/\xCA\xEE\xEF\xE8\xFF scene.txt')
u'data/\u041a\u043e\u043f\u0438\u044f scene.txt'
>>> print h.really_unicode('data/\xCA\xEE\xEF\xE8\xFF scene.txt')
data/Копия scene.txt
```
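For anyone who wants to reproduce the idea outside Allura, here is a minimal sketch of fallback decoding. It is not the `h.really_unicode` implementation, which may pick encodings differently; `decode_with_fallback` and its fixed candidate list (`utf-8`, `cp1251`, `latin-1`) are assumptions chosen for illustration. It is written with Python 3 bytes literals, whereas the session above is Python 2.7.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1251", "latin-1")):
    """Return (text, encoding) for the first candidate that decodes `raw`.

    Hypothetical helper: the candidate list is an assumption, not what
    Allura uses.
    """
    for enc in encodings:
        try:
            return raw.decode(enc), enc
        except UnicodeDecodeError:
            # This candidate cannot represent the bytes; try the next one.
            continue
    raise ValueError("none of the candidate encodings could decode the name")


# Same bytes as in the ticket: not valid UTF-8, but valid cp1251.
raw_name = b"data/\xca\xee\xef\xe8\xff scene.txt"
text, used = decode_with_fallback(raw_name)
print(used, text)
```

Since `latin-1` accepts any byte sequence, it only makes sense as the last candidate; anything listed after it would never be tried.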

Unfortunately that only gets directory browsing working.  Trying to view or diff the file raises `ManifestLookupError: data/Копия scene.txt@a18ff7d3ef0d: not found in manifest`. We converted the filename to unicode for mongo and web purposes, but the hg repo stores the name in a different encoding, so when we request the file from the repo, the utf8 version of the filename is not found.  I don't know how to deal with that.
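The mismatch can be shown without Mercurial at all. This is only a sketch: the `manifest` dict is a hypothetical stand-in for the hg manifest (assumed here to be keyed by the raw filename bytes), and `cp1251` is assumed to be the encoding the name was originally written in. It uses Python 3 bytes literals.

```python
# Raw filename bytes as they appear in the ticket.
raw_name = b"data/\xca\xee\xef\xe8\xff scene.txt"

# Hypothetical stand-in for the manifest: keys are raw bytes.
manifest = {raw_name: "oid-placeholder"}

# What gets stored for mongo and the web UI: decoded text.
display_name = raw_name.decode("cp1251")

# Turning that text back into bytes as UTF-8 gives a different byte string...
utf8_name = display_name.encode("utf-8")
found_by_utf8 = utf8_name in manifest

# ...while re-encoding with the original encoding round-trips exactly.
found_by_raw = display_name.encode("cp1251") in manifest

print(found_by_utf8, found_by_raw)
```

So the decoded name alone is not enough to get back to the manifest entry; the original encoding (or the original bytes) is lost at the point the name is converted.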



---

** [tickets:#8350] non-unicode filenames in hg**

**Status:** review
**Milestone:** unreleased
**Created:** Tue Feb 11, 2020 10:54 PM UTC by Dave Brondsema
**Last Updated:** Tue Feb 11, 2020 10:54 PM UTC
**Owner:** Dave Brondsema



[allura:tickets] #8350 non-unicode filenames in hg

Posted by Kenton Taylor <kt...@slashdotmedia.com.INVALID>.
- **status**: review --> closed



---

** [tickets:#8350] non-unicode filenames in hg**

**Status:** closed
**Milestone:** unreleased
**Created:** Tue Feb 11, 2020 10:54 PM UTC by Dave Brondsema
**Last Updated:** Tue Feb 11, 2020 10:57 PM UTC
**Owner:** Dave Brondsema

