Posted to dev@allura.apache.org by Dave Brondsema <da...@brondsema.net> on 2015/06/18 22:21:28 UTC

[allura:tickets] #7757 UnicodeDecodeError when generating code snapshot on hg repo



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** open
**Milestone:** unreleased
**Labels:** support sf-current 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Thu Jan 29, 2015 07:12 PM UTC
**Owner:** nobody

*Originally created by:* jwb1980

https://sourceforge.net/p/forge/site-support/8700/

----

[forge:site-support:#8700]


----

From IRC #sourceForge
download the source code of this project https://sourceforge.net/p/nhunspell/code/ci/default/tree/
3:55 When I try the snapshot Sourceforge says "We're having trouble finding that snapshot. Would you like to resubmit?"
3:55 TortoiseSVN gives me error 500 in my fork repository

----




---

Sent from forge-allura.apache.org because dev@allura.apache.org is subscribed to https://forge-allura.apache.org/p/allura/tickets/

To unsubscribe from further messages, a project admin can change settings at https://forge-allura.apache.org/p/allura/admin/tickets/options.  Or, if this is a mailing list, you can unsubscribe from the mailing list.

[allura:tickets] Ticket 7757 discussion

Posted by LXj <go...@gmail.com>.
This seems like a problem in mercurial-py, though I am still investigating it. The error is triggered by this directory, which contains some files with Unicode characters in their names: https://sourceforge.net/p/nhunspell/code/ci/default/tree/NHunspell/UnitTests/

Interestingly, if you click on one of these files, you will get a 500 error: https://sourceforge.net/p/nhunspell/code/ci/default/tree/NHunspell/UnitTests/de_DE_%C3%B6_frami.aff


---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** in-progress
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Mon Jun 29, 2015 03:45 PM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Dave Brondsema <da...@brondsema.net>.
- **Reviewer**: Dave Brondsema



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** review
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Sat Jul 04, 2015 10:58 AM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Dave Brondsema <da...@brondsema.net>.
- **status**: review --> closed
- **Comment**:

Looks good, one step forward for the code snapshot.  We can leave the `ManifestLookupError` issue when browsing for another day I guess.



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** closed
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Tue Jul 07, 2015 04:18 PM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Dave Brondsema <da...@brondsema.net>.
- **labels**: support, sf-current, 42cc --> support, sf-current, 42cc, sf-1



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** in-progress
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Fri Jun 19, 2015 07:38 AM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by LXj <go...@gmail.com>.
So I made an interesting finding. The file in question is named de_DE_ö_frami.aff.

I cloned the repo and ran some experiments in the same directory as the file. First I tried the usual tricks with .encode('utf-8') and the like, but they didn't help. But then it struck me:

    In [6]: path = os.listdir('.')[-11]

    In [7]: path
    Out[7]: 'de_DE_\xf6_frami.aff'

    In [8]: print path
    de_DE_�_frami.aff

    In [12]: "de_DE_ö_frami.aff" == 'de_DE_\xf6_frami.aff'
    Out[12]: False

    In [13]: os.listdir(u'.')[-11]
    Out[13]: 'de_DE_\xf6_frami.aff'

    In [14]: "de_DE_ö_frami.aff"
    Out[14]: 'de_DE_\xc3\xb6_frami.aff'

So it seems that Python's os.listdir reports the filename incorrectly! I experimented with Cyrillic file names and found no problems:

    In [8]: os.listdir('.')[-8]
    Out[8]: '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'

    In [9]: print '\xd0\xbf\xd1\x80\xd0\xb8\xd0\xb2\xd0\xb5\xd1\x82'
    привет

    In [10]: os.listdir(u'.')[-8]
    Out[10]: u'\u043f\u0440\u0438\u0432\u0435\u0442'

    In [12]: print os.listdir(u'.')[-8]
    привет

Conclusion: we have a strange, rare bug with Python's os module scrambling Unicode filenames.
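
One possible reading of the session above, offered as a sketch rather than a confirmed diagnosis: the byte `\xf6` is how ö is encoded in Latin-1 (ISO-8859-1), whereas UTF-8 encodes it as `\xc3\xb6`. Under that assumption the name on disk was stored as Latin-1 bytes, which would also fit with the UTF-8 Cyrillic name round-tripping cleanly. The snippet uses Python 3 syntax (the session above is Python 2), and the variable names are illustrative.

```python
# Compare the two byte encodings of the same file name (Python 3).
name = "de_DE_\u00f6_frami.aff"  # the character U+00F6 is 'ö'

latin1_bytes = name.encode("latin-1")
utf8_bytes = name.encode("utf-8")

print(latin1_bytes)  # b'de_DE_\xf6_frami.aff'
print(utf8_bytes)    # b'de_DE_\xc3\xb6_frami.aff'

# The Latin-1 bytes are not valid UTF-8, so a strict UTF-8 decode fails.
try:
    latin1_bytes.decode("utf-8")
    decodes_as_utf8 = True
except UnicodeDecodeError:
    decodes_as_utf8 = False

print(decodes_as_utf8)  # False
```

The first byte string matches `Out[7]` and the second matches `Out[14]` in the session above.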


---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** in-progress
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Tue Jun 30, 2015 05:13 PM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Igor Bondarenko <je...@gmail.com>.
- **labels**: support, sf-current --> support, sf-current, 42cc
- **status**: open --> in-progress
- **assigned_to**: Igor Bondarenko



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** in-progress
**Milestone:** unreleased
**Labels:** support sf-current 42cc 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Thu Jun 18, 2015 08:21 PM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Igor Bondarenko <je...@gmail.com>.
- **status**: in-progress --> review
- **Comment**:

Closed #811. `forgehg:ib/7757`

The problem was that the path to the archive directory was a unicode object, while the file name is a UTF-8-encoded plain string, not unicode. Concatenating the two in mercurial triggered an implicit decode, which failed. I fixed it by encoding the path to the archive directory as a UTF-8 plain string.
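
A minimal sketch of the failure mode and the fix described above, written for Python 3, where mixing text and bytes raises a `TypeError` instead of decoding implicitly. To mimic Python 2, the sketch performs the ASCII decode explicitly. The directory name and the helper `join_like_python2` are hypothetical, and this is not the actual `forgehg` change.

```python
import posixpath

archive_dir = "/tmp/archive"              # hypothetical text (unicode) path
file_name = b"de_DE_\xc3\xb6_frami.aff"   # UTF-8 encoded byte string


def join_like_python2(directory, name):
    """Mimic Python 2's unicode + str: the bytes are decoded as ASCII first."""
    return posixpath.join(directory, name.decode("ascii"))


try:
    join_like_python2(archive_dir, file_name)
    error = None
except UnicodeDecodeError as exc:
    error = exc

print(type(error).__name__)  # UnicodeDecodeError

# The fix described above: encode the directory so both parts are bytes.
fixed = posixpath.join(archive_dir.encode("utf-8"), file_name)
print(fixed)  # b'/tmp/archive/de_DE_\xc3\xb6_frami.aff'
```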

I could not fix the issue with browsing, though: https://sourceforge.net/p/nhunspell/code/ci/default/tree/NHunspell/UnitTests/de_DE_%C3%B6_frami.aff

The error is:

     ManifestLookupError: NHunspell/UnitTests/de_DE_��_frami.aff@d1baa762529d: not found in manifest

I did some digging:

1. String comes from browser and we unquote it and convert to unicode: `u'/NHunspell/UnitTests/de_DE_\xf6_frami.aff'`
2. Then we encode it to pass to mercurial and it looks like this: `'de_DE_\xc3\xb6_frami.aff'`.
3. But mercurial manifest contains: `'NHunspell/UnitTests/de_DE_\xf6_frami.aff'` (looks like (1), but str, not unicode)

I've tried several places to fix it, but didn't succeed.
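
The three steps above can be reproduced in isolation. This is a Python 3 sketch with illustrative variable names. The `manifest_name` bytes are copied from step 3, and the Latin-1 comparison at the end is an assumption about how that name was stored, not a confirmed fix.

```python
from urllib.parse import unquote

# Step 1: the browser sends a percent-encoded path, which is unquoted to text.
requested = unquote("/NHunspell/UnitTests/de_DE_%C3%B6_frami.aff")

# Step 2: the text is encoded as UTF-8 before being passed to mercurial.
lookup_key = requested.lstrip("/").encode("utf-8")

# Step 3: the name as it appears in the manifest (a single \xf6 byte).
manifest_name = b"NHunspell/UnitTests/de_DE_\xf6_frami.aff"

print(lookup_key == manifest_name)  # False
print(requested.lstrip("/").encode("latin-1") == manifest_name)  # True
```

If that assumption holds, the lookup key and the manifest entry differ only in the encoding of ö, which would account for the "not found in manifest" error.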



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** review
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Thu Jul 02, 2015 11:38 AM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Igor Bondarenko <je...@gmail.com>.
While working on a docker ticket, I saw this error on all repos with unicode filenames. Generating a UTF-8 locale and setting it as the default fixed that for docker:

~~~~~
# Snapshot generation for SVN (and maybe other SCMs) might fail without this
RUN locale-gen en_US.UTF-8
ENV LANG en_US.UTF-8
~~~~~

This seems like a deployment-specific issue to me. The server might be missing a locale that is needed to decode filenames properly. We'll investigate it further to confirm.
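
As a rough diagnostic, assuming Python is available on the server in question, the encodings that Python derives from the environment can be printed as below. This is a Python 3 sketch and the variable names are illustrative. The values depend on the deployment, so no expected output is shown.

```python
import locale
import sys

# Encoding Python uses to convert file names between bytes and text.
fs_encoding = sys.getfilesystemencoding()

# Encoding implied by the current locale settings (LANG, LC_ALL, ...).
locale_encoding = locale.getpreferredencoding(False)

print("filesystem encoding:", fs_encoding)
print("locale encoding:", locale_encoding)
```

Running it once with the server's normal environment and once with `LANG=en_US.UTF-8` set would show whether the locale changes either value.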


---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** in-progress
**Milestone:** unreleased
**Labels:** support sf-current 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Tue Jun 30, 2015 06:57 PM UTC
**Owner:** Igor Bondarenko


[allura:tickets] Ticket 7757 discussion

Posted by Dave Brondsema <da...@brondsema.net>.
- **labels**: support --> support, sf-current
- Description has changed:

Diff:

~~~~



~~~~

- **Component**: allura-forge-classic --> General



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** open
**Milestone:** unreleased
**Labels:** support sf-current 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Thu Jan 29, 2015 07:12 PM UTC
**Owner:** nobody


[allura:tickets] Ticket 7757 discussion

Posted by Dave Brondsema <da...@brondsema.net>.
- **labels**: support, sf-current, 42cc, sf-1 --> support, 42cc, sf-1



---

** [tickets:#7757] UnicodeDecodeError when generating code snapshot on hg repo**

**Status:** closed
**Milestone:** unreleased
**Labels:** support 42cc sf-1 
**Created:** Fri Oct 10, 2014 03:14 PM UTC by Anonymous
**Last Updated:** Tue Jul 07, 2015 04:26 PM UTC
**Owner:** Igor Bondarenko
