Posted to dev@lucy.apache.org by Nick Wellnhofer <we...@aevum.de> on 2016/09/30 18:10:56 UTC

[lucy-dev] (LUCY-311) Non-ASCII error messages from strerror cause exceptions

Lucifers,

Regarding issue LUCY-311:

------------------------------------

              Summary: Non-ASCII error messages from strerror cause exceptions
                  Key: LUCY-311
                  URL: https://issues.apache.org/jira/browse/LUCY-311
              Project: Lucy
           Issue Type: Bug
           Components: Store
             Reporter: Nick Wellnhofer


The code in Lucy/Store creates Err objects with error messages returned from 
strerror. Especially under non-English locales, these messages aren't 
necessarily valid UTF-8. Now that CB_VCatF checks C strings for invalid UTF-8, 
this results in an exception.

Here's an example with a German error message in Latin1 encoding:

http://www.cpantesters.org/cpan/report/20d4902a-8673-11e6-9bc4-e52240a03099

------------------------------------

Does anyone have a good idea how to solve this? I can see the following approaches:

1. Switch to numeric error codes. This isn't very informative, but we could 
use custom messages for a few common error codes.

2. Replace non-ASCII chars in the error message with the Unicode replacement 
character (U+FFFD).

3. Use strerror_l with the "C" locale and hope that the error messages are 
ASCII, replacing any (unlikely) non-ASCII chars. POSIX only.

4. Call nl_langinfo(CODESET) to detect the character set and try to convert 
the message to UTF-8. POSIX only, and complicated.

Nick