Posted to commits@daffodil.apache.org by "Steve Lawrence (Jira)" <ji...@apache.org> on 2021/12/06 20:28:00 UTC

[jira] [Assigned] (DAFFODIL-2600) encoding varies with environment - UTF-8 not properly set somewhere

     [ https://issues.apache.org/jira/browse/DAFFODIL-2600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Lawrence reassigned DAFFODIL-2600:
----------------------------------------

    Assignee: Steve Lawrence

> encoding varies with environment - UTF-8 not properly set somewhere
> -------------------------------------------------------------------
>
>                 Key: DAFFODIL-2600
>                 URL: https://issues.apache.org/jira/browse/DAFFODIL-2600
>             Project: Daffodil
>          Issue Type: Bug
>          Components: Infrastructure
>    Affects Versions: 3.1.0, 3.2.0
>            Reporter: Mike Beckerle
>            Assignee: Steve Lawrence
>            Priority: Major
>
> DFDL schemas and the behavior of parsers/unparsers are NOT supposed to be dependent on environment variables like LANG.
> Our diagnostic messages might be affected, but infoset contents and data contents should not be. So only negative tests which are checking error/warning messages should be sensitive to environmental things like LANG. 
> However, positive tests fail if UTF-8 is not properly specified environmentally. This is a bug because it means that somewhere we are picking up a default (environmentally specified) character set encoding when we should be specifying the encoding explicitly.
> In addition, Daffodil does require that systems are set up to enable Unicode. A clear diagnostic is needed if, when building Daffodil, the UTF-8 capabilities are not properly set up. Otherwise this leads to a long list of errors that are not easily interpreted.
> Note that LANG=en_US isn't sufficient. On some systems Unicode/UTF-8 is the default for en_US; on others it is some other charset. A portable check here may be somewhat challenging, given that different systems have different defaults (e.g., Linux Mint vs. Red Hat Linux, and that's just considering Linux). We know MS-Windows also requires specific UTF-8 configuration. So likely we need a test that
> (1) runs very early or first, so that the error message isn't lost in the mix
> (2) checks that UTF-8 behaviors are working properly for Daffodil, regardless of how that particular operating system variant must be configured to get those settings. 
>  
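
The two requirements above (run early, and check actual UTF-8 behavior rather than a particular LANG value) could be sketched roughly as follows. This is only an illustration under stated assumptions, not Daffodil code: the class name Utf8EnvironmentCheck and its methods are hypothetical, and it assumes that comparing the JVM default charset's encoding of a small non-ASCII sample against the known UTF-8 bytes is an adequate proxy for "UTF-8 is working".

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical early sanity check; not part of Daffodil.
final class Utf8EnvironmentCheck {

    // Non-ASCII sample: U+00E9 (2 bytes in UTF-8) and U+20AC (3 bytes in UTF-8).
    static final String SAMPLE = "\u00e9\u20ac";

    // The known UTF-8 encoding of SAMPLE.
    static final byte[] EXPECTED_UTF8 = {
        (byte) 0xC3, (byte) 0xA9, (byte) 0xE2, (byte) 0x82, (byte) 0xAC
    };

    // True if encoding SAMPLE with the given charset yields the UTF-8 bytes.
    static boolean encodesAsUtf8(Charset cs) {
        return Arrays.equals(SAMPLE.getBytes(cs), EXPECTED_UTF8);
    }

    // Returns null when the default charset behaves like UTF-8,
    // otherwise a single diagnostic message describing the environment.
    static String diagnose() {
        Charset dflt = Charset.defaultCharset();
        if (encodesAsUtf8(dflt)) {
            return null;
        }
        return "Default charset is " + dflt.name()
            + " (file.encoding=" + System.getProperty("file.encoding")
            + ", LANG=" + System.getenv("LANG") + ")"
            + "; a UTF-8 default is required.";
    }

    public static void main(String[] args) {
        String problem = diagnose();
        if (problem != null) {
            // Fail fast with one clear message instead of many unrelated test failures.
            System.err.println("ERROR: " + problem);
            System.exit(1);
        }
        System.out.println("UTF-8 environment check passed.");
    }
}
```

Checking behavior (the bytes actually produced) rather than the value of LANG keeps the check independent of how each operating system must be configured. It only detects a bad environment; the underlying fix described in the issue is still to pass an explicit charset such as StandardCharsets.UTF_8 wherever text is encoded or decoded.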



--
This message was sent by Atlassian Jira
(v8.20.1#820001)