Posted to dev@subversion.apache.org by Karl Fogel <kf...@newton.ch.collab.net> on 2002/07/18 20:29:43 UTC

Removing the --enable-utf8 flag

I'd like to remove the --enable-utf8 configuration option from
Subversion, even though HEAD of apr/apr-util doesn't have working i18n
at the moment.  Here's how this would work:

Currently, subversion/libsvn_subr/utf.c has two compile-time
conditional code paths:

   * If --enable-utf8, then attempt conversion from/to native/utf8.
     If a conversion function returns error, then bomb out entirely.

   * Else if not --enable-utf8, then never attempt conversion, but
     just check for "illegal" chars in the data we would have
     converted.  (Illegal here means eighth-bit set and non-whitespace
     control characters.  See check_non_ascii() in utf.c.)
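For illustration, here is a minimal sketch of the byte test described in the second bullet. It assumes "illegal" means exactly what is stated there: the eighth bit is set, or the byte is a control character other than whitespace. The name sketch_has_illegal_chars is hypothetical. The real check_non_ascii() in utf.c may differ in signature and in how it reports errors.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for utf.c's check_non_ascii(): returns true if
   DATA[0..LEN) contains an "illegal" byte, i.e. one with the eighth bit
   set, or a control character (including DEL) that is not whitespace.  */
static bool
sketch_has_illegal_chars (const char *data, size_t len)
{
  for (size_t i = 0; i < len; i++)
    {
      unsigned char c = (unsigned char) data[i];

      if (c & 0x80)
        return true;            /* Eighth bit set.  */

      if ((c < 0x20 || c == 0x7f)
          && c != '\t' && c != '\n' && c != '\v'
          && c != '\f' && c != '\r')
        return true;            /* Non-whitespace control character.  */
    }
  return false;
}
```

Under this rule, an ISO-2022 style shift sequence such as ESC $ B is rejected, because ESC (0x1b) is a non-whitespace control character.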

Here's how this would become a run-time decision:

   * Always attempt conversion.  If the conversion fails (for example
     because the underlying xlation mechanism isn't working, as is
     currently the case), *then* check for non_ascii, and bomb only if
     there are illegal characters in the data.  Otherwise, we proceed,
     effectively treating the data as if it were already UTF-8,
     because we know it's all safe ascii characters.
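The run-time decision can be sketched as follows. Everything here is hypothetical: the converter callback type, the status codes, and broken_converter, which simulates the current situation in which xlation always fails. Conversion is always attempted first. Only if it fails is the data accepted unchanged, and only when it is entirely safe ASCII, since such data is already valid UTF-8.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes; utf.c reports errors its own way.  */
typedef enum { CONV_OK, CONV_FAILED } conv_status;

/* Hypothetical native-to-UTF-8 converter: on success writes a
   NUL-terminated string of at most OUTSZ bytes to OUT.  */
typedef conv_status (*converter_fn) (const char *in, char *out,
                                     size_t outsz);

/* Simulates the current state of affairs: xlation always fails.  */
static conv_status
broken_converter (const char *in, char *out, size_t outsz)
{
  (void) in;
  (void) out;
  (void) outsz;
  return CONV_FAILED;
}

/* True if S holds only printable ASCII plus tab, newline and CR.  */
static bool
is_safe_ascii (const char *s)
{
  for (; *s != '\0'; s++)
    {
      unsigned char c = (unsigned char) *s;
      if (c >= 0x7f || (c < 0x20 && c != '\t' && c != '\n' && c != '\r'))
        return false;
    }
  return true;
}

/* Always try CONVERT first.  If it fails (or is NULL), copy IN to OUT
   unchanged, but only if IN is all safe ASCII.  Returns false where the
   data must be rejected ("bomb out") or OUT is too small.  */
static bool
to_utf8_with_fallback (converter_fn convert, const char *in,
                       char *out, size_t outsz)
{
  size_t n;

  if (convert != NULL && convert (in, out, outsz) == CONV_OK)
    return true;

  n = strlen (in) + 1;
  if (!is_safe_ascii (in) || n > outsz)
    return false;

  memcpy (out, in, n);
  return true;
}
```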

Thus we remove a compile-time option, become more robust, make
everyone's lives simpler, and fulfill our requisite ten hours of
mandatory asteroid mining per week.

Does anyone see any problems with this?

Even the shifted charset encodings use ESC or something to signal the
shift, so I feel pretty confident that check_non_ascii() will rarely
allow a false positive to pass.  But i18n is a treacherous minefield
-- anyone who sees a hole in this plan, please speak up now.

-K

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: Removing the --enable-utf8 flag

Posted by Ulrich Drepper <dr...@redhat.com>.
On Sun, 2002-07-21 at 00:14, Ulrich Drepper wrote:

>   /* Check whether the buffer contains any non-ASCII characters.  */
>   while (len-- > 0)
>     {
>       if (*buf < 0x20 || *buf >= 0x7f)
>         goto out;
>       ++buf;
>     }

The test is incomplete.  '\n' and '\t' should also be allowed (I guess
'\r' has to be added, too).
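With that correction applied, the byte loop might look like the sketch below. The function name all_ascii_text_bytes is hypothetical. It accepts printable ASCII plus tab, newline and carriage return, and rejects everything else.

```c
#include <stdbool.h>
#include <stddef.h>

/* The byte test from is_ascii_text(), extended so that '\t', '\n' and
   '\r' are also accepted.  */
static bool
all_ascii_text_bytes (const char *buf, size_t len)
{
  while (len-- > 0)
    {
      unsigned char c = (unsigned char) *buf++;
      if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c >= 0x7f)
        return false;
    }
  return true;
}
```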

-- 
---------------.                          ,-.   1325 Chesapeake Terrace
Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
Red Hat          `--' drepper at redhat.com   `------------------------

RE: Removing the --enable-utf8 flag

Posted by Ulrich Drepper <dr...@redhat.com>.
On Sun, 2002-07-21 at 14:52, Sander Striker wrote:

> How about adding a new target:  make bootstrap.

That'd be nice.  Also, I happen to have an svn binary lying around, just
in case.  There are also statically linked binaries available for download.


RE: Removing the --enable-utf8 flag

Posted by Sander Striker <st...@apache.org>.
> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 22 July 2002 00:09

> Ulrich Drepper <dr...@redhat.com> writes:
> 
> > On Sun, 2002-07-21 at 07:01, Ben Collins-Sussman wrote:
> > 
> > > So maybe this means we should post tarballs more often than every 3-6
> > > weeks.  If we *did* set up a system that posted a tarball every single
> > > night, then we wouldn't need to scream at people to bootstrap to HEAD
> > > anymore.
> > 
> > Right, that would help a lot.
> > 
> > An additional plus would be if you could unpack the sources and, if
> > necessary/wanted, update from the repository.  This way you could get
> > somebody to try out patches you just checked in.
> 
> Sure... it sounds like you'd want the "nightly tarballs" to be live
> working copies.  But even so, how could the working copies update
> themselves completely?  Most people have cvs installed, so the apr,
> apr-util and neon trees could be updated -- but not the main svn tree
> itself, unless you have an svn binary lying around already.  It
> becomes a chicken-and-egg problem.  :-)

How about adding a new target:  make bootstrap.

Sander



Re: Removing the --enable-utf8 flag

Posted by Ben Collins-Sussman <su...@collab.net>.
Ulrich Drepper <dr...@redhat.com> writes:

> On Sun, 2002-07-21 at 07:01, Ben Collins-Sussman wrote:
> 
> > So maybe this means we should post tarballs more often than every 3-6
> > weeks.  If we *did* set up a system that posted a tarball every single
> > night, then we wouldn't need to scream at people to bootstrap to HEAD
> > anymore.
> 
> Right, that would help a lot.
> 
> An additional plus would be if you could unpack the sources and, if
> necessary/wanted, update from the repository.  This way you could get
> somebody to try out patches you just checked in.

Sure... it sounds like you'd want the "nightly tarballs" to be live
working copies.  But even so, how could the working copies update
themselves completely?  Most people have cvs installed, so the apr,
apr-util and neon trees could be updated -- but not the main svn tree
itself, unless you have an svn binary lying around already.  It
becomes a chicken-and-egg problem.  :-)



Re: Removing the --enable-utf8 flag

Posted by Ulrich Drepper <dr...@redhat.com>.
On Sun, 2002-07-21 at 07:01, Ben Collins-Sussman wrote:

> So maybe this means we should post tarballs more often than every 3-6
> weeks.  If we *did* set up a system that posted a tarball every single
> night, then we wouldn't need to scream at people to bootstrap to HEAD
> anymore.

Right, that would help a lot.

An additional plus would be if you could unpack the sources and, if
necessary/wanted, update from the repository.  This way you could get
somebody to try out patches you just checked in.


Re: Removing the --enable-utf8 flag

Posted by Ben Collins-Sussman <su...@collab.net>.
"Sander Striker" <st...@apache.org> writes:

> From what I remember of talking with Ulrich, he would just like to
> unpack a recent tarball, since he doesn't have much time for hacking on
> the side.  Requiring a bootstrap is too high a threshold.  I've heard
> this same 'complaint' from other parties.

So maybe this means we should post tarballs more often than every 3-6
weeks.  If we *did* set up a system that posted a tarball every single
night, then we wouldn't need to scream at people to bootstrap to HEAD
anymore.

In the current situation, we have no choice but to make people do
that;  otherwise people end up filing bugs that have already been
fixed.  :-(   The code changes too quickly.





RE: Removing the --enable-utf8 flag

Posted by Sander Striker <st...@apache.org>.
> From: sussman@collab.net [mailto:sussman@collab.net]
> Sent: 21 July 2002 15:34

> Ulrich Drepper <dr...@redhat.com> writes:
> 
> > I'm honestly frustrated that there is not one single repository where
> > all the needed sources can be obtained and a simple configure script to
> > get the build started.
> 
> I don't understand your frustration.  Are you frustrated that
> Subversion isn't 100% self-contained?  That it depends on external
> libraries?   That's an incredibly common thing.
> 
> At the moment, if you check out Subversion, the ./autogen.sh script
> tells you exactly what CVS commands to run to checkout apr, apr-util,
> and neon as sub-trees.  Then you run ./configure and make, as usual.
> 
> Would it be better if ./autogen.sh (or ./configure) simply bombed out
> saying, "libapr not found on system"?

Re: Removing the --enable-utf8 flag

Posted by Ben Collins-Sussman <su...@collab.net>.
Ulrich Drepper <dr...@redhat.com> writes:

> I'm honestly frustrated that there is not one single repository where
> all the needed sources can be obtained and a simple configure script to
> get the build started.

I don't understand your frustration.  Are you frustrated that
Subversion isn't 100% self-contained?  That it depends on external
libraries?   That's an incredibly common thing.

At the moment, if you check out Subversion, the ./autogen.sh script
tells you exactly what CVS commands to run to checkout apr, apr-util,
and neon as sub-trees.  Then you run ./configure and make, as usual.

Would it be better if ./autogen.sh (or ./configure) simply bombed out
saying, "libapr not found on system"?




Re: Removing the --enable-utf8 flag

Posted by Ulrich Drepper <dr...@redhat.com>.
On Sat, 2002-07-20 at 22:26, Karl Fogel wrote:

> Hmmm, I re-read your definition and think I now understand what you
> mean by "the ASCII encoding" above, and that it effectively is the
> definition I was expecting.
> 
> [karl makes a try...]  Care to patch?

I never managed to find time to build svn myself.  I'm honestly
frustrated that there is not one single repository where all the needed
sources can be obtained and a simple configure script to get the build
started.  So all I can do is provide code which you'll have to bend to
fit in.  is_ascii_text is the main function.  You should call it before
accepting the bytes of the text as ASCII.  I don't know how to get the
current locale's codeset name in your environment (on Unix systems that'd
be a nl_langinfo(CODESET) call) so please fill this in.

The code (completely untested but it should work):


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#include <ctype.h>
#include <langinfo.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'ed copy of NAME with everything except letters and
   digits removed and letters folded to lower case, so that e.g.
   "ISO-8859-1" and "iso_8859_1" both become "iso88591".  Returns NULL
   if out of memory.  */
static char *
normalize_codeset (const char *name)
{
  char *copy = strdup (name);
  if (copy != NULL)
    {
      const char *rp = copy;
      char *wp = copy;

      while (*rp != '\0')
        {
          /* The ctype functions need an unsigned char value.  */
          unsigned char ch = (unsigned char) *rp;
          if (isalnum (ch))
            *wp++ = (char) tolower (ch);
          ++rp;
        }
      *wp = '\0';
    }
  return copy;
}


/* Keep sorted according to strcmp.  */
static const char *ascii_safe_names[] =
{
  "ascii", "iso88591", "iso88592"
};
static const size_t nascii_safe_names =
  sizeof (ascii_safe_names) / sizeof (ascii_safe_names[0]);


static int
xstrcmp (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}


/* Get the codeset for the current locale.
   XXX fill in: this POSIX nl_langinfo() call is only a placeholder.  */
static const char *
get_current_codeset (void)
{
  return nl_langinfo (CODESET);
}


bool
is_ascii_text (const char *buf, size_t len)
{
  const char *codeset;
  char *normalized;
  bool result = false;

  codeset = get_current_codeset ();
  if (codeset == NULL)
    return false;

  /* Normalize.  This creates a copy of the string.  */
  normalized = normalize_codeset (codeset);
  if (normalized == NULL)
    /* Out of memory.  */
    return false;

  /* Check whether this is the name of an ASCII-safe codeset.  */
  if (bsearch (&normalized, ascii_safe_names, nascii_safe_names,
               sizeof (ascii_safe_names[0]), xstrcmp) == NULL)
    /* No.  We cannot use the text.  */
    goto out;

  /* Check whether the buffer contains any non-ASCII characters.  */
  while (len-- > 0)
    {
      if (*buf < 0x20 || *buf >= 0x7f)
        goto out;
      ++buf;
    }

  /* All went well.  */
  result = true;

 out:
  free (normalized);

  return result;
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~


Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Ulrich Drepper <dr...@redhat.com> writes:
> > An encoding is ASCII-safe if
> > 
> >   From its initial state it is not possible to create a character
> >   which does not have the ASCII encoding when only using ASCII input
> >   bytes.

Hmmm, I re-read your definition and think I now understand what you
mean by "the ASCII encoding" above, and that it effectively is the
definition I was expecting.

[karl makes a try...]  Care to patch?

-K



Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Ulrich Drepper <dr...@redhat.com> writes:
> I suggest one additional test before running the non_ascii test for the
> entire string.  Check whether the encoding used is known to be
> ASCII-safe.  Only if this test succeeds should the non_ascii tests be
> performed.

Fair warning: I am blatantly trying to manipulate you into doing work.
You are free to refuse :-).

> The checks for ASCII-safeness can be performed by string comparisons
> with the name of the encoding of the incoming data.  The names of all
> the safe encodings could be collected.  Variations in names can and
> probably should be eliminated by normalization before the comparison.

Okay, makes sense to me.

> An encoding is ASCII-safe if
> 
>   From its initial state it is not possible to create a character
>   which does not have the ASCII encoding when only using ASCII input
>   bytes.

This I don't quite understand.

I understand that there are many encodings that meet this criterion,
including most (all?) of the stateful encodings.  But just because the
encoding is "ASCII-safe" doesn't mean that the input ASCII always
bears any meaningful relationship to the output ASCII.

The definition I expected was something like:

   An encoding E is ASCII-safe if it is more or less a superset of
   7-bit ASCII, in which the 7-bit codes mean the same thing in E as
   they do in ASCII.  For example, ISO-8859-1 and UTF-8 are both
   ASCII-safe.

The reason I expected this definition is that it means we effectively
*get* UTF-8 by simply accepting the characters from the local
encoding.  But if we hit an eighth-bit character, then we know the
game is over (the current implementation also filters out a lot of the
control characters, just in case, but that's a detail).

So our task is to compose a list of the encodings that meet this
criterion, and in the non-conversion case in utf.c, make sure that the
client is using one of those encodings before accepting the data.

?

-K


Re: Removing the --enable-utf8 flag

Posted by Ulrich Drepper <dr...@redhat.com>.
On Thu, 2002-07-18 at 13:29, Karl Fogel wrote:
> Here's how this would become a run-time decision:
> 
>    * Always attempt conversion.  If the conversion fails (for example
>      because the underlying xlation mechanism isn't working, as is
>      currently the case), *then* check for non_ascii, and bomb only if
>      there are illegal characters in the data.  Otherwise, we proceed,
>      effectively treating the data as if it were already UTF-8,
>      because we know it's all safe ascii characters.

I like the idea of removing the option but this outlined algorithm is
very unsafe.  Admittedly it will work in most cases but not all.  And
for something like a version control tool this isn't enough IMO.

Look at this "message":

  M@]@`n@J@ZK

It consists only of ASCII characters and therefore would pass the
non_ascii test.  But it's not readable and not comparable to other
strings since it's encoded using IBM870 [*]:

$ echo -n 'M@]@`n@J@ZK' | iconv -f IBM870; echo
( ) -> [ ].
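The decoding can be reproduced by hand. The table below is deliberately partial. It covers only the eight byte values that occur in the example string, with mappings taken from the iconv output above, and it is not a complete IBM870 decoder. Every input byte is printable ASCII, yet the decoded text is entirely different.

```c
#include <string.h>

/* Partial IBM870 (EBCDIC) -> ASCII mapping covering only the bytes in
   the example above.  Returns '?' for any other byte.  */
static char
ibm870_partial_to_ascii (unsigned char b)
{
  switch (b)
    {
    case 0x40: return ' ';   /* '@' in ASCII */
    case 0x4A: return '[';   /* 'J' */
    case 0x4B: return '.';   /* 'K' */
    case 0x4D: return '(';   /* 'M' */
    case 0x5A: return ']';   /* 'Z' */
    case 0x5D: return ')';   /* ']' */
    case 0x60: return '-';   /* '`' */
    case 0x6E: return '>';   /* 'n' */
    default:   return '?';
    }
}

/* Decode the NUL-terminated IN into OUT (room for strlen(IN)+1).  */
static void
decode_ibm870_partial (const char *in, char *out)
{
  while (*in != '\0')
    *out++ = ibm870_partial_to_ascii ((unsigned char) *in++);
  *out = '\0';
}
```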


I suggest one additional test before running the non_ascii test for the
entire string.  Check whether the encoding used is known to be
ASCII-safe.  Only if this test succeeds should the non_ascii tests be
performed.

The checks for ASCII-safeness can be performed by string comparisons
with the name of the encoding of the incoming data.  The names of all
the safe encodings could be collected.  Variations in names can and
probably should be eliminated by normalization before the comparison.

An encoding is ASCII-safe if

  From its initial state it is not possible to create a character
  which does not have the ASCII encoding when only using ASCII input
  bytes.

To catch stateful encodings etc., only the printable ASCII characters
should be allowed, i.e., 0x20 <= ch < 0x7f && isprint (ch).  Note that
the sometimes-available isascii() test in <ctype.h> is *not* sufficient.
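Here is a concrete illustration of that last point. Where isascii() is available, it only checks that a value fits in 7 bits, so it accepts control characters such as ESC, which can introduce a shift sequence. The sketch writes that test out by hand under the hypothetical name fits_in_7_bits, since isascii() is not part of ISO C, and places it next to the stricter printable-only test quoted above.

```c
#include <ctype.h>
#include <stdbool.h>

/* What isascii() checks: only that C fits in 7 bits.  */
static bool
fits_in_7_bits (int c)
{
  return (c & ~0x7f) == 0;
}

/* The stricter test: printable ASCII only.  */
static bool
is_printable_ascii (int c)
{
  return c >= 0x20 && c < 0x7f && isprint (c);
}
```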



[*] This is one example I came up with right away.  Yes, it is a
constructed example.  There are certainly more compelling examples.


Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Bill Tutt" <ra...@lyra.org> writes:
> I'm trying to be optimistic and hoping that apr-iconv will be bug free
> and shippable by 1.0. :)

Well, as long as the apr-iconv xlation routines succeed, the
check_non_ascii() case will never get invoked :-)...




RE: Re: Removing the --enable-utf8 flag

Posted by Bill Tutt <ra...@lyra.org>.
I'm trying to be optimistic and hoping that apr-iconv will be bug free
and shippable by 1.0. :)

Bill


> -----Original Message-----
> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> Sent: Thursday, July 18, 2002 1:55 PM
> To: Branko Cibej
> Cc: Bill Tutt; dev@subversion.tigris.org
> Subject: Re: Removing the --enable-utf8 flag
> 
> Branko Čibej <br...@xbc.nu> writes:
> > Your plan is the same as my plan. You're a mind-reading
> > plagiator. Have you no respect for intellectual property? I'm calling
> > the RIAA and BSA.
> 
> What?  You mean you had my plan *before* I even thought of it?  THAT'S
> the most brutal, dastardly form of plagiarism of all!  You have no
> right to complain; in fact, you ought to be all shamed on yourself.
> 
> Ahem.
> 
> Okay, great.  If Bill thinks it's fine for Alpha at least, and you had
> this plan all along, then I can at least feel I'm not missing anything
> big here.  I will proceed with the plan, while waiting for Bill's
> objections to maintaining it in the long run.
> 
> -K
> 




Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Branko Čibej <br...@xbc.nu> writes:
> Your plan is the same as my plan. You're a mind-reading
> plagiator. Have you no respect for intellectual property? I'm calling
> the RIAA and BSA.

What?  You mean you had my plan *before* I even thought of it?  THAT'S
the most brutal, dastardly form of plagiarism of all!  You have no
right to complain; in fact, you ought to be all shamed on yourself.

Ahem.

Okay, great.  If Bill thinks it's fine for Alpha at least, and you had
this plan all along, then I can at least feel I'm not missing anything
big here.  I will proceed with the plan, while waiting for Bill's
objections to maintaining it in the long run.

-K


Re: Removing the --enable-utf8 flag

Posted by Branko Čibej <br...@xbc.nu>.
Karl,

Your plan is the same as my plan. You're a mind-reading plagiator. Have 
you no respect for intellectual property? I'm calling the RIAA and BSA.


Karl Fogel wrote:

>"Bill Tutt" <ra...@lyra.org> writes:
>
>>I think this is fine for Alpha. I'm not so sure for 1.0 though.
>
>Why not for 1.0?
>

It should be quite O.K. and totally cool even for 1.0, if the locale 
charset is an ASCII derivative. It will fail horribly in 
EBCDIC-derivative locales without translation.

>(I had been planning to leave it in permanently.)
>

+1. I for one am not about to lose sleep worrying about what happens to 
people who insist on using EBCDIC and don't have the foresight to arm
themselves with libiconv.

>Thanks for the quick response,
>-K
>
>>>   * Always attempt conversion.  If the conversion fails (for example
>>>     because the underlying xlation mechanism isn't working, as is
>>>     currently the case), *then* check for non_ascii, and bomb only if
>>>     there are illegal characters in the data.  Otherwise, we proceed,
>>>     effectively treating the data as if it were already UTF-8,
>>>     because we know it's all safe ascii characters.


-- 
Brane Čibej   <br...@xbc.nu>   http://www.xbc.nu/brane/



Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
"Bill Tutt" <ra...@lyra.org> writes:
> I think this is fine for Alpha. I'm not so sure for 1.0 though.

Why not for 1.0?

(I had been planning to leave it in permanently.)

Thanks for the quick response,
-K

> >    * Always attempt conversion.  If the conversion fails (for example
> >      because the underlying xlation mechanism isn't working, as is
> >      currently the case), *then* check for non_ascii, and bomb only if
> >      there are illegal characters in the data.  Otherwise, we proceed,
> >      effectively treating the data as if it were already UTF-8,
> >      because we know it's all safe ascii characters.


Re: Removing the --enable-utf8 flag

Posted by Branko Čibej <br...@xbc.nu>.
Karl Fogel wrote:

>Branko Čibej <br...@xbc.nu> writes:
>
>>How about reverting to the ascii check only if apr_xlate_open returns
>>APR_ENOTIMPL? In other words, only when there's no iconv available.
>>
>>Huh, but then it could be #ifdef APR_HAS_XLATE.
>
>Sure.
>
>I've committed rev 2586 now, but we can change the conditions under
>which it falls back to check_non_ascii() vs errors.  Just wanted
>mainly to get that #ifdef out of there, so we don't have to always ask
>"Did you compile with or without --enable-utf8" anymore. :-)

O.K.

BTW, once my apr_filepath_encoding patch is in APR, the logic can be:

    encoding = apr_filepath_encoding();
    if (!APR_HAS_XLATE || encoding == APR_FILEPATH_ENCODING_UNKNOWN)
        check_non_ascii(path);
    else if (path_is_from_command_line_arg
             || encoding == APR_FILEPATH_ENCODING_DEFAULT)
        apr_xlate(path);
    else if (encoding == APR_FILEPATH_ENCODING_UTF8)
        just_use(path);
    else
        alarums_and_excursione();


IIRC, APR_HAS_XLATE will always be 0 or 1 in HEAD apr-util, so that you 
can use it in runtime conditions.
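The pseudocode above can be expressed as a pure decision function. This also shows why a 0/1 APR_HAS_XLATE is handy: it can be passed in as an ordinary run-time argument. All names here (fp_encoding, path_action, choose_path_action) are hypothetical stand-ins, and the proposed apr_filepath_encoding() patch may use different constants.

```c
#include <stdbool.h>

/* Hypothetical names modelled on the pseudocode above.  */
enum fp_encoding { FP_ENC_UNKNOWN, FP_ENC_DEFAULT, FP_ENC_UTF8, FP_ENC_OTHER };
enum path_action { CHECK_NON_ASCII, XLATE, USE_AS_IS, RAISE_ERROR };

/* Mirror of the if/else chain: HAS_XLATE plays the role of
   APR_HAS_XLATE, FROM_CMDLINE of path_is_from_command_line_arg.  */
static enum path_action
choose_path_action (bool has_xlate, enum fp_encoding enc, bool from_cmdline)
{
  if (!has_xlate || enc == FP_ENC_UNKNOWN)
    return CHECK_NON_ASCII;
  if (from_cmdline || enc == FP_ENC_DEFAULT)
    return XLATE;
  if (enc == FP_ENC_UTF8)
    return USE_AS_IS;
  return RAISE_ERROR;
}
```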




Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Branko Čibej <br...@xbc.nu> writes:
> How about reverting to the ascii check only if apr_xlate_open returns
> APR_ENOTIMPL? In other words, only when there's no iconv available.
> 
> Huh, but then it could be #ifdef APR_HAS_XLATE.

Sure.

I've committed rev 2586 now, but we can change the conditions under
which it falls back to check_non_ascii() vs errors.  Just wanted
mainly to get that #ifdef out of there, so we don't have to always ask
"Did you compile with or without --enable-utf8" anymore. :-)


Re: Removing the --enable-utf8 flag

Posted by Branko Čibej <br...@xbc.nu>.
Karl Fogel wrote:

>Marcus Comstedt <ma...@mc.pp.se> writes:
>
>>The operative word here is "meantime".  If you have to set an
>>environment variable or something to activate it, then fine.  But you
>>should have to be _aware_ that you're using a workaround when that is
>>the case.  It should not kick in silently.  Then you might not even
>>realize that you have a potential problem.
>
>Sure, but we should do some risk analysis to determine the real-life
>probability of someone being kicked here.  With the current code, it
>looks low to me, but I need more data...
>

How about reverting to the ascii check only if apr_xlate_open returns 
APR_ENOTIMPL? In other words, only when there's no iconv available.

Huh, but then it could be #ifdef APR_HAS_XLATE.
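The narrower fallback proposed here can be sketched as follows: use the ASCII check only when translation is not implemented at all, and treat every other failure from opening the translator as an error. The status values are hypothetical stand-ins for APR_SUCCESS, APR_ENOTIMPL and any other apr_xlate_open() error.

```c
/* Hypothetical stand-ins for apr_xlate_open() outcomes.  */
typedef enum { XL_SUCCESS, XL_ENOTIMPL, XL_OTHER_ERROR } xlate_status;
typedef enum { DO_CONVERT, DO_ASCII_CHECK, DO_FAIL } fallback_choice;

/* Fall back to the ASCII check only when no translation exists at all;
   any other failure is reported rather than silently worked around.  */
static fallback_choice
choose_fallback (xlate_status open_status)
{
  switch (open_status)
    {
    case XL_SUCCESS:  return DO_CONVERT;
    case XL_ENOTIMPL: return DO_ASCII_CHECK;
    default:          return DO_FAIL;
    }
}
```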

>>(And I'm still talking about 1.0 here btw.)

Same here.




Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:
> The operative word here is "meantime".  If you have to set an
> environment variable or something to activate it, then fine.  But you
> should have to be _aware_ that you're using a workaround when that is
> the case.  It should not kick in silently.  Then you might not even
> realize that you have a potential problem.

Sure, but we should do some risk analysis to determine the real-life
probability of someone being kicked here.  With the current code, it
looks low to me, but I need more data...

> (And I'm still talking about 1.0 here btw.)

Agreed.


Re: Removing the --enable-utf8 flag

Posted by Marcus Comstedt <ma...@mc.pp.se>.
Karl Fogel <kf...@newton.ch.collab.net> writes:

> Marcus Comstedt <ma...@mc.pp.se> writes:
> > I'm with Bill here.  While basically working, this approach is not as
> > robust as it could be.  It's possible to fool it, by using things like
> > ISO-646-whatever for example.  (I don't seriously expect anyone to use
> > any ISO-646 variant other than ISO-646-US nowadays, that was in the
> > time before ISO-8859, but it still feels a little flakey.)  My
> > intention for the --disable-utf8 variants was as a transitory measure
> > while we iron out the problems with the real code.  It was never meant
> > to stay.
> 
> I think it's a question of how dangerous it is, versus how much it will
> help people who are having iconv problems but need Subversion
> working in the meantime?  Anyway, I'll check it in for now, since it's
> basically the same level of robustness as the old code, just without
> the extra compile-time option.  We can iron out the robustness at our
> leisure.

The operative word here is "meantime".  If you have to set an
environment variable or something to activate it, then fine.  But you
should have to be _aware_ that you're using a workaround when that is
the case.  It should not kick in silently.  Then you might not even
realize that you have a potential problem.
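One way to meet this requirement is sketched below, assuming an opt-in environment variable. The name SVN_SKETCH_ALLOW_ASCII_FALLBACK is invented for illustration, and Subversion has no such setting. The fallback is used only when the variable is explicitly set to 1, and a warning is printed whenever it takes effect, so it never kicks in silently.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Invented name for illustration; Subversion defines no such variable.  */
#define FALLBACK_ENV_VAR "SVN_SKETCH_ALLOW_ASCII_FALLBACK"

/* True only if the user explicitly opted in with the value "1".  */
static bool
fallback_allowed (const char *env_value)
{
  return env_value != NULL && strcmp (env_value, "1") == 0;
}

/* Called when charset conversion is unavailable.  Either refuses, or
   warns on stderr that the ASCII-only workaround is in effect.  */
bool
use_ascii_fallback (void)
{
  if (!fallback_allowed (getenv (FALLBACK_ENV_VAR)))
    return false;

  fprintf (stderr, "warning: charset conversion unavailable; accepting "
           "plain ASCII only because %s=1\n", FALLBACK_ENV_VAR);
  return true;
}
```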

(And I'm still talking about 1.0 here btw.)


  // Marcus




Re: Removing the --enable-utf8 flag

Posted by Karl Fogel <kf...@newton.ch.collab.net>.
Marcus Comstedt <ma...@mc.pp.se> writes:
> I'm with Bill here.  While basically working, this approach is not as
> robust as it could be.  It's possible to fool it, by using things like
> ISO-646-whatever for example.  (I don't seriously expect anyone to use
> any ISO-646 variant other than ISO-646-US nowadays, that was in the
> time before ISO-8859, but it still feels a little flakey.)  My
> intention for the --disable-utf8 variants was as a transitory measure
> while we iron out the problems with the real code.  It was never meant
> to stay.

I think it's a question of how dangerous it is, versus how much it will
help people who are having iconv problems but need Subversion
working in the meantime?  Anyway, I'll check it in for now, since it's
basically the same level of robustness as the old code, just without
the extra compile-time option.  We can iron out the robustness at our
leisure.



Re: Removing the --enable-utf8 flag

Posted by Marcus Comstedt <ma...@mc.pp.se>.
"Bill Tutt" <ra...@lyra.org> writes:

> I think this is fine for Alpha. I'm not so sure for 1.0 though.

I'm with Bill here.  While basically working, this approach is not as
robust as it could be.  It's possible to fool it, by using things like
ISO-646-whatever for example.  (I don't seriously expect anyone to use
any ISO-646 variant other than ISO-646-US nowadays, that was in the
time before ISO-8859, but it still feels a little flakey.)  My
intention for the --disable-utf8 variants was as a transitory measure
while we iron out the problems with the real code.  It was never meant
to stay.
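The ISO-646 concern can be illustrated concretely. National variants reuse some 7-bit positions for other characters. The German variant (DIN 66003) assigns eight positions, such as 0x7D and 0x7E, to letters, so the ASCII-looking bytes "Gr}~e" actually spell "Grüße" and would pass a pure 7-bit check. The decoder below handles only those eight positions and is meant as an illustration, not production code.

```c
#include <string.h>

/* Decode 7-bit ISO 646-DE (DIN 66003) to UTF-8.  Only the eight code
   positions that differ from US-ASCII are remapped; all other bytes
   are copied unchanged.  OUT needs room for 2*strlen(IN)+1 bytes.  */
static void
iso646de_to_utf8 (const char *in, char *out)
{
  for (; *in != '\0'; in++)
    {
      const char *rep = NULL;

      switch (*in)
        {
        case 0x40: rep = "\xC2\xA7"; break;  /* SECTION SIGN */
        case 0x5B: rep = "\xC3\x84"; break;  /* CAPITAL A WITH DIAERESIS */
        case 0x5C: rep = "\xC3\x96"; break;  /* CAPITAL O WITH DIAERESIS */
        case 0x5D: rep = "\xC3\x9C"; break;  /* CAPITAL U WITH DIAERESIS */
        case 0x7B: rep = "\xC3\xA4"; break;  /* SMALL A WITH DIAERESIS */
        case 0x7C: rep = "\xC3\xB6"; break;  /* SMALL O WITH DIAERESIS */
        case 0x7D: rep = "\xC3\xBC"; break;  /* SMALL U WITH DIAERESIS */
        case 0x7E: rep = "\xC3\x9F"; break;  /* SMALL SHARP S */
        }

      if (rep != NULL)
        {
          *out++ = rep[0];
          *out++ = rep[1];
        }
      else
        *out++ = *in;
    }
  *out = '\0';
}
```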


  // Marcus




RE: Removing the --enable-utf8 flag

Posted by Bill Tutt <ra...@lyra.org>.
I think this is fine for Alpha. I'm not so sure for 1.0 though.

FYI,
Bill
----
Do you want a dangerous fugitive staying in your flat?
No.
Well, don't upset him and he'll be a nice fugitive staying in your flat.
 

> -----Original Message-----
> From: Karl Fogel [mailto:kfogel@newton.ch.collab.net]
> Sent: Thursday, July 18, 2002 1:30 PM
> To: dev@subversion.tigris.org
> Subject: Removing the --enable-utf8 flag
> 
> I'd like to remove the --enable-utf8 configuration option from
> Subversion, even though HEAD of apr/apr-util doesn't have working i18n
> at the moment.  Here's how this would work:
> 
> Currently, subversion/libsvn_subr/utf.c has two compile-time
> conditional code paths:
> 
>    * If --enable-utf8, then attempt conversion from/to native/utf8.
>      If a conversion function returns error, then bomb out entirely.
> 
>    * Else if not --enable-utf8, then never attempt conversion, but
>      just check for "illegal" chars in the data we would have
>      converted.  (Illegal here means eighth-bit set and non-whitespace
>      control characters.  See check_non_ascii() in utf.c.)
> 
> Here's how this would become a run-time decision:
> 
>    * Always attempt conversion.  If the conversion fails (for example
>      because the underlying xlation mechanism isn't working, as is
>      currently the case), *then* check for non_ascii, and bomb only if
>      there are illegal characters in the data.  Otherwise, we proceed,
>      effectively treating the data as if it were already UTF-8,
>      because we know it's all safe ascii characters.
> 
> Thus we remove a compile-time option, become more robust, make
> everyone's lives simpler, and fulfill our requisite ten hours of
> mandatory asteroid mining per week.
> 
> Does anyone see any problems with this?
> 
> Even the shifted charset encodings use ESC or something to signal the
> shift, so I feel pretty confident that check_non_ascii() will rarely
> allow a false positive to pass.  But i18n is a treacherous minefield
> -- anyone who sees a hole in this plan, please speak up now.
> 
> -K
> 


