Posted to dev@httpd.apache.org by Greg Stein <gs...@lyra.org> on 2000/01/15 00:56:31 UTC

Re: cvs commit: apache-2.0/src/lib/apr/include apr_iconv.h

On 14 Jan 2000 rbb@hyperreal.org wrote:
>...
>   void ap_codepage_open(ap_iconv_t **convset, const char *topage, 
>                            const char *frompage, ap_context_t *context); 
>   void ap_translate_codepage(ap_iconv_t *convset, const char *inbuf, 
>                                 ap_size_t inbytes_left, const char *outbuf,
>                                 ap_size_t outbytes_left);
>   /* The purpose of ap_translate char is to translate one character
>    * at a time.  This needs to be written carefully so that it works
>    * with double-byte character sets. 
>    */
>   void ap_translate_char(ap_iconv_t *convset, char inchar, char outchar);
>   void ap_codepage_close(ap_iconv_t *convset)

Ryan,

You'll need ap_status_t on (at least) the first three. Certainly on the
creation, but a codepage conversion *can* produce errors (for invalid byte
sequences).

Also note that the prototype for ap_translate_char() has "char outchar".
It won't be able to do its job :-). Looks like the same problem applies to
"outbytes_left"?
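The points above can be illustrated with a minimal sketch of a buffer-translation call that returns a status and takes its writable output buffer and both byte counts by pointer. It assumes the conversion sits directly on POSIX `iconv()`. The names `translate_buffer` and `xlate_status` are hypothetical and are not APR's API.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical status codes; APR itself would return an ap_status_t. */
typedef enum {
    XLATE_OK = 0,
    XLATE_INVALID_SEQUENCE,  /* EILSEQ: invalid byte sequence in the input */
    XLATE_INCOMPLETE,        /* EINVAL: input ends inside a multibyte char */
    XLATE_OUTPUT_FULL,       /* E2BIG: not enough room left in outbuf */
    XLATE_FAILED             /* any other iconv() failure */
} xlate_status;

/* Hypothetical wrapper: translate as much of inbuf as fits into outbuf.
 * The byte counts are passed by pointer so the caller learns how much input
 * was consumed and how much output space remains; outbuf is writable. */
xlate_status translate_buffer(iconv_t cd,
                              const char *inbuf, size_t *inbytes_left,
                              char *outbuf, size_t *outbytes_left)
{
    char *in = (char *)inbuf;  /* POSIX iconv() takes char **, not const */
    char *out = outbuf;

    if (iconv(cd, &in, inbytes_left, &out, outbytes_left) == (size_t)-1) {
        switch (errno) {
        case EILSEQ: return XLATE_INVALID_SEQUENCE;
        case EINVAL: return XLATE_INCOMPLETE;
        case E2BIG:  return XLATE_OUTPUT_FULL;
        default:     return XLATE_FAILED;
        }
    }
    return XLATE_OK;
}
```

Because the counts are updated in place, a caller that gets `XLATE_OUTPUT_FULL` can drain its output buffer and call again with the same pointers to resume.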

I know you're just starting on it, but it might be nice to provide a
couple of examples for "topage" and "frompage", i.e. what is the format of
the string? It would also be nice to provide a registration function so
that modules can register one or more conversions. For example:
  ./configure --enable-module=shift_jis

And maybe it is obvious, but expand your comment about "written carefully"
to note that the ap_iconv_t would store state to manage this process. This
would imply an ap_iconv_t could not be shared for translation. It also
implies that you may want a way to reset the state (e.g. "I'm about to
parse a new string; reset your inside-multi-byte flags"), along with a way
to say "the string is done; did I end inside a multibyte sequence?" (which
is an error).
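With `iconv()`, the in-progress shift state lives inside the `iconv_t` descriptor, which fits the point that one descriptor cannot be shared between translations in progress. A minimal sketch of the reset and end-of-string check described above, assuming POSIX `iconv()` semantics: a call with a NULL input buffer returns the descriptor to its initial state, and input that stops inside a multibyte character fails with `EINVAL`. The helper names are hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: "I'm about to parse a new string; reset your state."
 * POSIX iconv() treats a NULL inbuf as a request to return the descriptor
 * to its initial shift state. */
void translate_reset(iconv_t cd)
{
    iconv(cd, NULL, NULL, NULL, NULL);
}

/* Hypothetical helper: translate a complete string and report whether it
 * ended inside a multibyte sequence.  Returns true only for that case
 * (EINVAL with input left over); invalid sequences or a full output buffer
 * return false. */
bool translate_string_truncated(iconv_t cd, const char *s, size_t len,
                                char *out, size_t outsize)
{
    char *in = (char *)s;   /* POSIX iconv() takes char ** */
    char *o = out;
    size_t inleft = len, outleft = outsize;

    translate_reset(cd);
    if (iconv(cd, &in, &inleft, &o, &outleft) == (size_t)-1)
        return errno == EINVAL && inleft > 0;
    return false;
}
```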

And last but not least... maybe include a variant that operates on files?
(or at least provide a utility that implements this version in terms of
ap_translate_codepage.) It would be nice to translate an input file
directly into an output socket.
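One way such a file variant could be layered on buffer conversion is a minimal sketch that assumes POSIX `read()`, `write()` and `iconv()` and copies from one descriptor to another. Since a socket is just another descriptor here, the same loop covers translating a file straight into a socket. `translate_fd` and `write_all` are hypothetical names.

```c
#include <errno.h>
#include <iconv.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: write all of buf, retrying after short writes. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t w = write(fd, buf, len);
        if (w < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        buf += w;
        len -= (size_t)w;
    }
    return 0;
}

/* Hypothetical utility: translate everything read from in_fd and write the
 * result to out_fd (a file, pipe or socket).  Returns 0 on success, -1 on
 * I/O error, invalid input, or input ending inside a multibyte character. */
int translate_fd(iconv_t cd, int in_fd, int out_fd)
{
    char inbuf[4096], outbuf[8192];
    size_t carry = 0;  /* bytes of a split character kept from the last read */

    for (;;) {
        ssize_t n = read(in_fd, inbuf + carry, sizeof inbuf - carry);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0) {
            char *out = outbuf;
            size_t outleft = sizeof outbuf;
            if (carry != 0)
                return -1;  /* EOF inside a multibyte character */
            /* Flush any closing shift sequence a stateful encoding needs. */
            if (iconv(cd, NULL, NULL, &out, &outleft) == (size_t)-1)
                return -1;
            return write_all(out_fd, outbuf, sizeof outbuf - outleft);
        }

        char *in = inbuf;
        size_t inleft = carry + (size_t)n;
        while (inleft > 0) {
            char *out = outbuf;
            size_t outleft = sizeof outbuf;
            size_t rc = iconv(cd, &in, &inleft, &out, &outleft);
            int err = errno;  /* save it before write_all() can change it */

            if (write_all(out_fd, outbuf, sizeof outbuf - outleft) < 0)
                return -1;
            if (rc != (size_t)-1)
                break;        /* all input consumed */
            if (err == E2BIG)
                continue;     /* output buffer was full; keep converting */
            if (err == EINVAL)
                break;        /* split character: wait for more input */
            return -1;        /* EILSEQ or another error */
        }
        /* Keep the unconsumed tail (if any) for the next read. */
        memmove(inbuf, in, inleft);
        carry = inleft;
    }
}
```

The `EINVAL` case is what makes chunked input work: an incomplete tail is moved to the front of the buffer rather than reported as an error, and it only counts as an error if the input actually ends there.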

Cheers,
-g

-- 
Greg Stein, http://www.lyra.org/


Re: cvs commit: apache-2.0/src/lib/apr/include apr_iconv.h

Posted by Greg Stein <gs...@lyra.org>.
On Sat, 15 Jan 2000 rbb@apache.org wrote:
>...
> I know, I was struggling with some other stuff at the time, and just
> wanted to get this out there.  I'll fix it Monday or Tuesday.  :-)

No rush! Just pointing it out. Getting around to it in March is fine :-)

>...
> > I know you're just starting on it, but it might be nice to provide a
> > couple of examples for "topage" and "frompage", i.e. what is the format of
> > the string? It would also be nice to provide a registration function so
> > that modules can register one or more conversions. For example:
> >   ./configure --enable-module=shift_jis
> 
> Huh?  This is a wrapper around iconv.  The strings are the names of the

Ah!  See... there is my problem :-)

I thought you were starting a new interface, and I was providing suggestions
based on some stuff that came up on a Python list a couple of months ago.

My mistake... please ignore :-)

>...
> Greg,
> 
> Write the code.  I know next to nothing about codepage translation.  I

Sorry... I read your frustration there. I would never intend to sign you
up for writing code; I was offering some suggestions for translation
utilities is all. If the intent is to wrap the iconv model, then never mind.

I had never heard of iconv. I just went and found it... nice lib, and I
see where you're going. This darn email... no way to tell what's going on
on the other end... :-)

Cheers,
-g




Re: cvs commit: apache-2.0/src/lib/apr/include apr_iconv.h

Posted by rb...@apache.org.
> Ryan,
> 
> You'll need ap_status_t on (at least) the first three. Certainly on the
> creation, but a codepage conversion *can* produce errors (for invalid byte
> sequences).
> 

I know, I was struggling with some other stuff at the time, and just
wanted to get this out there.  I'll fix it Monday or Tuesday.  :-)

> Also note that the prototype for ap_translate_char() has "char outchar".
> It won't be able to do its job :-). Looks like the same problem applies to
> "outbytes_left"?

My mistake.

> 
> I know you're just starting on it, but it might be nice to provide a
> couple of examples for "topage" and "frompage", i.e. what is the format of
> the string? It would also be nice to provide a registration function so
> that modules can register one or more conversions. For example:
>   ./configure --enable-module=shift_jis

Huh?  This is a wrapper around iconv.  The strings are the names of the
two character sets.  I don't see how a registration function would work.
The idea of these functions is that the code that uses APR can have a
consistent interface with which to do codepage translation.  What kind of
module would register a translation?  If you are thinking of an Apache
module, that belongs in Apache, not APR.  If you are thinking of an APR
module, those don't even exist yet, and they may never exist; we would
need to find a good use for them first.
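As the reply says, `topage` and `frompage` are simply character-set names handed through to iconv. A minimal sketch, assuming POSIX `iconv_open()`; `codepage_open` is a hypothetical stand-in for `ap_codepage_open` that returns an errno value rather than an `ap_status_t`. Which names are accepted depends on the platform's iconv: "UTF-8" and "ISO-8859-1" are widely available, and glibc also accepts names such as "SHIFT_JIS".

```c
#include <errno.h>
#include <iconv.h>

/* Hypothetical helper mirroring ap_codepage_open(convset, topage, frompage).
 * The argument order follows iconv_open(tocode, fromcode).
 * Returns 0 on success, otherwise the errno value from iconv_open(). */
int codepage_open(iconv_t *convset, const char *topage, const char *frompage)
{
    iconv_t cd = iconv_open(topage, frompage);
    if (cd == (iconv_t)-1)
        return errno;   /* EINVAL: this pair of names is not supported */
    *convset = cd;
    return 0;
}
```

For example, `codepage_open(&cd, "ISO-8859-1", "UTF-8")` prepares a conversion from UTF-8 into Latin-1, while an unknown name fails at open time instead of during translation.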

> 
> And maybe it is obvious, but expand your comment about "written carefully"
> to note that the ap_iconv_t would store state to manage this process. This
> would imply an ap_iconv_t could not be shared for translation. It also
> implies that you may want a way to reset the state (e.g. "I'm about to
> parse a new string; reset your inside-multi-byte flags"), along with a way
> to say "the string is done; did I end inside a multibyte sequence?" (which
> is an error).
> 

I don't believe we would need to store state.  If a character is two
bytes, this function could just return APR_EDOUBLEBYTE, which means "don't
use this function for double-byte character sets."  Or, it could be
modified to return something other than a character.  I leave this
decision for people who understand the implications better than I do.  I
just want the people who do write it to remember that case.
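The stateless single-character approach described above can be sketched as follows, assuming POSIX `iconv()` underneath: the output comes back through a pointer, and anything needing more than one byte on either side is refused rather than tracked with state. `XCHAR_DOUBLEBYTE` plays the role of the proposed `APR_EDOUBLEBYTE`; `translate_char` and the other names are hypothetical.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>

/* Hypothetical result codes for single-character translation. */
typedef enum { XCHAR_OK, XCHAR_DOUBLEBYTE, XCHAR_INVALID } xchar_status;

/* Hypothetical helper: translate exactly one byte into exactly one byte.
 * The descriptor is reset on every call, so no state carries between
 * characters; multibyte cases are refused instead. */
xchar_status translate_char(iconv_t cd, char inchar, char *outchar)
{
    char in[1] = { inchar };
    char out[1];
    char *ip = in, *op = out;
    size_t inleft = 1, outleft = 1;

    iconv(cd, NULL, NULL, NULL, NULL);   /* start from the initial state */
    if (iconv(cd, &ip, &inleft, &op, &outleft) == (size_t)-1) {
        /* EINVAL: inchar starts a multibyte input sequence.
         * E2BIG: the translation needs more than one output byte. */
        if (errno == EINVAL || errno == E2BIG)
            return XCHAR_DOUBLEBYTE;
        return XCHAR_INVALID;            /* EILSEQ or another failure */
    }
    if (outleft != 0)
        return XCHAR_DOUBLEBYTE;  /* consumed without output, e.g. a shift byte */
    *outchar = out[0];
    return XCHAR_OK;
}
```

Resetting on every call keeps the function free of shared state, at the cost of making it unusable for double-byte or stateful encodings, which is exactly the case the error code is meant to flag.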

> And last but not least... maybe include a variant that operates on files?
> (or at least provide a utility that implements this version in terms of
> ap_translate_codepage.) It would be nice to translate an input file
> directly into an output socket.

Greg,

Write the code.  I know next to nothing about codepage translation.  I
only did what I have done so far because there were two groups in IBM that
wanted it there.  I have given next to no thought to this, but people
weren't working on it because there was nothing to work on.  I did this
entire thing in about fifteen minutes, because I am hacking on other stuff
in 2.0 right now, and this isn't a priority for me.

I expect somebody who really understands the issues with codepage
translation to finish this stuff; I was just trying to get things started.
If nobody else does work on this stuff before next week, I may look at it
again.  For right now though, I have bigger fish to fry.

Ryan


_______________________________________________________________________________
Ryan Bloom                        	rbb@ntrnet.net
6209 H Shanda Dr.
Raleigh, NC 27609		Ryan Bloom -- thinker, adventurer, artist,
				     writer, but mostly, friend.
-------------------------------------------------------------------------------