You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@subversion.apache.org by "Peter N. Lundblad" <pe...@famlundblad.se> on 2004/10/17 21:39:12 UTC

RFC: building eol and encoding support into svn_stream_t

Hi,

While svn has been localized, several encoding bugs have been found, we
have some open that are hard to solve, I'm sure we will introduce new ones
and there are probably more that we haven't found yet.

I think this is partly because we have different ways of recoding and
end-of-line translating text in different places:
- we have svn_cmdline_{printf,puts,...} that take arguments in UTF-8 and
output in console/locale encoding and native eol-style.
- we have some (no many) direct C stdio calls left in the code for various
reasons. They do eol-style translation, but expect their arguments to be
in the correct encoding.
- When we write to svn_stream_t* objects or apr_file_t*, we have to do
recoding and eol-translation ourselves, which is messy and error-prone. It
is also problematic since we have different encodings for stdout (the
console) and other files on some platforms, i.e. Windows. This is a
problem since a function that produces output needs to know if the output
is for a console or a file. This is the case for svn_client_diff. Also,
some functions need to turn off recoding for parts of their output.
Examples are proplist -v and diff where we might not know the encoding of
the propval or the file contents.

I think this is a real mess that we have to fix in some way.

My proposal is to add support for recoding and eol translation to our
svn_stream_t interface. Then we should use streams instread of apr_file_t
whereever possible, such as in svn_client_diff. It should be possible to
set the encoding for a stream and turn it into text/binary mode.

Why not just stack a recoding stream onto an existing stream like we do
for compression? That's because the creator of the stream knows the
encoding it should be outputing, but the user of the stream may need to
turn it into binary mode.

Since the stream objects are opaque to the outside world, I think we can
add encoding information to it without breaking any compability.

I'd like to get this done for 1.2, so we can get correct and consistent
output from our commands on all platforms.

Comments?

Regards,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: building eol and encoding support into svn_stream_t

Posted by "Peter N. Lundblad" <pe...@famlundblad.se>.
On Sun, 17 Oct 2004, Erik Huelsmann wrote:

> > My proposal is to add support for recoding and eol translation to our
> > svn_stream_t interface. Then we should use streams instread of apr_file_t
> > whereever possible, such as in svn_client_diff. It should be possible to
> > set the encoding for a stream and turn it into text/binary mode.
>
> > Why not just stack a recoding stream onto an existing stream like we do
> > for compression? That's because the creator of the stream knows the
> > encoding it should be outputing, but the user of the stream may need to
> > turn it into binary mode.
>
> Right. This is exactly the approach I was using.  This is not a
> solution for the use-case of libsvn_diff however: There you have some
> output which needs recoding and some which doesn't. All that needs to
> end up in the correct order in the output stream. I was still thinking
> about what to do there, but I think there are 2 solutions: tell the
> diff lib which encoding the header should be, let it recode and send
> the output (just as it is now) to the stream.  Or you can add a
> callback. Tell the client you are passing it header or file
> information and let the client decide what to do....
>
I probably wasn't clear on the main reason to build this into the streams.
I was envisioning a way for the producer to swithc between text and binary
mode. This would help svn_diff, that outputs both headers (text) and the
file contents (raw data, since we don't know the encoding).

svn_error_t svn_stream_set_encoding(...);
const char *svn_stream_get_encoding(...);
void svn_tream_set_text_mode(..., svn_boolean_t);
svn_boolean_t svn_stream_get_text_mode(...);

My goal is to be more consistent: either use svn_cmdline_* or
svn_stream_*. We wouldn't have to explicitly recode in so many places and
care about eol conversion. But looking more closely, we have to use
apr_file_t, for example in svn_client_diff, since it is necessary for the
external diff. So, maybe we have to live with this mess. :-(

I think I'll leave the encoding problems to you for the moment. I have
enough with other things to work on when time permits...

Thanks,
//Peter

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: building eol and encoding support into svn_stream_t

Posted by Erik Huelsmann <eh...@gmail.com>.
> While svn has been localized, several encoding bugs have been found, we
> have some open that are hard to solve, I'm sure we will introduce new ones
> and there are probably more that we haven't found yet.
> 
> I think this is partly because we have different ways of recoding and
> end-of-line translating text in different places:

[ enumeration of all methods used for creating output ]

> Also,
> some functions need to turn off recoding for parts of their output.
> Examples are proplist -v and diff where we might not know the encoding of
> the propval or the file contents.

> I think this is a real mess that we have to fix in some way.

Yup. I even started this. But then you started messing with the same
files (UTF-8 translation speedup) and I decided to toss the work and
stall until yours was done. I didn't get back to it yet.

> My proposal is to add support for recoding and eol translation to our
> svn_stream_t interface. Then we should use streams instread of apr_file_t
> whereever possible, such as in svn_client_diff. It should be possible to
> set the encoding for a stream and turn it into text/binary mode.

> Why not just stack a recoding stream onto an existing stream like we do
> for compression? That's because the creator of the stream knows the
> encoding it should be outputing, but the user of the stream may need to
> turn it into binary mode.

Right. This is exactly the approach I was using.  This is not a
solution for the use-case of libsvn_diff however: There you have some
output which needs recoding and some which doesn't. All that needs to
end up in the correct order in the output stream. I was still thinking
about what to do there, but I think there are 2 solutions: tell the
diff lib which encoding the header should be, let it recode and send
the output (just as it is now) to the stream.  Or you can add a
callback. Tell the client you are passing it header or file
information and let the client decide what to do....

> Since the stream objects are opaque to the outside world, I think we can
> add encoding information to it without breaking any compability.
> 
> I'd like to get this done for 1.2, so we can get correct and consistent
> output from our commands on all platforms.

I'd love to have it done by 1.2 too.  There is an issue for these
recoding problems in the issue tracker already, so if any relevant
discussion comes from this thread, we could add a link to it there.

> Comments?

+1 on dealing with this.
 
> Regards,
> //Peter

bye,

Erik.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org

Re: RFC: building eol and encoding support into svn_stream_t

Posted by Greg Hudson <gh...@MIT.EDU>.
On Sun, 2004-10-17 at 17:39, Peter N. Lundblad wrote:
> My proposal is to add support for recoding and eol translation to our
> svn_stream_t interface.

That's not a complete proposal, but I'm -1 on it pretty much no matter
how you flesh it out, because I think stacking streams is a cleaner way
of handling the issue.

Of course, you thought of that too:

> Why not just stack a recoding stream onto an existing stream like we do
> for compression? That's because the creator of the stream knows the
> encoding it should be outputing, but the user of the stream may need to
> turn it into binary mode.

This argument doesn't make sense to me.  If the user of the stream needs
to make the final decision about encoding and newlines, then the
producer should just use a specified value (UTF-8 and \n) and the
receiver can stack a recoding stream on top of it.


---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscribe@subversion.tigris.org
For additional commands, e-mail: dev-help@subversion.tigris.org