Posted to user@avro.apache.org by "Michael A. Smith" <mi...@smith-li.com> on 2021/06/25 15:48:44 UTC

Python CLI Update – dumping binary to standard output

I wonder if I could get a review of this pull request
https://github.com/apache/avro/pull/1270 from the larger Avro dev
community.

One thing I noticed when adding type hints to Python Avro is that some
of our CLI tooling supports output to standard I/O, but standard I/O is
really text, not binary. I had to relax type checking to support this
special case directly in avro.datafile and avro.io.
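To make the mismatch concrete, here is a minimal sketch that uses in-memory streams as stand-ins: io.StringIO plays a text stream like sys.stdout, and io.BytesIO plays a binary file. write_block is a hypothetical helper, not a function in avro.datafile or avro.io.

```python
import io
from typing import BinaryIO


def write_block(out: BinaryIO, payload: bytes) -> int:
    """Hypothetical helper: write raw Avro bytes to a binary stream."""
    return out.write(payload)


# b"Obj\x01" is the four-byte magic that starts an Avro object container file.
MAGIC = b"Obj\x01"

# A text stream (sys.stdout is one) rejects bytes at runtime.
text_stream = io.StringIO()
try:
    text_stream.write(MAGIC)  # type: ignore[arg-type]
    rejected = False
except TypeError:
    rejected = True

# A binary stream accepts the same payload, and matches the BinaryIO hint.
binary_stream = io.BytesIO()
write_block(binary_stream, MAGIC)

print(rejected)                  # True
print(binary_stream.getvalue())  # b'Obj\x01'
```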

I think the counterpart Parquet tools would automatically output
base64-encoded binary Parquet in such cases, but I'm pretty sure that
the Java Avro tools just naïvely dump binary to standard output
without any checking.

Should we change this behavior? What do you think?

Thanks,
Michael

Re: Python CLI Update – dumping binary to standard output

Posted by "Michael A. Smith" <mi...@smith-li.com>.
Thanks, Spencer, that helped a bunch. I was confusing standard I/O
with what appears on the terminal. Python's sys.std* objects are all
TextIO objects, but each one has a buffer attribute that is a
BinaryIO. I can work with that.
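Following that approach, a minimal sketch of reaching the binary layer under a text stream. binary_stdout is a hypothetical name, and the io.TextIOWrapper over io.BytesIO only stands in for sys.stdout so the example runs anywhere.

```python
import io
import sys
from typing import BinaryIO, Optional, TextIO


def binary_stdout(stream: Optional[TextIO] = None) -> BinaryIO:
    """Return the binary layer underneath a text stream (sys.stdout by default)."""
    text = sys.stdout if stream is None else stream
    # Flush pending text first so it is not reordered after the raw bytes.
    text.flush()
    # typing.TextIO declares .buffer as BinaryIO, so this type-checks cleanly.
    return text.buffer


# Demo with an in-memory stand-in for sys.stdout (newline="\n" avoids
# platform-specific newline translation).
raw = io.BytesIO()
fake_stdout = io.TextIOWrapper(raw, encoding="utf-8", newline="\n")
fake_stdout.write("header line\n")
out = binary_stdout(fake_stdout)
out.write(b"Obj\x01")
print(raw.getvalue())  # b'header line\nObj\x01'
```

In a real CLI, binary_stdout().write(data) would target the process's actual stdout. If sys.stdout has been swapped out (some test harnesses do this), the replacement may lack a buffer attribute, so it is worth checking for that case.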

On Fri, Jun 25, 2021 at 12:40 PM Spencer Nelson <s...@spencerwnelson.com> wrote:
>
> Unix’s stdio file descriptors just push around binary data, and don’t come
> with any expectations about text encoding. I would prefer that CLI tools
> emit bytes to stdout, and not change the encoding in any way (for
> example, with base64).
>
> That said, when stdout is being sent to a terminal, I can see an argument
> for attempting to encode data as text, since ttys usually don’t handle
> non-text data well. You could use the Python standard library’s os.isatty
> function to detect whether stdout (or stderr) is connected to a terminal.

Re: Python CLI Update – dumping binary to standard output

Posted by Spencer Nelson <s...@spencerwnelson.com>.
Unix’s stdio file descriptors just push around binary data, and don’t come
with any expectations about text encoding. I would prefer that CLI tools
emit bytes to stdout, and not change the encoding in any way (for
example, with base64).

That said, when stdout is being sent to a terminal, I can see an argument
for attempting to encode data as text, since ttys usually don’t handle
non-text data well. You could use the Python standard library’s os.isatty
function to detect whether stdout (or stderr) is connected to a terminal.
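Putting the two suggestions together, a minimal sketch: pass bytes through untouched when stdout is a pipe or file, and fall back to text only when os.isatty reports a terminal. stream_is_tty and emit_binary are hypothetical names, base64 is just one possible text encoding for the terminal case, and this is not a description of what the Avro CLI currently does.

```python
import base64
import io
import os
import sys
from typing import Optional, TextIO


def stream_is_tty(stream: TextIO) -> bool:
    """True if the stream has a file descriptor attached to a terminal."""
    try:
        return os.isatty(stream.fileno())
    except (AttributeError, io.UnsupportedOperation):
        # In-memory streams (e.g. io.StringIO) have no file descriptor.
        return False


def emit_binary(data: bytes, stream: Optional[TextIO] = None) -> None:
    """Write raw bytes to a pipe/file, or base64 text to a terminal."""
    out = sys.stdout if stream is None else stream
    if stream_is_tty(out):
        # Terminal: encode as text so the tty is not fed raw bytes.
        out.write(base64.b64encode(data).decode("ascii") + "\n")
        out.flush()
    else:
        # Pipe or file: flush pending text, then pass the bytes through as-is.
        out.flush()
        out.buffer.write(data)
        out.buffer.flush()


# Demo: a TextIOWrapper over BytesIO has no real descriptor, so it is
# treated like a pipe and receives the bytes unchanged.
raw = io.BytesIO()
piped = io.TextIOWrapper(raw, encoding="utf-8")
emit_binary(b"Obj\x01", piped)
print(raw.getvalue())  # b'Obj\x01'
```

With stdout redirected to a file or pipe, the bytes pass through unchanged; on an interactive terminal, the same call prints base64 text instead.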
