Posted to user@avro.apache.org by Alexander Farber <al...@gmail.com> on 2020/09/28 16:43:31 UTC

How to use avro-python3 on Windows 10 to parse files?

Hello and good evening!

With python 3.8.5 and avro 1.10.0 installed via pip I have tried running
the following script:

import os, avro
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

reader = DataFileReader(open("48.avro", "rb"), DatumReader())
for d in reader:
    print(d)
reader.close()

Unfortunately, nothing is printed by the script.

Then I noticed that avro-python3 is recommended, so I installed it with
pip install avro-python3

Now pip list shows both packages, but how do I switch to using the newer
one?

I have tried "pip uninstall avro", but then import avro fails.

What is the correct module name for avro-python3, and how do I import it in
my script?

I have also asked my question at Stackoverflow and there you can see my
screenshots:
https://stackoverflow.com/questions/64105500/how-to-use-avro-python3-on-windows-10-to-parse-files

Thank you for any hints
Alex

Re: How to use avro-python3 on Windows 10 to parse files?

Posted by Alexander Farber <al...@gmail.com>.
Hello and thanks for your replies!

I have to apologize - the avro file I was using did not contain any useful
data. The reason for my confusion is that a colleague (in a screen sharing
session) was using a different file with the same name while testing for me.

Now I have tried both avro and fastavro modules on a different avro file
and both worked. I will look at PySpark as well.

RE: How to use avro-python3 on Windows 10 to parse files?

Posted by David Beswick <Da...@bupa.com.au>.
Hello all,

I think Alexander is reading a binary Avro file; it is just that one of the fields contains a serialised JSON string. I have read exactly this kind of data from Azure Event Hub using the Python libraries before, so it should work.

Your code looks fine to me Alexander -- maybe you'd like to try reading a test file from another source just to make sure that there's nothing obvious like your file maybe being empty. I know that Event Hub writes empty files in its default configuration.
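
One way to check for an empty file without any third-party library is to count the records straight from the container framing. The sketch below is not part of the avro package: `read_long` and `count_records` are hypothetical helper names, and it assumes the file follows the Avro object container layout (the 4-byte magic `Obj\x01`, a metadata map, a 16-byte sync marker, then data blocks). It skips each block's payload instead of decoding it, so the codec does not matter.

```python
import io

MAGIC = b"Obj\x01"   # first four bytes of an Avro object container file
SYNC_SIZE = 16       # length of the sync marker that ends the header and each block


def read_long(stream):
    """Decode one zigzag varint long; return None at a clean end of stream."""
    result = shift = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            if shift == 0:
                return None
            raise EOFError("stream ended inside a varint")
        byte = chunk[0]
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return (result >> 1) ^ -(result & 1)
        shift += 7


def count_records(stream):
    """Count the records in an Avro container file without decoding them."""
    if stream.read(len(MAGIC)) != MAGIC:
        raise ValueError("not an Avro object container file")

    # Skip the metadata map (schema, codec, ...): blocks of key/value pairs,
    # terminated by a block whose count is zero.
    while True:
        pairs = read_long(stream)
        if pairs is None:
            raise ValueError("truncated header")
        if pairs == 0:
            break
        if pairs < 0:                 # negative count: a byte size follows
            read_long(stream)
            pairs = -pairs
        for _ in range(2 * pairs):    # every key and value is length-prefixed
            length = read_long(stream)
            if length is None:
                raise ValueError("truncated header")
            stream.read(length)

    sync = stream.read(SYNC_SIZE)
    if len(sync) != SYNC_SIZE:
        raise ValueError("truncated header")

    total = 0
    while True:
        count = read_long(stream)     # records in this block
        if count is None:             # clean end of file
            return total
        size = read_long(stream)      # byte length of the block payload
        if size is None:
            raise ValueError("truncated block")
        stream.seek(size, io.SEEK_CUR)
        if stream.read(SYNC_SIZE) != sync:
            raise ValueError("sync marker mismatch")
        total += count


# Hand-built in-memory samples (illustrative data, not a real capture):
header = MAGIC + b"\x02" + b"\x16avro.schema" + b'\x10"string"' + b"\x00" + bytes(16)
block = b"\x04" + b"\x08" + b"\x02a\x02b" + bytes(16)   # 2 records, 4 payload bytes
print(count_records(io.BytesIO(header)))          # header only
print(count_records(io.BytesIO(header + block)))  # header plus one block
```

For a file on disk, open it in binary mode, e.g. `with open("48.avro", "rb") as f: print(count_records(f))`. A result of 0 means the file has a valid header but no data, which would explain a loop that prints nothing.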

Lastly, I'll just note that in the end I used the "fastavro" library for my code rather than this reference (if that's the right word) Python implementation. It uses a C implementation and I found it to be 4 times faster, so I would recommend moving to that library at an early stage if speed is important.

From: Michael A. Smith <mi...@smith-li.com>
Sent: Tuesday, 29 September 2020 7:42 AM
To: user@avro.apache.org
Subject: Re: How to use avro-python3 on Windows 10 to parse files?



On Mon, Sep 28, 2020 at 14:24 Alexander Farber <al...@gmail.com> wrote:
Hi Michael -

On Mon, Sep 28, 2020 at 8:19 PM Michael A. Smith <mi...@smith-li.com> wrote:
Where did you find that avro-python3 is recommended? I would like to update that.

here: https://stackoverflow.com/a/43606979/165071

avro-python3 is deprecated. You should use the avro library instead.

If the avro library for python doesn't work, please let me know.

Yes, it does not work for me with python 3.8.0 - please see below

On Mon, Sep 28, 2020 at 12:43 Alexander Farber <al...@gmail.com> wrote:
With python 3.8.5 and avro 1.10.0 installed via pip I have tried running the following script:

import os, avro
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

reader = DataFileReader(open("48.avro", "rb"), DatumReader())
for d in reader:
    print(d)
reader.close()

Unfortunately, nothing is printed by the script.
https://stackoverflow.com/questions/64105500/how-to-use-avro-python3-on-windows-10-to-parse-files

Michael, how could I debug this issue? I am an avro and python newbie

Greetings from Germany
Alex

Aha, sorry for reading carelessly! I think you are using a json-encoded avro file, yes? Python avro only supports binary encoded avro at this time. Does that help?

I will admit that I've never tried to use avro on windows. But I don't think that is the problem here.


Re: How to use avro-python3 on Windows 10 to parse files?

Posted by "Michael A. Smith" <mi...@smith-li.com>.
On Mon, Sep 28, 2020 at 14:24 Alexander Farber <al...@gmail.com>
wrote:

> Hi Michael -
>
> On Mon, Sep 28, 2020 at 8:19 PM Michael A. Smith <mi...@smith-li.com>
> wrote:
>
>> Where did you find that avro-python3 is recommended? I would like to
>> update that.
>>
>
> here: https://stackoverflow.com/a/43606979/165071
>
>
>> avro-python3 is deprecated. You should use the avro library instead.
>>
>> If the avro library for python doesn't work, please let me know.
>>
>
> Yes, it does not work for me with python 3.8.0 - please see below
>

> On Mon, Sep 28, 2020 at 12:43 Alexander Farber <al...@gmail.com>
>> wrote:
>>
>>> With python 3.8.5 and avro 1.10.0 installed via pip I have tried running
>>> the following script:
>>>
>>> import os, avro
>>> from avro.datafile import DataFileReader, DataFileWriter
>>> from avro.io import DatumReader, DatumWriter
>>>
>>> reader = DataFileReader(open("48.avro", "rb"), DatumReader())
>>> for d in reader:
>>>     print(d)
>>> reader.close()
>>>
>>> Unfortunately, nothing is printed by the script.
>>>
>>> https://stackoverflow.com/questions/64105500/how-to-use-avro-python3-on-windows-10-to-parse-files
>>>
>>
> Michael, how could I debug this issue? I am an avro and python newbie
>
> Greetings from Germany
> Alex
>

Aha, sorry for reading carelessly! I think you are using a json-encoded
avro file, yes? Python avro only supports binary encoded avro at this time.
Does that help?

I will admit that I've never tried to use avro on windows. But I don't
think that is the problem here.
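
A quick way to tell the two cases apart before involving any Avro library is to look at the first bytes of the file. This is only a sketch using the standard library: `sniff_avro_file` is a hypothetical helper, and the JSON branch is a heuristic (a leading `{` or `[`), not a validation.

```python
import io

AVRO_MAGIC = b"Obj\x01"  # binary Avro container files start with these bytes


def sniff_avro_file(stream):
    """Classify a binary stream by its first bytes."""
    head = stream.read(64)
    if not head:
        return "empty"
    if head.startswith(AVRO_MAGIC):
        return "binary container"
    if head.lstrip()[:1] in (b"{", b"["):
        return "json text"
    return "unknown"


# In-memory examples (illustrative data only):
print(sniff_avro_file(io.BytesIO(b"Obj\x01\x00")))
print(sniff_avro_file(io.BytesIO(b'{"name": "x"}')))
```

For the file from the question this would be `with open("48.avro", "rb") as f: print(sniff_avro_file(f))`.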

Re: How to use avro-python3 on Windows 10 to parse files?

Posted by Alexander Farber <al...@gmail.com>.
Hi Michael -

On Mon, Sep 28, 2020 at 8:19 PM Michael A. Smith <mi...@smith-li.com>
wrote:

> Where did you find that avro-python3 is recommended? I would like to
> update that.
>

here: https://stackoverflow.com/a/43606979/165071


> avro-python3 is deprecated. You should use the avro library instead.
>
> If the avro library for python doesn't work, please let me know.
>

Yes, it does not work for me with python 3.8.0 - please see below

On Mon, Sep 28, 2020 at 12:43 Alexander Farber <al...@gmail.com>
> wrote:
>
>> With python 3.8.5 and avro 1.10.0 installed via pip I have tried running
>> the following script:
>>
>> import os, avro
>> from avro.datafile import DataFileReader, DataFileWriter
>> from avro.io import DatumReader, DatumWriter
>>
>> reader = DataFileReader(open("48.avro", "rb"), DatumReader())
>> for d in reader:
>>     print(d)
>> reader.close()
>>
>> Unfortunately, nothing is printed by the script.
>>
>> https://stackoverflow.com/questions/64105500/how-to-use-avro-python3-on-windows-10-to-parse-files
>>
>
Michael, how could I debug this issue? I am an avro and python newbie

Greetings from Germany
Alex

Re: How to use avro-python3 on Windows 10 to parse files?

Posted by "Michael A. Smith" <mi...@smith-li.com>.
Where did you find that avro-python3 is recommended? I would like to update
that.

avro-python3 is deprecated. You should use the avro library instead.

If the avro library for python doesn't work, please let me know.
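
To see which of these distributions are actually installed, the standard library can be asked directly (Python 3.8 or later). This is a minimal sketch: `installed_version` is a hypothetical helper, and the distribution names `avro`, `avro-python3` and `fastavro` are assumed to match what pip list shows. As far as I know, both avro and avro-python3 install a top-level module named avro, which would explain why import avro fails after uninstalling only one of them; removing avro-python3 and reinstalling avro should leave a single, consistent copy.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Report the state of the packages discussed in this thread.
for name in ("avro", "avro-python3", "fastavro"):
    print(name, installed_version(name))
```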

On Mon, Sep 28, 2020 at 12:43 Alexander Farber <al...@gmail.com>
wrote:

> Hello and good evening!
>
> With python 3.8.5 and avro 1.10.0 installed via pip I have tried running
> the following script:
>
> import os, avro
> from avro.datafile import DataFileReader, DataFileWriter
> from avro.io import DatumReader, DatumWriter
>
> reader = DataFileReader(open("48.avro", "rb"), DatumReader())
> for d in reader:
>     print(d)
> reader.close()
>
> Unfortunately, nothing is printed by the script.
>
> Then I have noticed, that the avro-python3 is recommended and have
> installed it with pip install avro-python3
>
> Now the pip list shows both packages, but how to switch to using the newer
> one?
>
> I have tried "pip uninstall avro", but then import avro fails.
>
> What is the correct module name for avro-python3, how to import it in my
> script please?
>
> I have also asked my question at Stackoverflow and there you can see my
> screenshots:
>
> https://stackoverflow.com/questions/64105500/how-to-use-avro-python3-on-windows-10-to-parse-files
>
> Thank you for any hints
> Alex
>