Posted to user@avro.apache.org by roger peppe <ro...@gmail.com> on 2020/01/16 08:30:34 UTC

avro-tools illegal reflective access warnings

Hi,

I've been trying to use avro-tools to verify Avro implementations, and I've
come across an issue. Perhaps someone here might be able to help?

When I run avro-tools with some subcommands, it prints a bunch of warnings
(see below) to the standard output. Does anyone know a way to disable this?
I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.

The warnings are somewhat annoying because they can corrupt output of tools
that print to the standard output, such as recodec.

Aside: is there any documentation for the commands in avro-tools? Some seem
to have some command-line help (though unfortunately there doesn't seem to
be a standard way of showing it), but that help often doesn't
describe what the command actually does.

Here's the output that I see:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by
org.apache.hadoop.security.authentication.util.KerberosUtil
(file:/home/rog/other/avro-tools-1.9.1.jar) to method
sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of
org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

  cheers,
    rog.

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Fri, 17 Jan 2020 at 11:45, roger peppe <ro...@gmail.com> wrote:

>
>
> On Fri, 17 Jan 2020 at 09:17, Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
>
>> Hi Roger,
>>
>> We also have Java11 in our CI, but it might be that there are still some
>> issues with it. I haven't battletested Avro with Java 11 at least. For
>> skipping the tests, you can provide a flag to Maven:
>>
>> # Make sure that you're in the Java project
>> cd lang/java/
>> mvn clean install -DskipTests
>>
>> Let me know if this works for you.
>>
>
> I think it got further, but still no cigar.
> https://gist.github.com/rogpeppe/330edaaeaeb7ed6530092e73952bed0a
>
> I shall try to work out how to build avro-tools alone - I have a suspicion
> that "Apache Avro Maven Service Archetype" isn't a hard requirement for
> that.
>

In fact, scratch that - it appears that it has successfully created
./tools/target/avro-tools-1.10.0-SNAPSHOT.jar which seems to be what I'm
after. Game on. Thanks a lot for your help!

>
>   cheers,
>     rog.
>
>
>> Cheers, Fokko
>>
>>
>> On Thu, 16 Jan 2020 at 18:48, roger peppe <ro...@gmail.com> wrote:
>>
>>> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>>>
>>>> Hello!  For a simple, silent log4j, I use:
>>>>
>>>> $ cat /tmp/log4j.properties
>>>> log4j.rootLogger=off
>>>>
>>>
>>> Apparently passing those flags has sorted my stdin/stderr issue as well
>>> as suppressing the warnings. I wonder what was going on there. Thanks very
>>> much!
>>>
>>>
>>>> I didn't find anything currently in the avro-tools that uses both
>>>> reader and writer schemas while deserializing data...  It should be a
>>>> pretty easy feature to add as an option to the DataFileReadTool
>>>> (a.k.a. tojson)!
>>>>
>>>> You are correct about running ./build.sh dist in the java directory --
>>>> it fails with JDK 11 (likely fixable:
>>>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>>>
>>>> You should probably do a simple mvn clean install instead and find the
>>>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>>>> should work with JDK11 without any problem (well-tested in the build).
>>>>
>>>
>>> I tried that (I ran it in the lang/java directory) and I still get a
>>> failure:
>>> https://gist.github.com/rogpeppe/e7f199c6fefb9c05eedad9e9841de14f
>>>
>>> Maybe there's a way to build without running the tests, perhaps? Please
>>> pardon my ignorance here.
>>>
>>>   cheers,
>>>
>>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Fri, 17 Jan 2020 at 09:17, Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Roger,
>
> We also have Java11 in our CI, but it might be that there are still some
> issues with it. I haven't battletested Avro with Java 11 at least. For
> skipping the tests, you can provide a flag to Maven:
>
> # Make sure that you're in the Java project
> cd lang/java/
> mvn clean install -DskipTests
>
> Let me know if this works for you.
>

I think it got further, but still no cigar.
https://gist.github.com/rogpeppe/330edaaeaeb7ed6530092e73952bed0a

I shall try to work out how to build avro-tools alone - I have a suspicion
that "Apache Avro Maven Service Archetype" isn't a hard requirement for
that.

  cheers,
    rog.


> Cheers, Fokko
>
>
> On Thu, 16 Jan 2020 at 18:48, roger peppe <ro...@gmail.com> wrote:
>
>> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>>
>>> Hello!  For a simple, silent log4j, I use:
>>>
>>> $ cat /tmp/log4j.properties
>>> log4j.rootLogger=off
>>>
>>
>> Apparently passing those flags has sorted my stdin/stderr issue as well
>> as suppressing the warnings. I wonder what was going on there. Thanks very
>> much!
>>
>>
>>> I didn't find anything currently in the avro-tools that uses both
>>> reader and writer schemas while deserializing data...  It should be a
>>> pretty easy feature to add as an option to the DataFileReadTool
>>> (a.k.a. tojson)!
>>>
>>> You are correct about running ./build.sh dist in the java directory --
>>> it fails with JDK 11 (likely fixable:
>>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>>
>>> You should probably do a simple mvn clean install instead and find the
>>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>>> should work with JDK11 without any problem (well-tested in the build).
>>>
>>
>> I tried that (I ran it in the lang/java directory) and I still get a
>> failure:
>> https://gist.github.com/rogpeppe/e7f199c6fefb9c05eedad9e9841de14f
>>
>> Maybe there's a way to build without running the tests, perhaps? Please
>> pardon my ignorance here.
>>
>>   cheers,
>>
>

Re: avro-tools illegal reflective access warnings

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Roger,

We also have Java11 in our CI, but it might be that there are still some
issues with it. I haven't battletested Avro with Java 11 at least. For
skipping the tests, you can provide a flag to Maven:

# Make sure that you're in the Java project
cd lang/java/
mvn clean install -DskipTests

Let me know if this works for you.

Cheers, Fokko


On Thu, 16 Jan 2020 at 18:48, roger peppe <ro...@gmail.com> wrote:

> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>
>> Hello!  For a simple, silent log4j, I use:
>>
>> $ cat /tmp/log4j.properties
>> log4j.rootLogger=off
>>
>
> Apparently passing those flags has sorted my stdin/stderr issue as well as
> suppressing the warnings. I wonder what was going on there. Thanks very
> much!
>
>
>> I didn't find anything currently in the avro-tools that uses both
>> reader and writer schemas while deserializing data...  It should be a
>> pretty easy feature to add as an option to the DataFileReadTool
>> (a.k.a. tojson)!
>>
>> You are correct about running ./build.sh dist in the java directory --
>> it fails with JDK 11 (likely fixable:
>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>
>> You should probably do a simple mvn clean install instead and find the
>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>> should work with JDK11 without any problem (well-tested in the build).
>>
>
> I tried that (I ran it in the lang/java directory) and I still get a
> failure: https://gist.github.com/rogpeppe/e7f199c6fefb9c05eedad9e9841de14f
>
> Maybe there's a way to build without running the tests, perhaps? Please
> pardon my ignorance here.
>
>   cheers,
>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  For a simple, silent log4j, I use:
>
> $ cat /tmp/log4j.properties
> log4j.rootLogger=off
>

Apparently passing those flags has sorted my stdout/stderr issue as well as
suppressing the warnings. I wonder what was going on there. Thanks very
much!
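
(If you ever drive the tools in-process instead of via java -jar, the same
effect as log4j.rootLogger=off can be had programmatically - a sketch,
assuming the log4j 1.x binding that the avro-tools jar appears to bundle:)

import org.apache.log4j.Level;
import org.apache.log4j.Logger;

public class QuietAvroTools {
  public static void main(String[] args) throws Exception {
    // Programmatic equivalent of log4j.rootLogger=off: silence logging
    // before any tool code runs, then delegate to the normal entry point.
    Logger.getRootLogger().setLevel(Level.OFF);
    org.apache.avro.tool.Main.main(args);
  }
}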


> I didn't find anything currently in the avro-tools that uses both
> reader and writer schemas while deserializing data...  It should be a
> pretty easy feature to add as an option to the DataFileReadTool
> (a.k.a. tojson)!
>
> You are correct about running ./build.sh dist in the java directory --
> it fails with JDK 11 (likely fixable:
> https://issues.apache.org/jira/browse/MJAVADOC-562).
>
> You should probably do a simple mvn clean install instead and find the
> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
> should work with JDK11 without any problem (well-tested in the build).
>

I tried that (I ran it in the lang/java directory) and I still get a
failure: https://gist.github.com/rogpeppe/e7f199c6fefb9c05eedad9e9841de14f

Maybe there's a way to build without running the tests, perhaps? Please
pardon my ignorance here.

  cheers,

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
Thanks, Fokko. With some very kind help from Ryan Skraba, I managed to
fix the issue. The problem was that the writer in the code needed to be
created with the same schema as the reader (the actual schemas that I was
using were fine). The resulting PR is here: https://github.com/apache/avro/pull/785
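
For anyone who wants the gist without opening the PR: a minimal sketch of
the idea, modelled on the DataFileReadTool code quoted further down this
thread (the exact change in the PR may differ in detail). The point is that
once records are resolved against the reader schema, the JSON writer and
encoder have to be built with that same schema - which is what the
Utf8-to-Collection ClassCastException below was complaining about.

import java.io.InputStream;
import java.io.OutputStream;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileStream;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.io.DatumWriter;
import org.apache.avro.io.EncoderFactory;
import org.apache.avro.io.JsonEncoder;

public class ToJsonWithReaderSchema {
  // Dump an Avro data file as JSON, optionally resolving it against an
  // explicit reader schema (null means "use the schema in the file header").
  static void dump(InputStream inStream, Schema readerSchema,
                   OutputStream out, boolean pretty) throws Exception {
    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    try (DataFileStream<Object> stream = new DataFileStream<>(inStream, reader)) {
      Schema schema = stream.getSchema();     // writer schema from the file header
      if (readerSchema != null) {
        reader.setExpected(readerSchema);     // resolve each record against the reader schema
        schema = readerSchema;                // ... so the JSON output must use it too
      }
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);
      for (Object datum : stream) {
        writer.write(datum, encoder);
      }
      encoder.flush();
    }
  }
}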

  cheers,
    rog.


On Tue, 21 Jan 2020 at 07:52, Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Sorry for the late reply Rog, been kinda busy lately.
>
> Please look into the schema evolution of Avro. Confluent has an excellent
> article on this:
> https://docs.confluent.io/current/schema-registry/avro.html
>
> Could you try again with optional fields? e.g. "type": ["null", "array"].
>
> Since the names are different, I would expect the default value (or even
> an exception). If you do a cat on the Avro file, you can see that the
> original schema is in the header of the file. The B field is not there in the
> record, so the reader field is not compatible, and it won't work. I'll check
> if we can come up with a more meaningful exception.
>
> Cheers, Fokko
>
>
>
> On Fri, 17 Jan 2020 at 17:02, roger peppe <ro...@gmail.com> wrote:
>
>>
>>
>> On Fri, 17 Jan 2020 at 13:35, Ryan Skraba <ry...@skraba.com> wrote:
>>
>>> Hello!  I just created a JIRA for this as an improvement :D
>>> https://issues.apache.org/jira/browse/AVRO-2689
>>>
>>> To check evolution, we'd probably want to specify the reader schema in
>>> the GenericDatumReader created here:
>>>
>>> https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L75
>>>
>>> The writer schema is automatically set when the DataFileStream is
>>> created.  If we want to set a different reader schema (than the one
>>> found in the file), it should be set by calling
>>> reader.setExpected(readerSchema) just after the DataFileStream is
>>> created.
>>>
>>
>> Ah, that's a good pointer, thanks! I was looking for an appropriate
>> constructor, but there didn't seem to be one.
>>
>>
>>>
>>> I think it's a pretty good idea -- it feels like we're seeing more
>>> questions about schema evolution these days, so that would be a neat
>>> way for a user to test (or to create reproducible scenarios for bug
>>> reports).  If you're interested, feel free to take the JIRA!  I'd be
>>> happy to help out.
>>>
>>
>> So, I've had a go at it... see
>> https://github.com/rogpeppe-contrib/avro/commit/1236e9d33207a11d557c1eb2a171972e085dfcf2
>>
>> I did the following to see if it was working ("avro" is my shell script
>> wrapper around the avro-tools jar):
>>
>> % cat schema.avsc
>> {
>>   "name": "R",
>>   "type": "record",
>>   "fields": [
>>     {
>>       "name": "A",
>>       "type": {
>>         "type": "array",
>>         "items": "int"
>>       }
>>     }
>>   ]
>> }
>> % cat schema1.avsc
>> {
>>   "name": "R",
>>   "type": "record",
>>   "fields": [
>>     {
>>       "name": "B",
>>       "type": "string",
>>       "default": "hello"
>>     }
>>   ]
>> }
>> %
>> AVRO_TOOLS_JAR=/home/rog/other/avro/lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.ja%
>> avro random --count 1 --schema-file schema.avsc x.out
>> % avro tojson x.out
>> {"A":[-890831012,1123049230,302974832]}
>> % cp schema.avsc schema1.avsc
>> % avro tojson --reader-schema-file schema1.avsc x.out
>> Exception in thread "main" java.lang.ClassCastException: class
>> org.apache.avro.util.Utf8 cannot be cast to class java.util.Collection
>> (org.apache.avro.util.Utf8 is in unnamed module of loader 'app';
>> java.util.Collection is in module java.base of loader 'bootstrap')
>> at
>> org.apache.avro.generic.GenericDatumWriter.getArraySize(GenericDatumWriter.java:258)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:228)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:136)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:206)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
>> at
>> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
>> at
>> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
>> at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:99)
>> at org.apache.avro.tool.Main.run(Main.java:66)
>> at org.apache.avro.tool.Main.main(Main.java:55)
>> %
>>
>> I am a bit clueless when it comes to interpreting that exception... sorry
>> for the ignorance - this is the first Java code I've ever written!
>> Any idea what's going on? This is maybe getting a bit too noisy for the
>> list - feel free to reply directly.
>>
>>   cheers,
>>     rog.
>>
>>
>>> Ryan
>>>
>>>
>>> On Fri, Jan 17, 2020 at 2:22 PM roger peppe <ro...@gmail.com> wrote:
>>> >
>>> > On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>>> >>
>>> >> didn't find anything currently in the avro-tools that uses both
>>> >> reader and writer schemas while deserializing data...  It should be a
>>> >> pretty easy feature to add as an option to the DataFileReadTool
>>> >> (a.k.a. tojson)!
>>> >
>>> >
>>> > Thanks for that suggestion. I've been delving into that code a bit and
>>> trying to understand what's going on.
>>> >
>>> > At the heart of it is this code:
>>> >
>>> >     GenericDatumReader<Object> reader = new GenericDatumReader<>();
>>> >     try (DataFileStream<Object> streamReader = new
>>> DataFileStream<>(inStream, reader)) {
>>> >       Schema schema = streamReader.getSchema();
>>> >       DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
>>> >       JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema,
>>> out, pretty);
>>> >
>>> > I'm trying to work out where the best place to put the specific reader
>>> schema (taken from a command line flag) might be.
>>> >
>>> > Would it be best to do it when creating the DatumReader (it looks like
>>> there might be a way to create that with a generic writer schema and a
>>> specific reader schema, although I can't quite see how to do that atm), or
>>> when creating the DatumWriter?
>>> > Or perhaps there's a better way?
>>> >
>>> > Thanks for any guidance.
>>> >
>>> >    cheers,
>>> >     rog.
>>> >>
>>> >>
>>> >> You are correct about running ./build.sh dist in the java directory --
>>> >> it fails with JDK 11 (likely fixable:
>>> >> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>> >>
>>> >> You should probably do a simple mvn clean install instead and find the
>>> >> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>>> >> should work with JDK11 without any problem (well-tested in the build).
>>> >>
>>> >> Best regards, Ryan
>>> >>
>>> >>
>>> >>
>>> >> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com>
>>> wrote:
>>> >> >
>>> >> > Update: I tried running `build.sh dist` in `lang/java` and it
>>> failed (at least, it looks like a failure message) after downloading a load
>>> of Maven deps with the following errors:
>>> https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
>>> >> >
>>> >> > Any hints on what I should do to build the avro-tools jar?
>>> >> >
>>> >> >   cheers,
>>> >> >     rog.
>>> >> >
>>> >> > On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com>
>>> wrote:
>>> >> >>
>>> >> >>
>>> >> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
>>> >> >>>
>>> >> >>> Hello!  Is it because you are using brew to install avro-tools?
>>> I'm
>>> >> >>> not entirely familiar with how it packages the command, but using
>>> a
>>> >> >>> direct bash-like solution instead might solve this problem of
>>> mixing
>>> >> >>> stdout and stderr.  This could be the simplest (and right)
>>> solution
>>> >> >>> for piping.
>>> >> >>
>>> >> >>
>>> >> >> No, I downloaded the jar and am directly running it with "java
>>> -jar ~/other/avro-tools-1.9.1.jar".
>>> >> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
>>> package openjdk-11-jre-headless.
>>> >> >>
>>> >> >> I'm going to try compiling avro-tools myself to investigate but
>>> I'm a total Java ignoramus - wish me luck!
>>> >> >>
>>> >> >>>
>>> >> >>> alias avrotoolx='java -jar
>>> >> >>>
>>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>>> >> >>> avrotoolx tojson x.out 2> /dev/null
>>> >> >>>
>>> >> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>>> >> >>> warnings and logs should not be piped along with the normal
>>> content.)
>>> >> >>>
>>> >> >>> Otherwise, IIRC, there is no way to disable the first illegal
>>> >> >>> reflective access warning when running in Java 9+, but you can
>>> "fix"
>>> >> >>> these module errors, and deactivate the NativeCodeLoader logs
>>> with an
>>> >> >>> explicit log4j.properties:
>>> >> >>>
>>> >> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties
>>> --add-opens
>>> >> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>>> >> >>>
>>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>>> >> >>> tojson x.out
>>> >> >>
>>> >> >>
>>> >> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
>>> properties files though. What do I need to put in /tmp/log4j.properties to
>>> make this work?
>>> >> >>
>>> >> >>> None of that is particularly satisfactory, but it could be a
>>> >> >>> workaround for your immediate use.
>>> >> >>
>>> >> >>
>>> >> >> Yeah, not ideal, because if something goes wrong, stdout will be
>>> corrupted, but at least some noise should go away :)
>>> >> >>
>>> >> >>> I'd also like to see a more unified experience with the CLI tool
>>> for
>>> >> >>> documentation and usage.  The current state requires a bit of Avro
>>> >> >>> expertise to use, but it has some functions that would be pretty
>>> >> >>> useful for a user working with Avro data.  I raised
>>> >> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an
>>> improvement.
>>> >> >>>
>>> >> >>> In my opinion, a schema compatibility tool would be a useful and
>>> >> >>> welcome feature!
>>> >> >>
>>> >> >>
>>> >> >> That would indeed be nice, but in the meantime, is there really
>>> nothing in the avro-tools commands that uses a chosen schema to read a data
>>> file written with some other schema? That would give me what I'm after
>>> currently.
>>> >> >>
>>> >> >> Thanks again for the helpful response.
>>> >> >>
>>> >> >>    cheers,
>>> >> >>      rog.
>>> >> >>
>>> >> >>>
>>> >> >>> Best regards, Ryan
>>> >> >>>
>>> >> >>>
>>> >> >>>
>>> >> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com>
>>> wrote:
>>> >> >>> >
>>> >> >>> > Hi Fokko,
>>> >> >>> >
>>> >> >>> > Thanks for your swift response!
>>> >> >>> >
>>> >> >>> > Stdout and stderr definitely seem to be merged on this platform
>>> at least. Here's a sample:
>>> >> >>> >
>>> >> >>> > % avrotool random --count 1 --schema '"int"'  x.out
>>> >> >>> > % avrotool tojson x.out > x.json
>>> >> >>> > % cat x.json
>>> >> >>> > 125140891
>>> >> >>> > WARNING: An illegal reflective access operation has occurred
>>> >> >>> > WARNING: Illegal reflective access by
>>> org.apache.hadoop.security.authentication.util.KerberosUtil
>>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>>> sun.security.krb5.Config.getInstance()
>>> >> >>> > WARNING: Please consider reporting this to the maintainers of
>>> org.apache.hadoop.security.authentication.util.KerberosUtil
>>> >> >>> > WARNING: Use --illegal-access=warn to enable warnings of
>>> further illegal reflective access operations
>>> >> >>> > WARNING: All illegal access operations will be denied in a
>>> future release
>>> >> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> >> >>> > %
>>> >> >>> >
>>> >> >>> > I've just verified that it's not a problem with the java
>>> executable itself (I ran a program that printed to System.err and the text
>>> correctly goes to the standard error).
>>> >> >>> >
>>> >> >>> > > Regarding the documentation, the CLI itself contains info on
>>> all the available commands. Also, there are excellent online resources:
>>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>>> Is there anything specific that you're missing?
>>> >> >>> >
>>> >> >>> > There's the single line summary produced for each command by
>>> running "avro-tools" with no arguments, but that's not as much info as I'd
>>> ideally like. For example, it often doesn't say what file format is being
>>> written or read. For some commands, the purpose is not very clear.
>>> >> >>> >
>>> >> >>> > For example the description of the recodec command is "Alters
>>> the codec of a data file". It doesn't describe how it alters it or how one
>>> might configure the alteration parameters. I managed to get some usage help
>>> by passing it more than two parameters (specifying "--help" gives an
>>> exception), but that doesn't provide much more info:
>>> >> >>> >
>>> >> >>> > % avro-tools recodec a b c
>>> >> >>> > Expected at most an input file and output file.
>>> >> >>> > Option             Description
>>> >> >>> > ------             -----------
>>> >> >>> > --codec <String>   Compression codec (default: null)
>>> >> >>> > --level <Integer>  Compression level (only applies to deflate
>>> and xz) (default:
>>> >> >>> >                      -1)
>>> >> >>> >
>>> >> >>> > For the record, I'm wondering if it might be possible to get
>>> avrotool to tell me if one schema is compatible with another so that I can
>>> check hypotheses about schema-checking in practice without having to write
>>> Java code.
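
(A sketch of how this can already be done from Java with the library's
SchemaCompatibility API, rather than an avro-tools subcommand; the schema
file names here are illustrative:)

import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

public class CompatCheck {
  public static void main(String[] args) throws Exception {
    // Illustrative file names; substitute your own reader/writer schema files.
    Schema reader = new Schema.Parser().parse(new File("reader.avsc"));
    Schema writer = new Schema.Parser().parse(new File("writer.avsc"));
    // Can data written with `writer` be read with `reader`?
    SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
    System.out.println(result.getType());        // COMPATIBLE or INCOMPATIBLE
    System.out.println(result.getDescription()); // human-readable explanation
  }
}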
>>> >> >>> >
>>> >> >>> >   cheers,
>>> >> >>> >     rog.
>>> >> >>> >
>>> >> >>> >
>>> >> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko
>>> <fo...@driesprong.frl> wrote:
>>> >> >>> >>
>>> >> >>> >> Hi Rog,
>>> >> >>> >>
>>> >> >>> >> This is actually a warning produced by the Hadoop library,
>>> that we're using. Please note that this isn't part of the stdout:
>>> >> >>> >>
>>> >> >>> >> $ find /tmp/tmp
>>> >> >>> >> /tmp/tmp
>>> >> >>> >> /tmp/tmp/._SUCCESS.crc
>>> >> >>> >>
>>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>>> >> >>> >>
>>> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>>> >> >>> >> /tmp/tmp/_SUCCESS
>>> >> >>> >>
>>> >> >>> >> $ avro-tools tojson
>>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>>> >> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> >> >>> >> {"line_of_text":{"string":"Hello"}}
>>> >> >>> >> {"line_of_text":{"string":"World"}}
>>> >> >>> >>
>>> >> >>> >> $ avro-tools tojson
>>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
>>> /tmp/tmp/data.json
>>> >> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> >> >>> >>
>>> >> >>> >> $ cat /tmp/tmp/data.json
>>> >> >>> >> {"line_of_text":{"string":"Hello"}}
>>> >> >>> >> {"line_of_text":{"string":"World"}}
>>> >> >>> >>
>>> >> >>> >> So when you pipe the data, it doesn't include the warnings.
>>> >> >>> >>
>>> >> >>> >> Regarding the documentation, the CLI itself contains info on
>>> all the available commands. Also, there are excellent online resources:
>>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>>> Is there anything specific that you're missing?
>>> >> >>> >>
>>> >> >>> >> Hope this helps.
>>> >> >>> >>
>>> >> >>> >> Cheers, Fokko
>>> >> >>> >>
>>> >> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <
>>> rogpeppe@gmail.com> wrote:
>>> >> >>> >>>
>>> >> >>> >>> Hi,
>>> >> >>> >>>
>>> >> >>> >>> I've been trying to use avro-tools to verify Avro
>>> implementations, and I've come across an issue. Perhaps someone here might
>>> be able to help?
>>> >> >>> >>>
>>> >> >>> >>> When I run avro-tools with some subcommands, it prints a
>>> bunch of warnings (see below) to the standard output. Does anyone know a
>>> way to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and
>>> avro-tools 1.9.1.
>>> >> >>> >>>
>>> >> >>> >>> The warnings are somewhat annoying because they can corrupt
>>> output of tools that print to the standard output, such as recodec.
>>> >> >>> >>>
>>> >> >>> >>> Aside: is there any documentation for the commands in
>>> avro-tools? Some seem to have some command-line help (though unfortunately
>>> there doesn't seem to be a standard way of showing it), but that help
>>> often doesn't describe what the command actually does.
>>> >> >>> >>>
>>> >> >>> >>> Here's the output that I see:
>>> >> >>> >>>
>>> >> >>> >>> WARNING: An illegal reflective access operation has occurred
>>> >> >>> >>> WARNING: Illegal reflective access by
>>> org.apache.hadoop.security.authentication.util.KerberosUtil
>>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>>> sun.security.krb5.Config.getInstance()
>>> >> >>> >>> WARNING: Please consider reporting this to the maintainers of
>>> org.apache.hadoop.security.authentication.util.KerberosUtil
>>> >> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of
>>> further illegal reflective access operations
>>> >> >>> >>> WARNING: All illegal access operations will be denied in a
>>> future release
>>> >> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
>>> native-hadoop library for your platform... using builtin-java classes where
>>> applicable
>>> >> >>> >>>
>>> >> >>> >>>   cheers,
>>> >> >>> >>>     rog.
>>> >> >>> >>>
>>>
>>

Re: avro-tools illegal reflective access warnings

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Sorry for the late reply Rog, been kinda busy lately.

Please look into the schema evolution of Avro. Confluent has an excellent
article on this: https://docs.confluent.io/current/schema-registry/avro.html

Could you try again with optional fields? e.g. "type": ["null", "array"].
One detail worth spelling out: in the schema JSON a union member has to be
a complete type, so an optional array field is written as a union of "null"
and the full array type, normally with "default": null.
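
A small sketch that parses such an optional field (the record and field
names here are just illustrative):

import org.apache.avro.Schema;

public class OptionalFieldSketch {
  public static void main(String[] args) {
    // Optional array field: a union of "null" and the complete array type,
    // with a null default so a reader can fill it in when the field is absent.
    String json =
        "{\"type\": \"record\", \"name\": \"R\", \"fields\": ["
      + "  {\"name\": \"A\","
      + "   \"type\": [\"null\", {\"type\": \"array\", \"items\": \"int\"}],"
      + "   \"default\": null}"
      + "]}";
    Schema schema = new Schema.Parser().parse(json);
    System.out.println(schema.toString(true));
  }
}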

Since the names are different, I would expect the default value (or even an
exception). If you do a cat on the Avro file, you can see that the original
schema is in the header of the file. The B field is not there in the record,
so the reader field is not compatible, and it won't work. I'll check if we
can come up with a more meaningful exception.

Cheers, Fokko



On Fri, 17 Jan 2020 at 17:02, roger peppe <ro...@gmail.com> wrote:

>
>
> On Fri, 17 Jan 2020 at 13:35, Ryan Skraba <ry...@skraba.com> wrote:
>
>> Hello!  I just created a JIRA for this as an improvement :D
>> https://issues.apache.org/jira/browse/AVRO-2689
>>
>> To check evolution, we'd probably want to specify the reader schema in
>> the GenericDatumReader created here:
>>
>> https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L75
>>
>> The writer schema is automatically set when the DataFileStream is
>> created.  If we want to set a different reader schema (than the one
>> found in the file), it should be set by calling
>> reader.setExpected(readerSchema) just after the DataFileStream is
>> created.
>>
>
> Ah, that's a good pointer, thanks! I was looking for an appropriate
> constructor, but there didn't seem to be one.
>
>
>>
>> I think it's a pretty good idea -- it feels like we're seeing more
>> questions about schema evolution these days, so that would be a neat
>> way for a user to test (or to create reproducible scenarios for bug
>> reports).  If you're interested, feel free to take the JIRA!  I'd be
>> happy to help out.
>>
>
> So, I've had a go at it... see
> https://github.com/rogpeppe-contrib/avro/commit/1236e9d33207a11d557c1eb2a171972e085dfcf2
>
> I did the following to see if it was working ("avro" is my shell script
> wrapper around the avro-tools jar):
>
> % cat schema.avsc
> {
>   "name": "R",
>   "type": "record",
>   "fields": [
>     {
>       "name": "A",
>       "type": {
>         "type": "array",
>         "items": "int"
>       }
>     }
>   ]
> }
> % cat schema1.avsc
> {
>   "name": "R",
>   "type": "record",
>   "fields": [
>     {
>       "name": "B",
>       "type": "string",
>       "default": "hello"
>     }
>   ]
> }
> %
> AVRO_TOOLS_JAR=/home/rog/other/avro/lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.ja%
> avro random --count 1 --schema-file schema.avsc x.out
> % avro tojson x.out
> {"A":[-890831012,1123049230,302974832]}
> % cp schema.avsc schema1.avsc
> % avro tojson --reader-schema-file schema1.avsc x.out
> Exception in thread "main" java.lang.ClassCastException: class
> org.apache.avro.util.Utf8 cannot be cast to class java.util.Collection
> (org.apache.avro.util.Utf8 is in unnamed module of loader 'app';
> java.util.Collection is in module java.base of loader 'bootstrap')
> at
> org.apache.avro.generic.GenericDatumWriter.getArraySize(GenericDatumWriter.java:258)
> at
> org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:228)
> at
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:136)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
> at
> org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:206)
> at
> org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
> at
> org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
> at
> org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
> at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:99)
> at org.apache.avro.tool.Main.run(Main.java:66)
> at org.apache.avro.tool.Main.main(Main.java:55)
> %
>
> I am a bit clueless when it comes to interpreting that exception... sorry
> for the ignorance - this is the first Java code I've ever written!
> Any idea what's going on? This is maybe getting a bit too noisy for the
> list - feel free to reply directly.
>
>   cheers,
>     rog.
>
>
>> Ryan
>>
>>
>> On Fri, Jan 17, 2020 at 2:22 PM roger peppe <ro...@gmail.com> wrote:
>> >
>> > On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>> >>
>> >> didn't find anything currently in the avro-tools that uses both
>> >> reader and writer schemas while deserializing data...  It should be a
>> >> pretty easy feature to add as an option to the DataFileReadTool
>> >> (a.k.a. tojson)!
>> >
>> >
>> > Thanks for that suggestion. I've been delving into that code a bit and
>> trying to understand what's going on.
>> >
>> > At the heart of it is this code:
>> >
>> >     GenericDatumReader<Object> reader = new GenericDatumReader<>();
>> >     try (DataFileStream<Object> streamReader = new
>> DataFileStream<>(inStream, reader)) {
>> >       Schema schema = streamReader.getSchema();
>> >       DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
>> >       JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema,
>> out, pretty);
>> >
>> > I'm trying to work out where the best place to put the specific reader
>> schema (taken from a command line flag) might be.
>> >
>> > Would it be best to do it when creating the DatumReader (it looks like
>> there might be a way to create that with a generic writer schema and a
>> specific reader schema, although I can't quite see how to do that atm), or
>> when creating the DatumWriter?
>> > Or perhaps there's a better way?
>> >
>> > Thanks for any guidance.
>> >
>> >    cheers,
>> >     rog.
>> >>
>> >>
>> >> You are correct about running ./build.sh dist in the java directory --
>> >> it fails with JDK 11 (likely fixable:
>> >> https://issues.apache.org/jira/browse/MJAVADOC-562).
>> >>
>> >> You should probably do a simple mvn clean install instead and find the
>> >> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>> >> should work with JDK11 without any problem (well-tested in the build).
>> >>
>> >> Best regards, Ryan
>> >>
>> >>
>> >>
>> >> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com>
>> wrote:
>> >> >
>> >> > Update: I tried running `build.sh dist` in `lang/java` and it failed
>> (at least, it looks like a failure message) after downloading a load of
>> Maven deps with the following errors:
>> https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
>> >> >
>> >> > Any hints on what I should do to build the avro-tools jar?
>> >> >
>> >> >   cheers,
>> >> >     rog.
>> >> >
>> >> > On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com>
>> wrote:
>> >> >>
>> >> >>
>> >> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
>> >> >>>
>> >> >>> Hello!  Is it because you are using brew to install avro-tools?
>> I'm
>> >> >>> not entirely familiar with how it packages the command, but using a
>> >> >>> direct bash-like solution instead might solve this problem of
>> mixing
>> >> >>> stdout and stderr.  This could be the simplest (and right) solution
>> >> >>> for piping.
>> >> >>
>> >> >>
>> >> >> No, I downloaded the jar and am directly running it with "java -jar
>> ~/other/avro-tools-1.9.1.jar".
>> >> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
>> package openjdk-11-jre-headless.
>> >> >>
>> >> >> I'm going to try compiling avro-tools myself to investigate but I'm
>> a total Java ignoramus - wish me luck!
>> >> >>
>> >> >>>
>> >> >>> alias avrotoolx='java -jar
>> >> >>>
>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>> >> >>> avrotoolx tojson x.out 2> /dev/null
>> >> >>>
>> >> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>> >> >>> warnings and logs should not be piped along with the normal
>> content.)
>> >> >>>
>> >> >>> Otherwise, IIRC, there is no way to disable the first illegal
>> >> >>> reflective access warning when running in Java 9+, but you can
>> "fix"
>> >> >>> these module errors, and deactivate the NativeCodeLoader logs with
>> an
>> >> >>> explicit log4j.properties:
>> >> >>>
>> >> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>> >> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>> >> >>>
>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>> >> >>> tojson x.out
>> >> >>
>> >> >>
>> >> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
>> properties files though. What do I need to put in /tmp/log4j.properties to
>> make this work?
>> >> >>
>> >> >>> None of that is particularly satisfactory, but it could be a
>> >> >>> workaround for your immediate use.
>> >> >>
>> >> >>
>> >> >> Yeah, not ideal, because if something goes wrong, stdout will be
>> corrupted, but at least some noise should go away :)
>> >> >>
>> >> >>> I'd also like to see a more unified experience with the CLI tool
>> for
>> >> >>> documentation and usage.  The current state requires a bit of Avro
>> >> >>> expertise to use, but it has some functions that would be pretty
>> >> >>> useful for a user working with Avro data.  I raised
>> >> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>> >> >>>
>> >> >>> In my opinion, a schema compatibility tool would be a useful and
>> >> >>> welcome feature!
>> >> >>
>> >> >>
>> >> >> That would indeed be nice, but in the meantime, is there really
>> nothing in the avro-tools commands that uses a chosen schema to read a data
>> file written with some other schema? That would give me what I'm after
>> currently.
>> >> >>
>> >> >> Thanks again for the helpful response.
>> >> >>
>> >> >>    cheers,
>> >> >>      rog.
>> >> >>
>> >> >>>
>> >> >>> Best regards, Ryan
>> >> >>>
>> >> >>>
>> >> >>>
>> >> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com>
>> wrote:
>> >> >>> >
>> >> >>> > Hi Fokko,
>> >> >>> >
>> >> >>> > Thanks for your swift response!
>> >> >>> >
>> >> >>> > Stdout and stderr definitely seem to be merged on this platform
>> at least. Here's a sample:
>> >> >>> >
>> >> >>> > % avrotool random --count 1 --schema '"int"'  x.out
>> >> >>> > % avrotool tojson x.out > x.json
>> >> >>> > % cat x.json
>> >> >>> > 125140891
>> >> >>> > WARNING: An illegal reflective access operation has occurred
>> >> >>> > WARNING: Illegal reflective access by
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>> sun.security.krb5.Config.getInstance()
>> >> >>> > WARNING: Please consider reporting this to the maintainers of
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> >> >>> > WARNING: Use --illegal-access=warn to enable warnings of further
>> illegal reflective access operations
>> >> >>> > WARNING: All illegal access operations will be denied in a
>> future release
>> >> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >> >>> > %
>> >> >>> >
>> >> >>> > I've just verified that it's not a problem with the java
>> executable itself (I ran a program that printed to System.err and the text
>> correctly goes to the standard error).
>> >> >>> >
>> >> >>> > > Regarding the documentation, the CLI itself contains info on
>> all the available commands. Also, there are excellent online resources:
>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> Is there anything specific that you're missing?
>> >> >>> >
>> >> >>> > There's the single line summary produced for each command by
>> running "avro-tools" with no arguments, but that's not as much info as I'd
>> ideally like. For example, it often doesn't say what file format is being
>> written or read. For some commands, the purpose is not very clear.
>> >> >>> >
>> >> >>> > For example the description of the recodec command is "Alters
>> the codec of a data file". It doesn't describe how it alters it or how one
>> might configure the alteration parameters. I managed to get some usage help
>> by passing it more than two parameters (specifying "--help" gives an
>> exception), but that doesn't provide much more info:
>> >> >>> >
>> >> >>> > % avro-tools recodec a b c
>> >> >>> > Expected at most an input file and output file.
>> >> >>> > Option             Description
>> >> >>> > ------             -----------
>> >> >>> > --codec <String>   Compression codec (default: null)
>> >> >>> > --level <Integer>  Compression level (only applies to deflate
>> and xz) (default:
>> >> >>> >                      -1)
>> >> >>> >
>> >> >>> > For the record, I'm wondering if it might be possible to get
>> avrotool to tell me if one schema is compatible with another so that I can
>> check hypotheses about schema-checking in practice without having to write
>> Java code.
>> >> >>> >
>> >> >>> >   cheers,
>> >> >>> >     rog.
>> >> >>> >
>> >> >>> >
>> >> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko
>> <fo...@driesprong.frl> wrote:
>> >> >>> >>
>> >> >>> >> Hi Rog,
>> >> >>> >>
>> >> >>> >> This is actually a warning produced by the Hadoop library, that
>> we're using. Please note that this isn't part of the stdout:
>> >> >>> >>
>> >> >>> >> $ find /tmp/tmp
>> >> >>> >> /tmp/tmp
>> >> >>> >> /tmp/tmp/._SUCCESS.crc
>> >> >>> >>
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >> >>> >>
>> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>> >> >>> >> /tmp/tmp/_SUCCESS
>> >> >>> >>
>> >> >>> >> $ avro-tools tojson
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >> >>> >> {"line_of_text":{"string":"Hello"}}
>> >> >>> >> {"line_of_text":{"string":"World"}}
>> >> >>> >>
>> >> >>> >> $ avro-tools tojson
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
>> /tmp/tmp/data.json
>> >> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >> >>> >>
>> >> >>> >> $ cat /tmp/tmp/data.json
>> >> >>> >> {"line_of_text":{"string":"Hello"}}
>> >> >>> >> {"line_of_text":{"string":"World"}}
>> >> >>> >>
>> >> >>> >> So when you pipe the data, it doesn't include the warnings.
>> >> >>> >>
>> >> >>> >> Regarding the documentation, the CLI itself contains info on
>> all the available commands. Also, there are excellent online resources:
>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> Is there anything specific that you're missing?
>> >> >>> >>
>> >> >>> >> Hope this helps.
>> >> >>> >>
>> >> >>> >> Cheers, Fokko
>> >> >>> >>
>> >> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <
>> rogpeppe@gmail.com> wrote:
>> >> >>> >>>
>> >> >>> >>> Hi,
>> >> >>> >>>
>> >> >>> >>> I've been trying to use avro-tools to verify Avro
>> implementations, and I've come across an issue. Perhaps someone here might
>> be able to help?
>> >> >>> >>>
>> >> >>> >>> When I run avro-tools with some subcommands, it prints a bunch
>> of warnings (see below) to the standard output. Does anyone know a way to
>> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
>> 1.9.1.
>> >> >>> >>>
>> >> >>> >>> The warnings are somewhat annoying because they can corrupt
>> output of tools that print to the standard output, such as recodec.
>> >> >>> >>>
>> >> >>> >>> Aside: is there any documentation for the commands in
>> avro-tools? Some seem to have some command-line help (though unfortunately
>> there doesn't seem to be a standard way of showing it), but that help
>> often doesn't describe what the command actually does.
>> >> >>> >>>
>> >> >>> >>> Here's the output that I see:
>> >> >>> >>>
>> >> >>> >>> WARNING: An illegal reflective access operation has occurred
>> >> >>> >>> WARNING: Illegal reflective access by
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>> sun.security.krb5.Config.getInstance()
>> >> >>> >>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> >> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of
>> further illegal reflective access operations
>> >> >>> >>> WARNING: All illegal access operations will be denied in a
>> future release
>> >> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >> >>> >>>
>> >> >>> >>>   cheers,
>> >> >>> >>>     rog.
>> >> >>> >>>
>>
>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Fri, 17 Jan 2020 at 13:35, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  I just created a JIRA for this as an improvement :D
> https://issues.apache.org/jira/browse/AVRO-2689
>
> To check evolution, we'd probably want to specify the reader schema in
> the GenericDatumReader created here:
>
> https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L75
>
> The writer schema is automatically set when the DataFileStream is
> created.  If we want to set a different reader schema (than the one
> found in the file), it should be set by calling
> reader.setExpected(readerSchema) just after the DataFileStream is
> created.
>

Ah, that's a good pointer, thanks! I was looking for an appropriate
constructor, but there didn't seem to be one.


>
> I think it's a pretty good idea -- it feels like we're seeing more
> questions about schema evolution these days, so that would be a neat
> way for a user to test (or to create reproducible scenarios for bug
> reports).  If you're interested, feel free to take the JIRA!  I'd be
> happy to help out.
>

So, I've had a go at it... see
https://github.com/rogpeppe-contrib/avro/commit/1236e9d33207a11d557c1eb2a171972e085dfcf2

I did the following to see if it was working ("avro" is my shell script
wrapper around the avro-tools jar):

% cat schema.avsc
{
  "name": "R",
  "type": "record",
  "fields": [
    {
      "name": "A",
      "type": {
        "type": "array",
        "items": "int"
      }
    }
  ]
}
% cat schema1.avsc
{
  "name": "R",
  "type": "record",
  "fields": [
    {
      "name": "B",
      "type": "string",
      "default": "hello"
    }
  ]
}
%
AVRO_TOOLS_JAR=/home/rog/other/avro/lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.ja%
avro random --count 1 --schema-file schema.avsc x.out
% avro tojson x.out
{"A":[-890831012,1123049230,302974832]}
% cp schema.avsc schema1.avsc
% avro tojson --reader-schema-file schema1.avsc x.out
Exception in thread "main" java.lang.ClassCastException: class
org.apache.avro.util.Utf8 cannot be cast to class java.util.Collection
(org.apache.avro.util.Utf8 is in unnamed module of loader 'app';
java.util.Collection is in module java.base of loader 'bootstrap')
at
org.apache.avro.generic.GenericDatumWriter.getArraySize(GenericDatumWriter.java:258)
at
org.apache.avro.generic.GenericDatumWriter.writeArray(GenericDatumWriter.java:228)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:136)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
at
org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:206)
at
org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:195)
at
org.apache.avro.generic.GenericDatumWriter.writeWithoutConversion(GenericDatumWriter.java:130)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:82)
at
org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:72)
at org.apache.avro.tool.DataFileReadTool.run(DataFileReadTool.java:99)
at org.apache.avro.tool.Main.run(Main.java:66)
at org.apache.avro.tool.Main.main(Main.java:55)
%

I am a bit clueless when it comes to interpreting that exception... sorry
for the ignorance - this is the first Java code I've ever written!
Any idea what's going on? This is maybe getting a bit too noisy for the
list - feel free to reply directly.

  cheers,
    rog.


> Ryan
>
>
> On Fri, Jan 17, 2020 at 2:22 PM roger peppe <ro...@gmail.com> wrote:
> >
> > On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
> >>
> >> didn't find anything currently in the avro-tools that uses both
> >> reader and writer schemas while deserializing data...  It should be a
> >> pretty easy feature to add as an option to the DataFileReadTool
> >> (a.k.a. tojson)!
> >
> >
> > Thanks for that suggestion. I've been delving into that code a bit and
> trying to understand what's going on.
> >
> > At the heart of it is this code:
> >
> >     GenericDatumReader<Object> reader = new GenericDatumReader<>();
> >     try (DataFileStream<Object> streamReader = new
> DataFileStream<>(inStream, reader)) {
> >       Schema schema = streamReader.getSchema();
> >       DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
> >       JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema,
> out, pretty);
> >
> > I'm trying to work out where the best place to put the specific reader
> schema (taken from a command line flag) might be.
> >
> > Would it be best to do it when creating the DatumReader (it looks like
> there might be a way to create that with a generic writer schema and a
> specific reader schema, although I can't quite see how to do that atm), or
> when creating the DatumWriter?
> > Or perhaps there's a better way?
> >
> > Thanks for any guidance.
> >
> >    cheers,
> >     rog.
> >>
> >>
> >> You are correct about running ./build.sh dist in the java directory --
> >> it fails with JDK 11 (likely fixable:
> >> https://issues.apache.org/jira/browse/MJAVADOC-562).
> >>
> >> You should probably do a simple mvn clean install instead and find the
> >> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
> >> should work with JDK11 without any problem (well-tested in the build).
> >>
> >> Best regards, Ryan
> >>
> >>
> >>
> >> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com> wrote:
> >> >
> >> > Update: I tried running `build.sh dist` in `lang/java` and it failed
> (at least, it looks like a failure message) after downloading a load of
> Maven deps with the following errors:
> https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
> >> >
> >> > Any hints on what I should do to build the avro-tools jar?
> >> >
> >> >   cheers,
> >> >     rog.
> >> >
> >> > On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com> wrote:
> >> >>
> >> >>
> >> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
> >> >>>
> >> >>> Hello!  Is it because you are using brew to install avro-tools?  I'm
> >> >>> not entirely familiar with how it packages the command, but using a
> >> >>> direct bash-like solution instead might solve this problem of mixing
> >> >>> stdout and stderr.  This could be the simplest (and right) solution
> >> >>> for piping.
> >> >>
> >> >>
> >> >> No, I downloaded the jar and am directly running it with "java -jar
> ~/other/avro-tools-1.9.1.jar".
> >> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
> package openjdk-11-jre-headless.
> >> >>
> >> >> I'm going to try compiling avro-tools myself to investigate but I'm
> a total Java ignoramus - wish me luck!
> >> >>
> >> >>>
> >> >>> alias avrotoolx='java -jar
> >> >>>
> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
> >> >>> avrotoolx tojson x.out 2> /dev/null
> >> >>>
> >> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
> >> >>> warnings and logs should not be piped along with the normal
> content.)
> >> >>>
> >> >>> Otherwise, IIRC, there is no way to disable the first illegal
> >> >>> reflective access warning when running in Java 9+, but you can "fix"
> >> >>> these module errors, and deactivate the NativeCodeLoader logs with
> an
> >> >>> explicit log4j.properties:
> >> >>>
> >> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
> >> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
> >> >>>
> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
> >> >>> tojson x.out
> >> >>
> >> >>
> >> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
> properties files though. What do I need to put in /tmp/log4j.properties to
> make this work?
> >> >>
> >> >>> None of that is particularly satisfactory, but it could be a
> >> >>> workaround for your immediate use.
> >> >>
> >> >>
> >> >> Yeah, not ideal, because if something goes wrong, stdout will be
> corrupted, but at least some noise should go away :)
> >> >>
> >> >>> I'd also like to see a more unified experience with the CLI tool for
> >> >>> documentation and usage.  The current state requires a bit of Avro
> >> >>> expertise to use, but it has some functions that would be pretty
> >> >>> useful for a user working with Avro data.  I raised
> >> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
> >> >>>
> >> >>> In my opinion, a schema compatibility tool would be a useful and
> >> >>> welcome feature!
> >> >>
> >> >>
> >> >> That would indeed be nice, but in the meantime, is there really
> nothing in the avro-tools commands that uses a chosen schema to read a data
> file written with some other schema? That would give me what I'm after
> currently.
> >> >>
> >> >> Thanks again for the helpful response.
> >> >>
> >> >>    cheers,
> >> >>      rog.
> >> >>
> >> >>>
> >> >>> Best regards, Ryan
> >> >>>
> >> >>>
> >> >>>
> >> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com>
> wrote:
> >> >>> >
> >> >>> > Hi Fokko,
> >> >>> >
> >> >>> > Thanks for your swift response!
> >> >>> >
> >> >>> > Stdout and stderr definitely seem to be merged on this platform
> at least. Here's a sample:
> >> >>> >
> >> >>> > % avrotool random --count 1 --schema '"int"'  x.out
> >> >>> > % avrotool tojson x.out > x.json
> >> >>> > % cat x.json
> >> >>> > 125140891
> >> >>> > WARNING: An illegal reflective access operation has occurred
> >> >>> > WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> >> >>> > WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> >> >>> > WARNING: Use --illegal-access=warn to enable warnings of further
> illegal reflective access operations
> >> >>> > WARNING: All illegal access operations will be denied in a future
> release
> >> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >> >>> > %
> >> >>> >
> >> >>> > I've just verified that it's not a problem with the java
> executable itself (I ran a program that printed to System.err and the text
> correctly goes to the standard error).
> >> >>> >
> >> >>> > > Regarding the documentation, the CLI itself contains info on
> all the available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >> >>> >
> >> >>> > There's the single line summary produced for each command by
> running "avro-tools" with no arguments, but that's not as much info as I'd
> ideally like. For example, it often doesn't say what file format is being
> written or read. For some commands, the purpose is not very clear.
> >> >>> >
> >> >>> > For example the description of the recodec command is "Alters the
> codec of a data file". It doesn't describe how it alters it or how one
> might configure the alteration parameters. I managed to get some usage help
> by passing it more than two parameters (specifying "--help" gives an
> exception), but that doesn't provide much more info:
> >> >>> >
> >> >>> > % avro-tools recodec a b c
> >> >>> > Expected at most an input file and output file.
> >> >>> > Option             Description
> >> >>> > ------             -----------
> >> >>> > --codec <String>   Compression codec (default: null)
> >> >>> > --level <Integer>  Compression level (only applies to deflate and
> xz) (default:
> >> >>> >                      -1)
> >> >>> >
> >> >>> > For the record, I'm wondering if it might be possible to get
> avrotool to tell me if one schema is compatible with another so that I can
> check hypotheses about schema-checking in practice without having to write
> Java code.
> >> >>> >
> >> >>> >   cheers,
> >> >>> >     rog.
> >> >>> >
> >> >>> >
> >> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko
> <fo...@driesprong.frl> wrote:
> >> >>> >>
> >> >>> >> Hi Rog,
> >> >>> >>
> >> >>> >> This is actually a warning produced by the Hadoop library, that
> we're using. Please note that this isn't part of the stdout:
> >> >>> >>
> >> >>> >> $ find /tmp/tmp
> >> >>> >> /tmp/tmp
> >> >>> >> /tmp/tmp/._SUCCESS.crc
> >> >>> >>
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >> >>> >>
> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
> >> >>> >> /tmp/tmp/_SUCCESS
> >> >>> >>
> >> >>> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >> >>> >> {"line_of_text":{"string":"Hello"}}
> >> >>> >> {"line_of_text":{"string":"World"}}
> >> >>> >>
> >> >>> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
> /tmp/tmp/data.json
> >> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >> >>> >>
> >> >>> >> $ cat /tmp/tmp/data.json
> >> >>> >> {"line_of_text":{"string":"Hello"}}
> >> >>> >> {"line_of_text":{"string":"World"}}
> >> >>> >>
> >> >>> >> So when you pipe the data, it doesn't include the warnings.
> >> >>> >>
> >> >>> >> Regarding the documentation, the CLI itself contains info on all
> the available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >> >>> >>
> >> >>> >> Hope this helps.
> >> >>> >>
> >> >>> >> Cheers, Fokko
> >> >>> >>
> >> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <
> rogpeppe@gmail.com> wrote:
> >> >>> >>>
> >> >>> >>> Hi,
> >> >>> >>>
> >> >>> >>> I've been trying to use avro-tools to verify Avro
> implementations, and I've come across an issue. Perhaps someone here might
> be able to help?
> >> >>> >>>
> >> >>> >>> When I run avro-tools with some subcommands, it prints a bunch
> of warnings (see below) to the standard output. Does anyone know a way to
> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
> 1.9.1.
> >> >>> >>>
> >> >>> >>> The warnings are somewhat annoying because they can corrupt
> output of tools that print to the standard output, such as recodec.
> >> >>> >>>
> >> >>> >>> Aside: is there any documentation for the commands in
> avro-tools? Some seem to have some command-line help (though unfortunately
> there doesn't seem to be a standard way of showing it), but often that help
> doesn't describe what the command actually does.
> >> >>> >>>
> >> >>> >>> Here's the output that I see:
> >> >>> >>>
> >> >>> >>> WARNING: An illegal reflective access operation has occurred
> >> >>> >>> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> >> >>> >>> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> >> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of
> further illegal reflective access operations
> >> >>> >>> WARNING: All illegal access operations will be denied in a
> future release
> >> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >> >>> >>>
> >> >>> >>>   cheers,
> >> >>> >>>     rog.
> >> >>> >>>
>

Re: avro-tools illegal reflective access warnings

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  I just created a JIRA for this as an improvement :D
https://issues.apache.org/jira/browse/AVRO-2689

To check evolution, we'd probably want to specify the reader schema in
the GenericDatumReader created here:
https://github.com/apache/avro/blob/master/lang/java/tools/src/main/java/org/apache/avro/tool/DataFileReadTool.java#L75

The writer schema is automatically picked up from the file header when
the DataFileStream is created.  If we want to resolve records against a
reader schema that differs from that writer schema, we can call
reader.setExpected(readerSchema) just after the DataFileStream is
created.
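
For illustration, a rough sketch of how that could slot into the
existing DataFileReadTool code (untested; readerSchema here is just a
placeholder for whatever a new command-line option would parse, and
inStream, out and pretty are the variables the tool already has):

    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
      if (readerSchema != null) {
        // Resolve records against the supplied reader schema instead of
        // the writer schema stored in the file header.
        reader.setExpected(readerSchema);
      }
      // Emit JSON using whichever schema the records are resolved to.
      Schema schema = readerSchema != null ? readerSchema : streamReader.getSchema();
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);
      // ... iterate over streamReader and write each datum as before ...
    }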

I think it's a pretty good idea -- it feels like we're seeing more
questions about schema evolution these days, so that would be a neat
way for a user to test (or to create reproducible scenarios for bug
reports).  If you're interested, feel free to take the JIRA!  I'd be
happy to help out.

Ryan


On Fri, Jan 17, 2020 at 2:22 PM roger peppe <ro...@gmail.com> wrote:
>
> On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:
>>
>> didn't find anything currently in the avro-tools that uses both
>> reader and writer schemas while deserializing data...  It should be a
>> pretty easy feature to add as an option to the DataFileReadTool
>> (a.k.a. tojson)!
>
>
> Thanks for that suggestion. I've been delving into that code a bit and trying to understand what's going on.
>
> At the heart of it is this code:
>
>     GenericDatumReader<Object> reader = new GenericDatumReader<>();
>     try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
>       Schema schema = streamReader.getSchema();
>       DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
>       JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);
>
> I'm trying to work out where the best place to put the specific reader schema (taken from a command line flag) might be.
>
> Would it be best to do it when creating the DatumReader (it looks like there might be a way to create that with a generic writer schema and a specific reader schema, although I can't quite see how to do that atm), or when creating the DatumWriter?
> Or perhaps there's a better way?
>
> Thanks for any guidance.
>
>    cheers,
>     rog.
>>
>>
>> You are correct about running ./build.sh dist in the java directory --
>> it fails with JDK 11 (likely fixable:
>> https://issues.apache.org/jira/browse/MJAVADOC-562).
>>
>> You should probably do a simple mvn clean install instead and find the
>> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
>> should work with JDK11 without any problem (well-tested in the build).
>>
>> Best regards, Ryan
>>
>>
>>
>> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com> wrote:
>> >
>> > Update: I tried running `build.sh dist` in `lang/java` and it failed (at least, it looks like a failure message) after downloading a load of Maven deps with the following errors: https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
>> >
>> > Any hints on what I should do to build the avro-tools jar?
>> >
>> >   cheers,
>> >     rog.
>> >
>> > On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com> wrote:
>> >>
>> >>
>> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
>> >>>
>> >>> Hello!  Is it because you are using brew to install avro-tools?  I'm
>> >>> not entirely familiar with how it packages the command, but using a
>> >>> direct bash-like solution instead might solve this problem of mixing
>> >>> stdout and stderr.  This could be the simplest (and right) solution
>> >>> for piping.
>> >>
>> >>
>> >> No, I downloaded the jar and am directly running it with "java -jar ~/other/avro-tools-1.9.1.jar".
>> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian package openjdk-11-jre-headless.
>> >>
>> >> I'm going to try compiling avro-tools myself to investigate but I'm a total Java ignoramus - wish me luck!
>> >>
>> >>>
>> >>> alias avrotoolx='java -jar
>> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>> >>> avrotoolx tojson x.out 2> /dev/null
>> >>>
>> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>> >>> warnings and logs should not be piped along with the normal content.)
>> >>>
>> >>> Otherwise, IIRC, there is no way to disable the first illegal
>> >>> reflective access warning when running in Java 9+, but you can "fix"
>> >>> these module errors, and deactivate the NativeCodeLoader logs with an
>> >>> explicit log4j.properties:
>> >>>
>> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>> >>> tojson x.out
>> >>
>> >>
>> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j properties files though. What do I need to put in /tmp/log4j.properties to make this work?
>> >>
>> >>> None of that is particularly satisfactory, but it could be a
>> >>> workaround for your immediate use.
>> >>
>> >>
>> >> Yeah, not ideal, because if something goes wrong, stdout will be corrupted, but at least some noise should go away :)
>> >>
>> >>> I'd also like to see a more unified experience with the CLI tool for
>> >>> documentation and usage.  The current state requires a bit of Avro
>> >>> expertise to use, but it has some functions that would be pretty
>> >>> useful for a user working with Avro data.  I raised
>> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>> >>>
>> >>> In my opinion, a schema compatibility tool would be a useful and
>> >>> welcome feature!
>> >>
>> >>
>> >> That would indeed be nice, but in the meantime, is there really nothing in the avro-tools commands that uses a chosen schema to read a data file written with some other schema? That would give me what I'm after currently.
>> >>
>> >> Thanks again for the helpful response.
>> >>
>> >>    cheers,
>> >>      rog.
>> >>
>> >>>
>> >>> Best regards, Ryan
>> >>>
>> >>>
>> >>>
>> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com> wrote:
>> >>> >
>> >>> > Hi Fokko,
>> >>> >
>> >>> > Thanks for your swift response!
>> >>> >
>> >>> > Stdout and stderr definitely seem to be merged on this platform at least. Here's a sample:
>> >>> >
>> >>> > % avrotool random --count 1 --schema '"int"'  x.out
>> >>> > % avrotool tojson x.out > x.json
>> >>> > % cat x.json
>> >>> > 125140891
>> >>> > WARNING: An illegal reflective access operation has occurred
>> >>> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>> >>> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>> >>> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>> >>> > WARNING: All illegal access operations will be denied in a future release
>> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >>> > %
>> >>> >
>> >>> > I've just verified that it's not a problem with the java executable itself (I ran a program that printed to System.err and the text correctly goes to the standard error).
>> >>> >
>> >>> > > Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>> >>> >
>> >>> > There's the single line summary produced for each command by running "avro-tools" with no arguments, but that's not as much info as I'd ideally like. For example, it often doesn't say what file format is being written or read. For some commands, the purpose is not very clear.
>> >>> >
>> >>> > For example the description of the recodec command is "Alters the codec of a data file". It doesn't describe how it alters it or how one might configure the alteration parameters. I managed to get some usage help by passing it more than two parameters (specifying "--help" gives an exception), but that doesn't provide much more info:
>> >>> >
>> >>> > % avro-tools recodec a b c
>> >>> > Expected at most an input file and output file.
>> >>> > Option             Description
>> >>> > ------             -----------
>> >>> > --codec <String>   Compression codec (default: null)
>> >>> > --level <Integer>  Compression level (only applies to deflate and xz) (default:
>> >>> >                      -1)
>> >>> >
>> >>> > For the record, I'm wondering if it might be possible to get avrotool to tell me if one schema is compatible with another so that I can check hypotheses about schema-checking in practice without having to write Java code.
>> >>> >
>> >>> >   cheers,
>> >>> >     rog.
>> >>> >
>> >>> >
>> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>> >>> >>
>> >>> >> Hi Rog,
>> >>> >>
>> >>> >> This is actually a warning produced by the Hadoop library, that we're using. Please note that this isn't part of the stdout:
>> >>> >>
>> >>> >> $ find /tmp/tmp
>> >>> >> /tmp/tmp
>> >>> >> /tmp/tmp/._SUCCESS.crc
>> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >>> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>> >>> >> /tmp/tmp/_SUCCESS
>> >>> >>
>> >>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >>> >> {"line_of_text":{"string":"Hello"}}
>> >>> >> {"line_of_text":{"string":"World"}}
>> >>> >>
>> >>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > /tmp/tmp/data.json
>> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >>> >>
>> >>> >> $ cat /tmp/tmp/data.json
>> >>> >> {"line_of_text":{"string":"Hello"}}
>> >>> >> {"line_of_text":{"string":"World"}}
>> >>> >>
>> >>> >> So when you pipe the data, it doesn't include the warnings.
>> >>> >>
>> >>> >> Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>> >>> >>
>> >>> >> Hope this helps.
>> >>> >>
>> >>> >> Cheers, Fokko
>> >>> >>
>> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
>> >>> >>>
>> >>> >>> Hi,
>> >>> >>>
>> >>> >>> I've been trying to use avro-tools to verify Avro implementations, and I've come across an issue. Perhaps someone here might be able to help?
>> >>> >>>
>> >>> >>> When I run avro-tools with some subcommands, it prints a bunch of warnings (see below) to the standard output. Does anyone know a way to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
>> >>> >>>
>> >>> >>> The warnings are somewhat annoying because they can corrupt output of tools that print to the standard output, such as recodec.
>> >>> >>>
>> >>> >>> Aside: is there any documentation for the commands in avro-tools? Some seem to have some command-line help (though unfortunately there doesn't seem to be a standard way of showing it), but often that help doesn't describe what the command actually does.
>> >>> >>>
>> >>> >>> Here's the output that I see:
>> >>> >>>
>> >>> >>> WARNING: An illegal reflective access operation has occurred
>> >>> >>> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>> >>> >>> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>> >>> >>> WARNING: All illegal access operations will be denied in a future release
>> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> >>> >>>
>> >>> >>>   cheers,
>> >>> >>>     rog.
>> >>> >>>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Thu, 16 Jan 2020 at 17:21, Ryan Skraba <ry...@skraba.com> wrote:

> didn't find anything currently in the avro-tools that uses both
> reader and writer schemas while deserializing data...  It should be a
> pretty easy feature to add as an option to the DataFileReadTool
> (a.k.a. tojson)!
>

Thanks for that suggestion. I've been delving into that code a bit and
trying to understand what's going on.

At the heart of it is this code:

    GenericDatumReader<Object> reader = new GenericDatumReader<>();
    try (DataFileStream<Object> streamReader = new DataFileStream<>(inStream, reader)) {
      Schema schema = streamReader.getSchema();
      DatumWriter<Object> writer = new GenericDatumWriter<>(schema);
      JsonEncoder encoder = EncoderFactory.get().jsonEncoder(schema, out, pretty);

I'm trying to work out where the best place to put the specific reader
schema (taken from a command line flag) might be.

Would it be best to do it when creating the DatumReader (it looks like
there might be a way to create that with a generic writer schema and a
specific reader schema, although I can't quite see how to do that atm), or
when creating the DatumWriter?
Or perhaps there's a better way?

Thanks for any guidance.

   cheers,
    rog.

>
> You are correct about running ./build.sh dist in the java directory --
> it fails with JDK 11 (likely fixable:
> https://issues.apache.org/jira/browse/MJAVADOC-562).
>
> You should probably do a simple mvn clean install instead and find the
> jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
> should work with JDK11 without any problem (well-tested in the build).
>
> Best regards, Ryan
>
>
>
> On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com> wrote:
> >
> > Update: I tried running `build.sh dist` in `lang/java` and it failed (at
> least, it looks like a failure message) after downloading a load of Maven
> deps with the following errors:
> https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
> >
> > Any hints on what I should do to build the avro-tools jar?
> >
> >   cheers,
> >     rog.
> >
> > On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com> wrote:
> >>
> >>
> >> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
> >>>
> >>> Hello!  Is it because you are using brew to install avro-tools?  I'm
> >>> not entirely familiar with how it packages the command, but using a
> >>> direct bash-like solution instead might solve this problem of mixing
> >>> stdout and stderr.  This could be the simplest (and right) solution
> >>> for piping.
> >>
> >>
> >> No, I downloaded the jar and am directly running it with "java -jar
> ~/other/avro-tools-1.9.1.jar".
> >> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
> package openjdk-11-jre-headless.
> >>
> >> I'm going to try compiling avro-tools myself to investigate but I'm a
> total Java ignoramus - wish me luck!
> >>
> >>>
> >>> alias avrotoolx='java -jar
> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
> >>> avrotoolx tojson x.out 2> /dev/null
> >>>
> >>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
> >>> warnings and logs should not be piped along with the normal content.)
> >>>
> >>> Otherwise, IIRC, there is no way to disable the first illegal
> >>> reflective access warning when running in Java 9+, but you can "fix"
> >>> these module errors, and deactivate the NativeCodeLoader logs with an
> >>> explicit log4j.properties:
> >>>
> >>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
> >>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
> >>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
> >>> tojson x.out
> >>
> >>
> >> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
> properties files though. What do I need to put in /tmp/log4j.properties to
> make this work?
> >>
> >>> None of that is particularly satisfactory, but it could be a
> >>> workaround for your immediate use.
> >>
> >>
> >> Yeah, not ideal, because if something goes wrong, stdout will be
> corrupted, but at least some noise should go away :)
> >>
> >>> I'd also like to see a more unified experience with the CLI tool for
> >>> documentation and usage.  The current state requires a bit of Avro
> >>> expertise to use, but it has some functions that would be pretty
> >>> useful for a user working with Avro data.  I raised
> >>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
> >>>
> >>> In my opinion, a schema compatibility tool would be a useful and
> >>> welcome feature!
> >>
> >>
> >> That would indeed be nice, but in the meantime, is there really nothing
> in the avro-tools commands that uses a chosen schema to read a data file
> written with some other schema? That would give me what I'm after currently.
> >>
> >> Thanks again for the helpful response.
> >>
> >>    cheers,
> >>      rog.
> >>
> >>>
> >>> Best regards, Ryan
> >>>
> >>>
> >>>
> >>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com>
> wrote:
> >>> >
> >>> > Hi Fokko,
> >>> >
> >>> > Thanks for your swift response!
> >>> >
> >>> > Stdout and stderr definitely seem to be merged on this platform at
> least. Here's a sample:
> >>> >
> >>> > % avrotool random --count 1 --schema '"int"'  x.out
> >>> > % avrotool tojson x.out > x.json
> >>> > % cat x.json
> >>> > 125140891
> >>> > WARNING: An illegal reflective access operation has occurred
> >>> > WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> >>> > WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> >>> > WARNING: Use --illegal-access=warn to enable warnings of further
> illegal reflective access operations
> >>> > WARNING: All illegal access operations will be denied in a future
> release
> >>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>> > %
> >>> >
> >>> > I've just verified that it's not a problem with the java executable
> itself (I ran a program that printed to System.err and the text correctly
> goes to the standard error).
> >>> >
> >>> > > Regarding the documentation, the CLI itself contains info on all
> the available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >>> >
> >>> > There's the single line summary produced for each command by running
> "avro-tools" with no arguments, but that's not as much info as I'd ideally
> like. For example, it often doesn't say what file format is being written
> or read. For some commands, the purpose is not very clear.
> >>> >
> >>> > For example the description of the recodec command is "Alters the
> codec of a data file". It doesn't describe how it alters it or how one
> might configure the alteration parameters. I managed to get some usage help
> by passing it more than two parameters (specifying "--help" gives an
> exception), but that doesn't provide much more info:
> >>> >
> >>> > % avro-tools recodec a b c
> >>> > Expected at most an input file and output file.
> >>> > Option             Description
> >>> > ------             -----------
> >>> > --codec <String>   Compression codec (default: null)
> >>> > --level <Integer>  Compression level (only applies to deflate and
> xz) (default:
> >>> >                      -1)
> >>> >
> >>> > For the record, I'm wondering if it might be possible to get avrotool
> to tell me if one schema is compatible with another so that I can check
> hypotheses about schema-checking in practice without having to write Java
> code.
> >>> >
> >>> >   cheers,
> >>> >     rog.
> >>> >
> >>> >
> >>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
> >>> >>
> >>> >> Hi Rog,
> >>> >>
> >>> >> This is actually a warning produced by the Hadoop library, that
> we're using. Please note that this isn't part of the stdout:
> >>> >>
> >>> >> $ find /tmp/tmp
> >>> >> /tmp/tmp
> >>> >> /tmp/tmp/._SUCCESS.crc
> >>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >>> >>
> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
> >>> >> /tmp/tmp/_SUCCESS
> >>> >>
> >>> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>> >> {"line_of_text":{"string":"Hello"}}
> >>> >> {"line_of_text":{"string":"World"}}
> >>> >>
> >>> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
> /tmp/tmp/data.json
> >>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>> >>
> >>> >> $ cat /tmp/tmp/data.json
> >>> >> {"line_of_text":{"string":"Hello"}}
> >>> >> {"line_of_text":{"string":"World"}}
> >>> >>
> >>> >> So when you pipe the data, it doesn't include the warnings.
> >>> >>
> >>> >> Regarding the documentation, the CLI itself contains info on all
> the available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >>> >>
> >>> >> Hope this helps.
> >>> >>
> >>> >> Cheers, Fokko
> >>> >>
> >>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <rogpeppe@gmail.com
> > wrote:
> >>> >>>
> >>> >>> Hi,
> >>> >>>
> >>> >>> I've been trying to use avro-tools to verify Avro implementations,
> and I've come across an issue. Perhaps someone here might be able to help?
> >>> >>>
> >>> >>> When I run avro-tools with some subcommands, it prints a bunch of
> warnings (see below) to the standard output. Does anyone know a way to
> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
> 1.9.1.
> >>> >>>
> >>> >>> The warnings are somewhat annoying because they can corrupt output
> of tools that print to the standard output, such as recodec.
> >>> >>>
> >>> >>> Aside: is there any documentation for the commands in avro-tools?
> Some seem to have some command-line help (though unfortunately there
> doesn't seem to be a standard way of showing it), but often that help
> doesn't describe what the command actually does.
> >>> >>>
> >>> >>> Here's the output that I see:
> >>> >>>
> >>> >>> WARNING: An illegal reflective access operation has occurred
> >>> >>> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> >>> >>> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> >>> >>> WARNING: Use --illegal-access=warn to enable warnings of further
> illegal reflective access operations
> >>> >>> WARNING: All illegal access operations will be denied in a future
> release
> >>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>> >>>
> >>> >>>   cheers,
> >>> >>>     rog.
> >>> >>>
>

Re: avro-tools illegal reflective access warnings

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  For a simple, silent log4j, I use:

$ cat /tmp/log4j.properties
log4j.rootLogger=off

I didn't find anything currently in the avro-tools that uses both
reader and writer schemas while deserializing data...  It should be a
pretty easy feature to add as an option to the DataFileReadTool
(a.k.a. tojson)!

You are correct about running ./build.sh dist in the java directory --
it fails with JDK 11 (likely fixable:
https://issues.apache.org/jira/browse/MJAVADOC-562).

You should probably do a simple mvn clean install instead and find the
jar in lang/java/tools/target/avro-tools-1.10.0-SNAPSHOT.jar.  That
should work with JDK11 without any problem (well-tested in the build).

Best regards, Ryan



On Thu, Jan 16, 2020 at 5:49 PM roger peppe <ro...@gmail.com> wrote:
>
> Update: I tried running `build.sh dist` in `lang/java` and it failed (at least, it looks like a failure message) after downloading a load of Maven deps with the following errors: https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965
>
> Any hints on what I should do to build the avro-tools jar?
>
>   cheers,
>     rog.
>
> On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com> wrote:
>>
>>
>> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
>>>
>>> Hello!  Is it because you are using brew to install avro-tools?  I'm
>>> not entirely familiar with how it packages the command, but using a
>>> direct bash-like solution instead might solve this problem of mixing
>>> stdout and stderr.  This could be the simplest (and right) solution
>>> for piping.
>>
>>
>> No, I downloaded the jar and am directly running it with "java -jar ~/other/avro-tools-1.9.1.jar".
>> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian package openjdk-11-jre-headless.
>>
>> I'm going to try compiling avro-tools myself to investigate but I'm a total Java ignoramus - wish me luck!
>>
>>>
>>> alias avrotoolx='java -jar
>>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>>> avrotoolx tojson x.out 2> /dev/null
>>>
>>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>>> warnings and logs should not be piped along with the normal content.)
>>>
>>> Otherwise, IIRC, there is no way to disable the first illegal
>>> reflective access warning when running in Java 9+, but you can "fix"
>>> these module errors, and deactivate the NativeCodeLoader logs with an
>>> explicit log4j.properties:
>>>
>>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>>> tojson x.out
>>
>>
>> Thanks for that suggestion! I'm afraid I'm not familiar with log4j properties files though. What do I need to put in /tmp/log4j.properties to make this work?
>>
>>> None of that is particularly satisfactory, but it could be a
>>> workaround for your immediate use.
>>
>>
>> Yeah, not ideal, because if something goes wrong, stdout will be corrupted, but at least some noise should go away :)
>>
>>> I'd also like to see a more unified experience with the CLI tool for
>>> documentation and usage.  The current state requires a bit of Avro
>>> expertise to use, but it has some functions that would be pretty
>>> useful for a user working with Avro data.  I raised
>>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>>>
>>> In my opinion, a schema compatibility tool would be a useful and
>>> welcome feature!
>>
>>
>> That would indeed be nice, but in the meantime, is there really nothing in the avro-tools commands that uses a chosen schema to read a data file written with some other schema? That would give me what I'm after currently.
>>
>> Thanks again for the helpful response.
>>
>>    cheers,
>>      rog.
>>
>>>
>>> Best regards, Ryan
>>>
>>>
>>>
>>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com> wrote:
>>> >
>>> > Hi Fokko,
>>> >
>>> > Thanks for your swift response!
>>> >
>>> > Stdout and stderr definitely seem to be merged on this platform at least. Here's a sample:
>>> >
>>> > % avrotool random --count 1 --schema '"int"'  x.out
>>> > % avrotool tojson x.out > x.json
>>> > % cat x.json
>>> > 125140891
>>> > WARNING: An illegal reflective access operation has occurred
>>> > WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>>> > WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>>> > WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>>> > WARNING: All illegal access operations will be denied in a future release
>>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> > %
>>> >
>>> > I've just verified that it's not a problem with the java executable itself (I ran a program that printed to System.err and the text correctly goes to the standard error).
>>> >
>>> > > Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>>> >
>>> > There's the single line summary produced for each command by running "avro-tools" with no arguments, but that's not as much info as I'd ideally like. For example, it often doesn't say what file format is being written or read. For some commands, the purpose is not very clear.
>>> >
>>> > For example the description of the recodec command is "Alters the codec of a data file". It doesn't describe how it alters it or how one might configure the alteration parameters. I managed to get some usage help by passing it more than two parameters (specifying "--help" gives an exception), but that doesn't provide much more info:
>>> >
>>> > % avro-tools recodec a b c
>>> > Expected at most an input file and output file.
>>> > Option             Description
>>> > ------             -----------
>>> > --codec <String>   Compression codec (default: null)
>>> > --level <Integer>  Compression level (only applies to deflate and xz) (default:
>>> >                      -1)
>>> >
>>> > For the record, I'm wondering if it might be possible to get avrotool to tell me if one schema is compatible with another so that I can check hypotheses about schema-checking in practice without having to write Java code.
>>> >
>>> >   cheers,
>>> >     rog.
>>> >
>>> >
>>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>> >>
>>> >> Hi Rog,
>>> >>
>>> >> This is actually a warning produced by the Hadoop library, that we're using. Please note that this isn't part of the stdout:
>>> >>
>>> >> $ find /tmp/tmp
>>> >> /tmp/tmp
>>> >> /tmp/tmp/._SUCCESS.crc
>>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>>> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>>> >> /tmp/tmp/_SUCCESS
>>> >>
>>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >> {"line_of_text":{"string":"Hello"}}
>>> >> {"line_of_text":{"string":"World"}}
>>> >>
>>> >> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > /tmp/tmp/data.json
>>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>
>>> >> $ cat /tmp/tmp/data.json
>>> >> {"line_of_text":{"string":"Hello"}}
>>> >> {"line_of_text":{"string":"World"}}
>>> >>
>>> >> So when you pipe the data, it doesn't include the warnings.
>>> >>
>>> >> Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>>> >>
>>> >> Hope this helps.
>>> >>
>>> >> Cheers, Fokko
>>> >>
>>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
>>> >>>
>>> >>> Hi,
>>> >>>
>>> >>> I've been trying to use avro-tools to verify Avro implementations, and I've come across an issue. Perhaps someone here might be able to help?
>>> >>>
>>> >>> When I run avro-tools with some subcommands, it prints a bunch of warnings (see below) to the standard output. Does anyone know a way to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
>>> >>>
>>> >>> The warnings are somewhat annoying because they can corrupt output of tools that print to the standard output, such as recodec.
>>> >>>
>>> >>> Aside: is there any documentation for the commands in avro-tools? Some seem to have some command-line help (though unfortunately there doesn't seem to be a standard way of showing it), but often that help doesn't describe what the command actually does.
>>> >>>
>>> >>> Here's the output that I see:
>>> >>>
>>> >>> WARNING: An illegal reflective access operation has occurred
>>> >>> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>>> >>> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>>> >>> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>>> >>> WARNING: All illegal access operations will be denied in a future release
>>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>> >>>
>>> >>>   cheers,
>>> >>>     rog.
>>> >>>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
Update: I tried running `build.sh dist` in `lang/java` and it failed (at
least, it looks like a failure message) after downloading a load of Maven
deps with the following errors:
https://gist.github.com/rogpeppe/df05d993254dc5082253a5ef5027e965

Any hints on what I should do to build the avro-tools jar?

  cheers,
    rog.

On Thu, 16 Jan 2020 at 16:45, roger peppe <ro...@gmail.com> wrote:

>
> On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:
>
>> Hello!  Is it because you are using brew to install avro-tools?  I'm
>> not entirely familiar with how it packages the command, but using a
>> direct bash-like solution instead might solve this problem of mixing
>> stdout and stderr.  This could be the simplest (and right) solution
>> for piping.
>>
>
> No, I downloaded the jar and am directly running it with "java -jar
> ~/other/avro-tools-1.9.1.jar".
> I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
> package openjdk-11-jre-headless.
>
> I'm going to try compiling avro-tools myself to investigate but I'm a
> total Java ignoramus - wish me luck!
>
>
>> alias avrotoolx='java -jar
>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
>> avrotoolx tojson x.out 2> /dev/null
>>
>> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
>> warnings and logs should not be piped along with the normal content.)
>>
>> Otherwise, IIRC, there is no way to disable the first illegal
>> reflective access warning when running in Java 9+, but you can "fix"
>> these module errors, and deactivate the NativeCodeLoader logs with an
>> explicit log4j.properties:
>>
>> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
>> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
>> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
>> tojson x.out
>>
>
> Thanks for that suggestion! I'm afraid I'm not familiar with log4j
> properties files though. What do I need to put in /tmp/log4j.properties to
> make this work?
>
> None of that is particularly satisfactory, but it could be a
>> workaround for your immediate use.
>>
>
> Yeah, not ideal, because if something goes wrong, stdout will be
> corrupted, but at least some noise should go away :)
>
> I'd also like to see a more unified experience with the CLI tool for
>> documentation and usage.  The current state requires a bit of Avro
>> expertise to use, but it has some functions that would be pretty
>> useful for a user working with Avro data.  I raised
>> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>>
>> In my opinion, a schema compatibility tool would be a useful and
>> welcome feature!
>>
>
> That would indeed be nice, but in the meantime, is there really nothing in
> the avro-tools commands that uses a chosen schema to read a data file
> written with some other schema? That would give me what I'm after currently.
>
> Thanks again for the helpful response.
>
>    cheers,
>      rog.
>
>
>> Best regards, Ryan
>>
>>
>>
>> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com> wrote:
>> >
>> > Hi Fokko,
>> >
>> > Thanks for your swift response!
>> >
>> > Stdout and stderr definitely seem to be merged on this platform at
>> least. Here's a sample:
>> >
>> > % avrotool random --count 1 --schema '"int"'  x.out
>> > % avrotool tojson x.out > x.json
>> > % cat x.json
>> > 125140891
>> > WARNING: An illegal reflective access operation has occurred
>> > WARNING: Illegal reflective access by
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>> sun.security.krb5.Config.getInstance()
>> > WARNING: Please consider reporting this to the maintainers of
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> > WARNING: Use --illegal-access=warn to enable warnings of further
>> illegal reflective access operations
>> > WARNING: All illegal access operations will be denied in a future
>> release
>> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> > %
>> >
>> > I've just verified that it's not a problem with the java executable
>> itself (I ran a program that printed to System.err and the text correctly
>> goes to the standard error).
>> >
>> > > Regarding the documentation, the CLI itself contains info on all the
>> available commands. Also, there are excellent online resources:
>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> Is there anything specific that you're missing?
>> >
>> > There's the single line summary produced for each command by running
>> "avro-tools" with no arguments, but that's not as much info as I'd ideally
>> like. For example, it often doesn't say what file format is being written
>> or read. For some commands, the purpose is not very clear.
>> >
>> > For example the description of the recodec command is "Alters the codec
>> of a data file". It doesn't describe how it alters it or how one might
>> configure the alteration parameters. I managed to get some usage help by
>> passing it more than two parameters (specifying "--help" gives an
>> exception), but that doesn't provide much more info:
>> >
>> > % avro-tools recodec a b c
>> > Expected at most an input file and output file.
>> > Option             Description
>> > ------             -----------
>> > --codec <String>   Compression codec (default: null)
>> > --level <Integer>  Compression level (only applies to deflate and xz)
>> (default:
>> >                      -1)
>> >
>> > For the record, I'm wondering if it might be possible to get avrotool to
>> tell me if one schema is compatible with another so that I can check
>> hypotheses about schema-checking in practice without having to write Java
>> code.
>> >
>> >   cheers,
>> >     rog.
>> >
>> >
>> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl>
>> wrote:
>> >>
>> >> Hi Rog,
>> >>
>> >> This is actually a warning produced by the Hadoop library, that we're
>> using. Please note that this isn't part of the stdout:
>> >>
>> >> $ find /tmp/tmp
>> >> /tmp/tmp
>> >> /tmp/tmp/._SUCCESS.crc
>> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>> >> /tmp/tmp/_SUCCESS
>> >>
>> >> $ avro-tools tojson
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >> {"line_of_text":{"string":"Hello"}}
>> >> {"line_of_text":{"string":"World"}}
>> >>
>> >> $ avro-tools tojson
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
>> /tmp/tmp/data.json
>> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >>
>> >> $ cat /tmp/tmp/data.json
>> >> {"line_of_text":{"string":"Hello"}}
>> >> {"line_of_text":{"string":"World"}}
>> >>
>> >> So when you pipe the data, it doesn't include the warnings.
>> >>
>> >> Regarding the documentation, the CLI itself contains info on all the
>> available commands. Also, there are excellent online resources:
>> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
>> Is there anything specific that you're missing?
>> >>
>> >> Hope this helps.
>> >>
>> >> Cheers, Fokko
>> >>
>> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
>> >>>
>> >>> Hi,
>> >>>
>> >>> I've been trying to use avro-tools to verify Avro implementations,
>> and I've come across an issue. Perhaps someone here might be able to help?
>> >>>
>> >>> When I run avro-tools with some subcommands, it prints a bunch of
>> warnings (see below) to the standard output. Does anyone know a way to
>> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
>> 1.9.1.
>> >>>
>> >>> The warnings are somewhat annoying because they can corrupt output of
>> tools that print to the standard output, such as recodec.
>> >>>
>> >>> Aside: is there any documentation for the commands in avro-tools?
>> Some seem to have some command-line help (though unfortunately there
>> doesn't seem to be a standard way of showing it), but often that help
>> doesn't describe what the command actually does.
>> >>>
>> >>> Here's the output that I see:
>> >>>
>> >>> WARNING: An illegal reflective access operation has occurred
>> >>> WARNING: Illegal reflective access by
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>> sun.security.krb5.Config.getInstance()
>> >>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> >>> WARNING: Use --illegal-access=warn to enable warnings of further
>> illegal reflective access operations
>> >>> WARNING: All illegal access operations will be denied in a future
>> release
>> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>> >>>
>> >>>   cheers,
>> >>>     rog.
>> >>>
>>
>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
On Thu, 16 Jan 2020 at 13:57, Ryan Skraba <ry...@skraba.com> wrote:

> Hello!  Is it because you are using brew to install avro-tools?  I'm
> not entirely familiar with how it packages the command, but using a
> direct bash-like solution instead might solve this problem of mixing
> stdout and stderr.  This could be the simplest (and right) solution
> for piping.
>

No, I downloaded the jar and am directly running it with "java -jar
~/other/avro-tools-1.9.1.jar".
I'm using Ubuntu Linux 18.04 FWIW - the binary comes from Debian
package openjdk-11-jre-headless.

I'm going to try compiling avro-tools myself to investigate but I'm a total
Java ignoramus - wish me luck!


> alias avrotoolx='java -jar
> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
> avrotoolx tojson x.out 2> /dev/null
>
> (As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
> warnings and logs should not be piped along with the normal content.)
>
> Otherwise, IIRC, there is no way to disable the first illegal
> reflective access warning when running in Java 9+, but you can "fix"
> these module errors, and deactivate the NativeCodeLoader logs with an
> explicit log4j.properties:
>
> java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
> java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
> ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
> tojson x.out
>

Thanks for that suggestion! I'm afraid I'm not familiar with log4j
properties files though. What do I need to put in /tmp/log4j.properties to
make this work?

None of that is particularly satisfactory, but it could be a
> workaround for your immediate use.
>

Yeah, not ideal, because if something goes wrong, stdout will be corrupted,
but at least some noise should go away :)

I'd also like to see a more unified experience with the CLI tool for
> documentation and usage.  The current state requires a bit of Avro
> expertise to use, but it has some functions that would be pretty
> useful for a user working with Avro data.  I raised
> https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.
>
> In my opinion, a schema compatibility tool would be a useful and
> welcome feature!
>

That would indeed be nice, but in the meantime, is there really nothing in
the avro-tools commands that uses a chosen schema to read a data file
written with some other schema? That would give me what I'm after currently.

Thanks again for the helpful response.

   cheers,
     rog.


> Best regards, Ryan
>
>
>
> On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com> wrote:
> >
> > Hi Fokko,
> >
> > Thanks for your swift response!
> >
> > Stdout and stderr definitely seem to be merged on this platform at
> least. Here's a sample:
> >
> > % avrotool random --count 1 --schema '"int"'  x.out
> > % avrotool tojson x.out > x.json
> > % cat x.json
> > 125140891
> > WARNING: An illegal reflective access operation has occurred
> > WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> > WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> > WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
> > WARNING: All illegal access operations will be denied in a future release
> > 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> > %
> >
> > I've just verified that it's not a problem with the java executable
> itself (I ran a program that printed to System.err and the text correctly
> goes to the standard error).
> >
> > > Regarding the documentation, the CLI itself contains info on all the
> available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >
> > There's the single line summary produced for each command by running
> "avro-tools" with no arguments, but that's not as much info as I'd ideally
> like. For example, it often doesn't say what file format is being written
> or read. For some commands, the purpose is not very clear.
> >
> > For example the description of the recodec command is "Alters the codec
> of a data file". It doesn't describe how it alters it or how one might
> configure the alteration parameters. I managed to get some usage help by
> passing it more than two parameters (specifying "--help" gives an
> exception), but that doesn't provide much more info:
> >
> > % avro-tools recodec a b c
> > Expected at most an input file and output file.
> > Option             Description
> > ------             -----------
> > --codec <String>   Compression codec (default: null)
> > --level <Integer>  Compression level (only applies to deflate and xz)
> (default:
> >                      -1)
> >
> > For the record, I'm wondering if it might be possible to get avrotool to
> tell me if one schema is compatible with another so that I can check
> hypotheses about schema-checking in practice without having to write Java
> code.
> >
> >   cheers,
> >     rog.
> >
> >
> > On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl>
> wrote:
> >>
> >> Hi Rog,
> >>
> >> This is actually a warning produced by the Hadoop library, that we're
> using. Please note that this isn't part of the stdout:
> >>
> >> $ find /tmp/tmp
> >> /tmp/tmp
> >> /tmp/tmp/._SUCCESS.crc
> >> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
> >> /tmp/tmp/_SUCCESS
> >>
> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> >> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >> {"line_of_text":{"string":"Hello"}}
> >> {"line_of_text":{"string":"World"}}
> >>
> >> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
> /tmp/tmp/data.json
> >> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>
> >> $ cat /tmp/tmp/data.json
> >> {"line_of_text":{"string":"Hello"}}
> >> {"line_of_text":{"string":"World"}}
> >>
> >> So when you pipe the data, it doesn't include the warnings.
> >>
> >> Regarding the documentation, the CLI itself contains info on all the
> available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
> Is there anything specific that you're missing?
> >>
> >> Hope this helps.
> >>
> >> Cheers, Fokko
> >>
> >> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I've been trying to use avro-tools to verify Avro implementations, and
> I've come across an issue. Perhaps someone here might be able to help?
> >>>
> >>> When I run avro-tools with some subcommands, it prints a bunch of
> warnings (see below) to the standard output. Does anyone know a way to
> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
> 1.9.1.
> >>>
> >>> The warnings are somewhat annoying because they can corrupt output of
> tools that print to the standard output, such as recodec.
> >>>
> >>> Aside: is there any documentation for the commands in avro-tools? Some
> seem to have some command-line help (though unfortunately there doesn't
> seem to be a standard way of showing it), but often that help doesn't
> describe what the command actually does.
> >>>
> >>> Here's the output that I see:
> >>>
> >>> WARNING: An illegal reflective access operation has occurred
> >>> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> >>> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> >>> WARNING: Use --illegal-access=warn to enable warnings of further
> illegal reflective access operations
> >>> WARNING: All illegal access operations will be denied in a future
> release
> >>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
> native-hadoop library for your platform... using builtin-java classes where
> applicable
> >>>
> >>>   cheers,
> >>>     rog.
> >>>
>

Re: avro-tools illegal reflective access warnings

Posted by Ryan Skraba <ry...@skraba.com>.
Hello!  Is it because you are using brew to install avro-tools?  I'm
not entirely familiar with how it packages the command, but invoking
the jar directly from the shell instead might avoid this mixing of
stdout and stderr.  That could be the simplest (and right) solution
for piping.

alias avrotoolx='java -jar
~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar'
avrotoolx tojson x.out 2> /dev/null

(As Fokko mentioned, the 2> /dev/null isn't even necessary -- the
warnings and logs should not be piped along with the normal content.)

Otherwise, IIRC, there is no way to disable the first illegal
reflective access warning when running in Java 9+, but you can "fix"
these module errors, and deactivate the NativeCodeLoader logs with an
explicit log4j.properties:

java -Dlog4j.configuration=file:///tmp/log4j.properties --add-opens
java.security.jgss/sun.security.krb5=ALL-UNNAMED -jar
~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar
tojson x.out

None of that is particularly satisfactory, but it could be a
workaround for your immediate use.
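
For repeated use you could wrap the whole invocation in a small shell
function; this is only a sketch, and the jar path, the function name
(avrotoolq) and the /tmp/log4j.properties location are placeholders to
adjust locally:

# Sketch: one wrapper so every subcommand gets the same quieting flags.
# Adjust the jar path and log4j.properties location to your environment.
avrotoolq() {
  java -Dlog4j.configuration=file:///tmp/log4j.properties \
       --add-opens java.security.jgss/sun.security.krb5=ALL-UNNAMED \
       -jar ~/.m2/repository/org/apache/avro/avro-tools/1.9.1/avro-tools-1.9.1.jar "$@"
}

avrotoolq tojson x.out > x.json   # stdout now carries only the JSON records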

I'd also like to see a more unified experience with the CLI tool for
documentation and usage.  The current state requires a bit of Avro
expertise to use, but it has some functions that would be pretty
useful for a user working with Avro data.  I raised
https://issues.apache.org/jira/browse/AVRO-2688 as an improvement.

In my opinion, a schema compatibility tool would be a useful and
welcome feature!
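
Until then, a minimal sketch of what such a tool could wrap, using the
SchemaCompatibility class that ships with the Java library (assuming
avro 1.9.x on the classpath; the CheckCompat wrapper and the schema file
arguments are only placeholders):

import org.apache.avro.Schema;
import org.apache.avro.SchemaCompatibility;
import org.apache.avro.SchemaCompatibility.SchemaPairCompatibility;

import java.io.File;

public class CheckCompat {
  public static void main(String[] args) throws Exception {
    // Parse with separate parsers so identically named types don't clash.
    Schema reader = new Schema.Parser().parse(new File(args[0]));  // reader.avsc
    Schema writer = new Schema.Parser().parse(new File(args[1]));  // writer.avsc
    SchemaPairCompatibility result =
        SchemaCompatibility.checkReaderWriterCompatibility(reader, writer);
    System.out.println(result.getType());         // COMPATIBLE or INCOMPATIBLE
    System.out.println(result.getDescription());  // human-readable explanation
  }
}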

Best regards, Ryan



On Thu, Jan 16, 2020 at 12:25 PM roger peppe <ro...@gmail.com> wrote:
>
> Hi Fokko,
>
> Thanks for your swift response!
>
> Stdout and stderr definitely seem to be merged on this platform at least. Here's a sample:
>
> % avrotool random --count 1 --schema '"int"'  x.out
> % avrotool tojson x.out > x.json
> % cat x.json
> 125140891
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> %
>
> I've just verified that it's not a problem with the java executable itself (I ran a program that printed to System.err and the text correctly goes to the standard error).
>
> > Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>
> There's the single line summary produced for each command by running "avro-tools" with no arguments, but that's not as much info as I'd ideally like. For example, it often doesn't say what file format is being written or read. For some commands, the purpose is not very clear.
>
> For example the description of the recodec command is "Alters the codec of a data file". It doesn't describe how it alters it or how one might configure the alteration parameters. I managed to get some usage help by passing it more than two parameters (specifying "--help" gives an exception), but that doesn't provide much more info:
>
> % avro-tools recodec a b c
> Expected at most an input file and output file.
> Option             Description
> ------             -----------
> --codec <String>   Compression codec (default: null)
> --level <Integer>  Compression level (only applies to deflate and xz) (default:
>                      -1)
>
> For the record, I'm wondering if it might be possible to get avrotool to tell me whether one schema is compatible with another, so that I can check hypotheses about schema-checking in practice without having to write Java code.
>
>   cheers,
>     rog.
>
>
> On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl> wrote:
>>
>> Hi Rog,
>>
> >> This is actually a warning produced by the Hadoop library that we're using. Please note that this isn't part of stdout:
>>
>> $ find /tmp/tmp
>> /tmp/tmp
>> /tmp/tmp/._SUCCESS.crc
>> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
>> /tmp/tmp/_SUCCESS
>>
>> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
>> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>> {"line_of_text":{"string":"Hello"}}
>> {"line_of_text":{"string":"World"}}
>>
>> $ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro > /tmp/tmp/data.json
>> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>
>> $ cat /tmp/tmp/data.json
>> {"line_of_text":{"string":"Hello"}}
>> {"line_of_text":{"string":"World"}}
>>
>> So when you pipe the data, it doesn't include the warnings.
>>
>> Regarding the documentation, the CLI itself contains info on all the available commands. Also, there are excellent online resources: https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is there anything specific that you're missing?
>>
>> Hope this helps.
>>
>> Cheers, Fokko
>>
>> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
>>>
>>> Hi,
>>>
>>> I've been trying to use avro-tools to verify Avro implementations, and I've come across an issue. Perhaps someone here might be able to help?
>>>
>>> When I run avro-tools with some subcommands, it prints a bunch of warnings (see below) to the standard output. Does anyone know a way to disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
>>>
>>> The warnings are somewhat annoying because they can corrupt output of tools that print to the standard output, such as recodec.
>>>
> >>> Aside: is there any documentation for the commands in avro-tools? Some seem to have some command-line help (though unfortunately there doesn't seem to be a standard way of showing it), but that help often doesn't describe what the command actually does.
>>>
>>> Here's the output that I see:
>>>
>>> WARNING: An illegal reflective access operation has occurred
>>> WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/home/rog/other/avro-tools-1.9.1.jar) to method sun.security.krb5.Config.getInstance()
>>> WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
>>> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
>>> WARNING: All illegal access operations will be denied in a future release
>>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
>>>
>>>   cheers,
>>>     rog.
>>>

Re: avro-tools illegal reflective access warnings

Posted by roger peppe <ro...@gmail.com>.
Hi Fokko,

Thanks for your swift response!

Stdout and stderr definitely seem to be merged on this platform at least.
Here's a sample:

% avrotool random --count 1 --schema '"int"'  x.out
% avrotool tojson x.out > x.json
% cat x.json
125140891
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by
org.apache.hadoop.security.authentication.util.KerberosUtil
(file:/home/rog/other/avro-tools-1.9.1.jar) to method
sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of
org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/01/16 11:00:37 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
%

I've just verified that it's not a problem with the java executable itself
(I ran a program that printed to System.err, and that text correctly went
to the standard error).
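
For reference, the check was no more elaborate than something along
these lines (illustrative only), run on JDK 11 as "java Err.java > /dev/null"
to confirm that the message still appears on the terminal:

public class Err {
  public static void main(String[] args) {
    // If stderr really is separate, this line survives "> /dev/null".
    System.err.println("this should appear on stderr only");
  }
}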

> Regarding the documentation, the CLI itself contains info on all the
available commands. Also, there are excellent online resources:
https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
Is there anything specific that you're missing?

There's the single line summary produced for each command by running
"avro-tools" with no arguments, but that's not as much info as I'd ideally
like. For example, it often doesn't say what file format is being written
or read. For some commands, the purpose is not very clear.

For example, the description of the recodec command is "Alters the codec of
a data file". It doesn't describe how it alters it or how one might
configure the alteration parameters. I managed to get some usage help by
passing it more than two parameters (specifying "--help" gives an
exception), but that doesn't provide much more info:

% avro-tools recodec a b c
Expected at most an input file and output file.
Option             Description
------             -----------
--codec <String>   Compression codec (default: null)
--level <Integer>  Compression level (only applies to deflate and xz) (default: -1)
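
For what it's worth, it looks like the intended shape is a single input
file and output file plus those two options, so something like this (the
codec name and file names here are just illustrative guesses):

% avro-tools recodec --codec deflate input.avro output.avro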


For the record, I'm wondering if it might be possible to get avrotool to tell
me whether one schema is compatible with another, so that I can check hypotheses
about schema-checking in practice without having to write Java code.

  cheers,
    rog.


On Thu, 16 Jan 2020 at 10:30, Driesprong, Fokko <fo...@driesprong.frl>
wrote:

> Hi Rog,
>
> This is actually a warning produced by the Hadoop library that we're
> using. Please note that this isn't part of stdout:
>
> $ find /tmp/tmp
> /tmp/tmp
> /tmp/tmp/._SUCCESS.crc
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> /tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
> /tmp/tmp/_SUCCESS
>
> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
> 20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
> {"line_of_text":{"string":"Hello"}}
> {"line_of_text":{"string":"World"}}
>
> $ avro-tools tojson
> /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
> /tmp/tmp/data.json
> 20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
> $ cat /tmp/tmp/data.json
> {"line_of_text":{"string":"Hello"}}
> {"line_of_text":{"string":"World"}}
>
> So when you pipe the data, it doesn't include the warnings.
>
> Regarding the documentation, the CLI itself contains info on all the
> available commands. Also, there are excellent online resources:
> https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/ Is
> there anything specific that you're missing?
>
> Hope this helps.
>
> Cheers, Fokko
>
> On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:
>
>> Hi,
>>
>> I've been trying to use avro-tools to verify Avro implementations, and
>> I've come across an issue. Perhaps someone here might be able to help?
>>
>> When I run avro-tools with some subcommands, it prints a bunch of
>> warnings (see below) to the standard output. Does anyone know a way to
>> disable this? I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools
>> 1.9.1.
>>
>> The warnings are somewhat annoying because they can corrupt output of
>> tools that print to the standard output, such as recodec.
>>
>> Aside: is there any documentation for the commands in avro-tools? Some
>> seem to have some command-line help (though unfortunately there doesn't
>> seem to be a standard way of showing it), but that help often doesn't
>> describe what the command actually does.
>>
>> Here's the output that I see:
>>
>> WARNING: An illegal reflective access operation has occurred
>> WARNING: Illegal reflective access by
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
>> sun.security.krb5.Config.getInstance()
>> WARNING: Please consider reporting this to the maintainers of
>> org.apache.hadoop.security.authentication.util.KerberosUtil
>> WARNING: Use --illegal-access=warn to enable warnings of further illegal
>> reflective access operations
>> WARNING: All illegal access operations will be denied in a future release
>> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load
>> native-hadoop library for your platform... using builtin-java classes where
>> applicable
>>
>>   cheers,
>>     rog.
>>
>>

Re: avro-tools illegal reflective access warnings

Posted by "Driesprong, Fokko" <fo...@driesprong.frl>.
Hi Rog,

This is actually a warning produced by the Hadoop library that we're
using. Please note that this isn't part of stdout:

$ find /tmp/tmp
/tmp/tmp
/tmp/tmp/._SUCCESS.crc
/tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
/tmp/tmp/.part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro.crc
/tmp/tmp/_SUCCESS

$ avro-tools tojson
/tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro
20/01/16 11:26:10 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable
{"line_of_text":{"string":"Hello"}}
{"line_of_text":{"string":"World"}}

$ avro-tools tojson
/tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro >
/tmp/tmp/data.json
20/01/16 11:26:20 WARN util.NativeCodeLoader: Unable to load native-hadoop
library for your platform... using builtin-java classes where applicable

$ cat /tmp/tmp/data.json
{"line_of_text":{"string":"Hello"}}
{"line_of_text":{"string":"World"}}

So when you pipe the data, it doesn't include the warnings.
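
And if you want the warning gone from the terminal as well, redirecting
stderr should do it, e.g. (same example file as above):

$ avro-tools tojson /tmp/tmp/part-00000-9300fba6-ccdd-4ecc-97cb-0c3ae3631be5-c000.avro 2>/dev/null > /tmp/tmp/data.json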

Regarding the documentation, the CLI itself contains info on all the
available commands. Also, there are excellent online resources:
https://www.michael-noll.com/blog/2013/03/17/reading-and-writing-avro-files-from-the-command-line/
Is there anything specific that you're missing?

Hope this helps.

Cheers, Fokko

On Thu, 16 Jan 2020 at 09:30, roger peppe <ro...@gmail.com> wrote:

> Hi,
>
> I've been trying to use avro-tools to verify Avro implementations, and
> I've come across an issue. Perhaps someone here might be able to help?
>
> When I run avro-tools with some subcommands, it prints a bunch of warnings
> (see below) to the standard output. Does anyone know a way to disable this?
> I'm using openjdk 11.0.5 under Ubuntu 18.04 and avro-tools 1.9.1.
>
> The warnings are somewhat annoying because they can corrupt output of
> tools that print to the standard output, such as recodec.
>
> Aside: is there any documentation for the commands in avro-tools? Some
> seem to have some command-line help (though unfortunately there doesn't
> seem to be a standard way of showing it), but that help often doesn't
> describe what the command actually does.
>
> Here's the output that I see:
>
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by
> org.apache.hadoop.security.authentication.util.KerberosUtil
> (file:/home/rog/other/avro-tools-1.9.1.jar) to method
> sun.security.krb5.Config.getInstance()
> WARNING: Please consider reporting this to the maintainers of
> org.apache.hadoop.security.authentication.util.KerberosUtil
> WARNING: Use --illegal-access=warn to enable warnings of further illegal
> reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 20/01/16 08:12:39 WARN util.NativeCodeLoader: Unable to load native-hadoop
> library for your platform... using builtin-java classes where applicable
>
>   cheers,
>     rog.
>
>