Posted to dev@parquet.apache.org by "David Mollitor (Jira)" <ji...@apache.org> on 2020/01/24 20:04:00 UTC

[jira] [Updated] (PARQUET-1778) Do Not Record Class for Avro Generic Record Reader

     [ https://issues.apache.org/jira/browse/PARQUET-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

David Mollitor updated PARQUET-1778:
------------------------------------
    Description: 
 
{code:java|title=Example Code}
final ParquetReader<GenericRecord> reader = AvroParquetReader.<GenericRecord>builder(path).build();
final GenericRecord genericRecord = reader.read();
{code}
It fails with...
{code:none}
java.lang.NoSuchMethodException: io.github.belugabehr.app.Record.<init>()
	at java.lang.Class.getConstructor0(Class.java:3082) ~[na:1.8.0_232]
	at java.lang.Class.getDeclaredConstructor(Class.java:2178) ~[na:1.8.0_232]
	at org.apache.avro.specific.SpecificData$1.computeValue(SpecificData.java:63) ~[avro-1.9.1.jar:1.9.1]
	at org.apache.avro.specific.SpecificData$1.computeValue(SpecificData.java:58) ~[avro-1.9.1.jar:1.9.1]
	at java.lang.ClassValue.getFromHashMap(ClassValue.java:227) ~[na:1.8.0_232]
	at java.lang.ClassValue.getFromBackup(ClassValue.java:209) ~[na:1.8.0_232]
	at java.lang.ClassValue.get(ClassValue.java:115) ~[na:1.8.0_232]
{code}
I was surprised because it should just load a {{GenericRecord}} view of the data. However, my Avro schema's {{namespace}} and {{name}} fields resolve to {{io.github.belugabehr.app.Record}}, which happens to be a real class on the classpath, so the reader tries to call a no-argument constructor that the class does not have.

The {{GenericRecordReader}} should always ignore this Avro Schema namespace information.

I am putting {{GenericRecords}} into the Parquet file, so I expect to get {{GenericRecords}} back out when I read it.
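
For anyone trying to reproduce the exception itself without Parquet or Avro, here is a minimal sketch using only the JDK. It assumes the failure boils down to a reflective lookup of a no-argument constructor, which is what the {{Class.getDeclaredConstructor}} frame in the stack trace suggests. The names {{NoArgCtorDemo}}, {{AppRecord}} and {{hasNoArgConstructor}} are hypothetical stand-ins and not part of any library.

```java
public class NoArgCtorDemo {

    // Hypothetical stand-in for io.github.belugabehr.app.Record:
    // a class whose only constructor takes an argument.
    public static final class AppRecord {
        private final String value;

        public AppRecord(String value) {
            this.value = value;
        }

        public String getValue() {
            return value;
        }
    }

    // Returns true if the class declares a no-argument constructor.
    // Class.getDeclaredConstructor() with no parameter types throws
    // NoSuchMethodException when no such constructor is declared.
    public static boolean hasNoArgConstructor(Class<?> type) {
        try {
            type.getDeclaredConstructor();
            return true;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(hasNoArgConstructor(AppRecord.class));     // prints "false"
        System.out.println(hasNoArgConstructor(StringBuilder.class)); // prints "true"
    }
}
```

Any class that happens to share the schema's full name but lacks a no-argument constructor would fail this kind of lookup in the same way, whether or not it was ever meant to be an Avro specific record.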


> Do Not Record Class for Avro Generic Record Reader
> --------------------------------------------------
>
>                 Key: PARQUET-1778
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1778
>             Project: Parquet
>          Issue Type: Improvement
>            Reporter: David Mollitor
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)