You are viewing a plain text version of this content. The canonical link for it is here.
Posted to dev@parquet.apache.org by "ASF GitHub Bot (JIRA)" <ji...@apache.org> on 2018/06/13 16:34:00 UTC

[jira] [Commented] (PARQUET-952) Avro union with single type fails with 'is not a group'

    [ https://issues.apache.org/jira/browse/PARQUET-952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16511397#comment-16511397 ] 

ASF GitHub Bot commented on PARQUET-952:
----------------------------------------

michaelandrepearce commented on issue #459: PARQUET-952: Avro union with single type fails with 'is not a group'
URL: https://github.com/apache/parquet-mr/pull/459#issuecomment-397003375
 
 
   Any update on this getting merged and in a release? We have his the same issue.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


> Avro union with single type fails with 'is not a group'
> -------------------------------------------------------
>
>                 Key: PARQUET-952
>                 URL: https://issues.apache.org/jira/browse/PARQUET-952
>             Project: Parquet
>          Issue Type: Bug
>            Reporter: Jarek Jarcec Cecho
>            Assignee: Nandor Kollar
>            Priority: Major
>         Attachments: PARQUET-952-repro.tar.gz
>
>
> When one uses Avro schema with a union that has only one type specified, the {{AvroParquetWriter}} throws an exception. See the following repro test case:
> {code}
>   @Test
>   public void reproCase() throws Exception {
>     System.out.println("Parquet version: " + Version.FULL_VERSION);
>     // Schema with a single field 'value' with type of union that have a single item (=string)
>     Schema avroSchema = Schema.parse("{" +
>       "\"type\": \"record\", " +
>       "\"name\": \"RandomRecord\", " +
>       "\"fields\": [" +
>       "{\"name\": \"value\", \"type\": [\"string\"] }" +
>       "]" +
>       "}");
>     // Parquet writer
>     ParquetWriter parquetWriter = AvroParquetWriter.builder(path).withSchema(avroSchema)
>       .withConf(new Configuration())
>       .build();
>     GenericRecord record = new GenericRecordBuilder(avroSchema)
>       .set("value", "Surprise!")
>       .build();
>     parquetWriter.write(record);
>   }
> {code}
> Will result in:
> {code}
> Parquet version: parquet-mr version 1.9.0 (build 38262e2c80015d0935dad20f8e18f2d6f9fbd03c)
> java.lang.ClassCastException: required binary value (UTF8) is not a group
> 	at org.apache.parquet.schema.Type.asGroupType(Type.java:202)
> 	at org.apache.parquet.avro.AvroWriteSupport.writeValueWithoutConversion(AvroWriteSupport.java:357)
> 	at org.apache.parquet.avro.AvroWriteSupport.writeValue(AvroWriteSupport.java:274)
> 	at org.apache.parquet.avro.AvroWriteSupport.writeRecordFields(AvroWriteSupport.java:187)
> 	at org.apache.parquet.avro.AvroWriteSupport.write(AvroWriteSupport.java:161)
> 	at org.apache.parquet.hadoop.InternalParquetRecordWriter.write(InternalParquetRecordWriter.java:123)
> 	at org.apache.parquet.hadoop.ParquetWriter.write(ParquetWriter.java:292)
> 	at net.jarcec.AvroParquet.reproCase(AvroParquet.java:49)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> 	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> 	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> 	at java.lang.reflect.Method.invoke(Method.java:498)
> 	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
> 	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
> 	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
> 	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
> 	at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
> 	at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
> 	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
> 	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
> 	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
> 	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
> 	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
> 	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
> 	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
> 	at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
> 	at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
> 	at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
> 	at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
> 	at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:237)
> 	at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
> {code}
> I'm attaching a small maven project with all the dependencies to make it easier to reproduce locally.
> Trying to isolate the problem further, it seems that the [AvroSchemaConverter|https://git-wip-us.apache.org/repos/asf?p=parquet-mr.git;a=blob;f=parquet-avro/src/main/java/org/apache/parquet/avro/AvroSchemaConverter.java;h=70b6525f6059889889dd5fc321322d045613a8cd;hb=refs/heads/master#l111] converts the Avro schema to just {{required binary value (UTF8);}} (e.g. primitive type). But then the writer will go based on the Avro schema (which is a union) and [tries to call asGroupType() on the primite type|https://git-wip-us.apache.org/repos/asf?p=parquet-mr.git;a=blob;f=parquet-avro/src/main/java/org/apache/parquet/avro/AvroWriteSupport.java;h=460565bb01d27c46d342e635860a81cbf45f7b09;hb=refs/heads/master#l357].



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)