Posted to dev@parquet.apache.org by "Felix Kizhakkel Jose (Jira)" <ji...@apache.org> on 2019/11/05 15:52:00 UTC
[jira] [Commented] (PARQUET-1679) Invalid SchemaException for UUID while using AvroParquetWriter

    [ https://issues.apache.org/jira/browse/PARQUET-1679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16967620#comment-16967620 ] 

Felix Kizhakkel Jose commented on PARQUET-1679:
-----------------------------------------------

Hi [~q.xu],
Thank you so much. Do you know whether there is a converter (a Parquet converter) that I could use instead of AvroConverter? I couldn't find one. Could you please provide some insight?

> Invalid SchemaException for UUID while using AvroParquetWriter
> --------------------------------------------------------------
>
>                 Key: PARQUET-1679
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1679
>             Project: Parquet
>          Issue Type: Bug
>          Components: parquet-avro
>    Affects Versions: 1.10.1
>            Reporter: Felix Kizhakkel Jose
>            Priority: Major
>
> Hi,
> I am getting org.apache.parquet.schema.InvalidSchemaException: Cannot write a schema with an empty group: optional group id {} when I include a UUID field in my POJO. Without the UUID field everything works fine. I have seen that Parquet supports UUID as part of [#PR-71] in the 2.4 release.
>  But I am still getting InvalidSchemaException on UUID. Is there anything that I am missing, or is it a known issue?
> *My setup details:*
> *gradle dependency :*
> dependencies {
>     compile group: 'org.springframework.boot', name: 'spring-boot-starter'
>     compile group: 'org.projectlombok', name: 'lombok', version: '1.16.6'
>     compile group: 'com.amazonaws', name: 'aws-java-sdk-bundle', version: '1.11.271'
>     compile group: 'org.apache.parquet', name: 'parquet-avro', version: '1.10.1'
>     compile group: 'org.apache.hadoop', name: 'hadoop-common', version: '3.1.1'
>     compile group: 'org.apache.hadoop', name: 'hadoop-aws', version: '3.1.1'
>     compile group: 'org.apache.hadoop', name: 'hadoop-client', version: '3.1.1'
>     compile group: 'joda-time', name: 'joda-time'
>     compile group: 'com.fasterxml.jackson.core', name: 'jackson-databind', version: '2.6.5'
>     compile group: 'com.fasterxml.jackson.datatype', name: 'jackson-datatype-joda', version: '2.6.5'
> }
> *Model used:*
> @Data
> public class Employee {
>     private UUID id;
>     private String name;
>     private int age;
>     private Address address;
> }
>
> @Data
> public class Address {
>     private String streetName;
>     private String city;
>     private Zip zip;
> }
>
> @Data
> public class Zip {
>     private int zip;
>     private int ext;
> }
>  
> *My Serializer Code:*
> public void serialize(List<D> inputDataToSerialize, CompressionCodecName compressionCodecName) throws IOException {
>     Path path = new Path("s3a://parquetpoc/data_" + compressionCodecName + ".parquet");
>     // Derive the Avro schema from the runtime class of the first element
>     Class<?> clazz = inputDataToSerialize.get(0).getClass();
>     try (ParquetWriter<D> writer = AvroParquetWriter.<D>builder(path)
>             .withSchema(ReflectData.AllowNull.get().getSchema(clazz)) // generate nullable fields
>             .withDataModel(ReflectData.get())
>             .withConf(parquetConfiguration)
>             .withCompressionCodec(compressionCodecName)
>             .withWriteMode(OVERWRITE)
>             .withWriterVersion(ParquetProperties.WriterVersion.PARQUET_2_0)
>             .build()) {
>         for (D input : inputDataToSerialize) {
>             writer.write(input);
>         }
>     }
> }
> private List<Employee> getInputDataToSerialize() {
>     Address address = new Address();
>     address.setStreetName("Murry Ridge Dr");
>     address.setCity("Murrysville");
>     Zip zip = new Zip();
>     zip.setZip(15668);
>     zip.setExt(1234);
>     address.setZip(zip);
>     List<Employee> employees = new ArrayList<>();
>     IntStream.range(0, 100000).forEach(i -> {
>         Employee employee = new Employee();
>         // employee.setId(UUID.randomUUID());
>         employee.setAge(20);
>         employee.setName("Test" + i);
>         employee.setAddress(address);
>         employees.add(employee);
>     });
>     return employees;
> }
> Where generic type D is Employee
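
One possible workaround for the error quoted above, not confirmed anywhere in this thread, is to keep the identifier as text in the model so that a reflection-based schema sees a plain string field rather than a nested UUID object. The sketch below uses only java.util.UUID. The class name EmployeeWithStringId and its accessors are hypothetical, and it assumes the empty group comes from how the UUID class is reflected, which may not be the whole story.

```java
import java.util.UUID;

/** Hypothetical variant of the Employee model: the UUID is held as text. */
class EmployeeWithStringId {
    private String id;   // canonical 36-character UUID string
    private String name;
    private int age;

    // No-arg constructor, which reflection-based readers typically require
    EmployeeWithStringId() { }

    EmployeeWithStringId(UUID id, String name, int age) {
        this.id = id.toString();   // store the text form instead of the UUID object
        this.name = name;
        this.age = age;
    }

    String getId() { return id; }

    // Rebuild the UUID on demand for callers that need the typed value
    UUID getUuid() { return UUID.fromString(id); }

    String getName() { return name; }

    int getAge() { return age; }
}
```

The trade-off is that the value would be stored as a string rather than as a 16-byte UUID, so it costs more space and loses the logical type in the written file.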


