Posted to issues@iceberg.apache.org by GitBox <gi...@apache.org> on 2019/10/31 10:41:48 UTC
[GitHub] [incubator-iceberg] Fokko opened a new pull request #596: Make the convert in AvroSchemaUtil consistent

URL: https://github.com/apache/incubator-iceberg/pull/596
 
 
   While working through the quickstart (http://iceberg.incubator.apache.org/api-quickstart/), I noticed that `AvroSchemaUtil.convert` returns a `Type` instead of a `Schema` when given an Avro schema, so the result cannot be passed to `PartitionSpec.builderFor`:
   
   ```
   C02VF05JHV2T:incubator-iceberg fdriesprong$ spark-shell --jars runtime/build/libs/iceberg-spark-runtime-008d3c4.jar
   Spark context Web UI available at http://10.156.54.10:4040
   Spark context available as 'sc' (master = local[*], app id = local-1572518278358).
   Spark session available as 'spark'.
   Welcome to
         ____              __
        / __/__  ___ _____/ /__
       _\ \/ _ \/ _ `/ __/  '_/
      /___/ .__/\_,_/_/ /_/\_\   version 2.4.4
         /_/
            
   Using Scala version 2.11.12 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_192)
   Type in expressions to have them evaluated.
   Type :help for more information.
   
   scala> import scala.io.Source
   import scala.io.Source
   
   scala> import org.apache.iceberg.shaded.org.apache.avro.Schema.Parser
   import org.apache.iceberg.shaded.org.apache.avro.Schema.Parser
   
   scala> import org.apache.iceberg.avro.AvroSchemaUtil
   import org.apache.iceberg.avro.AvroSchemaUtil
   
   scala> val strSchema = Source.fromFile("/Users/fdriesprong/Desktop/fpa-aws-poc/acid-fs/data/beers.avsc").getLines.mkString
   strSchema: String = {  "type" : "record",  "name" : "topLevelRecord",  "fields" : [ {    "name" : "seq",    "type" : [ "string", "null" ]  }, {    "name" : "abv",    "type" : [ "string", "null" ]  }, {    "name" : "ibu",    "type" : [ "string", "null" ]  }, {    "name" : "id",    "type" : [ "string", "null" ]  }, {    "name" : "name",    "type" : [ "string", "null" ]  }, {    "name" : "style",    "type" : [ "string", "null" ]  }, {    "name" : "brewery_id",    "type" : [ "string", "null" ]  }, {    "name" : "ounces",    "type" : [ "string", "null" ]  } ]}
   
   scala> val avroSchema = new Parser().parse(strSchema)
   avroSchema: org.apache.iceberg.shaded.org.apache.avro.Schema = {"type":"record","name":"topLevelRecord","fields":[{"name":"seq","type":["string","null"]},{"name":"abv","type":["string","null"]},{"name":"ibu","type":["string","null"]},{"name":"id","type":["string","null"]},{"name":"name","type":["string","null"]},{"name":"style","type":["string","null"]},{"name":"brewery_id","type":["string","null"]},{"name":"ounces","type":["string","null"]}]}
   
   scala> val schema = AvroSchemaUtil.convert(avroSchema)
   schema: org.apache.iceberg.types.Type = struct<0: seq: optional string, 1: abv: optional string, 2: ibu: optional string, 3: id: optional string, 4: name: optional string, 5: style: optional string, 6: brewery_id: optional string, 7: ounces: optional string>
   
   scala> import org.apache.iceberg.hive.HiveCatalog
   import org.apache.iceberg.hive.HiveCatalog
   
   scala> val catalog = new HiveCatalog(spark.sparkContext.hadoopConfiguration)
   catalog: org.apache.iceberg.hive.HiveCatalog = org.apache.iceberg.hive.HiveCatalog@6c1d25fa
   
   scala> import org.apache.iceberg.catalog.TableIdentifier
   import org.apache.iceberg.catalog.TableIdentifier
   
   scala> import org.apache.iceberg.PartitionSpec
   import org.apache.iceberg.PartitionSpec
   
   scala> val spec = PartitionSpec.builderFor(schema).build()
   <console>:31: error: type mismatch;
    found   : org.apache.iceberg.types.Type
    required: org.apache.iceberg.Schema
          val spec = PartitionSpec.builderFor(schema).build()
   ```
   
   For consistency, I think all of the `AvroSchemaUtil.convert` methods should return a `Schema`.
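   
   Until that changes, a possible workaround is to wrap the converted struct in a `Schema` by hand. This is a minimal sketch, assuming `Type.asStructType()` and the `Schema(java.util.List[NestedField])` constructor are available in this build; the two-field Avro schema is an illustrative stand-in for `beers.avsc`, not the full file:
   
   ```scala
   import org.apache.iceberg.Schema
   import org.apache.iceberg.PartitionSpec
   import org.apache.iceberg.avro.AvroSchemaUtil
   import org.apache.iceberg.shaded.org.apache.avro.Schema.Parser

   // Illustrative two-field record; the real beers.avsc has eight fields.
   val avroSchema = new Parser().parse(
     """{"type":"record","name":"topLevelRecord","fields":[
       |  {"name":"id","type":["string","null"]},
       |  {"name":"name","type":["string","null"]}
       |]}""".stripMargin)

   // convert(...) currently yields a Type; for an Avro record this is a struct type.
   val structType = AvroSchemaUtil.convert(avroSchema).asStructType()

   // Wrap the struct's fields in an Iceberg Schema (assumed constructor, see above).
   val icebergSchema = new Schema(structType.fields())

   // With a Schema in hand, the PartitionSpec builder accepts the argument.
   val spec = PartitionSpec.builderFor(icebergSchema).build()
   ```
   
   If `convert` returned a `Schema` directly, the two intermediate steps would not be needed.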
