Posted to issues@spark.apache.org by "Taoufik DACHRAOUI (JIRA)" <ji...@apache.org> on 2019/04/13 21:24:00 UTC

[jira] [Commented] (SPARK-27388) expression encoder for avro objects

    [ https://issues.apache.org/jira/browse/SPARK-27388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16817089#comment-16817089 ] 

Taoufik DACHRAOUI commented on SPARK-27388:
-------------------------------------------

We have now also modified JavaTypeInference so that encoders can be created for Avro objects. That gives us two solutions: the one in this PR, and another one that uses Encoders.bean and needs fewer code changes. Which one is better, and how should we proceed? Should we create a separate PR for the Encoders.bean solution?

The test we used for Encoders.bean is as follows:

 
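{code:scala}
import scala.collection.JavaConverters._

import org.apache.spark.sql.{Dataset, Encoders}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.catalyst.encoders.ExpressionEncoder
// Assumes an active SparkSession named `spark`.
import spark.implicits._

// AvroExample1, Magic, Money and Currency are the classes generated from the
// Avro schema in the issue description (namespace org.apache.spark.sql.catalyst.encoders).

// Build the encoder through Encoders.bean and expose it as an ExpressionEncoder.
implicit val avroExampleEncoder: ExpressionEncoder[AvroExample1] =
  Encoders.bean[AvroExample1](classOf[AvroExample1]).asInstanceOf[ExpressionEncoder[AvroExample1]]

val input: AvroExample1 = AvroExample1.newBuilder()
  .setMyarray(List("Foo", "Bar").asJava)
  .setMyboolean(true)
  .setMybytes(java.nio.ByteBuffer.wrap("MyBytes".getBytes()))
  .setMydouble(2.5)
  .setMyfixed(new Magic("magic".getBytes))
  .setMyfloat(25.0F)
  .setMyint(100)
  .setMylong(10L)
  .setMystring("hello")
  .setMymap(Map(
    "foo" -> Integer.valueOf(1),
    "bar" -> Integer.valueOf(2)).asJava)
  .setMymoney(Money.newBuilder().setAmount(100.0F).setCurrency(Currency.EUR).build())
  .build()

// Round trip through the encoder: object -> InternalRow -> object.
val row: InternalRow = avroExampleEncoder.toRow(input)
val output: AvroExample1 = avroExampleEncoder.resolveAndBind().fromRow(row)

// Round trip through a Dataset.
val ds: Dataset[AvroExample1] = List(input).toDS()
println(ds.schema)
println(ds.collect().toList)

// Round trip through the Avro data source.
ds.write.format("avro").save("example1")
val fooDF = spark.read.format("avro").load("example1")
val fooDS = fooDF.as[AvroExample1]
println(fooDS.collect().toList)
{code}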
The change is in the branch [https://github.com/mazeboard/spark/tree/bean-encoder], *with 105 additions and 39 deletions (excluding test code)*.
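
For context on why a bean-based approach needs changes at all, here is a minimal sketch of how prefix-less accessors look to standard JavaBeans introspection. It assumes that bean-style inference discovers properties through java.beans.Introspector. MoneyBean and FixedLike are hypothetical stand-ins written for this illustration, not the Avro-generated classes; FixedLike only mimics the accessor style without a get/set prefix that the issue description mentions for Avro Fixed types.

```scala
import java.beans.Introspector

// Hypothetical conventional bean: accessors follow the get/set naming convention.
class MoneyBean {
  private var amount: Float = 0.0f
  def getAmount: Float = amount
  def setAmount(value: Float): Unit = { amount = value }
}

// Hypothetical class whose accessors carry no get/set prefix.
class FixedLike {
  private var data: Array[Byte] = Array.emptyByteArray
  def bytes(): Array[Byte] = data
  def bytes(value: Array[Byte]): Unit = { data = value }
}

object BeanPropertyDemo {
  // Names of the properties that JavaBeans introspection finds on a class.
  // Passing classOf[Object] as the stop class leaves out the inherited "class" property.
  def beanProperties(cls: Class[_]): Set[String] =
    Introspector.getBeanInfo(cls, classOf[Object])
      .getPropertyDescriptors
      .map(_.getName)
      .toSet

  def main(args: Array[String]): Unit = {
    println(beanProperties(classOf[MoneyBean]))
    println(beanProperties(classOf[FixedLike]))
  }
}
```

On a standard JDK this should report the single property amount for MoneyBean and no properties for FixedLike, which is why accessors without the prefix need separate handling.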

> expression encoder for avro objects
> -----------------------------------
>
>                 Key: SPARK-27388
>                 URL: https://issues.apache.org/jira/browse/SPARK-27388
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 2.4.1
>            Reporter: Taoufik DACHRAOUI
>            Priority: Major
>
> *What changes were proposed in this pull request?*
> The PR adds support for bean objects, java.util.List, java.util.Map, java.nio.ByteBuffer and Java enums to ScalaReflection. Unlike the existing javaBean encoder, properties can be named without the set/get prefix. This is one of the key points that allows the encoding of Avro Fixed types; I believe the other key point is that the addition must be in ScalaReflection.
> Reminder of Avro types:
>  * primitive types: null, boolean, int, long, float, double, bytes, string
>  * complex types: Records, Enums, Arrays, Maps, Unions, Fixed
> This PR supports simple unions (one null type and one non-null type) but not complex unions. The reason is that the Avro compiler generates Java code with type Object for every complex union, whereas a field with a simple union is typed as the non-null type of the union.
>  
> *How was this patch tested?*
> Currently only one encodeDecodeTest has been added to ExpressionEncoderSuite; the test uses the following Avro schema:
> {code:java}
> {"namespace": "org.apache.spark.sql.catalyst.encoders", "type": "record", "name": "AvroExample1",
>  "fields": [
>  {"name":"mymoney","type":["null",{"type":"record","name":"Money","namespace":"org.apache.spark.sql.catalyst.encoders","fields":[
>  {"name":"amount","type":"float","default":0},
>  {"name":"currency","type":{"type":"enum","name":"Currency","symbols":["EUR","USD","BRL"]},"default":"EUR"}]}], "default":null},
>  {"name": "myfloat", "type": "float"},
>  {"name": "mylong", "type": "long"},
>  {"name": "myint", "type": "int"},
>  {"name": "mydouble", "type": "double"},
>  {"name": "myboolean", "type": "boolean"},
>  {"name": "mystring", "type": "string"},
>  {"name": "mybytes", "type": "bytes"},
>  {"name": "myfixed", "type": {"type": "fixed", "name": "Magic", "size": 4}},
>  {"name": "myarray", "type": {"type": "array", "items": "string"}},
>  {"name": "mymap", "type": {"type": "map", "values": "int"}}
>  ] }{code}
>  
>  
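
The distinction between simple and complex unions drawn in the issue description can be sketched as follows. This is an illustration only, not code from the PR: UnionSketch and its methods are hypothetical names, union branches are represented by their Avro type names, and anything that is not exactly one null branch plus one non-null branch is treated as complex.

```scala
object UnionSketch {
  sealed trait UnionKind
  case object SimpleUnion extends UnionKind
  case object ComplexUnion extends UnionKind

  // A simple union has exactly two branches, one of which is "null".
  // In this sketch every other shape counts as complex.
  def classify(branches: Seq[String]): UnionKind =
    if (branches.size == 2 && branches.count(_ == "null") == 1) SimpleUnion
    else ComplexUnion

  // Field type as described in the issue: the non-null branch for a simple
  // union, Object for a complex one.
  def generatedFieldType(branches: Seq[String]): String =
    classify(branches) match {
      case SimpleUnion  => branches.filterNot(_ == "null").head
      case ComplexUnion => "Object"
    }

  def main(args: Array[String]): Unit = {
    // The mymoney field of the test schema: ["null", Money]
    println(generatedFieldType(Seq("null", "Money")))
    // A union with more than one non-null branch
    println(generatedFieldType(Seq("null", "int", "string")))
  }
}
```

Under these assumptions the mymoney field of the test schema falls in the supported case, while a union such as ["null", "int", "string"] would surface as Object and is not covered.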


