Posted to user@avro.apache.org by Ian Hummel <ia...@themodernlife.net> on 2014/07/02 17:05:48 UTC

How to use java-class with JSON schema?

Hi gang,

I'm trying to build a JSON schema that uses a custom type for a field
instead of just a String.  Is "java-class" supposed to work in that use
case?  I can't seem to make any progress.

Example schema (Foo.avsc):

{
    "namespace" : "com.example",
    "type" : "record",
    "name" : "Foo",
    "fields" : [
        { "name" : "batchId", "type" : "long" },
        { "name" : "timestamp", "type" : "string", "java-class" : "com.example.Timestamp" }
    ]
}

The Timestamp class has a public constructor that takes a single String
argument.  I even tried annotating it with @Stringable.  However, the
generated Java class always uses String, not my custom type.

$ java -jar ~/Downloads/avro-tools-1.7.6.jar compile -string schema src/main/avro/Foo.avsc /tmp/foo

From the generated .java file:

...

  /**
   * All-args constructor.
   */
  public Foo(java.lang.Long batchId, java.lang.String timestamp) {
    this.batchId = batchId;
    this.timestamp = timestamp;
  }

...

Any help appreciated,

- Ian.

Re: How to use java-class with JSON schema?

Posted by Ian Hummel <ia...@themodernlife.net>.
Hit send too fast!

My real question is: is this a supported use case?  Is there no way to make
SpecificData aware of other @Stringable types?  Maybe it could expose some
methods to "register" new @Stringable classes, or even unannotated types
that have a single-String-argument constructor and a toString method?
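The "register" idea above could look roughly like the following stdlib-only Java sketch. `StringableRegistry`, `register` and `isStringable` are hypothetical names, not Avro API; the default set copies the five classes from the `SpecificData` field quoted later in this thread, and `register` checks only the public single-String-constructor half of the convention.

```java
import java.io.File;
import java.math.BigDecimal;
import java.math.BigInteger;
import java.net.URI;
import java.net.URL;
import java.util.HashSet;
import java.util.Set;

// Hypothetical registry (not Avro API): classes treated as "stringable",
// i.e. written via toString() and rebuilt via a public String constructor.
final class StringableRegistry {
  private final Set<Class<?>> classes = new HashSet<>();

  StringableRegistry() {
    // Same defaults as the SpecificData field quoted in this thread.
    for (Class<?> c : new Class<?>[] {
        BigDecimal.class, BigInteger.class, URI.class, URL.class, File.class}) {
      classes.add(c);
    }
  }

  // Accepts c only if it declares a public constructor taking one String.
  void register(Class<?> c) {
    try {
      c.getConstructor(String.class);
    } catch (NoSuchMethodException e) {
      throw new IllegalArgumentException(
          c.getName() + " has no public String constructor", e);
    }
    classes.add(c);
  }

  boolean isStringable(Class<?> c) {
    return classes.contains(c);
  }
}
```

A data model could consult `isStringable` before writing a string field, which is essentially what the `isStringable` override later in this thread does by hand for a single class.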

Cheers,


On Sat, Jul 5, 2014 at 12:07 PM, Ian Hummel <ia...@themodernlife.net> wrote:

Re: How to use java-class with JSON schema?

Posted by Ian Hummel <ia...@themodernlife.net>.
Hi Doug,

Interestingly I was (sort of) able to make this work.  Here's an example
schema that correctly generates a class with a field of type
com.mediamath.data.util.Timestamp (my own Timestamp implementation with a
single String constructor).

{
    "namespace" : "com.mediamath.data.bidder",
    "type" : "record",
    "name" : "Impression",
    "fields" : [
        { "name" : "batchId", "type" : "long" },
        { "name" : "auctionId", "type" : "long" },
        { "name" : "timestamp", "type" : {
            "type" : "string", "java-class" : "com.mediamath.data.util.Timestamp" }
        },
     ...
}

NOTE the subtle difference from the previous attempt: "java-class" is now a
property of the field's string type rather than of the field itself.  This
actually produces the Java class I was hoping for:

public class Impression extends org.apache.avro.specific.SpecificRecordBase
implements org.apache.avro.specific.SpecificRecord {
  public static final org.apache.avro.Schema SCHEMA$ = ...
  @Deprecated public long batchId;
  @Deprecated public long auctionId;
  @Deprecated public com.mediamath.data.util.Timestamp timestamp;
...

Here's my Timestamp class (Scala):

case class Timestamp(s: String) {
  val instant = Timestamp.fromString(s)
  override def toString: String = Timestamp.toString(instant)
}

So the issue I'm running into now is trying to serialize those instances to
a file.  Working in Scala, here's the code I'm using:

val schema = Impression.getClassSchema
val datumWriter = new SpecificDatumWriter(classOf[Impression])
val dataFileWriter = new DataFileWriter(datumWriter)
dataFileWriter.create(schema, new File("target/avro-test.avro"))
dataFileWriter.append(imp)
dataFileWriter.close()

I get an exception:

java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
org.apache.avro.file.DataFileWriter$AppendWriteException: java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:296)
at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:67)
at com.mediamath.mdsw.ImpressionsSpec$$anonfun$1$$anonfun$apply$6.apply(ImpressionsSpec.scala:50)
Caused by: java.lang.ClassCastException: com.mediamath.data.util.Timestamp cannot be cast to java.lang.CharSequence
at org.apache.avro.generic.GenericDatumWriter.writeString(GenericDatumWriter.java:213)
at org.apache.avro.specific.SpecificDatumWriter.writeString(SpecificDatumWriter.java:69)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:76)
at org.apache.avro.generic.GenericDatumWriter.writeField(GenericDatumWriter.java:114)
at org.apache.avro.generic.GenericDatumWriter.writeRecord(GenericDatumWriter.java:104)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:66)
at org.apache.avro.generic.GenericDatumWriter.write(GenericDatumWriter.java:58)
at org.apache.avro.file.DataFileWriter.append(DataFileWriter.java:290)
... 2 more

OK, what if I add @Stringable to Timestamp's constructor?  It still doesn't
work.  The issue is in SpecificData:

protected Set<Class> stringableClasses = new HashSet<Class>();
{
  stringableClasses.add(java.math.BigDecimal.class);
  stringableClasses.add(java.math.BigInteger.class);
  stringableClasses.add(java.net.URI.class);
  stringableClasses.add(java.net.URL.class);
  stringableClasses.add(java.io.File.class);
}

It seems that only a small number of classes are allowed, and there is no
simple way to extend the list.  My workaround is to do something like this
(Scala again):

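import java.io.File
import org.apache.avro.file.DataFileWriter
import org.apache.avro.specific.{SpecificData, SpecificDatumWriter}

// A SpecificData that also treats Timestamp (or a subclass) as stringable
val sd = new SpecificData {
  override def isStringable(c: Class[_]): Boolean = {
    if (classOf[Timestamp].isAssignableFrom(c)) true
    else super.isStringable(c)
  }
}
val schema = Impression.getClassSchema
// The trailing { } creates an anonymous subclass so the protected
// SpecificDatumWriter(SpecificData) constructor can be called
val datumWriter = new SpecificDatumWriter[Impression](sd) { }
val dataFileWriter = new DataFileWriter[Impression](datumWriter)
dataFileWriter.create(schema, new File("target/avro-test.avro"))
dataFileWriter.append(imp)
dataFileWriter.close()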

That works!  And the serialized objects can even be read back from e.g.
Python as a String:

$ python test.py
{... u'publisherTagId': None, u'strategyId': 405963, u'creativeId': 671347, u'timestamp': u'2014-05-13 00:35:00' ...}
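Both the ClassCastException and the workaround come down to one decision on the write path. Below is a simplified Java model of it, inferred from the stack trace (`SpecificDatumWriter.writeString` calling `GenericDatumWriter.writeString`) and the `stringableClasses` snippet. `toWireString` is a hypothetical helper, not Avro source, and the `Predicate` stands in for `SpecificData.isStringable`.

```java
import java.util.function.Predicate;

// Hypothetical, simplified model of writing a datum for a "string" schema
// (not Avro source).
final class StringWriteSketch {
  static CharSequence toWireString(Object datum, Predicate<Class<?>> isStringable) {
    // A non-CharSequence datum is converted only if its class is stringable.
    if (!(datum instanceof CharSequence) && isStringable.test(datum.getClass())) {
      return datum.toString();
    }
    // Otherwise this cast throws ClassCastException, matching the trace above.
    return (CharSequence) datum;
  }
}
```

Passing `c -> false` reproduces the failure for any non-string datum, while a predicate that accepts the datum's class yields its `toString()` form, which is what overriding `isStringable` achieves.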

On Thu, Jul 3, 2014 at 2:14 PM, Doug Cutting <cu...@apache.org> wrote:

Re: How to use java-class with JSON schema?

Posted by Doug Cutting <cu...@apache.org>.
The java-class attribute is supported by the reflect implementation,
not by the code-generating specific implementation.  So you could
define Foo in Java with something like:

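import org.apache.avro.reflect.Stringable;

public class Foo {
  private long batchId;
  @Stringable private Timestamp timestamp;  // written via toString(), read via new Timestamp(String)
  public Foo() {}                           // no-arg constructor used when ReflectData creates instances
  public Foo(long batchId, Timestamp timestamp) { ... }
}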

then use ReflectData to read/write instances.  Note that
java.sql.Timestamp doesn't have a string constructor.  Are you using a
different timestamp class?  If you're defining your own then you could
instead add the @Stringable annotation to your Timestamp class rather
than to each field where it is used.
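To make the string-constructor convention concrete, here is a stdlib-only Java sketch of a Timestamp that satisfies it, plus the two reflective steps a reflect-style reader and writer need: `toString()` on write and the public String constructor on read. The class name mirrors the thread, but the `yyyy-MM-dd HH:mm:ss` format (guessed from the Python output earlier in the thread) and the `StringableRoundTrip` helper are assumptions; this is not ReflectData's implementation, and `@Stringable` is omitted so the example compiles without Avro.

```java
import java.lang.reflect.Constructor;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical Timestamp following the convention: a public String
// constructor plus a toString() whose output that constructor accepts.
// (With Avro on the classpath, @Stringable would go on this class.)
final class Timestamp {
  // Assumed wire format, e.g. "2014-05-13 00:35:00".
  private static final DateTimeFormatter FORMAT =
      DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");
  private final LocalDateTime instant;

  public Timestamp(String s) {
    this.instant = LocalDateTime.parse(s, FORMAT);
  }

  @Override
  public String toString() {
    return instant.format(FORMAT);
  }
}

// Hypothetical helper showing the two halves of a string round trip.
final class StringableRoundTrip {
  // Write side: store the value's string form.
  static String write(Object value) {
    return value.toString();
  }

  // Read side: rebuild the value via its public single-String constructor.
  // Throws NoSuchMethodException for classes such as java.sql.Timestamp.
  static <T> T read(Class<T> c, String s) throws ReflectiveOperationException {
    Constructor<T> ctor = c.getConstructor(String.class);
    return ctor.newInstance(s);
  }
}
```

The round trip only works when `toString()` emits exactly what the constructor parses; a lossy `toString()` would let writing succeed but break reading.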

Reflect-defined schemas can refer to specific-defined classes, but not
vice-versa, since the compiler doesn't use reflection to discover
schemas, but rather always generates from the schema alone.

Doug

On Wed, Jul 2, 2014 at 8:05 AM, Ian Hummel <ia...@themodernlife.net> wrote: