Posted to user@avro.apache.org by Mehrez Alachheb <me...@adomik.com> on 2015/08/06 22:30:27 UTC

Deserialize with different schema

Hi,

I am working on a project in which I have to deserialize Avro files
provided by an external company.
The problem is that the schema of the serialized Avro files (example below)
doesn't contain a namespace, but I need the schema to have one.
I can't re-serialize the Avro files with another schema because we get them
from that company.
I created a new schema (example below) with a namespace and generated the
associated Java classes.

How can I deserialize the Avro files with my new schema?

Thanks,
Mehrez.

Company schema:
{"type":"record", "name":"AvroData", "fields": [.....] }

The schema that I want:
{"type":"record", "name":"AvroData", "namespace":"company.avro",
 "fields": [.....] }

Re: Deserialize with different schema

Posted by Mehrez Alachheb <me...@adomik.com>.
Thanks Julian for your reply.
Yes, with GenericRecord I can deserialize my data easily.

Mehrez.
 
> On 13 Aug 2015, at 10:49, julianpeeters <ju...@gmail.com> wrote:
> 


Re: Deserialize with different schema

Posted by julianpeeters <ju...@gmail.com>.
Hi Mehrez,

Can I guess? You're reading some Python/Pig AvroStorage output? Hate that.

I get the same error when the reader schema has a namespace but the writer
has none. But only when a record is in a union.


Here's a pair of small runnable examples that show errors with reading and
writing across namespaces:
https://github.com/julianpeeters/avro-namespace-issues/tree/master/reading

For the sake of completeness, here's my question:
http://apache-avro.679487.n3.nabble.com/Issues-reading-and-writing-namespace-less-schemas-from-namespaced-Specific-Records-td4032092.html
It looks like Vitaly Gordon ran into this issue as well:
http://apache-avro.679487.n3.nabble.com/Unable-to-compile-a-namespace-less-schema-td4028318.html

IMHO this is a bug that hinders Avro's utility as a data interchange format.
I don't think the technical issue is in importing a class from the default
package (which succeeds outside of unions). Instead, it comes from resolving
a union reflectively when the writer schema's fullname doesn't match the
class's fullname.
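
To make the fullname mismatch concrete, here is a minimal sketch in plain Java. It is not Avro's actual resolution code: `fullName` follows the Avro rule that a fullname is the namespace plus the name, and `matchBranch` is a simplified, hypothetical model of picking a union branch by fullname. The class and method names are made up for illustration.

```java
import java.util.Arrays;
import java.util.List;

class FullNameSketch {
    // Avro fullname: "namespace.name" when a namespace is set, otherwise just "name".
    static String fullName(String namespace, String name) {
        return (namespace == null || namespace.isEmpty()) ? name : namespace + "." + name;
    }

    // Simplified model (an assumption, not Avro's code): return the index of the
    // first reader union branch whose fullname equals the writer's fullname,
    // or -1 when no branch matches.
    static int matchBranch(List<String> readerBranchFullNames, String writerFullName) {
        return readerBranchFullNames.indexOf(writerFullName);
    }

    public static void main(String[] args) {
        String writer = fullName(null, "AvroData");            // AvroData
        String reader = fullName("company.avro", "AvroData");  // company.avro.AvroData
        System.out.println(writer + " vs " + reader);
        // Reader union ["null", company.avro.AvroData]: the writer's name matches no branch.
        System.out.println(matchBranch(Arrays.asList("null", reader), writer)); // -1
        // With identical fullnames the record branch is found.
        System.out.println(matchBranch(Arrays.asList("null", writer), writer)); // 1
    }
}
```

Under this model, a namespaced reader record inside a union never lines up with the namespace-less writer record, which is consistent with the error only showing up when the record is in a union.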

The fix for now:
You could try using the Generic API instead, and then map the Generic
Records to your Specific Records manually. Here's a start in Java:

        import java.io.File;
        import org.apache.avro.Schema;
        import org.apache.avro.file.DataFileReader;
        import org.apache.avro.generic.GenericDatumReader;
        import org.apache.avro.generic.GenericRecord;

        // `schema` (an org.apache.avro.Schema) and `file` (a java.io.File
        // pointing at the Avro data file) are assumed to be defined already.
        GenericDatumReader<GenericRecord> datumReader =
            new GenericDatumReader<>(schema);
        DataFileReader<GenericRecord> fileReader =
            new DataFileReader<>(file, datumReader);
        GenericRecord record = fileReader.next();
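
For the manual mapping step, a rough end-to-end sketch might look like the following, assuming the Avro Java library is on the classpath. The fields `id` and `label` are hypothetical (the real field list is not shown in this thread), and `TypedAvroData` is only a stand-in for the generated, namespaced class. In a real project you would populate the generated class through its setters or builder instead.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

class GenericToTypedSketch {
    // Hypothetical stand-in for the namespace-less schema of the incoming files.
    static final Schema WRITER_SCHEMA = new Schema.Parser().parse(
        "{\"type\":\"record\",\"name\":\"AvroData\",\"fields\":["
        + "{\"name\":\"id\",\"type\":\"long\"},"
        + "{\"name\":\"label\",\"type\":\"string\"}]}");

    // Hypothetical stand-in for a generated class such as company.avro.AvroData.
    static final class TypedAvroData {
        final long id;
        final String label;
        TypedAvroData(long id, String label) { this.id = id; this.label = label; }
    }

    // Manual mapping: copy each field out of the generic record by name.
    static TypedAvroData fromGeneric(GenericRecord r) {
        // Avro strings typically arrive as org.apache.avro.util.Utf8, hence toString().
        return new TypedAvroData((Long) r.get("id"), r.get("label").toString());
    }

    // Read every record using the schema embedded in the file, then map it.
    static List<TypedAvroData> readAll(File file) throws IOException {
        List<TypedAvroData> out = new ArrayList<>();
        try (DataFileReader<GenericRecord> reader =
                 new DataFileReader<>(file, new GenericDatumReader<GenericRecord>())) {
            for (GenericRecord r : reader) {
                out.add(fromGeneric(r));
            }
        }
        return out;
    }

    // Demo helper only: writes one record with the namespace-less schema.
    static File writeSample() throws IOException {
        File file = File.createTempFile("avrodata", ".avro");
        file.deleteOnExit();
        GenericRecord rec = new GenericData.Record(WRITER_SCHEMA);
        rec.put("id", 42L);
        rec.put("label", "hello");
        try (DataFileWriter<GenericRecord> writer =
                 new DataFileWriter<>(new GenericDatumWriter<GenericRecord>(WRITER_SCHEMA))) {
            writer.create(WRITER_SCHEMA, file);
            writer.append(rec);
        }
        return file;
    }

    public static void main(String[] args) throws IOException {
        for (TypedAvroData d : readAll(writeSample())) {
            System.out.println(d.id + " " + d.label);
        }
    }
}
```

Because the generic reader never looks up a Java class by fullname, the missing namespace in the file's schema does not get in the way here. The cost is that the field-by-field copy has to be kept in sync with the schema by hand.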


Cheers,
Julian




Re: Deserialize with different schema

Posted by Mehrez Alachheb <me...@adomik.com>.
Thanks Sam for your help.
When I specify a reader schema like this (in Scala):

// DataAvroPacket.getClassSchema(): schema of the generated class (with namespace)
// serialization.Schemas.sw: the original schema of the Avro file

  import java.io.File
  import org.apache.avro.file.DataFileReader
  import org.apache.avro.specific.SpecificData

  val specificD = new SpecificData
  val datumReader = specificD.createDatumReader(serialization.Schemas.sw, DataAvroPacket.getClassSchema())
  val fileReader = new DataFileReader(new File("/tmp/test.avro"), datumReader)
  while (fileReader.hasNext()) { val user = fileReader.next() }

I get an AvroTypeException:

org.apache.avro.AvroTypeException: Found MainPacket, expecting union
	at org.apache.avro.io.ResolvingDecoder.doAction(ResolvingDecoder.java:292)
	at org.apache.avro.io.parsing.Parser.advance(Parser.java:88)
	at org.apache.avro.io.ResolvingDecoder.readIndex(ResolvingDecoder.java:267)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:155)
	at org.apache.avro.generic.GenericDatumReader.readField(GenericDatumReader.java:193)
	at org.apache.avro.generic.GenericDatumReader.readRecord(GenericDatumReader.java:183)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:151)
	at org.apache.avro.generic.GenericDatumReader.read(GenericDatumReader.java:142)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:233)
	at org.apache.avro.file.DataFileStream.next(DataFileStream.java:220)
	at .<init>(<console>:19)
	at .<clinit>(<console>)
	at .<init>(<console>:7)
	at .<clinit>(<console>)
	at $print(<console>)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at scala.tools.nsc.interpreter.IMain$ReadEvalPrint.call(IMain.scala:734)
	at scala.tools.nsc.interpreter.IMain$Request.loadAndRun(IMain.scala:983)
	at scala.tools.nsc.interpreter.IMain.loadAndRunReq$1(IMain.scala:573)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:604)
	at scala.tools.nsc.interpreter.IMain.interpret(IMain.scala:568)
	at scala.tools.nsc.interpreter.ILoop.reallyInterpret$1(ILoop.scala:760)
	at scala.tools.nsc.interpreter.ILoop.interpretStartingWith(ILoop.scala:805)
	at scala.tools.nsc.interpreter.ILoop.command(ILoop.scala:717)
	at scala.tools.nsc.interpreter.ILoop.processLine$1(ILoop.scala:581)
	at scala.tools.nsc.interpreter.ILoop.innerLoop$1(ILoop.scala:588)
	at scala.tools.nsc.interpreter.ILoop.loop(ILoop.scala:591)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply$mcZ$sp(ILoop.scala:882)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.interpreter.ILoop$$anonfun$process$1.apply(ILoop.scala:837)
	at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
	at scala.tools.nsc.interpreter.ILoop.process(ILoop.scala:837)
	at scala.tools.nsc.interpreter.ILoop.main(ILoop.scala:904)
	at xsbt.ConsoleInterface.run(ConsoleInterface.scala:62)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:497)
	at sbt.compiler.AnalyzingCompiler.call(AnalyzingCompiler.scala:101)
	at sbt.compiler.AnalyzingCompiler.console(AnalyzingCompiler.scala:76)
	at sbt.Console.sbt$Console$$console0$1(Console.scala:22)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply$mcV$sp(Console.scala:23)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
	at sbt.Console$$anonfun$apply$2$$anonfun$apply$1.apply(Console.scala:23)
	at sbt.Logger$$anon$4.apply(Logger.scala:85)
	at sbt.TrapExit$App.run(TrapExit.scala:248)
	at java.lang.Thread.run(Thread.java:745)

> On 07 Aug 2015, at 00:02, Sam Groth <sg...@yahoo-inc.com> wrote:
> 


Re: Deserialize with different schema

Posted by Sam Groth <sg...@yahoo-inc.com>.
You should be able to specify a reader schema with the namespace and the writer schema without it. See https://avro.apache.org/docs/1.7.7/api/java/org/apache/avro/specific/SpecificData.html#createDatumReader(org.apache.avro.Schema, org.apache.avro.Schema)

Sam 

