Posted to user@avro.apache.org by Saptarshi Guha <sg...@mozilla.com> on 2012/06/25 08:17:12 UTC

C/C++ parsing vs. Java parsing.

I have an Avro schema, found here: http://sguha.pastebin.mozilla.org/1677671

I tried

java -jar avro-tools-1.7.0.jar  compile schema ~/tmp/robject.avro foo

and it worked.

This failed:

avrogencpp --input ~/tmp/robject.avro --output ~/tmp/h2
Segmentation fault: 11


This failed:

 avro_schema_t *person_schema = (avro_schema_t*)malloc(sizeof(avro_schema_t));
 (avro_schema_from_json_literal(string.of.avro.file, person_schema))

with

Error was Error parsing JSON: string or '}' expected near end of file

Q1: Do the C and C++ APIs support all the schemas that the Java one supports?
Q2: If the answer to Q1 is yes, is this a bug?

Regards
Saptarshi

Re: C/C++ parsing vs. Java parsing.

Posted by Douglas Creager <do...@creagertino.net>.
> 3. C
> 
> avro_schema_t *person_schema = (avro_schema_t*)malloc(sizeof(avro_schema_t));
> (avro_schema_from_json_literal(jsonstring, person_schema)) 
> 
> returns:
> 
> Error was Error parsing JSON: string or '}' expected near end of file
> 
> So is this a bug, or am I calling it wrong?

That error message is from the JSON parser we use internally — it claims that there's a syntax error in the JSON that you've passed in.  Can you send us the snippet where you define jsonstring?  It might be an issue of escaping things correctly in the C string literal.  Also, there's a comment where avro_schema_from_json_literal is defined, saying that jsonstring must be defined as a "char[]" and not a "char *".  And of course it could also be an actual syntax error.  :-)

–doug


Re: C/C++ parsing vs. Java parsing.

Posted by Saptarshi Guha <sg...@mozilla.com>.
I should mention, 

a) I need Java and C, because the messages will be consumed by both Java and C.
b) I'd rather stay away from C++ because of the Boost dependency. I have nothing against it;
it just becomes another installation hurdle.
c) I need to check with other languages, e.g. Python, since I am looking forward to language interop.

Thanks again
Saptarshi


----- Original Message -----
From: "Saptarshi Guha" <sg...@mozilla.com>
To: user@avro.apache.org
Sent: Monday, June 25, 2012 10:27:45 PM
Subject: Re: C/C++ parsing vs. Java parsing.

Hi Scott,

Thanks for the response. I changed the avro file to [1]

1. Java works.
2. avrogencpp

avrogencpp  -i ~/tmp/robject.avro -o foo

works.

3. C

 avro_schema_t *person_schema = (avro_schema_t*)malloc(sizeof(avro_schema_t));
 (avro_schema_from_json_literal(jsonstring, person_schema)) 

returns:

Error was Error parsing JSON: string or '}' expected near end of file

So is this a bug, or am I calling it wrong?


Ideally, I would like a union of

["NULL","RAW","INTEGER","REAL","COMPLEX","LOGICAL","STRING","LIST"]

Each of these would be a record with (1) a value of some type (it might be an array of integers, though for COMPLEX it is an array of records)
and (2) another field called Attributes.

e.g.
[
  {"type":"record",
   "name":"REAL",
   "fields":[
      {"name":"whattype", "type":"myrtype"},
      {"name":"value", "type":{"type":"array", "items":"double"}},
      {"name":"attrs"  ,  "type":"attrytpe"}
    ]
  },
  {"type":"record",
   "name":"INTEGER",
   "fields":[
      {"name":"whattype", "type":"myrtype"},
      {"name":"value", "type":{"type":"array", "items":"int"}},
      {"name":"attrs"  ,  "type":"attrytpe"}
    ]
  }
,...
]

Here 'attrytpe' is a map type defined elsewhere and "myrtype" is an enum defined elsewhere.
Similarly, for the COMPLEX record in the union, would its 'value' field be an array of a "complex" type defined elsewhere?
Would I need multiple Avro files using the same namespace?

Or is this, once serialized, equivalent to what I had before [1]?

Thanks for your time
Saptarshi


[1]
{
    "namespace": "robjects.avro",
    "type": "record",
    "name": "robject",
    "doc" : "Encoding of some of the R data types",
    "fields": [
	
	{"name":"typeof"     ,"type":{"type":"enum", "name":"thetype" ,"symbols": ["NULL","RAW","INTEGER","REAL","COMPLEX","LOGICAL","STRING","LIST","ATTRIBUTES"]}},
	{"name":"NAtype"     ,"type":{"type":"enum" , "name":"NA" ,"symbols":["NA"]}},
	{"name":"complextype","type":{"type":"record" , "name":"complex", "fields":[
	    {"name":"re", "type":"double"},
	    {"name":"im", "type":"double"}
	]}},
	{"name":"NULL"       ,"type":"null"},
	{"name":"RAW"        ,"type":["null",{"type":"array" ,"items":"bytes"}]},
	{"name":"INTEGER"    ,"type":["null",{"type":"array" ,"items":"int"}]},
	{"name":"REAL"       ,"type":["null",{"type":"array" ,"items":"double"}]},
	{"name":"COMPLEX"    ,"type":["null",{"type":"array" ,"items":"complex"}]},
	{"name":"LOGICAL"    ,"type":["null",{"type":"array" ,"items":["boolean","NA"]}]},
	{"name":"STRING"     ,"type":["null",{"type":"array" ,"items":["string","NA"]}]},
	{"name":"LIST"       ,"type":["null",{"type":"array" ,"items":["robject"]}]},
	{"name":"ATTRIBUTES" ,"type":["null",{"type":"map"   ,"values":"robject"}]}
    ]
}


----- Original Message -----
From: "Scott Carey" <sc...@apache.org>
To: user@avro.apache.org, "Saptarshi Guha" <jo...@mozilla.com>
Sent: Monday, June 25, 2012 9:42:27 PM
Subject: Re: C/C++ parsing vs. Java parsing.

The schema provided is a union of several schemas.  Java supports parsing
this; C++ may not.  Does it work if you make it one single schema, and
nest "NA", "acomplex" and "retypes" inside of "object"?  Each named type
only needs to be defined the first time it is referenced.  If it still does
not work, then it is certainly a bug.

Either way I would file a bug in JIRA.  The spec does not say whether a
file should be parseable if it contains a union rather than a record, but
it probably should be.

-Scott



