Posted to user@avro.apache.org by Check Peck <co...@gmail.com> on 2015/05/18 21:54:50 UTC

Not able to load avro schema fully with all its contents

I am working with Apache Avro in C++ and I am trying to load an Avro schema
using the Avro C++ library. Everything works fine, except for one problem:
my schema contains a few "doc" fields, and they do not show up at all when I
load the schema and print it out.

    DataSchema_ptr schema_data(new DataSchema());
    schema_data->m_schema = load(avro_schema_file_name.c_str());
    const avro::NodePtr node_data_ptr = schema_data->m_schema.root();
    if (node_data_ptr && node_data_ptr->hasName())
    {
        // is there any problem with this node_data_ptr usage here?
        schema_data->m_name = node_data_ptr->name().fullname().c_str();

        // this line prints out whole AVRO but it doesn't have doc which is
        // there in my AVRO
        cout<<"File String : " << schema_data->to_string() << endl;
    }

Here "m_schema" is "avro::ValidSchema m_schema;"

Can anyone help me with this? In general, the doc fields I have in my
Avro schema are not shown when I print it out.

Re: Not able to load avro schema fully with all its contents

Posted by Svante Karlsson <sv...@csi.se>.
I think the general idea is that if you store your schema in some kind of
schema registry, you are not supposed to get back exactly what you entered,
but something that is equivalent. The doc field is certainly something that
is not supposed to go into a normalized schema.

http://blogs.impetus.com/big_data/big_data_technologies/AVRO.do

http://grokbase.com/p/avro/user/133kfp4c6n/parsing-canonical-form-of-protocol-definitions
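
As a rough illustration of the normalization idea: the "Parsing Canonical Form" section of the Avro specification has a [STRIP] rule that keeps only the attributes relevant to parsing data (type, name, fields, symbols, items, values, size) and drops the rest, including doc. The sketch below applies that rule to a toy model in which a schema object is a flat map of attribute names to raw value text. The names `AttributeMap`, `canonicalAttributeNames` and `stripToCanonical` are made up for this example and are not part of the Avro C++ library, and the output of `ValidSchema::toJson` is not claimed to be the canonical form.

```cpp
#include <map>
#include <set>
#include <string>

// Attribute names kept by the [STRIP] rule of "Parsing Canonical Form" in the
// Avro specification; every other attribute (doc, aliases, ...) is dropped.
inline const std::set<std::string>& canonicalAttributeNames() {
    static const std::set<std::string> names = {
        "type", "name", "fields", "symbols", "items", "values", "size"};
    return names;
}

// Toy model (an assumption for this sketch): one schema object represented as
// a flat map from attribute name to the raw JSON text of its value.
using AttributeMap = std::map<std::string, std::string>;

// Returns a copy of `attributes` containing only the canonical attributes.
AttributeMap stripToCanonical(const AttributeMap& attributes) {
    AttributeMap kept;
    for (const auto& entry : attributes) {
        if (canonicalAttributeNames().count(entry.first) != 0) {
            kept.insert(entry);
        }
    }
    return kept;
}
```

Under this rule, two schemas that differ only in their doc text reduce to the same attribute set, which is why a store that normalizes schemas would not hand the doc back.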






RE: Not able to load avro schema fully with all its contents

Posted by Pierre de Frém <th...@hotmail.com>.
Hello,
I posted the patch for the trunk branch of the git there (for it to be reviewed):
https://issues.apache.org/jira/browse/AVRO-1256
Pierre


RE: Not able to load avro schema fully with all its contents

Posted by Pierre de Frém <th...@hotmail.com>.
Hello,

Sam is right in his previous answer. More precisely, the field "doc" is read by the Compiler, but at the moment it is not stored in the Node object. The reason might be that the field "doc" is optional in the Avro specification (see: https://avro.apache.org/docs/1.7.7/spec.html, Complex types).

If you want to store the field doc, you'll have to modify the source code yourself to:
- create a new member "doc" in the Node API (Node.hh),
- store the doc field in Node as it is read by the Compiler (Compiler.cc),
- serialize the field doc in NodeImpl.cc.

I did a patch for my own use where I store and read the "doc" fields for a NodeRecord, and I serialize the doc field for the root Node of a NodeRecord.

You can find it at:
the corresponding branch (created for the patch):
https://github.com/pidefrem/avro/tree/branch-1.7-specificrecord

the corresponding commit for the field doc:
https://github.com/pidefrem/avro/commit/795a0805b8ea8d3228bd92a483c9cbb405e11a62

Note: if you want to serialize the doc fields of all NodeRecords, just change line 195 of NodeImpl.cc from

    if (depth == 1 && getDoc().size()) {

to

    if (getDoc().size()) {

(Maybe my patch could be added in the trunk of the source code if it is useful?)

Hope this helps.

Pierre
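
To make the three steps and the `depth == 1` condition more concrete, here is a small self-contained toy model. It is only a sketch of the idea and is not the code of the patch or of the Avro C++ library: `ToyRecordNode`, `printJson`, `toJson` and the `rootOnly` flag are all invented for this illustration, and string escaping is deliberately left out.

```cpp
#include <cstddef>
#include <memory>
#include <ostream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Toy stand-in for a record node; NOT the real avro::Node class.
struct ToyRecordNode {
    std::string name;
    std::string doc;  // step 1: a new member that holds the "doc" text
    // Step 2 would fill `doc` while parsing; here callers set it by hand.
    std::vector<std::pair<std::string, std::shared_ptr<ToyRecordNode>>> fields;
};

// Step 3: serialize the node. With rootOnly == true, "doc" is written only at
// depth 1, mirroring the `depth == 1 && getDoc().size()` condition; with
// rootOnly == false it is written for every record that has one.
// Note: values are written without JSON escaping to keep the sketch short.
void printJson(const ToyRecordNode& node, std::ostream& os, int depth, bool rootOnly) {
    os << "{\"type\":\"record\",\"name\":\"" << node.name << "\"";
    if ((!rootOnly || depth == 1) && !node.doc.empty()) {
        os << ",\"doc\":\"" << node.doc << "\"";
    }
    os << ",\"fields\":[";
    for (std::size_t i = 0; i < node.fields.size(); ++i) {
        if (i != 0) {
            os << ",";
        }
        os << "{\"name\":\"" << node.fields[i].first << "\",\"type\":";
        printJson(*node.fields[i].second, os, depth + 1, rootOnly);
        os << "}";
    }
    os << "]}";
}

// Convenience wrapper that starts the traversal at depth 1 (the root).
std::string toJson(const ToyRecordNode& root, bool rootOnly) {
    std::ostringstream os;
    printJson(root, os, 1, rootOnly);
    return os.str();
}
```

Calling `toJson(root, true)` on a record that contains a nested record emits only the root's doc, while `toJson(root, false)` emits the doc of every record, which corresponds to dropping the depth check.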


Re: Not able to load avro schema fully with all its contents

Posted by Sam Groth <sg...@yahoo-inc.com>.
Just a guess, but I would assume that the schema object only stores fields that it cares about. This would exclude your docs. If you want to know for sure, the source code is here: https://github.com/apache/avro/tree/trunk/lang/c%2B%2B 

Sam



Re: Not able to load avro schema fully with all its contents

Posted by Check Peck <co...@gmail.com>.
Can anyone help me with this?


Re: Not able to load avro schema fully with all its contents

Posted by Check Peck <co...@gmail.com>.
Does anyone have any idea why it is behaving like this?


Re: Not able to load avro schema fully with all its contents

Posted by Check Peck <co...@gmail.com>.
And this is my to_string method I forgot to provide.

std::string DataSchema::to_string() const
{
    ostringstream os;
    if (valid())
    {
        os << "JSON data: ";
        m_schema.toJson(os);
    }
    return os.str();
}
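
Since the JSON produced from the parsed schema object is where the doc text goes missing, one possible workaround (a sketch only, not something confirmed by the library documentation) is to keep the original schema text next to the parsed schema and print that text when the doc fields matter. The helper below uses only the standard library; `RawSchemaText` and `readSchemaText` are hypothetical names introduced for this example.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Hypothetical holder for the schema exactly as it was written, "doc" included.
struct RawSchemaText {
    std::string text;
};

// Reads the whole stream (for example an opened .avsc file) into memory
// without interpreting it, so nothing is dropped.
RawSchemaText readSchemaText(std::istream& in) {
    std::ostringstream buffer;
    buffer << in.rdbuf();
    return RawSchemaText{buffer.str()};
}

// Assumption: the same text could then be wrapped in a std::istringstream and
// passed to the library's schema compiler to obtain the avro::ValidSchema,
// while `text` is kept alongside it for display purposes.
```

The design choice here is simply to avoid round-tripping through the parsed representation when the goal is to show the schema as it was authored.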

