Posted to jira@arrow.apache.org by "Pavel Kovalenko (Jira)" <ji...@apache.org> on 2022/10/12 07:44:00 UTC

[jira] [Updated] (ARROW-17998) [Java] JSON representation of pojo.Schema is incompatible with flatbuffers JSON generated via C++ API

     [ https://issues.apache.org/jira/browse/ARROW-17998?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pavel Kovalenko updated ARROW-17998:
------------------------------------
    Description: 
I have a JSON representation of an arrow::Schema, generated from the flatbuffers format in C++:

 
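{code:cpp}
#include <string>
#include "flatbuffers/idl.h"   // flatbuffers::Parser, GenerateTextFromTable
#include "flatbuffers/util.h"  // flatbuffers::LoadFile

// Points to a serialized org.apache.arrow.flatbuf.Schema table.
const void* schemaBytes;

// Load and parse the Arrow schema definition.
std::string fbsSchemaFile;
flatbuffers::LoadFile("/path/to/Schema.fbs", false, &fbsSchemaFile);

flatbuffers::Parser parser;
parser.Parse(fbsSchemaFile.c_str());

// Render the table as flatbuffers text output.
std::string json;
flatbuffers::GenerateTextFromTable(parser, schemaBytes, "org.apache.arrow.flatbuf.Schema", &json);

return json;{code}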
 

When I try to read this JSON in Java to create a pojo.Schema:

 
{code:java}
String json; // Read from file.
Schema.fromJSON(json);{code}
 

 

It fails because the JSON produced by the flatbuffers generator differs slightly from the JSON that the Java Jackson bindings expect:

 

C++ Schema Flatbuffers JSON example:
{code:java}
{
  fields: [
    {
      name: "cc_call_center_sk",
      type_type: "Int",
      type: {
        bitWidth: 32,
        is_signed: true
      },
      children: [

      ],
      custom_metadata: [
        {
          key: "metadata",
          value: "some_metadata"
        }
      ]
    },
  ],
  custom_metadata: [
    {
      key: "metadata",
      value: "some_metadata"
    }
  ]
}{code}
Java Schema JSON example:
{code:java}
{
  "fields" : [ {
    "name" : "cc_call_center_sk",
    "nullable" : true,
    "type" : {
      "name" : "int",
      "bitWidth" : 32,
      "isSigned" : true
    },
    "children" : [ ],
    "metadata" : [ {
      "value" : "some_metadata",
      "key" : "metadata"
    } ]
  } ],
  "metadata" : [ {
    "value" : "some_metadata",
    "key" : "metadata"
  } ]
} {code}
The type id is declared differently:

C++ flatbuffers uses a `{*}type_type{*}` field

Java uses a `{*}name{*}` field inside the `{*}type{*}` field

 

The `{*}metadata{*}` field is also named differently:

C++ flatbuffers uses the name `{*}custom_metadata{*}`

Java uses the name `{*}metadata{*}`

 

This makes it impossible to reuse the JSON representation produced by Java in C++, and vice versa.

The same issue probably exists in other languages.
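
To make the differences listed above concrete, here is a minimal Java sketch that rewrites one field from the flatbuffers layout into the layout of the Java example. It is only an illustration, not part of Arrow or flatbuffers: the class `FlatbufFieldRenamer` and the method `toJavaLayout` are hypothetical names, the field is assumed to have been parsed into nested Maps already, and lower-casing the type id is an assumption that matches only the `Int` / `int` pair shown in the examples. The `nullable` key from the Java example is not handled.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not an Arrow or flatbuffers API.
class FlatbufFieldRenamer {

  // Rewrites one field (already parsed into a Map) from the flatbuffers
  // layout into the layout used by the Java example.
  @SuppressWarnings("unchecked")
  static Map<String, Object> toJavaLayout(Map<String, Object> fbField) {
    Map<String, Object> type = new LinkedHashMap<>();

    // Java keeps the type id as "name" inside "type"; flatbuffers uses a
    // sibling "type_type" key. Lower-casing is an assumption based on the
    // single "Int" -> "int" example.
    Object typeId = fbField.get("type_type");
    if (typeId != null) {
      type.put("name", typeId.toString().toLowerCase(Locale.ROOT));
    }

    // Copy the type parameters, renaming is_signed to isSigned.
    Object fbType = fbField.get("type");
    if (fbType instanceof Map) {
      for (Map.Entry<String, Object> e : ((Map<String, Object>) fbType).entrySet()) {
        String key = "is_signed".equals(e.getKey()) ? "isSigned" : e.getKey();
        type.put(key, e.getValue());
      }
    }

    Map<String, Object> out = new LinkedHashMap<>();
    for (Map.Entry<String, Object> e : fbField.entrySet()) {
      String key = e.getKey();
      Object value = e.getValue();
      if ("type_type".equals(key)) {
        continue; // already folded into type.name
      }
      if ("type".equals(key)) {
        continue; // written once after the loop
      }
      if ("children".equals(key) && value instanceof List) {
        // Child fields use the same layout, so convert them recursively.
        List<Object> converted = new ArrayList<>();
        for (Object child : (List<Object>) value) {
          converted.add(child instanceof Map
              ? toJavaLayout((Map<String, Object>) child) : child);
        }
        value = converted;
      }
      out.put("custom_metadata".equals(key) ? "metadata" : key, value);
    }
    out.put("type", type);
    return out;
  }

  public static void main(String[] args) {
    Map<String, Object> type = new LinkedHashMap<>();
    type.put("bitWidth", 32);
    type.put("is_signed", true);

    Map<String, Object> field = new LinkedHashMap<>();
    field.put("name", "cc_call_center_sk");
    field.put("type_type", "Int");
    field.put("type", type);
    field.put("children", new ArrayList<Object>());
    field.put("custom_metadata", new ArrayList<Object>());

    System.out.println(toJavaLayout(field));
  }
}
```

The same renaming would have to be applied to the top-level `custom_metadata` key of the schema, and in reverse when going from Java to C++.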

  was:
I have JSON arrow::Schema representation generated from flatbuffers format in C++:

 
{code:java}
const void* schemaBytes;

std::string fbsSchemaFile;    
flatbuffers::LoadFile("/path/to/Schema.fbs", false, &fbsSchemaFile);

flatbuffers::Parser parser;
parser.Parse(fbsSchemaFile.c_str());

std::string json;
flatbuffers::GenerateTextFromTable(parser, schemaBytes, "org.apache.arrow.flatbuf.Schema", &json);

return json;{code}
 

When I'm trying to read this JSON in Java and create pojo.Schema:

 
{code:java}
String json; // Read from file.
Schema.fromJson(json);{code}
 

 

It fails because JSON formats in flatbuffers generation and in Java using Jackson bindings are a bit different:

 

C++ Schema Flatbuffers JSON example:
{code:java}
{
  fields: [
    {
      name: "cc_call_center_sk",
      type_type: "Int",
      type: {
        bitWidth: 32,
        is_signed: true
      },
      children: [

      ],
      custom_metadata: [
        {
          key: "metadata",
          value: "some_metadata"
        }
      ]
    },
  ],
  custom_metadata: [
    {
      key: "metadata",
      value: "some_metadata"
    }
  ]
}{code}
Java Schema JSON example:
{code:java}
{
  "fields" : [ {
    "name" : "cc_call_center_sk",
    "nullable" : true,
    "type" : {
      "name" : "int",
      "bitWidth" : 32,
      "isSigned" : true
    },
    "children" : [ ],
    "metadata" : [ {
      "value" : "some_metadata",
      "key" : "metadata"
    } ]
  } ],
  "metadata" : [ {
    "value" : "some_metadata",
    "key" : "metadata"
  } ]
} {code}
There is a difference in type id declaration:

`type_type` field is used in C++ flatbuffers

`name` field inside `type` field is used in Java

 

Also, there is a difference in `metadata` field:

`custom_metadata` name is used in C++ flatbuffers

`metadata` name is used in Java

 

It makes it impossible to re-use JSON representation from Java in C++ and vice-versa

 


> [Java] JSON representation of pojo.Schema is incompatible with flatbuffers JSON generated via C++ API
> -----------------------------------------------------------------------------------------------------
>
>                 Key: ARROW-17998
>                 URL: https://issues.apache.org/jira/browse/ARROW-17998
>             Project: Apache Arrow
>          Issue Type: Bug
>          Components: Format, Java
>    Affects Versions: 6.0.1
>            Reporter: Pavel Kovalenko
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.20.10#820010)