Posted to dev@gobblin.apache.org by "Shirshanka Das (Jira)" <ji...@apache.org> on 2019/11/12 22:19:00 UTC

[jira] [Created] (GOBBLIN-957) Support automatic recursion removal from schemas

Shirshanka Das created GOBBLIN-957:
--------------------------------------

             Summary: Support automatic recursion removal from schemas
                 Key: GOBBLIN-957
                 URL: https://issues.apache.org/jira/browse/GOBBLIN-957
             Project: Apache Gobblin
          Issue Type: Improvement
            Reporter: Shirshanka Das
            Assignee: Shirshanka Das


Analytics engines such as Hive cannot handle recursive schemas, i.e. schemas in which an inner field can refer to the enclosing type.

This Jira proposes adding support for automatically removing recursion from schemas during data ingestion.

The simple proposal is to just drop the fields in the schema that introduce the recursion. 

e.g. (pseudo-schema)

User {
  string name;
  User friend;
}

gets converted to:

User {
  string name;
}
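
A minimal sketch of the field-dropping approach, not Gobblin's implementation. It assumes a hypothetical schema representation in which a record is a plain dict with "name" and "fields", and a field type is either a string (a primitive or a reference to a named record) or a nested record dict. The function name drop_recursive_fields is made up for illustration.

```python
def drop_recursive_fields(record, ancestors=()):
    """Return a copy of `record` without fields that refer to an enclosing record."""
    # Names of this record and every record that wraps it.
    enclosing = ancestors + (record["name"],)
    kept = []
    for field in record["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict):
            # Inline nested record: clean it up with the extended ancestor list.
            ftype = drop_recursive_fields(ftype, enclosing)
        elif ftype in enclosing:
            # The field points back at a wrapping type, so it introduces recursion.
            continue
        kept.append({"name": field["name"], "type": ftype})
    return {"name": record["name"], "fields": kept}


# The User example from this issue, in the assumed dict representation.
user = {
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "friend", "type": "User"},
    ],
}
flat = drop_recursive_fields(user)
```

Here `flat` keeps only the `name` field, matching the converted schema above. Union and collection types are deliberately left out of the sketch.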

 

A more sophisticated solution would be to do one or two levels of "schema-unrolling" before dropping the recursive fields.

e.g. the output schema with one-level unrolling would look like:

User {
  string name;
  User1 friend;
}

User1 {
  string name;
}
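
A rough sketch of schema unrolling under the same kind of assumptions, not Gobblin's implementation: records are plain dicts with "name" and "fields", field types are either strings or nested record dicts, and unrolled copies are named by appending the unrolling depth (User1, User2, ...). The function name unroll and this naming rule are hypothetical.

```python
def unroll(record, levels, enclosing=None, depth=0):
    """Expand back-references `levels` times, then drop whatever recursion remains."""
    # Map of enclosing record names to their original definitions.
    enclosing = {**(enclosing or {}), record["name"]: record}
    # Copies produced by unrolling get a depth suffix so their names stay distinct.
    name = record["name"] if depth == 0 else f'{record["name"]}{depth}'
    fields = []
    for field in record["fields"]:
        ftype = field["type"]
        if isinstance(ftype, dict):
            # Inline nested record: process it at the current depth.
            ftype = unroll(ftype, levels, enclosing, depth)
        elif ftype in enclosing:
            if levels == 0:
                continue  # no unrolling budget left: drop the recursive field
            # Replace the reference with a renamed copy, one level deeper.
            ftype = unroll(enclosing[ftype], levels - 1, enclosing, depth + 1)
        fields.append({"name": field["name"], "type": ftype})
    return {"name": name, "fields": fields}


user = {
    "name": "User",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "friend", "type": "User"},
    ],
}
one_level = unroll(user, levels=1)
```

With levels=1 the `friend` field becomes an inline `User1` record holding only `name`; with levels=0 the result is the same as simply dropping the recursive field.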


--
This message was sent by Atlassian Jira
(v8.3.4#803005)