Posted to commits@hudi.apache.org by "Pramod Biligiri (Jira)" <ji...@apache.org> on 2022/10/13 05:07:00 UTC

[jira] [Created] (HUDI-5024) Support storing database also as a Dataset in Datahub, not just a table

Pramod Biligiri created HUDI-5024:
-------------------------------------

             Summary: Support storing database also as a Dataset in Datahub, not just a table
                 Key: HUDI-5024
                 URL: https://issues.apache.org/jira/browse/HUDI-5024
             Project: Apache Hudi
          Issue Type: Task
          Components: meta-sync
            Reporter: Pramod Biligiri


Note: Evaluate feasibility and desirability of this before implementing.

Hudi's DatahubSyncTool pushes only tables into Datahub as Datasets; it does not push the database itself. Datahub also appears (on the face of it) to store only tables as Datasets, not databases, as even its demo page shows: [https://demo.datahubproject.io/browse/dataset/prod/postgres/calm-pagoda-323403/jaffle_shop]
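
To illustrate the point above, here is a minimal sketch of how a table is typically identified as a Dataset in Datahub. It assumes the usual Datahub dataset URN layout (platform, name, environment) and assumes the database appears only as a prefix of the dataset name. The helper name dataset_urn and the values "hudi", "my_db" and "my_table" are hypothetical; this is not taken from DatahubSyncTool's code.

```python
def dataset_urn(platform: str, database: str, table: str, env: str = "PROD") -> str:
    """Build a Datahub-style dataset URN for a table (illustrative only).

    Assumption: the database is not an entity of its own here; it shows up
    only as the first part of the dotted dataset name.
    """
    name = f"{database}.{table}"
    return f"urn:li:dataset:(urn:li:dataPlatform:{platform},{name},{env})"


if __name__ == "__main__":
    # Hypothetical platform, database and table names.
    print(dataset_urn("hudi", "my_db", "my_table"))
```

Under these assumptions, nothing in the URN refers to the database as a separate entity, which is the gap this issue describes.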

However, some customers might want to store the database as a top-level entity as well. Consider enhancing DatahubSyncTool to support this, possibly by using some of Datahub's more advanced features.

Ongoing Slack thread about this in Datahub Slack: https://datahubspace.slack.com/archives/CUMUWQU66/p1665636994736379



--
This message was sent by Atlassian Jira
(v8.20.10#820010)