Posted to dev@spark.apache.org by Tianchen Zhang <du...@gmail.com> on 2021/05/03 18:37:19 UTC

Hi all,

Currently the user-facing Catalog API doesn't support backing up or
restoring metadata, and our customers are asking for this
functionality. Here is a usage example (a rough sketch of the
intended calls follows the list):
1. Read all the catalog metadata from one Spark cluster
2. Save it to a Parquet file on a DFS
3. Read the Parquet file and restore the metadata in another Spark cluster
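
For illustration only (these method names are hypothetical, not a
final proposal), the user-facing calls could look roughly like this:

    // On the source cluster: dump all catalog metadata to a DFS path
    spark.catalog.backup("hdfs:///backups/catalog-metadata")

    // On the target cluster: recreate the metadata from that path
    spark.catalog.restore("hdfs:///backups/catalog-metadata")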

In the current implementation, the Catalog API has list methods
(listDatabases, listFunctions, etc.), but they don't return enough
information to restore an entity (for example, listDatabases loses
the "properties" of a database, and we need "DESCRIBE DATABASE
EXTENDED" to get them). It also only supports createTable, not the
creation of any other kind of entity. So today the only way to back
up and restore an entity is through Spark SQL.
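
To make the gap concrete, here is roughly what we have to do today
for a single database (the database name is just an example):

    // The Catalog API gives us name, description and locationUri,
    // but not the database properties:
    val db = spark.catalog.getDatabase("sales_db")

    // To capture the properties we have to fall back to SQL...
    val extended = spark.sql("DESCRIBE DATABASE EXTENDED sales_db")
    // ...and then hand-build a matching
    // CREATE DATABASE ... WITH DBPROPERTIES (...) statement on the
    // other cluster to restore it.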

We want to introduce backup and restore at the API level. We are
thinking of doing this simply by adding backup() and restore() to
CatalogImpl, as ExternalCatalog already exposes all the methods we
need to retrieve and recreate entities. We are wondering if there is
any concern or drawback to this approach. Please advise.
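
As a very rough sketch of what we have in mind (method names, the
storage layout and the exact fields are not final; tables and
functions would follow the same pattern as databases):

    // Inside org.apache.spark.sql.internal.CatalogImpl; assumes
    // import org.apache.spark.sql.catalyst.catalog.CatalogDatabase
    def backup(path: String): Unit = {
      val external = sparkSession.sharedState.externalCatalog
      import sparkSession.implicits._
      // ExternalCatalog returns the full CatalogDatabase, including
      // the properties that the user-facing listDatabases() drops.
      val dbs = external.listDatabases().map { name =>
        val db = external.getDatabase(name)
        (db.name, db.description, db.locationUri.toString, db.properties)
      }
      dbs.toDF("name", "description", "locationUri", "properties")
        .write.mode("overwrite").parquet(s"$path/databases")
      // Tables and functions would be written out the same way via
      // external.getTable / external.getFunction.
    }

    def restore(path: String): Unit = {
      val external = sparkSession.sharedState.externalCatalog
      sparkSession.read.parquet(s"$path/databases").collect().foreach { row =>
        val db = CatalogDatabase(
          name = row.getAs[String]("name"),
          description = row.getAs[String]("description"),
          locationUri = new java.net.URI(row.getAs[String]("locationUri")),
          properties =
            row.getAs[scala.collection.Map[String, String]]("properties").toMap)
        external.createDatabase(db, ignoreIfExists = true)
      }
    }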

Thank you in advance,
Tianchen