Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2022/05/09 09:05:34 UTC

[GitHub] [hudi] melin opened a new issue, #5537: Can iceberg and hudi catalog exist at the same time?

melin opened a new issue, #5537:
URL: https://github.com/apache/hudi/issues/5537

   ```scala
   val spark = SparkSession.builder().master("local").enableHiveSupport()
         .config("spark.sql.extensions",
           "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions," +
             "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
         .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
         .config("spark.sql.catalog.spark_catalog.type", "hive")
   
         .config("spark.sql.catalog.hudi", "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
   
         .getOrCreate()
   ```
   
   The Iceberg catalog uses the name spark_catalog, so the Hudi catalog cannot also be named spark_catalog.
   
   However, for Spark 3.2 Hudi requires the additional spark_catalog config: --conf 'spark.sql.catalog.spark_catalog=org.apache.spark.sql.hudi.catalog.HoodieCatalog'
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@hudi.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [hudi] asethia commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by "asethia (via GitHub)" <gi...@apache.org>.

asethia commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1443872559

   Is there any further update on this? If the hack is the solution, what would it take to add it to the main code?



[GitHub] [hudi] codope commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

codope commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1160589436

   @melin Did you get a chance to try out the above suggestion by @leesf ?



[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1302885402

   @melin : gentle ping. 



[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1123176143

   @YannByron @XuQianJin-Stars : can you folks follow up on this, please?



[GitHub] [hudi] leesf commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

leesf commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1146419871

   @melin As a workaround for now, I think you can set `spark_catalog` to `HoodieCatalog` and use a custom catalog for Iceberg, since Hudi currently does not support custom catalogs.
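
   The workaround described above could be sketched roughly as follows. This is an illustration rather than configuration taken from the thread: the catalog name `iceberg` is an arbitrary choice, and `org.apache.iceberg.spark.SparkCatalog` is assumed as the Iceberg class for a separately named catalog. Whether both session extensions coexist cleanly in a given Spark/Hudi/Iceberg version combination is not confirmed here.

   ```scala
   import org.apache.spark.sql.SparkSession

   // Sketch only: Hudi owns the built-in `spark_catalog`, while Iceberg is
   // registered under its own name ("iceberg" is a hypothetical choice).
   val spark = SparkSession.builder()
     .master("local")
     .enableHiveSupport()
     .config("spark.sql.extensions",
       "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions," +
         "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
     // Hudi takes over the session catalog, as its Spark 3.2 setup requires.
     .config("spark.sql.catalog.spark_catalog",
       "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
     // Iceberg gets a separately named catalog backed by the Hive metastore.
     .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
     .config("spark.sql.catalog.iceberg.type", "hive")
     .getOrCreate()
   ```

   With a setup like this, Iceberg tables would be addressed through the catalog prefix (for example `iceberg.db.tbl`, where `db` and `tbl` are placeholder names), while Hudi and plain Hive tables would resolve through the default `spark_catalog`.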



[GitHub] [hudi] melin commented on issue #5537: Can iceberg and hudi catalog exist at the same time?

Posted by GitBox <gi...@apache.org>.

melin commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1121163324

   @leesf 
   



[GitHub] [hudi] melin commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

melin commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1122313372

   A hacky workaround:
   ```java
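   import org.apache.spark.sql.connector.catalog.CatalogPlugin;
   import org.apache.spark.sql.hudi.catalog.SuperiorHoodieCatalog;
   import org.aspectj.lang.ProceedingJoinPoint;
   import org.aspectj.lang.annotation.Around;
   import org.aspectj.lang.annotation.Aspect;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;
   
   // Captures the default session catalog passed to Spark's CatalogManager
   // constructor so that SuperiorHoodieCatalog can delegate to it.
   @Aspect
   public class CatalogManagerAspectj {
       private static final Logger LOG = LoggerFactory.getLogger(CatalogManagerAspectj.class);
   
       @Around("execution(org.apache.spark.sql.connector.catalog.CatalogManager.new(..))")
       public void aroundCatalogManagerInit(ProceedingJoinPoint pjp) throws Throwable {
           SuperiorHoodieCatalog.defaultSessionCatalog_$eq((CatalogPlugin) pjp.getArgs()[0]);
           pjp.proceed();
       }
   }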
   ```
   
   ```scala
   import org.apache.spark.sql.AnalysisException
   import org.apache.spark.sql.catalyst.TableIdentifier
   import org.apache.spark.sql.catalyst.analysis._
   import org.apache.spark.sql.connector.catalog.TableChange.{AddColumn, ColumnChange, UpdateColumnComment, UpdateColumnType}
   import org.apache.spark.sql.connector.catalog._
   import org.apache.spark.sql.connector.expressions.Transform
   import org.apache.spark.sql.connector.catalog.CatalogV2Implicits.IdentifierHelper
   import org.apache.spark.sql.hudi.catalog.SuperiorHoodieCatalog.defaultSessionCatalog
   import org.apache.spark.sql.hudi.command._
   import org.apache.spark.sql.types.{StructField, StructType}
   
   import java.util
   
   class SuperiorHoodieCatalog extends HoodieCatalog {
   
     override def name: String = "hudi"
   
     override def defaultNamespace: Array[String] = defaultSessionCatalog.defaultNamespace
   
     override def stageCreate(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: util.Map[String, String]): StagedTable = {
       if (sparkAdapter.isHoodieTable(properties)) {
         HoodieStagedTable(ident, this, schema, partitions, properties, TableCreationMode.STAGE_CREATE)
       } else {
         BasicStagedTable(
           ident,
           asTableCatalog.createTable(ident, schema, partitions, properties),
           this)
       }
     }
   
     override def stageReplace(ident: Identifier, schema: StructType, partitions: Array[Transform], properties: util.Map[String, String]): StagedTable = {
       if (sparkAdapter.isHoodieTable(properties)) {
         HoodieStagedTable(ident, this, schema, partitions, properties, TableCreationMode.STAGE_REPLACE)
       } else {
         asTableCatalog.dropTable(ident)
         BasicStagedTable(
           ident,
           asTableCatalog.createTable(ident, schema, partitions, properties),
           this)
       }
     }
   
     override def stageCreateOrReplace(ident: Identifier,
                                       schema: StructType,
                                       partitions: Array[Transform],
                                       properties: util.Map[String, String]): StagedTable = {
       if (sparkAdapter.isHoodieTable(properties)) {
         HoodieStagedTable(
           ident, this, schema, partitions, properties, TableCreationMode.CREATE_OR_REPLACE)
       } else {
         try asTableCatalog.dropTable(ident) catch {
           case _: NoSuchTableException => // ignore the exception
         }
         BasicStagedTable(
           ident,
           asTableCatalog.createTable(ident, schema, partitions, properties),
           this)
       }
     }
   
     override def loadTable(ident: Identifier): Table = {
       try {
         asTableCatalog.loadTable(ident) match {
           case v1: V1Table if sparkAdapter.isHoodieTable(v1.catalogTable) =>
             HoodieInternalV2Table(
               spark,
               v1.catalogTable.location.toString,
               catalogTable = Some(v1.catalogTable),
               tableIdentifier = Some(ident.toString))
           case o => o
         }
       } catch {
         case e: Exception =>
           throw e
       }
     }
   
     override def createTable(ident: Identifier,
                              schema: StructType,
                              partitions: Array[Transform],
                              properties: util.Map[String, String]): Table = {
       createHoodieTable(ident, schema, partitions, properties, Map.empty, Option.empty, TableCreationMode.CREATE)
     }
   
     override def tableExists(ident: Identifier): Boolean = asTableCatalog.tableExists(ident)
   
     override def dropTable(ident: Identifier): Boolean = {
       val table = loadTable(ident)
       table match {
         case _: HoodieInternalV2Table =>
           DropHoodieTableCommand(ident.asTableIdentifier, ifExists = true, isView = false, purge = false).run(spark)
           true
         case _ => asTableCatalog.dropTable(ident)
       }
     }
   
     override def purgeTable(ident: Identifier): Boolean = {
       val table = loadTable(ident)
       table match {
         case _: HoodieInternalV2Table =>
           DropHoodieTableCommand(ident.asTableIdentifier, ifExists = true, isView = false, purge = true).run(spark)
           true
         case _ => asTableCatalog.purgeTable(ident)
       }
     }
   
     @throws[NoSuchTableException]
     @throws[TableAlreadyExistsException]
     override def renameTable(oldIdent: Identifier, newIdent: Identifier): Unit = {
       loadTable(oldIdent) match {
         case _: HoodieInternalV2Table =>
           new AlterHoodieTableRenameCommand(oldIdent.asTableIdentifier, newIdent.asTableIdentifier, false).run(spark)
         case _ => asTableCatalog.renameTable(oldIdent, newIdent)
       }
     }
   
     override def alterTable(ident: Identifier, changes: TableChange*): Table = {
       val tableIdent = TableIdentifier(ident.name(), ident.namespace().lastOption)
       // scalastyle:off
       val table = loadTable(ident) match {
         case hoodieTable: HoodieInternalV2Table => hoodieTable
         case _ => return asTableCatalog.alterTable(ident, changes: _*)
       }
       // scalastyle:on
   
       val grouped = changes.groupBy(c => c.getClass)
   
       grouped.foreach {
         case (t, newColumns) if t == classOf[AddColumn] =>
           AlterHoodieTableAddColumnsCommand(
             tableIdent,
             newColumns.asInstanceOf[Seq[AddColumn]].map { col =>
               StructField(
                 col.fieldNames()(0),
                 col.dataType(),
                 col.isNullable)
             }).run(spark)
         case (t, columnChanges) if classOf[ColumnChange].isAssignableFrom(t) =>
           columnChanges.foreach {
             case dataType: UpdateColumnType =>
               val colName = UnresolvedAttribute(dataType.fieldNames()).name
               val newDataType = dataType.newDataType()
               val structField = StructField(colName, newDataType)
               AlterHoodieTableChangeColumnCommand(tableIdent, colName, structField).run(spark)
             case dataType: UpdateColumnComment =>
               val newComment = dataType.newComment()
               val colName = UnresolvedAttribute(dataType.fieldNames()).name
               val fieldOpt = table.schema().findNestedField(dataType.fieldNames(), includeCollections = true,
                 spark.sessionState.conf.resolver).map(_._2)
               val field = fieldOpt.getOrElse {
                 throw new AnalysisException(
                   s"Couldn't find column $colName in:\n${table.schema().treeString}")
               }
               AlterHoodieTableChangeColumnCommand(tableIdent, colName, field.withComment(newComment)).run(spark)
           }
         case (t, _) =>
           throw new UnsupportedOperationException(s"not supported table change: ${t.getName}")
       }
   
       loadTable(ident)
     }
   
     @throws[NoSuchNamespaceException]
     override def listTables(namespace: Array[String]): Array[Identifier] = asTableCatalog.listTables(namespace)
   
   
     override def invalidateTable(ident: Identifier): Unit = {
       asTableCatalog.invalidateTable(ident)
     }
   
     @throws[NoSuchNamespaceException]
     override def listNamespaces: Array[Array[String]] = asNamespaceCatalog.listNamespaces
   
     @throws[NoSuchNamespaceException]
     override def listNamespaces(namespace: Array[String]): Array[Array[String]] =
       asNamespaceCatalog.listNamespaces(namespace)
   
     override def namespaceExists(namespace: Array[String]): Boolean =
       asNamespaceCatalog.namespaceExists(namespace)
   
     @throws[NoSuchNamespaceException]
     override def loadNamespaceMetadata(namespace: Array[String]): util.Map[String, String] =
       asNamespaceCatalog.loadNamespaceMetadata(namespace)
   
     @throws[NamespaceAlreadyExistsException]
     override def createNamespace(namespace: Array[String], metadata: util.Map[String, String]): Unit = {
       asNamespaceCatalog.createNamespace(namespace, metadata)
     }
   
     @throws[NoSuchNamespaceException]
     override def alterNamespace(namespace: Array[String], changes: NamespaceChange*): Unit = {
       asNamespaceCatalog.alterNamespace(namespace, changes:_*)
     }
   
     @throws[NoSuchNamespaceException]
     override def dropNamespace(namespace: Array[String]): Boolean =
       asNamespaceCatalog.dropNamespace(namespace)
   
     private def asTableCatalog: TableCatalog = defaultSessionCatalog.asInstanceOf[TableCatalog]
   
     private def asNamespaceCatalog: SupportsNamespaces =
       defaultSessionCatalog.asInstanceOf[SupportsNamespaces]
   }
   
   object SuperiorHoodieCatalog {
     var defaultSessionCatalog: CatalogPlugin = _
   }
   ```
   
   ```xml
   <?xml version="1.0" encoding="UTF-8" ?>
   <aspectj>
       <aspects>
           <aspect name="com.github.melin.superior.jobserver.extensions.aspectj.CatalogManagerAspectj"/>
       </aspects>
       <weaver options="-verbose -showWeaveInfo">
           <include within="org.apache.spark.sql.connector.catalog..*"/>
       </weaver>
       <weaver options="-XaddSerialVersionUID"/>
   </aspectj>
   
   ```
   



[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1229360928

   @melin : any updates please.



[GitHub] [hudi] nsivabalan commented on issue #5537: hudi supports custom catalog name, spark_catalog is not mandatory

Posted by GitBox <gi...@apache.org>.

nsivabalan commented on issue #5537:
URL: https://github.com/apache/hudi/issues/5537#issuecomment-1302885962

   @YannByron : looks like the author has given a hacky solution. Is there any enhancement we can add to Hudi based on that?
   

