
Posted to issues@paimon.apache.org by "leaves12138 (via GitHub)" <gi...@apache.org> on 2023/11/22 08:05:07 UTC

[PR] [flink] [hive] Introduce procedure to migrate table from hive to paimon [incubator-paimon]

leaves12138 opened a new pull request, #2368:
URL: https://github.com/apache/incubator-paimon/pull/2368

   ### Purpose
   
   Introduce procedures to migrate a table from Hive to Paimon,
   or to add files from a Hive table to a Paimon table.
   
   `FlinkGenericCatalog` must be used to migrate a table.
   
   Example:
   ```sql
   --Migrate
   CREATE CATALOG PAIMON_GE WITH ('type'='paimon-generic', 'hive-conf-dir' = 'xxx');
   USE CATALOG PAIMON_GE;
   CALL migrate_table('default.hivetable');
   ```
   ```sql
   --Add file
   CREATE CATALOG PAIMON_GE WITH ('type'='paimon-generic', 'hive-conf-dir' = 'xxx');
   USE CATALOG PAIMON_GE;
   CALL add_file('hivetable', 'paimontable', false, false);
   ```
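   The `migrate_table` argument above is written as `database.table`, and the diff further down parses it with `Identifier.fromString`. As a rough standalone illustration of that split (the `TablePathSketch` class and its `parseTablePath` method are hypothetical, and the "exactly one dot" rule is an assumption, not taken from Paimon's source), it might look like:

   ```java
   /** Hypothetical sketch: split a "database.table" argument into its two parts. */
   public class TablePathSketch {

       /** Returns {database, table}; rejects inputs that are not exactly "db.table". */
       static String[] parseTablePath(String path) {
           // split() drops trailing empty strings, so "db." yields a single part
           String[] parts = path.split("\\.");
           if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
               throw new IllegalArgumentException("Expected 'database.table' but got: " + path);
           }
           return parts;
       }

       public static void main(String[] args) {
           String[] id = parseTablePath("default.hivetable");
           System.out.println("database=" + id[0] + ", table=" + id[1]); // database=default, table=hivetable
       }
   }
   ```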
   
   ### Tests
   
   
   IT case added.
   
   
   ### Documentation
   
   The documentation is still in progress; I am opening the pull request first.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@paimon.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

Re: [PR] [flink] [hive] Introduce procedure to migrate table from hive to paimon [incubator-paimon]

Posted by "JingsongLi (via GitHub)" <gi...@apache.org>.

JingsongLi commented on code in PR #2368:
URL: https://github.com/apache/incubator-paimon/pull/2368#discussion_r1405546364


##########
paimon-core/src/main/java/org/apache/paimon/migrate/DataTypeWriter.java:
##########
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.paimon.migrate;
+
+import org.apache.paimon.data.BinaryString;
+import org.apache.paimon.data.Decimal;
+import org.apache.paimon.data.Timestamp;
+import org.apache.paimon.types.ArrayType;
+import org.apache.paimon.types.BigIntType;
+import org.apache.paimon.types.BinaryType;
+import org.apache.paimon.types.BooleanType;
+import org.apache.paimon.types.CharType;
+import org.apache.paimon.types.DataTypeVisitor;
+import org.apache.paimon.types.DateType;
+import org.apache.paimon.types.DecimalType;
+import org.apache.paimon.types.DoubleType;
+import org.apache.paimon.types.FloatType;
+import org.apache.paimon.types.IntType;
+import org.apache.paimon.types.LocalZonedTimestampType;
+import org.apache.paimon.types.MapType;
+import org.apache.paimon.types.MultisetType;
+import org.apache.paimon.types.RowType;
+import org.apache.paimon.types.SmallIntType;
+import org.apache.paimon.types.TimeType;
+import org.apache.paimon.types.TimestampType;
+import org.apache.paimon.types.TinyIntType;
+import org.apache.paimon.types.VarBinaryType;
+import org.apache.paimon.types.VarCharType;
+
+import java.math.BigDecimal;
+
+/** Generate different converter to write data. */
+public class DataTypeWriter implements DataTypeVisitor<DataConverter> {

Review Comment:
   Can you use `TypeUtils.castFromString` and `BinaryWriter.createValueSetter`?
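   The suggestion replaces a dedicated per-type visitor with a generic string-to-value cast. As a self-contained illustration of that idea only, assuming the values being converted arrive as strings (for example Hive partition values), a single dispatch over type tags can do the job. The `SimpleType` enum and `castFromString` method below are hypothetical stand-ins, not Paimon's `TypeUtils` API, and the sketch does not cover the `BinaryWriter.createValueSetter` half of the suggestion.

   ```java
   import java.math.BigDecimal;
   import java.time.LocalDate;

   /** Hypothetical sketch: one string-to-value dispatch instead of a visitor class per type. */
   public class CastFromStringSketch {

       /** Hypothetical stand-in for a real type hierarchy such as Paimon's DataType. */
       enum SimpleType { BOOLEAN, INT, BIGINT, DOUBLE, DECIMAL, DATE, STRING }

       /** Parses a string into the Java value that matches the given type tag. */
       static Object castFromString(String s, SimpleType type) {
           switch (type) {
               case BOOLEAN:
                   return Boolean.parseBoolean(s);
               case INT:
                   return Integer.parseInt(s);
               case BIGINT:
                   return Long.parseLong(s);
               case DOUBLE:
                   return Double.parseDouble(s);
               case DECIMAL:
                   return new BigDecimal(s);
               case DATE:
                   // ISO-8601 calendar date, e.g. "2023-11-22"
                   return LocalDate.parse(s);
               case STRING:
                   return s;
               default:
                   throw new IllegalArgumentException("Unsupported type: " + type);
           }
       }

       public static void main(String[] args) {
           System.out.println(castFromString("42", SimpleType.INT));         // 42
           System.out.println(castFromString("2023-11-22", SimpleType.DATE)); // 2023-11-22
           System.out.println(castFromString("3.14", SimpleType.DECIMAL));   // 3.14
       }
   }
   ```

   Adding a type then means adding one `case` rather than one visitor method plus a converter class.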



##########
paimon-flink/paimon-flink-common/src/main/java/org/apache/paimon/flink/procedure/MigrateTableProcedure.java:
##########
@@ -0,0 +1,108 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.paimon.flink.procedure;
+
+import org.apache.paimon.CoreOptions;
+import org.apache.paimon.catalog.Identifier;
+import org.apache.paimon.flink.FlinkCatalogFactory;
+import org.apache.paimon.utils.ParameterUtils;
+
+import org.apache.flink.table.api.EnvironmentSettings;
+import org.apache.flink.table.api.internal.TableEnvironmentImpl;
+import org.apache.flink.table.catalog.CatalogTable;
+import org.apache.flink.table.catalog.CatalogTableImpl;
+import org.apache.flink.table.catalog.ObjectPath;
+import org.apache.flink.table.catalog.ResolvedCatalogTable;
+import org.apache.flink.table.catalog.ResolvedSchema;
+import org.apache.flink.table.procedure.ProcedureContext;
+
+import java.util.HashMap;
+import java.util.Map;
+
+import static org.apache.flink.table.factories.FactoryUtil.CONNECTOR;
+
+/** Migrate procedure to migrate hive table to paimon table. */
+public class MigrateTableProcedure extends GenericProcedureBase {
+
+    private static final String BACK_SUFFIX = "_backup_";
+
+    @Override
+    public String identifier() {
+        return "migrate_table";
+    }
+
+    public String[] call(ProcedureContext procedureContext, String sourceTablePath)
+            throws Exception {
+        return call(procedureContext, sourceTablePath, "");
+    }
+
+    public String[] call(
+            ProcedureContext procedureContext, String sourceTablePath, String properties)
+            throws Exception {
+        TableEnvironmentImpl tableEnvironment =
+                TableEnvironmentImpl.create(EnvironmentSettings.inBatchMode());
+        Identifier sourceTableId = Identifier.fromString(sourceTablePath);
+
+        CatalogTable sourceFlinkTable =

Review Comment:
   Move this logic into paimon-hive; we don't need to rely on the Flink Catalog APIs.
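   One way to read this suggestion: put the migration steps behind a small engine-agnostic interface on the Hive side, and let the Flink procedure only parse its arguments and delegate. Everything in the sketch below is hypothetical (the `Migrator` interface, `FakeHiveMigrator`, the step order, and the `"Success"` result are illustrations, not the code in this PR); it reuses the `_backup_` value from `BACK_SUFFIX` in the diff as an assumed backup-name suffix, since the diff does not show how that constant is used.

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Hypothetical sketch: migration logic behind an engine-agnostic interface. */
   public class MigratorSketch {

       /** Engine-independent contract; a Flink procedure, a Spark procedure, or a CLI could call it. */
       interface Migrator {
           void executeMigrate();
       }

       /** Hypothetical Hive-side implementation that only records the steps it would take. */
       static class FakeHiveMigrator implements Migrator {
           // Assumed suffix, copied from BACK_SUFFIX in the diff above.
           private static final String BACKUP_SUFFIX = "_backup_";

           private final String sourceTable;
           final List<String> steps = new ArrayList<>();

           FakeHiveMigrator(String sourceTable) {
               this.sourceTable = sourceTable;
           }

           @Override
           public void executeMigrate() {
               // Assumed step order, for illustration only.
               String backup = sourceTable + BACKUP_SUFFIX;
               steps.add("rename " + sourceTable + " to " + backup);
               steps.add("create paimon table " + sourceTable);
               steps.add("move data files from " + backup + " to " + sourceTable);
               steps.add("drop " + backup);
           }
       }

       /** Thin engine-specific wrapper: no catalog API calls, it just delegates. */
       static String[] callProcedure(Migrator migrator) {
           migrator.executeMigrate();
           return new String[] {"Success"};
       }

       public static void main(String[] args) {
           FakeHiveMigrator migrator = new FakeHiveMigrator("default.hivetable");
           callProcedure(migrator);
           migrator.steps.forEach(System.out::println);
       }
   }
   ```

   With this split, only `callProcedure` would differ per engine, which is also what the later comment about not being limited to the Flink generic catalog asks for.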



##########
docs/content/migration/migration-from-hive.md:
##########
@@ -0,0 +1,80 @@
+---
+title: "Migration From Hive"
+weight: 1
+type: docs
+aliases:
+- /migration/migration-from-hive.html
+---
+<!--
+Licensed to the Apache Software Foundation (ASF) under one
+or more contributor license agreements.  See the NOTICE file
+distributed with this work for additional information
+regarding copyright ownership.  The ASF licenses this file
+to you under the Apache License, Version 2.0 (the
+"License"); you may not use this file except in compliance
+with the License.  You may obtain a copy of the License at
+
+  http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing,
+software distributed under the License is distributed on an
+"AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+KIND, either express or implied.  See the License for the
+specific language governing permissions and limitations
+under the License.
+-->
+
+# Hive Table Migration
+
+Apache Hive supports the ORC and Parquet file formats, which can be migrated to Paimon.
+When data is migrated to a Paimon table, the original table is permanently removed, so please back up your data if you
+still need the original table. The migrated table will be an [unaware-bucket append-only table]({{< ref "concepts/append-only-table#append-for-scalable-table" >}}).
+
+Now, we can use the Flink generic catalog with the Migrate Table Procedure and the Migrate File Procedure to fully migrate a table from Hive to Paimon.

Review Comment:
   We should try not to be limited to the Flink generic catalog; this feature can be engine-independent.



##########
paimon-core/src/main/java/org/apache/paimon/operation/AppendOnlyFileStoreRead.java:
##########
@@ -104,7 +104,7 @@ public RecordReader<InternalRow> createReader(DataSplit split) throws IOExceptio
         DataFilePathFactory dataFilePathFactory =
                 pathFactory.createDataFilePathFactory(split.partition(), split.bucket());
         List<ConcatRecordReader.ReaderSupplier<InternalRow>> suppliers = new ArrayList<>();
-        if (split.beforeFiles().size() > 0) {

Review Comment:
   There is no need to use `isEmpty()` here; `size() > 0` is also fine.
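   For a `java.util.Collection`, `size() > 0` and `!isEmpty()` express the same condition, so the choice is stylistic. A tiny check, using a plain `ArrayList` as a stand-in for `split.beforeFiles()` (the class, method names, and file name are hypothetical):

   ```java
   import java.util.ArrayList;
   import java.util.List;

   /** Shows that size() > 0 and !isEmpty() agree for a java.util.List. */
   public class NonEmptyCheck {

       static boolean hasFilesBySize(List<String> files) {
           return files.size() > 0;
       }

       static boolean hasFilesByIsEmpty(List<String> files) {
           return !files.isEmpty();
       }

       public static void main(String[] args) {
           List<String> beforeFiles = new ArrayList<>();
           System.out.println(hasFilesBySize(beforeFiles) + " " + hasFilesByIsEmpty(beforeFiles)); // false false
           beforeFiles.add("data-1.orc");
           System.out.println(hasFilesBySize(beforeFiles) + " " + hasFilesByIsEmpty(beforeFiles)); // true true
       }
   }
   ```

   For `ArrayList` both calls are constant-time. The distinction matters only for collections such as `ConcurrentLinkedQueue`, whose `size()` is documented as not constant-time, which is one reason some style guides prefer `isEmpty()`.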


