You are viewing a plain text version of this content. The canonical link for it is here.
Posted to commits@hudi.apache.org by GitBox <gi...@apache.org> on 2020/01/09 08:11:50 UTC

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: A schema provider to get metadata through Jdbc

OpenOpened opened a new pull request #1200: A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-586062515
 
 
   @OpenOpened Thanks for working through this issue with us patiently! merged!

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r377048678
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * The code logic implementation refers to spark 2.4.x and spark 3.x.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try (PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table))) {
+        statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+        try (ResultSet rs = statement.executeQuery()) {
+          StructType structType;
+          if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+            structType = JdbcUtils.getSchema(rs, dialect, true);
+          } else {
+            structType = JdbcUtils.getSchema(rs, dialect, false);
+          }
+          return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+        }
+      }
+    } else {
+      throw new HoodieException(String.format("%s table does not exists!", table));
+    }
+  }
+
+  /**
+   * Replace java map with scala immutable map.
+   * refers: https://stackoverflow.com/questions/11903167/convert-java-util-hashmap-to-scala-collection-immutable-map-in-java/11903737#11903737
 
 Review comment:
   I think we could remove this line, If we copy from other projects, we may need add copyright to LICENSE. but if copied from stackoverflow, it would be removed.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183554
 
 

 ##########
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestHoodieDeltaStreamer.java
 ##########
 @@ -511,6 +524,22 @@ public void testNullSchemaProvider() throws Exception {
     }
   }
 
+  @Test
+  public void testJdbcbasedSchemaProvider() throws Exception {
 
 Review comment:
   can we create a separate test class for this? given you are only testing the schema provider?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r365561760
 
 

 ##########
 File path: hudi-utilities/pom.xml
 ##########
 @@ -69,6 +100,12 @@
   </build>
 
   <dependencies>
+    <!-- Scala -->
 
 Review comment:
   can you clarify why these pom changes are needed? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183326
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   import the java collection classes?`Map`, `List`, `ArrayList` ? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-583846298
 
 
   @OpenOpened Only left some minor comments, otherwise looks good to me.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376784460
 
 

 ##########
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestJdbcbasedSchemaProvider.java
 ##########
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider;
+
+import org.apache.avro.Schema;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+
+import static org.junit.Assert.assertEquals;
+
 
 Review comment:
   Would be better to add some annotations

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design. And ensure the uniformity of the schema, such as reading historical data from spark jdbc, and Use delta streamer to synchronize data.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376784460
 
 

 ##########
 File path: hudi-utilities/src/test/java/org/apache/hudi/utilities/TestJdbcbasedSchemaProvider.java
 ##########
 @@ -0,0 +1,82 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities;
+
+import org.apache.hudi.common.util.TypedProperties;
+import org.apache.hudi.exception.HoodieException;
+import org.apache.hudi.utilities.schema.JdbcbasedSchemaProvider;
+
+import org.apache.avro.Schema;
+import org.apache.log4j.LogManager;
+import org.apache.log4j.Logger;
+import org.apache.spark.api.java.JavaSparkContext;
+import org.junit.After;
+import org.junit.Before;
+import org.junit.Test;
+
+import java.io.IOException;
+import java.sql.Connection;
+import java.sql.DriverManager;
+import java.sql.PreparedStatement;
+import java.sql.SQLException;
+
+import static org.junit.Assert.assertEquals;
+
 
 Review comment:
   Would be better to add some annotations

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r365561853
 
 

 ##########
 File path: hudi-utilities/src/main/scala/org/apache/hudi/utilities/JdbcProviderUtils.scala
 ##########
 @@ -0,0 +1,59 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *      http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.hudi.utilities
+
+import org.apache.avro.Schema
+import org.apache.hudi.AvroConversionUtils
+import org.apache.hudi.exception.HoodieException
+import org.apache.spark.sql.execution.datasources.jdbc.{JDBCOptions, JdbcUtils}
+import org.apache.spark.sql.jdbc.JdbcDialects
+
+import scala.collection.JavaConverters._
 
 Review comment:
   could we avoid this and keep this code in Java? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r378035166
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * The code logic implementation refers to spark 2.4.x and spark 3.x.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try (PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table))) {
+        statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+        try (ResultSet rs = statement.executeQuery()) {
+          StructType structType;
+          if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+            structType = JdbcUtils.getSchema(rs, dialect, true);
+          } else {
+            structType = JdbcUtils.getSchema(rs, dialect, false);
+          }
+          return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+        }
+      }
+    } else {
+      throw new HoodieException(String.format("%s table does not exists!", table));
+    }
+  }
+
+  /**
+   * Replace java map with scala immutable map.
+   * refers: https://stackoverflow.com/questions/11903167/convert-java-util-hashmap-to-scala-collection-immutable-map-in-java/11903737#11903737
 
 Review comment:
   Sorry to be a pain. but this piece of code would be problematic since its exactly https://stackoverflow.com/a/45992769 ? Please re-implement this 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r377078510
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * The code logic implementation refers to spark 2.4.x and spark 3.x.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try (PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table))) {
+        statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+        try (ResultSet rs = statement.executeQuery()) {
+          StructType structType;
+          if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+            structType = JdbcUtils.getSchema(rs, dialect, true);
+          } else {
+            structType = JdbcUtils.getSchema(rs, dialect, false);
+          }
+          return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+        }
+      }
+    } else {
+      throw new HoodieException(String.format("%s table does not exists!", table));
+    }
+  }
+
+  /**
+   * Replace java map with scala immutable map.
+   * refers: https://stackoverflow.com/questions/11903167/convert-java-util-hashmap-to-scala-collection-immutable-map-in-java/11903737#11903737
 
 Review comment:
   ok, i will remove it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r378134572
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * The code logic implementation refers to spark 2.4.x and spark 3.x.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try (PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table))) {
+        statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+        try (ResultSet rs = statement.executeQuery()) {
+          StructType structType;
+          if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+            structType = JdbcUtils.getSchema(rs, dialect, true);
+          } else {
+            structType = JdbcUtils.getSchema(rs, dialect, false);
+          }
+          return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+        }
+      }
+    } else {
+      throw new HoodieException(String.format("%s table does not exists!", table));
+    }
+  }
+
+  /**
+   * Replace java map with scala immutable map.
+   * refers: https://stackoverflow.com/questions/11903167/convert-java-util-hashmap-to-scala-collection-immutable-map-in-java/11903737#11903737
 
 Review comment:
   @vinothchandar @leesf I reimplemented the relevant logic using the second approach.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376705690
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
 
 Review comment:
   Would be changed to `try-with-resources`? since `try-catch-finnaly` has been optimized to `try-with-resources` in the project now. 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-574033535
 
 
   @vinothchandar 
   Please review. I mainly did something:
   1. Added test cases
   2. All logic is implemented using java
   3. The jdbc code logic references spark 2.4.4 and spark 3.x

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-581784722
 
 
   @OpenOpened Sorry.. forgot to mention you.. I fixed a missing license for the test file.. There is one conflict looks like.. if you can rebase and push again, this is good to go.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar merged pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar merged pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design. And ensure the uniformity of the schema, such as reading historical data from spark jdbc, and Use delta streamer to synchronize data.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r378055288
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * The code logic implementation refers to spark 2.4.x and spark 3.x.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try (PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table))) {
+        statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+        try (ResultSet rs = statement.executeQuery()) {
+          StructType structType;
+          if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+            structType = JdbcUtils.getSchema(rs, dialect, true);
+          } else {
+            structType = JdbcUtils.getSchema(rs, dialect, false);
+          }
+          return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+        }
+      }
+    } else {
+      throw new HoodieException(String.format("%s table does not exists!", table));
+    }
+  }
+
+  /**
+   * Replace java map with scala immutable map.
+   * refers: https://stackoverflow.com/questions/11903167/convert-java-util-hashmap-to-scala-collection-immutable-map-in-java/11903737#11903737
 
 Review comment:
   Do you guys have a better way? I checked the relevant information. If you want to achieve conversion elegantly, there are only two methods. First, using scala code to do the conversion, you need to implement a scala conversion class. Second, rewrite the related spark method with java. The disadvantage is that it may cause compatibility problems in the future.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366184290
 
 

 ##########
 File path: hudi-utilities/src/test/resources/delta-streamer-config/source-jdbc.avsc
 ##########
 @@ -0,0 +1,59 @@
+/*
 
 Review comment:
   any reason why the existing `source.avsc` won't work for you? Like to avoid creating new schema if possible 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376708108
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -235,4 +248,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getJDBCSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
 
 Review comment:
   thank you for your advice. i wll do it.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design. And ensure the uniformity of the schema, such as reading historical data from spark jdbc, and Use delta streamer to synchronize data.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366309913
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   Because the underlying spark function only accepts parameters of type scala.collection.immutable.Map, I provide a private conversion function from java map to scala immutable Map.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened opened a new pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   ## What is the purpose of the pull request
   
   In our production environment, we usually need to synchronize data from mysql, and at the same time, we need to get the schema from the database. So I submitted this PR. A schema provider that obtains metadata through Jdbc calls the Spark JDBC related methods by design. And ensure the uniformity of the schema, such as reading historical data from spark jdbc, and Use delta streamer to synchronize data.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366183041
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
 
 Review comment:
   change to `table does not exist!`? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-574151367
 
 
   @vinothchandar Please check again

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on issue #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#issuecomment-583741643
 
 
   @vinothchandar @leesf please review again

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366309238
 
 

 ##########
 File path: hudi-utilities/src/test/resources/delta-streamer-config/source-jdbc.avsc
 ##########
 @@ -0,0 +1,59 @@
+/*
 
 Review comment:
   Because the data structure of source.avs is too complex, the schema of the database table cannot achieve a similar effect.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
leesf commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r376784592
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
+    scala.collection.immutable.Map<String, String> ioptions = toScalaImmutableMap(options);
+    JDBCOptions jdbcOptions = new JDBCOptions(ioptions);
+    Connection conn = JdbcUtils.createConnectionFactory(jdbcOptions).apply();
+    String url = jdbcOptions.url();
+    String table = jdbcOptions.tableOrQuery();
+    JdbcOptionsInWrite jdbcOptionsInWrite = new JdbcOptionsInWrite(ioptions);
+    boolean tableExists = JdbcUtils.tableExists(conn, jdbcOptionsInWrite);
+    if (tableExists) {
+      JdbcDialect dialect = JdbcDialects.get(url);
+      try {
+        PreparedStatement statement = conn.prepareStatement(dialect.getSchemaQuery(table));
+        try {
+          statement.setQueryTimeout(Integer.parseInt(options.get("timeout")));
+          ResultSet rs = statement.executeQuery();
+          try {
+            StructType structType;
+            if (Boolean.parseBoolean(ioptions.get("nullable").get())) {
+              structType = JdbcUtils.getSchema(rs, dialect, true);
+            } else {
+              structType = JdbcUtils.getSchema(rs, dialect, false);
+            }
+            return AvroConversionUtils.convertStructTypeToAvroSchema(structType, table, "hoodie." + table);
+          } finally {
+            rs.close();
+          }
+        } finally {
+          statement.close();
+        }
+      } finally {
+        conn.close();
+      }
+    } else {
+      throw new HoodieException(String.format("%s table not exists!", table));
+    }
+  }
+
+  @SuppressWarnings("unchecked")
+  private static <K, V> scala.collection.immutable.Map<K, V> toScalaImmutableMap(java.util.Map<K, V> javaMap) {
 
 Review comment:
   also is the method(`toScalaImmutableMap`) copied from other projects or implemented on our own?

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
OpenOpened closed pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200
 
 
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services

[GitHub] [incubator-hudi] vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc

Posted by GitBox <gi...@apache.org>.
vinothchandar commented on a change in pull request #1200: [HUDI-514] A schema provider to get metadata through Jdbc
URL: https://github.com/apache/incubator-hudi/pull/1200#discussion_r366182135
 
 

 ##########
 File path: hudi-utilities/src/main/java/org/apache/hudi/utilities/UtilHelpers.java
 ##########
 @@ -236,4 +250,57 @@ public static TypedProperties readConfig(InputStream in) throws IOException {
     defaults.load(in);
     return defaults;
   }
+
+  /***
+   * call spark function get the schema through jdbc.
+   * @param options
+   * @return
+   * @throws Exception
+   */
+  public static Schema getSchema(Map<String, String> options) throws Exception {
 
 Review comment:
   rename to `getJDBCSchema`? 

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
users@infra.apache.org


With regards,
Apache Git Services