You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/24 12:09:39 UTC

[GitHub] [arrow] liyafan82 opened a new pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

liyafan82 opened a new pull request #10983:
URL: https://github.com/apache/arrow/pull/10983


   According to the current design of the JDBC adapter, it is not possible to reuse the vector schema roots. That is, a new vector schema root is created and released for each batch.
   
   This can cause performance problems, because in many scenarios, the client code only reads data in vector schema root. So the vector schema roots can be reused in the following cycle: populate data -> client use data -> populate data -> ...
   
   The current design has another problem. For most times, it has two alternating vector schema roots in memory, causing a large waste of memory, especially for large batches.
   
   We solve both problems by providing a flag in the config, which allows the user to reuse the vector shema roots.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r702553105



##########
File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowTest.java
##########
@@ -230,11 +235,16 @@ public void runLargeNumberOfRows() throws IOException, SQLException {
     int x = 0;
     final int targetRows = 600000;
     ResultSet rs = new FakeResultSet(targetRows);
-    try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, allocator)) {
+    JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, JdbcToArrowUtils.getUtcCalendar(), false)

Review comment:
       Sounds good. Parameter doc added for this line of code, and also added for some other places. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r700732345



##########
File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java
##########
@@ -137,48 +139,46 @@ private VectorSchemaRoot createVectorSchemaRoot() {
 
   // Loads the next schema root or null if no more rows are available.
   private void load(VectorSchemaRoot root) throws SQLException {
-
-    for (int i = 1; i <= consumers.length; i++) {
-      consumers[i - 1].resetValueVector(root.getVector(rsmd.getColumnLabel(i)));
+    for (int i = 0; i < consumers.length; i++) {
+      consumers[i].resetValueVector(root.getVector(i));
     }
 
     consumeData(root);
-
-    if (root.getRowCount() == 0) {
-      root.close();
-      nextBatch = null;
-    } else {
-      nextBatch = root;
-    }
   }
 
   @Override
   public boolean hasNext() {
-    return nextBatch != null;
+    try {
+      return !resultSet.isAfterLast();
+    } catch (SQLException e) {
+      throw new RuntimeException(e);
+    }
   }
 
   /**
-   * Gets the next vector. The user is responsible for freeing its resources.
+   * Gets the next vector.
+   * If {@link JdbcToArrowConfig#isReuseVectorSchemaRoot()} is false,
+   * the client is responsible for freeing its resources.
    */
   @Override
   public VectorSchemaRoot next() {
     Preconditions.checkArgument(hasNext());
-    VectorSchemaRoot returned = nextBatch;
     try {
-      load(createVectorSchemaRoot());
+      VectorSchemaRoot ret = config.isReuseVectorSchemaRoot() ? nextBatch : createVectorSchemaRoot();

Review comment:
       does it make sense to factor this out to a method that takes config? instead of repeating ternary logic in a few places?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r700733201



##########
File path: java/adapter/jdbc/src/test/java/org/apache/arrow/adapter/jdbc/h2/JdbcToArrowTest.java
##########
@@ -230,11 +235,16 @@ public void runLargeNumberOfRows() throws IOException, SQLException {
     int x = 0;
     final int targetRows = 600000;
     ResultSet rs = new FakeResultSet(targetRows);
-    try (ArrowVectorIterator iter = JdbcToArrow.sqlToArrowVectorIterator(rs, allocator)) {
+    JdbcToArrowConfig config = new JdbcToArrowConfigBuilder(allocator, JdbcToArrowUtils.getUtcCalendar(), false)

Review comment:
       please add parameter doc for the new false literaal.




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
emkornfield commented on pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#issuecomment-917711818


   +1 thank you.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#issuecomment-913328891


   > Should there be a test that asserts the VectorSchemaRoot is actually reused when the value set is true?
   
   Good suggestion. I've added `JdbcToArrowVectorIteratorTest#testVectorSchemaRootReuse` for this. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] github-actions[bot] commented on pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#issuecomment-904581684


   https://issues.apache.org/jira/browse/ARROW-13733


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r702552067



##########
File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java
##########
@@ -137,48 +139,46 @@ private VectorSchemaRoot createVectorSchemaRoot() {
 
   // Loads the next schema root or null if no more rows are available.
   private void load(VectorSchemaRoot root) throws SQLException {
-
-    for (int i = 1; i <= consumers.length; i++) {
-      consumers[i - 1].resetValueVector(root.getVector(rsmd.getColumnLabel(i)));
+    for (int i = 0; i < consumers.length; i++) {
+      consumers[i].resetValueVector(root.getVector(i));
     }
 
     consumeData(root);
-
-    if (root.getRowCount() == 0) {
-      root.close();
-      nextBatch = null;
-    } else {
-      nextBatch = root;
-    }
   }
 
   @Override
   public boolean hasNext() {
-    return nextBatch != null;
+    try {
+      return !resultSet.isAfterLast();

Review comment:
       I think so.
   `isAfterLast` is a public API of interface `java.sql.ResultSet` (https://docs.oracle.com/javase/7/docs/api/java/sql/ResultSet.html#isAfterLast()), so it is supposed to be supported by each legitimate implementation. 




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield closed pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
emkornfield closed pull request #10983:
URL: https://github.com/apache/arrow/pull/10983


   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] liyafan82 commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
liyafan82 commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r702552897



##########
File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java
##########
@@ -137,48 +139,46 @@ private VectorSchemaRoot createVectorSchemaRoot() {
 
   // Loads the next schema root or null if no more rows are available.
   private void load(VectorSchemaRoot root) throws SQLException {
-
-    for (int i = 1; i <= consumers.length; i++) {
-      consumers[i - 1].resetValueVector(root.getVector(rsmd.getColumnLabel(i)));
+    for (int i = 0; i < consumers.length; i++) {
+      consumers[i].resetValueVector(root.getVector(i));
     }
 
     consumeData(root);
-
-    if (root.getRowCount() == 0) {
-      root.close();
-      nextBatch = null;
-    } else {
-      nextBatch = root;
-    }
   }
 
   @Override
   public boolean hasNext() {
-    return nextBatch != null;
+    try {
+      return !resultSet.isAfterLast();
+    } catch (SQLException e) {
+      throw new RuntimeException(e);
+    }
   }
 
   /**
-   * Gets the next vector. The user is responsible for freeing its resources.
+   * Gets the next vector.
+   * If {@link JdbcToArrowConfig#isReuseVectorSchemaRoot()} is false,
+   * the client is responsible for freeing its resources.
    */
   @Override
   public VectorSchemaRoot next() {
     Preconditions.checkArgument(hasNext());
-    VectorSchemaRoot returned = nextBatch;
     try {
-      load(createVectorSchemaRoot());
+      VectorSchemaRoot ret = config.isReuseVectorSchemaRoot() ? nextBatch : createVectorSchemaRoot();

Review comment:
       I checked the code, and find the ternary logic is used twice. However, the logic in the two places are the opposite:
   1) In `initialize()`, a new vector schema root is created, if the resue flag is enabled.
   2) In `next()`, a new vector schema root is created, if the reuse flag is diabled.
   
   So there is no common logic here?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] emkornfield commented on a change in pull request #10983: ARROW-13733 [Java]: Allow JDBC adapters to reuse vector schema roots

Posted by GitBox <gi...@apache.org>.
emkornfield commented on a change in pull request #10983:
URL: https://github.com/apache/arrow/pull/10983#discussion_r700732056



##########
File path: java/adapter/jdbc/src/main/java/org/apache/arrow/adapter/jdbc/ArrowVectorIterator.java
##########
@@ -137,48 +139,46 @@ private VectorSchemaRoot createVectorSchemaRoot() {
 
   // Loads the next schema root or null if no more rows are available.
   private void load(VectorSchemaRoot root) throws SQLException {
-
-    for (int i = 1; i <= consumers.length; i++) {
-      consumers[i - 1].resetValueVector(root.getVector(rsmd.getColumnLabel(i)));
+    for (int i = 0; i < consumers.length; i++) {
+      consumers[i].resetValueVector(root.getVector(i));
     }
 
     consumeData(root);
-
-    if (root.getRowCount() == 0) {
-      root.close();
-      nextBatch = null;
-    } else {
-      nextBatch = root;
-    }
   }
 
   @Override
   public boolean hasNext() {
-    return nextBatch != null;
+    try {
+      return !resultSet.isAfterLast();

Review comment:
       is this guaranteed to be implemented by most JDBC providers?  




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org