
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/13 19:42:38 UTC

[GitHub] [arrow-cookbook] davisusanibar opened a new pull request, #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar opened a new pull request, #207:
URL: https://github.com/apache/arrow-cookbook/pull/207

   PR to fix the problems mentioned at: https://github.com/apache/arrow-cookbook/issues/206


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org

[GitHub] [arrow-cookbook] davisusanibar commented on a diff in pull request #207: [Java] Parquet reading example fails with Arrow v8.0

Posted by GitBox <gi...@apache.org>.

davisusanibar commented on code in PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#discussion_r872816011


##########
java/source/dataset.rst:
##########
@@ -275,11 +302,19 @@ In case we need to project only certain columns we could configure ScanOptions w
     ){
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.print(vsr.contentToTSVString());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
+                        final VectorUnloader unloader = new VectorUnloader(root);
+                        try(ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()){
+                            loader.load(arrowRecordBatch);
+                            System.out.print(vsr.contentToTSVString());

Review Comment:
   Updated
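The nested `try(...)` blocks in this hunk rely on a Java guarantee: try-with-resources closes resources in the reverse order they were opened, so each `ArrowRecordBatch` is released before its `VectorSchemaRoot`, and the `ArrowReader` is closed last. The stdlib-only sketch below shows that ordering with a hypothetical `TrackedResource` class standing in for the three Arrow types; it does not call the Arrow API.

```java
import java.util.ArrayList;
import java.util.List;

public class CloseOrderDemo {
    // Hypothetical stand-in for ArrowReader / VectorSchemaRoot / ArrowRecordBatch:
    // it only records when it is closed.
    static final class TrackedResource implements AutoCloseable {
        private final String name;
        private final List<String> log;

        TrackedResource(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override
        public void close() {
            log.add("close " + name);
        }
    }

    // Mirrors the reader -> root -> batch nesting from the diff and returns the event log.
    static List<String> runNestedBlocks() {
        List<String> log = new ArrayList<>();
        try (TrackedResource reader = new TrackedResource("reader", log)) {
            try (TrackedResource root = new TrackedResource("root", log)) {
                try (TrackedResource batch = new TrackedResource("batch", log)) {
                    log.add("use batch");
                }
            }
        }
        return log;
    }

    public static void main(String[] args) {
        // Innermost resource is closed first, outermost last.
        System.out.println(runNestedBlocks());
    }
}
```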




[GitHub] [arrow-cookbook] toddfarmer commented on a diff in pull request #207: [Java] Parquet reading example fails with Arrow v8.0

toddfarmer commented on code in PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#discussion_r872745868


##########
java/source/dataset.rst:
##########
@@ -224,11 +240,19 @@ Consider that we have these files: data1: 3 rows, data2: 3 rows and data3: 250 r
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
             final int[] count = {1};
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.println("Batch: " + count[0]++ + ", RowCount: " + vsr.getRowCount());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){

Review Comment:
   ```suggestion
               try (ArrowReader reader = scanTask.execute()) {
   ```




[GitHub] [arrow-cookbook] lidavidm commented on a diff in pull request #207: [Java] Parquet reading example fails with Arrow v8.0

lidavidm commented on code in PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#discussion_r872753301


##########
java/source/demo/pom.xml:
##########
@@ -21,7 +21,7 @@
     <properties>
         <maven.compiler.source>8</maven.compiler.source>
         <maven.compiler.target>8</maven.compiler.target>
-        <arrow.version>7.0.0</arrow.version>
+        <arrow.version>8.0.0</arrow.version>

Review Comment:
   We need to get this added to the list of post-release tasks and/or make it one of the things we check as part of the release.
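One way to turn this into a release check is a small program that reads `<arrow.version>` from `java/source/demo/pom.xml` and fails when it differs from the version being released. The sketch below is a hypothetical `PomVersionCheck` that uses a regular expression rather than a full XML parser, assuming the property is declared in the simple one-line form shown in the hunk being reviewed; how it would be hooked into the release process is left open, as it is in the thread.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical release-check helper; not part of the cookbook repository.
public class PomVersionCheck {
    // Matches a one-line property such as <arrow.version>8.0.0</arrow.version>.
    private static final Pattern ARROW_VERSION =
            Pattern.compile("<arrow\\.version>\\s*([^<\\s]+)\\s*</arrow\\.version>");

    // Returns the declared Arrow version, or null if the property is absent.
    static String extractArrowVersion(String pomXml) {
        Matcher m = ARROW_VERSION.matcher(pomXml);
        return m.find() ? m.group(1) : null;
    }

    // True only when the pom targets exactly the release being published.
    static boolean matchesRelease(String pomXml, String releaseVersion) {
        return releaseVersion.equals(extractArrowVersion(pomXml));
    }

    // Hypothetical usage: java PomVersionCheck java/source/demo/pom.xml 8.0.0
    public static void main(String[] args) throws IOException {
        String pom = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        if (!matchesRelease(pom, args[1])) {
            System.err.println("arrow.version is " + extractArrowVersion(pom) + ", expected " + args[1]);
            System.exit(1);
        }
        System.out.println("arrow.version matches " + args[1]);
    }
}
```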



##########
java/source/dataset.rst:
##########
@@ -275,11 +302,19 @@ In case we need to project only certain columns we could configure ScanOptions w
     ){
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.print(vsr.contentToTSVString());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
+                        final VectorUnloader unloader = new VectorUnloader(root);
+                        try(ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()){
+                            loader.load(arrowRecordBatch);
+                            System.out.print(vsr.contentToTSVString());

Review Comment:
   nit: why not just `root.contentToTSVString()`? Is there a need to demonstrate VectorLoader/VectorUnloader here?
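The idea behind this nit can be modeled without Arrow, assuming (as with `ArrowReader.getVectorSchemaRoot()`) that the reader hands back one root which every `loadNextBatch()` call refills. Printing that root directly already shows each batch; copying it into a second root, the role `VectorUnloader`/`VectorLoader` play in the diff, produces the same output with an extra step. The `ListReader` class and its list-of-strings "root" are hypothetical stand-ins, not Arrow types.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ReaderRootDemo {
    // Hypothetical stand-in for ArrowReader: a single reusable "root" refilled per batch.
    static final class ListReader implements AutoCloseable {
        private final List<List<String>> batches;
        private final List<String> root = new ArrayList<>();
        private int next = 0;

        ListReader(List<List<String>> batches) {
            this.batches = batches;
        }

        // Loads the next batch into the shared root; returns false when exhausted.
        boolean loadNextBatch() {
            if (next >= batches.size()) {
                return false;
            }
            root.clear();
            root.addAll(batches.get(next++));
            return true;
        }

        // Always the same object; only its contents change between loads.
        List<String> getRoot() {
            return root;
        }

        @Override
        public void close() {
            root.clear();
        }
    }

    // Direct approach suggested in the nit: print the reader's root as-is.
    static String printDirect(List<List<String>> batches) {
        StringBuilder out = new StringBuilder();
        try (ListReader reader = new ListReader(batches)) {
            while (reader.loadNextBatch()) {
                out.append(String.join("\n", reader.getRoot())).append('\n');
            }
        }
        return out.toString();
    }

    // Copying approach, analogous to unload/load into a separate root: same output, extra copy.
    static String printViaCopy(List<List<String>> batches) {
        StringBuilder out = new StringBuilder();
        List<String> target = new ArrayList<>();
        try (ListReader reader = new ListReader(batches)) {
            while (reader.loadNextBatch()) {
                target.clear();
                target.addAll(reader.getRoot());
                out.append(String.join("\n", target)).append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        List<List<String>> batches = Arrays.asList(
                Arrays.asList("1\ta", "2\tb"),
                Arrays.asList("3\tc"));
        System.out.print(printDirect(batches));
    }
}
```

In this model the copy only pays off when the data has to survive the reader's next `loadNextBatch()` call, since the shared root is overwritten on every load.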




[GitHub] [arrow-cookbook] davisusanibar commented on a diff in pull request #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar commented on code in PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#discussion_r872755453


##########
java/source/dataset.rst:
##########
@@ -224,11 +240,19 @@ Consider that we have these files: data1: 3 rows, data2: 3 rows and data3: 250 r
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
             final int[] count = {1};
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.println("Batch: " + count[0]++ + ", RowCount: " + vsr.getRowCount());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){

Review Comment:
   Thank you @toddfarmer, I've changed it in many places.




[GitHub] [arrow-cookbook] davisusanibar commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126448145

   > Maybe we should label the top-level page, or perhaps include it in the page titles, or something?
   > 
   > Also, I wonder if we can just snapshot the 'old' cookbook on each release and label it as the '7.0.0 cookbook' or something.
   
   Ticket created: https://github.com/apache/arrow-cookbook/issues/208



[GitHub] [arrow-cookbook] lidavidm merged pull request #207: [Java] Parquet reading example fails with Arrow v8.0

lidavidm merged PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207



[GitHub] [arrow-cookbook] davisusanibar commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126508686

   > Do you want to just answer it?
   
   Yes, I'll just mention the PR in case they need it to continue with their work.



[GitHub] [arrow-cookbook] toddfarmer commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

toddfarmer commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126409768

   Should we add comments around these code changes to highlight that the example code is specific to version 8.0.0?



[GitHub] [arrow-cookbook] davisusanibar commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126405602

   Hi @lidavidm, could you please help with a review?



[GitHub] [arrow-cookbook] toddfarmer commented on a diff in pull request #207: [Java] Parquet reading example fails with Arrow v8.0

toddfarmer commented on code in PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#discussion_r872745979


##########
java/source/dataset.rst:
##########
@@ -224,11 +240,19 @@ Consider that we have these files: data1: 3 rows, data2: 3 rows and data3: 250 r
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
             final int[] count = {1};
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.println("Batch: " + count[0]++ + ", RowCount: " + vsr.getRowCount());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {

Review Comment:
   ```suggestion
                       try (VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
   ```



##########
java/source/dataset.rst:
##########
@@ -275,11 +302,19 @@ In case we need to project only certain columns we could configure ScanOptions w
     ){
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.print(vsr.contentToTSVString());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {

Review Comment:
   ```suggestion
                       try (VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
   ```



##########
java/source/dataset.rst:
##########
@@ -224,11 +240,19 @@ Consider that we have these files: data1: 3 rows, data2: 3 rows and data3: 250 r
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
             final int[] count = {1};
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.println("Batch: " + count[0]++ + ", RowCount: " + vsr.getRowCount());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
+                        final VectorUnloader unloader = new VectorUnloader(root);
+                        try(ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()){

Review Comment:
   ```suggestion
                           try (ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()) {
   ```



##########
java/source/dataset.rst:
##########
@@ -275,11 +302,19 @@ In case we need to project only certain columns we could configure ScanOptions w
     ){
         scanner.scan().forEach(scanTask-> {
             VectorLoader loader = new VectorLoader(vsr);
-            scanTask.execute().forEachRemaining(arrowRecordBatch -> {
-                loader.load(arrowRecordBatch);
-                System.out.print(vsr.contentToTSVString());
-                arrowRecordBatch.close();
-            });
+            try(ArrowReader reader = scanTask.execute()){
+                while (reader.loadNextBatch()) {
+                    try(VectorSchemaRoot root = reader.getVectorSchemaRoot()) {
+                        final VectorUnloader unloader = new VectorUnloader(root);
+                        try(ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()){

Review Comment:
   ```suggestion
                           try (ArrowRecordBatch arrowRecordBatch = unloader.getRecordBatch()) {
   ```




[GitHub] [arrow-cookbook] davisusanibar commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

davisusanibar commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126494553

   What about the [Stack Overflow question](https://stackoverflow.com/questions/72233354/how-to-read-parquet-files-into-tables-in-java-using-apache-arrow)?



[GitHub] [arrow-cookbook] lidavidm commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

lidavidm commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126489623

   Note that we can't deploy cookbook updates until the R cookbook is fixed (#201).



[GitHub] [arrow-cookbook] lidavidm commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

lidavidm commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126495176

   Do you want to just answer it?



[GitHub] [arrow-cookbook] lidavidm commented on pull request #207: [Java] Parquet reading example fails with Arrow v8.0

lidavidm commented on PR #207:
URL: https://github.com/apache/arrow-cookbook/pull/207#issuecomment-1126414987

   Maybe we should label the top-level page, or perhaps include it in the page titles, or something?
   
   Also, I wonder if we can just snapshot the 'old' cookbook on each release and label it as the '7.0.0 cookbook' or something.

