Posted to github@arrow.apache.org by "danepitkin (via GitHub)" <gi...@apache.org> on 2023/09/13 21:07:09 UTC
[GitHub] [arrow-cookbook] danepitkin commented on a diff in pull request #316: [Java] Document how to convert JDBC Adapter result into a Parquet file
danepitkin commented on code in PR #316:
URL: https://github.com/apache/arrow-cookbook/pull/316#discussion_r1325072073
##########
java/source/io.rst:
##########
@@ -263,6 +263,93 @@ Write - Out to Buffer
Number of rows written: 3
+Write Parquet Files
+*******************
+
+Let's read an Arrow file and write its data to a Parquet file.
+
+.. testcode::
+
+ import java.io.IOException;
+ import java.nio.file.DirectoryStream;
+ import java.nio.file.Files;
+ import java.nio.file.Path;
+ import java.nio.file.Paths;
+
+ import org.apache.arrow.dataset.file.DatasetFileWriter;
+ import org.apache.arrow.dataset.file.FileFormat;
+ import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+ import org.apache.arrow.dataset.jni.NativeMemoryPool;
+ import org.apache.arrow.dataset.scanner.ScanOptions;
+ import org.apache.arrow.dataset.scanner.Scanner;
+ import org.apache.arrow.dataset.source.Dataset;
+ import org.apache.arrow.dataset.source.DatasetFactory;
+ import org.apache.arrow.memory.BufferAllocator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.ipc.ArrowFileReader;
+ import org.apache.arrow.vector.ipc.ArrowReader;
+ import org.apache.arrow.vector.ipc.SeekableReadChannel;
+ import org.apache.arrow.vector.util.ByteArrayReadableSeekableByteChannel;
+
+ // read arrow demo data
+ Path uriRead = Paths.get("./thirdpartydeps/arrowfiles/random_access.arrow");
Review Comment:
Should we add a comment describing what's in this file? Looks like it's three row groups of 3 rows each based on the output.
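This hunk imports `DirectoryStream`, `Files` and `Path`, which suggests the example creates and then cleans up an output directory for the Parquet data. The sketch below shows that bookkeeping using only the JDK. It assumes the Parquet writer receives the directory as a `file:` URI string, presumably passed to `DatasetFileWriter.write` together with an `ArrowReader`. The class and method names are hypothetical. The empty `part-0.parquet` file is a placeholder for whatever the writer actually produces; it is not real Parquet output.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class ParquetOutputDir {
    /** Creates a fresh temporary directory to receive the writer's output. */
    static Path createOutputDir() {
        try {
            return Files.createTempDirectory("parquet_");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Returns the directory's direct entries (assumes a flat layout). */
    private static List<Path> entries(Path dir) {
        List<Path> result = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir)) {
            for (Path p : stream) {
                result.add(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    /** Lists the sorted names of regular files in dir, e.g. to check what the writer produced. */
    static List<String> listFiles(Path dir) {
        List<String> names = new ArrayList<>();
        for (Path p : entries(dir)) {
            if (Files.isRegularFile(p)) {
                names.add(p.getFileName().toString());
            }
        }
        Collections.sort(names);
        return names;
    }

    /** Deletes every entry in dir, then dir itself; not recursive. */
    static void deleteOutputDir(Path dir) {
        try {
            for (Path p : entries(dir)) {
                Files.delete(p);
            }
            Files.delete(dir);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path dir = createOutputDir();
        // The file: URI string a file-based writer would be handed.
        String uri = dir.toUri().toString();
        // Hypothetical empty placeholder; a real run would contain the writer's Parquet file(s).
        Files.write(dir.resolve("part-0.parquet"), new byte[0]);
        System.out.println(uri.startsWith("file:")); // prints true
        System.out.println(listFiles(dir));          // prints [part-0.parquet]
        deleteOutputDir(dir);
        System.out.println(Files.exists(dir));       // prints false
    }
}
```

Deleting the directory explicitly, rather than relying on `deleteOnExit`, keeps repeated runs of the recipe from accumulating temporary output.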
##########
java/source/jdbc.rst:
##########
@@ -307,3 +307,191 @@ values to the given scale.
102 true 100000000030.0000000 some char text [1,2]
INT_FIELD1 BOOL_FIELD2 BIGINT_FIELD5 CHAR_FIELD16 LIST_FIELD19
103 true 10000000003.0000000 some char text [1]
+
+Write ResultSet to Parquet File
+===============================
+
+As an example, let's write the JDBC adapter's results to a Parquet file.
+
+.. testcode::
+
+ import java.io.BufferedReader;
+ import java.io.FileReader;
+ import java.io.IOException;
+ import java.nio.file.DirectoryStream;
+ import java.nio.file.Files;
+ import java.nio.file.Path;
+ import java.sql.Connection;
+ import java.sql.DriverManager;
+ import java.sql.ResultSet;
+ import java.sql.SQLException;
+ import java.sql.Types;
+ import java.util.HashMap;
+
+ import org.apache.arrow.adapter.jdbc.ArrowVectorIterator;
+ import org.apache.arrow.adapter.jdbc.JdbcFieldInfo;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrow;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfig;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowConfigBuilder;
+ import org.apache.arrow.adapter.jdbc.JdbcToArrowUtils;
+ import org.apache.arrow.dataset.file.DatasetFileWriter;
+ import org.apache.arrow.dataset.file.FileFormat;
+ import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+ import org.apache.arrow.dataset.jni.NativeMemoryPool;
+ import org.apache.arrow.dataset.scanner.ScanOptions;
+ import org.apache.arrow.dataset.scanner.Scanner;
+ import org.apache.arrow.dataset.source.Dataset;
+ import org.apache.arrow.dataset.source.DatasetFactory;
+ import org.apache.arrow.memory.BufferAllocator;
+ import org.apache.arrow.memory.RootAllocator;
+ import org.apache.arrow.vector.VectorSchemaRoot;
+ import org.apache.arrow.vector.ipc.ArrowReader;
+ import org.apache.arrow.vector.types.pojo.Schema;
+ import org.apache.ibatis.jdbc.ScriptRunner;
+ import org.slf4j.LoggerFactory;
+
+ import ch.qos.logback.classic.Level;
+ import ch.qos.logback.classic.Logger;
+
+ class JDBCReader extends ArrowReader {
Review Comment:
Awesome!
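The key piece here is `JDBCReader extends ArrowReader`. The JDBC adapter returns batches through an `ArrowVectorIterator` (from `JdbcToArrow.sqlToArrowVectorIterator`). Consumers of an `ArrowReader`, on the other hand, pull batches by calling `loadNextBatch()` until it returns `false`. The JDK-only sketch below shows that iterator-to-reader adaptation. `BatchReader` and `IteratorBatchReader` are hypothetical stand-ins, not Arrow classes, and a `List<String>` plays the role of a `VectorSchemaRoot`. A real `ArrowReader` subclass must also implement `readSchema()`, `bytesRead()` and `closeReadSource()`, and it must decide who closes each batch's `VectorSchemaRoot`. The sketch leaves all of that out.

```java
import java.util.Iterator;
import java.util.List;

/** Hypothetical stand-in for the pull contract of ArrowReader (not the Arrow API). */
abstract class BatchReader<B> implements AutoCloseable {
    /** Loads the next batch; returns false once the source is exhausted. */
    public abstract boolean loadNextBatch();

    /** Returns the batch most recently loaded by loadNextBatch(). */
    public abstract B currentBatch();

    /** Narrows AutoCloseable.close() so callers need not handle a checked exception. */
    @Override
    public void close() {}
}

/** Adapts an Iterator of batches, in the spirit of JDBCReader wrapping ArrowVectorIterator. */
final class IteratorBatchReader<B> extends BatchReader<B> {
    private final Iterator<B> source;
    private B current;

    IteratorBatchReader(Iterator<B> source) {
        this.source = source;
    }

    @Override
    public boolean loadNextBatch() {
        if (!source.hasNext()) {
            current = null; // no batch is loaded once the source is exhausted
            return false;
        }
        current = source.next();
        return true;
    }

    @Override
    public B currentBatch() {
        if (current == null) {
            throw new IllegalStateException("loadNextBatch() has not returned true");
        }
        return current;
    }
}

public class JdbcReaderSketch {
    /** Drains a reader the way a consumer such as a file writer would; returns the total row count. */
    static int countRows(BatchReader<List<String>> reader) {
        int total = 0;
        while (reader.loadNextBatch()) {
            total += reader.currentBatch().size();
        }
        return total;
    }

    public static void main(String[] args) {
        // Hypothetical batches of rows standing in for the JDBC adapter's output.
        List<List<String>> batches = List.of(
                List.of("101", "102", "103"),
                List.of("104", "105", "106"),
                List.of("107"));
        try (BatchReader<List<String>> reader = new IteratorBatchReader<>(batches.iterator())) {
            System.out.println(countRows(reader)); // prints 7
        }
    }
}
```

With this design, the consumer decides when the next batch is fetched from the source. That is what lets a writer stream a large `ResultSet` without materializing it all at once.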