Posted to commits@arrow.apache.org by li...@apache.org on 2022/02/22 13:02:02 UTC

[arrow-cookbook] branch main updated: [Java] Java cookbook for create arrow jni dataset (#138)

This is an automated email from the ASF dual-hosted git repository.

lidavidm pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/arrow-cookbook.git


The following commit(s) were added to refs/heads/main by this push:
     new 97eb69c  [Java] Java cookbook for create arrow jni dataset (#138)
97eb69c is described below

commit 97eb69c2c2d55f2f7c660ff0bb43d3795867f56b
Author: david dali susanibar arce <da...@gmail.com>
AuthorDate: Tue Feb 22 08:01:56 2022 -0500

    [Java] Java cookbook for create arrow jni dataset (#138)
    
    * Adding java cookbook for creating arrow jni
    
    * JNI library dependencies
    
    * Testing problem with download dependencies
    
    * Debug jni errors
    
    * Solving jni errors for jni *.dylib and *.so library dependencies
    
    * Custom protobuf.rb formula
    
    * Adding parquet files
    
    * Configure CI workflow for JNI and non-JNI
    
    * Adding github cache for protobuf lib
    
    * Adding JNI testing cookbooks
    
    * Arrow jni dataset for version 7.0.0
    
    * Solving error: Failed to collect dependencies
    
    * Update java/source/dataset.rst
    
    Co-authored-by: David Li <li...@gmail.com>
    
    * Solving pr comments
    
    Co-authored-by: David Li <li...@gmail.com>
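
Many of the commits above chase JNI loading failures for the *.dylib/*.so native library that backs arrow-dataset. As a minimal, hypothetical smoke test (not part of this commit), touching the native memory pool is usually enough to force the JNI library to load and surface such errors early:

    import org.apache.arrow.dataset.jni.NativeMemoryPool;

    // Requesting the default native memory pool loads the arrow-dataset JNI
    // library; an UnsatisfiedLinkError or similar failure here usually means
    // the platform-specific .so/.dylib could not be found or extracted.
    NativeMemoryPool pool = NativeMemoryPool.getDefault();
    System.out.println("Arrow Dataset JNI library loaded, default pool: " + pool);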
---
 ...a_cookbook.yml => test_java_linux_cookbook.yml} |   8 +-
 ...ava_cookbook.yml => test_java_osx_cookbook.yml} |  19 +-
 java/ext/javadoctest.py                            |   1 -
 java/source/dataset.rst                            | 277 +++++++++++++++++++++
 java/source/demo/pom.xml                           |  20 +-
 java/source/index.rst                              |   1 +
 java/thirdpartydeps/parquetfiles/data1.parquet     | Bin 0 -> 687 bytes
 java/thirdpartydeps/parquetfiles/data2.parquet     | Bin 0 -> 690 bytes
 java/thirdpartydeps/parquetfiles/data3.parquet     | Bin 0 -> 4569 bytes
 9 files changed, 298 insertions(+), 28 deletions(-)

diff --git a/.github/workflows/test_java_cookbook.yml b/.github/workflows/test_java_linux_cookbook.yml
similarity index 89%
copy from .github/workflows/test_java_cookbook.yml
copy to .github/workflows/test_java_linux_cookbook.yml
index 8f211d5..539afd0 100644
--- a/.github/workflows/test_java_cookbook.yml
+++ b/.github/workflows/test_java_linux_cookbook.yml
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-name: Test Java Cookbook
+name: Test Java Cookbook On Linux
 
 on:
   pull_request:
@@ -23,15 +23,15 @@ on:
        - main
     paths:
      - "java/**"
-     - ".github/workflows/test_java_cookbook.yml"
+     - ".github/workflows/test_java_linux_cookbook.yml"
      
 concurrency:
   group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
   cancel-in-progress: true
 
 jobs:
-  test_py:
-    name: "Test Java Cookbook"
+  test_java_linux:
+    name: "Test Java Cookbook On Linux"
     runs-on: ubuntu-latest
     steps:
       - uses: actions/checkout@v1
diff --git a/.github/workflows/test_java_cookbook.yml b/.github/workflows/test_java_osx_cookbook.yml
similarity index 75%
rename from .github/workflows/test_java_cookbook.yml
rename to .github/workflows/test_java_osx_cookbook.yml
index 8f211d5..a55dd68 100644
--- a/.github/workflows/test_java_cookbook.yml
+++ b/.github/workflows/test_java_osx_cookbook.yml
@@ -15,7 +15,7 @@
 # specific language governing permissions and limitations
 # under the License.
 
-name: Test Java Cookbook
+name: Test Java Cookbook on MacOS
 
 on:
   pull_request:
@@ -23,22 +23,25 @@ on:
        - main
     paths:
      - "java/**"
-     - ".github/workflows/test_java_cookbook.yml"
+     - ".github/workflows/test_java_osx_cookbook.yml"
      
 concurrency:
   group: ${{ github.repository }}-${{ github.ref }}-${{ github.workflow }}
   cancel-in-progress: true
 
 jobs:
-  test_py:
-    name: "Test Java Cookbook"
-    runs-on: ubuntu-latest
+  test_java_osx:
+    name: "Test Java Cookbook on MacOS"
+    runs-on: macos-latest
     steps:
       - uses: actions/checkout@v1
-      - name: Install dependencies
-        run: sudo apt install libcurl4-openssl-dev libssl-dev python3-pip openjdk-11-jdk maven
+      - uses: actions/setup-java@v2
+        with:
+          distribution: 'temurin'
+          java-version: '11'
+      - name: Upgrade pip
+        run: python3 -m pip install --upgrade pip
       - name: Run tests
         run: make javatest
       - name: Build cookbook
         run: make java
-
diff --git a/java/ext/javadoctest.py b/java/ext/javadoctest.py
index 4b39817..1a55dd5 100644
--- a/java/ext/javadoctest.py
+++ b/java/ext/javadoctest.py
@@ -23,7 +23,6 @@ class JavaDocTestBuilder(DocTestBuilder):
     ) -> Any:
         # go to project that contains all your arrow maven dependencies
         path_arrow_project = pathlib.Path(__file__).parent.parent / "source" / "demo"
-
         # create list of all arrow jar dependencies
         subprocess.check_call(
             [
diff --git a/java/source/dataset.rst b/java/source/dataset.rst
new file mode 100644
index 0000000..ecf2bb3
--- /dev/null
+++ b/java/source/dataset.rst
@@ -0,0 +1,277 @@
+.. _arrow-dataset:
+
+=======
+Dataset
+=======
+
+* `Arrow Java Dataset`_: the Java implementation of the Arrow Datasets library. It implements the Dataset Java API via JNI bindings to the C++ library.
+
+.. contents::
+
+Constructing Datasets
+=====================
+
+We can construct a dataset with an auto-inferred schema.
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import java.util.stream.StreamSupport;
+
+    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+        String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+        try (DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri)) {
+            try(Dataset dataset = datasetFactory.finish()){
+                ScanOptions options = new ScanOptions(/*batchSize*/ 100);
+                try(Scanner scanner = dataset.newScan(options)){
+                    System.out.println(StreamSupport.stream(scanner.scan().spliterator(), false).count());
+                }
+            }
+        }
+    }
+
+.. testoutput::
+
+    1
+
+Let's construct our dataset with a predefined schema.
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import java.util.stream.StreamSupport;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)) {
+        try (DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri)) {
+            try(Dataset dataset = datasetFactory.finish(datasetFactory.inspect())){
+                ScanOptions options = new ScanOptions(/*batchSize*/ 100);
+                try(Scanner scanner = dataset.newScan(options)){
+                    System.out.println(StreamSupport.stream(scanner.scan().spliterator(), false).count());
+                }
+            }
+        }
+    }
+
+.. testoutput::
+
+    1
+
+Getting the Schema
+==================
+
+During Dataset Construction
+***************************
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+    try(RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)){
+        try(DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri)){
+            Schema schema = datasetFactory.inspect();
+
+            System.out.println(schema);
+        }
+    }
+
+.. testoutput::
+
+    Schema<id: Int(32, true), name: Utf8>(metadata: {parquet.avro.schema={"type":"record","name":"User","namespace":"org.apache.arrow.dataset","fields":[{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]}]}, writer.model.name=avro})
+
+From a Dataset
+**************
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.types.pojo.Schema;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+    try(RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE)){
+        try(DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri)){
+            ScanOptions options = new ScanOptions(/*batchSize*/ 1);
+            try(Dataset dataset = datasetFactory.finish()){
+                try(Scanner scanner = dataset.newScan(options)){
+                    Schema schema = scanner.schema();
+
+                    System.out.println(schema);
+                }
+            }
+        }
+    }
+
+.. testoutput::
+
+    Schema<id: Int(32, true), name: Utf8>(metadata: {parquet.avro.schema={"type":"record","name":"User","namespace":"org.apache.arrow.dataset","fields":[{"name":"id","type":["int","null"]},{"name":"name","type":["string","null"]}]}, writer.model.name=avro})
+
+Query Parquet File
+==================
+
+Let's query information from a Parquet file.
+
+Query Data Content For File
+***************************
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.VectorLoader;
+    import org.apache.arrow.vector.VectorSchemaRoot;
+
+    import java.util.stream.Stream;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+    try(RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
+        DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
+        Dataset dataset = datasetFactory.finish()){
+        ScanOptions options = new ScanOptions(/*batchSize*/ 100);
+        try(Scanner scanner = dataset.newScan(options);
+            VectorSchemaRoot vsr = VectorSchemaRoot.create(scanner.schema(), rootAllocator)){
+            scanner.scan().forEach(scanTask-> {
+                VectorLoader loader = new VectorLoader(vsr);
+                scanTask.execute().forEachRemaining(arrowRecordBatch -> {
+                    loader.load(arrowRecordBatch);
+                    System.out.print(vsr.contentToTSVString());
+                    arrowRecordBatch.close();
+                });
+            });
+        }
+    }
+
+.. testoutput::
+
+    id    name
+    1    David
+    2    Gladis
+    3    Juan
+
+Query Data Content For Directory
+********************************
+
+Consider that we have these files: data1 with 3 rows, data2 with 3 rows, and data3 with 250 rows. With a batch size of 100, data3 is read back in batches of 100, 100, and 50 rows, as shown below.
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.VectorLoader;
+    import org.apache.arrow.vector.VectorSchemaRoot;
+
+    import java.util.stream.Stream;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/";
+    try(RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
+        DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
+        Dataset dataset = datasetFactory.finish()){
+        ScanOptions options = new ScanOptions(/*batchSize*/ 100);
+        try(Scanner scanner = dataset.newScan(options);
+            VectorSchemaRoot vsr = VectorSchemaRoot.create(scanner.schema(), rootAllocator)){
+            scanner.scan().forEach(scanTask-> {
+                VectorLoader loader = new VectorLoader(vsr);
+                final int[] count = {1};
+                scanTask.execute().forEachRemaining(arrowRecordBatch -> {
+                    loader.load(arrowRecordBatch);
+                    System.out.println("Batch: " + count[0]++ + ", RowCount: " + vsr.getRowCount());
+                    arrowRecordBatch.close();
+                });
+            });
+        }
+    }
+
+.. testoutput::
+
+    Batch: 1, RowCount: 3
+    Batch: 2, RowCount: 3
+    Batch: 3, RowCount: 100
+    Batch: 4, RowCount: 100
+    Batch: 5, RowCount: 50
+
+Query Data Content with Projection
+**********************************
+
+If we only need certain columns, we can configure ScanOptions with the required projection.
+
+.. testcode::
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.VectorLoader;
+    import org.apache.arrow.vector.VectorSchemaRoot;
+
+    import java.util.Optional;
+
+    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/data1.parquet";
+    try(RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
+        DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
+        Dataset dataset = datasetFactory.finish()){
+        String[] projection = new String[] {"name"};
+        ScanOptions options = new ScanOptions(/*batchSize*/ 100, Optional.of(projection));
+        try(Scanner scanner = dataset.newScan(options);
+            VectorSchemaRoot vsr = VectorSchemaRoot.create(scanner.schema(), rootAllocator)){
+            scanner.scan().forEach(scanTask-> {
+                VectorLoader loader = new VectorLoader(vsr);
+                scanTask.execute().forEachRemaining(arrowRecordBatch -> {
+                    loader.load(arrowRecordBatch);
+                    System.out.print(vsr.contentToTSVString());
+                    arrowRecordBatch.close();
+                });
+            });
+        }
+    }
+
+.. testoutput::
+
+    name
+    David
+    Gladis
+    Juan
+
+
+.. _Arrow Java Dataset: https://arrow.apache.org/docs/dev/java/dataset.html
\ No newline at end of file
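
As a usage sketch building on the directory recipe above (not part of this commit; the expected total of 256 rows is an assumption based on the data1/data2/data3 row counts mentioned earlier), the same scan loop can also total the rows across every file in the directory:

    import org.apache.arrow.dataset.file.FileFormat;
    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
    import org.apache.arrow.dataset.jni.NativeMemoryPool;
    import org.apache.arrow.dataset.scanner.ScanOptions;
    import org.apache.arrow.dataset.scanner.Scanner;
    import org.apache.arrow.dataset.source.Dataset;
    import org.apache.arrow.dataset.source.DatasetFactory;
    import org.apache.arrow.memory.RootAllocator;
    import org.apache.arrow.vector.VectorLoader;
    import org.apache.arrow.vector.VectorSchemaRoot;

    import java.util.concurrent.atomic.AtomicLong;

    String uri = "file:" + System.getProperty("user.dir") + "/thirdpartydeps/parquetfiles/";
    try (RootAllocator rootAllocator = new RootAllocator(Long.MAX_VALUE);
         DatasetFactory datasetFactory = new FileSystemDatasetFactory(rootAllocator, NativeMemoryPool.getDefault(), FileFormat.PARQUET, uri);
         Dataset dataset = datasetFactory.finish()) {
        ScanOptions options = new ScanOptions(/*batchSize*/ 100);
        try (Scanner scanner = dataset.newScan(options);
             VectorSchemaRoot vsr = VectorSchemaRoot.create(scanner.schema(), rootAllocator)) {
            AtomicLong totalRows = new AtomicLong();  // accumulator the lambdas below can update
            scanner.scan().forEach(scanTask -> {
                VectorLoader loader = new VectorLoader(vsr);
                scanTask.execute().forEachRemaining(arrowRecordBatch -> {
                    loader.load(arrowRecordBatch);            // load this batch into the schema root
                    totalRows.addAndGet(vsr.getRowCount());   // add this batch's row count
                    arrowRecordBatch.close();
                });
            });
            System.out.println("Total rows: " + totalRows.get());  // 3 + 3 + 250 = 256, assuming the files above
        }
    }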
diff --git a/java/source/demo/pom.xml b/java/source/demo/pom.xml
index 2f4305d..41d10d5 100644
--- a/java/source/demo/pom.xml
+++ b/java/source/demo/pom.xml
@@ -21,7 +21,7 @@
     <properties>
         <maven.compiler.source>8</maven.compiler.source>
         <maven.compiler.target>8</maven.compiler.target>
-        <arrow.version>6.0.0</arrow.version>
+        <arrow.version>7.0.0</arrow.version>
     </properties>
 
     <dependencies>
@@ -42,23 +42,13 @@
         </dependency>
         <dependency>
             <groupId>org.apache.arrow</groupId>
-            <artifactId>flight-core</artifactId>
+            <artifactId>arrow-dataset</artifactId>
             <version>${arrow.version}</version>
-            <exclusions>
-                <exclusion>
-                    <groupId>io.netty</groupId>
-                    <artifactId>netty-transport-native-unix-common</artifactId>
-                </exclusion>
-                <exclusion>
-                    <groupId>io.netty</groupId>
-                    <artifactId>netty-transport-native-kqueue</artifactId>
-                </exclusion>
-            </exclusions>
         </dependency>
         <dependency>
-            <groupId>junit</groupId>
-            <artifactId>junit</artifactId>
-            <version>4.13.2</version>
+            <groupId>com.google.guava</groupId>
+            <artifactId>guava</artifactId>
+            <version>30.1.1-jre</version>
         </dependency>
     </dependencies>
 
diff --git a/java/source/index.rst b/java/source/index.rst
index 17b87c3..38d7bf7 100644
--- a/java/source/index.rst
+++ b/java/source/index.rst
@@ -14,6 +14,7 @@ Welcome to java arrow's documentation!
    io
    schema
    data
+   dataset
 
 Indices and tables
 ==================
diff --git a/java/thirdpartydeps/parquetfiles/data1.parquet b/java/thirdpartydeps/parquetfiles/data1.parquet
new file mode 100644
index 0000000..a2602db
Binary files /dev/null and b/java/thirdpartydeps/parquetfiles/data1.parquet differ
diff --git a/java/thirdpartydeps/parquetfiles/data2.parquet b/java/thirdpartydeps/parquetfiles/data2.parquet
new file mode 100644
index 0000000..0adc5eb
Binary files /dev/null and b/java/thirdpartydeps/parquetfiles/data2.parquet differ
diff --git a/java/thirdpartydeps/parquetfiles/data3.parquet b/java/thirdpartydeps/parquetfiles/data3.parquet
new file mode 100644
index 0000000..958edfd
Binary files /dev/null and b/java/thirdpartydeps/parquetfiles/data3.parquet differ