Posted to commits@arrow.apache.org by ko...@apache.org on 2019/03/16 08:58:49 UTC
[arrow] branch master updated: ARROW-4506: [Ruby] Add Arrow::RecordBatch#raw_records
This is an automated email from the ASF dual-hosted git repository.
kou pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/arrow.git
The following commit(s) were added to refs/heads/master by this push:
new aab4946 ARROW-4506: [Ruby] Add Arrow::RecordBatch#raw_records
aab4946 is described below
commit aab4946ad283946be948119c9107471acd64333c
Author: Kenta Murata <mr...@mrkn.jp>
AuthorDate: Sat Mar 16 17:58:27 2019 +0900
ARROW-4506: [Ruby] Add Arrow::RecordBatch#raw_records
I want to add the Arrow::RecordBatch#raw_records method, which converts a record batch object to a nested array.
This is the first step toward implementing the feature. The following are out of scope for this pull request:
- Conversion of half-float values to Ruby's float.
- Unit treatment of Time32 and Time64
- Conversion of the following compound data types to Ruby objects:
  - ListType
  - StructType
  - UnionType
  - DictionaryType
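To make "a nested array" concrete, here is a minimal pure-Ruby sketch of the row-wise shape the method is meant to produce. It is not the C++ implementation from this commit. The helper name `columns_to_raw_records` and the use of plain Ruby arrays in place of Arrow columns are assumptions, chosen so the sketch runs without the red-arrow gem.

```ruby
# Hypothetical stand-in for a record batch: column name => column values.
# Plain Ruby arrays replace Arrow arrays here (an assumption for illustration).
def columns_to_raw_records(columns)
  arrays = columns.values
  return [] if arrays.empty?

  n_rows = arrays.first.size
  unless arrays.all? { |array| array.size == n_rows }
    raise ArgumentError, "all columns must have the same length"
  end

  # Row i collects the i-th value of every column, in column order.
  Array.new(n_rows) { |i| arrays.map { |array| array[i] } }
end

columns = {
  "id"     => [1, 2, 3],
  "name"   => ["a", nil, "c"],
  "active" => [true, false, nil],
}
p columns_to_raw_records(columns)
# prints [[1, "a", true], [2, nil, false], [3, "c", nil]]
```

Each inner array is one record, and a NULL in a column shows up as `nil` in the corresponding row.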
### TODO:
- [x] Extracting raw values of HalfFloatArray
- [x] Extracting ListArray
- [x] Extracting StructArray
- [x] Extracting SparseUnionArray
- [x] Extracting DenseUnionArray
- [x] FixedSizeBinary
- [x] Date32
- [x] Date64
- [x] Timestamp
- [x] Decimal128
- [x] Struct
- [x] Dictionary
- [x] Extracting indices of DictionaryArray
- [x] Make CI passed
- [x] Add benchmark script
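The "Extracting indices of DictionaryArray" item, together with the pure-Ruby baseline in the dictionary benchmark (which reads `columns[j].indices[i]`), suggests that raw records for a dictionary column carry the indices rather than the decoded dictionary values. The sketch below illustrates that distinction with a hypothetical plain-Ruby `DictionaryColumn` struct. It is an assumption-laden model, not red-arrow's `Arrow::DictionaryArray` API.

```ruby
# Hypothetical plain-Ruby model of a dictionary-encoded column:
# `dictionary` holds the distinct values, `indices` points into it.
DictionaryColumn = Struct.new(:dictionary, :indices) do
  # What a "raw" extraction would yield under the assumption above:
  # the indices themselves, untouched.
  def raw_values
    indices
  end

  # Decoding is a separate step: look each index up in the dictionary.
  def decoded_values
    indices.map { |index| index.nil? ? nil : dictionary[index] }
  end
end

genres = ["Fantasy", "Mystery", "Science fiction"]
column = DictionaryColumn.new(genres, [2, 0, nil, 1, 0])

p column.raw_values
# prints [2, 0, nil, 1, 0]
p column.decoded_values
# prints ["Science fiction", "Fantasy", nil, "Mystery", "Fantasy"]
```

Returning indices keeps the extraction cheap and leaves it to the caller to decide whether and when to decode.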
Author: Kenta Murata <mr...@mrkn.jp>
Author: Kouhei Sutou <ko...@clear-code.com>
Closes #3587 from mrkn/raw_records and squashes the following commits:
00197e4b <Kouhei Sutou> Split test files
0d0d5170 <Kouhei Sutou> Replace a large StructArray test with small tests
61f9774b <Kouhei Sutou> Use {"field_name" => value} for union value
5fa9ba03 <Kouhei Sutou> Finish replacing tests for StructArray
83bf55a3 <Kouhei Sutou> Add support for nested StructArray
c065a8a2 <Kouhei Sutou> Add tests for StructArray#raw_records
795c96a8 <Kouhei Sutou> Add support for "_" for data type name
87ab55da <Kouhei Sutou> Add support for nil value as NULL for struct field value
08b85c81 <Kouhei Sutou> Add support for nested list
4d78da49 <Kouhei Sutou> Remove resolved TODO
024506f1 <Kouhei Sutou> Remove needless tests
e30a9e73 <Kouhei Sutou> Add support for nil in ListArrayBuilder#append_values
55066950 <Kouhei Sutou> Add primitive array tests
67808a2b <Kouhei Sutou> Add support for building BinaryArray
44779252 <Kouhei Sutou> Add tests for primitive arrays
cb3cb471 <Kouhei Sutou> Add support for NullArray
728b82c9 <Kouhei Sutou> Reduce scope
021dbee2 <Kouhei Sutou> Use .cpp for C++
4d01f85b <Kouhei Sutou> Use constexpr
7f5d6cc3 <Kouhei Sutou> Fix style
db1e1c25 <Kouhei Sutou> Remove needless reference
028e3c4a <Kouhei Sutou> Use auto
ce67b5bb <Kouhei Sutou> Stop reusing block argument name
750afff5 <Kouhei Sutou> Use .cpp for C++ extension
a2032754 <Kouhei Sutou> Fold a long line
46d63aeb <Kouhei Sutou> Use auto
03233bd5 <Kouhei Sutou> Use Red Arrow in build directly
3a776306 <Kouhei Sutou> Fix package name for MSYS2
022dd073 <Kenta Murata> Fix the benchmark for dictionary array
3e18e85f <Kenta Murata> Rename a benchmark file
2c6142c8 <Kenta Murata> Rename a directory
40595617 <Kenta Murata> Fix benchmark task
5d380fdc <Kenta Murata> Use values between 2**16 and 2**32-1 for testing UInt32Array
d0b8d0fc <Kenta Murata> Fix styling
f41550d0 <Kenta Murata> Stop using precomputed scales of time unit
bdf7090e <Kenta Murata> Remove needless scope blocks
8e81e9b2 <Kouhei Sutou> Implement converters based on visitor
31ca243a <Kenta Murata> Add extension files in the gem package
ec11062b <Kenta Murata> Add arrow ext dir in $LOAD_PATH
d0a4733b <Kenta Murata> Fix benchmark against the removal of convert_decimal: option
24d14800 <Kenta Murata> Guard RVAL2GOBJ by rb::protect
73063288 <Kenta Murata> Drop convert_decimal: option
ba96767a <Kenta Murata> Remove a needless member function
dfa6b4ae <Kenta Murata> Introduce require_extension_library method to load arrow.so
ee8dcccf <Kenta Murata> Avoid using rb_str_new_cstr
7e51f7a8 <Kenta Murata> Fix variable names in test
8d1c36e4 <Kenta Murata> Use #pragma once
bba7eb4e <Kenta Murata> Rename a variable
b47fbfeb <Kenta Murata> Use GOBJ2RVAL_UNREF correctly
4a73d5eb <Kenta Murata> Use auto instead of VALUE
d59d72be <Kenta Murata> Put some codes out side of rb::protect block
9995dec9 <Kenta Murata> Use rb_enc_str_new with rb_ascii8bit_encoding for binary string creation
e0b26563 <Kenta Murata> Replace assert with DCHECK
70ccc327 <Kenta Murata> Make cArrowRecordBatch a local variable
16cbe8d9 <Kenta Murata> Use rb::RawMethod
d6b7d3da <Kenta Murata> Remove needless require
ed507393 <Kenta Murata> Fix styling
da2c49f6 <Kenta Murata> Rename rb_cDate to cDate
4645c17c <Kenta Murata> Rename cRecordBatch to cArrowRecordBatch
2c410cf6 <Kenta Murata> Remove namespace comments
55df564d <Kenta Murata> Rename files
0a4016b3 <Kenta Murata> Replace license headers
180c7c3e <Kenta Murata> Use static timestamp_range in benchmark
1413aa6b <Kouhei Sutou> Set PKG_CONFIG_PATH to build Red Arrow
509af8ff <Kenta Murata> Add benchmark task
e3fa4e62 <Kenta Murata> Fix word usage
31c8a270 <Kenta Murata> Use double quotations
9568a371 <Kenta Murata> Remove redundant sub test cases
7fb64a9b <Kenta Murata> Remove parentheses with empty argument
5bdd9952 <Kenta Murata> Remove needless require
56634502 <Kenta Murata> Remove arrow_ruby_compile function
794c7a27 <Kenta Murata> Revert needless changes
be026ed1 <Kouhei Sutou> Fix style
c0b7442d <Kouhei Sutou> Add "compile" task
99442745 <Kouhei Sutou> Add support "rake clean" and "rake clobber"
56bbc15b <Kouhei Sutou> Run extconf.rb automatically in test/run-test.rb
4d9440d0 <Kouhei Sutou> Use Ext++
d16b6e1d <Kouhei Sutou> Sort alphabetically
6217e862 <Kouhei Sutou> Add support for auto package install
08818907 <Kouhei Sutou> Remove rake-compiler dependency
38d6f530 <Kenta Murata> Make the default value of convert_decimal true
f3cd6724 <Kenta Murata> Move gem entries from Gemfile into gemspec
2b97d3e3 <Kenta Murata> Fix benchmarks
0a9f9ece <Kenta Murata> Fix the random state of Faker in benchmark
26c7ea17 <Kenta Murata> Add benchmark scripts
7eaa0690 <Kenta Murata> Separate raw_records test
de943457 <Kenta Murata> Fix missing const modifiers
40c4966c <Kenta Murata> Support Struct in Union
cba224a3 <Kenta Murata> Support Dictionary in Union
95517e3a <Kenta Murata> Add tests for dense union in dense union
86f8bc11 <Kenta Murata> Fix travis script
82ce1bcb <Kenta Murata> Add license comment
9df5765a <Kenta Murata> Refactoring test
5016c6a9 <Kenta Murata> Support Dictionary in UnionArray
462480d6 <Kenta Murata> Support Date32, Date64, and Timestamp in UnionArray
0d358a3c <Kenta Murata> Refactoring
5a2276bf <Kenta Murata> Use non-default field name for a list in a record batch
6b9de32d <Kenta Murata> Add support of FixedSizeBinary in Union
786e4388 <Kenta Murata> Refactoring of Decimal128 converter
fcd3ab95 <Kenta Murata> Support SparseUnion
ce10bcbd <Kenta Murata> Support Decimal128 in DenseUnion
5b14d106 <Kenta Murata> Add partial support of DenseUnion
432f05ae <Kenta Murata> Fix encoding bug
94fe1c48 <Kenta Murata> Add tentative support of HalfFloat
72a64f5e <Kenta Murata> Refactoring
61f0bc50 <Kenta Murata> Support Dictionary indices
63876209 <Kenta Murata> Support Struct
223db821 <Kenta Murata> Use RETURN_NOT_OK
cead59bb <Kenta Murata> Save errinfo if rb::error created from state
0186066c <Kenta Murata> Extract ArrayConverter class
294c7496 <Kenta Murata> Supply PKG_CONFIG_PATH to rake compile
53fa1ffd <Kenta Murata> Fix CI script
17a69c2a <Kenta Murata> Support List
88119d5a <Kenta Murata> Remove pure-Ruby version
aa52a174 <Kenta Murata> Tweak comment and error message
6c8a2e2c <Kenta Murata> Add tentative supports of Time32 and Time64
d33d77d2 <Kenta Murata> Support Timestamp
72935bc9 <Kenta Murata> Support Date32 and Date64
93518946 <Kenta Murata> Use rb_jump_tag to raise deferred exception
476f544b <Kenta Murata> Add convert_decimal kwarg
54e59e78 <Kenta Murata> Fix VisitValue for nil
5f7a78ca <Kenta Murata> Use RawRecordsBuilder
d8d54d67 <Kenta Murata> Add RawRecordsBuilder
4d6d6392 <Kenta Murata> Update test case
8d68d5bc <Kenta Murata> Add a partial native implementation of RecordBatch#raw_records
25a1925c <Kenta Murata> Add test and tentative implementation of RecordBatch#raw_records
---
ci/travis_script_ruby.sh | 11 +-
ruby/red-arrow-cuda/test/run-test.rb | 2 +
ruby/red-arrow/.gitignore | 2 +
ruby/red-arrow/Rakefile | 53 +-
ruby/red-arrow/benchmark/raw-records/boolean.yml | 65 ++
.../red-arrow/benchmark/raw-records/decimal128.yml | 66 ++
.../red-arrow/benchmark/raw-records/dictionary.yml | 73 ++
ruby/red-arrow/benchmark/raw-records/int64.yml | 65 ++
ruby/red-arrow/benchmark/raw-records/list.yml | 68 ++
ruby/red-arrow/benchmark/raw-records/string.yml | 65 ++
ruby/red-arrow/benchmark/raw-records/timestamp.yml | 72 ++
ruby/red-arrow/dependency-check/Rakefile | 43 --
ruby/red-arrow/ext/arrow/arrow.cpp | 43 ++
ruby/red-arrow/ext/arrow/extconf.rb | 46 ++
ruby/red-arrow/ext/arrow/record-batch.cpp | 756 +++++++++++++++++++++
ruby/red-arrow/ext/arrow/red-arrow.hpp | 52 ++
.../arrow/binary-array-builder.rb} | 39 +-
ruby/red-arrow/lib/arrow/data-type.rb | 12 +-
ruby/red-arrow/lib/arrow/list-array-builder.rb | 2 +-
ruby/red-arrow/lib/arrow/loader.rb | 6 +
ruby/red-arrow/lib/arrow/struct-array-builder.rb | 6 +-
ruby/red-arrow/red-arrow.gemspec | 8 +-
.../raw-records/record-batch/test-basic-arrays.rb | 349 ++++++++++
.../record-batch/test-dense-union-array.rb | 487 +++++++++++++
.../raw-records/record-batch/test-list-array.rb | 499 ++++++++++++++
.../record-batch/test-multiple-columns.rb | 49 ++
.../record-batch/test-sparse-union-array.rb | 475 +++++++++++++
.../raw-records/record-batch/test-struct-array.rb | 427 ++++++++++++
ruby/red-arrow/test/run-test.rb | 21 +
ruby/red-arrow/test/test-data-type.rb | 5 +
ruby/red-gandiva/test/run-test.rb | 2 +
ruby/red-parquet/test/run-test.rb | 2 +
ruby/red-plasma/test/run-test.rb | 2 +
33 files changed, 3793 insertions(+), 80 deletions(-)
diff --git a/ci/travis_script_ruby.sh b/ci/travis_script_ruby.sh
index 7d69bee..0ae85b4 100755
--- a/ci/travis_script_ruby.sh
+++ b/ci/travis_script_ruby.sh
@@ -23,11 +23,16 @@ source $TRAVIS_BUILD_DIR/ci/travis_env_common.sh
arrow_ruby_run_test()
{
- local arrow_c_glib_lib_dir=$1
+ local arrow_c_glib_lib_dir="$1"
- export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$arrow_c_glib_lib_dir
- export GI_TYPELIB_PATH=$arrow_c_glib_lib_dir/girepository-1.0
+ local ld_library_path_keep="$LD_LIBRARY_PATH"
+ local pkg_config_path_keep="$PKG_CONFIG_PATH"
+ LD_LIBRARY_PATH="${arrow_c_glib_lib_dir}:${LD_LIBRARY_PATH}"
+ PKG_CONFIG_PATH="${arrow_c_glib_lib_dir}/pkgconfig:${PKG_CONFIG_PATH}"
+ export GI_TYPELIB_PATH="${arrow_c_glib_lib_dir}/girepository-1.0"
test/run-test.rb
+ LD_LIBRARY_PATH="$ld_library_path_keep"
+ PKG_CONFIG_PATH="$pkg_config_path_keep"
}
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$ARROW_CPP_INSTALL/lib
diff --git a/ruby/red-arrow-cuda/test/run-test.rb b/ruby/red-arrow-cuda/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-arrow-cuda/test/run-test.rb
+++ b/ruby/red-arrow-cuda/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
test_dir = base_dir + "test"
arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
$LOAD_PATH.unshift(arrow_lib_dir.to_s)
$LOAD_PATH.unshift(lib_dir.to_s)
diff --git a/ruby/red-arrow/.gitignore b/ruby/red-arrow/.gitignore
index 68e4b5c..e41483f 100644
--- a/ruby/red-arrow/.gitignore
+++ b/ruby/red-arrow/.gitignore
@@ -17,4 +17,6 @@
/.yardoc/
/doc/reference/
+/ext/arrow/Makefile
+/ext/arrow/mkmf.log
/pkg/
diff --git a/ruby/red-arrow/Rakefile b/ruby/red-arrow/Rakefile
index a3ece36..af7ed9b 100644
--- a/ruby/red-arrow/Rakefile
+++ b/ruby/red-arrow/Rakefile
@@ -17,27 +17,72 @@
# specific language governing permissions and limitations
# under the License.
-require "rubygems"
require "bundler/gem_helper"
+require "rake/clean"
require "yard"
base_dir = File.join(__dir__)
helper = Bundler::GemHelper.new(base_dir)
helper.install
+spec = helper.gemspec
release_task = Rake::Task["release"]
release_task.prerequisites.replace(["build", "release:rubygem_push"])
+def run_extconf(extension_dir, *arguments)
+ cd(extension_dir) do
+ ruby("extconf.rb", *arguments)
+ end
+end
+
+spec.extensions.each do |extension|
+ extension_dir = File.dirname(extension)
+ CLOBBER << File.join(extension_dir, "Makefile")
+ CLOBBER << File.join(extension_dir, "mkmf.log")
+
+ makefile = File.join(extension_dir, "Makefile")
+ file makefile do
+ run_extconf(extension_dir)
+ end
+
+ desc "Configure"
+ task :configure do
+ run_extconf(extension_dir)
+ end
+
+ desc "Compile"
+ task :compile => makefile do
+ cd(extension_dir) do
+ sh("make")
+ end
+ end
+
+ task :clean do
+ cd(extension_dir) do
+ sh("make", "clean") if File.exist?("Makefile")
+ end
+ end
+end
+
desc "Run tests"
task :test do
- cd("dependency-check") do
- ruby("-S", "rake")
- end
ruby("test/run-test.rb")
end
task default: :test
+desc "Run benchmarks"
+task :benchmark do
+ benchmarks = if ENV["BENCHMARKS"]
+ ENV["BENCHMARKS"].split
+ else
+ FileList["benchmark/{,*/**/}*.yml"]
+ end
+ benchmarks.each do |benchmark|
+ sh("benchmark-driver", benchmark)
+ end
+end
+
YARD::Rake::YardocTask.new do |task|
end
diff --git a/ruby/red-arrow/benchmark/raw-records/boolean.yml b/ruby/red-arrow/benchmark/raw-records/boolean.yml
new file mode 100644
index 0000000..5e2551e
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/boolean.yml
@@ -0,0 +1,65 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = :boolean
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map { Faker::Boolean.boolean }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j][i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/benchmark/raw-records/decimal128.yml b/ruby/red-arrow/benchmark/raw-records/decimal128.yml
new file mode 100644
index 0000000..9b2fb2e
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/decimal128.yml
@@ -0,0 +1,66 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = Arrow::Decimal128DataType.new(10, 5)
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map { Faker::Number.decimal(10, 5) }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ x = columns[j][i]
+ record << BigDecimal(x.to_s)
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records()
diff --git a/ruby/red-arrow/benchmark/raw-records/dictionary.yml b/ruby/red-arrow/benchmark/raw-records/dictionary.yml
new file mode 100644
index 0000000..3b60abd
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/dictionary.yml
@@ -0,0 +1,73 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ dictionary = Arrow::StringArray.new(
+ 100.times.map { Faker::Book.genre }.uniq.sort
+ )
+ type = Arrow::DictionaryDataType.new(:int8, dictionary, true)
+
+ fields = n_columns.times.map {|i| ["column_#{i}".to_sym, type] }.to_h
+ schema = Arrow::Schema.new(**fields)
+ arrays = n_columns.times.map do
+ Arrow::DictionaryArray.new(
+ type,
+ Arrow::Int8Array.new(
+ n_rows.times.map {
+ Faker::Number.within(0 ... dictionary.length)
+ }
+ )
+ )
+ end
+ record_batch = Arrow::RecordBatch.new(schema, n_rows, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j].indices[i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/benchmark/raw-records/int64.yml b/ruby/red-arrow/benchmark/raw-records/int64.yml
new file mode 100644
index 0000000..65d7b11
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/int64.yml
@@ -0,0 +1,65 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = :int64
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map { Faker::Number.number(18).to_i }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j][i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/benchmark/raw-records/list.yml b/ruby/red-arrow/benchmark/raw-records/list.yml
new file mode 100644
index 0000000..f29b26f
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/list.yml
@@ -0,0 +1,68 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = Arrow::ListDataType.new(name: "values", type: :double)
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map {
+ len = Faker::Number.within(1 ... 100)
+ len.times.map { Faker::Number.normal(0, 1e+6) }
+ }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j][i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/benchmark/raw-records/string.yml b/ruby/red-arrow/benchmark/raw-records/string.yml
new file mode 100644
index 0000000..2854a37
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/string.yml
@@ -0,0 +1,65 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = :string
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map { Faker::Name.name }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j][i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/benchmark/raw-records/timestamp.yml b/ruby/red-arrow/benchmark/raw-records/timestamp.yml
new file mode 100644
index 0000000..b57570f
--- /dev/null
+++ b/ruby/red-arrow/benchmark/raw-records/timestamp.yml
@@ -0,0 +1,72 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+contexts:
+ - name: master
+ prelude: |
+ $LOAD_PATH.unshift(File.expand_path("ext/arrow"))
+ $LOAD_PATH.unshift(File.expand_path("lib"))
+prelude: |-
+ require "arrow"
+ require "faker"
+
+ state = ENV.fetch("FAKER_RANDOM_SEED", 17).to_i
+ Faker::Config.random = Random.new(state)
+
+ n_rows = 1000
+ n_columns = 10
+ type = Arrow::TimestampDataType.new(:micro)
+ base_timestamp = Time.at(Faker::Number.within(0 ... 1_000_000_000))
+ thirty_days_in_sec = 30*24*3600
+ timestamp_range = [base_timestamp - thirty_days_in_sec, base_timestamp + thirty_days_in_sec]
+
+ fields = {}
+ arrays = {}
+ n_columns.times do |i|
+ column_name = "column_#{i}"
+ fields[column_name] = type
+ arrays[column_name] = n_rows.times.map {
+ sec = Faker::Time.between(*timestamp_range).to_i
+ micro = Faker::Number.within(0 ... 1_000_000)
+ sec * 1_000_000 + micro
+ }
+ end
+ record_batch = Arrow::RecordBatch.new(fields, arrays)
+
+ def pure_ruby_raw_records(record_batch)
+ n_rows = record_batch.n_rows
+ n_columns = record_batch.n_columns
+ columns = record_batch.columns
+ records = []
+ i = 0
+ while i < n_rows
+ record = []
+ j = 0
+ while j < n_columns
+ record << columns[j][i]
+ j += 1
+ end
+ records << record
+ i += 1
+ end
+ records
+ end
+benchmark:
+ pure_ruby: |-
+ pure_ruby_raw_records(record_batch)
+ raw_records: |-
+ record_batch.raw_records
diff --git a/ruby/red-arrow/dependency-check/Rakefile b/ruby/red-arrow/dependency-check/Rakefile
deleted file mode 100644
index e80e732..0000000
--- a/ruby/red-arrow/dependency-check/Rakefile
+++ /dev/null
@@ -1,43 +0,0 @@
-# -*- ruby -*-
-#
-# Licensed to the Apache Software Foundation (ASF) under one
-# or more contributor license agreements. See the NOTICE file
-# distributed with this work for additional information
-# regarding copyright ownership. The ASF licenses this file
-# to you under the Apache License, Version 2.0 (the
-# "License"); you may not use this file except in compliance
-# with the License. You may obtain a copy of the License at
-#
-# http://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing,
-# software distributed under the License is distributed on an
-# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
-# KIND, either express or implied. See the License for the
-# specific language governing permissions and limitations
-# under the License.
-
-require "pkg-config"
-require "native-package-installer"
-
-case RUBY_PLATFORM
-when /mingw|mswin/
- task :default => "nothing"
-else
- task :default => "dependency:check"
-end
-
-task :nothing do
-end
-
-namespace :dependency do
- desc "Check dependency"
- task :check do
- unless PKGConfig.check_version?("arrow-glib", 0, 9, 0)
- unless NativePackageInstaller.install(:debian => "libarrow-glib-dev",
- :redhat => "arrow-glib-devel")
- exit(false)
- end
- end
- end
-end
diff --git a/ruby/red-arrow/ext/arrow/arrow.cpp b/ruby/red-arrow/ext/arrow/arrow.cpp
new file mode 100644
index 0000000..48b98fb
--- /dev/null
+++ b/ruby/red-arrow/ext/arrow/arrow.cpp
@@ -0,0 +1,43 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "red-arrow.hpp"
+
+#include <ruby.hpp>
+
+namespace red_arrow {
+ VALUE cDate;
+ ID id_BigDecimal;
+ ID id_jd;
+ ID id_to_datetime;
+}
+
+extern "C" void Init_arrow() {
+ auto mArrow = rb_const_get_at(rb_cObject, rb_intern("Arrow"));
+ auto cArrowRecordBatch = rb_const_get_at(mArrow, rb_intern("RecordBatch"));
+ rb_define_method(cArrowRecordBatch, "raw_records",
+ reinterpret_cast<rb::RawMethod>(red_arrow::record_batch_raw_records),
+ 0);
+
+ red_arrow::cDate = rb_const_get(rb_cObject, rb_intern("Date"));
+
+ red_arrow::id_BigDecimal = rb_intern("BigDecimal");
+ red_arrow::id_jd = rb_intern("jd");
+ red_arrow::id_to_datetime = rb_intern("to_datetime");
+}
diff --git a/ruby/red-arrow/ext/arrow/extconf.rb b/ruby/red-arrow/ext/arrow/extconf.rb
new file mode 100644
index 0000000..a8b9a0b
--- /dev/null
+++ b/ruby/red-arrow/ext/arrow/extconf.rb
@@ -0,0 +1,46 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+require "extpp"
+require "mkmf-gnome2"
+
+unless required_pkg_config_package("arrow",
+                                   debian: "libarrow-dev",
+                                   redhat: "arrow-devel",
+                                   homebrew: "apache-arrow",
+                                   msys2: "arrow")
+  exit(false)
+end
+
+unless required_pkg_config_package("arrow-glib",
+                                   debian: "libarrow-glib-dev",
+                                   redhat: "arrow-glib-devel",
+                                   homebrew: "apache-arrow-glib",
+                                   msys2: "arrow")
+  exit(false)
+end
+
+[
+  ["glib2", "ext/glib2"],
+].each do |name, relative_source_dir|
+  spec = find_gem_spec(name)
+  source_dir = File.join(spec.full_gem_path, relative_source_dir)
+  build_dir = source_dir
+  add_depend_package_path(name, source_dir, build_dir)
+end
+
+create_makefile("arrow")
diff --git a/ruby/red-arrow/ext/arrow/record-batch.cpp b/ruby/red-arrow/ext/arrow/record-batch.cpp
new file mode 100644
index 0000000..506c8e1
--- /dev/null
+++ b/ruby/red-arrow/ext/arrow/record-batch.cpp
@@ -0,0 +1,756 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#include "red-arrow.hpp"
+
+#include <ruby.hpp>
+#include <ruby/encoding.h>
+
+#include <arrow-glib/error.hpp>
+
+#include <arrow/util/logging.h>
+
+namespace red_arrow {
+ namespace {
+ using Status = arrow::Status;
+
+ void check_status(const Status&& status, const char* context) {
+ GError* error = nullptr;
+ if (!garrow_error_check(&error, status, context)) {
+ RG_RAISE_ERROR(error);
+ }
+ }
+
+ class ListArrayValueConverter;
+ class StructArrayValueConverter;
+ class UnionArrayValueConverter;
+ class DictionaryArrayValueConverter;
+
+ class ArrayValueConverter {
+ public:
+ ArrayValueConverter()
+ : decimal_buffer_(),
+ list_array_value_converter_(nullptr),
+ struct_array_value_converter_(nullptr),
+ union_array_value_converter_(nullptr),
+ dictionary_array_value_converter_(nullptr) {
+ }
+
+ void set_sub_value_converters(ListArrayValueConverter* list_array_value_converter,
+ StructArrayValueConverter* struct_array_value_converter,
+ UnionArrayValueConverter* union_array_value_converter,
+ DictionaryArrayValueConverter* dictionary_array_value_converter) {
+ list_array_value_converter_ = list_array_value_converter;
+ struct_array_value_converter_ = struct_array_value_converter;
+ union_array_value_converter_ = union_array_value_converter;
+ dictionary_array_value_converter_ = dictionary_array_value_converter;
+ }
+
+ inline VALUE convert(const arrow::NullArray& array,
+ const int64_t i) {
+ return Qnil;
+ }
+
+ inline VALUE convert(const arrow::BooleanArray& array,
+ const int64_t i) {
+ return array.Value(i) ? Qtrue : Qfalse;
+ }
+
+ inline VALUE convert(const arrow::Int8Array& array,
+ const int64_t i) {
+ return INT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::Int16Array& array,
+ const int64_t i) {
+ return INT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::Int32Array& array,
+ const int64_t i) {
+ return INT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::Int64Array& array,
+ const int64_t i) {
+ return LL2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::UInt8Array& array,
+ const int64_t i) {
+ return UINT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::UInt16Array& array,
+ const int64_t i) {
+ return UINT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::UInt32Array& array,
+ const int64_t i) {
+ return UINT2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::UInt64Array& array,
+ const int64_t i) {
+ return ULL2NUM(array.Value(i));
+ }
+
+ // TODO
+ // inline VALUE convert(const arrow::HalfFloatArray& array,
+ // const int64_t i) {
+ // }
+
+ inline VALUE convert(const arrow::FloatArray& array,
+ const int64_t i) {
+ return DBL2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::DoubleArray& array,
+ const int64_t i) {
+ return DBL2NUM(array.Value(i));
+ }
+
+ inline VALUE convert(const arrow::BinaryArray& array,
+ const int64_t i) {
+ int32_t length;
+ const auto value = array.GetValue(i, &length);
+ // TODO: encoding support
+ return rb_enc_str_new(reinterpret_cast<const char*>(value),
+ length,
+ rb_ascii8bit_encoding());
+ }
+
+ inline VALUE convert(const arrow::StringArray& array,
+ const int64_t i) {
+ int32_t length;
+ const auto value = array.GetValue(i, &length);
+ return rb_utf8_str_new(reinterpret_cast<const char*>(value),
+ length);
+ }
+
+ inline VALUE convert(const arrow::FixedSizeBinaryArray& array,
+ const int64_t i) {
+ return rb_enc_str_new(reinterpret_cast<const char*>(array.Value(i)),
+ array.byte_width(),
+ rb_ascii8bit_encoding());
+ }
+
+ constexpr static int32_t JULIAN_DATE_UNIX_EPOCH = 2440588;
+ inline VALUE convert(const arrow::Date32Array& array,
+ const int64_t i) {
+ const auto value = array.Value(i);
+ const auto days_in_julian = value + JULIAN_DATE_UNIX_EPOCH;
+ return rb_funcall(cDate, id_jd, 1, LONG2NUM(days_in_julian));
+ }
+
+ inline VALUE convert(const arrow::Date64Array& array,
+ const int64_t i) {
+ const auto value = array.Value(i);
+ auto msec = LL2NUM(value);
+ auto sec = rb_rational_new(msec, INT2NUM(1000));
+ auto time_value = rb_time_num_new(sec, Qnil);
+ return rb_funcall(time_value, id_to_datetime, 0, 0);
+ }
+
+ inline VALUE convert(const arrow::Time32Array& array,
+ const int64_t i) {
+ // TODO: unit treatment
+ const auto value = array.Value(i);
+ return INT2NUM(value);
+ }
+
+ inline VALUE convert(const arrow::Time64Array& array,
+ const int64_t i) {
+ // TODO: unit treatment
+ const auto value = array.Value(i);
+ return LL2NUM(value);
+ }
+
+ inline VALUE convert(const arrow::TimestampArray& array,
+ const int64_t i) {
+ const auto type =
+ arrow::internal::checked_cast<const arrow::TimestampType*>(array.type().get());
+ auto scale = time_unit_to_scale(type->unit());
+ if (NIL_P(scale)) {
+ rb_raise(rb_eArgError, "Invalid TimeUnit");
+ }
+ auto value = array.Value(i);
+ auto sec = rb_rational_new(LL2NUM(value), scale);
+ return rb_time_num_new(sec, Qnil);
+ }
+
+ // TODO
+ // inline VALUE convert(const arrow::IntervalArray& array,
+ // const int64_t i) {
+ // };
+
+ VALUE convert(const arrow::ListArray& array,
+ const int64_t i);
+
+ VALUE convert(const arrow::StructArray& array,
+ const int64_t i);
+
+ VALUE convert(const arrow::UnionArray& array,
+ const int64_t i);
+
+ VALUE convert(const arrow::DictionaryArray& array,
+ const int64_t i);
+
+ inline VALUE convert(const arrow::Decimal128Array& array,
+ const int64_t i) {
+ decimal_buffer_ = array.FormatValue(i);
+ return rb_funcall(rb_cObject,
+ id_BigDecimal,
+ 1,
+ rb_enc_str_new(decimal_buffer_.data(),
+ decimal_buffer_.length(),
+ rb_ascii8bit_encoding()));
+ }
+
+ private:
+ std::string decimal_buffer_;
+ ListArrayValueConverter* list_array_value_converter_;
+ StructArrayValueConverter* struct_array_value_converter_;
+ UnionArrayValueConverter* union_array_value_converter_;
+ DictionaryArrayValueConverter* dictionary_array_value_converter_;
+ };
+
+ class ListArrayValueConverter : public arrow::ArrayVisitor {
+ public:
+ explicit ListArrayValueConverter(ArrayValueConverter* converter)
+ : array_value_converter_(converter),
+ offset_(0),
+ length_(0),
+ result_(Qnil) {}
+
+ VALUE convert(const arrow::ListArray& array, const int64_t index) {
+ auto values = array.values().get();
+ auto offset_keep = offset_;
+ auto length_keep = length_;
+ offset_ = array.value_offset(index);
+ length_ = array.value_length(index);
+ auto result_keep = result_;
+ result_ = rb_ary_new_capa(length_);
+ check_status(values->Accept(this),
+ "[raw-records][list-array]");
+ offset_ = offset_keep;
+ length_ = length_keep;
+ auto result_return = result_;
+ result_ = result_keep;
+ return result_return;
+ }
+
+#define VISIT(TYPE) \
+ Status Visit(const arrow::TYPE ## Array& array) override { \
+ return visit_value(array); \
+ }
+
+ VISIT(Null)
+ VISIT(Boolean)
+ VISIT(Int8)
+ VISIT(Int16)
+ VISIT(Int32)
+ VISIT(Int64)
+ VISIT(UInt8)
+ VISIT(UInt16)
+ VISIT(UInt32)
+ VISIT(UInt64)
+ // TODO
+ // VISIT(HalfFloat)
+ VISIT(Float)
+ VISIT(Double)
+ VISIT(Binary)
+ VISIT(String)
+ VISIT(FixedSizeBinary)
+ VISIT(Date32)
+ VISIT(Date64)
+ VISIT(Time32)
+ VISIT(Time64)
+ VISIT(Timestamp)
+ // TODO
+ // VISIT(Interval)
+ VISIT(List)
+ VISIT(Struct)
+ VISIT(Union)
+ VISIT(Dictionary)
+ VISIT(Decimal128)
+ // TODO
+ // VISIT(Extension)
+
+#undef VISIT
+
+ private:
+ template <typename ArrayType>
+ inline VALUE convert_value(const ArrayType& array,
+ const int64_t i) {
+ return array_value_converter_->convert(array, i);
+ }
+
+ template <typename ArrayType>
+ Status visit_value(const ArrayType& array) {
+ if (array.null_count() > 0) {
+ for (int64_t i = 0; i < length_; ++i) {
+ auto value = Qnil;
+ if (!array.IsNull(i + offset_)) {
+ value = convert_value(array, i + offset_);
+ }
+ rb_ary_push(result_, value);
+ }
+ } else {
+ for (int64_t i = 0; i < length_; ++i) {
+ rb_ary_push(result_, convert_value(array, i + offset_));
+ }
+ }
+ return Status::OK();
+ }
+
+ ArrayValueConverter* array_value_converter_;
+ int32_t offset_;
+ int32_t length_;
+ VALUE result_;
+ };
+
+ class StructArrayValueConverter : public arrow::ArrayVisitor {
+ public:
+ explicit StructArrayValueConverter(ArrayValueConverter* converter)
+ : array_value_converter_(converter),
+ key_(Qnil),
+ index_(0),
+ result_(Qnil) {}
+
+ VALUE convert(const arrow::StructArray& array,
+ const int64_t index) {
+ auto index_keep = index_;
+ auto result_keep = result_;
+ index_ = index;
+ result_ = rb_hash_new();
+ const auto struct_type = array.struct_type();
+ const auto n = struct_type->num_children();
+ for (int i = 0; i < n; ++i) {
+ const auto field_type = struct_type->child(i).get();
+ const auto& field_name = field_type->name();
+ auto key_keep = key_;
+ key_ = rb_utf8_str_new(field_name.data(), field_name.length());
+ const auto field_array = array.field(i).get();
+ check_status(field_array->Accept(this),
+ "[raw-records][struct-array]");
+ key_ = key_keep;
+ }
+ auto result_return = result_;
+ result_ = result_keep;
+ index_ = index_keep;
+ return result_return;
+ }
+
+#define VISIT(TYPE) \
+ Status Visit(const arrow::TYPE ## Array& array) override { \
+ fill_field(array); \
+ return Status::OK(); \
+ }
+
+ VISIT(Null)
+ VISIT(Boolean)
+ VISIT(Int8)
+ VISIT(Int16)
+ VISIT(Int32)
+ VISIT(Int64)
+ VISIT(UInt8)
+ VISIT(UInt16)
+ VISIT(UInt32)
+ VISIT(UInt64)
+ // TODO
+ // VISIT(HalfFloat)
+ VISIT(Float)
+ VISIT(Double)
+ VISIT(Binary)
+ VISIT(String)
+ VISIT(FixedSizeBinary)
+ VISIT(Date32)
+ VISIT(Date64)
+ VISIT(Time32)
+ VISIT(Time64)
+ VISIT(Timestamp)
+ // TODO
+ // VISIT(Interval)
+ VISIT(List)
+ VISIT(Struct)
+ VISIT(Union)
+ VISIT(Dictionary)
+ VISIT(Decimal128)
+ // TODO
+ // VISIT(Extension)
+
+#undef VISIT
+
+ private:
+ template <typename ArrayType>
+ inline VALUE convert_value(const ArrayType& array,
+ const int64_t i) {
+ return array_value_converter_->convert(array, i);
+ }
+
+ template <typename ArrayType>
+ void fill_field(const ArrayType& array) {
+ if (array.IsNull(index_)) {
+ rb_hash_aset(result_, key_, Qnil);
+ } else {
+ rb_hash_aset(result_, key_, convert_value(array, index_));
+ }
+ }
+
+ ArrayValueConverter* array_value_converter_;
+ VALUE key_;
+ int64_t index_;
+ VALUE result_;
+ };
+
+ class UnionArrayValueConverter : public arrow::ArrayVisitor {
+ public:
+ explicit UnionArrayValueConverter(ArrayValueConverter* converter)
+ : array_value_converter_(converter),
+ index_(0),
+ result_(Qnil) {}
+
+ VALUE convert(const arrow::UnionArray& array,
+ const int64_t index) {
+ const auto index_keep = index_;
+ const auto result_keep = result_;
+ index_ = index;
+ switch (array.mode()) {
+ case arrow::UnionMode::SPARSE:
+ convert_sparse(array);
+ break;
+ case arrow::UnionMode::DENSE:
+ convert_dense(array);
+ break;
+ default:
+ rb_raise(rb_eArgError, "Invalid union mode");
+ break;
+ }
+ auto result_return = result_;
+ index_ = index_keep;
+ result_ = result_keep;
+ return result_return;
+ }
+
+#define VISIT(TYPE) \
+ Status Visit(const arrow::TYPE ## Array& array) override { \
+ convert_value(array); \
+ return Status::OK(); \
+ }
+
+ VISIT(Null)
+ VISIT(Boolean)
+ VISIT(Int8)
+ VISIT(Int16)
+ VISIT(Int32)
+ VISIT(Int64)
+ VISIT(UInt8)
+ VISIT(UInt16)
+ VISIT(UInt32)
+ VISIT(UInt64)
+ // TODO
+ // VISIT(HalfFloat)
+ VISIT(Float)
+ VISIT(Double)
+ VISIT(Binary)
+ VISIT(String)
+ VISIT(FixedSizeBinary)
+ VISIT(Date32)
+ VISIT(Date64)
+ VISIT(Time32)
+ VISIT(Time64)
+ VISIT(Timestamp)
+ // TODO
+ // VISIT(Interval)
+ VISIT(List)
+ VISIT(Struct)
+ VISIT(Union)
+ VISIT(Dictionary)
+ VISIT(Decimal128)
+ // TODO
+ // VISIT(Extension)
+
+#undef VISIT
+ private:
+ template <typename ArrayType>
+ inline void convert_value(const ArrayType& array) {
+ auto result = rb_hash_new();
+ if (array.IsNull(index_)) {
+ rb_hash_aset(result, field_name_, Qnil);
+ } else {
+ rb_hash_aset(result,
+ field_name_,
+ array_value_converter_->convert(array, index_));
+ }
+ result_ = result;
+ }
+
+ uint8_t compute_child_index(const arrow::UnionArray& array,
+ arrow::UnionType* type,
+ const char* tag) {
+ const auto type_id = array.raw_type_ids()[index_];
+ const auto& type_codes = type->type_codes();
+ for (uint8_t i = 0; i < type_codes.size(); ++i) {
+ if (type_codes[i] == type_id) {
+ return i;
+ }
+ }
+ check_status(Status::Invalid("Unknown type ID: ", type_id),
+ tag);
+ return 0;
+ }
+
+ void convert_sparse(const arrow::UnionArray& array) {
+ const auto type =
+ std::static_pointer_cast<arrow::UnionType>(array.type()).get();
+ const auto tag = "[raw-records][union-sparse-array]";
+ const auto child_index = compute_child_index(array, type, tag);
+ const auto child_field = type->child(child_index).get();
+ const auto& field_name = child_field->name();
+ const auto field_name_keep = field_name_;
+ field_name_ = rb_utf8_str_new(field_name.data(), field_name.length());
+ const auto child_array = array.child(child_index).get();
+ check_status(child_array->Accept(this), tag);
+ field_name_ = field_name_keep;
+ }
+
+ void convert_dense(const arrow::UnionArray& array) {
+ const auto type =
+ std::static_pointer_cast<arrow::UnionType>(array.type()).get();
+ const auto tag = "[raw-records][union-dense-array]";
+ const auto child_index = compute_child_index(array, type, tag);
+ const auto child_field = type->child(child_index).get();
+ const auto& field_name = child_field->name();
+ const auto field_name_keep = field_name_;
+ field_name_ = rb_utf8_str_new(field_name.data(), field_name.length());
+ const auto child_array = array.child(child_index);
+ const auto index_keep = index_;
+ index_ = array.value_offset(index_);
+ check_status(child_array->Accept(this), tag);
+ index_ = index_keep;
+ field_name_ = field_name_keep;
+ }
+
+ ArrayValueConverter* array_value_converter_;
+ int64_t index_;
+ VALUE field_name_;
+ VALUE result_;
+ };
+
+ class DictionaryArrayValueConverter : public arrow::ArrayVisitor {
+ public:
+ explicit DictionaryArrayValueConverter(ArrayValueConverter* converter)
+ : array_value_converter_(converter),
+ index_(0),
+ result_(Qnil) {
+ }
+
+ VALUE convert(const arrow::DictionaryArray& array,
+ const int64_t index) {
+ index_ = index;
+ auto indices = array.indices().get();
+ check_status(indices->Accept(this),
+ "[raw-records][dictionary-array]");
+ return result_;
+ }
+
+ // TODO: Convert to real value.
+#define VISIT(TYPE) \
+ Status Visit(const arrow::TYPE ## Array& array) override { \
+ result_ = convert_value(array, index_); \
+ return Status::OK(); \
+ }
+
+ VISIT(Int8)
+ VISIT(Int16)
+ VISIT(Int32)
+ VISIT(Int64)
+
+#undef VISIT
+
+ private:
+ template <typename ArrayType>
+ inline VALUE convert_value(const ArrayType& array,
+ const int64_t i) {
+ return array_value_converter_->convert(array, i);
+ }
+
+ ArrayValueConverter* array_value_converter_;
+ int64_t index_;
+ VALUE result_;
+ };
+
+ VALUE ArrayValueConverter::convert(const arrow::ListArray& array,
+ const int64_t i) {
+ return list_array_value_converter_->convert(array, i);
+ }
+
+ VALUE ArrayValueConverter::convert(const arrow::StructArray& array,
+ const int64_t i) {
+ return struct_array_value_converter_->convert(array, i);
+ }
+
+ VALUE ArrayValueConverter::convert(const arrow::UnionArray& array,
+ const int64_t i) {
+ return union_array_value_converter_->convert(array, i);
+ }
+
+ VALUE ArrayValueConverter::convert(const arrow::DictionaryArray& array,
+ const int64_t i) {
+ return dictionary_array_value_converter_->convert(array, i);
+ }
+
+ class RawRecordsBuilder : public arrow::ArrayVisitor {
+ public:
+ explicit RawRecordsBuilder(VALUE records, int n_columns)
+ : array_value_converter_(),
+ list_array_value_converter_(&array_value_converter_),
+ struct_array_value_converter_(&array_value_converter_),
+ union_array_value_converter_(&array_value_converter_),
+ dictionary_array_value_converter_(&array_value_converter_),
+ records_(records),
+ n_columns_(n_columns) {
+ array_value_converter_.
+ set_sub_value_converters(&list_array_value_converter_,
+ &struct_array_value_converter_,
+ &union_array_value_converter_,
+ &dictionary_array_value_converter_);
+ }
+
+ void build(const arrow::RecordBatch& record_batch) {
+ rb::protect([&] {
+ const auto n_rows = record_batch.num_rows();
+ for (int64_t i = 0; i < n_rows; ++i) {
+ auto record = rb_ary_new_capa(n_columns_);
+ rb_ary_push(records_, record);
+ }
+ for (int i = 0; i < n_columns_; ++i) {
+ const auto array = record_batch.column(i).get();
+ column_index_ = i;
+ check_status(array->Accept(this),
+ "[raw-records]");
+ }
+ return Qnil;
+ });
+ }
+
+#define VISIT(TYPE) \
+ Status Visit(const arrow::TYPE ## Array& array) override { \
+ convert(array); \
+ return Status::OK(); \
+ }
+
+ VISIT(Null)
+ VISIT(Boolean)
+ VISIT(Int8)
+ VISIT(Int16)
+ VISIT(Int32)
+ VISIT(Int64)
+ VISIT(UInt8)
+ VISIT(UInt16)
+ VISIT(UInt32)
+ VISIT(UInt64)
+ // TODO
+ // VISIT(HalfFloat)
+ VISIT(Float)
+ VISIT(Double)
+ VISIT(Binary)
+ VISIT(String)
+ VISIT(FixedSizeBinary)
+ VISIT(Date32)
+ VISIT(Date64)
+ VISIT(Time32)
+ VISIT(Time64)
+ VISIT(Timestamp)
+ // TODO
+ // VISIT(Interval)
+ VISIT(List)
+ VISIT(Struct)
+ VISIT(Union)
+ VISIT(Dictionary)
+ VISIT(Decimal128)
+ // TODO
+ // VISIT(Extension)
+
+#undef VISIT
+
+ private:
+ template <typename ArrayType>
+ inline VALUE convert_value(const ArrayType& array,
+ const int64_t i) {
+ return array_value_converter_.convert(array, i);
+ }
+
+ template <typename ArrayType>
+ void convert(const ArrayType& array) {
+ const auto n = array.length();
+ if (array.null_count() > 0) {
+ for (int64_t i = 0; i < n; ++i) {
+ auto value = Qnil;
+ if (!array.IsNull(i)) {
+ value = convert_value(array, i);
+ }
+ auto record = rb_ary_entry(records_, i);
+ rb_ary_store(record, column_index_, value);
+ }
+ } else {
+ for (int64_t i = 0; i < n; ++i) {
+ auto record = rb_ary_entry(records_, i);
+ rb_ary_store(record, column_index_, convert_value(array, i));
+ }
+ }
+ }
+
+ ArrayValueConverter array_value_converter_;
+ ListArrayValueConverter list_array_value_converter_;
+ StructArrayValueConverter struct_array_value_converter_;
+ UnionArrayValueConverter union_array_value_converter_;
+ DictionaryArrayValueConverter dictionary_array_value_converter_;
+
+ // Destination for converted records.
+ VALUE records_;
+
+ // The current column index.
+ int column_index_;
+
+ // The number of columns.
+ const int n_columns_;
+ };
+ }
+
+ VALUE
+ record_batch_raw_records(VALUE rb_record_batch) {
+ auto garrow_record_batch = GARROW_RECORD_BATCH(RVAL2GOBJ(rb_record_batch));
+ auto record_batch = garrow_record_batch_get_raw(garrow_record_batch).get();
+ const auto n_rows = record_batch->num_rows();
+ const auto n_columns = record_batch->num_columns();
+ auto records = rb_ary_new_capa(n_rows);
+
+ try {
+ RawRecordsBuilder builder(records, n_columns);
+ builder.build(*record_batch);
+ } catch (rb::State& state) {
+ state.jump();
+ }
+
+ return records;
+ }
+}
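
RawRecordsBuilder in the file above visits the record batch one column at a time and stores each converted value into a row array allocated up front, so the result is row-major even though Arrow data is columnar. A minimal pure-Ruby sketch of just that pivot follows. It assumes each column is a plain Ruby Array in which nil stands for an Arrow null; raw_records_from_columns is a hypothetical name used for illustration and is not part of Red Arrow.

```ruby
# Hypothetical model of the column-to-row pivot done by RawRecordsBuilder.
# columns: Array of Arrays, one per column (nil models an Arrow null).
# n_rows:  number of rows in the batch.
def raw_records_from_columns(columns, n_rows)
  # Allocate one record per row first, as build() does.
  records = Array.new(n_rows) { [] }
  # Then fill the records column by column, as each Visit() call does.
  columns.each_with_index do |column, column_index|
    column.each_with_index do |value, i|
      records[i][column_index] = value
    end
  end
  records
end

columns = [
  [1, nil, 3],
  ["a", "b", nil],
]
p raw_records_from_columns(columns, 3)
```

Filling column by column keeps the type dispatch (one visitor call per column) outside the per-row loop, which is presumably why the extension is organized this way.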
diff --git a/ruby/red-arrow/ext/arrow/red-arrow.hpp b/ruby/red-arrow/ext/arrow/red-arrow.hpp
new file mode 100644
index 0000000..5c9b846
--- /dev/null
+++ b/ruby/red-arrow/ext/arrow/red-arrow.hpp
@@ -0,0 +1,52 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements. See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership. The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied. See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+#pragma once
+
+#include <arrow/api.h>
+
+#include <arrow-glib/arrow-glib.hpp>
+#include <rbgobject.h>
+
+namespace red_arrow {
+  extern VALUE cDate;
+
+  extern ID id_BigDecimal;
+  extern ID id_jd;
+  extern ID id_to_datetime;
+
+  VALUE record_batch_raw_records(VALUE obj);
+
+  inline VALUE time_unit_to_scale(arrow::TimeUnit::type unit) {
+    switch (unit) {
+    case arrow::TimeUnit::SECOND:
+      return INT2FIX(1);
+    case arrow::TimeUnit::MILLI:
+      return INT2FIX(1000);
+    case arrow::TimeUnit::MICRO:
+      return INT2FIX(1000 * 1000);
+    case arrow::TimeUnit::NANO:
+      // NOTE: INT2FIX works for 1e+9 because: FIXNUM_MAX >= (1<<30) - 1 > 1e+9
+      return INT2FIX(1000 * 1000 * 1000);
+    default:
+      break; // NOT REACHED
+    }
+    return Qnil;
+  }
+}
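
The temporal conversions in the extension reduce to simple arithmetic: Date32 is a day count offset by the Julian Day Number of the Unix epoch, Date64 is a millisecond count, and Timestamp is an integer divided by the scale that time_unit_to_scale returns. The sketch below restates that arithmetic in plain Ruby. The helper names and the TIME_UNIT_SCALES table are hypothetical and exist only for this illustration; the extension itself does the same work through the Ruby C API.

```ruby
require "date"

# Julian Day Number of 1970-01-01, the same value as JULIAN_DATE_UNIX_EPOCH.
JULIAN_DATE_UNIX_EPOCH = 2440588

# Hypothetical table mirroring time_unit_to_scale (ticks per second).
TIME_UNIT_SCALES = {
  second: 1,
  milli: 1_000,
  micro: 1_000_000,
  nano: 1_000_000_000,
}.freeze

# Date32: days since the Unix epoch -> Date.
def date32_to_date(days)
  Date.jd(days + JULIAN_DATE_UNIX_EPOCH)
end

# Date64: milliseconds since the Unix epoch -> DateTime.
def date64_to_datetime(msec)
  Time.at(Rational(msec, 1_000)).to_datetime
end

# Timestamp: integer ticks in the given unit -> Time, kept exact via Rational.
def timestamp_to_time(value, unit)
  scale = TIME_UNIT_SCALES.fetch(unit) do
    raise ArgumentError, "Invalid TimeUnit"
  end
  Time.at(Rational(value, scale))
end

puts date32_to_date(0)
puts timestamp_to_time(1_500, :milli).utc
```

Using Rational instead of Float avoids rounding when the unit is micro or nano, which matters because the tests compare the converted values for exact equality.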
diff --git a/ruby/red-arrow/test/run-test.rb b/ruby/red-arrow/lib/arrow/binary-array-builder.rb
old mode 100755
new mode 100644
similarity index 66%
copy from ruby/red-arrow/test/run-test.rb
copy to ruby/red-arrow/lib/arrow/binary-array-builder.rb
index 9551f60..c780374
--- a/ruby/red-arrow/test/run-test.rb
+++ b/ruby/red-arrow/lib/arrow/binary-array-builder.rb
@@ -1,5 +1,3 @@
-#!/usr/bin/env ruby
-#
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
@@ -17,21 +15,22 @@
# specific language governing permissions and limitations
# under the License.
-ENV["TZ"] = "Asia/Tokyo"
-
-$VERBOSE = true
-
-require "pathname"
-
-base_dir = Pathname.new(__dir__).parent.expand_path
-
-lib_dir = base_dir + "lib"
-test_dir = base_dir + "test"
-
-$LOAD_PATH.unshift(lib_dir.to_s)
-
-require_relative "helper"
-
-ENV["TEST_UNIT_MAX_DIFF_TARGET_STRING_SIZE"] ||= "10000"
-
-exit(Test::Unit::AutoRunner.run(true, test_dir.to_s))
+module Arrow
+  class BinaryArrayBuilder
+    def append_values(values, is_valids=nil)
+      if is_valids
+        is_valids.each_with_index do |is_valid, i|
+          if is_valid
+            append_value(values[i])
+          else
+            append_null
+          end
+        end
+      else
+        values.each do |value|
+          append_value(value)
+        end
+      end
+    end
+  end
+end
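
The append_values method added above takes an optional array of validity flags: where a flag is false, a null is appended and the corresponding entry of values is ignored. The sketch below shows that behavior without Arrow installed. RecordingBuilder is a hypothetical stand-in that only records what would be appended; its append_values body is copied from the diff.

```ruby
# Hypothetical stand-in for Arrow::BinaryArrayBuilder that records appends.
class RecordingBuilder
  attr_reader :appended

  def initialize
    @appended = []
  end

  def append_value(value)
    @appended << value
  end

  def append_null
    @appended << nil
  end

  # Same logic as the append_values method in the diff above.
  def append_values(values, is_valids=nil)
    if is_valids
      is_valids.each_with_index do |is_valid, i|
        if is_valid
          append_value(values[i])
        else
          append_null
        end
      end
    else
      values.each do |value|
        append_value(value)
      end
    end
  end
end

builder = RecordingBuilder.new
builder.append_values(["a", "b", "c"], [true, false, true])
p builder.appended
```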
diff --git a/ruby/red-arrow/lib/arrow/data-type.rb b/ruby/red-arrow/lib/arrow/data-type.rb
index 03960e4..5b1c873 100644
--- a/ruby/red-arrow/lib/arrow/data-type.rb
+++ b/ruby/red-arrow/lib/arrow/data-type.rb
@@ -114,14 +114,18 @@ module Arrow
private
def resolve_class(data_type)
- data_type_name = data_type.to_s.capitalize.gsub(/\AUint/, "UInt")
+ components = data_type.to_s.split("_").collect(&:capitalize)
+ data_type_name = components.join.gsub(/\AUint/, "UInt")
data_type_class_name = "#{data_type_name}DataType"
unless Arrow.const_defined?(data_type_class_name)
available_types = []
Arrow.constants.each do |name|
- if name.to_s.end_with?("DataType")
- available_types << name.to_s.gsub(/DataType\z/, "").downcase.to_sym
- end
+ name = name.to_s
+ next if name == "DataType"
+ next unless name.end_with?("DataType")
+ name = name.gsub(/DataType\z/, "")
+ components = name.scan(/(UInt[0-9]+|[A-Z][a-z\d]+)/).flatten
+ available_types << components.collect(&:downcase).join("_").to_sym
end
message =
"unknown type: #{data_type.inspect}: " +
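
The change to resolve_class above makes multi-word type names such as :fixed_size_binary resolve to FixedSizeBinaryDataType, and makes the list of available types in the error message use the same underscore form. The two string conversions can be tried in isolation as below. The method names data_type_class_name and data_type_symbol are hypothetical wrappers introduced for this sketch; the expressions inside them are taken from the diff.

```ruby
# Hypothetical wrapper: symbol -> class name, as in the updated resolve_class.
def data_type_class_name(data_type)
  components = data_type.to_s.split("_").collect(&:capitalize)
  data_type_name = components.join.gsub(/\AUint/, "UInt")
  "#{data_type_name}DataType"
end

# Hypothetical wrapper: class name -> symbol, as used for the error message.
def data_type_symbol(class_name)
  name = class_name.to_s.gsub(/DataType\z/, "")
  components = name.scan(/(UInt[0-9]+|[A-Z][a-z\d]+)/).flatten
  components.collect(&:downcase).join("_").to_sym
end

p data_type_class_name(:fixed_size_binary)
p data_type_symbol("FixedSizeBinaryDataType")
```

The UInt alternative in the regular expression comes first so that UInt8 is treated as a single component rather than being cut at the capital I.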
diff --git a/ruby/red-arrow/lib/arrow/list-array-builder.rb b/ruby/red-arrow/lib/arrow/list-array-builder.rb
index 1fa507f..d889c8a 100644
--- a/ruby/red-arrow/lib/arrow/list-array-builder.rb
+++ b/ruby/red-arrow/lib/arrow/list-array-builder.rb
@@ -56,7 +56,7 @@ module Arrow
when ::Array
append_value_raw
@value_builder ||= value_builder
- @value_builder.append_values(value, nil)
+ @value_builder.append(*value)
else
message = "list value must be nil or Array: #{value.inspect}"
raise ArgumentError, message
diff --git a/ruby/red-arrow/lib/arrow/loader.rb b/ruby/red-arrow/lib/arrow/loader.rb
index 6e0bf29..280229b 100644
--- a/ruby/red-arrow/lib/arrow/loader.rb
+++ b/ruby/red-arrow/lib/arrow/loader.rb
@@ -28,11 +28,13 @@ module Arrow
private
def post_load(repository, namespace)
require_libraries
+ require_extension_library
end
def require_libraries
require "arrow/array"
require "arrow/array-builder"
+ require "arrow/binary-array-builder"
require "arrow/chunked-array"
require "arrow/column"
require "arrow/compression-type"
@@ -79,6 +81,10 @@ module Arrow
require "arrow/writable"
end
+ def require_extension_library
+ require "arrow.so"
+ end
+
def load_object_info(info)
super
diff --git a/ruby/red-arrow/lib/arrow/struct-array-builder.rb b/ruby/red-arrow/lib/arrow/struct-array-builder.rb
index b56056c..0ed37ec 100644
--- a/ruby/red-arrow/lib/arrow/struct-array-builder.rb
+++ b/ruby/red-arrow/lib/arrow/struct-array-builder.rb
@@ -71,17 +71,17 @@ module Arrow
when ::Array
append_value_raw
value.each_with_index do |sub_value, i|
- self[i].append_value(sub_value)
+ self[i].append(sub_value)
end
when Arrow::Struct
append_value_raw
value.values.each_with_index do |sub_value, i|
- self[i].append_value(sub_value)
+ self[i].append(sub_value)
end
when Hash
append_value_raw
value.each do |name, sub_value|
- self[name].append_value(sub_value)
+ self[name].append(sub_value)
end
else
message =
diff --git a/ruby/red-arrow/red-arrow.gemspec b/ruby/red-arrow/red-arrow.gemspec
index 9451c9c..7c6320e 100644
--- a/ruby/red-arrow/red-arrow.gemspec
+++ b/ruby/red-arrow/red-arrow.gemspec
@@ -39,17 +39,21 @@ Gem::Specification.new do |spec|
spec.license = "Apache-2.0"
spec.files = ["README.md", "Rakefile", "Gemfile", "#{spec.name}.gemspec"]
spec.files += ["LICENSE.txt", "NOTICE.txt"]
+ spec.files += Dir.glob("ext/**/*.{cpp,hpp,rb}")
spec.files += Dir.glob("lib/**/*.rb")
spec.files += Dir.glob("image/*.*")
spec.files += Dir.glob("doc/text/*")
spec.test_files += Dir.glob("test/**/*")
- spec.extensions = ["dependency-check/Rakefile"]
+ spec.extensions = ["ext/arrow/extconf.rb"]
+ spec.add_runtime_dependency("extpp")
spec.add_runtime_dependency("gobject-introspection", ">= 3.3.5")
- spec.add_runtime_dependency("pkg-config")
spec.add_runtime_dependency("native-package-installer")
+ spec.add_runtime_dependency("pkg-config")
+ spec.add_development_dependency("benchmark-driver")
spec.add_development_dependency("bundler")
+ spec.add_development_dependency("faker")
spec.add_development_dependency("rake")
spec.add_development_dependency("redcarpet")
spec.add_development_dependency("test-unit")
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb b/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb
new file mode 100644
index 0000000..eee2699
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-basic-arrays.rb
@@ -0,0 +1,349 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchBasicArraysTest < Test::Unit::TestCase
+ test("NullArray") do
+ records = [
+ [nil],
+ [nil],
+ [nil],
+ [nil],
+ ]
+ array = Arrow::NullArray.new(records.size)
+ schema = Arrow::Schema.new(column: :null)
+ record_batch = Arrow::RecordBatch.new(schema,
+ records.size,
+ [array])
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BooleanArray") do
+ records = [
+ [true],
+ [nil],
+ [false],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :boolean},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int8Array") do
+ records = [
+ [-(2 ** 7)],
+ [nil],
+ [(2 ** 7) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :int8},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt8Array") do
+ records = [
+ [0],
+ [nil],
+ [(2 ** 8) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :uint8},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int16Array") do
+ records = [
+ [-(2 ** 15)],
+ [nil],
+ [(2 ** 15) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :int16},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt16Array") do
+ records = [
+ [0],
+ [nil],
+ [(2 ** 16) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :uint16},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int32Array") do
+ records = [
+ [-(2 ** 31)],
+ [nil],
+ [(2 ** 31) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :int32},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt32Array") do
+ records = [
+ [0],
+ [nil],
+ [(2 ** 32) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :uint32},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int64Array") do
+ records = [
+ [-(2 ** 63)],
+ [nil],
+ [(2 ** 63) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :int64},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt64Array") do
+ records = [
+ [0],
+ [nil],
+ [(2 ** 64) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :uint64},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("FloatArray") do
+ records = [
+ [-1.0],
+ [nil],
+ [1.0],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :float},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DoubleArray") do
+ records = [
+ [-1.0],
+ [nil],
+ [1.0],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :double},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BinaryArray") do
+ records = [
+ ["\x00".b],
+ [nil],
+ ["\xff".b],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :binary},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StringArray") do
+ records = [
+ ["Ruby"],
+ [nil],
+ ["\u3042"], # U+3042 HIRAGANA LETTER A
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :string},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date32Array") do
+ records = [
+ [Date.new(1960, 1, 1)],
+ [nil],
+ [Date.new(2017, 8, 23)],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :date32},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date64Array") do
+ records = [
+ [DateTime.new(1960, 1, 1, 2, 9, 30)],
+ [nil],
+ [DateTime.new(2017, 8, 23, 14, 57, 2)],
+ ]
+ record_batch = Arrow::RecordBatch.new({column: :date64},
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ sub_test_case("TimestampArray") do
+ test("second") do
+ records = [
+ [Time.parse("1960-01-01T02:09:30Z")],
+ [nil],
+ [Time.parse("2017-08-23T14:57:02Z")],
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :timestamp,
+ unit: :second,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [Time.parse("1960-01-01T02:09:30.123Z")],
+ [nil],
+ [Time.parse("2017-08-23T14:57:02.987Z")],
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :timestamp,
+ unit: :milli,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("micro") do
+ records = [
+ [Time.parse("1960-01-01T02:09:30.123456Z")],
+ [nil],
+ [Time.parse("2017-08-23T14:57:02.987654Z")],
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :timestamp,
+ unit: :micro,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [Time.parse("1960-01-01T02:09:30.123456789Z")],
+ [nil],
+ [Time.parse("2017-08-23T14:57:02.987654321Z")],
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :timestamp,
+ unit: :nano,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time32Array") do
+ test("second") do
+ records = [
+ [60 * 10], # 00:10:00
+ [nil],
+ [60 * 60 * 2 + 9], # 02:00:09
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :time32,
+ unit: :second,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [(60 * 10) * 1000 + 123], # 00:10:00.123
+ [nil],
+ [(60 * 60 * 2 + 9) * 1000 + 987], # 02:00:09.987
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :time32,
+ unit: :milli,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time64Array") do
+ test("micro") do
+ records = [
+ [(60 * 10) * 1_000_000 + 123_456], # 00:10:00.123456
+ [nil],
+ [(60 * 60 * 2 + 9) * 1_000_000 + 987_654], # 02:00:09.987654
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :time64,
+ unit: :micro,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [(60 * 10) * 1_000_000_000 + 123_456_789], # 00:10:00.123456789
+ [nil],
+ [(60 * 60 * 2 + 9) * 1_000_000_000 + 987_654_321], # 02:00:09.987654321
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :time64,
+ unit: :nano,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ test("Decimal128Array") do
+ records = [
+ [BigDecimal("92.92")],
+ [nil],
+ [BigDecimal("29.29")],
+ ]
+ record_batch = Arrow::RecordBatch.new({
+ column: {
+ type: :decimal128,
+ precision: 8,
+ scale: 2,
+ }
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb
new file mode 100644
index 0000000..8fdf02e
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-dense-union-array.rb
@@ -0,0 +1,487 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchDenseUnionArrayTest < Test::Unit::TestCase
+ def fields(type, type_codes)
+ field_description = {}
+ if type.is_a?(Hash)
+ field_description = field_description.merge(type)
+ else
+ field_description[:type] = type
+ end
+ {
+ column: {
+ type: :dense_union,
+ fields: [
+ field_description.merge(name: "0"),
+ field_description.merge(name: "1"),
+ ],
+ type_codes: type_codes,
+ },
+ }
+ end
+
+ # TODO: Use Arrow::RecordBatch.new(fields(type), records)
+ def build_record_batch(type, records)
+ type_codes = [0, 1]
+ schema = Arrow::Schema.new(fields(type, type_codes))
+ type_ids = []
+ offsets = []
+ arrays = schema.fields[0].data_type.fields.collect do |field|
+ sub_schema = Arrow::Schema.new([field])
+ sub_records = []
+ records.each do |record|
+ column = record[0]
+ next if column.nil?
+ next unless column.key?(field.name)
+ sub_records << [column[field.name]]
+ end
+ sub_record_batch = Arrow::RecordBatch.new(sub_schema,
+ sub_records)
+ sub_record_batch.columns[0]
+ end
+ records.each do |record|
+ column = record[0]
+ if column.nil?
+ type_ids << nil
+ offsets << 0
+ elsif column.key?("0")
+ type_id = type_codes[0]
+ type_ids << type_id
+ offsets << (type_ids.count(type_id) - 1)
+ elsif column.key?("1")
+ type_id = type_codes[1]
+ type_ids << type_id
+ offsets << (type_ids.count(type_id) - 1)
+ end
+ end
+ # TODO: Pass the data type explicitly, as in the commented-out call below:
+ # union_array = Arrow::DenseUnionArray.new(schema.fields[0].data_type,
+ # Arrow::Int8Array.new(type_ids),
+ # Arrow::Int32Array.new(offsets),
+ # arrays)
+ union_array = Arrow::DenseUnionArray.new(Arrow::Int8Array.new(type_ids),
+ Arrow::Int32Array.new(offsets),
+ arrays)
+ schema = Arrow::Schema.new(column: union_array.value_data_type)
+ Arrow::RecordBatch.new(schema,
+ records.size,
+ [union_array])
+ end
+
+ test("NullArray") do
+ omit("Need to add support for NullArrayBuilder")
+ records = [
+ [{"0" => nil}],
+ [nil],
+ ]
+ record_batch = build_record_batch(:null, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BooleanArray") do
+ records = [
+ [{"0" => true}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:boolean, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int8Array") do
+ records = [
+ [{"0" => -(2 ** 7)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int8, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt8Array") do
+ records = [
+ [{"0" => (2 ** 8) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint8, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int16Array") do
+ records = [
+ [{"0" => -(2 ** 15)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int16, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt16Array") do
+ records = [
+ [{"0" => (2 ** 16) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint16, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int32Array") do
+ records = [
+ [{"0" => -(2 ** 31)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt32Array") do
+ records = [
+ [{"0" => (2 ** 32) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int64Array") do
+ records = [
+ [{"0" => -(2 ** 63)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt64Array") do
+ records = [
+ [{"0" => (2 ** 64) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("FloatArray") do
+ records = [
+ [{"0" => -1.0}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:float, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DoubleArray") do
+ records = [
+ [{"0" => -1.0}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:double, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BinaryArray") do
+ records = [
+ [{"0" => "\xff".b}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:binary, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StringArray") do
+ records = [
+ [{"0" => "Ruby"}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:string, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date32Array") do
+ records = [
+ [{"0" => Date.new(1960, 1, 1)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:date32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date64Array") do
+ records = [
+ [{"0" => DateTime.new(1960, 1, 1, 2, 9, 30)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:date64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ sub_test_case("TimestampArray") do
+ test("second") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :second,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :milli,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("micro") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123456Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :micro,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123456789Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :nano,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time32Array") do
+ test("second") do
+ records = [
+ [{"0" => 60 * 10}], # 00:10:00
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time32,
+ unit: :second,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"0" => (60 * 10) * 1000 + 123}], # 00:10:00.123
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time32,
+ unit: :milli,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time64Array") do
+ test("micro") do
+ records = [
+ [{"0" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time64,
+ unit: :micro,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ # 00:10:00.123456789
+ [{"0" => (60 * 10) * 1_000_000_000 + 123_456_789}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time64,
+ unit: :nano,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ test("Decimal128Array") do
+ records = [
+ [{"0" => BigDecimal("92.92")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :decimal128,
+ precision: 8,
+ scale: 2,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("ListArray") do
+ records = [
+ [{"0" => [true, nil, false]}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :list,
+ field: {
+ name: :sub_element,
+ type: :boolean,
+ },
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StructArray") do
+ records = [
+ [{"0" => {"sub_field" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"sub_field" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :struct,
+ fields: [
+ {
+ name: :sub_field,
+ type: :boolean,
+ },
+ ],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("SparseUnionArray") do
+ omit("Need to add support for SparseUnionArrayBuilder")
+ records = [
+ [{"0" => {"field1" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"field2" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :sparse_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DenseUnionArray") do
+ omit("Need to add support for DenseUnionArrayBuilder")
+ records = [
+ [{"0" => {"field1" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"field2" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :dense_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DictionaryArray") do
+ omit("Need to add support for DictionaryArrayBuilder")
+ records = [
+ [{"0" => "Ruby"}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => "GLib"}],
+ ]
+ dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+ record_batch = build_record_batch({
+ type: :dictionary,
+ index_data_type: :int8,
+ dictionary: dictionary,
+ ordered: true,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
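
The build_record_batch helper in the dense union test file above derives the type ids and offsets by hand. A minimal plain-Ruby sketch of that bookkeeping follows; dense_union_layout is a hypothetical name used only for illustration, it is not part of red-arrow, and it assumes (as the helper does) that each non-nil column is a one-entry hash keyed by "0" or "1".

```ruby
# Hypothetical helper (not part of red-arrow): derives the type ids and
# offsets of a dense union column from row-wise records.
def dense_union_layout(records, type_codes = [0, 1])
  type_ids = []
  offsets = []
  records.each do |record|
    column = record[0]
    if column.nil?
      # A null slot has no child value; the test helper also uses offset 0.
      type_ids << nil
      offsets << 0
    else
      # The hash key ("0" or "1") names the child field that holds the value.
      type_id = type_codes[Integer(column.keys.first)]
      type_ids << type_id
      # The offset is the number of earlier values stored in the same child.
      offsets << (type_ids.count(type_id) - 1)
    end
  end
  [type_ids, offsets]
end

records = [
  [{"0" => true}],
  [nil],
  [{"1" => nil}],
  [{"0" => false}],
]
type_ids, offsets = dense_union_layout(records)
```

Each child array stores only the values that belong to it, so the second "0" entry lands at offset 1 of child "0" while the single "1" entry sits at offset 0 of child "1".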
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb
new file mode 100644
index 0000000..bf1af36
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-list-array.rb
@@ -0,0 +1,499 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchListArrayTest < Test::Unit::TestCase
+ def fields(type)
+ field_description = {
+ name: :element,
+ }
+ if type.is_a?(Hash)
+ field_description = field_description.merge(type)
+ else
+ field_description[:type] = type
+ end
+ {
+ column: {
+ type: :list,
+ field: field_description,
+ },
+ }
+ end
+
+ test("NullArray") do
+ omit("Need to add support for NullArrayBuilder")
+ records = [
+ [[nil, nil, nil]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:null),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BooleanArray") do
+ records = [
+ [[true, nil, false]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:boolean),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int8Array") do
+ records = [
+ [[-(2 ** 7), nil, (2 ** 7) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int8),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt8Array") do
+ records = [
+ [[0, nil, (2 ** 8) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint8),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int16Array") do
+ records = [
+ [[-(2 ** 15), nil, (2 ** 15) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int16),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt16Array") do
+ records = [
+ [[0, nil, (2 ** 16) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint16),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int32Array") do
+ records = [
+ [[-(2 ** 31), nil, (2 ** 31) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt32Array") do
+ records = [
+ [[0, nil, (2 ** 32) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int64Array") do
+ records = [
+ [[-(2 ** 63), nil, (2 ** 63) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt64Array") do
+ records = [
+ [[0, nil, (2 ** 64) - 1]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("FloatArray") do
+ records = [
+ [[-1.0, nil, 1.0]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:float),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DoubleArray") do
+ records = [
+ [[-1.0, nil, 1.0]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:double),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BinaryArray") do
+ records = [
+ [["\x00".b, nil, "\xff".b]],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:binary),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StringArray") do
+ records = [
+ [
+ [
+ "Ruby",
+ nil,
+ "\u3042", # U+3042 HIRAGANA LETTER A
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:string),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date32Array") do
+ records = [
+ [
+ [
+ Date.new(1960, 1, 1),
+ nil,
+ Date.new(2017, 8, 23),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:date32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date64Array") do
+ records = [
+ [
+ [
+ DateTime.new(1960, 1, 1, 2, 9, 30),
+ nil,
+ DateTime.new(2017, 8, 23, 14, 57, 2),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:date64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ sub_test_case("TimestampArray") do
+ test("second") do
+ records = [
+ [
+ [
+ Time.parse("1960-01-01T02:09:30Z"),
+ nil,
+ Time.parse("2017-08-23T14:57:02Z"),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :second),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [
+ [
+ Time.parse("1960-01-01T02:09:30.123Z"),
+ nil,
+ Time.parse("2017-08-23T14:57:02.987Z"),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :milli),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("micro") do
+ records = [
+ [
+ [
+ Time.parse("1960-01-01T02:09:30.123456Z"),
+ nil,
+ Time.parse("2017-08-23T14:57:02.987654Z"),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :micro),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [
+ [
+ Time.parse("1960-01-01T02:09:30.123456789Z"),
+ nil,
+ Time.parse("2017-08-23T14:57:02.987654321Z"),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :nano),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time32Array") do
+ test("second") do
+ records = [
+ [
+ [
+ 60 * 10, # 00:10:00
+ nil,
+ 60 * 60 * 2 + 9, # 02:00:09
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+ unit: :second),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [
+ [
+ (60 * 10) * 1000 + 123, # 00:10:00.123
+ nil,
+ (60 * 60 * 2 + 9) * 1000 + 987, # 02:00:09.987
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+ unit: :milli),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time64Array") do
+ test("micro") do
+ records = [
+ [
+ [
+ (60 * 10) * 1_000_000 + 123_456, # 00:10:00.123456
+ nil,
+ (60 * 60 * 2 + 9) * 1_000_000 + 987_654, # 02:00:09.987654
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+ unit: :micro),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [
+ [
+ (60 * 10) * 1_000_000_000 + 123_456_789, # 00:10:00.123456789
+ nil,
+ (60 * 60 * 2 + 9) * 1_000_000_000 + 987_654_321, # 02:00:09.987654321
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+ unit: :nano),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ test("Decimal128Array") do
+ records = [
+ [
+ [
+ BigDecimal("92.92"),
+ nil,
+ BigDecimal("29.29"),
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :decimal128,
+ precision: 8,
+ scale: 2),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("ListArray") do
+ records = [
+ [
+ [
+ [
+ true,
+ nil,
+ ],
+ nil,
+ [
+ nil,
+ false,
+ ],
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :list,
+ field: {
+ name: :sub_element,
+ type: :boolean,
+ }),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StructArray") do
+ records = [
+ [
+ [
+ {"field" => true},
+ nil,
+ {"field" => nil},
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :struct,
+ fields: [
+ {
+ name: :field,
+ type: :boolean,
+ },
+ ]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("SparseUnionArray") do
+ omit("Need to add support for SparseUnionArrayBuilder")
+ records = [
+ [
+ [
+ {"field1" => true},
+ nil,
+ {"field2" => nil},
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :sparse_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DenseUnionArray") do
+ omit("Need to add support for DenseUnionArrayBuilder")
+ records = [
+ [
+ [
+ {"field1" => true},
+ nil,
+ {"field2" => nil},
+ ],
+ ],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :dense_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DictionaryArray") do
+ omit("Need to add support for DictionaryArrayBuilder")
+ records = [
+ [
+ [
+ "Ruby",
+ nil,
+ "GLib",
+ ],
+ ],
+ [nil],
+ ]
+ dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+ record_batch = Arrow::RecordBatch.new(fields(type: :dictionary,
+ index_data_type: :int8,
+ dictionary: dictionary,
+ ordered: true),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb b/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb
new file mode 100644
index 0000000..c0e3631
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-multiple-columns.rb
@@ -0,0 +1,49 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchMultipleColumnsTest < Test::Unit::TestCase
+ test("3 elements") do
+ records = [
+ [true, nil, "Ruby"],
+ [nil, 0, "GLib"],
+ [false, 2 ** 8 - 1, nil],
+ ]
+ record_batch = Arrow::RecordBatch.new([
+ {name: :column0, type: :boolean},
+ {name: :column1, type: :uint8},
+ {name: :column2, type: :string},
+ ],
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("4 elements") do
+ records = [
+ [true, nil, "Ruby", -(2 ** 63)],
+ [nil, 0, "GLib", nil],
+ [false, 2 ** 8 - 1, nil, (2 ** 63) - 1],
+ ]
+ record_batch = Arrow::RecordBatch.new([
+ {name: :column0, type: :boolean},
+ {name: :column1, type: :uint8},
+ {name: :column2, type: :string},
+ {name: :column3, type: :int64},
+ ],
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
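
The multiple-columns tests above expect raw_records to return one Ruby array per row, with nil standing in for nulls. A plain-Ruby sketch of that row-wise shape, assuming the column values are already available as ordinary arrays, could look like this; it only illustrates the expected result and says nothing about how the method is actually implemented.

```ruby
# Plain-Ruby stand-in for columnar data; no Arrow objects are involved.
columns = [
  [true, nil, false],     # column0: boolean
  [nil, 0, 2 ** 8 - 1],   # column1: uint8
  ["Ruby", "GLib", nil],  # column2: string
]

# Row i collects the i-th value of every column, with nil for nulls.
rows = columns.transpose
```

The resulting rows have the same nested-array shape as the records in the "3 elements" test.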
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb
new file mode 100644
index 0000000..3a6191d
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-sparse-union-array.rb
@@ -0,0 +1,475 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchSparseUnionArrayTest < Test::Unit::TestCase
+ def fields(type, type_codes)
+ field_description = {}
+ if type.is_a?(Hash)
+ field_description = field_description.merge(type)
+ else
+ field_description[:type] = type
+ end
+ {
+ column: {
+ type: :sparse_union,
+ fields: [
+ field_description.merge(name: "0"),
+ field_description.merge(name: "1"),
+ ],
+ type_codes: type_codes,
+ },
+ }
+ end
+
+ # TODO: Use Arrow::RecordBatch.new(fields(type), records)
+ def build_record_batch(type, records)
+ type_codes = [0, 1]
+ schema = Arrow::Schema.new(fields(type, type_codes))
+ type_ids = []
+ arrays = schema.fields[0].data_type.fields.collect do |field|
+ sub_schema = Arrow::Schema.new([field])
+ sub_records = records.collect do |record|
+ [record[0].nil? ? nil : record[0][field.name]]
+ end
+ sub_record_batch = Arrow::RecordBatch.new(sub_schema,
+ sub_records)
+ sub_record_batch.columns[0]
+ end
+ records.each do |record|
+ column = record[0]
+ if column.nil?
+ type_ids << nil
+ elsif column.key?("0")
+ type_ids << type_codes[0]
+ elsif column.key?("1")
+ type_ids << type_codes[1]
+ end
+ end
+ # TODO: Pass the data type explicitly, as in the commented-out call below:
+ # union_array = Arrow::SparseUnionArray.new(schema.fields[0].data_type,
+ # Arrow::Int8Array.new(type_ids),
+ # arrays)
+ union_array = Arrow::SparseUnionArray.new(Arrow::Int8Array.new(type_ids),
+ arrays)
+ schema = Arrow::Schema.new(column: union_array.value_data_type)
+ Arrow::RecordBatch.new(schema,
+ records.size,
+ [union_array])
+ end
+
+ test("NullArray") do
+ omit("Need to add support for NullArrayBuilder")
+ records = [
+ [{"0" => nil}],
+ [nil],
+ ]
+ record_batch = build_record_batch(:null, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BooleanArray") do
+ records = [
+ [{"0" => true}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:boolean, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int8Array") do
+ records = [
+ [{"0" => -(2 ** 7)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int8, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt8Array") do
+ records = [
+ [{"0" => (2 ** 8) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint8, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int16Array") do
+ records = [
+ [{"0" => -(2 ** 15)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int16, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt16Array") do
+ records = [
+ [{"0" => (2 ** 16) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint16, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int32Array") do
+ records = [
+ [{"0" => -(2 ** 31)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt32Array") do
+ records = [
+ [{"0" => (2 ** 32) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int64Array") do
+ records = [
+ [{"0" => -(2 ** 63)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:int64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt64Array") do
+ records = [
+ [{"0" => (2 ** 64) - 1}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:uint64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("FloatArray") do
+ records = [
+ [{"0" => -1.0}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:float, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DoubleArray") do
+ records = [
+ [{"0" => -1.0}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:double, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BinaryArray") do
+ records = [
+ [{"0" => "\xff".b}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:binary, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StringArray") do
+ records = [
+ [{"0" => "Ruby"}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:string, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date32Array") do
+ records = [
+ [{"0" => Date.new(1960, 1, 1)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:date32, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date64Array") do
+ records = [
+ [{"0" => DateTime.new(1960, 1, 1, 2, 9, 30)}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch(:date64, records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ sub_test_case("TimestampArray") do
+ test("second") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :second,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :milli,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("micro") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123456Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :micro,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [{"0" => Time.parse("1960-01-01T02:09:30.123456789Z")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :timestamp,
+ unit: :nano,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time32Array") do
+ test("second") do
+ records = [
+ [{"0" => 60 * 10}], # 00:10:00
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time32,
+ unit: :second,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"0" => (60 * 10) * 1000 + 123}], # 00:10:00.123
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time32,
+ unit: :milli,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time64Array") do
+ test("micro") do
+ records = [
+ [{"0" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time64,
+ unit: :micro,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ # 00:10:00.123456789
+ [{"0" => (60 * 10) * 1_000_000_000 + 123_456_789}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :time64,
+ unit: :nano,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ test("Decimal128Array") do
+ records = [
+ [{"0" => BigDecimal("92.92")}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :decimal128,
+ precision: 8,
+ scale: 2,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("ListArray") do
+ records = [
+ [{"0" => [true, nil, false]}],
+ [nil],
+ [{"1" => nil}],
+ ]
+ record_batch = build_record_batch({
+ type: :list,
+ field: {
+ name: :sub_element,
+ type: :boolean,
+ },
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StructArray") do
+ records = [
+ [{"0" => {"sub_field" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"sub_field" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :struct,
+ fields: [
+ {
+ name: :sub_field,
+ type: :boolean,
+ },
+ ],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("SparseUnionArray") do
+ omit("Need to add support for SparseUnionArrayBuilder")
+ records = [
+ [{"0" => {"field1" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"field2" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :sparse_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DenseUnionArray") do
+ omit("Need to add support for DenseUnionArrayBuilder")
+ records = [
+ [{"0" => {"field1" => true}}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => {"field2" => nil}}],
+ ]
+ record_batch = build_record_batch({
+ type: :dense_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1],
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DictionaryArray") do
+ omit("Need to add support for DictionaryArrayBuilder")
+ records = [
+ [{"0" => "Ruby"}],
+ [nil],
+ [{"1" => nil}],
+ [{"0" => "GLib"}],
+ ]
+ dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+ record_batch = build_record_batch({
+ type: :dictionary,
+ index_data_type: :int8,
+ dictionary: dictionary,
+ ordered: true,
+ },
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
diff --git a/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb b/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb
new file mode 100644
index 0000000..bccd0d9
--- /dev/null
+++ b/ruby/red-arrow/test/raw-records/record-batch/test-struct-array.rb
@@ -0,0 +1,427 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+class RawRecordsRecordBatchStructArrayTest < Test::Unit::TestCase
+ def fields(type)
+ field_description = {
+ name: :field,
+ }
+ if type.is_a?(Hash)
+ field_description = field_description.merge(type)
+ else
+ field_description[:type] = type
+ end
+ {
+ column: {
+ type: :struct,
+ fields: [
+ field_description,
+ ],
+ },
+ }
+ end
+
+ test("NullArray") do
+ omit("Need to add support for NullArrayBuilder")
+ records = [
+ [{"field" => nil}],
+ [nil],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:null),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BooleanArray") do
+ records = [
+ [{"field" => true}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:boolean),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int8Array") do
+ records = [
+ [{"field" => -(2 ** 7)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int8),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt8Array") do
+ records = [
+ [{"field" => (2 ** 8) - 1}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint8),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int16Array") do
+ records = [
+ [{"field" => -(2 ** 15)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int16),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt16Array") do
+ records = [
+ [{"field" => (2 ** 16) - 1}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint16),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int32Array") do
+ records = [
+ [{"field" => -(2 ** 31)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt32Array") do
+ records = [
+ [{"field" => (2 ** 32) - 1}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Int64Array") do
+ records = [
+ [{"field" => -(2 ** 63)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:int64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("UInt64Array") do
+ records = [
+ [{"field" => (2 ** 64) - 1}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:uint64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("FloatArray") do
+ records = [
+ [{"field" => -1.0}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:float),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DoubleArray") do
+ records = [
+ [{"field" => -1.0}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:double),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("BinaryArray") do
+ records = [
+ [{"field" => "\xff".b}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:binary),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StringArray") do
+ records = [
+ [{"field" => "Ruby"}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:string),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date32Array") do
+ records = [
+ [{"field" => Date.new(1960, 1, 1)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:date32),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("Date64Array") do
+ records = [
+ [{"field" => DateTime.new(1960, 1, 1, 2, 9, 30)}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(:date64),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ sub_test_case("TimestampArray") do
+ test("second") do
+ records = [
+ [{"field" => Time.parse("1960-01-01T02:09:30Z")}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :second),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"field" => Time.parse("1960-01-01T02:09:30.123Z")}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :milli),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("micro") do
+ records = [
+ [{"field" => Time.parse("1960-01-01T02:09:30.123456Z")}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :micro),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ [{"field" => Time.parse("1960-01-01T02:09:30.123456789Z")}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :timestamp,
+ unit: :nano),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time32Array") do
+ test("second") do
+ records = [
+ [{"field" => 60 * 10}], # 00:10:00
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+ unit: :second),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("milli") do
+ records = [
+ [{"field" => (60 * 10) * 1000 + 123}], # 00:10:00.123
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time32,
+ unit: :milli),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ sub_test_case("Time64Array") do
+ test("micro") do
+ records = [
+ [{"field" => (60 * 10) * 1_000_000 + 123_456}], # 00:10:00.123456
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+ unit: :micro),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("nano") do
+ records = [
+ # 00:10:00.123456789
+ [{"field" => (60 * 10) * 1_000_000_000 + 123_456_789}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :time64,
+ unit: :nano),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+ end
+
+ test("Decimal128Array") do
+ records = [
+ [{"field" => BigDecimal("92.92")}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :decimal128,
+ precision: 8,
+ scale: 2),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("ListArray") do
+ records = [
+ [{"field" => [true, nil, false]}],
+ [nil],
+ [{"field" => nil}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :list,
+ field: {
+ name: :sub_element,
+ type: :boolean,
+ }),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("StructArray") do
+ records = [
+ [{"field" => {"sub_field" => true}}],
+ [nil],
+ [{"field" => nil}],
+ [{"field" => {"sub_field" => nil}}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :struct,
+ fields: [
+ {
+ name: :sub_field,
+ type: :boolean,
+ },
+ ]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("SparseUnionArray") do
+ omit("Need to add support for SparseUnionArrayBuilder")
+ records = [
+ [{"field" => {"field1" => true}}],
+ [nil],
+ [{"field" => nil}],
+ [{"field" => {"field2" => nil}}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :sparse_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DenseUnionArray") do
+ omit("Need to add support for DenseUnionArrayBuilder")
+ records = [
+ [{"field" => {"field1" => true}}],
+ [nil],
+ [{"field" => nil}],
+ [{"field" => {"field2" => nil}}],
+ ]
+ record_batch = Arrow::RecordBatch.new(fields(type: :dense_union,
+ fields: [
+ {
+ name: :field1,
+ type: :boolean,
+ },
+ {
+ name: :field2,
+ type: :uint8,
+ },
+ ],
+ type_codes: [0, 1]),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+
+ test("DictionaryArray") do
+ omit("Need to add support for DictionaryArrayBuilder")
+ records = [
+ [{"field" => "Ruby"}],
+ [nil],
+ [{"field" => nil}],
+ [{"field" => "GLib"}],
+ ]
+ dictionary = Arrow::StringArray.new(["GLib", "Ruby"])
+ record_batch = Arrow::RecordBatch.new(fields(type: :dictionary,
+ index_data_type: :int8,
+ dictionary: dictionary,
+ ordered: true),
+ records)
+ assert_equal(records, record_batch.raw_records)
+ end
+end
diff --git a/ruby/red-arrow/test/run-test.rb b/ruby/red-arrow/test/run-test.rb
index 9551f60..4712d49 100755
--- a/ruby/red-arrow/test/run-test.rb
+++ b/ruby/red-arrow/test/run-test.rb
@@ -26,8 +26,29 @@ require "pathname"
base_dir = Pathname.new(__dir__).parent.expand_path
lib_dir = base_dir + "lib"
+ext_dir = base_dir + "ext" + "arrow"
test_dir = base_dir + "test"
+make = nil
+if ENV["NO_MAKE"] != "yes"
+ if ENV["MAKE"]
+ make = ENV["MAKE"]
+ elsif system("type gmake > /dev/null")
+ make = "gmake"
+ elsif system("type make > /dev/null")
+ make = "make"
+ end
+end
+if make
+ Dir.chdir(ext_dir.to_s) do
+ unless File.exist?("Makefile")
+ system(RbConfig.ruby, "extconf.rb", "--enable-debug-build") or exit(false)
+ end
+ system("#{make} > /dev/null") or exit(false)
+ end
+end
+
+$LOAD_PATH.unshift(ext_dir.to_s)
$LOAD_PATH.unshift(lib_dir.to_s)
require_relative "helper"
diff --git a/ruby/red-arrow/test/test-data-type.rb b/ruby/red-arrow/test/test-data-type.rb
index 747eff8..bcffea2 100644
--- a/ruby/red-arrow/test/test-data-type.rb
+++ b/ruby/red-arrow/test/test-data-type.rb
@@ -43,6 +43,11 @@ class DataTypeTest < Test::Unit::TestCase
assert_equal(Arrow::ListDataType.new(field),
Arrow::DataType.resolve(type: :list, field: field))
end
+
+ test("_") do
+ assert_equal(Arrow::FixedSizeBinaryDataType.new(10),
+ Arrow::DataType.resolve([:fixed_size_binary, 10]))
+ end
end
sub_test_case("instance methods") do
diff --git a/ruby/red-gandiva/test/run-test.rb b/ruby/red-gandiva/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-gandiva/test/run-test.rb
+++ b/ruby/red-gandiva/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
test_dir = base_dir + "test"
arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
$LOAD_PATH.unshift(arrow_lib_dir.to_s)
$LOAD_PATH.unshift(lib_dir.to_s)
diff --git a/ruby/red-parquet/test/run-test.rb b/ruby/red-parquet/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-parquet/test/run-test.rb
+++ b/ruby/red-parquet/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
test_dir = base_dir + "test"
arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
$LOAD_PATH.unshift(arrow_lib_dir.to_s)
$LOAD_PATH.unshift(lib_dir.to_s)
diff --git a/ruby/red-plasma/test/run-test.rb b/ruby/red-plasma/test/run-test.rb
index b826f3e..a4f7f76 100755
--- a/ruby/red-plasma/test/run-test.rb
+++ b/ruby/red-plasma/test/run-test.rb
@@ -28,7 +28,9 @@ lib_dir = base_dir + "lib"
test_dir = base_dir + "test"
arrow_lib_dir = arrow_base_dir + "lib"
+arrow_ext_dir = arrow_base_dir + "ext" + "arrow"
+$LOAD_PATH.unshift(arrow_ext_dir.to_s)
$LOAD_PATH.unshift(arrow_lib_dir.to_s)
$LOAD_PATH.unshift(lib_dir.to_s)