Posted to dev@arrow.apache.org by Suvayu Ali <fa...@gmail.com> on 2018/12/13 21:52:11 UTC
Access Gandiva filter result by array index
Hi everyone,
Maybe I'm missing something obvious, but for the life of me, I can't
figure out how I can access the elements of an array after a Gandiva
filter operation.
I have linked a minimal example at the end which I compile like this:
$ /usr/lib64/ccache/g++ -g -Wall -m64 -std=c++17 -pthread -fPIC \
-I/opt/data-an/include mwe.cc -o mwe \
-L/opt/data-an/lib64 -lgandiva -larrow
and I then run the binary like this:
$ LD_LIBRARY_PATH=/opt/data-an/lib64 ./mwe
Broadly this is what I was attempting:
1. create a 5-element vector: 1, 3, 2, 4, 5
int num_records = 5;
arrow::Int64Builder i64builder;
ArrayPtr array0;
EXPECT_OK(i64builder.AppendValues({1, 3, 2, 4, 5}));
EXPECT_OK(i64builder.Finish(&array0));
2. use Gandiva to get even elements; here, indices: 2, 3
// schema for input fields
auto field0 = field("f0", arrow::int64());
auto schema = arrow::schema({field0});
// even: f0 % 2 == 0
auto field0_node = TreeExprBuilder::MakeField(field0);
auto lit_2 = TreeExprBuilder::MakeLiteral(int64_t(2));
auto remainder = TreeExprBuilder::MakeFunction("mod", {field0_node, lit_2}, int64());
auto lit_0 = TreeExprBuilder::MakeLiteral(int64_t(0));
auto even = TreeExprBuilder::MakeFunction("equal", {remainder, lit_0}, boolean());
auto condition = TreeExprBuilder::MakeCondition(even);
// input record batch
auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array0});
// filter
std::shared_ptr<Filter> filter;
EXPECT_OK(Filter::Make(schema, condition, &filter));
std::shared_ptr<SelectionVector> selected;
EXPECT_OK(SelectionVector::MakeInt16(num_records, pool_, &selected));
EXPECT_OK(filter->Evaluate(*in_batch, selected));
3. try accessing elements from the original array by index, which works
after downcasting.
// std::cout << "array0[0]: " << array0->Value(0); // doesn't compile
// error: ‘using element_type = class arrow::Array’ {aka ‘class
// arrow::Array’} has no member named ‘Value’
// downcast it to the correct derived class, this works
auto array0_cast = std::dynamic_pointer_cast<NumericArray<Int64Type>>(array0);
std::cout << "array0[0]: " << array0_cast->Value(0) << std::endl;
4. Then try to access the "selected" elements (even elements) in the original
array by using the selection vector from the Gandiva filter as an index array
auto idx_arr_cast = std::dynamic_pointer_cast<NumericArray<Int16Type>>(idx_arr);
if (idx_arr_cast) {
  std::cout << "idx_arr[0]: " << idx_arr_cast->Value(0) << std::endl;
} else {
  std::cerr << "idx_arr_cast is a nullptr!" << std::endl;
}
But I can't access the elements of the selection vector! Since it is declared
as std::shared_ptr<arrow::Array>, the Value(..) method isn't found. I had
filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
arrow::NumericArray<Int16Type>, but that fails!
I'm not sure where I'm going wrong.
I also have a related, but more general question. Given an array, I can't find
a way to access the elements (or iterate over them) if I don't know the exact
type. If I know the type, I can downcast, and use the likes of Value(..),
GetValue(..), GetString(..), etc. Is that right? Or am I missing something?
I looked at the pretty printer implementation; if I understood it correctly,
it specializes the WriteDataValue(..) method for every kind of array. Do I need
something similar for generalised index access?
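One possible shape for such generalised access is to dispatch once on a runtime type tag and downcast inside each branch. The sketch below uses only the standard library: `TypeId`, `Array`, `TypedArray` and `ElementToString` are hypothetical stand-ins written for illustration, not the real Arrow classes. Arrow's own arrays do carry a type id and there is an `arrow::ArrayVisitor` interface, so the headers are worth checking before hand-rolling something like this.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical runtime type tag, standing in for a library-provided type id.
enum class TypeId { kInt64, kUInt16, kString };

// Hypothetical base class: only knows its own type tag.
struct Array {
  virtual ~Array() = default;
  virtual TypeId type_id() const = 0;
};

// Hypothetical typed array; Value() exists only on the derived class.
template <typename CType, TypeId kId>
struct TypedArray : Array {
  explicit TypedArray(std::vector<CType> v) : values(std::move(v)) {}
  TypeId type_id() const override { return kId; }
  const CType& Value(std::size_t i) const {
    assert(i < values.size());
    return values[i];
  }
  std::vector<CType> values;
};

using Int64Array = TypedArray<std::int64_t, TypeId::kInt64>;
using UInt16Array = TypedArray<std::uint16_t, TypeId::kUInt16>;
using StringArray = TypedArray<std::string, TypeId::kString>;

// Generic index access: switch on the tag, then downcast to the matching type.
std::string ElementToString(const Array& array, std::size_t i) {
  switch (array.type_id()) {
    case TypeId::kInt64:
      return std::to_string(static_cast<const Int64Array&>(array).Value(i));
    case TypeId::kUInt16:
      return std::to_string(static_cast<const UInt16Array&>(array).Value(i));
    case TypeId::kString:
      return static_cast<const StringArray&>(array).Value(i);
  }
  return "<unsupported>";
}
```

The per-type branches are essentially what the pretty printer's specializations amount to; the switch just gathers them in one place so callers can index without knowing the concrete type.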
Thanks for any help.
Cheers,
PS: The complete MWE, along with a Makefile, can be cloned from this gist:
https://gist.github.com/suvayu/aa2d38cee82b97be76186ec00073fe10
--
Suvayu
Open source is the future. It sets us free.
Re: Access Gandiva filter result by array index
Posted by Suvayu Ali <fa...@gmail.com>.
Hi Ravindra,
On Fri, Dec 14, 2018 at 01:11:02PM +0530, Ravindra Pindikura wrote:
> >
> > But I can't access the elements of the selection vector! Since it is declared
> > as std::shared_ptr<arrow::Array>, the Value(..) method isn't found. I had
> > filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
> > arrow::NumericArray<Int16Type>, but that fails!
>
> This should work:
>
> auto array = std::dynamic_pointer_cast<arrow::NumericArray<arrow::UInt16Type>>(selected->ToArray());
> printf("%d %d\n", array->Value(0), array->Value(1));
Silly of me to not try the unsigned type in the first place! Thanks a lot :)
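For anyone hitting the same wall: the signed cast yields a null pointer because the signed and unsigned instantiations of a class template are unrelated types, so `dynamic_pointer_cast` cannot convert between them. A minimal sketch of that behaviour follows, using standard-library stand-ins only. The `Array` and `NumericArray` structs here are hypothetical mock-ups that mimic the inheritance shape, and `MakeIndexArray` and `IsNumericArrayOf` are helper names invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a polymorphic array base class.
struct Array {
  virtual ~Array() = default;
};

// Hypothetical stand-in for a typed numeric array.
template <typename CType>
struct NumericArray : Array {
  explicit NumericArray(std::vector<CType> v) : values(std::move(v)) {}
  CType Value(std::size_t i) const {
    assert(i < values.size());
    return values[i];
  }
  std::vector<CType> values;
};

// Builds an index array holding unsigned 16-bit values behind a base pointer,
// which is the situation the reply in this thread describes.
std::shared_ptr<Array> MakeIndexArray() {
  return std::make_shared<NumericArray<std::uint16_t>>(
      std::vector<std::uint16_t>{2, 3});
}

// True only when the dynamic type of `array` is exactly NumericArray<CType>.
template <typename CType>
bool IsNumericArrayOf(const std::shared_ptr<Array>& array) {
  return std::dynamic_pointer_cast<NumericArray<CType>>(array) != nullptr;
}
```

With these stand-ins, `IsNumericArrayOf<std::int16_t>(MakeIndexArray())` is false while the `std::uint16_t` version is true, which mirrors the Int16Type versus UInt16Type outcome above.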
Cheers,
--
Suvayu
Re: Access Gandiva filter result by array index
Posted by Ravindra Pindikura <ra...@dremio.com>.
> On Dec 14, 2018, at 3:22 AM, Suvayu Ali <fa...@gmail.com> wrote:
> But I can't access the elements of the selection vector! Since it is declared
> as std::shared_ptr<arrow::Array>, the Value(..) method isn't found. I had
> filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
> arrow::NumericArray<Int16Type>, but that fails!
This should work:
auto array = std::dynamic_pointer_cast<arrow::NumericArray<arrow::UInt16Type>>(selected->ToArray());
printf("%d %d\n", array->Value(0), array->Value(1));
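Once the indices are readable, step 4 of the question (picking the selected elements out of the original array) is a plain gather. Here is a rough sketch with `std::vector` standing in for both the data array and the selection vector; `SelectEven` and `Take` are hypothetical helper names, and the loop in `SelectEven` only imitates the `f0 % 2 == 0` condition rather than calling Gandiva.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Imitates the filter from the question: positions whose value is even.
// Indices are stored as unsigned 16-bit, like the selection vector above.
std::vector<std::uint16_t> SelectEven(const std::vector<std::int64_t>& values) {
  std::vector<std::uint16_t> selected;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] % 2 == 0) {
      selected.push_back(static_cast<std::uint16_t>(i));
    }
  }
  return selected;
}

// Gathers values[idx] for every idx in `indices`, preserving index order.
std::vector<std::int64_t> Take(const std::vector<std::int64_t>& values,
                               const std::vector<std::uint16_t>& indices) {
  std::vector<std::int64_t> out;
  out.reserve(indices.size());
  for (std::uint16_t idx : indices) {
    assert(idx < values.size());
    out.push_back(values[idx]);
  }
  return out;
}
```

For the input 1, 3, 2, 4, 5 from the question, `SelectEven` yields indices 2 and 3, and `Take` then returns the values 2 and 4. With real Arrow arrays the same loop would read each index through `Value(i)` on the UInt16 array and each element through `Value(idx)` on the downcast Int64 array.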