Posted to dev@arrow.apache.org by Suvayu Ali <fa...@gmail.com> on 2018/12/13 21:52:11 UTC

Access Gandiva filter result by array index

Hi everyone,

Maybe I'm missing something obvious, but for the life of me, I can't
figure out how I can access the elements of an array after a Gandiva
filter operation.

I have linked a minimal example at the end which I compile like this:

  $ /usr/lib64/ccache/g++ -g -Wall -m64 -std=c++17 -pthread -fPIC \
        -I/opt/data-an/include  mwe.cc -o mwe \
        -L/opt/data-an/lib64 -lgandiva -larrow

and I then run the binary like this:

  $ LD_LIBRARY_PATH=/opt/data-an/lib64 ./mwe

Broadly this is what I was attempting:

1. create a 5-element vector: 1, 3, 2, 4, 5

   int num_records = 5;
   arrow::Int64Builder i64builder;
   ArrayPtr array0;

   EXPECT_OK(i64builder.AppendValues({1, 3, 2, 4, 5}));
   EXPECT_OK(i64builder.Finish(&array0));

2. use Gandiva to get even elements; here, indices: 2, 3

   // schema for input fields
   auto field0 = field("f0", arrow::int64());
   auto schema = arrow::schema({field0});

   // even: f0 % 2 == 0
   auto field0_node = TreeExprBuilder::MakeField(field0);
   auto lit_2 = TreeExprBuilder::MakeLiteral(int64_t(2));
   auto remainder = TreeExprBuilder::MakeFunction("mod", {field0_node, lit_2}, int64());
   auto lit_0 = TreeExprBuilder::MakeLiteral(int64_t(0));
   auto even = TreeExprBuilder::MakeFunction("equal", {remainder, lit_0}, boolean());
   auto condition = TreeExprBuilder::MakeCondition(even);

   // input record batch
   auto in_batch = arrow::RecordBatch::Make(schema, num_records, {array0});

   // filter
   std::shared_ptr<Filter> filter;
   EXPECT_OK(Filter::Make(schema, condition, &filter));

   std::shared_ptr<SelectionVector> selected;
   EXPECT_OK(SelectionVector::MakeInt16(num_records, pool_, &selected));
   EXPECT_OK(filter->Evaluate(*in_batch, selected));

3. try accessing elements from the original array by index, which works
   after downcasting.

   // std::cout << "array0[0]: " << array0->Value(0); // doesn't compile
   // error: ‘using element_type = class arrow::Array’ {aka ‘class
   // arrow::Array’} has no member named ‘Value’

   // downcast it to the correct derived class, this works
   auto array0_cast = std::dynamic_pointer_cast<NumericArray<Int64Type>>(array0);
   std::cout << "array0[0]: " << array0_cast->Value(0) << std::endl;

4. Then try to access the "selected" elements (even elements) in the original
   array by using the selection vector from the Gandiva filter as an index array

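   // idx_arr is not shown in this excerpt; assumed to come from the
   // selection vector, as in the reply's selected->ToArray()
   auto idx_arr = selected->ToArray();
   auto idx_arr_cast = std::dynamic_pointer_cast<NumericArray<Int16Type>>(idx_arr);
   if (idx_arr_cast) {
     std::cout << "idx_arr[0]: " << idx_arr_cast->Value(0) << std::endl;
   } else {
     std::cerr << "idx_arr_cast is a nullptr!" << std::endl;
   }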

   But I can't access the elements of the selection vector!  Since it is declared
   as std::shared_ptr<arrow::Array>, the Value(..) method isn't found.  I had
   filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
   arrow::NumericArray<Int16Type>, but that fails!

   I'm not sure where I'm going wrong.
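
For reference, the intent of steps 2 and 4 can be sketched without Arrow or Gandiva at all. The block below is only an illustration under stated assumptions: `SelectEvenIndices` and `Gather` are hypothetical helper names, plain `std::vector` stands in for the Arrow array and the selection vector, and the indices are stored as unsigned 16-bit values (the element type the reply in this thread casts to). It is not how Gandiva implements the filter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper (not a Gandiva API): returns the positions of the even
// values, i.e. the information a selection vector carries after the filter.
std::vector<uint16_t> SelectEvenIndices(const std::vector<int64_t>& values) {
  // 16-bit indices can address at most 65536 slots.
  assert(values.size() <= 65536);
  std::vector<uint16_t> selected;
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] % 2 == 0) selected.push_back(static_cast<uint16_t>(i));
  }
  return selected;
}

// Hypothetical helper: uses the selected positions as an index array into
// the original values.
std::vector<int64_t> Gather(const std::vector<int64_t>& values,
                            const std::vector<uint16_t>& selected) {
  std::vector<int64_t> out;
  out.reserve(selected.size());
  for (uint16_t i : selected) {
    assert(i < values.size());
    out.push_back(values[i]);
  }
  return out;
}
```

With the input from step 1, `SelectEvenIndices({1, 3, 2, 4, 5})` yields the indices 2 and 3, and `Gather` then picks the values 2 and 4.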

I also have a related, but more general question. Given an array, I can't find
a way to access the elements (or iterate over them) if I don't know the exact
type. If I know the type, I can downcast, and use the likes of Value(..),
GetValue(..), GetString(..), etc.  Is that right?  Or am I missing something?

I looked at the pretty printer implementation. If I understood it correctly,
it specializes the WriteDataValue(..) method for every kind of array.  Do I need
something similar for generalised index access?
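
One common shape for this kind of type-generic access is to dispatch once on a runtime type id and then call the typed accessor. The sketch below shows that pattern only. Everything in it is a hypothetical stand-in (`TypeId`, `Array`, `Int64Array`, `StringArray`, `ElementToString`) that merely mirrors the names used in this question; it does not use the Arrow headers, and a real version would have to cover every array type it expects to meet.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins that only mirror the shape of the Arrow classes.
enum class TypeId { INT64, STRING };

struct Array {
  virtual ~Array() = default;
  virtual TypeId type_id() const = 0;
  virtual std::size_t length() const = 0;
};

struct Int64Array : Array {
  explicit Int64Array(std::vector<int64_t> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::INT64; }
  std::size_t length() const override { return values.size(); }
  int64_t Value(std::size_t i) const { return values[i]; }
  std::vector<int64_t> values;
};

struct StringArray : Array {
  explicit StringArray(std::vector<std::string> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::STRING; }
  std::size_t length() const override { return values.size(); }
  const std::string& GetString(std::size_t i) const { return values[i]; }
  std::vector<std::string> values;
};

// Dispatch on the runtime type id, then use the matching typed accessor.
std::string ElementToString(const Array& array, std::size_t i) {
  assert(i < array.length());
  switch (array.type_id()) {
    case TypeId::INT64:
      return std::to_string(static_cast<const Int64Array&>(array).Value(i));
    case TypeId::STRING:
      return static_cast<const StringArray&>(array).GetString(i);
  }
  return "<unsupported type>";
}
```

The caller only ever holds a reference to the base class; the per-type knowledge lives in one switch, which is roughly the role the per-type WriteDataValue(..) specializations play in the pretty printer.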

Thanks for any help.

Cheers,

PS: The complete MWE, along with a Makefile, can be cloned from this gist:
    https://gist.github.com/suvayu/aa2d38cee82b97be76186ec00073fe10

-- 
Suvayu

Open source is the future. It sets us free.


Re: Access Gandiva filter result by array index

Posted by Suvayu Ali <fa...@gmail.com>.
Hi Ravindra,

On Fri, Dec 14, 2018 at 01:11:02PM +0530, Ravindra Pindikura wrote:
> > 
> >   But I can't access the elements of the selection vector!  Since it is declared
> >   as std::shared_ptr<arrow::Array>, the Value(..) method isn't found.  I had
> >   filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
> >   arrow::NumericArray<Int16Type>, but that fails!
> 
> This should work:
> 
>   auto array = std::dynamic_pointer_cast<arrow::NumericArray<arrow::UInt16Type>>(selected->ToArray());
>   printf("%d %d\n", array->Value(0), array->Value(1));

Silly of me to not try the unsigned type in the first place!  Thanks a lot :)

Cheers,

-- 
Suvayu

Open source is the future. It sets us free.

Re: Access Gandiva filter result by array index

Posted by Ravindra Pindikura <ra...@dremio.com>.

> On Dec 14, 2018, at 3:22 AM, Suvayu Ali <fa...@gmail.com> wrote:
> 
>   But I can't access the elements of the selection vector!  Since it is declared
>   as std::shared_ptr<arrow::Array>, the Value(..) method isn't found.  I had
>   filled it with SelectionVector::MakeInt16(..), so I tried downcasting to
>   arrow::NumericArray<Int16Type>, but that fails!

This should work:

  auto array = std::dynamic_pointer_cast<arrow::NumericArray<arrow::UInt16Type>>(selected->ToArray());
  printf("%d %d\n", array->Value(0), array->Value(1));
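
The reason the Int16Type cast yields a null pointer is a plain C++ one: two instantiations of the same class template are unrelated types, so a dynamic cast to the signed instantiation cannot succeed on an object whose dynamic type is the unsigned one. A minimal sketch with hypothetical stand-in types (`Array`, `NumericArray<T>`, `MakeUInt16Indices` and `TryCast` below are illustrations, not the Arrow classes):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a polymorphic array base class.
struct Array {
  virtual ~Array() = default;
};

// Hypothetical stand-in for a typed array; each T gives an unrelated class.
template <typename T>
struct NumericArray : Array {
  explicit NumericArray(std::vector<T> v) : values(std::move(v)) {}
  T Value(std::size_t i) const {
    assert(i < values.size());
    return values[i];
  }
  std::vector<T> values;
};

// Builds an index array whose dynamic type is NumericArray<uint16_t>,
// but hands it back through a base-class pointer.
std::shared_ptr<Array> MakeUInt16Indices(std::vector<uint16_t> indices) {
  return std::make_shared<NumericArray<uint16_t>>(std::move(indices));
}

// Returns nullptr unless the dynamic type is exactly NumericArray<T>
// (or derived from it).
template <typename T>
std::shared_ptr<NumericArray<T>> TryCast(const std::shared_ptr<Array>& array) {
  return std::dynamic_pointer_cast<NumericArray<T>>(array);
}
```

Here `TryCast<int16_t>(MakeUInt16Indices({2, 3}))` is null while `TryCast<uint16_t>(...)` is not, so checking the cast result before calling Value(..) is worthwhile.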

