Posted to dev@arrow.apache.org by david sherrier <da...@gmail.com> on 2019/10/10 15:31:03 UTC

Simple Join Implementation Questions

Hey all,

I'm working on a simple serial join implementation and need to be able to
compare data across two columns of the same type.  Right now the only way I
have found to do this is to use ArrayData::GetValues<T>(1) and then
iterate over the returned buffer, comparing the values.  The problem with
this approach is that the template needs the type, so whenever I add a row
to the result table I need to know the type of each column. That seems to
require a needlessly large switch statement that compares on the type id
and returns the type. This also seems to work only for fixed-length types;
reading string data appears to be even more complicated, but I have not
tried that yet. Is there an easier way to do this that I am missing?  The
second issue is that comparisons between types that do not inherit from
ctypes do not seem to be implemented yet, in particular for the String
type, which this use case needs.  I would have expected that, since tables
have a defined schema with the type known, there would be some sort of
iterator to read over column data.

Thanks,
David Sherrier

Re: Simple Join Implementation Questions

Posted by Antoine Pitrou <an...@python.org>.
Hi David,

You should look into the visitor facilities provided by Arrow C++, in
arrow/visitor_inline.h.

I would especially look at two of them:

- VisitArrayInline() will call the visitor's overloaded Visit() method
with the right array concrete type (for example Int16Array, ListArray...)

- Once you know the concrete type (for example Int16Type, which is
Int16Array::TypeClass), you can use ArrayDataVisitor<ConcreteType> to
iterate over each array element
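
As a rough illustration of the dispatch pattern described above, here is a
standalone sketch that does not include any Arrow headers. Every name in it
(TypeId, Array, Int32Array, StringArray, VisitArray, JoinVisitor,
JoinIndices) is a hypothetical stand-in written for this example, not the
real Arrow API. It only shows the idea: the switch on the type id lives in
one dispatch function, and the visitor's templated Visit() then works with
the concrete array type, for strings as well as fixed-width values.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for Arrow's array classes; NOT the real Arrow API.
enum class TypeId { INT32, STRING };

struct Array {
  virtual ~Array() = default;
  virtual TypeId type_id() const = 0;
};

struct Int32Array : Array {
  std::vector<std::int32_t> values;
  explicit Int32Array(std::vector<std::int32_t> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::INT32; }
  std::size_t length() const { return values.size(); }
  std::int32_t Value(std::size_t i) const { return values[i]; }
};

struct StringArray : Array {
  std::vector<std::string> values;
  explicit StringArray(std::vector<std::string> v) : values(std::move(v)) {}
  TypeId type_id() const override { return TypeId::STRING; }
  std::size_t length() const { return values.size(); }
  const std::string& Value(std::size_t i) const { return values[i]; }
};

// Plays the role of VisitArrayInline(): the only switch on the type id is
// here, and the visitor receives the concrete array type.
template <typename Visitor>
void VisitArray(const Array& array, Visitor* visitor) {
  switch (array.type_id()) {
    case TypeId::INT32:
      visitor->Visit(static_cast<const Int32Array&>(array));
      break;
    case TypeId::STRING:
      visitor->Visit(static_cast<const StringArray&>(array));
      break;
  }
}

// Naive nested-loop equi-join on one key column. Collects every
// (left_row, right_row) pair whose key values compare equal.
struct JoinVisitor {
  const Array& right;
  std::vector<std::pair<std::size_t, std::size_t>> matches;

  explicit JoinVisitor(const Array& r) : right(r) {}

  template <typename ArrayType>
  void Visit(const ArrayType& left) {
    // Assumption: both key columns have the same type.
    assert(right.type_id() == left.type_id());
    const auto& r = static_cast<const ArrayType&>(right);
    for (std::size_t i = 0; i < left.length(); ++i) {
      for (std::size_t j = 0; j < r.length(); ++j) {
        if (left.Value(i) == r.Value(j)) matches.emplace_back(i, j);
      }
    }
  }
};

std::vector<std::pair<std::size_t, std::size_t>> JoinIndices(
    const Array& left, const Array& right) {
  JoinVisitor visitor(right);
  VisitArray(left, &visitor);
  return visitor.matches;
}
```

Calling JoinIndices(left, right) on two columns of the same type returns the
matching row index pairs, which could then drive building the result table.
With the real library, the dispatch function and the stand-in array classes
would be replaced by Arrow's own facilities.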

Regards

Antoine.

