Posted to dev@arrow.apache.org by Atri Sharma <at...@gmail.com> on 2018/09/23 18:13:00 UTC

[NEWBIE QUESTION] Test not resolving correct Array from Datum

Hi All,

While adding a new test, I am facing an issue where a Datum of Array
type returned by a function in the compute layer does not match the
expected value. I manually checked the buffers of the ArrayData held by
the returned Datum's Array, and they appear to hold the correct values,
but when I print this ArrayData, all values are shown as null.

Could someone please tell me what I am missing here? Is there another
way to do the comparison?

The WIP code is at:

https://github.com/atris/arrow/commit/1dcce9b2f34818760df29fdf8767fc1619257ea9#diff-0c513f55830e5334d28c08b1a07c6215R1441

Regards,

Atri

Re: [NEWBIE QUESTION] Test not resolving correct Array from Datum

Posted by Atri Sharma <at...@gmail.com>.
Hi Wes,

Thanks for your reply.

I agree, the kernel implementation is not optimal. However, I was not
sure how to initialize a hash table keyed on an Arrow data type (I believe
the STL would not help, since it needs a fixed type). I vaguely figured
that I needed the Arrow native hash, but wanted to focus on it after I got
something running, so that I could learn the ropes.

I will follow your guidelines and work on the hash-based implementation.

Atri

On Mon, 24 Sep 2018, 00:13 Wes McKinney, <we...@gmail.com> wrote:

> hi Atri,
>
> You're pushing one buffer for each element in the left array:
>
>
> https://github.com/atris/arrow/commit/1dcce9b2f34818760df29fdf8767fc1619257ea9#diff-4459cb59122bbce0553230b6638f6d5eR100
>
> (gdb) p out->buffers.size()
> $23 = 3
>
> The first buffer in out->buffers is the validity bitmap, which is
> being set to all zeros, which indicates to other code that the values
> are all null
>
> Unfortunately, this is not the preferred approach to implementing a
> "match" function. It needs to use a hash table like Unique and
> DictionaryEncode -- otherwise you have an O(n * m) algorithm instead
> of O(n). You can see a commented out API placeholder in kernels/hash.h
>
> Hope this helps
> Wes

Re: [NEWBIE QUESTION] Test not resolving correct Array from Datum

Posted by Wes McKinney <we...@gmail.com>.
hi Atri,

You're pushing one buffer for each element in the left array:

https://github.com/atris/arrow/commit/1dcce9b2f34818760df29fdf8767fc1619257ea9#diff-4459cb59122bbce0553230b6638f6d5eR100

(gdb) p out->buffers.size()
$23 = 3

The first buffer in out->buffers is the validity bitmap, and it is
being set to all zeros, which tells other code that every value is
null.
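
To illustrate why an all-zero validity bitmap hides otherwise correct
values, here is a minimal, self-contained sketch. It does not use the Arrow
API: ToyArray, IsNull, SetValid and MakeAllNull are hypothetical names, and
the layout (one validity bit per slot, least-significant bit first, plus one
values buffer) is a simplified stand-in for a fixed-width Arrow array.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a fixed-width array: a validity bitmap
// (one bit per slot, least-significant bit first) and a values buffer.
struct ToyArray {
  std::vector<uint8_t> validity;  // bit i set => slot i is non-null
  std::vector<int32_t> values;    // one entry per slot
};

// Returns true when the validity bit for slot i is cleared.
inline bool IsNull(const ToyArray& arr, std::size_t i) {
  assert(i < arr.values.size());
  return (arr.validity[i / 8] & (1u << (i % 8))) == 0;
}

// Sets the validity bit for slot i, marking the slot as non-null.
inline void SetValid(ToyArray* arr, std::size_t i) {
  assert(i < arr->values.size());
  arr->validity[i / 8] |= static_cast<uint8_t>(1u << (i % 8));
}

// Builds an array with a zero-filled bitmap: the values buffer is
// populated, yet every slot reads as null.
inline ToyArray MakeAllNull(std::vector<int32_t> values) {
  ToyArray arr;
  arr.validity.assign((values.size() + 7) / 8, 0);
  arr.values = std::move(values);
  return arr;
}
```

With MakeAllNull({10, 20, 30}), the values buffer holds 10, 20 and 30, but
IsNull returns true for every slot until SetValid is called on it. That is
the same symptom as a printed array that shows only nulls while its data
buffer looks right.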

Unfortunately, this is not the preferred approach to implementing a
"match" function. It needs to use a hash table like Unique and
DictionaryEncode -- otherwise you have an O(n * m) algorithm instead
of O(n). You can see a commented-out API placeholder in kernels/hash.h.
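
As a rough illustration of the hash-table approach, the sketch below uses
std::unordered_map rather than Arrow's own hash kernels. MatchSketch is a
hypothetical name, and its semantics are an assumption: for each element of
the left array it returns the index of the first occurrence in the right
array, or -1 when there is none. Nulls are not handled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical helper, not the Arrow API. For each element of `needles`,
// returns the index of its first occurrence in `haystack`, or -1 if absent.
inline std::vector<int64_t> MatchSketch(const std::vector<int32_t>& needles,
                                        const std::vector<int32_t>& haystack) {
  // Build phase: one pass over `haystack`. emplace() does not overwrite
  // an existing key, so the first index seen for each value is kept.
  std::unordered_map<int32_t, int64_t> first_index;
  first_index.reserve(haystack.size());
  for (std::size_t i = 0; i < haystack.size(); ++i) {
    first_index.emplace(haystack[i], static_cast<int64_t>(i));
  }

  // Probe phase: one expected-constant-time lookup per element of `needles`.
  std::vector<int64_t> out;
  out.reserve(needles.size());
  for (int32_t v : needles) {
    auto it = first_index.find(v);
    out.push_back(it == first_index.end() ? -1 : it->second);
  }
  assert(out.size() == needles.size());
  return out;
}
```

The build and probe phases each touch every element once, so the expected
cost is linear in the combined input size rather than the product of the two
lengths that a nested loop gives. The sketch fixes the value type to int32_t
for brevity; turning it into a template over the value type would be one way
to cover other fixed-width types.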

Hope this helps
Wes