Posted to user@arrow.apache.org by Louis C <lc...@outlook.fr> on 2022/09/23 15:57:09 UTC

[C++] ORC and handling of attributes

Hello,
It seems that the ORC reader/writer support for attributes (called metadata in Arrow) is limited. The writer does not write Arrow metadata at all, neither for the table nor for its fields. The reader fills the Arrow schema's metadata with the ORC file metadata but does nothing for the fields' metadata, as far as I can tell from looking at the code.

Looking at ORC, it seems that what they call "attributes" serve a similar purpose to Arrow metadata. See https://github.com/apache/orc/blob/ff6093c98bf38c06c906dde3207040e1b5b55753/c%2B%2B/include/orc/Type.hh#L50-L69
Since a "Type" object can represent either the table or a particular field, I think it could be used to carry the metadata.
Is my understanding of the state of the ORC adapter correct, and is there anything that would prevent doing that?
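
To make the idea concrete, here is a rough sketch of the mapping in both directions. It does not use the real Arrow or ORC classes: FieldSketch, TypeSketch, KeyValues, ToTypeSketch and ToFieldSketch are hypothetical stand-ins built on the standard library only. The attribute method names on TypeSketch are chosen to mirror the accessors in the linked Type.hh, but everything else is an assumption for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for Arrow key/value metadata.
using KeyValues = std::vector<std::pair<std::string, std::string>>;

// Hypothetical stand-in for an Arrow field (or the table itself, as the root).
struct FieldSketch {
  std::string name;
  KeyValues metadata;
  std::vector<FieldSketch> children;
};

// Hypothetical stand-in for an ORC type node; NOT the real orc::Type class.
struct TypeSketch {
  std::map<std::string, std::string> attributes;
  std::vector<std::string> fieldNames;  // names of the subtypes, same order
  std::vector<TypeSketch> subtypes;

  TypeSketch& setAttribute(const std::string& key, const std::string& value) {
    attributes[key] = value;
    return *this;
  }
  bool hasAttribute(const std::string& key) const {
    return attributes.count(key) > 0;
  }
  std::vector<std::string> getAttributeKeys() const {
    std::vector<std::string> keys;
    for (const auto& entry : attributes) keys.push_back(entry.first);
    return keys;
  }
  std::string getAttributeValue(const std::string& key) const {
    return attributes.at(key);  // throws std::out_of_range if missing
  }
};

// Writer direction: copy each field's metadata onto the matching type node.
TypeSketch ToTypeSketch(const FieldSketch& field) {
  TypeSketch type;
  for (const auto& kv : field.metadata) type.setAttribute(kv.first, kv.second);
  for (const auto& child : field.children) {
    type.fieldNames.push_back(child.name);
    type.subtypes.push_back(ToTypeSketch(child));
  }
  return type;
}

// Reader direction: rebuild field metadata from the type node's attributes.
FieldSketch ToFieldSketch(const std::string& name, const TypeSketch& type) {
  assert(type.fieldNames.size() == type.subtypes.size());
  FieldSketch field;
  field.name = name;
  for (const auto& key : type.getAttributeKeys()) {
    field.metadata.emplace_back(key, type.getAttributeValue(key));
  }
  for (std::size_t i = 0; i < type.subtypes.size(); ++i) {
    field.children.push_back(ToFieldSketch(type.fieldNames[i], type.subtypes[i]));
  }
  return field;
}
```

In this sketch the root node plays the role of the table, so table-level metadata lands on the root type and field metadata on the subtypes. Because the stand-in stores attributes in a std::map, keys come back sorted and duplicate keys collapse, which may or may not match what a real adapter would need to preserve.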

Regards


RE: [C++] ORC and handling of attributes

Posted by Louis C <lc...@outlook.fr>.
Hello,

Thanks for your answer. I may look into this if I find the time someday.

________________________________
From: Micah Kornfield <em...@gmail.com>
Sent: Tuesday, October 4, 2022 05:05
To: user@arrow.apache.org <us...@arrow.apache.org>
Subject: Re: [C++] ORC and handling of attributes

Probably not any hard blockers but would need someone to take up the work.

On Fri, Sep 23, 2022 at 8:57 AM Louis C <lc...@outlook.fr> wrote:
Hello,
It seems that the ORC reader/writer support for attributes (in Arrow it is called metadata) is limited. The writer does not handle at all the writing of Arrow metadata (neither for the table nor for fields), and the reader fills the Arrow schema's metadata with the ORC file metadata, but does nothing for the fields' metadata, as far as I can tell looking at the code.

Looking at ORC, it seems that what they call "attributes" serves a similar purpose as Arrow metadata. See https://github.com/apache/orc/blob/ff6093c98bf38c06c906dde3207040e1b5b55753/c%2B%2B/include/orc/Type.hh#L50-L69
As the "Type" object can represent both the table and a particular field, I think that that could serve for passing the metadata.
Is my understanding correct about the state of the ORC adapter and is there something that would prevent from doing that?

Regards


Re: [C++] ORC and handling of attributes

Posted by Micah Kornfield <em...@gmail.com>.
Probably not any hard blockers but would need someone to take up the work.

On Fri, Sep 23, 2022 at 8:57 AM Louis C <lc...@outlook.fr> wrote:

> Hello,
> It seems that the ORC reader/writer support for attributes (in Arrow it is
> called metadata) is limited. The writer does not handle at all the writing
> of Arrow metadata (neither for the table nor for fields), and the reader
> fills the Arrow schema's metadata with the ORC file metadata, but does
> nothing for the fields' metadata, as far as I can tell looking at the code.
>
> Looking at ORC, it seems that what they call "attributes" serves a similar
> purpose as Arrow metadata. See
> https://github.com/apache/orc/blob/ff6093c98bf38c06c906dde3207040e1b5b55753/c%2B%2B/include/orc/Type.hh#L50-L69
> As the "Type" object can represent both the table and a particular field,
> I think that that could serve for passing the metadata.
> Is my understanding correct about the state of the ORC adapter and is
> there something that would prevent from doing that?
>
> Regards