Posted to dev@arrow.apache.org by Gawain Bolton <ga...@free.fr> on 2019/11/14 21:21:56 UTC

[C++][Parquet]: Stream API handling of optional fields

Hello,

I would like to add support for handling optional fields to the 
parquet::StreamReader and parquet::StreamWriter classes which I recently 
contributed (thank you!).

Ideally I would do this by using std::optional like this:

     parquet::StreamWriter writer{ parquet::ParquetFileWriter::Open(...) };

     std::optional<double> d;

     writer << d;

     ...

     parquet::StreamReader reader{parquet::ParquetFileReader::Open(...)};

     reader >> d;

However, std::optional is only available in C++17 and Arrow is compiled
in C++11 mode.

From what I see, Arrow does use Boost to a limited extent, and in fact
gandiva/cache.h uses the boost::optional class.

So would it be possible to use the boost::optional class in parquet?

Or perhaps someone can suggest another way of handling optional fields?
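
To make the question concrete, here is a rough C++11 sketch of the behaviour I have in mind. Everything in it is hypothetical: Optional, ToyColumn, ToyWriter and ToyReader are stand-ins invented for illustration and are not the parquet::StreamWriter / parquet::StreamReader API. The toy Optional also assumes T is default-constructible, which a real optional type would not require.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical C++11 stand-in for std::optional<T>. Simplification: T must be
// default-constructible because the value is stored even when empty.
template <typename T>
class Optional {
 public:
  Optional() : has_value_(false), value_() {}
  Optional(const T& value) : has_value_(true), value_(value) {}

  bool has_value() const { return has_value_; }
  const T& operator*() const { return value_; }
  void reset() {
    has_value_ = false;
    value_ = T();
  }

 private:
  bool has_value_;
  T value_;
};

// Toy in-memory column: one validity flag per row, and one stored value per
// non-null row.
struct ToyColumn {
  std::vector<bool> is_valid;
  std::vector<double> values;
};

// Toy writer: an empty Optional is recorded as a null row.
class ToyWriter {
 public:
  explicit ToyWriter(ToyColumn* column) : column_(column) {}

  ToyWriter& operator<<(const Optional<double>& v) {
    column_->is_valid.push_back(v.has_value());
    if (v.has_value()) column_->values.push_back(*v);
    return *this;
  }

 private:
  ToyColumn* column_;
};

// Toy reader: a null row resets the Optional instead of assigning a value.
class ToyReader {
 public:
  explicit ToyReader(const ToyColumn* column)
      : column_(column), row_(0), value_index_(0) {}

  ToyReader& operator>>(Optional<double>& v) {
    assert(row_ < column_->is_valid.size());
    if (column_->is_valid[row_++]) {
      v = Optional<double>(column_->values[value_index_++]);
    } else {
      v.reset();
    }
    return *this;
  }

 private:
  const ToyColumn* column_;
  std::size_t row_;
  std::size_t value_index_;
};
```

With this, writing `writer << Optional<double>(1.5) << Optional<double>()` records one value and one null, and reading the rows back with `reader >> a >> b` leaves `a` set and `b` empty.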

Thanks in advance for your help,

Gawain



Re: [C++][Parquet]: Stream API handling of optional fields

Posted by Micah Kornfield <em...@gmail.com>.
Thanks Gawain.  For reference, https://github.com/apache/arrow/pull/5849 is
the PR.  We might want to wait a day or two before merging to make sure no
one has any objections to this approach.

On Sat, Nov 16, 2019 at 6:51 AM Gawain Bolton <ga...@free.fr> wrote:

> Thanks for your reply.
>
> If I understand correctly ARROW-7178 must be done so that Arrow has a
> version of std::optional which Parquet could then use.
>
> I think I will submit a PR for this shortly.
>
> Gawain

Re: [C++][Parquet]: Stream API handling of optional fields

Posted by Gawain Bolton <ga...@free.fr>.
Thanks for your reply.

If I understand correctly ARROW-7178 must be done so that Arrow has a 
version of std::optional which Parquet could then use.

I think I will submit a PR for this shortly.

Gawain

On 15/11/2019 14:05, Francois Saint-Jacques wrote:
> I'm all for it. I created [1]; it would also enable an operator[] for
> arrays of primitive types [2].
>
> [1] https://issues.apache.org/jira/browse/ARROW-7178
> [2] https://issues.apache.org/jira/browse/ARROW-6276

Re: [C++][Parquet]: Stream API handling of optional fields

Posted by Francois Saint-Jacques <fs...@gmail.com>.
I'm all for it. I created [1]; it would also enable an operator[] for
arrays of primitive types [2].

[1] https://issues.apache.org/jira/browse/ARROW-7178
[2] https://issues.apache.org/jira/browse/ARROW-6276
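
As a rough illustration of the operator[] idea, here is a minimal sketch. The names MaybeDouble and ToyDoubleArray are hypothetical and are not Arrow classes; MaybeDouble only stands in for whatever optional type ends up being available, and the validity vector stands in for a null bitmap.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for optional<double>: a flag plus a value.
struct MaybeDouble {
  bool has_value;
  double value;
};

// Toy array of doubles with a per-element validity flag.
class ToyDoubleArray {
 public:
  void Append(double v) {
    validity_.push_back(true);
    values_.push_back(v);
  }

  void AppendNull() {
    validity_.push_back(false);
    values_.push_back(0.0);  // placeholder slot, never reported as a value
  }

  std::size_t length() const { return values_.size(); }

  // Indexing returns "value or nothing" in a single object, so callers do
  // not need a separate null check followed by a value lookup.
  MaybeDouble operator[](std::size_t i) const {
    assert(i < values_.size());
    MaybeDouble result = {validity_[i], validity_[i] ? values_[i] : 0.0};
    return result;
  }

 private:
  std::vector<bool> validity_;
  std::vector<double> values_;
};
```

A caller would then write something like `if (arr[i].has_value) use(arr[i].value);`, where `use` is whatever the caller does with a non-null element.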

On Fri, Nov 15, 2019 at 12:40 AM Micah Kornfield <em...@gmail.com> wrote:
>
> I think there are potentially other places in the Arrow code base where
> "optional" could be useful (e.g. a row-reader-like class for Arrow
> Tables).  It looks like there is at least one header-only optional library
> [1] that is C++17 forward compatible.  I think I would lean towards
> vendoring that or another header-only library, instead of depending on
> Boost (I would need to double check, but I seem to recall there being
> differences between the Boost one and the standard one).
>
> [1] https://github.com/martinmoene/optional-lite

Re: [C++][Parquet]: Stream API handling of optional fields

Posted by Micah Kornfield <em...@gmail.com>.
I think there are potentially other places in the Arrow code base where
"optional" could be useful (e.g. a row-reader-like class for Arrow
Tables).  It looks like there is at least one header-only optional library
[1] that is C++17 forward compatible.  I think I would lean towards
vendoring that or another header-only library, instead of depending on
Boost (I would need to double check, but I seem to recall there being
differences between the Boost one and the standard one).

[1] https://github.com/martinmoene/optional-lite
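
To illustrate what "forward compatible" could mean in practice, here is a minimal sketch of a wrapper header. The namespace example::util, the fallback class and the KeepIfPositive helper are all hypothetical; in a real setup the fallback branch would be the vendored library rather than this deliberately tiny class, which assumes T is default-constructible and offers only a fraction of the std::optional interface.

```cpp
#include <cassert>

#if __cplusplus >= 201703L
#include <optional>

namespace example {
namespace util {
// With a C++17 compiler, simply alias the standard type.
template <typename T>
using optional = std::optional<T>;
}  // namespace util
}  // namespace example

#else

namespace example {
namespace util {
// Tiny C++11 fallback exposing the same spelling for the calls used below.
template <typename T>
class optional {
 public:
  optional() : has_value_(false), value_() {}
  optional(const T& value) : has_value_(true), value_(value) {}

  bool has_value() const { return has_value_; }
  const T& operator*() const { return value_; }

 private:
  bool has_value_;
  T value_;
};
}  // namespace util
}  // namespace example

#endif

// Caller code is written once and compiles unchanged in either mode.
inline example::util::optional<double> KeepIfPositive(double x) {
  if (x > 0) return example::util::optional<double>(x);
  return example::util::optional<double>();
}
```

The point of the design is that code using the wrapper only ever names example::util::optional, so moving the project to C++17 later would not require touching callers.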

On Thu, Nov 14, 2019 at 1:22 PM Gawain Bolton <ga...@free.fr> wrote:

> Hello,
>
> I would like to add support for handling optional fields to the
> parquet::StreamReader and parquet::StreamWriter classes which I recently
> contributed (thank you!).
>
> Ideally I would do this by using std::optional like this:
>
>      parquet::StreamWriter writer{ parquet::ParquetFileWriter::Open(...) };
>
>      std::optional<double> d;
>
>      writer << d;
>
>      ...
>
>      parquet::StreamReader reader{parquet::ParquetFileReader::Open(...)};
>
>      reader >> d;
>
> However, std::optional is only available in C++17 and Arrow is compiled
> in C++11 mode.
>
> From what I see, Arrow does use Boost to a limited extent, and in fact
> gandiva/cache.h uses the boost::optional class.
>
> So would it be possible to use the boost::optional class in parquet?
>
> Or perhaps someone can suggest another way of handling optional fields?
>
> Thanks in advance for your help,
>
> Gawain
>
>
>