Posted to dev@arrow.apache.org by Micah Kornfield <em...@gmail.com> on 2020/02/03 06:01:25 UTC

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Just to give an update.  I've been a little bit delayed, but my progress is
as follows:
1.  Had 1 PR merged that will exercise basic end-to-end tests.
2.  Have another PR open that adds a configuration option in C++ to
determine which algorithm version to use for reading/writing: the existing
version and the new version supporting complex nested arrays.  I think a
large amount of code will be reused/delegated to, but I will err on the side
of not touching the existing code/algorithms so that any errors in the
implementation or performance regressions can hopefully be mitigated at
runtime.  I expect that in later releases (once the code has "baked") the
option will become a no-op.  (A rough sketch of the shape such an option
could take follows the lists below.)
3.  Started coding the write path.

Which leaves:
1.  Finishing the write path (I estimate 2-3 weeks to be code complete)
2.  Implementing the read path.
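
To make the configuration option in (2) concrete, here is roughly the
shape such a runtime switch could take. The names below are illustrative
only, not the API in the open PR:

namespace parquet {
namespace arrow {

// Selects the Arrow <-> Parquet conversion engine at runtime.
enum class EngineVersion {
  kV1,  // existing algorithms, flat data only
  kV2   // new algorithms supporting complex nested arrays
};

class EngineProperties {
 public:
  EngineProperties& set_engine_version(EngineVersion version) {
    engine_version_ = version;
    return *this;
  }
  EngineVersion engine_version() const { return engine_version_; }

 private:
  // Default to the existing engine until the new code has "baked", so any
  // errors or performance regressions in the new implementation can be
  // mitigated at runtime by switching back.
  EngineVersion engine_version_ = EngineVersion::kV1;
};

}  // namespace arrow
}  // namespace parquet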

Again, I'm happy to collaborate if people have bandwidth and want to
contribute.

Thanks,
Micah

On Thu, Jan 9, 2020 at 10:31 PM Micah Kornfield <em...@gmail.com>
wrote:

> Hi Wes,
> I'm still interested in doing the work.  But I don't want to hold anybody
> up if they have bandwidth.
>
> In order to actually make progress on this, my plan will be to:
> 1.  Help with the current Java review backlog through early next week or
> so (this has been taking the majority of my time allocated for Arrow
> contributions for the last 6 months or so).
> 2.  Shift all my attention to trying to get this done (this means no
> reviews other than closing out existing ones that I've started until it is
> done).  Hopefully, other Java committers can help shrink the backlog
> further (Jacques, thanks for your recent efforts here).
>
> Thanks,
> Micah
>
> On Thu, Jan 9, 2020 at 8:16 AM Wes McKinney <we...@gmail.com> wrote:
>
>> hi folks,
>>
>> I think we have reached a point where the incomplete C++ Parquet
>> nested data assembly/disassembly is harming the value of several
>> other parts of the project, for example the Datasets API. As another
>> example, it's possible to ingest nested data from JSON but not write
>> it to Parquet in general.
>>
>> Implementing the nested data read and write path completely is a
>> difficult project requiring at least several weeks of dedicated work,
>> so it's not so surprising that it hasn't been accomplished yet. I know
>> that several people have expressed interest in working on it, but I
>> would like to see if anyone would be able to volunteer a commitment of
>> time and guess on a rough timeline when this work could be done. It
>> seems to me that if this slips beyond 2020 it will significantly diminish the
>> value being created by other parts of the project.
>>
>> Since I'm pretty familiar with all the Parquet code I'm one candidate
>> person to take on this project (and I can dedicate the time, but it
>> would come at the expense of other projects where I can also be
>> useful). But Micah and others expressed interest in working on it, so
>> I wanted to have a discussion about it to see what others think.
>>
>> Thanks
>> Wes
>>
>

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Wes McKinney <we...@gmail.com>.
Sounds good.

In general I would say that this is a good opportunity to make
improvements around random data generation. For example, I don't think
we have an API for generating a RecordBatch given a schema and some
options (e.g. probability of nulls, distribution of list sizes), but
that would be a good thing to have to assist with both perf and
correctness testing.
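
As a strawman, something along these lines. Nothing like this exists in
the library today, so RandomBatchOptions and GenerateRandomBatch are
hypothetical names, and only list<int32> is handled to keep the sketch
short:

#include <cstdint>
#include <memory>
#include <random>
#include <vector>

#include "arrow/api.h"

struct RandomBatchOptions {
  double null_probability = 0.1;  // chance each list (or value) is null
  int32_t max_list_size = 5;      // generated list lengths are in [0, max]
  uint32_t seed = 42;             // fixed seed for reproducible tests
};

// Generates a list<int32> column: random lengths, random values, and
// nulls injected at both the list and the value level.
arrow::Status RandomListOfInt32(int64_t num_rows,
                                const RandomBatchOptions& options,
                                std::shared_ptr<arrow::Array>* out) {
  std::mt19937 rng(options.seed);
  std::bernoulli_distribution is_null(options.null_probability);
  std::uniform_int_distribution<int32_t> list_size(0, options.max_list_size);
  std::uniform_int_distribution<int32_t> value(-1000, 1000);

  auto value_builder = std::make_shared<arrow::Int32Builder>();
  arrow::ListBuilder list_builder(arrow::default_memory_pool(), value_builder);
  for (int64_t i = 0; i < num_rows; ++i) {
    if (is_null(rng)) {
      ARROW_RETURN_NOT_OK(list_builder.AppendNull());
      continue;
    }
    ARROW_RETURN_NOT_OK(list_builder.Append());
    const int32_t n = list_size(rng);
    for (int32_t j = 0; j < n; ++j) {
      if (is_null(rng)) {
        ARROW_RETURN_NOT_OK(value_builder->AppendNull());
      } else {
        ARROW_RETURN_NOT_OK(value_builder->Append(value(rng)));
      }
    }
  }
  return list_builder.Finish(out);
}

// Assembles a single-column RecordBatch from the generated array.
arrow::Status GenerateRandomBatch(int64_t num_rows,
                                  const RandomBatchOptions& options,
                                  std::shared_ptr<arrow::RecordBatch>* out) {
  std::shared_ptr<arrow::Array> lists;
  ARROW_RETURN_NOT_OK(RandomListOfInt32(num_rows, options, &lists));
  auto schema =
      arrow::schema({arrow::field("xs", arrow::list(arrow::int32()))});
  *out = arrow::RecordBatch::Make(schema, num_rows, {lists});
  return arrow::Status::OK();
}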

On Thu, Apr 16, 2020 at 11:28 PM Micah Kornfield <em...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
Thanks, that seems like a good characterization.  I opened up some JIRA
subtasks on ARROW-1644 which go into a little more detail on tasks that can
probably be worked on in parallel (I've only assigned to myself the ones
that I'm actively working on; happy to discuss/collaborate on the finer
points on the JIRAs).  There will probably be a few more JIRAs to open to
do final integration work (e.g. a flag to switch between old and new
engines).

For unit tests (Item B), as noted earlier in the thread, there is already a
disabled unit test trying to verify the basic ability to round-trip, but
that probably isn't sufficient.

Thanks,
Micah

On Wed, Apr 15, 2020 at 9:32 AM Wes McKinney <we...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Wes McKinney <we...@gmail.com>.
hi Micah,

Sounds good. It seems like there are a few projects where people might
be able to work without stepping on each other's toes:

A. Array reassembly from raw repetition/definition levels (I would
guess this would be your focus; a toy sketch of what is involved
follows below)
B. Schema and data generation for round-trip correctness and
performance testing (I reckon that the unit tests for A will largely
be hand-written examples like you did for the write path)
C. Benchmarks, particularly to be able to assess performance changes
going from the old incomplete implementations to the new ones
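
To make A concrete, here is a toy reassembly for a single nullable
list<int32> column under the standard 3-level Parquet list encoding
(def 0 = null list, 1 = empty list, 2 = null element, 3 = present
element; rep 0 starts a new row, rep 1 continues the current list).
This is only for intuition; the real algorithm has to handle arbitrary
nesting and be far more efficient:

#include <cstdint>
#include <optional>
#include <vector>

struct ListColumn {
  std::vector<int32_t> offsets{0};             // Arrow-style list offsets
  std::vector<bool> list_valid;                // per-row list validity
  std::vector<std::optional<int32_t>> values;  // child values (nullopt = null)
};

ListColumn Reassemble(const std::vector<int16_t>& def_levels,
                      const std::vector<int16_t>& rep_levels,
                      const std::vector<int32_t>& leaf_values) {
  ListColumn out;
  size_t value_idx = 0;  // index into the non-null leaf values
  for (size_t i = 0; i < def_levels.size(); ++i) {
    if (rep_levels[i] == 0) {
      // rep level 0 closes the previous row and starts a new one
      out.offsets.push_back(out.offsets.back());
      out.list_valid.push_back(def_levels[i] >= 1);
    }
    if (def_levels[i] >= 2) {
      // a slot exists in the current list (a null element if def == 2)
      out.offsets.back() += 1;
      if (def_levels[i] == 3) {
        out.values.push_back(leaf_values[value_idx++]);
      } else {
        out.values.push_back(std::nullopt);
      }
    }
  }
  return out;
}

int main() {
  // Rows [1, null], null, [] encode as the levels below; the result is
  // offsets {0, 2, 2, 2}, list validity {true, false, true}, and child
  // values {1, null}.
  ListColumn col = Reassemble({3, 2, 0, 1}, {0, 1, 0, 0}, {1});
  (void)col;
  return 0;
}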

Some of us should be able to pitch in to help with this. Might also be
a good opportunity to do some cleanup of the test code in
cpp/src/parquet/arrow

- Wes

On Tue, Apr 14, 2020 at 11:19 PM Micah Kornfield <em...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Micah Kornfield <em...@gmail.com>.
Hi Wes,
Yes, I'm making progress, and at this point I anticipate being able to
finish it off by the next release, possibly without support for round
tripping fixed size lists.  I've been spending some time thinking about
different approaches and have started coding some of the building blocks,
which I think in the common case (relatively low nesting levels) should be
fairly performant (I'm also going to write some benchmarks to sanity check
this; a toy sketch of their shape follows below).  One caveat to this is
that my schedule is going to change slightly next week and it's possible my
bandwidth might be more limited; I'll update the list if this happens.
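
For a sense of the shape those benchmarks might take, a self-contained
toy using Google Benchmark (which the cpp/ tree already uses for its
other benchmarks); the real ones would exercise the actual read/write
paths under cpp/src/parquet/arrow rather than this stand-in loop:

#include <cstdint>
#include <vector>

#include <benchmark/benchmark.h>

// Stand-in workload: count the "present" leaf slots in a run of def
// levels (def == 3 under the 3-level list encoding discussed elsewhere
// in this thread).
static void BM_DefLevelsToValidCount(benchmark::State& state) {
  const int64_t n = state.range(0);
  std::vector<int16_t> def_levels(n);
  for (int64_t i = 0; i < n; ++i) {
    def_levels[i] = static_cast<int16_t>((i % 2) ? 3 : 2);
  }
  for (auto _ : state) {
    int64_t valid = 0;
    for (int16_t d : def_levels) valid += (d == 3);
    benchmark::DoNotOptimize(valid);
  }
  state.SetItemsProcessed(state.iterations() * n);
}
BENCHMARK(BM_DefLevelsToValidCount)->Arg(1 << 16)->Arg(1 << 20);
BENCHMARK_MAIN();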

I think there are at least two areas that I'm not working on that could be
parallelized if you or your team has bandwidth.

1. It would be good to have some Parquet files representing real-world
datasets available to benchmark against.
2. The higher-level bookkeeping of tracking which def-levels/rep-levels
are needed to compare against for any particular column (i.e. the
preceding repeated parent).  I'm currently working on the code that takes
these and converts them to offsets/null fields (see the sketch below).
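
The per-leaf bookkeeping in (2) amounts to precomputing, from a walk
over the schema, the level thresholds the conversion code compares
against.  A rough sketch, with illustrative names only (this is not the
actual code):

#include <cstdint>
#include <utility>
#include <vector>

struct LevelInfo {
  int16_t max_def_level = 0;  // def level meaning "leaf value is present"
  int16_t max_rep_level = 0;  // rep level of the innermost repeated node
  // Def level at which the nearest repeated ancestor is defined; def
  // levels below this mean an ancestor (not this leaf) is null or an
  // empty list.
  int16_t repeated_ancestor_def_level = 0;
};

// Each node on the root-to-leaf path is (is_optional, is_repeated);
// required nodes are (false, false) and add no levels.
LevelInfo ComputeLeafLevelInfo(
    const std::vector<std::pair<bool, bool>>& path) {
  LevelInfo info;
  for (const auto& node : path) {
    const bool is_optional = node.first;
    const bool is_repeated = node.second;
    if (is_repeated) {
      // Repeated nodes add one rep level and one def level, and reset
      // the ancestor threshold: lower def levels belong to null/empty
      // parents rather than to this leaf.
      info.max_rep_level += 1;
      info.max_def_level += 1;
      info.repeated_ancestor_def_level = info.max_def_level;
    } else if (is_optional) {
      info.max_def_level += 1;
    }
  }
  return info;
}

// For a nullable list of nullable int32 (3-level list encoding), the
// path is {optional, repeated, optional}:
//   ComputeLeafLevelInfo({{true, false}, {false, true}, {true, false}})
// yields max_def_level = 3, max_rep_level = 1, and
// repeated_ancestor_def_level = 2, matching the thresholds in the toy
// reassembly example earlier in the thread.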

I can go into more details if you or your team would like to collaborate.

Thanks,
Micah

On Tue, Apr 14, 2020 at 7:48 AM Wes McKinney <we...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Wes McKinney <we...@gmail.com>.
hi Micah,

I'm glad that we have the write side of nested data completed for 0.17.0.

As far as completing the read side and then implementing sufficient
testing to exercise corner cases in end-to-end reads/writes, do you
anticipate being able to work on this in the next 4-6 weeks (obviously
the state of the world has affected everyone's availability /
bandwidth)? I ask because someone from my team (or me also) may be
able to get involved and help this move along. It'd be great to have
this 100% completed and checked off our list for the next release
(i.e. 0.18.0 or 1.0.0 depending on whether the Java/C++ integration
tests get completed also)

thanks
Wes

On Wed, Feb 5, 2020 at 12:12 AM Micah Kornfield <em...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Micah Kornfield <em...@gmail.com>.
> Glad to hear about the progress. As I mentioned on #2, what do you
> think about setting up a feature branch for you to merge PRs into?
> Then the branch can be iterated on and we can merge it back when it's
> feature complete and does not have perf regressions for the flat
> read/write path.

I'd like to avoid a separate branch if possible.  I'm willing to close the
open PR till I'm sure it is needed, but I'm hoping that keeping PRs as
small and focused as possible, with performance testing along the way, will
be a better reviewer and developer experience here.

> The earliest I'd have time to work on this myself would likely be
> sometime in March. Others are welcome to jump in as well (and it'd be
> great to increase the overall level of knowledge of the Parquet
> codebase)

Hopefully Igor can help out; otherwise I'll take up the read path after I
finish the write path.

-Micah

On Tue, Feb 4, 2020 at 3:31 PM Wes McKinney <we...@gmail.com> wrote:

Re: Coordinating / scheduling C++ Parquet-Arrow nested data work (ARROW-1644 and others)

Posted by Wes McKinney <we...@gmail.com>.
hi Micah

On Mon, Feb 3, 2020 at 12:01 AM Micah Kornfield <em...@gmail.com> wrote:
>
> Just to give an update.  I've been a little bit delayed, but my progress is
> as follows:
> 1.  Had 1 PR merged that will exercise basic end-to-end tests.
> > 2.  Have another PR open that adds a configuration option in C++ to
> > determine which algorithm version to use for reading/writing: the existing
> > version and the new version supporting complex nested arrays.  I think a
> > large amount of code will be reused/delegated to, but I will err on the
> > side of not touching the existing code/algorithms so that any errors in
> > the implementation or performance regressions can hopefully be mitigated
> > at runtime.  I expect that in later releases (once the code has "baked")
> > the option will become a no-op.

Glad to hear about the progress. As I mentioned on #2, what do you
think about setting up a feature branch for you to merge PRs into?
Then the branch can be iterated on and we can merge it back when it's
feature complete and does not have perf regressions for the flat
read/write path.

> 3.  Started coding the write path.
>
> Which leaves:
> 1.  Finishing the write path (I estimate 2-3 weeks to be code complete)
> 2.  Implementing the read path.

The earliest I'd have time to work on this myself would likely be
sometime in March. Others are welcome to jump in as well (and it'd be
great to increase the overall level of knowledge of the Parquet
codebase)
