Posted to dev@arrow.apache.org by Weston Pace <we...@gmail.com> on 2021/07/29 21:08:52 UTC

[DISCUSS] Datasets API plugins?

In reviewing the RADOS PR I ran into another question.  I recently
sent an email about the case where the author wants their integration
to be part of the Arrow repo (I believe this is the case for the RADOS
PR).  However, what about the case where the author doesn't want to be
part of the GitHub repo?  (To be clear, this email is not relevant
for the RADOS PR.)

Right now, in order to add a new file format to the datasets API, the
author has to add code to the Arrow codebase to create a new
FileFormat or Fragment.  Do we want to make the datasets API a
"plugin" architecture so that new formats can be added dynamically
in the future?
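
To make the question concrete, here is a minimal sketch of what a
registry-style plugin model could look like.  It is illustrative only
and is not the Arrow datasets API: the names register_format, scan and
read_kv_lines are hypothetical, and a "format" here is simply a
callable that turns raw text into rows.

```python
from typing import Callable, Dict, Iterator

# Hypothetical: a format plugin is any callable mapping raw text to rows.
Reader = Callable[[str], Iterator[dict]]

# Registry of formats, keyed by name. Built-in and external formats
# would both be registered through the same function.
_FORMATS: Dict[str, Reader] = {}


def register_format(name: str, reader: Reader) -> None:
    """Register a reader under a format name; reject duplicates."""
    if name in _FORMATS:
        raise ValueError(f"format already registered: {name}")
    _FORMATS[name] = reader


def scan(text: str, format_name: str) -> Iterator[dict]:
    """Look up the reader for format_name and apply it to text."""
    if format_name not in _FORMATS:
        raise KeyError(f"unknown format: {format_name}")
    return _FORMATS[format_name](text)


# An "external" format defined outside the core library (hypothetical).
def read_kv_lines(text: str) -> Iterator[dict]:
    for line in text.splitlines():
        key, _, value = line.partition("=")
        yield {"key": key, "value": value}


register_format("kv", read_kv_lines)
rows = list(scan("a=1\nb=2", "kv"))
```

The point of the sketch is only that the core code looks formats up by
name at run time, so a new format can be added without changing the
core codebase.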

Of course, now that I'm writing the email, I suppose the answer is
clear.  If someone cares enough about having an external extension
they can always do the work to add such a plugin system.  Does this
sound right, or is there some reason against this, or a different
approach we'd want to take in the future?

Re: [DISCUSS] Datasets API plugins?

Posted by Wes McKinney <we...@gmail.com>.
I think if someone wants to build a plugin model for datasets / file
formats (and refactor the existing "built-in" formats to use those
plugin APIs), that sounds like a fine idea to me. I don't think the
idea was for the API to be limited to the formats that are
implemented inside the Arrow codebase.

On Thu, Jul 29, 2021 at 4:09 PM Weston Pace <we...@gmail.com> wrote:
>
> In reviewing the RADOS PR I ran into another question.  I recently
> sent an email on the topic where the author wants their integration to
> be part of the Arrow repo (I believe this is the case for the RADOS
> PR).  However, what about the case where the author doesn't want to be
> part of the GitHub repo?  (To be clear, this email is not relevant
> for the RADOS PR.)
>
> Right now, in order to add a new file format to the datasets API, the
> author has to add code to the Arrow codebase to create a new
> FileFormat or Fragment.  Do we want to make the datasets API a
> "plugin" architecture so that new formats can be added dynamically
> in the future?
>
> Of course, now that I'm writing the email, I suppose the answer is
> clear.  If someone cares enough about having an external extension
> they can always do the work to add such a plugin system.  Does this
> sound right, or is there some reason against this, or a different
> approach we'd want to take in the future?