Posted to dev@knox.apache.org by larry mccay <lm...@apache.org> on 2019/11/07 01:41:47 UTC

[DISCUSS] KIP-14 - KnoxShell Improvement for Tabular Data

All -

I've created the following KIP to try to capture the motivation, use cases
and vision for KnoxShellTable.

I may extend this KIP rather than add a new one for the custom Groovy shell
commands that I am writing to work with KnoxShellTable. In the meantime,
this represents the core work for tabular data. Some details of the current
implementation are still missing and may be added as we go.

https://cwiki.apache.org/confluence/display/KNOX/KIP-14+-+KnoxShell+Improvements+for+Tabular+Data

Any thoughts on this are more than welcome!

thanks,

--larry

Re: [DISCUSS] KIP-14 - KnoxShell Improvement for Tabular Data

Posted by larry mccay <lm...@apache.org>.
That is a fair point and this sort of processing should not be considered a
primary goal of Knox.
However, as a client environment for remote clusters that will be
increasingly locked down from SSH, it would be good to be able to do
meaningful things as appropriate. Appropriate scenarios would be for small
tasks, demonstrations and dev/testing work.

As for general-purpose table operations and JDBC: not all data sources are
JDBC based.
Those that are JDBC based already have those capabilities, but being able
to join tabular data across data source types (say, an Oracle database and
a CSV file) and publish the result to a Hive table in a cluster would make
sense.
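
To make the cross-source case concrete, here is a minimal sketch of joining
rows from a database query with rows from a CSV file. It is illustration
only and does not use the KnoxShellTable API: it is plain Python, an
in-memory sqlite3 database stands in for the JDBC source, and the table
names, column names and helper functions (rows_from_db, rows_from_csv,
inner_join) are all hypothetical. Publishing the result to a Hive table is
not shown.

```python
import csv
import io
import sqlite3


def rows_from_db(conn, query):
    """Run a query and return (headers, rows) as plain lists."""
    cur = conn.execute(query)
    headers = [col[0] for col in cur.description]
    return headers, [list(r) for r in cur.fetchall()]


def rows_from_csv(text):
    """Parse CSV text whose first line is the header row."""
    reader = csv.reader(io.StringIO(text))
    headers = next(reader)
    return headers, [row for row in reader]


def inner_join(left, right, left_col, right_col):
    """Hash join two (headers, rows) tables on the named columns.

    Keys are compared as strings because CSV values arrive as text.
    """
    left_headers, left_rows = left
    right_headers, right_rows = right
    li = left_headers.index(left_col)
    ri = right_headers.index(right_col)

    # Index the right-hand table by its join key.
    index = {}
    for row in right_rows:
        index.setdefault(str(row[ri]), []).append(row)

    # Probe the index once per left-hand row.
    joined = []
    for row in left_rows:
        for match in index.get(str(row[li]), []):
            joined.append(row + match)
    return left_headers + right_headers, joined


# Hypothetical sample data: sqlite3 stands in for the database source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO customers VALUES (?, ?)",
                 [(1, "alice"), (2, "bob")])

orders_csv = "customer_id,total\n1,9.50\n2,20.00\n1,3.25\n"

headers, rows = inner_join(
    rows_from_db(conn, "SELECT id, name FROM customers ORDER BY id"),
    rows_from_csv(orders_csv),
    "id", "customer_id")

for row in rows:
    print(dict(zip(headers, row)))
```

The point of the sketch is only that once both sources are reduced to
headers plus rows, the join itself is small and independent of where the
data came from.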

There is no goal here to reinvent dataframes or to compete in that space.


On Mon, Nov 11, 2019 at 2:25 PM Kevin Risden <kr...@apache.org> wrote:

> We need to be careful about how far this work goes into data processing.
> There is a difference between providing a client side DSL to interact with
> cluster services and building a completely standalone data processing DSL.
>
> I worry that the sorting/joining/processing is straying too far from just
> being a client side DSL and turning into a whole data processing framework.
> Specifically UC 3 seems to be significantly outside of the scope of what
> Knox has historically done. Most of that processing should be pushed down
> to the engine exposed via UC 1. I don't see why the Knox DSL should provide
> that type of interaction when it should be pushed down to the JDBC
> interface and the results returned through Knox.
>
> Kevin Risden

Re: [DISCUSS] KIP-14 - KnoxShell Improvement for Tabular Data

Posted by Kevin Risden <kr...@apache.org>.
We need to be careful about how far this work goes into data processing.
There is a difference between providing a client side DSL to interact with
cluster services and building a completely standalone data processing DSL.

I worry that the sorting/joining/processing is straying too far from just
being a client side DSL and turning into a whole data processing framework.
Specifically UC 3 seems to be significantly outside of the scope of what
Knox has historically done. Most of that processing should be pushed down
to the engine exposed via UC 1. I don't see why the Knox DSL should provide
that type of interaction when it should be pushed down to the JDBC
interface and the results returned through Knox.

Kevin Risden

