Posted to dev@amaterasu.apache.org by Arun Manivannan <ar...@arunma.com> on 2019/01/13 11:16:27 UTC

Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

Hi Guy, Yaniv and Nadav,

This PR <https://github.com/apache/incubator-amaterasu/pull/39> covers
only part of the issue: the datasets.yaml, the ConfigManager and the
test cases. The integration with AmaContext is yet to be done, but I
would like to get your thoughts on the implementation.

Guy - could you help shed some light on the syntax and on idiomatic
Kotlin? Newbie here.

Cheers,
Arun

On Fri, Oct 12, 2018 at 7:15 PM Yaniv Rodenski (JIRA) <ji...@apache.org>
wrote:

> Yaniv Rodenski created AMATERASU-52:
> ---------------------------------------
>
>              Summary: Implement AmaContext.datastores
>                  Key: AMATERASU-52
>                  URL: https://issues.apache.org/jira/browse/AMATERASU-52
>              Project: AMATERASU
>           Issue Type: Task
>             Reporter: Yaniv Rodenski
>             Assignee: Arun Manivannan
>              Fix For: 0.2.1-incubating
>
>
> AmaContext.datastores should contain the data from datastores.yaml
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>

Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

Posted by Arun Manivannan <ar...@arunma.com>.
Hi,

I realised that making data classes for the config may not be the right
approach considering we won't be able to know all the properties in
advance. Would you consider a Map instead?

I have updated the PR to reflect this proposal.  The datasets.yaml sticks
to the format that I mentioned in yesterday's mail.  Please have a look and
let me know if this works.

Regards,
Arun
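
A minimal sketch of the trade-off described above, written in Python for brevity rather than in the PR's Kotlin. The entries are taken from the datasets.yaml example in this thread and are shown already parsed into dictionaries; the HiveDataset class and the as_fixed_schema helper are hypothetical names used only for illustration, not code from the PR.

```python
from dataclasses import dataclass

# Entries as a YAML parser would hand them over (values copied from the
# datasets.yaml example in this thread).
hive_entry = {
    "uri": "/user/somepath",
    "format": "parquet",
    "database": "transations_daily",
    "table": "transx",
}
file_entry = {"uri": "s3://filestore", "format": "parquet", "mode": "overwrite"}


@dataclass
class HiveDataset:
    """Hypothetical fixed-schema class: every property is declared up front."""
    uri: str
    format: str
    database: str
    table: str


def as_fixed_schema(entry):
    """Return a HiveDataset, or None if the entry does not match the class."""
    try:
        return HiveDataset(**entry)
    except TypeError:
        # Raised both for unexpected properties and for missing ones.
        return None


typed_hive = as_fixed_schema(hive_entry)  # fits the declared fields
typed_file = as_fixed_schema(file_entry)  # None: 'mode' was never declared

# The map-based alternative accepts any property set; absent ones read as None.
mode_of_file = file_entry.get("mode")
mode_of_hive = hive_entry.get("mode")
```

The fixed class has to be revisited whenever a datastore introduces a new property, while the map carries whatever the file contains, at the cost of losing compile-time checks on property names.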

On Tue, Jan 29, 2019 at 10:05 PM Arun Manivannan <ar...@arunma.com> wrote:

> Makes sense, Nadav. I have been toying with the idea of a structure
> like the one below. I am still trying to make it work with konf
> (argggh!!), though. Does this sound reasonable?
>
>
> datasets:
>   hive:
>     transactions:
>       uri: /user/somepath
>       format: parquet
>       database: transations_daily
>       table: transx
>
>     second_transactions:
>       uri: /seconduser/somepath
>       format: avro
>       database: transations_monthly
>       table: avro_table
>   file:
>     users:
>       uri: s3://filestore
>       format: parquet
>       mode: overwrite
>
>
>
> Cheers,
> Arun

Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

Posted by Arun Manivannan <ar...@arunma.com>.
Makes sense, Nadav. I have been toying with the idea of a structure
like the one below. I am still trying to make it work with konf
(argggh!!), though. Does this sound reasonable?


datasets:
  hive:
    transactions:
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx

    second_transactions:
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table
  file:
    users:
      uri: s3://filestore
      format: parquet
      mode: overwrite



Cheers,
Arun
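
The nested layout above and the flat list with explicit key and type fields carry the same information. A minimal sketch of the conversion, in Python and assuming the YAML has already been parsed into plain dictionaries; the function name flatten is hypothetical.

```python
# Parsed form of the nested layout: type -> key -> properties.
nested = {
    "datasets": {
        "hive": {
            "transactions": {
                "uri": "/user/somepath",
                "format": "parquet",
                "database": "transations_daily",
                "table": "transx",
            },
            "second_transactions": {
                "uri": "/seconduser/somepath",
                "format": "avro",
                "database": "transations_monthly",
                "table": "avro_table",
            },
        },
        "file": {
            "users": {
                "uri": "s3://filestore",
                "format": "parquet",
                "mode": "overwrite",
            },
        },
    }
}


def flatten(config):
    """Turn type -> key -> properties into a flat list of entries."""
    rows = []
    for ds_type, entries in config["datasets"].items():
        for key, props in entries.items():
            # Copy the properties and add the two fields the flat layout needs.
            rows.append({"key": key, **props, "type": ds_type})
    return rows


flat = flatten(nested)
```

Since either form can be derived from the other, the choice mostly affects how the file reads and how lookups are written, not what can be expressed.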


On Tue, Jan 29, 2019 at 1:45 PM Nadav Har Tzvi <na...@gmail.com>
wrote:

> Hey Arun,
>
> I kinda feel like the datastores yaml is somewhat obscure. I propose the
> following structure.
>
> Instead of
>
> datasets:
>   hive:
>     - key: transactions
>       uri: /user/somepath
>       format: parquet
>       database: transations_daily
>       table: transx
>
>     - key: second_transactions
>       uri: /seconduser/somepath
>       format: avro
>       database: transations_monthly
>       table: avro_table
>   file:
>     - key: users
>       uri: s3://filestore
>       format: parquet
>       mode: overwrite
>
> I would have
>
> datasets:
>   - key: transactions
>     uri: /user/somepath
>     format: parquet
>     database: transations_daily
>     table: transx
>     type: hive
>   - key: second_transactions
>     uri: /seconduser/somepath
>     format: avro
>     database: transations_monthly
>     table: avro_table
>     type: hive
>   - key: users
>     uri: s3://filestore
>     format: parquet
>     mode: overwrite
>     type: file
>
> In my opinion it is more straightforward and uniform. I think it is also
> more straightforward code-wise.
> What do you think?
>
> Cheers,
> Nadav

Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

Posted by Nadav Har Tzvi <na...@gmail.com>.
Hey Arun,

I kinda feel like the datastores yaml is somewhat obscure. I propose the
following structure.

Instead of

datasets:
  hive:
    - key: transactions
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx

    - key: second_transactions
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table
  file:
    - key: users
      uri: s3://filestore
      format: parquet
      mode: overwrite

I would have

datasets:
  - key: transactions
    uri: /user/somepath
    format: parquet
    database: transations_daily
    table: transx
    type: hive
  - key: second_transactions
    uri: /seconduser/somepath
    format: avro
    database: transations_monthly
    table: avro_table
    type: hive
  - key: users
    uri: s3://filestore
    format: parquet
    mode: overwrite
    type: file

In my opinion it is more straightforward and uniform. I think it is also
more straightforward code-wise.
What do you think?

Cheers,
Nadav
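
A minimal sketch of how the flat layout could be consumed, in Python and assuming the YAML has already been parsed into a list of dictionaries; index_by_key is a hypothetical helper. One point worth noting: a list does not guarantee unique keys by its structure, so the sketch checks for duplicates explicitly.

```python
# Parsed form of the flat layout: one entry per dataset, with key and type.
datasets = [
    {"key": "transactions", "uri": "/user/somepath", "format": "parquet",
     "database": "transations_daily", "table": "transx", "type": "hive"},
    {"key": "second_transactions", "uri": "/seconduser/somepath",
     "format": "avro", "database": "transations_monthly",
     "table": "avro_table", "type": "hive"},
    {"key": "users", "uri": "s3://filestore", "format": "parquet",
     "mode": "overwrite", "type": "file"},
]


def index_by_key(entries):
    """Build a key -> entry lookup, rejecting repeated keys."""
    index = {}
    for entry in entries:
        key = entry["key"]
        if key in index:
            raise ValueError(f"duplicate dataset key: {key!r}")
        index[key] = entry
    return index


by_key = index_by_key(datasets)

# Filtering by type is a plain comprehension over the same list.
hive_keys = [entry["key"] for entry in datasets if entry["type"] == "hive"]
```

Every entry is handled by the same loop regardless of its type, which is the uniformity argued for above.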



On Mon, 14 Jan 2019 at 00:57, Yaniv Rodenski <ya...@shinto.io> wrote:

> Hi Arun,
>
> I've added my comments to the PR. Good call, though: I agree that
> @Nadav Har Tzvi <na...@gmail.com> should at least review it, as you
> both need to maintain compatible APIs.
>
> Cheers,
> Yaniv

Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores

Posted by Yaniv Rodenski <ya...@shinto.io>.
Hi Arun,

I've added my comments to the PR. Good call, though: I agree that
@Nadav Har Tzvi <na...@gmail.com> should at least review it, as you
both need to maintain compatible APIs.

Cheers,
Yaniv

On Sun, Jan 13, 2019 at 10:21 PM Arun Manivannan <ar...@arunma.com> wrote:

> Hi Guy, Yaniv and Nadav,
>
> This PR <https://github.com/apache/incubator-amaterasu/pull/39> covers
> only part of the issue: the datasets.yaml, the ConfigManager and the
> test cases. The integration with AmaContext is yet to be done, but I
> would like to get your thoughts on the implementation.
>
> Guy - could you help shed some light on the syntax and on idiomatic
> Kotlin? Newbie here.
>
> Cheers,
> Arun


-- 
Yaniv Rodenski

+61 477 778 405
yaniv@shinto.io