Posted to dev@amaterasu.apache.org by Arun Manivannan <ar...@arunma.com> on 2019/01/13 11:16:27 UTC
Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores
Hi Guy, Yaniv and Nadav,
This PR <https://github.com/apache/incubator-amaterasu/pull/39> covers only
part of the issue: the datasets.yaml, the ConfigManager and the test cases.
The integration with AmaContext is yet to be done, but I would like to get
your thoughts on the implementation.
Guy - could you help shed some light on the syntax and the idiomatic side of
Kotlin? I'm a newbie here.
Cheers,
Arun
On Fri, Oct 12, 2018 at 7:15 PM Yaniv Rodenski (JIRA) <ji...@apache.org>
wrote:
> Yaniv Rodenski created AMATERASU-52:
> ---------------------------------------
>
> Summary: Implement AmaContext.datastores
> Key: AMATERASU-52
> URL: https://issues.apache.org/jira/browse/AMATERASU-52
> Project: AMATERASU
> Issue Type: Task
> Reporter: Yaniv Rodenski
> Assignee: Arun Manivannan
> Fix For: 0.2.1-incubating
>
>
> AmaContext.datastores should contain the data from datastores.yaml
Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores
Posted by Arun Manivannan <ar...@arunma.com>.
Hi,
I realised that making data classes for the config may not be the right
approach considering we won't be able to know all the properties in
advance. Would you consider a Map instead?
I have updated the PR to reflect this proposal. The datasets.yaml sticks
to the format that I mentioned in yesterday's mail. Please have a look and
let me know if this works.
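To illustrate the trade-off between fixed classes and a map, here is a minimal sketch. It is written in Python for brevity and is not the Kotlin code in the PR. The names HiveDataset, load_fixed, load_open and the extra property partition_by are all invented for the example, and the entry is a plain dict standing in for whatever a YAML parser would return.

```python
from dataclasses import dataclass


# Hypothetical fixed schema: every property has to be declared up front.
@dataclass
class HiveDataset:
    uri: str
    format: str
    database: str
    table: str


# One dataset entry as a parser might return it. "partition_by" is an
# invented property that the fixed schema above does not know about.
entry = {
    "uri": "/user/somepath",
    "format": "parquet",
    "database": "transations_daily",
    "table": "transx",
    "partition_by": "day",
}


def load_fixed(raw):
    """Build the fixed-schema object, or return None on undeclared keys."""
    try:
        return HiveDataset(**raw)
    except TypeError:
        # The generated constructor rejects keyword arguments it does not know.
        return None


def load_open(raw, required=("uri", "format")):
    """Keep every property in a map and check only the required ones."""
    missing = [name for name in required if name not in raw]
    if missing:
        raise KeyError(f"missing required properties: {missing}")
    return dict(raw)


fixed = load_fixed(entry)        # rejected because of the undeclared property
open_entry = load_open(entry)    # accepted, extra property preserved
```

The map keeps properties nobody anticipated, at the cost of losing compile-time checks on property names; validating a small set of required keys is one way to recover part of that safety.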
Regards,
Arun
On Tue, Jan 29, 2019 at 10:05 PM Arun Manivannan <ar...@arunma.com> wrote:
> Makes sense, Nadav. I have been toying with the idea of having the
> structure like this. I am trying to make it work on konf (argggh!!) though.
> Do you think this sounds reasonable?
>
>
> datasets:
>   hive:
>     transactions:
>       uri: /user/somepath
>       format: parquet
>       database: transations_daily
>       table: transx
>
>     second_transactions:
>       uri: /seconduser/somepath
>       format: avro
>       database: transations_monthly
>       table: avro_table
>   file:
>     users:
>       uri: s3://filestore
>       format: parquet
>       mode: overwrite
>
>
>
> Cheers,
> Arun
Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores
Posted by Arun Manivannan <ar...@arunma.com>.
Makes sense, Nadav. I have been toying with the idea of having the
structure like this. I am trying to make it work on konf (argggh!!) though.
Do you think this sounds reasonable?
datasets:
  hive:
    transactions:
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx
    second_transactions:
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table
  file:
    users:
      uri: s3://filestore
      format: parquet
      mode: overwrite
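As a rough sketch of how this nested layout reads from code: once parsed, it is a map of type to key to properties, so a lookup is two map accesses. The example is in Python for brevity (the PR itself is Kotlin and uses konf), the dict literal stands in for the parser output, and the helpers find and flatten are hypothetical names.

```python
# The nested layout as parsed data: type -> dataset key -> properties.
datasets = {
    "hive": {
        "transactions": {
            "uri": "/user/somepath",
            "format": "parquet",
            "database": "transations_daily",
            "table": "transx",
        },
        "second_transactions": {
            "uri": "/seconduser/somepath",
            "format": "avro",
            "database": "transations_monthly",
            "table": "avro_table",
        },
    },
    "file": {
        "users": {
            "uri": "s3://filestore",
            "format": "parquet",
            "mode": "overwrite",
        },
    },
}


def find(all_datasets, kind, key):
    """Return the properties of one dataset, or None if it is not defined."""
    return all_datasets.get(kind, {}).get(key)


def flatten(all_datasets):
    """Convert to a flat list where each entry carries its key and type."""
    return [
        {"key": key, **props, "type": kind}
        for kind, entries in all_datasets.items()
        for key, props in entries.items()
    ]


transactions = find(datasets, "hive", "transactions")
flat = flatten(datasets)
```

Because flatten produces a list with explicit key and type fields, the nested layout and a flat list layout carry the same information and either can be derived from the other.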
Cheers,
Arun
On Tue, Jan 29, 2019 at 1:45 PM Nadav Har Tzvi <na...@gmail.com>
wrote:
> Hey Arun,
>
> I kinda feel like the datastores yaml is somewhat obscure. I propose the
> following structure.
>
> Instead of
>
> datasets:
>   hive:
>     - key: transactions
>       uri: /user/somepath
>       format: parquet
>       database: transations_daily
>       table: transx
>
>     - key: second_transactions
>       uri: /seconduser/somepath
>       format: avro
>       database: transations_monthly
>       table: avro_table
>   file:
>     - key: users
>       uri: s3://filestore
>       format: parquet
>       mode: overwrite
>
> I would have
>
> datasets:
>   - key: transactions
>     uri: /user/somepath
>     format: parquet
>     database: transations_daily
>     table: transx
>     type: hive
>   - key: second_transactions
>     uri: /seconduser/somepath
>     format: avro
>     database: transations_monthly
>     table: avro_table
>     type: hive
>   - key: users
>     uri: s3://filestore
>     format: parquet
>     mode: overwrite
>     type: file
>
> In my opinion it is more straightforward and uniform. I think it is also
> more straightforward code-wise.
> What do you think?
>
> Cheers,
> Nadav
Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores
Posted by Nadav Har Tzvi <na...@gmail.com>.
Hey Arun,
I kinda feel like the datastores yaml is somewhat obscure. I propose the
following structure.
Instead of
datasets:
  hive:
    - key: transactions
      uri: /user/somepath
      format: parquet
      database: transations_daily
      table: transx
    - key: second_transactions
      uri: /seconduser/somepath
      format: avro
      database: transations_monthly
      table: avro_table
  file:
    - key: users
      uri: s3://filestore
      format: parquet
      mode: overwrite
I would have
datasets:
  - key: transactions
    uri: /user/somepath
    format: parquet
    database: transations_daily
    table: transx
    type: hive
  - key: second_transactions
    uri: /seconduser/somepath
    format: avro
    database: transations_monthly
    table: avro_table
    type: hive
  - key: users
    uri: s3://filestore
    format: parquet
    mode: overwrite
    type: file
In my opinion it is more straightforward and uniform. I think it is also
more straightforward code-wise.
What do you think?
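A minimal sketch of what consuming the flat list could look like, to make the "code-wise" point concrete. It is in Python for brevity rather than the project's Kotlin, the list literal stands in for parsed YAML, and index_by_key and group_by_type are hypothetical helper names.

```python
from collections import defaultdict

# The flat layout as parsed data: one list, each entry tagged with its type.
datasets = [
    {"key": "transactions", "uri": "/user/somepath", "format": "parquet",
     "database": "transations_daily", "table": "transx", "type": "hive"},
    {"key": "second_transactions", "uri": "/seconduser/somepath",
     "format": "avro", "database": "transations_monthly",
     "table": "avro_table", "type": "hive"},
    {"key": "users", "uri": "s3://filestore", "format": "parquet",
     "mode": "overwrite", "type": "file"},
]


def index_by_key(entries):
    """Map each dataset key to its entry, rejecting duplicate keys."""
    index = {}
    for entry in entries:
        key = entry["key"]
        if key in index:
            raise ValueError(f"duplicate dataset key: {key}")
        index[key] = entry
    return index


def group_by_type(entries):
    """Collect dataset keys under their type, in file order."""
    groups = defaultdict(list)
    for entry in entries:
        groups[entry["type"]].append(entry["key"])
    return dict(groups)


by_key = index_by_key(datasets)
by_type = group_by_type(datasets)
```

Every entry has the same shape, so one loop handles all types. The list does not by itself stop two entries from sharing a key, which is why the sketch checks for duplicates explicitly.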
Cheers,
Nadav
On Mon, 14 Jan 2019 at 00:57, Yaniv Rodenski <ya...@shinto.io> wrote:
> Hi Arun,
>
> I've added my comments to the PR, but good call, I agree @Nadav Har Tzvi
> <na...@gmail.com> should at least review as you both need to
> maintain compatible APIs.
>
> Cheers,
> Yaniv
Re: [jira] [Created] (AMATERASU-52) Implement AmaContext.datastores
Posted by Yaniv Rodenski <ya...@shinto.io>.
Hi Arun,
I've added my comments to the PR, but good call, I agree @Nadav Har Tzvi
<na...@gmail.com> should at least review as you both need to
maintain compatible APIs.
Cheers,
Yaniv
--
Yaniv Rodenski
+61 477 778 405
yaniv@shinto.io