Posted to dev@hudi.apache.org by Balaji Varadarajan <v....@ymail.com.INVALID> on 2020/06/01 05:54:54 UTC

Re: How to extend the timeline server schema to accommodate business metadata

 Hi Mario,
Timeline Server was designed to serve Hudi metadata for Hudi writers and readers. It may not be suitable for serving arbitrary data. But it is an interesting thought. Can you elaborate on what kind of business metadata you are looking for? Is this something you are planning to store in commit files?
Balaji.V

    On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá Vera <de...@gmail.com> wrote:  
 
 I see a need for extending the current timeline server schema so that a flexible model could be achieved in order to accommodate business metadata.

let me know if that makes sense to anyone here...

Regards,

Mario.
  

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Great! I will definitely follow up...

On Wed, Jun 10, 2020 at 19:28, Bhavani Sudha <bh...@gmail.com> wrote:

> Ah okay. Thanks for letting us know. I created a Jira here to capture this
> thread - https://issues.apache.org/jira/browse/HUDI-1020. Feel free to add
> to the jira.
>
> Thanks,
> Sudha

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Bhavani Sudha <bh...@gmail.com>.
Ah okay. Thanks for letting us know. I created a Jira here to capture this
thread - https://issues.apache.org/jira/browse/HUDI-1020. Feel free to add
to the jira.

Thanks,
Sudha

On Wed, Jun 10, 2020 at 11:03 AM Mario de Sá Vera <de...@gmail.com>
wrote:

> Sure Sudha, I am afraid I am not allowed to become a Hudi contributor,
> unfortunately... I have to restrict myself to being an enthusiast, as my current
> employer applies some severe restrictions.
>
> I would be more than happy to contribute by specifying the requirements, but
> from a code developer perspective I will have to pass on that for now...

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Sure Sudha, I am afraid I am not allowed to become a Hudi contributor,
unfortunately... I have to restrict myself to being an enthusiast, as my current
employer applies some severe restrictions.

I would be more than happy to contribute by specifying the requirements, but
from a code developer perspective I will have to pass on that for now...

On Wed, Jun 10, 2020 at 18:40, Bhavani Sudha <bh...@gmail.com> wrote:

> Definitely. I was trying to add you to the Hudi contributors so you can
> create a Jira. For that I need a Jira id. If you have not already signed
> up, please sign up for Jira and let me know your Jira id.
>
> Thanks,
> Sudha
>
> On Wed, Jun 10, 2020 at 12:17 AM Mario de Sá Vera <de...@gmail.com>
> wrote:
>
> > Hi Sudha,
> >
> > Can you or Vinoth help me with this? How can we create a JIRA for that?
> >
> > I can collaborate by bringing the description and definition of done.
> >
> > Thanks,
> >
> > Mario.
> >
> > On Tue, 9 Jun 2020, 23:46 Bhavani Sudha, <bh...@gmail.com> wrote:
> >
> > > Hi Mario,
> > >
> > > Can you please share your jira id ?
> > >
> > > Thanks,
> > > Sudha
> > >
> > > On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera <de...@gmail.com>
> > > wrote:
> > >
> > > > Hey Vinoth, I noticed you added this suggestion to the weekly log...
> > > > that is great! Just let me know if I am able to create a JIRA, as I
> > > > tried to go to the HUDI project in Apache and did not find a way to do
> > > > it. I can bring in a good description of the benefits, etc.
> > > >
> > > > Thanks, Mario.
> > > >
> > > > On Mon, Jun 8, 2020 at 12:46, Vinoth Chandar <vinoth@apache.org> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > We can probably make a new JIRA. Not sure if there is an existing
> > > > > JIRA to re-use.
> > > > > The following modules are good to look at:
> > > > >
> > > > > hudi-timeline-service
> > > > > packaging/hudi-timeline-server-bundle
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera <desavera@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Sorry Vinoth for not being clear... If that is a work in progress,
> > > > > > would you have a Jira I could follow and contribute to? If not,
> > > > > > what is the module name you suggest I look at?
> > > > > >
> > > > > > Regards,
> > > > > >
> > > > > > Mario.
> > > > > >
> > > > > > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, <vi...@apache.org> wrote:
> > > > > >
> > > > > > > Sorry, did not understand the last part. :) Are you suggesting
> > > > > > > we create a Jira?
> > > > > > >
> > > > > > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <desavera@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > That sounds great! I will check that and keep an eye on the
> > > > > > > > long-running server approach... Once it gets a ticket I could
> > > > > > > > watch, just let me know please.
> > > > > > > >
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, <vinoth@apache.org> wrote:
> > > > > > > >
> > > > > > > > > Hi Mario,
> > > > > > > > >
> > > > > > > > > We actually started with the idea of making the timeline
> > > > > > > > > server a long-running service. We have a module, if you notice,
> > > > > > > > > that builds a bundle that you could deploy. Maybe you can play
> > > > > > > > > with it and see if that sounds interesting to you. It will
> > > > > > > > > definitely have some rough edges, given it’s not been widely
> > > > > > > > > used.
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > > Vinoth
> > > > > > > > >
> > > > > > > > > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera <desavera@gmail.com>
> > > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Hi Vinoth, thanks for your comments on this. I spent some
> > > > > > > > > > time thinking over another possibility, which would be
> > > > > > > > > > externalising the Hudi timeline service itself to an external
> > > > > > > > > > server holding both operational (i.e. Hudi) and business
> > > > > > > > > > metadata.
> > > > > > > > > >
> > > > > > > > > > Would you guys have any opinion on that? Would that be easy?
> > > > > > > > > > I do not seem to see a way yet, except reading about RocksDB,
> > > > > > > > > > but that is still not quite clear.
> > > > > > > > > >
> > > > > > > > > > Best regards,
> > > > > > > > > >
> > > > > > > > > > Mario.
> > > > > > > > > >
> > > > > > > > > > On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar
> > > > > > > > > > <mail.vinoth.chandar@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi Mario,
> > > > > > > > > > >
> > > > > > > > > > > Thanks for the detailed explanation. Hudi already allows
> > > > > > > > > > > extra metadata to be written atomically with each commit,
> > > > > > > > > > > i.e. write operation. In fact, that is how we track
> > > > > > > > > > > checkpoints for our delta streamer tool. It may not solve
> > > > > > > > > > > the need for querying the data together with this
> > > > > > > > > > > information, but it gives you the ability to do some basic
> > > > > > > > > > > tagging, if that's useful.
> > > > > > > > > > >
> > > > > > > > > > > >> If we enable the timeline service metadata model to be
> > > > > > > > > > > >> extended we could use the service instance itself to
> > > > > > > > > > > >> support specialised queries that involve business
> > > > > > > > > > > >> qualifiers in order to return a proper set of metadata
> > > > > > > > > > > >> pointing to the related commits
> > > > > > > > > > >
> > > > > > > > > > > This is a good idea actually. There is another active
> > > > > > > > > > > discuss thread on making the metadata queryable. There is
> > > > > > > > > > > also https://issues.apache.org/jira/browse/HUDI-309, which
> > > > > > > > > > > we paused for now. But that's more in line with what you
> > > > > > > > > > > are thinking, IIUC.
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > > vinoth
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera
> > > > > > > > > > > <desavera@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > > Hi Balaji,
> > > > > > > > > > > >
> > > > > > > > > > > > Business metadata are all types of info related to the
> > > > > > > > > > > > business where the Hudi solution is being used... from a
> > > > > > > > > > > > COB (i.e. close of business date) related to that commit
> > > > > > > > > > > > to any qualifier related to that commit that might be
> > > > > > > > > > > > useful to associate with that commit id. If we enable the
> > > > > > > > > > > > timeline service metadata model to be extended, we could
> > > > > > > > > > > > use the service instance itself to support specialised
> > > > > > > > > > > > queries that involve business qualifiers in order to
> > > > > > > > > > > > return a proper set of metadata pointing to the related
> > > > > > > > > > > > commits that answer a business query.
> > > > > > > > > > > >
> > > > > > > > > > > > If we do not have that flexibility, we might end up
> > > > > > > > > > > > creating an external transaction log, and then comes the
> > > > > > > > > > > > hard task of keeping that service in sync with the
> > > > > > > > > > > > timeline service.
> > > > > > > > > > > >
> > > > > > > > > > > > Let me know if that makes sense to you,
> > > > > > > > > > > >
> > > > > > > > > > > > Mario.
> > > > > > > > > > > >
> > > > > > > > > > > > Em seg., 1 de jun. de 2020 às 06:55, Balaji
> Varadarajan
> > > > > > > > > > > > <v....@ymail.com.invalid> escreveu:
> > > > > > > > > > > >
> > > > > > > > > > > > >  Hi Mario,
> > > > > > > > > > > > > Timeline Server was designed to serve hudi metadata
> > for
> > > > > Hudi
> > > > > > > > > writers
> > > > > > > > > > > and
> > > > > > > > > > > > > readers.  it may not be suitable to serve arbitrary
> > > data.
> > > > > > But,
> > > > > > > it
> > > > > > > > > is
> > > > > > > > > > an
> > > > > > > > > > > > > interesting thought. Can you elaborate more on what
> > > kind
> > > > of
> > > > > > > > > business
> > > > > > > > > > > > > metadata are you looking. Is this something you are
> > > > > planning
> > > > > > to
> > > > > > > > > store
> > > > > > > > > > > in
> > > > > > > > > > > > > commit files ?
> > > > > > > > > > > > > Balaji.V
> > > > > > > > > > > > >
> > > > > > > > > > > > >     On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario
> > de
> > > Sá
> > > > > > Vera
> > > > > > > <
> > > > > > > > > > > > > desavera@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > >  I see a need for extending the current timeline
> > server
> > > > > > schema
> > > > > > > so
> > > > > > > > > > that
> > > > > > > > > > > a
> > > > > > > > > > > > > flexible model could be achieved in order to
> > > accommodate
> > > > > > > business
> > > > > > > > > > > > metadata.
> > > > > > > > > > > > >
> > > > > > > > > > > > > let me know if that makes sense to anyone here...
> > > > > > > > > > > > >
> > > > > > > > > > > > > Regards,
> > > > > > > > > > > > >
> > > > > > > > > > > > > Mario.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Bhavani Sudha <bh...@gmail.com>.
Definitely. I was trying to add you to the Hudi contributors so you can
create a JIRA. For that I need your JIRA ID. If you have not already signed
up, please sign up for JIRA and let me know your JIRA ID.

Thanks,
Sudha

On Wed, Jun 10, 2020 at 12:17 AM Mario de Sá Vera <de...@gmail.com>
wrote:

> Hi Sudha,
>
> Can you or Vinoth help me with this? How can we create a JIRA for that?
>
> I can contribute the description and the definition of done.
>
> Thanks,
>
> Mario.

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Hi Sudha,

Can you or Vinoth help me with this? How can we create a JIRA for that?

I can contribute the description and the definition of done.

Thanks,

Mario.

On Tue, 9 Jun 2020, 23:46 Bhavani Sudha, <bh...@gmail.com> wrote:

> Hi Mario,
>
> Can you please share your JIRA ID?
>
> Thanks,
> Sudha

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Bhavani Sudha <bh...@gmail.com>.
Hi Mario,

Can you please share your JIRA ID?

Thanks,
Sudha

On Tue, Jun 9, 2020 at 3:29 AM Mario de Sá Vera <de...@gmail.com> wrote:

> Hey Vinoth, I noticed you added this suggestion to the weekly log, which
> is great! Just let me know whether I am able to create a JIRA; I tried to
> go to the HUDI project in Apache and did not find a way to do it. I can
> bring in a good description of the benefits, etc.
>
> Thanks, Mario.

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Hey Vinoth, I noticed you added this suggestion to the weekly log, which
is great! Just let me know whether I am able to create a JIRA; I tried to
go to the HUDI project in Apache and did not find a way to do it. I can
bring in a good description of the benefits, etc.

Thanks, Mario.

On Mon, Jun 8, 2020 at 12:46, Vinoth Chandar <vi...@apache.org>
wrote:

> Hi,
>
> We can probably make a new JIRA. Not sure if there is an existing JIRA to
> re-use.
> The following modules are good to look at.
>
> hudi-timeline-service
> packaging/hudi-timeline-server-bundle
>
> Thanks
> Vinoth
>
> On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera <de...@gmail.com>
> wrote:
>
> > Sorry Vinoth for not being clear... If that is a work in progress, would
> > you have a JIRA I could follow and contribute to? If not, what is the
> > module name you suggest I look at?
> >
> > Regards,
> >
> > Mario.
> >
> > On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, <vi...@apache.org> wrote:
> >
> > > Sorry, did not understand the last part. :) Are you suggesting we
> > > create a JIRA?
> > >
> > > On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <de...@gmail.com>
> > > wrote:
> > >
> > > > That sounds great! Will check that and keep an eye on the
> > > > long-running server approach... once it gets a ticket I could watch,
> > > > just let me know, please.
> > > >
> > > > Thanks
> > > >
> > > >
> > > > On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, <vi...@apache.org> wrote:
> > > >
> > > > > Hi Mario,
> > > > >
> > > > > We actually started with the idea of making the timeline server a
> > > > > long-running service. We have a module, if you notice, that builds
> > > > > a bundle that you could deploy. Maybe you can play with it and see
> > > > > if that sounds interesting to you. It will definitely have some
> > > > > rough edges given it's not been widely used.
> > > > >
> > > > > Thanks
> > > > > Vinoth
> > > > >
> > > > > On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera <desavera@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > Hi Vinoth, thanks for your comments on this. I spent some time
> > > > > > thinking over another possibility, which would be externalising
> > > > > > the Hudi timeline service itself to an external server holding
> > > > > > both operational (i.e. Hudi) and business metadata.
> > > > > >
> > > > > > Would you guys have any opinion on that? Would that be easy? I do
> > > > > > not seem to see a way yet, except reading about RocksDB, but that
> > > > > > is still not quite clear.
> > > > > >
> > > > > > best regards,
> > > > > >
> > > > > > Mario.
> > > > > >
> > > > > > On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar
> > > > > > <mail.vinoth.chandar@gmail.com> wrote:
> > > > > >
> > > > > > > Hi Mario,
> > > > > > >
> > > > > > > Thanks for the detailed explanation. Hudi already allows extra
> > > > > > > metadata to be written atomically with each commit, i.e. write
> > > > > > > operation. In fact, that is how we track checkpoints for our
> > > > > > > delta streamer tool. It may not solve the need for querying the
> > > > > > > data together with this information, but it gives you the
> > > > > > > ability to do some basic tagging, if that's useful.
> > > > > > >
> > > > > > > >>If we enable the timeline service metadata model to be
> > > > > > > extended we could use the service instance itself to support
> > > > > > > specialised queries that involve business qualifiers in order
> > > > > > > to return a proper set of metadata pointing to the related
> > > > > > > commits
> > > > > > >
> > > > > > > This is a good idea actually. There is another active discuss
> > > > > > > thread on making the metadata queryable. There is also
> > > > > > > https://issues.apache.org/jira/browse/HUDI-309 which we paused
> > > > > > > for now. But that's more in line with what you are thinking,
> > > > > > > IIUC.
> > > > > > >
> > > > > > >
> > > > > > > Thanks
> > > > > > > vinoth
> > > > > > >
> > > > > > > On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera <
> > > desavera@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi Balaji,
> > > > > > > >
> > > > > > > > business metadata are all types of info related to the
> business
> > > > where
> > > > > > the
> > > > > > > > Hudi solution is being used... from a COB (ie close of
> business
> > > > date)
> > > > > > > > related to that commit to any qualifier related to that
> commit
> > > that
> > > > > > might
> > > > > > > > be useful to be associated with that commit id. If we enable
> > the
> > > > > > timeline
> > > > > > > > service metadata model to be extended we could use the
> service
> > > > > instance
> > > > > > > > itself to support specialised queries that involve business
> > > > > qualifiers
> > > > > > in
> > > > > > > > order to return a proper set of metadata pointing to the
> > related
> > > > > > commits
> > > > > > > > that answer a business query.
> > > > > > > >
> > > > > > > > if we do not have that flexibility we might end up creating a
> > > > > external
> > > > > > > > transaction log and then comes the hard task to make that
> > service
> > > > in
> > > > > > sync
> > > > > > > > to the timeline service.
> > > > > > > >
> > > > > > > > let me know if that makes sense to you,
> > > > > > > >
> > > > > > > > Mario.
> > > > > > > >
> > > > > > > > Em seg., 1 de jun. de 2020 às 06:55, Balaji Varadarajan
> > > > > > > > <v....@ymail.com.invalid> escreveu:
> > > > > > > >
> > > > > > > > >  Hi Mario,
> > > > > > > > > Timeline Server was designed to serve hudi metadata for
> Hudi
> > > > > writers
> > > > > > > and
> > > > > > > > > readers.  it may not be suitable to serve arbitrary data.
> > But,
> > > it
> > > > > is
> > > > > > an
> > > > > > > > > interesting thought. Can you elaborate more on what kind of
> > > > > business
> > > > > > > > > metadata are you looking. Is this something you are
> planning
> > to
> > > > > store
> > > > > > > in
> > > > > > > > > commit files ?
> > > > > > > > > Balaji.V
> > > > > > > > >
> > > > > > > > >     On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá
> > Vera
> > > <
> > > > > > > > > desavera@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > >  I see a need for extending the current timeline server
> > schema
> > > so
> > > > > > that
> > > > > > > a
> > > > > > > > > flexible model could be achieved in order to accommodate
> > > business
> > > > > > > > metadata.
> > > > > > > > >
> > > > > > > > > let me know if that makes sense to anyone here...
> > > > > > > > >
> > > > > > > > > Regards,
> > > > > > > > >
> > > > > > > > > Mario.
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Vinoth Chandar <vi...@apache.org>.
Hi,

We can probably create a new JIRA; I am not sure there is an existing one to
reuse.
The following modules are good to look at:

hudi-timeline-service
packaging/hudi-timeline-server-bundle

Thanks
Vinoth

On Fri, Jun 5, 2020 at 12:56 AM Mario de Sá Vera <de...@gmail.com> wrote:

> Sorry, Vinoth, for not being clear... If that is a work in progress, would you
> have a JIRA I could follow and contribute to? If not, which module do you
> suggest I look at?
>
> Regards,
>
> Mario.

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Sorry, Vinoth, for not being clear... If that is a work in progress, would you
have a JIRA I could follow and contribute to? If not, which module do you
suggest I look at?

Regards,

Mario.

On Fri, 5 Jun 2020, 02:12 Vinoth Chandar, <vi...@apache.org> wrote:

> Sorry, I did not understand the last part. :) Are you suggesting we create a
> JIRA?

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Vinoth Chandar <vi...@apache.org>.
Sorry, I did not understand the last part. :) Are you suggesting we create a
JIRA?

On Thu, Jun 4, 2020 at 1:08 AM Mario de Sá Vera <de...@gmail.com> wrote:

> That sounds great! I will check that and keep an eye on the long-running
> server approach... once it gets a ticket I could watch, just let me know,
> please.
>
> Thanks

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
That sounds great! I will check that and keep an eye on the long-running
server approach... once it gets a ticket I could watch, just let me know,
please.

Thanks


On Thu, 4 Jun 2020, 05:34 Vinoth Chandar, <vi...@apache.org> wrote:

> Hi Mario,
>
> We actually started with the idea of making the timeline server a
> long-running service. As you may have noticed, we have a module that builds
> a bundle that you could deploy. Maybe you can play with it and see if that
> sounds interesting to you. It will definitely have some rough edges, given it
> has not been widely used.
>
> Thanks
> Vinoth

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Vinoth Chandar <vi...@apache.org>.
Hi Mario,

We actually started with the idea of making the timeline server a
long-running service. As you may have noticed, we have a module that builds
a bundle that you could deploy. Maybe you can play with it and see if that
sounds interesting to you. It will definitely have some rough edges, given it
has not been widely used.

Thanks
Vinoth

On Wed, Jun 3, 2020 at 2:33 AM Mario de Sá Vera <de...@gmail.com> wrote:

> Hi Vinoth, thanks for your comments on this. I spent some time thinking about
> another possibility, which would be externalising the Hudi timeline service
> itself to an external server holding both operational (i.e. Hudi) and
> business metadata.
>
> Would you have any opinion on that? Would that be easy? I do not
> see a way yet, apart from reading about RocksDB, and that is still not
> quite clear.
>
> best regards,
>
> Mario.

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Hi Vinoth, thanks for your comments on this. I spent some time thinking about
another possibility, which would be externalising the Hudi timeline service
itself to an external server holding both operational (i.e. Hudi) and
business metadata.

Would you have any opinion on that? Would that be easy? I do not
see a way yet, apart from reading about RocksDB, and that is still not
quite clear.

best regards,

Mario.

On Mon, Jun 1, 2020 at 16:01, Vinoth Chandar <mail.vinoth.chandar@gmail.com> wrote:

> Hi Mario,
>
> Thanks for the detailed explanation. Hudi already allows extra metadata to
> be written atomically with each commit, i.e. each write operation. In fact,
> that is how we track checkpoints for our delta streamer tool. It may not
> solve the need for querying the data together with this information, but it
> gives you the ability to do some basic tagging, if that's useful.
>
> >> If we enable the timeline service metadata model to be extended we could
> >> use the service instance itself to support specialised queries that involve
> >> business qualifiers in order to return a proper set of metadata pointing to
> >> the related commits
>
> This is a good idea, actually. There is another active discuss thread on
> making the metadata queryable. There is also
> https://issues.apache.org/jira/browse/HUDI-309, which we paused for now,
> but that's more in line with what you are thinking, IIUC.
>
>
> Thanks
> vinoth

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Vinoth Chandar <ma...@gmail.com>.
Hi Mario,

Thanks for the detailed explanation. Hudi already allows extra metadata to
be written atomically with each commit, i.e. each write operation. In fact, that
is how we track checkpoints for our delta streamer tool. It may not solve
the need for querying the data together with this information, but it gives
you the ability to do some basic tagging, if that's useful.
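
As a rough illustration of the tagging described above, the sketch below
builds a set of write options that attach business qualifiers to a commit.
It assumes the datasource option hoodie.datasource.write.commitmeta.key.prefix
(option keys starting with that prefix are copied into the commit metadata);
please verify it against your Hudi version. The helper name, the "biz."
prefix, the table name and the qualifier keys are all hypothetical.

```python
# Assumed Hudi datasource option: option keys starting with this prefix are
# copied into the commit metadata. Verify it against your Hudi version.
COMMIT_META_PREFIX_OPT = "hoodie.datasource.write.commitmeta.key.prefix"


def with_business_metadata(base_options, qualifiers, prefix="biz."):
    """Return a copy of base_options with business qualifiers added as
    prefixed option keys (hypothetical helper, not part of Hudi)."""
    options = dict(base_options)
    options[COMMIT_META_PREFIX_OPT] = prefix
    for key, value in qualifiers.items():
        options[prefix + key] = str(value)
    return options


options = with_business_metadata(
    {"hoodie.table.name": "trades"},           # hypothetical table
    {"cob_date": "2020-05-29", "desk": "fx"},  # hypothetical qualifiers
)
# With Spark available, the options would be passed to the writer, e.g.:
# df.write.format("hudi").options(**options).mode("append").save(base_path)
```

This only tags the commit; it does not make the tags queryable together
with the data.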

> If we enable the timeline service metadata model to be extended we could
> use the service instance itself to support specialised queries that involve
> business qualifiers in order to return a proper set of metadata pointing to
> the related commits

This is actually a good idea. There is another active discussion thread on
making the metadata queryable. There is also
https://issues.apache.org/jira/browse/HUDI-309, which we have paused for now,
but it is more in line with what you are thinking, IIUC.


Thanks
vinoth

On Mon, Jun 1, 2020 at 4:41 AM Mario de Sá Vera <de...@gmail.com> wrote:

> Hi Balaji,
>
> business metadata are all types of info related to the business where the
> Hudi solution is being used... from a COB (ie close of business date)
> related to that commit to any qualifier related to that commit that might
> be useful to be associated with that commit id. If we enable the timeline
> service metadata model to be extended we could use the service instance
> itself to support specialised queries that involve business qualifiers in
> order to return a proper set of metadata pointing to the related commits
> that answer a business query.
>
> if we do not have that flexibility we might end up creating a external
> transaction log and then comes the hard task to make that service in sync
> to the timeline service.
>
> let me know if that makes sense to you,
>
> Mario.
>
> On Mon, Jun 1, 2020 at 06:55, Balaji Varadarajan
> <v....@ymail.com.invalid> wrote:
>
> >  Hi Mario,
> > Timeline Server was designed to serve hudi metadata for Hudi writers and
> > readers.  it may not be suitable to serve arbitrary data. But, it is an
> > interesting thought. Can you elaborate more on what kind of business
> > metadata are you looking. Is this something you are planning to store in
> > commit files ?
> > Balaji.V
> >
> >     On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá Vera <
> > desavera@gmail.com> wrote:
> >
> >  I see a need for extending the current timeline server schema so that a
> > flexible model could be achieved in order to accommodate business metadata.
> >
> > let me know if that makes sense to anyone here...
> >
> > Regards,
> >
> > Mario.
> >
>

Re: How to extend the timeline server schema to accommodate business metadata

Posted by Mario de Sá Vera <de...@gmail.com>.
Hi Balaji,

Business metadata is any kind of information related to the business where the
Hudi solution is being used: from a COB (i.e. close of business) date
associated with a commit to any other qualifier that might be useful to
attach to that commit id. If we allow the timeline service metadata model to
be extended, we could use the service instance itself to support specialised
queries that involve business qualifiers and return the set of metadata
pointing to the related commits that answer a business query.

If we do not have that flexibility, we might end up creating an external
transaction log, and then comes the hard task of keeping that log in sync
with the timeline service.
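
To make the kind of query described above concrete, here is a minimal
in-memory sketch. It is not Hudi code and does not use the timeline server
API: Commit, find_commits and the qualifier names are hypothetical stand-ins
for commits that carry extra business metadata.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Commit:
    """Hypothetical stand-in for a timeline entry with extra metadata."""
    commit_id: str
    extra_metadata: Dict[str, str] = field(default_factory=dict)


def find_commits(timeline: List[Commit], **qualifiers: str) -> List[str]:
    """Return ids of commits whose metadata matches every given qualifier."""
    return [
        c.commit_id
        for c in timeline
        if all(c.extra_metadata.get(k) == v for k, v in qualifiers.items())
    ]


timeline = [
    Commit("20200529180000", {"cob_date": "2020-05-29", "desk": "fx"}),
    Commit("20200529190000", {"cob_date": "2020-05-29", "desk": "rates"}),
    Commit("20200601180000", {"cob_date": "2020-06-01", "desk": "fx"}),
]

# Business query: which commits belong to the 2020-05-29 close of business?
matches = find_commits(timeline, cob_date="2020-05-29")
print(matches)
```

A service-side version of this lookup would avoid the separate log, since
the qualifiers would live next to the commits they describe.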

Let me know if that makes sense to you,

Mario.

On Mon, Jun 1, 2020 at 06:55, Balaji Varadarajan
<v....@ymail.com.invalid> wrote:

>  Hi Mario,
> Timeline Server was designed to serve hudi metadata for Hudi writers and
> readers.  it may not be suitable to serve arbitrary data. But, it is an
> interesting thought. Can you elaborate more on what kind of business
> metadata are you looking. Is this something you are planning to store in
> commit files ?
> Balaji.V
>
>     On Sunday, May 31, 2020, 04:22:27 PM PDT, Mario de Sá Vera <
> desavera@gmail.com> wrote:
>
>  I see a need for extending the current timeline server schema so that a
> flexible model could be achieved in order to accommodate business metadata.
>
> let me know if that makes sense to anyone here...
>
> Regards,
>
> Mario.
>