Posted to dev@hudi.apache.org by Vinoth Chandar <vi...@apache.org> on 2019/04/09 17:28:25 UTC

Re: Hudi + Alluxio

Opened a JIRA for this https://issues.apache.org/jira/browse/HUDI-95

@Semantic Beeng <ni...@semanticbeeng.com> & I are also interested in this.
Brandon, just wondering if you made any strides on this?
Happy to work with you on the JIRA if you are still interested.

On Mon, Mar 18, 2019 at 2:00 PM nishith agarwal <n3...@gmail.com> wrote:

> Brandon,
>
> This sounds great! We've also been thinking along similar lines: hot % vs.
> cold %, all accessible through a unified query interface. Let us know how
> your POC turns out.
>
> -Nishith
>
> On Mon, Mar 18, 2019 at 1:04 PM Brandon Geise <br...@gmail.com>
> wrote:
>
> > Will do on the JIRA.  I need to do a POC to see if it's something to
> > pursue on our side.
> >
> > Our use case would be similar: Spark + Presto, with 30 days of hot
> > storage and cold storage on S3. I was thinking of having Alluxio be the
> > cache/memory portion.
> >
> > On 3/18/19, 3:49 PM, "Vinoth Chandar" <vi...@apache.org> wrote:
> >
> >     Great! I can definitely see use cases for refreshing an Alluxio/Ignite
> >     cache incrementally.
> >
> >     i.e., Hive => ETL => Hudi on DFS => Incremental Pull + upsert =>
> >     Hudi on in-memory FS
> >
> >
> >     if you want to pursue a JIRA, please let me know. I will add you as a
> >     contributor
> >
> >     On Mon, Mar 18, 2019 at 12:41 PM Brandon Geise <brandongeise@gmail.com>
> >     wrote:
> >
> >     > Thanks, Vinoth.  I'll take a look.  It seems to be a great starting point!
> >     >
> >     > On 3/18/19, 3:35 PM, "Vinoth Chandar" <vi...@apache.org> wrote:
> >     >
> >     >     I have actually played around with Apache Ignite integration which
> >     >     supports append().
> >     >
> >     >     https://github.com/vinothchandar/incubator-hudi/commit/dd578947ec1db9388038f0a1863a90b3761cd571
> >     >
> >     >     Alluxio would work as well, I believe.
> >     >
> >     >     Something like Kafka => DeltaStreamer => Hudi/igfs could give you
> >     >     mutable, in-memory, near-real-time analytics
> >     >     (sorry for bundling up so many buzzwords :P)
> >     >
> >     >     On Mon, Mar 18, 2019 at 12:11 PM Brandon Geise <brandongeise@gmail.com>
> >     >     wrote:
> >     >
> >     >     > Hi,
> >     >     >
> >     >     >
> >     >     >
> >     >     > Has anyone used Hudi in combination with Alluxio?  Based on my
> >     >     > understanding of each solution, it seems that at a file level this
> >     >     > could/should all work together, but if someone has direct experience
> >     >     > I’d love to hear about it.
> >     >     >
> >     >     >
> >     >     >
> >     >     > Thanks,
> >     >     >
> >     >     > Brandon

Re: Hudi + Alluxio

Posted by Vinoth Chandar <vi...@apache.org>.
No problem :).

On Tue, Apr 9, 2019 at 10:31 AM Brandon Geise <br...@gmail.com>
wrote:

> Hi Vinoth,
>
> I haven't.  As a company we started leaning towards Ignite so we can focus
> on a cache solution that supports multiple use cases.  I am still
> interested in spending time on this as time allows on my side, but
> currently can't invest as much time as I originally thought.

Re: Hudi + Alluxio

Posted by Brandon Geise <br...@gmail.com>.
Hi Vinoth,

I haven't.  As a company we started leaning towards Ignite so we can focus on a cache solution that supports multiple use cases.  I am still interested in spending time on this as time allows on my side, but currently can't invest as much time as I originally thought.
