Posted to dev@ignite.apache.org by Alexey Zinoviev <za...@gmail.com> on 2020/04/09 10:00:03 UTC

Re: Ignite XGBoost support

Morning!
I'm going to publish the roadmap for Ignite ML 2.9 for wide discussion at
the end of April, once the dates for 2.9 are finalized.

But I suppose we will be closer to this point: "the plan to rely on
models trained elsewhere and then imported into the platform for scoring?"
I mean distributed inference; we hope to increase the number of integrated
models and model formats.
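
For reference, importing a pre-trained XGBoost model for distributed scoring
already looks roughly like the sketch below. It follows the XGBoost parser
example shipped in the ignite-examples module, but please treat it only as a
sketch: the model path, the feature names ("f0", "f1") and the builder
parameters (4 instances, 4 per node) are placeholders, and the exact class
names should be checked against the examples in your Ignite version.

import java.util.HashMap;
import java.util.concurrent.Future;

import org.apache.ignite.Ignite;
import org.apache.ignite.Ignition;
import org.apache.ignite.ml.inference.Model;
import org.apache.ignite.ml.inference.builder.AsyncModelBuilder;
import org.apache.ignite.ml.inference.builder.IgniteDistributedModelBuilder;
import org.apache.ignite.ml.inference.reader.FileSystemModelReader;
import org.apache.ignite.ml.inference.reader.ModelReader;
import org.apache.ignite.ml.math.primitives.vector.NamedVector;
import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
import org.apache.ignite.ml.xgboost.parser.XGModelParser;

public class XGBoostInferenceSketch {
    public static void main(String[] args) throws Exception {
        try (Ignite ignite = Ignition.start()) {
            // Reader that streams the dumped XGBoost model file (placeholder path).
            ModelReader reader = new FileSystemModelReader("models/xgboost/agaricus-model.txt");

            // Parser that turns the XGBoost dump into an Ignite ML inference model.
            XGModelParser parser = new XGModelParser();

            // Distributed builder: the model is deployed on the cluster,
            // here with up to 4 instances and 4 per node (placeholder values).
            AsyncModelBuilder mdlBuilder = new IgniteDistributedModelBuilder(ignite, 4, 4);

            try (Model<NamedVector, Future<Double>> mdl = mdlBuilder.build(reader, parser)) {
                // XGBoost features are addressed by name ("f0", "f1", ...);
                // the values below are placeholders for a real observation.
                HashMap<String, Double> obs = new HashMap<>();
                obs.put("f0", 1.0);
                obs.put("f1", 0.0);

                double prediction = mdl.predict(VectorUtils.of(obs)).get();
                System.out.println("Prediction: " + prediction);
            }
        }
    }
}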

You mentioned "I'm currently working to integrate the ignite distributed
dataframes". What dataframes are you referring to? Could you share a link
to the docs, for example? We have no official term "the ignite distributed
dataframes".

If you get any results on the XGBoost integration, please let me know!

Sincerely yours,
                 Alexey

Fri, Mar 27, 2020 at 16:29, Carbone, Adam <Ad...@bottomline.com>:

> Good Morning Alexey,
>
>
>
> Let me first answer your questions.
>
>
>
> 1. Are you a member of the XGBoost project, and do you have permission to
> commit to it? (in many cases the collaboration involves changes in both
> integrated frameworks)
>
> No I am not personally, nor is our organization.
>
> 2. What primitives or integration points are accessible in XGBoost? Could
> you share a paper/article/link to give me a chance to read more?
>
> I'm not sure that the person on my team doing this work has that level of
> understanding yet. As I mentioned in the previous email, we were about to
> embark on this when we saw the 2.8 announcement come out and decided to
> look further at what the level of support was.
>
> 3. What is the planned architecture with the native C++ libraries? Could
> you share it with me and the Ignite community?
>
> I will only be able to share the higher-level view on this (again, if it
> makes sense we could do a deeper dive with the developers who are working
> on this directly), but currently our TensorFlow neural network modeling is
> exposed via some internal web services written in C++ that wrap the
> TensorFlow libraries. These are called from within our job
> scheduling/runner framework. The web services run on different images
> within our overall system. We were looking to do something similar around
> XGBoost prior to seeing it come up in the announcement.
>
>
>
> So it looks like you are using MLeap to import and support external models;
> we have looked at the same approach ourselves. From what you mentioned, it
> seems that there are currently no plans to add distributed training of any
> external algorithms to the platform. Are you developing your own
> algorithms, or is the plan to rely on models trained elsewhere and then
> imported into the platform for scoring? I'm just interested in the ways we
> may be able to leverage the platform or help contribute. We are looking to
> use other features of Ignite, so leveraging additional features over time
> seems like the right approach. I'm currently working to integrate the
> ignite distributed dataframes.
>
>
>
> Regards
>
>
>
> Adam
>
>
>
> Adam Carbone | Director of Innovation – Intelligent Platform Team |
> Bottomline Technologies
> Office: 603-501-6446 | Mobile: 603-570-8418
> www.bottomline.com
>
>
>
>
>
>
>
> From: Alexey Zinoviev <za...@gmail.com>
> Date: Friday, March 27, 2020 at 1:58 AM
> To: "Carbone, Adam" <Ad...@bottomline.com>
> Cc: "dev@ignite.apache.org" <de...@ignite.apache.org>
> Subject: Re: Ignite XGBoost support
>
>
>
> Morning, Adam, Denis!
>
>
>
> Let me describe the current status
>
>
>
> 1. https://issues.apache.org/jira/browse/IGNITE-10810 is related to MLeap,
> not to XGBoost. The right ticket for XGBoost is
> https://issues.apache.org/jira/browse/IGNITE-10289
>
> 2. Currently, we have no plans to add XGBoost or any other external ML
> library for distributed training (inference can be supported now with a few
> limitations, see the XGBoost or H2O examples)
>
> 3. We have model storage and partitioned dataset primitives that keep the
> data and expose MapReduce-like operations, but each algorithm has to be
> implemented manually as a sequence of MR operations (we have no MR code
> generation here); a rough sketch of the dataset API follows below.
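>
> To illustrate point 3, here is a minimal sketch of the partitioned dataset
> primitive, assuming an upstream cache of Integer keys mapped to ML Vector
> values. The cache name and sample values are made up, and the class names
> (DatasetFactory, SimpleDataset, DummyVectorizer) are taken from the Ignite
> ML examples, so please verify them against your version. The built-in
> mean()/std() statistics run as a map step on every partition followed by a
> reduce that merges the partial results; a custom algorithm has to be
> expressed as the same kind of sequence by hand.
>
> import org.apache.ignite.Ignite;
> import org.apache.ignite.IgniteCache;
> import org.apache.ignite.Ignition;
> import org.apache.ignite.ml.dataset.DatasetFactory;
> import org.apache.ignite.ml.dataset.feature.extractor.impl.DummyVectorizer;
> import org.apache.ignite.ml.dataset.primitive.SimpleDataset;
> import org.apache.ignite.ml.math.primitives.vector.Vector;
> import org.apache.ignite.ml.math.primitives.vector.VectorUtils;
>
> public class PartitionedDatasetSketch {
>     public static void main(String[] args) throws Exception {
>         try (Ignite ignite = Ignition.start()) {
>             // Hypothetical upstream cache holding the training vectors.
>             IgniteCache<Integer, Vector> data = ignite.createCache("ml_data");
>             data.put(1, VectorUtils.of(1.0, 2.0));
>             data.put(2, VectorUtils.of(3.0, 4.0));
>
>             // The dataset is partitioned along the cache partitions.
>             try (SimpleDataset<?> dataset = DatasetFactory.createSimpleDataset(
>                 ignite, data, new DummyVectorizer<Integer>())) {
>                 // Each statistic is a map over partition data plus a reduce
>                 // that merges the per-partition results.
>                 double[] mean = dataset.mean();
>                 double[] std = dataset.std();
>
>                 System.out.println("mean[0]=" + mean[0] + ", std[0]=" + std[0]);
>             }
>         }
>     }
> }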
>
>
>
> I have a few questions; could you please answer them?
>
>
>
> 1. Are you a member of the XGBoost project, and do you have permission to
> commit to it? (in many cases the collaboration involves changes in both
> integrated frameworks)
>
> 2. What primitives or integration points are accessible in XGBoost? Could
> you share a paper/article/link to give me a chance to read more?
>
> 3. What is the planned architecture with the native C++ libraries? Could
> you share it with me and the Ignite community?
>
>
>
> P.S. I need to dig deeper to understand which capabilities of Ignite ML
> could be used to make it a platform for distributed training; your answers
> will be helpful.
>
>
>
> Sincerely yours,
>
>           Alexey Zinoviev
>
>
>
> Fri, Mar 27, 2020 at 01:04, Carbone, Adam <Ad...@bottomline.com>:
>
> Good afternoon Denis,
>
> Nice to meet you, and hello to you too, Alexey. I'm not sure if it will be
> me or another member of our team, but I wanted to start the discussion. We
> are investigating/integrating Ignite into our ML platform. In addition, we
> have already done a separate TensorFlow implementation for neural networks
> using the C++ libraries, and we were about to take the same approach for
> XGBoost when we saw the 2.8 announcement. So before we went that route, I
> wanted to do a more thorough investigation of where things are and where
> they might head.
>
> Regards
>
> Adam
>
> Adam Carbone | Director of Innovation – Intelligent Platform Team |
> Bottomline Technologies
> Office: 603-501-6446 | Mobile: 603-570-8418
> www.bottomline.com
>
>
>
> On 3/26/20, 5:20 PM, "Denis Magda" <dm...@apache.org> wrote:
>
>     Hi Adam, thanks for starting the thread. Contributions are highly
>     appreciated, and we'll be glad to see you among our contributors,
>     especially if it helps make our ML library stronger.
>
>     But first things first, let me introduce you to @Alexey Zinoviev
>     <za...@gmail.com>, who is our main ML maintainer.
>
>     -
>     Denis
>
>
>     On Thu, Mar 26, 2020 at 1:49 PM Carbone, Adam <Adam.Carbone@bottomline.com>
>     wrote:
>
>     > Good Afternoon All
>     >
>     > I was asked to forward this here by Denis Magda. I see in the 2.8
>     > release that you implemented importing of XGBoost models for
>     > distributed inference =>
>     > https://issues.apache.org/jira/browse/IGNITE-10810?focusedCommentId=16728718&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16728718
>     > Are there any plans to add distributed training? We are at a
>     > crossroads of building an XGBoost solution on top of the C++
>     > libraries, but if this is on the roadmap, maybe we will go the Ignite
>     > direction vs. the pure C++, and maybe we might even be able to help
>     > and contribute.
>     >
>     > Regards
>     >
>     > Adam Carbone
>     >
>     > Adam Carbone | Director of Innovation – Intelligent Platform Team |
>     > Bottomline Technologies
>     > Office: 603-501-6446 | Mobile: 603-570-8418
>     > www.bottomline.com
>     >
>     >
>     >
>
>