Posted to dev@ignite.apache.org by "Carbone, Adam" <Ad...@bottomline.com> on 2020/04/09 15:13:53 UTC

Re: Ignite XGBoost support

You mentioned "I’m currently working to integrate the ignite distributed dataframes", what dataframes are you referring to? Could you share a link to docs, for example? We have no official term "the ignite distributed dataframes".

I guess the term the community uses is "shared dataframes":
https://apacheignite-fs.readme.io/docs/ignite-data-frame

As far as XGBoost goes, it seems the platform isn't really working on training, just inference. My teammate did say he “needs to investigate Ignite for distributed training for both XGBoost as well as TF (TensorFlow). And, I have been fiddling around with it,” so I will keep you informed.
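(For anyone following along: the training-vs-inference split above makes sense mechanically. An XGBoost-style ensemble scores each row by summing leaf values across its trees, so scoring an imported model is embarrassingly parallel across data partitions, while training needs coordinated updates between nodes. A toy pure-Python sketch of that scoring step follows; the tree structures and names are hypothetical, not the Ignite or XGBoost API.)

```python
# Hypothetical sketch: why imported-model inference parallelizes trivially.
# Each "partition" of rows is scored independently; results are concatenated.

def predict_tree(tree, row):
    """Walk one regression tree (nested dicts) down to a leaf value."""
    while "leaf" not in tree:
        branch = "left" if row[tree["feature"]] < tree["threshold"] else "right"
        tree = tree[branch]
    return tree["leaf"]

def predict_ensemble(trees, row, base_score=0.5):
    """XGBoost-style prediction: base score plus the sum of all tree outputs."""
    return base_score + sum(predict_tree(t, row) for t in trees)

# Two tiny stumps standing in for a trained, imported model.
trees = [
    {"feature": 0, "threshold": 1.0, "left": {"leaf": -0.2}, "right": {"leaf": 0.3}},
    {"feature": 1, "threshold": 2.0, "left": {"leaf": 0.1}, "right": {"leaf": -0.1}},
]

# "Partitions" of the dataset, scored independently (the map step).
partitions = [[[0.5, 3.0], [2.0, 1.0]], [[1.5, 2.5]]]
scores = [[predict_ensemble(trees, row) for row in part] for part in partitions]
flat = [s for part in scores for s in part]  # approx [0.2, 0.9, 0.7]
```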

Regards

~Adam


From: Alexey Zinoviev <za...@gmail.com>
Date: Thursday, April 9, 2020 at 5:59 AM
To: "Carbone, Adam" <Ad...@bottomline.com>
Cc: "dev@ignite.apache.org" <de...@ignite.apache.org>
Subject: Re: Ignite XGBoost support

Morning!
I'm going to publish the roadmap for Ignite ML 2.9 for wide discussion at the end of April, when the dates for 2.9 are finalized.

But I suppose we will be closest to this point: "the plan to rely on models trained elsewhere and then imported into the platform for scoring?" I mean distributed inference; I hope to increase the number of integrated models and model formats.

You mentioned "I’m currently working to integrate the ignite distributed dataframes", what dataframes are you referring to? Could you share a link to docs, for example? We have no official term "the ignite distributed dataframes".

If you get some results about integration with XGBoost, please let me know!

Sincerely yours,
                 Alexey

Fri, 27 Mar 2020 at 16:29, Carbone, Adam <Ad...@bottomline.com>:
Good Morning Alexey,

Let me first answer your questions.

1. Are you a member of the XGBoost project, and do you have permission to commit to it? (In many cases the collaboration involves changes in both integrated frameworks.)
No, I am not personally, nor is our organization.
2. What primitives or integration points are accessible in XGBoost? Could you share a paper/article/link to give me a chance to read more?
I'm not sure that the person on my team doing this work has that level of understanding yet. As I mentioned in the previous email, we were about to embark on this when we saw the 2.8 announcement come out and decided to look further at what the level of support was.
3. What is the planned architecture with the native C++ libraries? Could you share it with me and the Ignite community?
I will only be able to share the high level on this (again, if it makes sense we could do a deeper dive with the developers who are working on this directly). Currently our TensorFlow neural network modeling is exposed via internal web services written in C++ that wrap the TensorFlow libraries; these are called within our job scheduling/runner framework, and the web services run on different images within our overall system. We were looking to do something similar around XGBoost prior to seeing it come up in the announcement.



So it looks like you are using MLeap to import and support external models; we have looked at the same approach ourselves. From what you mentioned, it seems there are currently no intentions to add distributed training of any external algorithms to the platform. Are you developing your own algorithms, or is the plan to rely on models trained elsewhere and then imported into the platform for scoring? I'm just interested in the ways we may be able to leverage the platform or help contribute. We are looking to use other features of Ignite, so leveraging additional features over time seems like the right approach. I'm currently working to integrate the ignite distributed dataframes.

Regards

Adam

Adam Carbone | Director of Innovation – Intelligent Platform Team | Bottomline Technologies
Office: 603-501-6446 | Mobile: 603-570-8418
www.bottomline.com



From: Alexey Zinoviev <za...@gmail.com>
Date: Friday, March 27, 2020 at 1:58 AM
To: "Carbone, Adam" <Ad...@bottomline.com>
Cc: "dev@ignite.apache.org" <de...@ignite.apache.org>
Subject: Re: Ignite XGBoost support

Morning, Adam, Denis!

Let me describe the current status

1. https://issues.apache.org/jira/browse/IGNITE-10810 is related to MLeap, not to XGBoost. The right ticket for XGBoost is https://issues.apache.org/jira/browse/IGNITE-10289
2. Currently, we have no plans to add XGBoost or any other external ML library for distributed training (inference can be supported now, with a few limitations; see the XGBoost or H2O examples)
3. We have model storage and partitioned dataset primitives to keep the data with MapReduce-like operations, but each algorithm must be implemented manually as a sequence of MR operations (we have no MR code generation here)
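(To illustrate what "a sequence of MR operations" over a partitioned dataset means in practice, here is a toy sketch in plain Python; the function names are hypothetical and this is not the actual Ignite ML API. Each partition computes a local aggregate in a map step, and an associative reduce combines them into the global result, here the mean of a feature.)

```python
from functools import reduce

# Toy sketch (hypothetical, not the Ignite ML API): an algorithm written
# manually as a map step over partitions plus an associative reduce.

partitions = [[1.0, 2.0, 3.0], [4.0, 5.0]]  # data held on two "nodes"

def map_partition(part):
    # Map step: each partition produces a local (sum, count) pair.
    return (sum(part), len(part))

def reduce_pair(a, b):
    # Reduce step: pairs combine associatively into a global aggregate.
    return (a[0] + b[0], a[1] + b[1])

total, count = reduce(reduce_pair, map(map_partition, partitions))
mean = total / count  # 15.0 / 5 = 3.0
```

An iterative trainer would repeat such map/reduce rounds (e.g. one per gradient step), which is exactly the part that must be hand-written per algorithm today.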

I have a few questions, could you please answer them?

1. Are you a member of the XGBoost project, and do you have permission to commit to it? (In many cases the collaboration involves changes in both integrated frameworks.)
2. What primitives or integration points are accessible in XGBoost? Could you share a paper/article/link to give me a chance to read more?
3. What is the planned architecture with the native C++ libraries? Could you share it with me and the Ignite community?

P.S. I need to go deeper to understand which capabilities of Ignite ML could be used to make it the platform for distributed training; your answers will be helpful.

Sincerely yours,
          Alexey Zinoviev

Fri, 27 Mar 2020 at 01:04, Carbone, Adam <Ad...@bottomline.com>:
Good afternoon Denis,

Nice to meet you, and hello to you too, Alexey. I'm not sure if it will be me or another member of our team, but I wanted to start the discussion. We are investigating/integrating Ignite into our ML platform. In addition, we have already done a separate TensorFlow implementation for neural networks using the C++ libraries, and we were about to take the same approach for XGBoost when we saw the 2.8 announcement. So before we went that route, I wanted to do a more proper investigation as to where things were and where they might head.

Regards

Adam

Adam Carbone | Director of Innovation – Intelligent Platform Team | Bottomline Technologies
Office: 603-501-6446 | Mobile: 603-570-8418
www.bottomline.com



On 3/26/20, 5:20 PM, "Denis Magda" <dm...@apache.org> wrote:

    Hi Adam, thanks for starting the thread. The contributions are
    highly appreciated and we'll be glad to see you among our contributors,
    especially, if it helps to make our ML library stronger.

    But first things first, let me introduce you to @Alexey Zinoviev
    <za...@gmail.com> who is our main ML maintainer.

    -
    Denis


    On Thu, Mar 26, 2020 at 1:49 PM Carbone, Adam <Ad...@bottomline.com>
    wrote:

    > Good Afternoon All
    >
    > I was asked to forward this here by Denis Magda. I see in the 2.8 release
    > that you implemented importing of XGBoost models for distributed inference
    > =>
    > https://issues.apache.org/jira/browse/IGNITE-10810?focusedCommentId=16728718&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16728718
    > Are there any plans to add distributed training? We are at a crossroads of
    > building an XGBoost solution on top of the C++ libraries, but if this is on
    > the roadmap maybe we will go the Ignite direction vs. pure C++, and
    > maybe we might even be able to help and contribute.
    >
    > Regards
    >
    > Adam Carbone
    >
    > Adam Carbone | Director of Innovation – Intelligent Platform Team |
    > Bottomline Technologies
    > Office: 603-501-6446 | Mobile: 603-570-8418
    > www.bottomline.com
    >
    >
    >