Posted to dev@hivemall.apache.org by Mustafa İman <mu...@apache.org> on 2021/02/17 01:03:13 UTC

Hive first class support

Hi Hivemall contributors,
I am a Hive committer. I am looking into Hivemall to see where we can improve regarding Hive integration. For example, compiler support: syntactic sugar to lessen the verbosity of training/prediction workflows. However, I checked the GitHub repo and there do not seem to be any commits after August 2020. I am wondering if there is still active development on this project. I could not find much info on the direction of the project. Is there a current roadmap somewhere? Are there any companies supporting the development of Hivemall currently? Also, are there any benchmarks against established providers such as BigQuery, Redshift, TensorFlow, etc.?

Re: Hive first class support

Posted by Makoto Yui <my...@apache.org>.
Hi Mustafa,

Thank you for your interest in Hivemall.

> syntactic sugar to lessen the verbosity of training/prediction workflows.

For ML workflows, I recommend using external workflow tools like
Apache Airflow and Digdag.

Here is an example workflow using Hivemall with Digdag.
https://github.com/treasure-data/treasure-boxes/blob/master/machine-learning-box/gender_age_prediction/rf_predict.dig

> I could not find much info on the direction of the project. Is there a current roadmap somewhere?

Unfortunately, the project is in maintenance-only mode for now.
I'm personally interested in implementing MLflow integration for
status tracking.

> I am looking into Hivemall to see where we can improve regarding Hive integration.

If Hive committers can help graduate Hivemall as a subproject of Hive,
that would be welcome.
https://incubator.apache.org/guides/graduation.html#whether_to_graduate_to_subproject_or_to_top_level_project

The functionality of Hivemall is already diverse and stable enough.
I'm not sure what other possibilities there are for Hive integration,
but proposals are welcome.

> Are there any companies supporting the development of Hivemall currently?

I'm not sure, but my employer (Arm Treasure Data) is using Hivemall in
production.

You can find uses of Hivemall by searching LinkedIn:
https://www.linkedin.com/search/results/all/?keywords=hivemall&origin=GLOBAL_SEARCH_HEADER

> Also, are there any benchmarks against established providers such as BigQuery, Redshift, TensorFlow, etc.?

I haven't had a chance to run Redshift ML or BigQuery ML; however,
Hivemall also supports training with XGBoost and has more functions for
feature engineering and evaluation than BigQuery ML.

Thanks,
Makoto

-- 
Makoto YUI <myui AT apache.org>
Principal Engineer, Arm Treasure Data.
http://myui.github.io/