Posted to dev@marvin.apache.org by Leonardo Tavares <le...@gmail.com> on 2020/05/26 02:10:25 UTC

Some issues and suggestions

Hello everyone, first let me introduce myself:
My name is Leonardo Tavares and I'm an undergrad Computer Engineering
student at UFSCar doing my undergraduate dissertation supervised by
Professor Daniel Lucrédio.

I'm using Marvin as an MLOps framework for implementing a proof of concept
architecture based on a project from work. It consists mainly of having
multiple models running at the same time and a "model selector" that parses
all requests and sends each one to the correct model.

During the development of the project I wrote down every issue I faced
and the points where Marvin helped me compared to vanilla Python. I was
going to open some issues in the GitHub repository, but I couldn't find the
Issues tab, so I'm going to drop all the feedback here, and if needed I can
send each issue in a separate email.

Regarding specs, I was using a Windows 10 laptop but the Marvin
engines/notebooks were running in a Docker container using Python 3.7. To
reduce the message size, I'm going to focus on the issues I had during the
project development:

   - Probably my biggest issue: Marvin's documentation is scarce, meaning
   that for most of the problems I faced I had to open the source code and
   try to understand what was happening.
   - The public engines available on GitHub have an older Marvin template,
   resulting in broken dependencies.
   - The usage of gRPC depends on compiling the "actions.proto" file from
   source (or there is some easier way to use it that is not documented).
   - Reloading artifacts using gRPC does not work; it only works when using
   REST.
   - The artifact names are static, which prevents the creation of other
   artifacts and requires the user to know each artifact for each DASFE
   action, despite the artifact definitions in "engine.metadata".
   - If you select a blank cell and tag it as a DASFE action, the engine
   breaks (maybe it could emit a "pass" when the cell is blank?).
   - The "marvin test" command always fails, even with the dryrun and the
   HTTP/gRPC server working.
   - "marvin pkg-updatedeps" is broken, get_installed_distributions was
   deprecated in newer versions of pip.
   - "make docker-build" is broken, due to the deprecation of the
   "ppa:webupd8team/java" (along with possibly some other issues).

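For the "marvin pkg-updatedeps" item, a minimal sketch of listing installed
packages without pip's internal get_installed_distributions helper. This is
not Marvin's code: the function name installed_distributions is made up for
illustration, and it assumes Python 3.8+ for the standard-library
importlib.metadata module (on Python 3.7 the importlib_metadata backport
offers the same interface).

```python
# Sketch only: enumerate installed distributions via the standard library
# instead of pip internals. Assumes Python 3.8+.
from importlib import metadata


def installed_distributions():
    """Return a sorted list of (name, version) pairs for installed packages."""
    pairs = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip entries with incomplete metadata
            pairs.add((name, version))
    return sorted(pairs, key=lambda pair: pair[0].lower())


if __name__ == "__main__":
    # Print in the familiar "name==version" requirements format.
    for name, version in installed_distributions():
        print(f"{name}=={version}")
```
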
I'd also like to suggest something that isn't an issue but that I think
would be a great addition to the framework:

   - If I understood the artifact versioning correctly, it currently works
   by dumping each "marvin_" variable into a pickle and loading it. Many
   problems can occur if the source code of the project changes between
   artifact versions.
      - A possible solution is to tag each artifact with a git commit,
      saving the pickle of each artifact along with the commit from which
      it was generated.
      - When reloading, the engine executor would not only load the pickle
      of the artifact but also change the source code being executed to the
      version tagged with the artifact.
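
The tagging idea could be sketched roughly as follows. This is an
illustration under stated assumptions, not Marvin's implementation: the
helper names (current_commit, save_artifact, load_artifact) and the
".meta.json" sidecar file are hypothetical. The sketch only records and
verifies the commit; actually switching the source code to the tagged
version is left out.

```python
# Sketch only: store a pickle together with the git commit it came from.
# All names and the sidecar file layout are hypothetical.
import json
import pickle
import subprocess
from pathlib import Path


def current_commit(repo_dir="."):
    """Return the HEAD commit hash of the git repository at repo_dir."""
    out = subprocess.run(
        ["git", "rev-parse", "HEAD"],
        cwd=repo_dir, check=True, capture_output=True, text=True,
    )
    return out.stdout.strip()


def _sidecar(path):
    # The commit lives next to the pickle, e.g. model.pkl.meta.json
    return path.with_name(path.name + ".meta.json")


def save_artifact(obj, path, commit):
    """Pickle obj to path and record the commit in a JSON sidecar file."""
    path = Path(path)
    with path.open("wb") as fh:
        pickle.dump(obj, fh)
    _sidecar(path).write_text(json.dumps({"commit": commit}))


def load_artifact(path, expected_commit=None):
    """Load an artifact and return (obj, commit).

    Raises ValueError when expected_commit is given and differs from the
    commit recorded at save time.
    """
    path = Path(path)
    commit = json.loads(_sidecar(path).read_text())["commit"]
    if expected_commit is not None and commit != expected_commit:
        raise ValueError(
            f"artifact was generated at {commit}, not {expected_commit}"
        )
    with path.open("rb") as fh:
        return pickle.load(fh), commit
```

A caller would pass current_commit() to save_artifact when persisting, and
on reload could either compare against the running commit or check out the
returned commit before executing the engine.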

What do you think?
Overall, even with all the issues presented above, the framework has great
features that speed up the development and management of the engines, and
it has great potential to be used in several fields.
To me, the most glaring issue is the lack of detailed documentation and
tutorials, which turns away possible adopters due to the steep learning
curve. To help with this, I'd like to contribute some tutorials on gRPC,
if possible :)

Thanks!

-- 
Leonardo Tavares Oliveira
Computer Engineering undergraduate
Universidade Federal de São Carlos

<http://www.instructables.com/member/leonardots/>
<https://github.com/leotavares>
<https://www.linkedin.com/in/leonardot1802/>
<http://buscatextual.cnpq.br/buscatextual/visualizacv.do?id=K8064478U3>

Re: Some issues and suggestions

Posted by Wei Chen <we...@apache.org>.
Hello Leonardo,

The current documentation only covers high-level concepts, and there are
not many samples.
If you have anything you can share with us, please submit a PR to this
repo:
https://github.com/apache/incubator-marvin-website

Best Regards
Wei

On Wed, May 27, 2020 at 6:22 AM Zhang Yifei <yi...@gmail.com> wrote:

> Hello Leonardo,
>
> Lots of good advice!
> You can find Marvin's issues here: issues.apache.org/jira
>
> Best regards

Re: Some issues and suggestions

Posted by Zhang Yifei <yi...@gmail.com>.
Hello Leonardo,

Lots of good advice!
You can find Marvin's issues here: issues.apache.org/jira

Best regards


-- 
--------------------------------------------------------------
Zhang Yifei