Posted to dev@airavata.apache.org by Lahiru Jayathilake <la...@gmail.com> on 2023/05/11 18:26:08 UTC

SMILES Django Portal

Hi Everyone,

The initial development of the SMILES Django portal is available in this
PR [1]. The backend code enables CRUD operations on the three data product
types (computational, experimental, and literature) and retrieves each type
of data product by combining the different schemas defined on the Airavata
Data Catalog side. It also includes a Celery- and Redis-based solution for
handling large file uploads and the Airavata Data Catalog service
implementation as it currently stands. In addition, a VueJS frontend has
been developed to visualize the different types of data products, let users
upload data products, and show a detailed view of an experimental data
product. Please note that some frontend changes are still in progress.

[1] - https://github.com/SciGaP/smiles-django-portal/pull/1

Cheers!
Lahiru

On Wed, Feb 22, 2023 at 8:55 PM Lahiru Jayathilake <
lahirujayathilake@gmail.com> wrote:

> Hi All,
>
> To provide an update on the project, we have decided to proceed with
> approach one, with a few changes. Instead of using Django model classes for
> the Computational, Experimental, and Literature data, protobuf-generated
> model classes will be used to handle communication. This makes it easier
> to accommodate changes to the models without modifying the Django code
> base.
> [image: SMILES Django Portal.png]
>
> The project has been renamed to smiles-django-portal [1], and I have
> created a PR [2] that adds the functionality for creating a computational
> data product (note that the frontend has not been implemented yet), as well
> as the Python client implementation for calling the Airavata Data Catalog
> service.
>
> Regarding the SMILES protobuf files, there are two possible ways of
> defining them:
>
> 1. Declaring all the fields in the proto file
> All the data related to a specific SMILES data product will be declared
> in the proto file, along with the fields required for the Airavata Data
> Catalog product (such as data_product_id and name), except metadata. When
> the Airavata Data Catalog gRPC service is invoked, all the SMILES-specific
> data will be sent in the metadata field as a JSON string.
>
> This method has already been implemented in the previous PR. A sample
> protobuf file [3] (not the final version) was borrowed from [4].
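As a rough illustration of option 1, the Python sketch below keeps data_product_id and name as top-level catalog fields and packs every other SMILES field into the metadata JSON string. It is only a sketch under stated assumptions: a dataclass stands in for the protobuf-generated class, and the SMILES fields (`method`, `basis_set`) and the dict shape of the catalog product are hypothetical, not taken from DataCatalogAPI.proto.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ComputationalDP:
    """Stand-in for a protobuf-generated SMILES message (option 1)."""
    data_product_id: str
    name: str
    method: str      # hypothetical SMILES-specific field
    basis_set: str   # hypothetical SMILES-specific field


# Fields assumed to map directly onto the Data Catalog product.
CATALOG_FIELDS = ("data_product_id", "name")


def to_catalog_product(dp: ComputationalDP) -> dict:
    """Keep catalog fields top-level; pack the rest into a metadata JSON string."""
    values = asdict(dp)
    # pop() removes the catalog fields, leaving only SMILES-specific data.
    product = {key: values.pop(key) for key in CATALOG_FIELDS}
    product["metadata"] = json.dumps(values, sort_keys=True)
    return product


def from_catalog_product(product: dict) -> ComputationalDP:
    """Rebuild the SMILES message from the catalog fields plus the metadata JSON."""
    values = {key: product[key] for key in CATALOG_FIELDS}
    values.update(json.loads(product["metadata"]))
    return ComputationalDP(**values)
```

With this layout, adding a SMILES field means changing the message definition, while the catalog only ever sees an opaque JSON string.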
>
> 2. Using JSON-LD for metadata
> In this schema, we use the "google.protobuf.Struct" type to represent the
> metadata for each data product. This allows us to store JSON-LD data in the
> metadata field, since the "google.protobuf.Struct" type can hold arbitrary
> JSON data. The rest of the required fields will also be declared (e.g.,
> data_product_id, name, etc.). The 'metadata' Struct field will be
> serialized to a JSON string and assigned to the 'metadata' field of the
> Airavata Data Catalog product.
>
> A sample protobuf file:
>
> syntax = "proto3";
>
> import "google/protobuf/struct.proto";
>
> message ComputationalDP {
>   string data_product_id = 1;
>   string parent_data_product_id = 2;
>   string name = 3;
>   google.protobuf.Struct metadata = 4;
> }
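For comparison, here is a minimal standard-library sketch of option 2. It mirrors the ComputationalDP message above with a plain dict standing in for google.protobuf.Struct and serializes only the metadata to a JSON string for the Data Catalog's metadata field. The schema.org context, the property names, and the dict shape of the catalog product are illustrative assumptions, not the project's actual schema.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ComputationalDP:
    """Mirrors the ComputationalDP message; `metadata` plays the role of Struct."""
    data_product_id: str
    parent_data_product_id: str
    name: str
    metadata: dict = field(default_factory=dict)


def to_catalog_product(dp: ComputationalDP) -> dict:
    # Only the metadata is serialized; the other fields map one-to-one
    # (the catalog-side field names here are assumed).
    return {
        "data_product_id": dp.data_product_id,
        "parent_data_product_id": dp.parent_data_product_id,
        "name": dp.name,
        "metadata": json.dumps(dp.metadata, sort_keys=True),
    }


dp = ComputationalDP(
    data_product_id="dp-2",
    parent_data_product_id="dp-1",
    name="benzene-opt",
    metadata={
        "@context": "https://schema.org",  # illustrative JSON-LD context
        "@type": "Dataset",
        "keywords": ["DFT", "geometry optimization"],
    },
)
product = to_catalog_product(dp)
```

When the real google.protobuf.Struct is used, keep in mind that Struct stores every number as a double. An integer such as 3 therefore comes back as 3.0 after a round trip.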
>
> I believe this approach has the following two main advantages:
>
> - Flexibility (Because the "google.protobuf.Struct" type can hold
> arbitrary JSON data, we can represent a wide range of data structures, from
> simple key-value pairs to nested objects and arrays. This can make it
> easier to work with complex data and integrate it with other systems)
> - Type safety (By using the "google.protobuf.Struct" type, we can ensure
> that the metadata is well-formed JSON; conformance to a specific schema
> would still need to be validated separately, since Struct does not enforce
> one)
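In practice, the protobuf runtime only rejects values that cannot be represented in a Struct. In Python terms, that means anything other than None, booleans, numbers, strings, lists, and dicts with string keys. Any schema rule, such as required JSON-LD keys, has to be checked by the portal itself. The sketch below shows one way to do both checks. The helper names and the required keys are hypothetical, not part of any library.

```python
from typing import Any, List

# Hypothetical minimum schema for SMILES metadata documents.
REQUIRED_KEYS = ("@context", "@type")


def struct_problems(value: Any, path: str = "$") -> List[str]:
    """List reasons why `value` could not be stored in a google.protobuf.Struct."""
    if value is None or isinstance(value, (bool, int, float, str)):
        return []
    problems: List[str] = []
    if isinstance(value, dict):
        for key, item in value.items():
            if isinstance(key, str):
                problems.extend(struct_problems(item, f"{path}.{key}"))
            else:
                problems.append(f"{path}: non-string key {key!r}")
    elif isinstance(value, list):
        for index, item in enumerate(value):
            problems.extend(struct_problems(item, f"{path}[{index}]"))
    else:
        problems.append(f"{path}: unsupported type {type(value).__name__}")
    return problems


def validate_metadata(metadata: dict) -> List[str]:
    """Struct compatibility first, then the (hypothetical) required JSON-LD keys."""
    problems = struct_problems(metadata)
    for key in REQUIRED_KEYS:
        if key not in metadata:
            problems.append(f"$: missing required key {key!r}")
    return problems
```

An empty list means the document can be stored as a Struct and carries the assumed minimum keys. A real JSON-LD schema check would go further, for example by validating property types per `@type`.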
>
> I'd like to hear your thoughts and feedback on this.
>
> [1] - https://github.com/SciGaP/smiles-django-portal
> [2] - https://github.com/SciGaP/smiles-django-portal/pull/1
> [3] - https://github.com/lahirujayathilake/smiles-django-portal/blob/main/data_catalog/proto/computational_dp.proto
> [4] - https://github.com/bhavesh-asana/SEAGrid/blob/main/rpcHandler/ExpDBDataHandler/proto/molecule.proto
>
> Thanks,
> Lahiru
>
>
> On Fri, Feb 17, 2023 at 12:28 PM Lahiru Jayathilake <
> lahirujayathilake@gmail.com> wrote:
>
>> Hi Suresh,
>>
>> Thanks for the advice. Sure, I will do as you suggested.
>>
>> Lahiru
>>
>> On Thu, Feb 16, 2023 at 7:42 PM Suresh Marru <sm...@apache.org> wrote:
>>
>>> Hi Lahiru,
>>>
>>> The two dependencies, a Django-grpc fork
>>> (https://github.com/socotecio/django-socio-grpc/) and
>>> https://github.com/grpc/grpc-web, look reasonably OK, so building on them
>>> may not be a bad idea. But if you keep hitting roadblocks, it may be wise
>>> to switch to django-rest-framework and take your approach 1. Sometimes the
>>> downsides of depending on not-so-actively maintained dependencies outweigh
>>> the technical advantages.
>>>
>>> So +1 to proceeding with gRPC, but if you stumble, revert to the
>>> REST-based approach.
>>>
>>> Suresh
>>>
>>> On Feb 16, 2023, at 3:24 AM, Lahiru Jayathilake <
>>> lahirujayathilake@gmail.com> wrote:
>>>
>>> Hi Marcus,
>>>
>>> Thanks for the suggestions and the heads-up. Sure, I will do more
>>> investigation on that and get back to you with the details.
>>>
>>> Thanks,
>>> Lahiru
>>>
>>> On Wed, Feb 15, 2023 at 8:36 PM Christie, Marcus Aaron <ma...@iu.edu>
>>> wrote:
>>>
>>>> Hi Lahiru,
>>>>
>>>> Thanks for putting together this investigation. I'm not 100% sure but
>>>> it looks like gRPC-JS only works with Node.js since it uses Node.js APIs. I
>>>> think you'll need gRPC-Web to make gRPC calls from a browser. My
>>>> understanding is that that requires an Envoy proxy on the server side.
>>>> (Rereading your email, I think you probably already know this, but just in
>>>> case I thought I would point this out.)
>>>>
>>>> It looks like django-grpc-framework isn't an active project [1], so I
>>>> agree with your concern about depending on it. One issue with using gRPC in
>>>> Django, I think, is that the integration that we've done with the Django
>>>> framework would need to be re-implemented, such as middleware and
>>>> authentication. It's probably doable, just something to keep in mind.
>>>>
>>>> It would be good if the gRPC server could run on the same HTTP port as
>>>> the Django server, but I'm not sure how that would work. From the client,
>>>> access to both the Django server and the gRPC server should be over SSL,
>>>> on the same port. Maybe on the backend they run on different ports, but
>>>> behind the proxy they appear, from the client's perspective, to run on
>>>> the same port.
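One common way to get the single-port setup described above is to let the proxy route requests by content type. gRPC-Web requests carry a Content-Type starting with application/grpc-web, while ordinary browser and REST traffic does not. (gRPC-Web uses Envoy as its default proxy.) The Python sketch below only models that routing decision. The upstream addresses and ports are hypothetical. In a real deployment, this rule would live in the proxy configuration, and the proxy would also translate gRPC-Web into native gRPC before forwarding; this sketch does not model that step.

```python
from typing import Dict

# Hypothetical backend addresses behind a single public HTTPS port.
UPSTREAMS: Dict[str, str] = {
    "grpc": "127.0.0.1:50051",   # gRPC server (hypothetical port)
    "django": "127.0.0.1:8000",  # Django app (hypothetical port)
}


def choose_upstream(headers: Dict[str, str]) -> str:
    """Pick the backend for a request arriving on the shared public port."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    content_type = normalized.get("content-type", "").lower()
    # gRPC-Web uses application/grpc-web, application/grpc-web+proto,
    # application/grpc-web-text, ...; native gRPC uses application/grpc.
    if content_type.startswith("application/grpc"):
        return UPSTREAMS["grpc"]
    return UPSTREAMS["django"]
```

Routing on path prefixes (gRPC method paths look like /package.Service/Method) is an equally common alternative. The content-type check is used here only because it is easy to show in a few lines.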
>>>>
>>>> The django-grpc-framework project may be good to mine for some ideas. I
>>>> like that it follows django-rest-framework conventions. We use
>>>> django-rest-framework in the Airavata Django Portal.
>>>>
>>>> Thanks,
>>>>
>>>> Marcus
>>>>
>>>> [1] https://github.com/fengsp/django-grpc-framework/issues/34
>>>>
>>>> > On Feb 14, 2023, at 1:17 PM, Lahiru Jayathilake <
>>>> lahirujayathilake@gmail.com> wrote:
>>>> >
>>>> > Hi Suresh,
>>>> >
>>>> > Thank you for the feedback. The other library that can be used to
>>>> > facilitate browser communication with gRPC services is gRPC-JS
>>>> > (https://github.com/grpc/grpc-node/tree/master/packages/grpc-js).
>>>> > However, in terms of browser support, gRPC-Web is specifically designed
>>>> > for use in web browsers, and it supports all major browsers, including
>>>> > Chrome, Firefox, Safari, and Edge. In contrast, gRPC-JS is designed to
>>>> > work with both web browsers and Node.js, and it may require additional
>>>> > configuration to work correctly in web browsers, which is a bit
>>>> > cumbersome.
>>>> >
>>>> > @machrist@iu.edu I had a chat with Suresh and wanted to clarify a
>>>> few points with you.
>>>> >
>>>> > 1. In the second approach, I spin up a gRPC server in the background
>>>> > (inside the Django app). While doing that, I came across a framework
>>>> > called django-grpc-framework [1][2]. I did not proceed with that
>>>> > framework because it comes from a personal repository. What do you
>>>> > think? Would it be a good idea to go with it?
>>>> >
>>>> > 2. Any suggestions or comments on the approach of using gRPC
>>>> > (gRPC-Web) to establish communication with the frontend?
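Point 1 raises a practical detail: when a gRPC server is started inside the Django process, it should start only once. Django's documentation notes that AppConfig.ready() might be called more than once in some corner cases. The standard-library sketch below guards a hypothetical `serve` callable with a lock and runs it in a daemon thread. With grpcio, `serve` would typically create a server with `grpc.server(futures.ThreadPoolExecutor(...))`, register the servicers, call `add_insecure_port(...)` and `start()`, and then `wait_for_termination()`. That part is not shown here.

```python
import threading
from typing import Callable, Optional

_lock = threading.Lock()
_server_thread: Optional[threading.Thread] = None


def start_grpc_server_once(serve: Callable[[], None]) -> bool:
    """Run `serve` in a background daemon thread the first time only.

    Returns True if this call started the thread, False if it was already started.
    """
    global _server_thread
    with _lock:
        if _server_thread is not None:
            return False
        # daemon=True so the server thread does not block interpreter shutdown.
        _server_thread = threading.Thread(target=serve, name="grpc-server", daemon=True)
        _server_thread.start()
        return True
```

In the app's AppConfig.ready(), this would be called as `start_grpc_server_once(serve)`. The lock only guards a single process. Setups that start several Django processes (for example, multi-process WSGI servers) would each try to start their own gRPC server, so those need a different deployment model.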
>>>> >
>>>> > I'd be really happy to hear your thoughts and suggestions on these.
>>>> >
>>>> > [1] - https://github.com/fengsp/django-grpc-framework
>>>> > [2] - https://pypi.org/project/djangogrpcframework/
>>>> >
>>>> > Thanks,
>>>> > Lahiru
>>>> >
>>>> >
>>>> >
>>>> > On Tue, Feb 14, 2023 at 7:47 PM Suresh Marru <sm...@apache.org>
>>>> wrote:
>>>> > Hi Lahiru,
>>>> >
>>>> > Thank you for summarizing both of these; your POCs of both approaches
>>>> > are helpful. The second option, if feasible, will be preferable. You
>>>> > already mentioned the performance. In addition, you will also get
>>>> > forward/backward compatibility if the underlying protobuf structures are
>>>> > maintained with some discipline.
>>>> >
>>>> > The big plus of your first approach is the REST-compatible JavaScript
>>>> > libraries. Other than grpc-web (https://github.com/grpc/grpc-web), have
>>>> > you seen broader support? I see you are building on grpc-js; how has
>>>> > that experience been?
>>>> >
>>>> > Suresh
>>>> >
>>>> >> On Feb 13, 2023, at 2:49 PM, Lahiru Jayathilake <
>>>> lahirujayathilake@gmail.com> wrote:
>>>> >>
>>>> >> Hi All,
>>>> >>
>>>> >> I have been engaging with the SMILES project to implement the
>>>> >> Gateway and its necessary components. To give you a brief
>>>> >> introduction, the SMILES project has three types of data that need to
>>>> >> be combined for publication: Computational DB, Literature DB, and
>>>> >> Experiment DB. There should be a frontend to filter, create, and delete
>>>> >> data products, with a Django app as the backend that communicates with
>>>> >> the Apache Airavata Data Catalog [1].
>>>> >>
>>>> >> Mainly, I have been exploring two approaches.
>>>> >>
>>>> >> 1.
>>>> >> <approach1.png>
>>>> >>
>>>> >> The frontend will communicate with the Django app via REST, and the
>>>> Django app will manage the manipulation of data products through gRPC calls
>>>> to the Data Catalog API. Django models will be used to represent the
>>>> Computational, Literature, and Experiment data products, without storing
>>>> the data. In the end, these data products will reside in the Data Catalog,
>>>> following its established conventions.
>>>> >>
>>>> >> POC - https://github.com/lahirujayathilake/SEAGrid
>>>> >> This has been implemented to cover data product creation.
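Approach 1 can be pictured as a thin, stateless adapter. The Django REST view validates the incoming JSON and forwards it to the Data Catalog through a gRPC client, without saving anything to the Django database. The framework-free Python sketch below shows that flow under stated assumptions. `create_data_product`, the payload fields `name` and `type`, and the catalog product shape are hypothetical placeholders for the generated Data Catalog stub and its messages. An in-memory fake stands in for the real gRPC client.

```python
import json
from typing import Any, Dict, Protocol


class CatalogClient(Protocol):
    """Hypothetical interface of the gRPC Data Catalog client."""
    def create_data_product(self, product: Dict[str, Any]) -> Dict[str, Any]: ...


REQUIRED_FIELDS = ("name", "type")  # hypothetical REST payload fields
PRODUCT_TYPES = {"computational", "experimental", "literature"}


def handle_create(body: str, client: CatalogClient) -> Dict[str, Any]:
    """Validate a REST request body and forward it; nothing is stored locally."""
    payload = json.loads(body)
    missing = [name for name in REQUIRED_FIELDS if name not in payload]
    if missing:
        return {"status": 400, "error": f"missing fields: {', '.join(missing)}"}
    if payload["type"] not in PRODUCT_TYPES:
        return {"status": 400, "error": f"unknown type: {payload['type']}"}
    # Everything except the name goes into the catalog metadata (assumed shape).
    extra = {key: value for key, value in payload.items() if key != "name"}
    product = {"name": payload["name"], "metadata": json.dumps(extra, sort_keys=True)}
    created = client.create_data_product(product)
    return {"status": 201, "data_product_id": created["data_product_id"]}


class FakeCatalogClient:
    """In-memory stand-in for the gRPC stub, useful for local testing."""
    def __init__(self) -> None:
        self.products: Dict[str, Dict[str, Any]] = {}

    def create_data_product(self, product: Dict[str, Any]) -> Dict[str, Any]:
        product_id = f"dp-{len(self.products) + 1}"
        self.products[product_id] = {**product, "data_product_id": product_id}
        return self.products[product_id]
```

Injecting the client keeps the view testable without a running Data Catalog. In Django, the same logic would sit inside a django-rest-framework view.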
>>>> >>
>>>> >> 2.
>>>> >> <with-grpc.png>
>>>> >>
>>>> >> In this approach, the key difference is a gRPC server running within
>>>> >> the Django app. To represent the three data products, protobufs will
>>>> >> be defined that extend the DataCatalog proto messages [2]. The
>>>> >> frontend will communicate using gRPC calls.
>>>> >> The gRPC API can also be used by other clients to manipulate data,
>>>> >> with improved performance.
>>>> >>
>>>> >> POC - https://github.com/lahirujayathilake/SEAGrid/tree/with-grpc
>>>> >> (The frontend is in progress.)
>>>> >>
>>>> >> I would like to hear your thoughts and feedback on the designs, so I
>>>> >> can improve them and go with the right approach.
>>>> >>
>>>> >>
>>>> >> [1] - https://github.com/apache/airavata-data-catalog
>>>> >> [2] - https://github.com/apache/airavata-data-catalog/blob/main/data-catalog-api/stubs/src/main/proto/DataCatalogAPI.proto
>>>> >>
>>>> >> Cheers!
>>>> >> Lahiru
>>>> >
>>>>
>>>>
>>>