Posted to dev@superset.apache.org by Sriram V <v....@tcs.com.INVALID> on 2021/08/24 08:33:58 UTC

Size capability of superset

Hi Team,
We are evaluating Superset for visualizing trillions of rows of data.

  1.  Is there a size limit on CSV uploads in Superset? We could not upload a 500 MB CSV file (2 million rows).
  2.  If there is no size limit, can you share some examples of Superset deployments that currently handle very large datasets?
  3.  Is there a direct way to upload or connect to data in ORC, Parquet, or Delta Lake format without going through Databricks?

Thanks,

Regards,
Sriram V
Engineer
Tata Consultancy Services,
Chennai
=====-----=====-----=====
Notice: The information contained in this e-mail
message and/or attachments to it may contain 
confidential or privileged information. If you are 
not the intended recipient, any dissemination, use, 
review, distribution, printing or copying of the 
information contained in this e-mail message 
and/or attachments to it are strictly prohibited. If 
you have received this communication in error, 
please notify us by reply e-mail or telephone and 
immediately and permanently delete the message 
and any attachments. Thank you



Re: Size capability of superset

Posted by Srinivasa Kadamati <sr...@preset.io>.
I second everything Erik just said, spot on!

On Tue, Aug 24, 2021 at 11:45 AM Erik Ritter <er...@gmail.com>
wrote:

> Hi Sriram,
>
> I would strongly recommend against using the upload CSV functionality in
> Superset as a major part of your data warehouse pipeline, and instead use a
> tool like Apache Airflow to write and schedule ingestion pipelines, or use
> CLIs/GUIs provided by your warehouse's DB to load data in one-off use
> cases. The upload CSV feature is primarily intended for importing small
> amounts of data from external sources into your data warehouse and not for
> large CSV files or creating core tables.
>
> As for visualizing trillions of rows of data, a lot of that will fall on
> your warehouse's DB query engine to handle. I assume you don't want to
> display trillions of datapoints in the browser, but instead show
> aggregations of those rows. Superset should perform fine, as the actual
> aggregations are computed in the warehouse, and Superset only handles the
> aggregated result set (and includes default row limits on said aggregations
> to prevent overloading of the browser or the webserver).
>
> I hope this helps!
> Erik Ritter

Re: Size capability of superset

Posted by Erik Ritter <er...@gmail.com>.
Hi Sriram,

I would strongly recommend against using the upload CSV functionality in
Superset as a major part of your data warehouse pipeline, and instead use a
tool like Apache Airflow to write and schedule ingestion pipelines, or use
CLIs/GUIs provided by your warehouse's DB to load data in one-off use
cases. The upload CSV feature is primarily intended for importing small
amounts of data from external sources into your data warehouse and not for
large CSV files or creating core tables.
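
To illustrate the kind of ingestion step described above, here is a minimal sketch that streams a CSV into a database table in fixed-size batches instead of uploading it through a web form. It is not Superset or Airflow code: SQLite stands in for the warehouse, and the function name `load_csv_in_batches`, the `sales` table, and the batch size are all hypothetical. In practice a function like this would typically be wrapped in a scheduled job, or replaced by the warehouse's own bulk loader.

```python
import csv
import io
import sqlite3
from itertools import islice


def load_csv_in_batches(conn, csv_file, table, batch_size=10_000):
    """Stream rows from csv_file into `table`, committing one batch at a time.

    `table` and the header names are interpolated into SQL, so they must
    come from a trusted source (assumption for this sketch).
    """
    reader = csv.reader(csv_file)
    header = next(reader)  # first line holds the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({columns})")
    insert_sql = f"INSERT INTO {table} ({columns}) VALUES ({placeholders})"

    total = 0
    while True:
        # Only batch_size rows are held in memory at any time.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
        conn.commit()
        total += len(batch)
    return total


# Tiny in-memory demo; a real job would open the file on disk instead.
demo_csv = io.StringIO("region,amount\nnorth,10\nsouth,20\nnorth,5\n")
conn = sqlite3.connect(":memory:")
rows_loaded = load_csv_in_batches(conn, demo_csv, "sales", batch_size=2)
print(rows_loaded)
```

Because rows are read and inserted batch by batch, memory use stays roughly constant regardless of file size, which is the main difference from a single browser upload.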

As for visualizing trillions of rows of data, a lot of that will fall on
your warehouse's DB query engine to handle. I assume you don't want to
display trillions of datapoints in the browser, but instead show
aggregations of those rows. Superset should perform fine, as the actual
aggregations are computed in the warehouse, and Superset only handles the
aggregated result set (and includes default row limits on said aggregations
to prevent overloading of the browser or the webserver).
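
The division of work described above can be sketched as follows: the database runs the aggregation, and the client only receives a small, capped result set. This is an illustration under assumptions, not Superset's actual query code. SQLite stands in for the warehouse, and the `events` table and the `ROW_LIMIT` value are hypothetical.

```python
import sqlite3

ROW_LIMIT = 1000  # hypothetical cap, standing in for a chart's row limit

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (country TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("IN", 10.0), ("US", 7.5), ("IN", 2.5), ("DE", 4.0), ("US", 1.0)],
)

# The GROUP BY runs inside the database; only one row per country
# (and never more than ROW_LIMIT rows) is sent back to the client.
aggregated = conn.execute(
    """
    SELECT country, COUNT(*) AS n_rows, SUM(amount) AS total
    FROM events
    GROUP BY country
    ORDER BY total DESC, country
    LIMIT ?
    """,
    (ROW_LIMIT,),
).fetchall()
print(aggregated)
```

The raw table could hold any number of rows; the size of what reaches the web server and browser depends on the number of groups and the limit, not on the table size.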

I hope this helps!
Erik Ritter

On Tue, Aug 24, 2021 at 8:28 AM Sriram V <v....@tcs.com.invalid> wrote:

> Hi Team,
> We are evaluating Superset for visualizing trillions of rows of data.
>
>   1.  Is there a size limit on CSV uploads in Superset? We could not
> upload a 500 MB CSV file (2 million rows).
>   2.  If there is no size limit, can you share some examples of Superset
> deployments that currently handle very large datasets?
>   3.  Is there a direct way to upload or connect to data in ORC, Parquet,
> or Delta Lake format without going through Databricks?
>
> Thanks,
>
> Regards,
> Sriram V
> Engineer
> Tata Consultancy Services,
> Chennai