Posted to issues@carbondata.apache.org by "suyash yadav (Jira)" <ji...@apache.org> on 2020/12/16 03:57:00 UTC

[jira] [Created] (CARBONDATA-4085) How to improve query execution time further

suyash yadav created CARBONDATA-4085:
----------------------------------------

             Summary: How to improve query execution time further
                 Key: CARBONDATA-4085
                 URL: https://issues.apache.org/jira/browse/CARBONDATA-4085
             Project: CarbonData
          Issue Type: Improvement
          Components: sql
    Affects Versions: 2.0.1
            Reporter: suyash yadav
             Fix For: 2.0.1


Hi Team,

We are doing a POC in which we would like our query execution to be faster, ideally in the range of 3 to 4 seconds.

We have read CarbonData documents claiming that CarbonData can scan petabytes of data and return results in 3 to 4 seconds, which does not match what we observe.

Our table size is 1.6 billion and the query fetches only 4K records, yet it still takes around 22 to 25 seconds to execute.

Below is the query we are running:

==============================

spark.sql("select ts,resource,metric,value from fact_timestamp_global left join tags_10_days_test on fact_timestamp_global.tags_id= tags_10_days_test.id where metric in ('Outbound Utilization (percent)','Inbound Utilization (percent)') and resource='10.212.7.98_if:<0001>' and ts>='2020-09-28 00:00:00' and ts<='2020-09-28 23:55:55'").show(false)

=================================



The definition of fact_timestamp_global is as follows:

========================

spark.sql("create table Fact_timestamp_GLOBAL(ts timestamp,metric string,tags_id string,value double) partitioned by (ts2 timestamp) stored as carbondata TBLPROPERTIES ('SORT_COLUMNS'='ts,metric','SORT_SCOPE'='GLOBAL_SORT')").show()

==========================

The definition of tags_10_days_test is as follows:

====================

spark.sql("create table tags_10_days_test(id string,resource string) stored as carbondata TBLPROPERTIES('SORT_COLUMNS'='id,resource')").show()

======================

 

Kindly go through the above points and help us improve the query performance further.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)