Posted to dev@carbondata.apache.org by aaron <94...@qq.com> on 2018/10/01 15:53:45 UTC

Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

I think the query plan info is not right:

1. The total blocklet count from the CarbonData CLI is 233 + 86 = 319.
2. But the query plan tells me I have 560 blocklets.

I hope the info below helps you locate the issue.

***********************************************************************************
I used the CarbonData CLI to print the blocklet summaries as below:

java -cp "/home/hadoop/carbontool/*:/opt/spark/jars/*"
org.apache.carbondata.tool.CarbonCli -cmd summary -a -p
hdfs://ec2-dca-aa-p-sdn-16.appannie.org:9000/usr/carbon/data/default/storev3/

## Summary
total: 80 blocks, 9 shards, 233 blocklets, 62,698 pages, 2,006,205,228 rows,
12.40GB
avg: 158.72MB/block, 54.50MB/blocklet, 25,077,565 rows/block, 8,610,322
rows/blocklet

java -cp "/home/hadoop/carbontool/*:/opt/spark/jars/*"
org.apache.carbondata.tool.CarbonCli -cmd summary -a -p
hdfs://ec2-dca-aa-p-sdn-16.appannie.org:9000/usr/carbon/data/default/usage_basickpi/

## Summary
total: 30 blocks, 14 shards, 86 blocklets, 3,498 pages, 111,719,467 rows,
4.24GB
avg: 144.57MB/block, 50.43MB/blocklet, 3,723,982 rows/block, 1,299,063
rows/blocklet
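To make the mismatch explicit, here is the arithmetic over the two CLI summaries above (a trivial sketch; the numbers are copied from the summaries):

```python
# Blocklet counts reported by CarbonCli for each table.
storev3_blocklets = 233       # from the storev3 summary
usage_blocklets = 86          # from the usage_basickpi summary

cli_total = storev3_blocklets + usage_blocklets
print(cli_total)              # 319 blocklets across both tables
```

Yet the query plan reports 560 total blocklets for the scan on storev3 alone, more than both tables combined.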


************************************************************************************
But when I run a SQL query, Carbon reports the following:

|== CarbonData Profiler ==
Table Scan on storev3
 - total: 194 blocks, 560 blocklets
 - filter: (((((((((granularity <> null and date <> null) and date >=
1472688000000000 between date <= 1475280000000000) and true) and granularity
= monthly) and country_code in
(LiteralExpression(US);LiteralExpression(CN);LiteralExpression(JP);)) and
device_code in (LiteralExpression(ios-phone);)) and product_id <> null) and
country_code <> null) and device_code <> null)
 - pruned by Main DataMap
    - skipped: 192 blocks, 537 blocklets


************************************************************************************
The SELECT SQL is:

SELECT f.country_code, f.date, f.product_id, f.category_id, f.arpu FROM (
    SELECT a.country_code, a.date, a.product_id, a.category_id,
a.revenue/a.average_active_users as arpu
    FROM(
        SELECT r.device_code, r.category_id, r.country_code, r.date,
r.product_id, r.revenue, u.average_active_users
        FROM
        (
            SELECT b.device_code, b.country_code, b.product_id,  b.date,
b.category_id, sum(b.revenue) as revenue
            FROM (
                SELECT v.device_code, v.country_code, v.product_id,
v.revenue, v.date, p.category_id FROM
                (
                    SELECT device_code, country_code, product_id,
est_revenue as revenue, timeseries(date, 'month') as date
                    FROM storev3
                    WHERE market_code='apple-store' AND date BETWEEN
'2016-09-01' AND '2016-10-01' and device_code in ('ios-phone') and
country_code in ('US', 'CN', 'JP')
                ) as v
                JOIN(
                    SELECT DISTINCT product_id, category_id
                    FROM storev3
                    WHERE market_code='apple-store' AND date BETWEEN
'2016-09-01' AND '2016-10-01' and device_code in ('ios-phone') and
category_id in (100000, 100001, 100021) and country_code in ('US', 'CN',
'JP')
                ) as p
                ON p.product_id = v.product_id
            ) as b
            GROUP BY b.device_code, b.country_code, b.product_id, b.date,
b.category_id
        ) AS r
        JOIN
        (
            SELECT country_code, date, product_id, (CASE WHEN
est_average_active_users is not NULL THEN est_average_active_users ELSE 0
END) as average_active_users, device_code
            FROM usage_basickpi
            WHERE date BETWEEN '2016-09-01' AND '2016-10-01'and granularity
='monthly' and country_code in ('US', 'CN', 'JP') AND device_code in
('ios-phone')
        ) AS u
        ON r.country_code=u.country_code AND r.date=u.date AND
r.product_id=u.product_id AND r.device_code=u.device_code
    ) AS a
)AS f
ORDER BY f.arpu DESC
LIMIT 10

Thanks
Aaron




--
Sent from: http://apache-carbondata-dev-mailing-list-archive.1130556.n5.nabble.com/

Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

Posted by xuchuanyin <xu...@hust.edu.cn>.
Hi, aaron.
Regarding the wrong pruning statistics in the query plan: do you execute the
queries concurrently?
I noticed that the pruning statistics collector is single-threaded; if you run
queries concurrently, the pruning statistics will be incorrect.
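As an illustration of that failure mode (a minimal sketch, not CarbonData's actual code): a statistics collector shared by concurrent queries without synchronization can lose or mix updates, so totals drift from reality. Guarding the shared counter with a lock (or keeping per-query collectors) keeps the counts correct:

```python
import threading

class PruneStats:
    """Toy pruning-statistics collector; a lock keeps concurrent updates correct."""
    def __init__(self):
        self.skipped_blocklets = 0
        self._lock = threading.Lock()

    def record_skip(self, n):
        # Without this lock, concurrent read-modify-write updates can be lost.
        with self._lock:
            self.skipped_blocklets += n

stats = PruneStats()
workers = [threading.Thread(target=lambda: [stats.record_skip(1) for _ in range(10_000)])
           for _ in range(4)]
for t in workers:
    t.start()
for t in workers:
    t.join()

# 4 threads x 10,000 skips each; an unsynchronized counter could report less.
assert stats.skipped_blocklets == 40_000
```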




Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

Posted by aaron <94...@qq.com>.
The data should be right.




Re: [ISSUE] carbondata1.5.0 and spark 2.3.2 query plan issue

Posted by Ravindra Pesala <ra...@gmail.com>.
Hi Aaron,

The CarbonData profiler is an untested feature added in an old version, so it
might be broken or might not report correct information during the EXPLAIN
command. We will try to correct it in the next version; meanwhile, can you
please check and make sure that the data you are getting from the query is
right?
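One generic way to do that check (a sketch, not CarbonData-specific): run the same query against a trusted copy of the data (e.g. the source files before loading) and compare result sets ignoring row order:

```python
def same_result(rows_a, rows_b):
    """Compare two query result sets, ignoring row order."""
    return sorted(map(tuple, rows_a)) == sorted(map(tuple, rows_b))

# Hypothetical rows: one from the CarbonData query, one from a trusted copy.
carbon_rows = [("US", "2016-09-01", 1001, 100000, 2.5)]
trusted_rows = [("US", "2016-09-01", 1001, 100000, 2.5)]
assert same_result(carbon_rows, trusted_rows)
```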

Regards,
Ravindra.

On Mon, 1 Oct 2018 at 21:23, aaron <94...@qq.com> wrote:


-- 
Thanks & Regards,
Ravi