Posted to issues@spark.apache.org by "LanYang (Jira)" <ji...@apache.org> on 2021/09/14 03:18:00 UTC

[jira] [Created] (SPARK-36749) The count result of the dimension table field changes as `executor.memory` changes.

LanYang created SPARK-36749:
-------------------------------

             Summary: The count result of the dimension table field changes as `executor.memory` changes.
                 Key: SPARK-36749
                 URL: https://issues.apache.org/jira/browse/SPARK-36749
             Project: Spark
          Issue Type: Bug
          Components: Input/Output
    Affects Versions: 2.1.3
         Environment: hadoop version: 2.7.5
                      spark version: 2.1.3

*job default parameters:*

spark.driver.cores=1

spark.driver.memory=512m

spark.executor.instances=1

spark.executor.cores=1

spark.executor.memory=512m
            Reporter: LanYang
         Attachments: corrent_result.log, wrong_result.log

Hi, everyone!

Here's a very strange question!

This SQL counts the number of distinct values of the specified columns after joining the tables, as follows:

 
{quote}{{SELECT cast(COUNT(DISTINCT tps.prod_siginst_id) AS STRING) AS siginst_cnt,}}
{{ cast(COUNT(DISTINCT qpl.list_id) AS STRING) AS list_cnt,}}
{{ cast(count(DISTINCT if(tb.brand_source=1,tps.prod_siginst_id,NULL)) AS STRING) AS domestic_siginst_cnt,}}
{{ cast(count(DISTINCT if(tb.brand_source=2,tps.prod_siginst_id,NULL)) AS STRING) AS import_siginst_cnt,}}
{{ cast(count(DISTINCT if(qpl.list_name NOT LIKE '%un_normal%',tps.prod_siginst_id,NULL)) AS STRING) AS standard_cnt,}}
{{ cast(count(DISTINCT if(qpl.list_name LIKE '%un_normal%',tps.prod_siginst_id,NULL)) AS STRING) AS nostandard_cnt}}
{{FROM tableA tbi}}
{{LEFT JOIN tableB tps ON tbi.prod_inst_id=tps.prod_inst_id}}
{{LEFT JOIN tableC qpl ON tbi.prod_type_id=qpl.list_id}}
{{LEFT JOIN tableD tb ON tps.brand_id=tb.brand_id}}
{{WHERE tbi.prod_status=1}}
{{ AND tbi.prod_sell_status=1}}
{{ AND tb.recommend_flag=1;}}
{quote}
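Given a fixed input, the joins and COUNT(DISTINCT ...) aggregates above are fully deterministic. A minimal plain-Python sketch of the same semantics on tiny made-up rows (all table contents below are hypothetical; only the join/filter/distinct-count logic mirrors the SQL) illustrates what any engine must return regardless of memory:

```python
# Plain-Python mock of the query's semantics on tiny, made-up rows.
# All table contents below are hypothetical; only the join / filter /
# COUNT(DISTINCT ...) logic mirrors the SQL above.

# tableA tbi: (prod_inst_id, prod_type_id, prod_status, prod_sell_status)
tbi = [(1, 10, 1, 1), (2, 10, 1, 1), (3, 20, 1, 0)]
# tableB tps: (prod_inst_id, prod_siginst_id, brand_id)
tps = [(1, 100, 7), (2, 200, 8)]
# tableC qpl: (list_id, list_name)
qpl = [(10, "normal_list"), (20, "un_normal_list")]
# tableD tb: (brand_id, brand_source, recommend_flag)
tb = [(7, 1, 1), (8, 2, 1)]

# Join keys are unique in the joined tables here, so dict lookups
# model the LEFT JOINs.
tps_by_inst = {r[0]: r for r in tps}
qpl_by_id = {r[0]: r for r in qpl}
tb_by_brand = {r[0]: r for r in tb}

siginst, lists, domestic, imported, standard, nostandard = (set() for _ in range(6))
for prod_inst_id, prod_type_id, prod_status, prod_sell_status in tbi:
    if prod_status != 1 or prod_sell_status != 1:   # WHERE filters on tbi
        continue
    t = tps_by_inst.get(prod_inst_id)               # LEFT JOIN tableB
    q = qpl_by_id.get(prod_type_id)                 # LEFT JOIN tableC
    b = tb_by_brand.get(t[2]) if t else None        # LEFT JOIN tableD
    if b is None or b[2] != 1:                      # WHERE tb.recommend_flag = 1
        continue                                    # (makes that join effectively inner)
    sig = t[1] if t else None
    if sig is not None:
        siginst.add(sig)                            # COUNT(DISTINCT tps.prod_siginst_id)
        if b[1] == 1:
            domestic.add(sig)                       # brand_source = 1
        if b[1] == 2:
            imported.add(sig)                       # brand_source = 2
        # NOTE: "x in s" only approximates SQL LIKE ('_' is a single-char
        # wildcard in a LIKE pattern).
        if q is not None and "un_normal" not in q[1]:
            standard.add(sig)
        if q is not None and "un_normal" in q[1]:
            nostandard.add(sig)
    if q is not None:
        lists.add(q[0])                             # COUNT(DISTINCT qpl.list_id)

counts = (len(siginst), len(lists), len(domestic),
          len(imported), len(standard), len(nostandard))
print(counts)  # -> (2, 1, 1, 1, 2, 0) for these sample rows
```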
 

And the phenomenon is: if I increase the executor's memory, the count results for the tableC fields (list_id, list_name) change as well. Once the executor's memory is large enough, the results stop changing.
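A sketch of how to reproduce the comparison (assuming the query is saved as `query.sql`; the file name and the 2g memory value are illustrative, not from the original report):

```shell
# Run the same query twice, changing only executor memory.
spark-sql --master yarn \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --executor-memory 512m \
  -f query.sql > result_512m.txt

spark-sql --master yarn \
  --conf spark.executor.instances=1 \
  --conf spark.executor.cores=1 \
  --executor-memory 2g \
  -f query.sql > result_2g.txt

# A non-empty diff reproduces the issue.
diff result_512m.txt result_2g.txt
```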

 

TableC is a dimension table and its amount of data is fixed.

 

In my opinion, this job should fail rather than output an incorrect count result if the executor has insufficient memory.

Could you please help me check whether this is a bug in Spark itself or something wrong with my SQL?

 

Here are the logs of this job (attached).

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
