You are viewing a plain text version of this content. The canonical link for it is here.
Posted to issues@spark.apache.org by "t oo (JIRA)" <ji...@apache.org> on 2019/02/12 22:10:00 UTC

[jira] [Commented] (SPARK-26777) SQL worked in 2.3.2 and fails in 2.4.0

    [ https://issues.apache.org/jira/browse/SPARK-26777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16766526#comment-16766526 ] 

t oo commented on SPARK-26777:
------------------------------

[~yuri.budilov] - can you try in spark-shell instead of pyspark? Also with plain spark instead of EMR spark?

> SQL worked in 2.3.2 and fails in 2.4.0
> --------------------------------------
>
>                 Key: SPARK-26777
>                 URL: https://issues.apache.org/jira/browse/SPARK-26777
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core
>    Affects Versions: 2.4.0
>            Reporter: Yuri Budilov
>            Priority: Major
>
> Following SQL worked in Spark 2.3.2 and now fails on 2.4.0 (AWS EMR Spark)
> ------------ PySpark call below:
> spark.sql("select partition_year_utc,partition_month_utc,partition_day_utc \
> from datalake_reporting.copy_of_leads_notification \
> where partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification) \
> and partition_month_utc = \
>  (select max(partition_month_utc) from datalake_reporting.copy_of_leads_notification as m \
>  where \
>  m.partition_year_utc = (select max(partition_year_utc) from datalake_reporting.copy_of_leads_notification)) \
>  and partition_day_utc = (select max(d.partition_day_utc) from datalake_reporting.copy_of_leads_notification as d \
>  where d.partition_month_utc = \
>  (select max(m1.partition_month_utc) from datalake_reporting.copy_of_leads_notification as m1 \
>  where m1.partition_year_utc = \
>  (select max(y.partition_year_utc) from datalake_reporting.copy_of_leads_notification as y) \
>  ) \
>  ) \
>  order by 1 desc, 2 desc, 3 desc limit 1 ").show(1,False)
> Error: (no need for data, this is syntax).
> py4j.protocol.Py4JJavaError: An error occurred while calling o1326.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: scalar-subquery#4495 []
>  
> Note: all 3 columns in query are Partitioned columns - see bottom of the schema)
>  
> Hive EMR AWS Schema is:
>  
> CREATE EXTERNAL TABLE `copy_of_leads_notification`(
> `message.environment.siteorigin` string, `dcpheader.dcploaddateutc` string, `message.id` int, `source.properties._country` string, `message.created` string, `dcpheader.generatedmessageid` string, `message.tags` bigint, `source.properties._enqueuedtimeutc` string, `source.properties._leadtype` string, `message.itemid` string, `message.prospect.postcode` string, `message.prospect.email` string, `message.referenceid` string, `message.item.year` string, `message.identifier` string, `dcpheader.dcploadmonthutc` string, `message.processed` string, `source.properties._tenant` string, `message.item.price` string, `message.subscription.confirmresponse` boolean, `message.itemtype` string, `message.prospect.lastname` string, `message.subscription.insurancequote` boolean, `source.exchangename` string, `message.prospect.identificationnumbers` bigint, `message.environment.ipaddress` string, `dcpheader.dcploaddayutc` string, `source.properties._itemtype` string, `source.properties._requesttype` string, `message.item.make` string, `message.prospect.firstname` string, `message.subscription.survey` boolean, `message.prospect.homephone` string, `message.extendedproperties` bigint, `message.subscription.financequote` boolean, `message.uniqueidentifier` string, `source.properties._id` string, `dcpheader.sourcemessageguid` string, `message.requesttype` string, `source.routingkey` string, `message.service` string, `message.item.model` string, `message.environment.pagesource` string, `source.source` string, `message.sellerid` string, `partition_date_utc` string, `message.selleridentifier` string, `message.subscription.newsletter` boolean, `dcpheader.dcploadyearutc` string, `message.leadtype` string, `message.history` bigint, `message.callconnect.calloutcome` string, `message.callconnect.datecreatedutc` string, `message.callconnect.callrecordingurl` string, `message.callconnect.transferoutcome` string, `message.callconnect.hiderecording` boolean, `message.callconnect.callstartutc` string, `message.callconnect.code` string, `message.callconnect.callduration` string, `message.fraudnetinfo` string, `message.callconnect.answernumber` string, `message.environment.sourcedevice` string, `message.comments` string, `message.fraudinfo.servervariables` bigint, `message.callconnect.servicenumber` string, `message.callconnect.callid` string, `message.callconnect.voicemailurl` string, `message.item.stocknumber` string, `message.callconnect.answerduration` string, `message.callconnect.callendutc` string, `message.item.series` string, `message.item.detailsurl` string, `message.item.pricetype` string, `message.item.description` string, `message.item.colour` string, `message.item.badge` string, `message.item.odometer` string, `message.environment.requestheader` string, `message.item.registrationnumber` string, `message.item.bodytype` string, `message.item.fueltype` string, `message.item.redbookcode` string, `message.item.spotid` string, `message.item.id` string, `message.item.transmission` string, `message.item.vin` string, `message.item.enginedescription` string, `message.prospect.mobilephone` string, `message.prospect.membertrackingid` string, `message.environment.username` string, `message.prospect.workphone` string, `message.environment.servername` string, `message.environment.sessionid` string, `message.tradein.type` string, `message.tradein.model` string, `message.tradein.year` string, `message.tradein.make` string, `message.tradein.kms` string, `message.fraudinfo.servertimestamp` string, `message.prospect.suburb` string, `message.callconnect.username` string, `message.callconnect.password` string, `message.status` string, `message.tradein.colour` string, `message.prospect.address` string, `message.prospect.state` string, `message.tradein.detailsurl` string, `message.prospect.faxnumber` string, `message.prospect.companyname` string, `message.prospect.title` string, `message.callconnect.calloutcometext` string, `message.prospect.preferredcontacttime` string, `message.fraudinfo.devicedata` string, `message.prospect.preferredcontactmethod` string, `message.assignment.assigned` string, `message.assignment.email` string, `message.assignment.name` string, `message.country` string, `message.financepackage.id` string, `message.financepackage.version` string)
> PARTITIONED BY ( `partition_year_utc` string, `partition_month_utc` string, `partition_day_utc` string, `job_run_guid` string)
> ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS
> INPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
> OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
> LOCATION 's3://datalake/yurib_test/leads_notification'
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscribe@spark.apache.org
For additional commands, e-mail: issues-help@spark.apache.org