You are viewing a plain text version of this content. The canonical link for it is here.
Posted to user@spark.apache.org by nirav nishith <ni...@gmail.com> on 2018/04/12 06:44:25 UTC

Fwd: pyspark:APID iS coming as null

insert_push_body_df = spark.sql('''select CASE WHEN t2.offer_id='' then
NULL else t2.offer_id end as offer_id,\
CASE WHEN t2.content_set_id='' then NULL else t2.content_set_id end as
content_set_id,\
CASE WHEN t2.post_id='' then NULL else t2.post_id end as
post_id,t2.nuid,t2.apid,\
t2.push_id,t2.event_id as id1,t2.offset,t2.event_occurre
d,min(t2.event_processed),\
t2.payload,t2.group_id,t2.trimmed,t2.event_type,from_unixtime(unix_timestamp(),'yyyy-MM-dd
hh:mm:ss') as load_date,\
t2.app_id as app_id,CAST(REGEXP_REPLACE(SUBSTR(t2.event_occurred, 1, 10),
'-', '') AS INT) AS occurred_part FROM \
(SELECT regexp_extract(t1.pl_unbase,'"offer_id":"([^"]*)"',1) as offer_id,\
regexp_extract(t1.pl_unbase,'"content_set_id":"([^"]*)"',1) as
content_set_id,\
regexp_extract(t1.pl_unbase,'"post_id":"([^"]*)"',1) as post_id,\
regexp_replace(COALESCE(get_json_object(t1.pl_unbase, '$.audience.alias'),\
get_json_object(t1.pl_unbase,'$.audience.or.alias')),'\\[|\\"|\\]','') as
nuid,\
regexp_replace(COALESCE(get_json_object(t1.pl_unbase, '$.audience.apid'),\
get_json_object(t1.pl_unbase, '$.audience.or.apid') ), '\\[|\\"|\\]','') as
apid,\
t1.push_id  as push_id,t1.event_id  as event_id,t1.offset as offset,\
t1.occurred as event_occurred,t1.processed as event_processed,t1.pl_unbase
as payload,\
t1.group_id as group_id,t1.trimmed as trimmed,\
t1.event_type as event_type,t1.app_id as app_id FROM (select
cast(unbase64(body.payload) as string) as pl_unbase,\
body.push_id as push_id,id as event_id,offset as offset,occurred as
occurred,\
processed as processed,body.group_id as group_id,body.trimmed as trimmed,\
type as event_type,app_id as app_id from dev_stg_urban_airship.push_body_stage)
t1 ) t2 GROUP BY t2.offer_id,\
t2.content_set_id,t2.post_id,t2.nuid,t2.apid,t2.push_id,t2.e
vent_id,t2.offset,t2.event_occurred,t2.payload,t2.group_id,
t2.trimmed,t2.event_type,t2.app_id''')



In the above code when running as HQL APID value is getting populated ,but
when using as sprak_sql its giving NULL.
ITS line 12
regexp_replace(COALESCE(get_json_object(t1.pl_unbase, '$.audience.apid'),\
get_json_object(t1.pl_unbase, '$.audience.or.apid') ), '\\[|\\"|\\]','') as
apid,\

HOW does get_json_object works in spark

Regards
Nirav