Posted to user@spark.apache.org by Prasad Ravilla <pr...@slalom.com> on 2016/01/06 02:10:01 UTC
DataFrame withColumnRenamed throwing NullPointerException
I am joining two data frames as shown in the code below, and the join throws a NullPointerException.
I have a number of different joins throughout the program, and this NullPointerException is thrown on one of them, seemingly at random.
The two data frames are very large (around 1 TB).
I am using Spark version 1.5.2.
Thanks in advance for any insights.
Regards,
Prasad.
Below is the code.
val userAndFmSegment = userData.as("userdata").join(
    fmSegmentData.withColumnRenamed("USER_ID", "FM_USER_ID").as("fmsegmentdata"),
    $"userdata.PRIMARY_USER_ID" === $"fmsegmentdata.FM_USER_ID"
      && $"fmsegmentdata.END_DATE" >= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7)
      && $"fmsegmentdata.START_DATE" <= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7),
    "inner"
  ).select(
    "USER_ID",
    "PRIMARY_USER_ID",
    "FM_BUYER_TYPE_CD"
  )
Log
16/01/05 17:41:19 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
at org.apache.spark.sql.DataFrame.withColumnRenamed(DataFrame.scala:1161)
at DnaAgg$.getUserIdAndFMSegmentId$1(DnaAgg.scala:294)
at DnaAgg$.main(DnaAgg.scala:339)
at DnaAgg.main(DnaAgg.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)
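
The top frame of the trace is DataFrame.withColumnRenamed itself, so the exception seems to come from the rename rather than from the join condition. As a rough sketch only (not a fix), the rename could be pulled into its own statement so that a failure there gets its own line number, separate from the join. The object name JoinSketch, the function name joinUserAndFmSegment, and the Int type of trailingWeeks are assumptions made for illustration; col(...) is used in place of $"..." only so the sketch does not depend on sqlContext.implicits.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, date_sub}

// Hypothetical wrapper and helper names; not part of the original DnaAgg program.
object JoinSketch {

  // trailingWeeks is assumed to be an Int, as date_sub(Column, Int) requires.
  def joinUserAndFmSegment(userData: DataFrame,
                           fmSegmentData: DataFrame,
                           trailingWeeks: Int): DataFrame = {
    // Rename in a separate statement: if the NullPointerException comes from
    // withColumnRenamed, the stack trace will point at this line alone.
    val fmRenamed = fmSegmentData.withColumnRenamed("USER_ID", "FM_USER_ID")

    val users = userData.as("userdata")
    val segments = fmRenamed.as("fmsegmentdata")

    // Same cutoff expression as in the original join, built once and reused.
    val cutoff = date_sub(col("userdata.REPORT_DATE"), trailingWeeks * 7)

    users
      .join(
        segments,
        col("userdata.PRIMARY_USER_ID") === col("fmsegmentdata.FM_USER_ID") &&
          col("fmsegmentdata.END_DATE") >= cutoff &&
          col("fmsegmentdata.START_DATE") <= cutoff,
        "inner")
      .select("USER_ID", "PRIMARY_USER_ID", "FM_BUYER_TYPE_CD")
  }
}
```

This builds the same join as the code above; the only difference is that the rename, the aliases, and the join are separate statements, which should make it easier to see which step the exception belongs to.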