Posted to user@spark.apache.org by Prasad Ravilla <pr...@slalom.com> on 2016/01/06 02:10:01 UTC

DataFrame withColumnRenamed throwing NullPointerException

I am joining two data frames as shown in the code below, and the join throws a NullPointerException.

I have a number of different joins throughout the program, and this NullPointerException is thrown at random on one of them.
Both data frames are very large (around 1 TB).

I am using Spark version 1.5.2.

Thanks in advance for any insights.

Regards,
Prasad.


Below is the code.

val userAndFmSegment = userData.as("userdata")
    .join(
        fmSegmentData.withColumnRenamed("USER_ID", "FM_USER_ID").as("fmsegmentdata"),
        $"userdata.PRIMARY_USER_ID" === $"fmsegmentdata.FM_USER_ID"
            && $"fmsegmentdata.END_DATE" >= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7)
            && $"fmsegmentdata.START_DATE" <= date_sub($"userdata.REPORT_DATE", trailingWeeks * 7),
        "inner")
    .select(
        "USER_ID",
        "PRIMARY_USER_ID",
        "FM_BUYER_TYPE_CD")

Log


16/01/05 17:41:19 ERROR ApplicationMaster: User class threw exception: java.lang.NullPointerException
java.lang.NullPointerException
    at org.apache.spark.sql.DataFrame.withColumnRenamed(DataFrame.scala:1161)
    at DnaAgg$.getUserIdAndFMSegmentId$1(DnaAgg.scala:294)
    at DnaAgg$.main(DnaAgg.scala:339)
    at DnaAgg.main(DnaAgg.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:525)