Posted to dev@datafu.apache.org by "Eyal Allweil (Jira)" <ji...@apache.org> on 2022/10/31 11:50:00 UTC

[jira] [Updated] (DATAFU-168) Support Spark 2.4.6 and up - fix collectLimitedList compilation

     [ https://issues.apache.org/jira/browse/DATAFU-168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Eyal Allweil updated DATAFU-168:
--------------------------------
    Summary: Support Spark 2.4.6 and up - fix collectLimitedList compilation  (was: Add support for Spark 2.4.6 and up)

> Support Spark 2.4.6 and up - fix collectLimitedList compilation
> ---------------------------------------------------------------
>
>                 Key: DATAFU-168
>                 URL: https://issues.apache.org/jira/browse/DATAFU-168
>             Project: DataFu
>          Issue Type: Improvement
>    Affects Versions: 1.6.1
>            Reporter: Eyal Allweil
>            Priority: Major
>             Fix For: 1.8.0
>
>
> Once DATAFU-167 is merged, datafu-spark will support Spark versions up to 2.4.5. However, our implementation of _collectLimitedList_ extends Spark's _Collect_ class, whose interface changed in 2.4.6, so compilation fails against Spark 2.4.6 and later.
>  
> Here is the relevant line from collectLimitedList: [https://github.com/apache/datafu/blob/master/datafu-spark/src/main/scala/spark/utils/overwrites/SparkOverwriteUDAFs.scala#L104]
> Here is the compilation error:
> {code:java}
> /Users/eyal/git/datafu/datafu-spark/src/main/scala/spark/utils/overwrites/SparkOverwriteUDAFs.scala:104: class CollectLimitedList needs to be abstract, since:
> it has 3 unimplemented members.
> /** As seen from class CollectLimitedList, the missing signatures are as follows.
>  *  For convenience, these are usable as stub implementations.
>  */
>   // Members declared in org.apache.spark.sql.catalyst.expressions.aggregate.Collect
>   protected val bufferElementType: org.apache.spark.sql.types.DataType = ???
>   protected def convertToBufferElement(value: Any): Any = ???
>   // Members declared in org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate
>   def eval(buffer: scala.collection.mutable.ArrayBuffer[Any]): Any = ???
> case class CollectLimitedList(child: Expression,
>            ^
> one error found
> FAILURE: Build failed with an exception.
> {code}
>  
>  
> We need to either *1)* update our implementation and drop support for older Spark versions (releasing the change in our version 1.8.0), or *2)* copy the code in a backwards-compatible way.
> Please note that you can reproduce this compilation error on the master branch, even without merging DATAFU-167, by running:
> {code:java}
> ./gradlew :datafu-spark:test -PscalaVersion=2.11 -PsparkVersion=2.4.6 --tests "DataFrame*"{code}
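
To make option 1 concrete, here is a minimal sketch of what implementing the three missing members could look like. It does not use Spark: CollectStandIn is a hypothetical stand-in that models only the members named in the compiler output, bufferElementType is a String here rather than Spark's DataType, and CollectLimitedListSketch, its limit parameter and the update method are illustrative assumptions, not the actual DataFu code.

```scala
import scala.collection.mutable

// Hypothetical stand-in for Spark's Collect as of 2.4.6. NOT the real class:
// only the three members listed in the compiler output are modelled.
abstract class CollectStandIn {
  // In Spark this is org.apache.spark.sql.types.DataType; a String keeps the sketch self-contained.
  protected val bufferElementType: String
  protected def convertToBufferElement(value: Any): Any
  def eval(buffer: mutable.ArrayBuffer[Any]): Any

  // Simplified update step (assumption): skip nulls, convert, append.
  def update(buffer: mutable.ArrayBuffer[Any], value: Any): mutable.ArrayBuffer[Any] = {
    if (value != null) buffer += convertToBufferElement(value)
    buffer
  }
}

// Illustrative subclass: supplies the three members and caps the buffer at `limit` elements.
final case class CollectLimitedListSketch(elementType: String, limit: Int = 10)
    extends CollectStandIn {

  // A lazy val avoids initialization-order surprises when implementing an abstract val.
  override protected lazy val bufferElementType: String = elementType

  // Identity conversion; a real implementation may need to copy mutable values.
  override protected def convertToBufferElement(value: Any): Any = value

  // Stop collecting once the limit is reached.
  override def update(buffer: mutable.ArrayBuffer[Any], value: Any): mutable.ArrayBuffer[Any] =
    if (buffer.size < limit) super.update(buffer, value) else buffer

  // Return the collected elements as an array.
  override def eval(buffer: mutable.ArrayBuffer[Any]): Any = buffer.toArray
}
```

Once all three members are supplied, the subclass no longer "needs to be abstract". Against the real Spark classes the signatures would be the ones shown in the compiler output above, and older Spark versions that lack these members would stop compiling, which is the trade-off described in option 1.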


