Posted to commits@datafu.apache.org by ey...@apache.org on 2021/10/03 19:09:50 UTC

[datafu] branch master updated: DATAFU-158 Document explodeArray function

This is an automated email from the ASF dual-hosted git repository.

eyal pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/datafu.git


The following commit(s) were added to refs/heads/master by this push:
     new af79454  DATAFU-158 Document explodeArray function
af79454 is described below

commit af79454721bab3c0b45a163ccceadc5579161d2a
Author: efrotenberg <ef...@paypal.com>
AuthorDate: Sun Oct 3 12:07:02 2021 +0300

    DATAFU-158 Document explodeArray function
    
    Signed-off-by: Eyal Allweil <ea...@paypal.com>
---
 datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala b/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala
index 7853e16..b459da4 100644
--- a/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala
+++ b/datafu-spark/src/main/scala/datafu/spark/SparkDFUtils.scala
@@ -508,7 +508,10 @@ object SparkDFUtils {
                    "range_size")
   }
 
-/** given an array column that you need to explode into different columns, use this method.
+/**
+   * Given an array column that you need to explode into different columns, use this method.
+   * This function counts the number of output columns by executing the Spark job internally on the input array column.
+   * Consider caching the input dataframe if this is an expensive operation.
    *
    * @param df
    * @param arrayCol
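The Scaladoc added by this commit notes that explodeArray determines the number of output columns by running a Spark job over the input array column, which is why it suggests caching the dataframe. A minimal language-agnostic sketch of that two-pass idea (a hypothetical stand-in, not the DataFu Spark implementation or its API):

```python
# Sketch of the idea behind exploding an array column into separate columns.
# Pass 1 scans the data for the maximum array length -- in Spark this is a
# full job over the input column, hence the advice to cache the dataframe.
# Pass 2 widens each row to that many columns, padding shorter arrays.

def explode_array(rows):
    width = max((len(r) for r in rows), default=0)  # the counting pass
    return [[r[i] if i < len(r) else None for i in range(width)]
            for r in rows]

rows = [["a", "b", "c"], ["d"]]
print(explode_array(rows))  # every row widened to 3 columns
```

In the real function the counting pass is what makes the operation potentially expensive on a large uncached dataframe, since the data is traversed once just to size the schema and again to produce the output.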