Posted to reviews@spark.apache.org by NewBoLing <gi...@git.apache.org> on 2017/05/20 11:13:03 UTC

[GitHub] spark pull request #18044: Branch 2.2

GitHub user NewBoLing opened a pull request:

    https://github.com/apache/spark/pull/18044

    Branch 2.2

    ## What changes were proposed in this pull request?
    
    (Please fill in changes proposed in this fix)
    
    ## How was this patch tested?
    
    (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests)
    (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-2.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/18044.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #18044
    
----
commit ecf5605a104be67b29d29c00dc98ddab7975c9c1
Author: 郭小龙 10207633 <gu...@zte.com.cn>
Date:   2017-04-18T17:02:21Z

    [SPARK-20354][CORE][REST-API] When requesting 'http://ip:port/api/v1/applications', the returned 'sparkUser' is empty in the REST API.
    
    ## What changes were proposed in this pull request?
    
    When I request 'http://ip:port/api/v1/applications' I get the JSON shown below. I need the value of the 'sparkUser' field, because my Spark big data management platform filters on it to determine which user submitted each application, for administration and queries. Currently the field is returned empty, so this is not possible; in other words, I cannot tell from this REST API who submitted a given application.
    
    **Current JSON returned:**
    [ {
      "id" : "app-20170417152053-0000",
      "name" : "KafkaWordCount",
      "attempts" : [ {
        "startTime" : "2017-04-17T07:20:51.395GMT",
        "endTime" : "1969-12-31T23:59:59.999GMT",
        "lastUpdated" : "2017-04-17T07:20:51.395GMT",
        "duration" : 0,
        **"sparkUser" : "",**
        "completed" : false,
        "endTimeEpoch" : -1,
        "startTimeEpoch" : 1492413651395,
        "lastUpdatedEpoch" : 1492413651395
      } ]
    } ]
    
    **JSON returned after this fix:**
    [ {
      "id" : "app-20170417154201-0000",
      "name" : "KafkaWordCount",
      "attempts" : [ {
        "startTime" : "2017-04-17T07:41:57.335GMT",
        "endTime" : "1969-12-31T23:59:59.999GMT",
        "lastUpdated" : "2017-04-17T07:41:57.335GMT",
        "duration" : 0,
        **"sparkUser" : "mr",**
        "completed" : false,
        "startTimeEpoch" : 1492414917335,
        "endTimeEpoch" : -1,
        "lastUpdatedEpoch" : 1492414917335
      } ]
    } ]
    
    ## How was this patch tested?
    
    manual tests
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: 郭小龙 10207633 <gu...@zte.com.cn>
    Author: guoxiaolong <gu...@zte.com.cn>
    Author: guoxiaolongzte <gu...@zte.com.cn>
    
    Closes #17656 from guoxiaolongzte/SPARK-20354.
    
    (cherry picked from commit 1f81dda37cfc2049fabd6abd93ef3720d0aa03ea)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 7dbc0a9101954f1d514b97143b24fbe1e439181b
Author: Kyle Kelley <rg...@gmail.com>
Date:   2017-04-18T19:35:27Z

    [SPARK-20360][PYTHON] reprs for interpreters
    
    ## What changes were proposed in this pull request?
    
    Establishes a very minimal `_repr_html_` for PySpark's `SparkContext`.
    
    ## How was this patch tested?
    
    nteract:
    
    ![screen shot 2017-04-17 at 3 41 29 pm](https://cloud.githubusercontent.com/assets/836375/25107701/d57090ba-2385-11e7-8147-74bc2c50a41b.png)
    
    Jupyter:
    
    ![screen shot 2017-04-17 at 3 53 19 pm](https://cloud.githubusercontent.com/assets/836375/25107725/05bf1fe8-2386-11e7-93e1-07a20c917dde.png)
    
    Hydrogen:
    
    ![screen shot 2017-04-17 at 3 49 55 pm](https://cloud.githubusercontent.com/assets/836375/25107664/a75e1ddc-2385-11e7-8477-258661833007.png)
    
    Author: Kyle Kelley <rg...@gmail.com>
    
    Closes #17662 from rgbkrk/repr.
    
    (cherry picked from commit f654b39a63d4f9b118733733c7ed2a1b58649e3d)
    Signed-off-by: Holden Karau <ho...@us.ibm.com>

commit 6a25d391f81eed5f23e105c2db427ae8bb032752
Author: Tathagata Das <ta...@gmail.com>
Date:   2017-04-18T23:10:40Z

    [SPARK-20377][SS] Fix JavaStructuredSessionization example
    
    ## What changes were proposed in this pull request?
    
    Extra accessors in a Java bean class cause incorrect encoder generation, which corrupted the state when using timeouts.
    
    ## How was this patch tested?
    manually ran the example
    
    Author: Tathagata Das <ta...@gmail.com>
    
    Closes #17676 from tdas/SPARK-20377.
    
    (cherry picked from commit 74aa0df8f7f132b62754e5159262e4a5b9b641ab)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit a33d448058ae6608d1031c4c34334778b3c39675
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-04-19T02:58:05Z

    [SPARK-20254][SQL] Remove unnecessary data conversion for Dataset with primitive array
    
    ## What changes were proposed in this pull request?
    
    This PR eliminates unnecessary data conversion, introduced by SPARK-19716, for a Dataset with a primitive array in the generated Java code.
    When we run the following example program, we currently get the Java code shown under "Without this PR". In that code, lines 56-82 are unnecessary, since the primitive array in ArrayData can be converted into a Java primitive array with the ``toDoubleArray()`` method; ``GenericArrayData`` is not required.
    
    ```scala
    val ds = sparkContext.parallelize(Seq(Array(1.1, 2.2)), 1).toDS.cache
    ds.count
    ds.map(e => e).show
    ```
    
    Without this PR
    ```
    == Parsed Logical Plan ==
    'SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#25]
    +- 'MapElements <function1>, class [D, [StructField(value,ArrayType(DoubleType,false),true)], obj#24: [D
       +- 'DeserializeToObject unresolveddeserializer(unresolvedmapobjects(<function1>, getcolumnbyordinal(0, ArrayType(DoubleType,false)), None).toDoubleArray), obj#23: [D
          +- SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#2]
             +- ExternalRDD [obj#1]
    
    == Analyzed Logical Plan ==
    value: array<double>
    SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#25]
    +- MapElements <function1>, class [D, [StructField(value,ArrayType(DoubleType,false),true)], obj#24: [D
       +- DeserializeToObject mapobjects(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, assertnotnull(lambdavariable(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, true), - array element class: "scala.Double", - root class: "scala.Array"), value#2, None, MapObjects_builderValue5).toDoubleArray, obj#23: [D
          +- SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#2]
             +- ExternalRDD [obj#1]
    
    == Optimized Logical Plan ==
    SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#25]
    +- MapElements <function1>, class [D, [StructField(value,ArrayType(DoubleType,false),true)], obj#24: [D
       +- DeserializeToObject mapobjects(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, assertnotnull(lambdavariable(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, true), - array element class: "scala.Double", - root class: "scala.Array"), value#2, None, MapObjects_builderValue5).toDoubleArray, obj#23: [D
          +- InMemoryRelation [value#2], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                +- *SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#2]
                   +- Scan ExternalRDDScan[obj#1]
    
    == Physical Plan ==
    *SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#25]
    +- *MapElements <function1>, obj#24: [D
       +- *DeserializeToObject mapobjects(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, assertnotnull(lambdavariable(MapObjects_loopValue5, MapObjects_loopIsNull5, DoubleType, true), - array element class: "scala.Double", - root class: "scala.Array"), value#2, None, MapObjects_builderValue5).toDoubleArray, obj#23: [D
          +- InMemoryTableScan [value#2]
                +- InMemoryRelation [value#2], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
                      +- *SerializeFromObject [staticinvoke(class org.apache.spark.sql.catalyst.expressions.UnsafeArrayData, ArrayType(DoubleType,false), fromPrimitiveArray, input[0, [D, true], true) AS value#2]
                         +- Scan ExternalRDDScan[obj#1]
    ```
    
    ```java
    /* 050 */   protected void processNext() throws java.io.IOException {
    /* 051 */     while (inputadapter_input.hasNext() && !stopEarly()) {
    /* 052 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 053 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
    /* 054 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
    /* 055 */
    /* 056 */       ArrayData deserializetoobject_value1 = null;
    /* 057 */
    /* 058 */       if (!inputadapter_isNull) {
    /* 059 */         int deserializetoobject_dataLength = inputadapter_value.numElements();
    /* 060 */
    /* 061 */         Double[] deserializetoobject_convertedArray = null;
    /* 062 */         deserializetoobject_convertedArray = new Double[deserializetoobject_dataLength];
    /* 063 */
    /* 064 */         int deserializetoobject_loopIndex = 0;
    /* 065 */         while (deserializetoobject_loopIndex < deserializetoobject_dataLength) {
    /* 066 */           MapObjects_loopValue2 = (double) (inputadapter_value.getDouble(deserializetoobject_loopIndex));
    /* 067 */           MapObjects_loopIsNull2 = inputadapter_value.isNullAt(deserializetoobject_loopIndex);
    /* 068 */
    /* 069 */           if (MapObjects_loopIsNull2) {
    /* 070 */             throw new RuntimeException(((java.lang.String) references[0]));
    /* 071 */           }
    /* 072 */           if (false) {
    /* 073 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = null;
    /* 074 */           } else {
    /* 075 */             deserializetoobject_convertedArray[deserializetoobject_loopIndex] = MapObjects_loopValue2;
    /* 076 */           }
    /* 077 */
    /* 078 */           deserializetoobject_loopIndex += 1;
    /* 079 */         }
    /* 080 */
    /* 081 */         deserializetoobject_value1 = new org.apache.spark.sql.catalyst.util.GenericArrayData(deserializetoobject_convertedArray); /*###*/
    /* 082 */       }
    /* 083 */       boolean deserializetoobject_isNull = true;
    /* 084 */       double[] deserializetoobject_value = null;
    /* 085 */       if (!inputadapter_isNull) {
    /* 086 */         deserializetoobject_isNull = false;
    /* 087 */         if (!deserializetoobject_isNull) {
    /* 088 */           Object deserializetoobject_funcResult = null;
    /* 089 */           deserializetoobject_funcResult = deserializetoobject_value1.toDoubleArray();
    /* 090 */           if (deserializetoobject_funcResult == null) {
    /* 091 */             deserializetoobject_isNull = true;
    /* 092 */           } else {
    /* 093 */             deserializetoobject_value = (double[]) deserializetoobject_funcResult;
    /* 094 */           }
    /* 095 */
    /* 096 */         }
    /* 097 */         deserializetoobject_isNull = deserializetoobject_value == null;
    /* 098 */       }
    /* 099 */
    /* 100 */       boolean mapelements_isNull = true;
    /* 101 */       double[] mapelements_value = null;
    /* 102 */       if (!false) {
    /* 103 */         mapelements_resultIsNull = false;
    /* 104 */
    /* 105 */         if (!mapelements_resultIsNull) {
    /* 106 */           mapelements_resultIsNull = deserializetoobject_isNull;
    /* 107 */           mapelements_argValue = deserializetoobject_value;
    /* 108 */         }
    /* 109 */
    /* 110 */         mapelements_isNull = mapelements_resultIsNull;
    /* 111 */         if (!mapelements_isNull) {
    /* 112 */           Object mapelements_funcResult = null;
    /* 113 */           mapelements_funcResult = ((scala.Function1) references[1]).apply(mapelements_argValue);
    /* 114 */           if (mapelements_funcResult == null) {
    /* 115 */             mapelements_isNull = true;
    /* 116 */           } else {
    /* 117 */             mapelements_value = (double[]) mapelements_funcResult;
    /* 118 */           }
    /* 119 */
    /* 120 */         }
    /* 121 */         mapelements_isNull = mapelements_value == null;
    /* 122 */       }
    /* 123 */
    /* 124 */       serializefromobject_resultIsNull = false;
    /* 125 */
    /* 126 */       if (!serializefromobject_resultIsNull) {
    /* 127 */         serializefromobject_resultIsNull = mapelements_isNull;
    /* 128 */         serializefromobject_argValue = mapelements_value;
    /* 129 */       }
    /* 130 */
    /* 131 */       boolean serializefromobject_isNull = serializefromobject_resultIsNull;
    /* 132 */       final ArrayData serializefromobject_value = serializefromobject_resultIsNull ? null : org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(serializefromobject_argValue);
    /* 133 */       serializefromobject_isNull = serializefromobject_value == null;
    /* 134 */       serializefromobject_holder.reset();
    /* 135 */
    /* 136 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 137 */
    /* 138 */       if (serializefromobject_isNull) {
    /* 139 */         serializefromobject_rowWriter.setNullAt(0);
    /* 140 */       } else {
    /* 141 */         // Remember the current cursor so that we can calculate how many bytes are
    /* 142 */         // written later.
    /* 143 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
    /* 144 */
    /* 145 */         if (serializefromobject_value instanceof UnsafeArrayData) {
    /* 146 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
    /* 147 */           // grow the global buffer before writing data.
    /* 148 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
    /* 149 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
    /* 150 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
    /* 151 */
    /* 152 */         } else {
    /* 153 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
    /* 154 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 8);
    /* 155 */
    /* 156 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
    /* 157 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
    /* 158 */               serializefromobject_arrayWriter.setNullDouble(serializefromobject_index);
    /* 159 */             } else {
    /* 160 */               final double serializefromobject_element = serializefromobject_value.getDouble(serializefromobject_index);
    /* 161 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
    /* 162 */             }
    /* 163 */           }
    /* 164 */         }
    /* 165 */
    /* 166 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
    /* 167 */       }
    /* 168 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
    /* 169 */       append(serializefromobject_result);
    /* 170 */       if (shouldStop()) return;
    /* 171 */     }
    /* 172 */   }
    ```
    
    With this PR (lines 56-82 in the above code are eliminated)
    ```java
    /* 047 */   protected void processNext() throws java.io.IOException {
    /* 048 */     while (inputadapter_input.hasNext() && !stopEarly()) {
    /* 049 */       InternalRow inputadapter_row = (InternalRow) inputadapter_input.next();
    /* 050 */       boolean inputadapter_isNull = inputadapter_row.isNullAt(0);
    /* 051 */       ArrayData inputadapter_value = inputadapter_isNull ? null : (inputadapter_row.getArray(0));
    /* 052 */
    /* 053 */       boolean deserializetoobject_isNull = true;
    /* 054 */       double[] deserializetoobject_value = null;
    /* 055 */       if (!inputadapter_isNull) {
    /* 056 */         deserializetoobject_isNull = false;
    /* 057 */         if (!deserializetoobject_isNull) {
    /* 058 */           Object deserializetoobject_funcResult = null;
    /* 059 */           deserializetoobject_funcResult = inputadapter_value.toDoubleArray();
    /* 060 */           if (deserializetoobject_funcResult == null) {
    /* 061 */             deserializetoobject_isNull = true;
    /* 062 */           } else {
    /* 063 */             deserializetoobject_value = (double[]) deserializetoobject_funcResult;
    /* 064 */           }
    /* 065 */
    /* 066 */         }
    /* 067 */         deserializetoobject_isNull = deserializetoobject_value == null;
    /* 068 */       }
    /* 069 */
    /* 070 */       boolean mapelements_isNull = true;
    /* 071 */       double[] mapelements_value = null;
    /* 072 */       if (!false) {
    /* 073 */         mapelements_resultIsNull = false;
    /* 074 */
    /* 075 */         if (!mapelements_resultIsNull) {
    /* 076 */           mapelements_resultIsNull = deserializetoobject_isNull;
    /* 077 */           mapelements_argValue = deserializetoobject_value;
    /* 078 */         }
    /* 079 */
    /* 080 */         mapelements_isNull = mapelements_resultIsNull;
    /* 081 */         if (!mapelements_isNull) {
    /* 082 */           Object mapelements_funcResult = null;
    /* 083 */           mapelements_funcResult = ((scala.Function1) references[0]).apply(mapelements_argValue);
    /* 084 */           if (mapelements_funcResult == null) {
    /* 085 */             mapelements_isNull = true;
    /* 086 */           } else {
    /* 087 */             mapelements_value = (double[]) mapelements_funcResult;
    /* 088 */           }
    /* 089 */
    /* 090 */         }
    /* 091 */         mapelements_isNull = mapelements_value == null;
    /* 092 */       }
    /* 093 */
    /* 094 */       serializefromobject_resultIsNull = false;
    /* 095 */
    /* 096 */       if (!serializefromobject_resultIsNull) {
    /* 097 */         serializefromobject_resultIsNull = mapelements_isNull;
    /* 098 */         serializefromobject_argValue = mapelements_value;
    /* 099 */       }
    /* 100 */
    /* 101 */       boolean serializefromobject_isNull = serializefromobject_resultIsNull;
    /* 102 */       final ArrayData serializefromobject_value = serializefromobject_resultIsNull ? null : org.apache.spark.sql.catalyst.expressions.UnsafeArrayData.fromPrimitiveArray(serializefromobject_argValue);
    /* 103 */       serializefromobject_isNull = serializefromobject_value == null;
    /* 104 */       serializefromobject_holder.reset();
    /* 105 */
    /* 106 */       serializefromobject_rowWriter.zeroOutNullBytes();
    /* 107 */
    /* 108 */       if (serializefromobject_isNull) {
    /* 109 */         serializefromobject_rowWriter.setNullAt(0);
    /* 110 */       } else {
    /* 111 */         // Remember the current cursor so that we can calculate how many bytes are
    /* 112 */         // written later.
    /* 113 */         final int serializefromobject_tmpCursor = serializefromobject_holder.cursor;
    /* 114 */
    /* 115 */         if (serializefromobject_value instanceof UnsafeArrayData) {
    /* 116 */           final int serializefromobject_sizeInBytes = ((UnsafeArrayData) serializefromobject_value).getSizeInBytes();
    /* 117 */           // grow the global buffer before writing data.
    /* 118 */           serializefromobject_holder.grow(serializefromobject_sizeInBytes);
    /* 119 */           ((UnsafeArrayData) serializefromobject_value).writeToMemory(serializefromobject_holder.buffer, serializefromobject_holder.cursor);
    /* 120 */           serializefromobject_holder.cursor += serializefromobject_sizeInBytes;
    /* 121 */
    /* 122 */         } else {
    /* 123 */           final int serializefromobject_numElements = serializefromobject_value.numElements();
    /* 124 */           serializefromobject_arrayWriter.initialize(serializefromobject_holder, serializefromobject_numElements, 8);
    /* 125 */
    /* 126 */           for (int serializefromobject_index = 0; serializefromobject_index < serializefromobject_numElements; serializefromobject_index++) {
    /* 127 */             if (serializefromobject_value.isNullAt(serializefromobject_index)) {
    /* 128 */               serializefromobject_arrayWriter.setNullDouble(serializefromobject_index);
    /* 129 */             } else {
    /* 130 */               final double serializefromobject_element = serializefromobject_value.getDouble(serializefromobject_index);
    /* 131 */               serializefromobject_arrayWriter.write(serializefromobject_index, serializefromobject_element);
    /* 132 */             }
    /* 133 */           }
    /* 134 */         }
    /* 135 */
    /* 136 */         serializefromobject_rowWriter.setOffsetAndSize(0, serializefromobject_tmpCursor, serializefromobject_holder.cursor - serializefromobject_tmpCursor);
    /* 137 */       }
    /* 138 */       serializefromobject_result.setTotalSize(serializefromobject_holder.totalSize());
    /* 139 */       append(serializefromobject_result);
    /* 140 */       if (shouldStop()) return;
    /* 141 */     }
    /* 142 */   }
    ```
    
    ## How was this patch tested?
    
    Add test suites into `DatasetPrimitiveSuite`
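    A hedged sketch of the kind of round trip those tests exercise (the setup and assertion here are illustrative, not the literal suite code):
    
    ```scala
    import spark.implicits._  // assumes an active SparkSession named `spark`, e.g. in spark-shell
    
    // cache a Dataset of a primitive array and check that map() preserves the values
    val ds = spark.sparkContext.parallelize(Seq(Array(1.1, 2.2)), 1).toDS.cache()
    ds.count()
    assert(ds.map(e => e).collect().head.sameElements(Array(1.1, 2.2)))
    ```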
    
    Author: Kazuaki Ishizaki <is...@jp.ibm.com>
    
    Closes #17568 from kiszk/SPARK-20254.
    
    (cherry picked from commit e468a96c404eb54261ab219734f67dc2f5b06dc0)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit ef6923f7ea85a3163a5d11ad0f63aff7ec5100e6
Author: zero323 <ze...@users.noreply.github.com>
Date:   2017-04-19T02:59:18Z

    [SPARK-20208][R][DOCS] Document R fpGrowth support
    
    ## What changes were proposed in this pull request?
    
    Document fpGrowth in:
    
    - vignettes
    - programming guide
    - code example
    
    ## How was this patch tested?
    
    Manual tests.
    
    Author: zero323 <ze...@users.noreply.github.com>
    
    Closes #17557 from zero323/SPARK-20208.
    
    (cherry picked from commit 702d85af2df9433254af6fa029683aa19c52a276)
    Signed-off-by: Felix Cheung <fe...@apache.org>

commit 274a3e294d4a302e6b7194ce0ee00d8de66e31ba
Author: Koert Kuipers <ko...@tresata.com>
Date:   2017-04-19T07:52:47Z

    [SPARK-20359][SQL] Avoid unnecessary execution in EliminateOuterJoin optimization that can lead to NPE
    
    Avoid unnecessary execution that can lead to an NPE in EliminateOuterJoin, and add a test in DataFrameSuite to confirm the NPE is no longer thrown
    
    ## What changes were proposed in this pull request?
    Change leftHasNonNullPredicate and rightHasNonNullPredicate to lazy so they are only executed when needed.
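    A minimal sketch of why `lazy` helps here (illustrative only, not the actual optimizer code): a plain `val` is computed as soon as the enclosing object is constructed, even on plans where its value is never consulted, whereas a `lazy val` is only computed on first access.
    
    ```scala
    // stand-in for a predicate check that may blow up when the join condition is absent
    def expensiveCheck(): Boolean = { println("evaluated"); true }
    
    class Eager { val leftHasNonNullPredicate = expensiveCheck() }      // runs at construction time
    class Lazy  { lazy val leftHasNonNullPredicate = expensiveCheck() } // runs only when read
    
    new Eager() // prints "evaluated" immediately
    new Lazy()  // prints nothing until leftHasNonNullPredicate is accessed
    ```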
    
    ## How was this patch tested?
    
    Added a test in DataFrameSuite that failed before this fix and now succeeds. Note that a test in the catalyst project would be better, but I am unsure how to do that.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: Koert Kuipers <ko...@tresata.com>
    
    Closes #17660 from koertkuipers/feat-catch-npe-in-eliminate-outer-join.
    
    (cherry picked from commit 608bf30f0b9759fd0b9b9f33766295550996a9eb)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit a87e21dd2a08c2e030b592322b8c7c4b5915725b
Author: Liang-Chi Hsieh <vi...@gmail.com>
Date:   2017-04-19T08:01:28Z

    [SPARK-20356][SQL] Pruned InMemoryTableScanExec should have correct output partitioning and ordering
    
    ## What changes were proposed in this pull request?
    
    The output of `InMemoryTableScanExec` can be pruned, so that it no longer matches the output of `InMemoryRelation` and its child plan. This causes wrong output partitioning and ordering.
    
    ## How was this patch tested?
    
    Jenkins tests.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: Liang-Chi Hsieh <vi...@gmail.com>
    
    Closes #17679 from viirya/SPARK-20356.
    
    (cherry picked from commit 773754b6c1516c15b64846a00e491535cbcb1007)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 8baa970bcd6ccb810f95113f7c2dd7fbc1935a0a
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-04-19T11:18:54Z

    [SPARK-20343][BUILD] Avoid Unidoc build only if Hadoop 2.6 is explicitly set in SBT build
    
    ## What changes were proposed in this pull request?
    
    This PR proposes two things as below:
    
    - Avoid Unidoc build only if Hadoop 2.6 is explicitly set in SBT build
    
      Due to a different dependency resolution in SBT & Unidoc for an unknown reason, the documentation build fails on a specific machine & environment in Jenkins, but I was unable to reproduce it.
    
      So, this PR just checks the environment variable `AMPLAB_JENKINS_BUILD_PROFILE`, which is set in the Hadoop 2.6 SBT builds against branches on Jenkins, and then disables the Unidoc build. **Note that the PR builder will still build it with Hadoop 2.6 & SBT.**
    
      ```
      ========================================================================
      Building Unidoc API Documentation
      ========================================================================
      [info] Building Spark unidoc (w/Hive 1.2.1) using SBT with these arguments:  -Phadoop-2.6 -Pmesos -Pkinesis-asl -Pyarn -Phive-thriftserver -Phive unidoc
      Using /usr/java/jdk1.8.0_60 as default JAVA_HOME.
      ...
      ```
    
      I checked the environment variables from the logs (first bit) as below:
    
      - **spark-master-test-sbt-hadoop-2.6** (this is the one that is failing) - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.6/lastBuild/consoleFull
    
      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      SPARK_BRANCH=master
      AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6   <- I use this variable
      AMPLAB_JENKINS="true"
      ```
      - spark-master-test-sbt-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-sbt-hadoop-2.7/lastBuild/consoleFull
    
      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      SPARK_BRANCH=master
      AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.7
      AMPLAB_JENKINS="true"
      ```
    
      - spark-master-test-maven-hadoop-2.6 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.6/lastBuild/consoleFull
    
      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      HADOOP_PROFILE=hadoop-2.6
      HADOOP_VERSION=
      SPARK_BRANCH=master
      AMPLAB_JENKINS="true"
      ```
    
      - spark-master-test-maven-hadoop-2.7 - https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-master-test-maven-hadoop-2.7/lastBuild/consoleFull
    
      ```
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      HADOOP_PROFILE=hadoop-2.7
      HADOOP_VERSION=
      SPARK_BRANCH=master
      AMPLAB_JENKINS="true"
      ```
    
      - PR builder - https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/75843/consoleFull
    
      ```
      JENKINS_MASTER_HOSTNAME=amp-jenkins-master
      JAVA_HOME=/usr/java/jdk1.8.0_60
      JAVA_7_HOME=/usr/java/jdk1.7.0_79
      ```
    
      Judging from other logs in branch-2.1:
    
        - SBT & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-sbt-hadoop-2.6/lastBuild/consoleFull
    
          ```
          JAVA_HOME=/usr/java/jdk1.8.0_60
          JAVA_7_HOME=/usr/java/jdk1.7.0_79
          SPARK_BRANCH=branch-2.1
          AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.6
          AMPLAB_JENKINS="true"
          ```
    
        - Maven & Hadoop 2.6 against branch-2.1 https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.1-test-maven-hadoop-2.6/lastBuild/consoleFull
    
          ```
          JAVA_HOME=/usr/java/jdk1.8.0_60
          JAVA_7_HOME=/usr/java/jdk1.7.0_79
          HADOOP_PROFILE=hadoop-2.6
          HADOOP_VERSION=
          SPARK_BRANCH=branch-2.1
          AMPLAB_JENKINS="true"
          ```
    
      We have been using the same convention for those variables. They are actually used in the `run-tests.py` script - see https://github.com/apache/spark/blob/master/dev/run-tests.py#L519-L520
    
    - Revert the previous try
    
      After https://github.com/apache/spark/pull/17651, it seems the build still fails on SBT Hadoop 2.6 master.
    
      I was unable to reproduce this - https://github.com/apache/spark/pull/17477#issuecomment-294094092 - and neither was the reviewer. So, this got merged, as currently the only way to verify this appears to be to merge it (since no one seems able to reproduce it).
    
    ## How was this patch tested?
    
    I only checked that `is_hadoop_version_2_6 = os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6"` works as expected, as below:
    
    ```python
    >>> import collections
    >>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.6"})
    >>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
    False
    >>> os = collections.namedtuple('os', 'environ')(environ={"AMPLAB_JENKINS_BUILD_PROFILE": "hadoop2.7"})
    >>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
    True
    >>> os = collections.namedtuple('os', 'environ')(environ={})
    >>> print(not os.environ.get("AMPLAB_JENKINS_BUILD_PROFILE") == "hadoop2.6")
    True
    ```
    
    I tried many ways but was unable to reproduce this locally. Sean also tried what I did, but he was unable to reproduce it either.
    
    Please refer to the comments in https://github.com/apache/spark/pull/17477#issuecomment-294094092
    
    Author: hyukjinkwon <gu...@gmail.com>
    
    Closes #17669 from HyukjinKwon/revert-SPARK-20343.
    
    (cherry picked from commit 35378766ad7d3c494425a8781efe9cb9349732b7)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 80a60da8f42e86ae1a045d9fd0dcec3234b6ff40
Author: cody koeninger <co...@koeninger.org>
Date:   2017-04-19T17:58:58Z

    [SPARK-20036][DOC] Note incompatible dependencies on org.apache.kafka artifacts
    
    ## What changes were proposed in this pull request?
    
    Note that you shouldn't manually add dependencies on org.apache.kafka artifacts
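    For example, in an sbt build it is usually enough to depend on the Spark connector and let it pull in matching org.apache.kafka artifacts transitively (the artifact and version below are illustrative):
    
    ```scala
    // build.sbt -- rely on the connector's transitive kafka-clients dependency
    // instead of pinning org.apache.kafka artifacts yourself
    libraryDependencies += "org.apache.spark" %% "spark-streaming-kafka-0-10" % "2.2.0"
    ```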
    
    ## How was this patch tested?
    
    Doc only change, did jekyll build and looked at the page.
    
    Author: cody koeninger <co...@koeninger.org>
    
    Closes #17675 from koeninger/SPARK-20036.
    
    (cherry picked from commit 71a8e9df12e547cb4716f954ecb762b358f862d5)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit d649787ee506a7f23a47afea6d951e299067a3dd
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-04-19T20:10:44Z

    [SPARK-20397][SPARKR][SS] Fix flaky test: test_streaming.R.Terminated by error
    
    ## What changes were proposed in this pull request?
    
    Checking a source parameter is asynchronous. When the query is created, it is not guaranteed that the source has been created. This PR just increases the timeout of awaitTermination to ensure the parsing error is thrown.
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #17687 from zsxwing/SPARK-20397.
    
    (cherry picked from commit 4fea7848c45d85ff3ad0863de5d1449d1fd1b4b0)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 371af9623ea9c14791f2b5d22ccf9425eaef1659
Author: ptkool <mi...@shopify.com>
Date:   2017-04-20T01:51:13Z

    [SPARK-20350] Add optimization rules to apply Complementation Laws.
    
    ## What changes were proposed in this pull request?
    
    Apply Complementation Laws during boolean expression simplification.
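    A minimal sketch of the laws on a toy boolean-expression type (illustrative only, not the Catalyst rule itself):
    
    ```scala
    // Complementation Laws: A && !A => false, A || !A => true
    sealed trait Expr
    case class Ref(name: String) extends Expr
    case class Not(e: Expr) extends Expr
    case class And(l: Expr, r: Expr) extends Expr
    case class Or(l: Expr, r: Expr) extends Expr
    case class Lit(value: Boolean) extends Expr
    
    def simplify(e: Expr): Expr = e match {
      case And(a, Not(b)) if a == b => Lit(false)
      case And(Not(a), b) if a == b => Lit(false)
      case Or(a, Not(b)) if a == b  => Lit(true)
      case Or(Not(a), b) if a == b  => Lit(true)
      case other                    => other
    }
    
    simplify(And(Ref("x"), Not(Ref("x")))) // Lit(false)
    simplify(Or(Not(Ref("x")), Ref("x")))  // Lit(true)
    ```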
    
    ## How was this patch tested?
    
    Tested using unit tests, integration tests, and manual tests.
    
    Author: ptkool <mi...@shopify.com>
    Author: Michael Styles <mi...@shopify.com>
    
    Closes #17650 from ptkool/apply_complementation_laws.
    
    (cherry picked from commit 63824b2c8e010ba03013be498def236c654d4fed)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit af9f18c31b749a600016391f9aaba5c8748d252f
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-04-20T01:58:14Z

    [MINOR][SS] Fix a missing space in UnsupportedOperationChecker error message
    
    ## What changes were proposed in this pull request?
    
    Also went through the same file to ensure other string concatenations are correct.
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #17691 from zsxwing/fix-error-message.
    
    (cherry picked from commit 39e303a8b6db642c26dbc26ba92e87680f50e4da)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit e6bbdb0c50657190192933f29b92278ea8f37704
Author: Eric Liang <ek...@databricks.com>
Date:   2017-04-20T02:53:40Z

    [SPARK-20398][SQL] range() operator should include cancellation reason when killed
    
    ## What changes were proposed in this pull request?
    
    https://issues.apache.org/jira/browse/SPARK-19820 adds a reason field for why tasks were killed. However, for backwards compatibility it left the old TaskKilledException constructor which defaults to "unknown reason".
    The range() operator should use the constructor that fills in the reason rather than dropping it on task kill.
    
    ## How was this patch tested?
    
    Existing tests, and I tested this manually.
    
    Author: Eric Liang <ek...@databricks.com>
    
    Closes #17692 from ericl/fix-kill-reason-in-range.
    
    (cherry picked from commit dd6d55d5de970662eccf024e5eae4e6821373d35)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 8d658b90b9f08ed4a3a899aad5d3ea77986b7302
Author: ymahajan <ym...@snappydata.io>
Date:   2017-04-20T03:08:31Z

    Fixed typos in docs
    
    ## What changes were proposed in this pull request?
    
    Typos at a couple of places in the docs.
    
    ## How was this patch tested?
    
    build including docs
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: ymahajan <ym...@snappydata.io>
    
    Closes #17690 from ymahajan/master.
    
    (cherry picked from commit bdc60569196e9ae4e9086c3e514a406a9e8b23a6)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit d01122dbc23206e203784d62312e9cac93564b45
Author: Xiao Li <ga...@gmail.com>
Date:   2017-04-20T10:13:48Z

    [SPARK-20156][SQL][FOLLOW-UP] Java String toLowerCase "Turkish locale bug" in Database and Table DDLs
    
    ### What changes were proposed in this pull request?
    Database and table names conform to the Hive standard ("[a-zA-Z_0-9]+"), i.e. a name may only contain letters, numbers, and underscores.
    
    When calling `toLowerCase` on the names, we should pass `Locale.ROOT` to `toLowerCase` to avoid inadvertent locale-sensitive variation in behavior (aka the "Turkish locale problem").
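    A quick illustration of the problem (assumes a JVM where the Turkish locale is available):
    
    ```scala
    import java.util.Locale
    
    val tr = new Locale("tr", "TR")
    "SPARK_TITLE".toLowerCase(tr)          // "spark_tıtle" -- Turkish dotless ı breaks identifier matching
    "SPARK_TITLE".toLowerCase(Locale.ROOT) // "spark_title" -- locale-independent, as intended
    ```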
    
    ### How was this patch tested?
    Added a test case
    
    Author: Xiao Li <ga...@gmail.com>
    
    Closes #17655 from gatorsmile/locale.
    
    (cherry picked from commit 55bea56911a958f6d3ec3ad96fb425cc71ec03f4)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit 9fd25fbc48730c34e9dd7a43806ee7ef91a49221
Author: Reynold Xin <rx...@databricks.com>
Date:   2017-04-20T12:29:59Z

    [SPARK-20405][SQL] Dataset.withNewExecutionId should be private
    
    ## What changes were proposed in this pull request?
    Dataset.withNewExecutionId is only used in Dataset itself and should be private.
    
    ## How was this patch tested?
    N/A - this is a simple visibility change.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #17699 from rxin/SPARK-20405.
    
    (cherry picked from commit c6f62c5b8106534007df31ca8c460064b89b450b)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 9904526259caca9559d8f1e0da8ea761f5ce1fd0
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-04-20T14:59:38Z

    [SPARK-20409][SQL] fail early if aggregate function in GROUP BY
    
    ## What changes were proposed in this pull request?
    
    It's illegal to have an aggregate function in GROUP BY, and we should fail at the analysis phase if this happens.
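    For example, a query along these lines (table and column names are made up) should now be rejected during analysis instead of failing later:
    
    ```scala
    // fails analysis: aggregate functions are not allowed in GROUP BY
    spark.sql("SELECT key, COUNT(*) FROM records GROUP BY key, MAX(value)")
    ```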
    
    ## How was this patch tested?
    
    new regression test
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #17704 from cloud-fan/minor.
    
    (cherry picked from commit b91873db0930c6fe885c27936e1243d5fabd03ed)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 32c5a105ef3036bde5222f6b81282b970554432a
Author: Bogdan Raducanu <bo...@databricks.com>
Date:   2017-04-20T16:49:39Z

    [SPARK-20407][TESTS] ParquetQuerySuite 'Enabling/disabling ignoreCorruptFiles' flaky test
    
    ## What changes were proposed in this pull request?
    
    SharedSQLContext.afterEach now calls DebugFilesystem.assertNoOpenStreams inside eventually.
    SQLTestUtils.withTempDir calls waitForTasksToFinish before deleting the directory.
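    A rough sketch of the retry shape (assuming ScalaTest's `Eventually` and Spark's test-only `DebugFilesystem` helper; the exact timeout used in the patch is not shown here):
    
    ```scala
    import org.apache.spark.DebugFilesystem
    import org.scalatest.concurrent.Eventually._
    import org.scalatest.time.SpanSugar._
    
    // streams opened by still-finishing tasks may close a moment later,
    // so retry the assertion instead of failing on the first check
    eventually(timeout(10.seconds)) {
      DebugFilesystem.assertNoOpenStreams()
    }
    ```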
    
    ## How was this patch tested?
    Added new test in ParquetQuerySuite based on the flaky test
    
    Author: Bogdan Raducanu <bo...@databricks.com>
    
    Closes #17701 from bogdanrdc/SPARK-20407.
    
    (cherry picked from commit c5a31d160f47ba51bb9f8a4f3141851034640fc7)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit e929cd76720f9f448f2774c33305a91318bce033
Author: Eric Liang <ek...@databricks.com>
Date:   2017-04-20T16:55:10Z

    [SPARK-20358][CORE] Executors failing stage on interrupted exception thrown by cancelled tasks
    
    ## What changes were proposed in this pull request?
    
    This was a regression introduced by my earlier PR here: https://github.com/apache/spark/pull/17531
    
    It turns out NonFatal() does not in fact catch InterruptedException.
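    A small demonstration of that behavior:
    
    ```scala
    import scala.util.control.NonFatal
    
    try {
      throw new InterruptedException("task was cancelled")
    } catch {
      case NonFatal(e)             => println("not reached: NonFatal does not match InterruptedException")
      case e: InterruptedException => println("handled here instead: " + e.getMessage)
    }
    ```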
    
    ## How was this patch tested?
    
    Extended cancellation unit test coverage. The first test fails before this patch.
    
    cc JoshRosen mridulm
    
    Author: Eric Liang <ek...@databricks.com>
    
    Closes #17659 from ericl/spark-20358.
    
    (cherry picked from commit b2ebadfd55283348b8a8b37e28075fca0798228a)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit 01f62625c817da2c77880d662736b0081dcc7b75
Author: Herman van Hovell <hv...@databricks.com>
Date:   2017-04-20T20:37:04Z

    [SPARK-20410][SQL] Make sparkConf a def in SharedSQLContext
    
    ## What changes were proposed in this pull request?
    It is kind of annoying that `SharedSQLContext.sparkConf` is a val when overriding test cases, because you cannot call `super` on it. This PR makes it a function.
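    A hedged sketch of what this enables in a suite (the suite name and exact modifiers here are illustrative):
    
    ```scala
    class MyQuerySuite extends QueryTest with SharedSQLContext {
      // now possible: extend the base configuration instead of replacing it wholesale
      override protected def sparkConf =
        super.sparkConf.set("spark.sql.shuffle.partitions", "4")
    }
    ```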
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #17705 from hvanhovell/SPARK-20410.
    
    (cherry picked from commit 033206355339677812a250b2b64818a261871fd2)
    Signed-off-by: Herman van Hovell <hv...@databricks.com>

commit 7e9eba08acad17f338b4261eebedca7e4f6d3f2a
Author: jerryshao <ss...@hortonworks.com>
Date:   2017-04-20T23:02:09Z

    [SPARK-20172][CORE] Add file permission check when listing files in FsHistoryProvider
    
    ## What changes were proposed in this pull request?
    
    In the current Spark HistoryServer we expect an `AccessControlException` while listing all the files, but unfortunately this did not work because we never actually check the access permission, and no other call throws such an exception. What is worse, the check is deferred until the files are read, which is unnecessary and quite verbose, since the exception is printed every 10 seconds while checking the files.
    
    With this fix, we check the read permission while listing the files, which avoids unnecessary file reads later on and suppresses the verbose log.
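    A rough illustration of the kind of check involved, using Hadoop's `FileSystem.access` (this is a sketch, not necessarily the exact helper the patch adds):
    
    ```scala
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.hadoop.fs.permission.FsAction
    import org.apache.hadoop.security.AccessControlException
    
    // skip event logs the current user cannot read, instead of failing later
    // (and logging repeatedly) when the file is actually opened
    def isReadable(fs: FileSystem, path: Path): Boolean =
      try { fs.access(path, FsAction.READ); true }
      catch { case _: AccessControlException => false }
    ```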
    
    ## How was this patch tested?
    
    Add unit test to verify.
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #17495 from jerryshao/SPARK-20172.
    
    (cherry picked from commit 592f5c89349f3c5b6ec0531c6514b8f7d95ad8da)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit d17dea8f17989e5f8f7809a8564493d82290b5df
Author: Juliusz Sompolski <ju...@databricks.com>
Date:   2017-04-21T01:49:42Z

    [SPARK-20367] Properly unescape column names of partitioning columns parsed from paths.
    
    ## What changes were proposed in this pull request?
    
    When inferring the partitioning schema from paths, the column name in parsePartitionColumn should be unescaped with unescapePathName, just like it is done in e.g. parsePathFragmentAsSeq.
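    A toy sketch of the kind of unescaping involved (Spark's real helper is `unescapePathName`; this simplified stand-in only decodes `%XX` sequences):
    
    ```scala
    def unescape(s: String): String = {
      val sb = new StringBuilder
      var i = 0
      while (i < s.length) {
        // partition directory names encode special characters as %XX
        if (s.charAt(i) == '%' && i + 2 < s.length) {
          sb.append(Integer.parseInt(s.substring(i + 1, i + 3), 16).toChar)
          i += 3
        } else {
          sb.append(s.charAt(i))
          i += 1
        }
      }
      sb.toString
    }
    
    // a directory like "a%20b=1" encodes a partition column named "a b";
    // before this fix the column name parsed from the path was not unescaped
    unescape("a%20b") // "a b"
    ```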
    
    ## How was this patch tested?
    
    Added a test to FileIndexSuite.
    
    Author: Juliusz Sompolski <ju...@databricks.com>
    
    Closes #17703 from juliuszsompolski/SPARK-20367.
    
    (cherry picked from commit 0368eb9d86634c83b3140ce3190cb9e0d0b7fd86)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 5ce76804cadc1d73d338aedb0347e1f18ea82f1f
Author: Herman van Hovell <hv...@databricks.com>
Date:   2017-04-21T02:06:12Z

    [SPARK-20329][SQL] Make timezone aware expression without timezone unresolved
    
    ## What changes were proposed in this pull request?
    A cast expression with a resolved time zone is not equal to a cast expression without a resolved time zone. The `ResolveAggregateFunction` rule assumed that these expressions were the same, and would fail to resolve `HAVING` clauses which contain a `Cast` expression.
    
    This is in essence caused by the fact that a `TimeZoneAwareExpression` can be resolved without a set time zone. This PR fixes this, and makes a `TimeZoneAwareExpression` unresolved as long as it has no TimeZone set.
    
    ## How was this patch tested?
    Added a regression test to the `SQLQueryTestSuite.having` file.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #17641 from hvanhovell/SPARK-20329.
    
    (cherry picked from commit 760c8d088df1d35d7b8942177d47bc1677daf143)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit 6cd2f16b155ce42d8e379de5ce6ced7804fbde92
Author: Takeshi Yamamuro <ya...@apache.org>
Date:   2017-04-21T02:40:21Z

    [SPARK-20281][SQL] Print the identical Range parameters of SparkContext APIs and SQL in explain
    
    ## What changes were proposed in this pull request?
    This PR modifies the code to print identical `Range` parameters for the SparkContext API and SQL in the `explain` output. In the current master, both internally use `defaultParallelism` for `splits` by default, yet they print different strings in the explain output:
    
    ```
    scala> spark.range(4).explain
    == Physical Plan ==
    *Range (0, 4, step=1, splits=Some(8))
    
    scala> sql("select * from range(4)").explain
    == Physical Plan ==
    *Range (0, 4, step=1, splits=None)
    ```
    
    ## How was this patch tested?
    Added tests in `SQLQuerySuite` and modified some results in the existing tests.
    
    Author: Takeshi Yamamuro <ya...@apache.org>
    
    Closes #17670 from maropu/SPARK-20281.
    
    (cherry picked from commit 48d760d028dd73371f99d084c4195dbc4dda5267)
    Signed-off-by: Xiao Li <ga...@gmail.com>

commit cddb4b7db81b01b4abf2ab683aba97e4eabb9769
Author: Herman van Hovell <hv...@databricks.com>
Date:   2017-04-21T07:05:03Z

    [SPARK-20420][SQL] Add events to the external catalog
    
    ## What changes were proposed in this pull request?
    It is often useful to be able to track changes to the `ExternalCatalog`. This PR makes the `ExternalCatalog` emit events when a catalog object is changed. Events are fired before and after the change.
    
    The following events are fired per object:
    
    - Database
      - CreateDatabasePreEvent: event fired before the database is created.
      - CreateDatabaseEvent: event fired after the database has been created.
      - DropDatabasePreEvent: event fired before the database is dropped.
      - DropDatabaseEvent: event fired after the database has been dropped.
    - Table
      - CreateTablePreEvent: event fired before the table is created.
      - CreateTableEvent: event fired after the table has been created.
      - RenameTablePreEvent: event fired before the table is renamed.
      - RenameTableEvent: event fired after the table has been renamed.
      - DropTablePreEvent: event fired before the table is dropped.
      - DropTableEvent: event fired after the table has been dropped.
    - Function
      - CreateFunctionPreEvent: event fired before the function is created.
      - CreateFunctionEvent: event fired after the function has been created.
      - RenameFunctionPreEvent: event fired before the function is renamed.
      - RenameFunctionEvent: event fired after the function has been renamed.
      - DropFunctionPreEvent: event fired before the function is dropped.
      - DropFunctionEvent: event fired after the function has been dropped.
    
    The events currently only contain the names of the modified objects. We can add more events, and more details, at a later point.
    
    A user can monitor changes to the external catalog by adding a listener to the Spark listener bus and checking for `ExternalCatalogEvent`s using the `SparkListener.onOtherEvent` hook, as sketched below. A more direct approach is to add a listener directly to the `ExternalCatalog`.
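    A hedged sketch of the listener-bus approach (the event class is assumed to live under `org.apache.spark.sql.catalyst.catalog`):
    
    ```scala
    import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
    import org.apache.spark.sql.catalyst.catalog.ExternalCatalogEvent
    
    // log every external catalog change posted to the listener bus
    spark.sparkContext.addSparkListener(new SparkListener {
      override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
        case e: ExternalCatalogEvent => println(s"catalog event: $e")
        case _ => // ignore everything else
      }
    })
    ```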
    
    ## How was this patch tested?
    Added the `ExternalCatalogEventSuite`.
    
    Author: Herman van Hovell <hv...@databricks.com>
    
    Closes #17710 from hvanhovell/SPARK-20420.
    
    (cherry picked from commit e2b3d2367a563d4600d8d87b5317e71135c362f0)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit eb4d097c3c73d1aaf4cd9e17193a6b06ba273429
Author: Hervé <du...@users.noreply.github.com>
Date:   2017-04-21T07:52:18Z

    Small rewording about history server use case
    
    Hello
    PR #10991 removed the built-in history view from Spark Standalone, so the history server is no longer useful only to YARN or Mesos users.
    
    Author: Hervé <du...@users.noreply.github.com>
    
    Closes #17709 from dud225/patch-1.
    
    (cherry picked from commit 34767997e0c6cb28e1fac8cb650fa3511f260ca5)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit aaeca8bdd4bbbad5a14e1030e1d7ecf4836e8a5d
Author: Juliusz Sompolski <ju...@databricks.com>
Date:   2017-04-21T14:11:24Z

    [SPARK-20412] Throw ParseException from visitNonOptionalPartitionSpec instead of returning null values.
    
    ## What changes were proposed in this pull request?
    
    If a partitionSpec is not supposed to contain optional values, a ParseException should be thrown rather than nulls returned.
    The nulls can later cause NullPointerExceptions in places not expecting them.
    
    ## How was this patch tested?
    
    A query like "SHOW PARTITIONS tbl PARTITION(col1='val1', col2)" used to throw a NullPointerException.
    Now it throws a ParseException.
    
    Author: Juliusz Sompolski <ju...@databricks.com>
    
    Closes #17707 from juliuszsompolski/SPARK-20412.
    
    (cherry picked from commit c9e6035e1fb825d280eaec3bdfc1e4d362897ffd)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit adaa3f7e027338522e8a71ea40b3237d5889a30d
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2017-04-21T14:25:35Z

    [SPARK-20341][SQL] Support BigInt's value that does not fit in long value range
    
    ## What changes were proposed in this pull request?
    
    This PR avoids an exception in the case where `scala.math.BigInt` has a value that does not fit into the long value range (e.g. `Long.MAX_VALUE+1`). When we run the sample program below with the current Spark, the exception below is thrown.
    
    This PR keeps the value as a `BigDecimal` if we detect such an overflow case by catching `ArithmeticException`.
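    A minimal sketch of the fallback idea outside of Spark's `Decimal` class (the method name below is made up):
    
    ```scala
    import java.math.{BigDecimal => JBigDecimal, BigInteger}
    
    // try the fast long-based path first; on overflow keep the full value as a BigDecimal
    def toJavaDecimal(v: scala.math.BigInt): JBigDecimal = {
      val bi = new BigInteger(v.toString)
      try JBigDecimal.valueOf(bi.longValueExact())
      catch { case _: ArithmeticException => new JBigDecimal(bi) }
    }
    
    toJavaDecimal(scala.math.BigInt("10000000000000000002")) // no longer throws
    ```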
    
    Sample program:
    ```scala
    case class BigIntWrapper(value: scala.math.BigInt)
    spark.createDataset(BigIntWrapper(scala.math.BigInt("10000000000000000002")) :: Nil).show
    ```
    Exception:
    ```
    Error while encoding: java.lang.ArithmeticException: BigInteger out of long range
    staticinvoke(class org.apache.spark.sql.types.Decimal$, DecimalType(38,0), apply, assertnotnull(assertnotnull(input[0, org.apache.spark.sql.BigIntWrapper, true])).value, true) AS value#0
    java.lang.RuntimeException: Error while encoding: java.lang.ArithmeticException: BigInteger out of long range
    staticinvoke(class org.apache.spark.sql.types.Decimal$, DecimalType(38,0), apply, assertnotnull(assertnotnull(input[0, org.apache.spark.sql.BigIntWrapper, true])).value, true) AS value#0
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:290)
    	at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
    	at org.apache.spark.sql.SparkSession$$anonfun$2.apply(SparkSession.scala:454)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
    	at scala.collection.immutable.List.foreach(List.scala:381)
    	at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
    	at scala.collection.immutable.List.map(List.scala:285)
    	at org.apache.spark.sql.SparkSession.createDataset(SparkSession.scala:454)
    	at org.apache.spark.sql.Agg$$anonfun$18.apply$mcV$sp(MySuite.scala:192)
    	at org.apache.spark.sql.Agg$$anonfun$18.apply(MySuite.scala:192)
    	at org.apache.spark.sql.Agg$$anonfun$18.apply(MySuite.scala:192)
    	at org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
    	at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
    	at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
    	at org.scalatest.Transformer.apply(Transformer.scala:22)
    	at org.scalatest.Transformer.apply(Transformer.scala:20)
    	at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
    	at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:68)
    	at org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    	at org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
    	at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
    	at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
    ...
    Caused by: java.lang.ArithmeticException: BigInteger out of long range
    	at java.math.BigInteger.longValueExact(BigInteger.java:4531)
    	at org.apache.spark.sql.types.Decimal.set(Decimal.scala:140)
    	at org.apache.spark.sql.types.Decimal$.apply(Decimal.scala:434)
    	at org.apache.spark.sql.types.Decimal.apply(Decimal.scala)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder.toRow(ExpressionEncoder.scala:287)
    	... 59 more
    ```
    
    ## How was this patch tested?
    
    Add new test suite into `DecimalSuite`
    
    Author: Kazuaki Ishizaki <is...@jp.ibm.com>
    
    Closes #17684 from kiszk/SPARK-20341.
    
    (cherry picked from commit a750a595976791cb8a77063f690ea8f82ea75a8f)
    Signed-off-by: Wenchen Fan <we...@databricks.com>

commit ff1f989f29c08bb5297f3aa35f30ff06e0cb8046
Author: WeichenXu <we...@outlook.com>
Date:   2017-04-21T17:58:13Z

    [SPARK-20423][ML] fix MLOR coeffs centering when reg == 0
    
    ## What changes were proposed in this pull request?
    
    When reg == 0, MLOR has multiple solutions and we need to center the coefficients to get an identical result.
    BUT the current implementation centers the `coefficientMatrix` by the global mean of all coefficients.
    
    In fact the `coefficientMatrix` should be centered on each feature index itself.
    According to the MLOR probability distribution function, it can easily be proven that:
    if `{ w0, w1, .. w(K-1) }` make up the `coefficientMatrix`,
    then `{ w0 + c, w1 + c, ... w(K-1) + c }` is an equivalent solution,
    where `c` is an arbitrary vector of `numFeatures` dimensions.
    Reference: https://core.ac.uk/download/pdf/6287975.pdf
    
    So we need to center the `coefficientMatrix` on each feature dimension separately; see the sketch below.
    
    **We can also confirm this through the R library `glmnet`: when reg == 0, MLOR in `glmnet` always generates coefficients such that, for each feature dimension, the sum across classes is `zero`.**
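    A tiny sketch of per-feature centering on plain arrays (not the actual ML code): each column j of the K x numFeatures coefficient matrix is shifted by its own mean across the K classes.
    
    ```scala
    // coefficients(k)(j): coefficient of class k for feature j
    def centerPerFeature(coefficients: Array[Array[Double]]): Array[Array[Double]] = {
      val numClasses   = coefficients.length
      val numFeatures  = coefficients.head.length
      val featureMeans = Array.tabulate(numFeatures) { j =>
        coefficients.map(_(j)).sum / numClasses
      }
      // after this shift, each feature column sums to zero across classes,
      // matching what glmnet reports when reg == 0
      coefficients.map(row => Array.tabulate(numFeatures)(j => row(j) - featureMeans(j)))
    }
    ```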
    
    ## How was this patch tested?
    
    Tests added.
    
    Author: WeichenXu <We...@outlook.com>
    
    Closes #17706 from WeichenXu123/mlor_center.
    
    (cherry picked from commit eb00378f0eed6afbf328ae6cd541cc202d14c1f0)
    Signed-off-by: DB Tsai <db...@dbtsai.com>

commit 6c2489c66682fdc6a886346ed980d95e6e5eefde
Author: 郭小龙 10207633 <gu...@zte.com.cn>
Date:   2017-04-21T19:08:26Z

    [SPARK-20401][DOC] The official Spark configuration documentation should specify the 'spark.driver.supervise' configuration parameter and its default value.
    
    ## What changes were proposed in this pull request?
    Submit the Spark job using the REST interface, e.g.:
    curl -X POST http://10.43.183.120:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
        "action": "CreateSubmissionRequest",
        "appArgs": [
            "myAppArgument"
        ],
        "appResource": "/home/mr/gxl/test.jar",
        "clientSparkVersion": "2.2.0",
        "environmentVariables": {
            "SPARK_ENV_LOADED": "1"
        },
        "mainClass": "cn.zte.HdfsTest",
        "sparkProperties": {
            "spark.jars": "/home/mr/gxl/test.jar",
            **"spark.driver.supervise": "true",**
            "spark.app.name": "HdfsTest",
            "spark.eventLog.enabled": "false",
            "spark.submit.deployMode": "cluster",
            "spark.master": "spark://10.43.183.120:6066"
        }
    }'
    
    **I want to make sure that the driver is automatically restarted if it fails with a non-zero exit code, but I cannot find the 'spark.driver.supervise' configuration parameter specification and default value in the official Spark documentation.**
    
    ## How was this patch tested?
    
    manual tests
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: 郭小龙 10207633 <gu...@zte.com.cn>
    Author: guoxiaolong <gu...@zte.com.cn>
    Author: guoxiaolongzte <gu...@zte.com.cn>
    
    Closes #17696 from guoxiaolongzte/SPARK-20401.
    
    (cherry picked from commit ad290402aa1d609abf5a2883a6d87fa8bc2bd517)
    Signed-off-by: Sean Owen <so...@cloudera.com>

----




[GitHub] spark issue #18044: Branch 2.2

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/18044
  
    Can one of the admins verify this patch?




[GitHub] spark issue #18044: Branch 2.2

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/18044
  
    @NewBoLing close this please




[GitHub] spark issue #18044: Branch 2.2

Posted by jiangxb1987 <gi...@git.apache.org>.
Github user jiangxb1987 commented on the issue:

    https://github.com/apache/spark/pull/18044
  
    Please close this. @NewBoLing




[GitHub] spark pull request #18044: Branch 2.2

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/18044

