Posted to reviews@spark.apache.org by muyannian <gi...@git.apache.org> on 2017/02/11 01:56:53 UTC

[GitHub] spark pull request #16890: when a column uses an alias, the order by result is w...

GitHub user muyannian opened a pull request:

    https://github.com/apache/spark/pull/16890

    When a column uses an alias, the ORDER BY result is wrong

    I wrote two SQL queries.
    The ORDER BY result of the first is wrong, but the ORDER BY result of the second is right. Is this a bug?
    
    -- sql 1
    select amtlong as yasname ,usernick,count(*) as cnt,sum(amtdouble) as amt  from ydb_import_txt  group by usernick, amtlong 
    order by  amt desc,cnt,nick,amtlong limit 230
    220@ 9189       \u595a\u9e3f\u714a  1       99.97
    221@ 7105       \u595a\u9e3f\u714a  1       99.97
    
    -- sql 2
    select amtlong as yasname ,usernick,count(*) as cnt,sum(amtdouble) as amt  from ydb_import_txt  group by usernick, amtlong 
    order by  amt desc,cnt,nick,yasname  limit 230
    220@ 7105       \u595a\u9e3f\u714a  1       99.97
    221@ 9189       \u595a\u9e3f\u714a  1       99.97
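    
    For reference, a minimal spark-shell sketch of the same comparison (hypothetical data, not the real ydb_import_txt table, and the ORDER BY keys are simplified):
    
    ```scala
    val df = Seq((9189L, "u1", 99.97), (7105L, "u1", 99.97))
      .toDF("amtlong", "usernick", "amtdouble")
    df.createOrReplaceTempView("t")
    // order by the underlying column name
    spark.sql("""select amtlong as yasname, usernick, count(*) as cnt, sum(amtdouble) as amt
                 from t group by usernick, amtlong
                 order by amt desc, cnt, usernick, amtlong""").show()
    // order by the alias
    spark.sql("""select amtlong as yasname, usernick, count(*) as cnt, sum(amtdouble) as amt
                 from t group by usernick, amtlong
                 order by amt desc, cnt, usernick, yasname""").show()
    // if the reported bug is present, the two results list the amtlong values in different orders
    ```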


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16890.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16890
    
----
commit 923e594844a7ad406195b91877f0fb374d5a454b
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-08T02:55:01Z

    [SPARK-18941][SQL][DOC] Add a new behavior document on `CREATE/DROP TABLE` with `LOCATION`
    
    ## What changes were proposed in this pull request?
    
    This PR adds a description of the behavior change of `CREATE TABLE ... LOCATION` to `sql-programming-guide.md`, under `Upgrading From Spark SQL 1.6 to 2.0`. This change was introduced in Apache Spark 2.0.0 as [SPARK-15276](https://issues.apache.org/jira/browse/SPARK-15276).
    
    ## How was this patch tested?
    
    ```
    SKIP_API=1 jekyll build
    ```
    
    **Newly Added Description**
    <img width="913" alt="new" src="https://cloud.githubusercontent.com/assets/9700541/21743606/7efe2b12-d4ba-11e6-8a0d-551222718ea2.png">
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16400 from dongjoon-hyun/SPARK-18941.

commit 6b6b555a1e667a9f03dfe4a21e56c513a353a58d
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-01-08T09:10:36Z

    [SPARK-18862][SPARKR][ML] Split SparkR mllib.R into multiple files
    
    ## What changes were proposed in this pull request?
    SparkR ```mllib.R``` is getting bigger as we add more ML wrappers, so I'd like to split it into multiple files to make it easier to maintain:
    * mllib_classification.R
    * mllib_clustering.R
    * mllib_recommendation.R
    * mllib_regression.R
    * mllib_stat.R
    * mllib_tree.R
    * mllib_utils.R
    
    Note: Only reorg, no actual code change.
    
    ## How was this patch tested?
    Existing tests.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #16312 from yanboliang/spark-18862.

commit cd1d00adaff65e8adfebc2342dd422c53f98166b
Author: zuotingbing <zu...@zte.com.cn>
Date:   2017-01-08T09:29:01Z

    [SPARK-19026] SPARK_LOCAL_DIRS(multiple directories on different disks) cannot be deleted
    
    JIRA Issue: https://issues.apache.org/jira/browse/SPARK-19026
    
    SPARK_LOCAL_DIRS (Standalone) can be a comma-separated list of multiple directories on different disks, e.g. SPARK_LOCAL_DIRS=/dir1,/dir2,/dir3. If an IOException occurs when creating the sub-directory on dir3, the sub-directories that were created successfully on dir1 and dir2 can no longer be deleted when the application finishes.
    So we should catch the IOException in Utils.createDirectory; otherwise the variable "appDirectories(appId)", which the function maybeCleanupApplication reads, will never be set, and dir1 and dir2 will not be cleaned up.
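    
    A rough sketch of the idea (not the exact patch): handle the failure per directory, so the sub-directories that were already created stay tracked and can still be cleaned up later.
    
    ```scala
    import java.io.{File, IOException}

    // create one application sub-directory per configured local dir; a failure on one
    // disk is skipped instead of aborting, so earlier dirs remain tracked for cleanup
    def createAppDirs(localDirs: Seq[String], appId: String): Seq[File] =
      localDirs.flatMap { dir =>
        try {
          val appDir = new File(dir, appId)
          if (!appDir.exists() && !appDir.mkdirs()) {
            throw new IOException(s"Failed to create directory $appDir")
          }
          Some(appDir)
        } catch {
          case _: IOException => None
        }
      }
    ```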
    
    Author: zuotingbing <zu...@zte.com.cn>
    
    Closes #16439 from zuotingbing/master.

commit 4351e62207957bec663108a571cff2bfaaa9e7d5
Author: Dilip Biswal <db...@us.ibm.com>
Date:   2017-01-08T22:09:07Z

    [SPARK-19093][SQL] Cached tables are not used in SubqueryExpression
    
    ## What changes were proposed in this pull request?
    Consider the plans inside subquery expressions when looking up the cache manager to make
    use of cached data. Currently CacheManager.useCachedData does not consider the
    subquery expressions in the plan.
    
    SQL
    ```
    select * from rows where not exists (select * from rows)
    ```
    Before the fix
    ```
    == Optimized Logical Plan ==
    Join LeftAnti
    :- InMemoryRelation [_1#3775, _2#3776], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
    :     +- *FileScan parquet [_1#3775,_2#3776] Batched: true, Format: Parquet, Location: InMemoryFileIndex[dbfs:/tmp/rows], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_1:string,_2:string>
    +- Project [_1#3775 AS _1#3775#4001, _2#3776 AS _2#3776#4002]
       +- Relation[_1#3775,_2#3776] parquet
    ```
    
    After
    ```
    == Optimized Logical Plan ==
    Join LeftAnti
    :- InMemoryRelation [_1#256, _2#257], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
    :     +- *FileScan parquet [_1#256,_2#257] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/rows], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_1:string,_2:string>
    +- Project [_1#256 AS _1#256#298, _2#257 AS _2#257#299]
       +- InMemoryRelation [_1#256, _2#257], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas)
             +- *FileScan parquet [_1#256,_2#257] Batched: true, Format: Parquet, Location: InMemoryFileIndex[file:/tmp/rows], PartitionFilters: [], PushedFilters: [], ReadSchema: struct<_1:string,_2:string>
    ```
    
    Query2
    ```
     SELECT * FROM t1
     WHERE
     c1 IN (SELECT c1 FROM t2 WHERE c1 IN (SELECT c1 FROM t3 WHERE c1 = 1))
    ```
    Before
    ```
    == Analyzed Logical Plan ==
    c1: int
    Project [c1#3]
    +- Filter predicate-subquery#47 [(c1#3 = c1#10)]
       :  +- Project [c1#10]
       :     +- Filter predicate-subquery#46 [(c1#10 = c1#17)]
       :        :  +- Project [c1#17]
       :        :     +- Filter (c1#17 = 1)
       :        :        +- SubqueryAlias t3, `t3`
       :        :           +- Project [value#15 AS c1#17]
       :        :              +- LocalRelation [value#15]
       :        +- SubqueryAlias t2, `t2`
       :           +- Project [value#8 AS c1#10]
       :              +- LocalRelation [value#8]
       +- SubqueryAlias t1, `t1`
          +- Project [value#1 AS c1#3]
             +- LocalRelation [value#1]
    
    == Optimized Logical Plan ==
    Join LeftSemi, (c1#3 = c1#10)
    :- InMemoryRelation [c1#3], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), t1
    :     +- LocalTableScan [c1#3]
    +- Project [value#8 AS c1#10]
       +- Join LeftSemi, (value#8 = c1#17)
          :- LocalRelation [value#8]
          +- Project [value#15 AS c1#17]
             +- Filter (value#15 = 1)
                +- LocalRelation [value#15]
    
    ```
    After
    ```
    == Analyzed Logical Plan ==
    c1: int
    Project [c1#3]
    +- Filter predicate-subquery#47 [(c1#3 = c1#10)]
       :  +- Project [c1#10]
       :     +- Filter predicate-subquery#46 [(c1#10 = c1#17)]
       :        :  +- Project [c1#17]
       :        :     +- Filter (c1#17 = 1)
       :        :        +- SubqueryAlias t3, `t3`
       :        :           +- Project [value#15 AS c1#17]
       :        :              +- LocalRelation [value#15]
       :        +- SubqueryAlias t2, `t2`
       :           +- Project [value#8 AS c1#10]
       :              +- LocalRelation [value#8]
       +- SubqueryAlias t1, `t1`
          +- Project [value#1 AS c1#3]
             +- LocalRelation [value#1]
    
    == Optimized Logical Plan ==
    Join LeftSemi, (c1#3 = c1#10)
    :- InMemoryRelation [c1#3], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), t1
    :     +- LocalTableScan [c1#3]
    +- Join LeftSemi, (c1#10 = c1#17)
       :- InMemoryRelation [c1#10], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), t2
       :     +- LocalTableScan [c1#10]
       +- Filter (c1#17 = 1)
          +- InMemoryRelation [c1#17], true, 10000, StorageLevel(disk, memory, deserialized, 1 replicas), t1
                +- LocalTableScan [c1#3]
    ```
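    
    A small spark-shell sketch (hypothetical temp view, not the PR's test code) to check whether the cached relation is picked up inside the subquery:
    
    ```scala
    spark.range(10).toDF("c1").createOrReplaceTempView("rows")
    spark.catalog.cacheTable("rows")
    spark.sql("select * from rows where not exists (select * from rows)").explain(true)
    // without this fix only the outer side of the join shows InMemoryRelation;
    // with it, the subquery side reuses the cached relation as well
    ```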
    ## How was this patch tested?
    Added new tests in CachedTableSuite.
    
    Author: Dilip Biswal <db...@us.ibm.com>
    
    Closes #16493 from dilipbiswal/SPARK-19093.

commit 1f6ded6455d07ec8828fc9662ddffe55cbba4238
Author: anabranch <wa...@gmail.com>
Date:   2017-01-09T01:53:53Z

    [SPARK-19127][DOCS] Update Rank Function Documentation
    
    ## What changes were proposed in this pull request?
    
    - [X] Fix inconsistencies in function reference for dense rank and dense
    - [X] Make all languages equivalent in their reference to `dense_rank` and `rank`.
    
    ## How was this patch tested?
    
    N/A for docs.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: anabranch <wa...@gmail.com>
    
    Closes #16505 from anabranch/SPARK-19127.

commit 19d9d4c855eab8f647a5ec66b079172de81221d0
Author: anabranch <wa...@gmail.com>
Date:   2017-01-09T04:37:46Z

    [SPARK-19126][DOCS] Update Join Documentation Across Languages
    
    ## What changes were proposed in this pull request?
    
    - [X] Make sure all join types are clearly mentioned
    - [X] Make join labeling/style consistent
    - [X] Make join label ordering docs the same
    - [X] Improve join documentation according to above for Scala
    - [X] Improve join documentation according to above for Python
    - [X] Improve join documentation according to above for R
    
    ## How was this patch tested?
    No tests b/c docs.
    
    Please review http://spark.apache.org/contributing.html before opening a pull request.
    
    Author: anabranch <wa...@gmail.com>
    
    Closes #16504 from anabranch/SPARK-19126.

commit 3ccabdfb4d760d684b1e0c0ed448a57331f209f2
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-01-09T05:15:52Z

    [SPARK-17077][SQL] Cardinality estimation for project operator
    
    ## What changes were proposed in this pull request?
    
    Support cardinality estimation for project operator.
    
    ## How was this patch tested?
    
    Add a test suite and a base class in the catalyst package.
    
    Author: Zhenhua Wang <wz...@163.com>
    
    Closes #16430 from wzhfy/projectEstimation.

commit 15c2bd01b03b1a07f10779f68118cd28f2c62c9a
Author: Zhenhua Wang <wz...@163.com>
Date:   2017-01-09T19:29:42Z

    [SPARK-19020][SQL] Cardinality estimation of aggregate operator
    
    ## What changes were proposed in this pull request?
    
    Support cardinality estimation of aggregate operator
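    
    Roughly, the estimate follows the rule sketched below (a rough paraphrase, not necessarily this PR's exact code): the output row count is bounded both by the number of distinct grouping-key combinations and by the child's row count.
    
    ```scala
    // estimated output rows of an aggregate grouped on columns with known distinct counts
    def estimateAggRows(childRows: BigInt, groupColDistinctCounts: Seq[BigInt]): BigInt = {
      val combinations = groupColDistinctCounts.product // upper bound from the grouping keys
      combinations.min(childRows)                       // cannot exceed the input row count
    }
    ```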
    
    ## How was this patch tested?
    
    Add test cases
    
    Author: Zhenhua Wang <wz...@163.com>
    Author: wangzhenhua <wa...@huawei.com>
    
    Closes #16431 from wzhfy/aggEstimation.

commit faabe69cc081145f43f9c68db1a7a8c5c39684fb
Author: Burak Yavuz <br...@gmail.com>
Date:   2017-01-09T22:25:38Z

    [SPARK-18952] Regex strings not properly escaped in codegen for aggregations
    
    ## What changes were proposed in this pull request?
    
    If I use the function regexp_extract and my regex string contains `\`, i.e. the escape character, codegen fails because the `\` character is not properly escaped in the generated code.
    
    Example stack trace:
    ```
    /* 059 */     private int maxSteps = 2;
    /* 060 */     private int numRows = 0;
    /* 061 */     private org.apache.spark.sql.types.StructType keySchema = new org.apache.spark.sql.types.StructType().add("date_format(window#325.start, yyyy-MM-dd HH:mm)", org.apache.spark.sql.types.DataTypes.StringType)
    /* 062 */     .add("regexp_extract(source#310.description, ([a-zA-Z]+)\[.*, 1)", org.apache.spark.sql.types.DataTypes.StringType);
    /* 063 */     private org.apache.spark.sql.types.StructType valueSchema = new org.apache.spark.sql.types.StructType().add("sum", org.apache.spark.sql.types.DataTypes.LongType);
    /* 064 */     private Object emptyVBase;
    
    ...
    
    org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 62, Column 58: Invalid escape sequence
    	at org.codehaus.janino.Scanner.scanLiteralCharacter(Scanner.java:918)
    	at org.codehaus.janino.Scanner.produce(Scanner.java:604)
    	at org.codehaus.janino.Parser.peekRead(Parser.java:3239)
    	at org.codehaus.janino.Parser.parseArguments(Parser.java:3055)
    	at org.codehaus.janino.Parser.parseSelector(Parser.java:2914)
    	at org.codehaus.janino.Parser.parseUnaryExpression(Parser.java:2617)
    	at org.codehaus.janino.Parser.parseMultiplicativeExpression(Parser.java:2573)
    	at org.codehaus.janino.Parser.parseAdditiveExpression(Parser.java:2552)
    ```
    
    In the code-generated expression, the literal should use `\\` instead of `\`.
    
    A similar problem was solved here: https://github.com/apache/spark/pull/15156.
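    
    A minimal sketch of the kind of escaping needed (not the actual Spark helper) when splicing such a string into generated Java source:
    
    ```scala
    // escape backslashes and quotes before embedding a string as a Java string literal
    def escapeForJavaSource(s: String): String =
      s.replace("\\", "\\\\").replace("\"", "\\\"")

    // e.g. the pattern ([a-zA-Z]+)\[.* is emitted as "([a-zA-Z]+)\\[.*",
    // which janino parses back to the original pattern
    ```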
    
    ## How was this patch tested?
    
    Regression test in `DataFrameAggregationSuite`
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #16361 from brkyvz/reg-break.

commit 3ef6d98a803fdff182ab4556c3273ec5fa0ff002
Author: Yanbo Liang <yb...@gmail.com>
Date:   2017-01-10T05:38:46Z

    [SPARK-17847][ML] Reduce shuffled data size of GaussianMixture & copy the implementation from mllib to ml
    
    ## What changes were proposed in this pull request?
    
    Copy the `GaussianMixture` implementation from mllib to ml, so that we can add new features to it.
    Unlike some other algorithms, which were turned into wrappers around the ml implementation, I left mllib `GaussianMixture` untouched, for the following reasons:
    - mllib `GaussianMixture` allows k == 1, but ml does not.
    - mllib `GaussianMixture` supports setting an initial model, but ml does not currently. (We will definitely add this feature to ml in the future.)
    
    We could work around these issues and make mllib a wrapper calling into ml, but I'd prefer to leave mllib untouched, which keeps ml clean.
    
    Meanwhile, there is a big performance improvement for `GaussianMixture` in this PR. Since the covariance matrix of a multivariate Gaussian distribution is symmetric, we can store only the upper triangular part of the matrix, which greatly reduces the shuffled data size. In my test, this change reduced the shuffled data size by about 50% and accelerated job execution.
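    
    A rough illustration of the packing idea (not the PR's exact code): a symmetric n x n covariance matrix needs only n*(n+1)/2 stored values when just the upper triangle is kept.
    
    ```scala
    // pack the upper triangle of a column-major symmetric n x n matrix into a flat array
    def packUpperTriangle(values: Array[Double], n: Int): Array[Double] = {
      val packed = new Array[Double](n * (n + 1) / 2)
      var idx = 0
      for (j <- 0 until n; i <- 0 to j) {
        packed(idx) = values(j * n + i)
        idx += 1
      }
      packed
    }
    ```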
    
    Before this PR:
    ![image](https://cloud.githubusercontent.com/assets/1962026/19641622/4bb017ac-9996-11e6-8ece-83db184b620a.png)
    After this PR:
    ![image](https://cloud.githubusercontent.com/assets/1962026/19641635/629c21fe-9996-11e6-91e9-83ab74ae0126.png)
    ## How was this patch tested?
    
    Existing tests and added new tests.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #15413 from yanboliang/spark-17847.

commit b0e5840d4b37d7b73e300671795185bba37effb0
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-01-10T10:18:07Z

    [SPARK-19134][EXAMPLE] Fix several sql, mllib and status api examples not working
    
    ## What changes were proposed in this pull request?
    
    **binary_classification_metrics_example.py**
    
    LibSVM datasource loads `ml.linalg.SparseVector` whereas the example requires it to be `mllib.linalg.SparseVector`.  For the equivalent Scala example, `BinaryClassificationMetricsExample.scala` seems fine.
    
    ```
    ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
    ```
    
    ```
      File ".../spark/examples/src/main/python/mllib/binary_classification_metrics_example.py", line 39, in <lambda>
        .rdd.map(lambda row: LabeledPoint(row[0], row[1]))
      File ".../spark/python/pyspark/mllib/regression.py", line 54, in __init__
        self.features = _convert_to_vector(features)
      File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 80, in _convert_to_vector
        raise TypeError("Cannot convert type %s into Vector" % type(l))
    TypeError: Cannot convert type <class 'pyspark.ml.linalg.SparseVector'> into Vector
    ```
    
    **status_api_demo.py** (this one does not work on Python 3.4.6)
    
    The `Queue` module is named `queue` in Python 3+, so the import fails.
    
    ```
    PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
    ```
    
    ```
    Traceback (most recent call last):
      File ".../spark/examples/src/main/python/status_api_demo.py", line 22, in <module>
        import Queue
    ImportError: No module named 'Queue'
    ```
    
    **bisecting_k_means_example.py**
    
    `BisectingKMeansModel` does not implement `save` and `load` in Python.
    
    ```bash
    ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
    ```
    
    ```
    Traceback (most recent call last):
      File ".../spark/examples/src/main/python/mllib/bisecting_k_means_example.py", line 46, in <module>
        model.save(sc, path)
    AttributeError: 'BisectingKMeansModel' object has no attribute 'save'
    ```
    
    **elementwise_product_example.py**
    
    It calls `collect` on the vector.
    
    ```bash
    ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
    ```
    
    ```
    Traceback (most recent call last):
      File ".../spark/examples/src/main/python/mllib/elementwise_product_example.py", line 48, in <module>
        for each in transformedData2.collect():
      File ".../spark/python/pyspark/mllib/linalg/__init__.py", line 478, in __getattr__
        return getattr(self.array, item)
    AttributeError: 'numpy.ndarray' object has no attribute 'collect'
    ```
    
    **The following three examples appear to throw an exception due to a relative path set in `spark.sql.warehouse.dir`.**
    
    **hive.py**
    
    ```
    ./bin/spark-submit examples/src/main/python/sql/hive.py
    ```
    
    ```
    Traceback (most recent call last):
      File ".../spark/examples/src/main/python/sql/hive.py", line 47, in <module>
        spark.sql("CREATE TABLE IF NOT EXISTS src (key INT, value STRING) USING hive")
      File ".../spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 541, in sql
      File ".../spark/python/lib/py4j-0.10.4-src.zip/py4j/java_gateway.py", line 1133, in __call__
      File ".../spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    pyspark.sql.utils.AnalysisException: 'org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse);'
    ```
    
    **SparkHiveExample.scala**
    
    ```
    ./bin/run-example sql.hive.SparkHiveExample
    ```
    
    ```
    Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
    	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
    	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
    	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
    ```
    
    **JavaSparkHiveExample.java**
    
    ```
    ./bin/run-example sql.hive.JavaSparkHiveExample
    ```
    
    ```
    Exception in thread "main" org.apache.hadoop.hive.ql.metadata.HiveException: Unable to alter table. java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:./spark-warehouse
    	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:498)
    	at org.apache.hadoop.hive.ql.metadata.Hive.alterTable(Hive.java:484)
    	at org.apache.hadoop.hive.ql.metadata.Hive.loadTable(Hive.java:1668)
    ```
    
    ## How was this patch tested?
    
    Manually via
    
    ```
    ./bin/spark-submit examples/src/main/python/mllib/binary_classification_metrics_example.py
    ```
    
    ```
    PYSPARK_PYTHON=python3 ./bin/spark-submit examples/src/main/python/status_api_demo.py
    ```
    
    ```
    ./bin/spark-submit examples/src/main/python/mllib/bisecting_k_means_example.py
    ```
    
    ```
    ./bin/spark-submit examples/src/main/python/mllib/elementwise_product_example.py
    ```
    
    ```
    ./bin/spark-submit examples/src/main/python/sql/hive.py
    ```
    
    ```
    ./bin/run-example sql.hive.JavaSparkHiveExample
    ```
    
    ```
    ./bin/run-example sql.hive.SparkHiveExample
    ```
    
    These were found via
    
    ```bash
    find ./examples/src/main/python -name "*.py" -exec spark-submit {} \;
    ```
    
    Author: hyukjinkwon <gu...@gmail.com>
    
    Closes #16515 from HyukjinKwon/minor-example-fix.

commit b0319c2ecb51bb97c3228afa4a384572b9ffbce6
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-10T11:26:51Z

    [SPARK-19107][SQL] support creating hive table with DataFrameWriter and Catalog
    
    ## What changes were proposed in this pull request?
    
    After unifying the CREATE TABLE syntax in https://github.com/apache/spark/pull/16296, it's now pretty easy to support creating Hive tables with `DataFrameWriter` and `Catalog`.
    
    This PR basically just removes the Hive provider check in `DataFrameWriter.saveAsTable` and `Catalog.createExternalTable`, and adds tests.
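    
    A sketch of the usage this enables (assuming a Hive-enabled SparkSession `spark` and an existing DataFrame `df`; not the PR's exact test code):
    
    ```scala
    // before this change the "hive" provider was rejected by the check in saveAsTable
    df.write.format("hive").saveAsTable("hive_tbl")
    spark.table("hive_tbl").show()
    ```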
    
    ## How was this patch tested?
    
    new tests in `HiveDDLSuite`
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #16487 from cloud-fan/hive-table.

commit acfc5f354332107cc744fb636e3730f6fc48b2fe
Author: Liwei Lin <lw...@gmail.com>
Date:   2017-01-10T11:35:46Z

    [SPARK-16845][SQL] `GeneratedClass$SpecificOrdering` grows beyond 64 KB
    
    ## What changes were proposed in this pull request?
    
    Prior to this patch, we generate `compare(...)` for `GeneratedClass$SpecificOrdering` as shown below, leading to Janino exceptions saying the code grows beyond 64 KB.
    
    ``` scala
    /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
    /* ..... */   ...
    /* 10969 */   private int compare(InternalRow a, InternalRow b) {
    /* 10970 */     InternalRow i = null;  // Holds current row being evaluated.
    /* 10971 */
    /* 1.... */     code for comparing field0
    /* 1.... */     code for comparing field1
    /* 1.... */     ...
    /* 1.... */     code for comparing field449
    /* 15012 */
    /* 15013 */     return 0;
    /* 15014 */   }
    /* 15015 */ }
    ```
    
    This patch breaks `compare(...)` into smaller `compare_xxx(...)` methods when necessary; the generated `compare(...)` then looks like this (see also the splitting sketch after the generated code):
    
    ``` scala
    /* 001 */ public SpecificOrdering generate(Object[] references) {
    /* 002 */   return new SpecificOrdering(references);
    /* 003 */ }
    /* 004 */
    /* 005 */ class SpecificOrdering extends o.a.s.sql.catalyst.expressions.codegen.BaseOrdering {
    /* 006 */
    /* 007 */     ...
    /* 1.... */
    /* 11290 */   private int compare_0(InternalRow a, InternalRow b) {
    /* 11291 */     InternalRow i = null;  // Holds current row being evaluated.
    /* 11292 */
    /* 11293 */     i = a;
    /* 11294 */     boolean isNullA;
    /* 11295 */     UTF8String primitiveA;
    /* 11296 */     {
    /* 11297 */
    /* 11298 */       Object obj = ((Expression) references[0]).eval(null);
    /* 11299 */       UTF8String value = (UTF8String) obj;
    /* 11300 */       isNullA = false;
    /* 11301 */       primitiveA = value;
    /* 11302 */     }
    /* 11303 */     i = b;
    /* 11304 */     boolean isNullB;
    /* 11305 */     UTF8String primitiveB;
    /* 11306 */     {
    /* 11307 */
    /* 11308 */       Object obj = ((Expression) references[0]).eval(null);
    /* 11309 */       UTF8String value = (UTF8String) obj;
    /* 11310 */       isNullB = false;
    /* 11311 */       primitiveB = value;
    /* 11312 */     }
    /* 11313 */     if (isNullA && isNullB) {
    /* 11314 */       // Nothing
    /* 11315 */     } else if (isNullA) {
    /* 11316 */       return -1;
    /* 11317 */     } else if (isNullB) {
    /* 11318 */       return 1;
    /* 11319 */     } else {
    /* 11320 */       int comp = primitiveA.compare(primitiveB);
    /* 11321 */       if (comp != 0) {
    /* 11322 */         return comp;
    /* 11323 */       }
    /* 11324 */     }
    /* 11325 */
    /* 11326 */
    /* 11327 */     i = a;
    /* 11328 */     boolean isNullA1;
    /* 11329 */     UTF8String primitiveA1;
    /* 11330 */     {
    /* 11331 */
    /* 11332 */       Object obj1 = ((Expression) references[1]).eval(null);
    /* 11333 */       UTF8String value1 = (UTF8String) obj1;
    /* 11334 */       isNullA1 = false;
    /* 11335 */       primitiveA1 = value1;
    /* 11336 */     }
    /* 11337 */     i = b;
    /* 11338 */     boolean isNullB1;
    /* 11339 */     UTF8String primitiveB1;
    /* 11340 */     {
    /* 11341 */
    /* 11342 */       Object obj1 = ((Expression) references[1]).eval(null);
    /* 11343 */       UTF8String value1 = (UTF8String) obj1;
    /* 11344 */       isNullB1 = false;
    /* 11345 */       primitiveB1 = value1;
    /* 11346 */     }
    /* 11347 */     if (isNullA1 && isNullB1) {
    /* 11348 */       // Nothing
    /* 11349 */     } else if (isNullA1) {
    /* 11350 */       return -1;
    /* 11351 */     } else if (isNullB1) {
    /* 11352 */       return 1;
    /* 11353 */     } else {
    /* 11354 */       int comp = primitiveA1.compare(primitiveB1);
    /* 11355 */       if (comp != 0) {
    /* 11356 */         return comp;
    /* 11357 */       }
    /* 11358 */     }
    /* 1.... */
    /* 1.... */   ...
    /* 1.... */
    /* 12652 */     return 0;
    /* 12653 */   }
    /* 1.... */
    /* 1.... */   ...
    /* 15387 */
    /* 15388 */   public int compare(InternalRow a, InternalRow b) {
    /* 15389 */
    /* 15390 */     int comp_0 = compare_0(a, b);
    /* 15391 */     if (comp_0 != 0) {
    /* 15392 */       return comp_0;
    /* 15393 */     }
    /* 15394 */
    /* 15395 */     int comp_1 = compare_1(a, b);
    /* 15396 */     if (comp_1 != 0) {
    /* 15397 */       return comp_1;
    /* 15398 */     }
    /* 1.... */
    /* 1.... */     ...
    /* 1.... */
    /* 15450 */     return 0;
    /* 15451 */   }
    /* 15452 */ }
    ```
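    
    A simplified sketch of the splitting approach (the real logic lives in the `splitExpression()` refactoring acknowledged below; this is only an illustration): per-field comparison snippets are grouped into separate methods so that no single generated method exceeds the 64 KB limit.
    
    ```scala
    // group code snippets into compare_N methods, each kept well under the JVM/Janino limit
    def splitIntoMethods(snippets: Seq[String], maxChars: Int = 16 * 1024): Seq[String] = {
      val groups = scala.collection.mutable.ArrayBuffer.empty[StringBuilder]
      snippets.foreach { s =>
        if (groups.isEmpty || groups.last.length + s.length > maxChars) {
          groups += new StringBuilder
        }
        groups.last.append(s).append('\n')
      }
      groups.zipWithIndex.map { case (body, i) =>
        s"private int compare_$i(InternalRow a, InternalRow b) {\n$body  return 0;\n}"
      }.toSeq
    }
    ```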
    ## How was this patch tested?
    - a new added test case which
      - would fail prior to this patch
      - would pass with this patch
    - ordering correctness should already be covered by existing tests like those in `OrderingSuite`
    
    ## Acknowledgement
    
    A major part of this PR - the refactoring work of `splitExpression()` - has been done by ueshin.
    
    Author: Liwei Lin <lw...@gmail.com>
    Author: Takuya UESHIN <ue...@happy-camper.st>
    Author: Takuya Ueshin <ue...@happy-camper.st>
    
    Closes #15480 from lw-lin/spec-ordering-64k-.

commit 32286ba68af03af6b9ff50d5dece050e5417307a
Author: Peng, Meng <pe...@intel.com>
Date:   2017-01-10T13:09:58Z

    [SPARK-17645][MLLIB][ML][FOLLOW-UP] document minor change
    
    ## What changes were proposed in this pull request?
    Add an FDR test case in ml/feature/ChiSqSelectorSuite.
    Improve some comments in the code.
    This is a follow-up PR for #15212.
    
    ## How was this patch tested?
    ut
    
    Author: Peng, Meng <pe...@intel.com>
    
    Closes #16434 from mpjlu/fdr_fwe_update.

commit 4e27578faa67c7a71a9b938aafbaf79bdbf36831
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-01-10T13:19:21Z

    [SPARK-18922][SQL][CORE][STREAMING][TESTS] Fix all identified tests failed due to path and resource-not-closed problems on Windows
    
    ## What changes were proposed in this pull request?
    
    This PR proposes to fix all the test failures identified by testing with AppVeyor.
    
    **Scala - aborted tests**
    
    ```
    WindowQuerySuite:
      Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.execution.WindowQuerySuite *** ABORTED *** (156 milliseconds)
       org.apache.spark.sql.AnalysisException: LOAD DATA input path does not exist: C:projectssparksqlhive   argetscala-2.11   est-classesdatafilespart_tiny.txt;
    
    OrcSourceSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.orc.OrcSourceSuite *** ABORTED *** (62 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
    ParquetMetastoreSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetMetastoreSuite *** ABORTED *** (4 seconds, 703 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
    ParquetSourceSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.sql.hive.ParquetSourceSuite *** ABORTED *** (3 seconds, 907 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark  arget mpspark-581a6575-454f-4f21-a516-a07f95266143;
    
    KafkaRDDSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaRDDSuite *** ABORTED *** (5 seconds, 212 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-4722304d-213e-4296-b556-951df1a46807
    
    DirectKafkaStreamSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.DirectKafkaStreamSuite *** ABORTED *** (7 seconds, 127 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d0d3eba7-4215-4e10-b40e-bb797e89338e
       at org.apache.spark.util.Utils$.deleteRecursively(Utils.scala:1010)
    
    ReliableKafkaStreamSuite
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.ReliableKafkaStreamSuite *** ABORTED *** (5 seconds, 498 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-d33e45a0-287e-4bed-acae-ca809a89d888
    
    KafkaStreamSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaStreamSuite *** ABORTED *** (2 seconds, 892 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-59c9d169-5a56-4519-9ef0-cefdbd3f2e6c
    
    KafkaClusterSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka.KafkaClusterSuite *** ABORTED *** (1 second, 690 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-3ef402b0-8689-4a60-85ae-e41e274f179d
    
    DirectKafkaStreamSuite:
     Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.DirectKafkaStreamSuite *** ABORTED *** (59 seconds, 626 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-426107da-68cf-4d94-b0d6-1f428f1c53f6
    
    KafkaRDDSuite:
    Exception encountered when attempting to run a suite with class name: org.apache.spark.streaming.kafka010.KafkaRDDSuite *** ABORTED *** (2 minutes, 6 seconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-b9ce7929-5dae-46ab-a0c4-9ef6f58fbc2
    ```
    
    **Java - failed tests**
    
    ```
    Test org.apache.spark.streaming.kafka.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-1cee32f4-4390-4321-82c9-e8616b3f0fb0, took 9.61 sec
    
    Test org.apache.spark.streaming.kafka.JavaKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-f42695dd-242e-4b07-847c-f299b8e4676e, took 11.797 sec
    
    Test org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-85c0d062-78cf-459c-a2dd-7973572101ce, took 1.581 sec
    
    Test org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite.testKafkaRDD failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-49eb6b5c-8366-47a6-83f2-80c443c48280, took 17.895 sec
    
    org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite.testKafkaStream failed: java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-898cf826-d636-4b1c-a61a-c12a364c02e7, took 8.858 sec
    ```
    
    **Scala - failed tests**
    
    ```
    PartitionProviderCompatibilitySuite:
     - insert overwrite partition of new datasource table overwrites just partition *** FAILED *** (828 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-bb6337b9-4f99-45ab-ad2c-a787ab965c09
    
     - SPARK-18635 special chars in partition values - partition management true *** FAILED *** (5 seconds, 360 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - SPARK-18635 special chars in partition values - partition management false *** FAILED *** (141 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    ```
    
    ```
    UtilsSuite:
     - reading offset bytes of a file (compressed) *** FAILED *** (0 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-ecb2b7d5-db8b-43a7-b268-1bf242b5a491
    
     - reading offset bytes across multiple files (compressed) *** FAILED *** (0 milliseconds)
       java.io.IOException: Failed to delete: C:\projects\spark\target\tmp\spark-25cc47a8-1faa-4da5-8862-cf174df63ce0
    ```
    
    ```
    StatisticsSuite:
     - MetastoreRelations fallback to HDFS for size estimation *** FAILED *** (110 milliseconds)
       org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'csv_table' not found in database 'default';
    ```
    
    ```
    SQLQuerySuite:
     - permanent UDTF *** FAILED *** (125 milliseconds)
       org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count_temp'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 24
    
     - describe functions - user defined functions *** FAILED *** (125 milliseconds)
       org.apache.spark.sql.AnalysisException: Undefined function: 'udtf_count'. This function is neither a registered temporary function nor a permanent function registered in the database 'default'.; line 1 pos 7
    
     - CTAS without serde with location *** FAILED *** (16 milliseconds)
       java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-ed673d73-edfc-404e-829e-2e2b9725d94e/c1
    
     - derived from Hive query file: drop_database_removes_partition_dirs.q *** FAILED *** (47 milliseconds)
       java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_database_removes_partition_dirs_table
    
     - derived from Hive query file: drop_table_removes_partition_dirs.q *** FAILED *** (0 milliseconds)
       java.lang.IllegalArgumentException: java.net.URISyntaxException: Relative path in absolute URI: file:C:projectsspark%09arget%09mpspark-d2ddf08e-699e-45be-9ebd-3dfe619680fe/drop_table_removes_partition_dirs_table2
    
     - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH *** FAILED *** (109 milliseconds)
       java.nio.file.InvalidPathException: Illegal char <:> at index 2: /C:/projects/spark/sql/hive/projectsspark	arget	mpspark-1a122f8c-dfb3-46c4-bab1-f30764baee0e/*part-r*
    ```
    
    ```
    HiveDDLSuite:
     - drop external tables in default database *** FAILED *** (16 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - add/drop partitions - external table *** FAILED *** (16 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - create/drop database - location without pre-created directory *** FAILED *** (16 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - create/drop database - location with pre-created directory *** FAILED *** (32 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - drop database containing tables - CASCADE *** FAILED *** (94 milliseconds)
       CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
    
     - drop an empty database - CASCADE *** FAILED *** (63 milliseconds)
       CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
    
     - drop database containing tables - RESTRICT *** FAILED *** (47 milliseconds)
       CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
    
     - drop an empty database - RESTRICT *** FAILED *** (47 milliseconds)
       CatalogDatabase(db1,,file:/C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be/db1.db,Map()) did not equal CatalogDatabase(db1,,file:C:/projects/spark/target/tmp/warehouse-d0665ee0-1e39-4805-b471-0b764f7838be\db1.db,Map()) (HiveDDLSuite.scala:675)
    
     - CREATE TABLE LIKE an external data source table *** FAILED *** (140 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-c5eba16d-07ae-4186-95bb-21c5811cf888;
    
     - CREATE TABLE LIKE an external Hive serde table *** FAILED *** (16 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - desc table for data source table - no user-defined schema *** FAILED *** (125 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e8bf5bf5-721a-4cbe-9d6	at scala.collection.immutable.List.foreach(List.scala:381)d-5543a8301c1d;
    ```
    
    ```
    MetastoreDataSourcesSuite
     - CTAS: persisted bucketed data source table *** FAILED *** (16 milliseconds)
       java.lang.IllegalArgumentException: Can not create a Path from an empty string
    ```
    
    ```
    ShowCreateTableSuite:
     - simple external hive table *** FAILED *** (0 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    ```
    
    ```
    PartitionedTablePerfStatsSuite:
     - hive table: partitioned pruned table reports only selected files *** FAILED *** (313 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: partitioned pruned table reports only selected files *** FAILED *** (219 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-311f45f8-d064-4023-a4bb-e28235bff64d;
    
     - hive table: lazy partition pruning reads only necessary partition data *** FAILED *** (203 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: lazy partition pruning reads only necessary partition data *** FAILED *** (187 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-fde874ca-66bd-4d0b-a40f-a043b65bf957;
    
     - hive table: lazy partition pruning with file status caching enabled *** FAILED *** (188 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: lazy partition pruning with file status caching enabled *** FAILED *** (187 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-e6d20183-dd68-4145-acbe-4a509849accd;
    
     - hive table: file status caching respects refresh table and refreshByPath *** FAILED *** (172 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: file status caching respects refresh table and refreshByPath *** FAILED *** (203 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-8b2c9651-2adf-4d58-874f-659007e21463;
    
     - hive table: file status cache respects size limit *** FAILED *** (219 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: file status cache respects size limit *** FAILED *** (171 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-7835ab57-cb48-4d2c-bb1d-b46d5a4c47e4;
    
     - datasource table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-20598d76-c004-42a7-8061-6c56f0eda5e2;
    
     - hive table: table setup does not scan filesystem *** FAILED *** (266 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - hive table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 281 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: num hive client calls does not scale with partition count *** FAILED *** (2 seconds, 422 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4cfed321-4d1d-4b48-8d34-5c169afff383;
    
     - hive table: files read and cached when filesource partition management is off *** FAILED *** (234 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    
     - datasource table: all partition data cached in memory when partition management is off *** FAILED *** (203 milliseconds)
       org.apache.spark.sql.AnalysisException: Path does not exist: file:/C:projectsspark	arget	mpspark-4bcc0398-15c9-4f6a-811e-12d40f3eec12;
    
     - SPARK-18700: table loaded only once even when resolved concurrently *** FAILED *** (1 second, 266 milliseconds)
       org.apache.spark.sql.AnalysisException: org.apache.hadoop.hive.ql.metadata.HiveException: MetaException(message:java.lang.IllegalArgumentException: Can not create a Path from an empty string);
    ```
    
    ```
    HiveSparkSubmitSuite:
     - temporary Hive UDF: define a UDF and use it *** FAILED *** (2 seconds, 94 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - permanent Hive UDF: define a UDF and use it *** FAILED *** (281 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - permanent Hive UDF: use a already defined permanent function *** FAILED *** (718 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-8368: includes jars passed in through --jars *** FAILED *** (3 seconds, 521 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-8020: set sql conf in spark conf *** FAILED *** (0 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-8489: MissingRequirementError during reflection *** FAILED *** (94 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-9757 Persist Parquet relation with decimal column *** FAILED *** (16 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-11009 fix wrong result of Window function in cluster mode *** FAILED *** (16 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-14244 fix window partition size attribute binding failure *** FAILED *** (78 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - set spark.sql.warehouse.dir *** FAILED *** (16 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - set hive.metastore.warehouse.dir *** FAILED *** (15 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-16901: set javax.jdo.option.ConnectionURL *** FAILED *** (16 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-18360: default table path of tables in default database should depend on the location of default database *** FAILED *** (15 milliseconds)
       java.io.IOException: Cannot run program "./bin/spark-submit" (in directory "C:\projects\spark"): CreateProcess error=2, The system cannot find the file specified
    ```
    
    ```
    UtilsSuite:
     - resolveURIs with multiple paths *** FAILED *** (0 milliseconds)
       ".../jar3,file:/C:/pi.py[%23]py.pi,file:/C:/path%..." did not equal ".../jar3,file:/C:/pi.py[#]py.pi,file:/C:/path%..." (UtilsSuite.scala:468)
    ```
    
    ```
    CheckpointSuite:
     - recovery with file input stream *** FAILED *** (10 seconds, 205 milliseconds)
       The code passed to eventually never returned normally. Attempted 660 times over 10.014272499999999 seconds. Last failure message: Unexpected internal error near index 1
       \
        ^. (CheckpointSuite.scala:680)
    ```
    
    ## How was this patch tested?
    
    Manually via AppVeyor as below:
    
    **Scala - aborted tests**
    
    ```
    WindowQuerySuite - all passed
    OrcSourceSuite:
    - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
      org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
    ParquetMetastoreSuite - all passed
    ParquetSourceSuite - all passed
    KafkaRDDSuite - all passed
    DirectKafkaStreamSuite - all passed
    ReliableKafkaStreamSuite - all passed
    KafkaStreamSuite - all passed
    KafkaClusterSuite - all passed
    DirectKafkaStreamSuite - all passed
    KafkaRDDSuite - all passed
    ```
    
    **Java - failed tests**
    
    ```
    org.apache.spark.streaming.kafka.JavaKafkaRDDSuite - all passed
    org.apache.spark.streaming.kafka.JavaDirectKafkaStreamSuite - all passed
    org.apache.spark.streaming.kafka.JavaKafkaStreamSuite - all passed
    org.apache.spark.streaming.kafka010.JavaDirectKafkaStreamSuite - all passed
    org.apache.spark.streaming.kafka010.JavaKafkaRDDSuite - all passed
    ```
    
    **Scala - failed tests**
    
    ```
    PartitionProviderCompatibilitySuite:
    - insert overwrite partition of new datasource table overwrites just partition (1 second, 953 milliseconds)
    - SPARK-18635 special chars in partition values - partition management true (6 seconds, 31 milliseconds)
    - SPARK-18635 special chars in partition values - partition management false (4 seconds, 578 milliseconds)
    ```
    
    ```
    UtilsSuite:
    - reading offset bytes of a file (compressed) (203 milliseconds)
    - reading offset bytes across multiple files (compressed) (0 milliseconds)
    ```
    
    ```
    StatisticsSuite:
    - MetastoreRelations fallback to HDFS for size estimation (94 milliseconds)
    ```
    
    ```
    SQLQuerySuite:
     - permanent UDTF (407 milliseconds)
     - describe functions - user defined functions (441 milliseconds)
     - CTAS without serde with location (2 seconds, 831 milliseconds)
     - derived from Hive query file: drop_database_removes_partition_dirs.q (734 milliseconds)
     - derived from Hive query file: drop_table_removes_partition_dirs.q (563 milliseconds)
     - SPARK-17796 Support wildcard character in filename for LOAD DATA LOCAL INPATH (453 milliseconds)
    ```
    
    ```
    HiveDDLSuite:
     - drop external tables in default database (3 seconds, 5 milliseconds)
     - add/drop partitions - external table (2 seconds, 750 milliseconds)
     - create/drop database - location without pre-created directory (500 milliseconds)
     - create/drop database - location with pre-created directory (407 milliseconds)
     - drop database containing tables - CASCADE (453 milliseconds)
     - drop an empty database - CASCADE (375 milliseconds)
     - drop database containing tables - RESTRICT (328 milliseconds)
     - drop an empty database - RESTRICT (391 milliseconds)
     - CREATE TABLE LIKE an external data source table (953 milliseconds)
     - CREATE TABLE LIKE an external Hive serde table (3 seconds, 782 milliseconds)
     - desc table for data source table - no user-defined schema (1 second, 150 milliseconds)
    ```
    
    ```
    MetastoreDataSourcesSuite
     - CTAS: persisted bucketed data source table (875 milliseconds)
    ```
    
    ```
    ShowCreateTableSuite:
     - simple external hive table (78 milliseconds)
    ```
    
    ```
    PartitionedTablePerfStatsSuite:
     - hive table: partitioned pruned table reports only selected files (1 second, 109 milliseconds)
    - datasource table: partitioned pruned table reports only selected files (860 milliseconds)
     - hive table: lazy partition pruning reads only necessary partition data (859 milliseconds)
     - datasource table: lazy partition pruning reads only necessary partition data (1 second, 219 milliseconds)
     - hive table: lazy partition pruning with file status caching enabled (875 milliseconds)
     - datasource table: lazy partition pruning with file status caching enabled (890 milliseconds)
     - hive table: file status caching respects refresh table and refreshByPath (922 milliseconds)
     - datasource table: file status caching respects refresh table and refreshByPath (640 milliseconds)
     - hive table: file status cache respects size limit (469 milliseconds)
     - datasource table: file status cache respects size limit (453 milliseconds)
     - datasource table: table setup does not scan filesystem (328 milliseconds)
     - hive table: table setup does not scan filesystem (313 milliseconds)
     - hive table: num hive client calls does not scale with partition count (5 seconds, 431 milliseconds)
     - datasource table: num hive client calls does not scale with partition count (4 seconds, 79 milliseconds)
     - hive table: files read and cached when filesource partition management is off (656 milliseconds)
     - datasource table: all partition data cached in memory when partition management is off (484 milliseconds)
     - SPARK-18700: table loaded only once even when resolved concurrently (2 seconds, 578 milliseconds)
    ```
    
    ```
    HiveSparkSubmitSuite:
     - temporary Hive UDF: define a UDF and use it (1 second, 745 milliseconds)
     - permanent Hive UDF: define a UDF and use it (406 milliseconds)
     - permanent Hive UDF: use a already defined permanent function (375 milliseconds)
     - SPARK-8368: includes jars passed in through --jars (391 milliseconds)
     - SPARK-8020: set sql conf in spark conf (156 milliseconds)
     - SPARK-8489: MissingRequirementError during reflection (187 milliseconds)
     - SPARK-9757 Persist Parquet relation with decimal column (157 milliseconds)
     - SPARK-11009 fix wrong result of Window function in cluster mode (156 milliseconds)
     - SPARK-14244 fix window partition size attribute binding failure (156 milliseconds)
     - set spark.sql.warehouse.dir (172 milliseconds)
     - set hive.metastore.warehouse.dir (156 milliseconds)
     - SPARK-16901: set javax.jdo.option.ConnectionURL (157 milliseconds)
     - SPARK-18360: default table path of tables in default database should depend on the location of default database (172 milliseconds)
    ```
    
    ```
    UtilsSuite:
     - resolveURIs with multiple paths (0 milliseconds)
    ```
    
    ```
    CheckpointSuite:
     - recovery with file input stream (4 seconds, 452 milliseconds)
    ```
    
    Note: after resolving the aborted tests, there is a test failure identified as below:
    
    ```
    OrcSourceSuite:
    - SPARK-18220: read Hive orc table with varchar column *** FAILED *** (4 seconds, 417 milliseconds)
      org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code -101 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. org.apache.hadoop.io.nativeio.NativeIO$Windows.access0(Ljava/lang/String;I)Z
      at org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$runHive$1.apply(HiveClientImpl.scala:625)
    ```
    
    This failure does not appear to be related to the problem addressed in this PR, so it is not fixed here.
    
    Author: hyukjinkwon <gu...@gmail.com>
    
    Closes #16451 from HyukjinKwon/all-path-resource-fixes.

commit 2cfd41ac02193aaf121afcddcb6383f4d075ea1e
Author: hyukjinkwon <gu...@gmail.com>
Date:   2017-01-10T13:22:35Z

    [SPARK-19117][TESTS] Skip the tests using script transformation on Windows
    
    ## What changes were proposed in this pull request?
    
    This PR proposes to skip the script transformation tests that fail on Windows due to the fixed (hard-coded) `/bin/bash` location.
    
    ```
    SQLQuerySuite:
     - script *** FAILED *** (553 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 1 times, most recent failure: Lost task 0.0 in stage 56.0 (TID 54, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - Star Expansion - script transform *** FAILED *** (2 seconds, 375 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 389.0 failed 1 times, most recent failure: Lost task 0.0 in stage 389.0 (TID 725, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - test script transform for stdout *** FAILED *** (2 seconds, 813 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 391.0 failed 1 times, most recent failure: Lost task 0.0 in stage 391.0 (TID 726, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - test script transform for stderr *** FAILED *** (2 seconds, 407 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 393.0 failed 1 times, most recent failure: Lost task 0.0 in stage 393.0 (TID 727, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - test script transform data type *** FAILED *** (171 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 395.0 failed 1 times, most recent failure: Lost task 0.0 in stage 395.0 (TID 728, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    ```
    
    ```
    HiveQuerySuite:
     - transform *** FAILED *** (359 milliseconds)
       Failed to execute query using catalyst:
       Error: Job aborted due to stage failure: Task 0 in stage 1347.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1347.0 (TID 2395, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - schema-less transform *** FAILED *** (344 milliseconds)
       Failed to execute query using catalyst:
       Error: Job aborted due to stage failure: Task 0 in stage 1348.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1348.0 (TID 2396, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - transform with custom field delimiter *** FAILED *** (296 milliseconds)
       Failed to execute query using catalyst:
       Error: Job aborted due to stage failure: Task 0 in stage 1349.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1349.0 (TID 2397, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - transform with custom field delimiter2 *** FAILED *** (297 milliseconds)
       Failed to execute query using catalyst:
       Error: Job aborted due to stage failure: Task 0 in stage 1350.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1350.0 (TID 2398, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - transform with custom field delimiter3 *** FAILED *** (312 milliseconds)
       Failed to execute query using catalyst:
       Error: Job aborted due to stage failure: Task 0 in stage 1351.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1351.0 (TID 2399, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - transform with SerDe2 *** FAILED *** (437 milliseconds)
       org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1355.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1355.0 (TID 2403, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    ```
    
    ```
    LogicalPlanToSQLSuite:
     - script transformation - schemaless *** FAILED *** (78 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1968.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1968.0 (TID 3932, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
      - script transformation - alias list *** FAILED *** (94 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1969.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1969.0 (TID 3933, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation - alias list with type *** FAILED *** (93 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1970.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1970.0 (TID 3934, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation - row format delimited clause with only one format property *** FAILED *** (78 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1971.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1971.0 (TID 3935, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation - row format delimited clause with multiple format properties *** FAILED *** (94 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1972.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1972.0 (TID 3936, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation - row format serde clauses with SERDEPROPERTIES *** FAILED *** (78 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1973.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1973.0 (TID 3937, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation - row format serde clauses without SERDEPROPERTIES *** FAILED *** (78 milliseconds)
       ...
       Cause: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1974.0 failed 1 times, most recent failure: Lost task 0.0 in stage 1974.0 (TID 3938, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    ```
    
    ```
    ScriptTransformationSuite:
     - cat without SerDe *** FAILED *** (156 milliseconds)
       ...
       Caused by: java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - cat with LazySimpleSerDe *** FAILED *** (63 milliseconds)
        ...
        org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2383.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2383.0 (TID 4819, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation should not swallow errors from upstream operators (no serde) *** FAILED *** (78 milliseconds)
        ...
        org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2384.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2384.0 (TID 4820, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - script transformation should not swallow errors from upstream operators (with serde) *** FAILED *** (47 milliseconds)
        ...
        org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2385.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2385.0 (TID 4821, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    
     - SPARK-14400 script transformation should fail for bad script command *** FAILED *** (47 milliseconds)
       "Job aborted due to stage failure: Task 0 in stage 2386.0 failed 1 times, most recent failure: Lost task 0.0 in stage 2386.0 (TID 4822, localhost, executor driver): java.io.IOException: Cannot run program "/bin/bash": CreateProcess error=2, The system cannot find the file specified
    ```
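
    As a hedged illustration (the suite and helper names below are made up, not the actual Spark test code), environment-dependent tests like these are typically skipped in ScalaTest with `assume`, which cancels the test (the `!!! CANCELED !!!` outcome shown further below) rather than failing it:

    ```
    import scala.sys.process._
    import scala.util.Try

    import org.scalatest.FunSuite

    class ScriptTransformSketchSuite extends FunSuite {

      // True only when /bin/bash can actually be executed on this machine.
      private def bashAvailable: Boolean =
        Try(Seq("/bin/bash", "-c", "true").!).toOption.contains(0)

      test("script transform") {
        // A failed assume throws TestCanceledException, so ScalaTest reports the
        // test as canceled instead of failed on platforms without /bin/bash.
        assume(bashAvailable, "/bin/bash is required; skipping on this platform")
        // ... run the actual script transformation query here ...
      }
    }
    ```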
    
    ## How was this patch tested?
    
    AppVeyor as below:
    
    ```
    SQLQuerySuite:
      - script !!! CANCELED !!! (63 milliseconds)
      - Star Expansion - script transform !!! CANCELED !!! (0 milliseconds)
      - test script transform for stdout !!! CANCELED !!! (0 milliseconds)
      - test script transform for stderr !!! CANCELED !!! (0 milliseconds)
      - test script transform data type !!! CANCELED !!! (0 milliseconds)
    ```
    
    ```
    HiveQuerySuite:
      - transform !!! CANCELED !!! (31 milliseconds)
      - schema-less transform !!! CANCELED !!! (0 milliseconds)
      - transform with custom field delimiter !!! CANCELED !!! (0 milliseconds)
      - transform with custom field delimiter2 !!! CANCELED !!! (0 milliseconds)
      - transform with custom field delimiter3 !!! CANCELED !!! (0 milliseconds)
      - transform with SerDe2 !!! CANCELED !!! (0 milliseconds)
    ```
    
    ```
    LogicalPlanToSQLSuite:
      - script transformation - schemaless !!! CANCELED !!! (78 milliseconds)
      - script transformation - alias list !!! CANCELED !!! (0 milliseconds)
      - script transformation - alias list with type !!! CANCELED !!! (0 milliseconds)
      - script transformation - row format delimited clause with only one format property !!! CANCELED !!! (15 milliseconds)
      - script transformation - row format delimited clause with multiple format properties !!! CANCELED !!! (0 milliseconds)
      - script transformation - row format serde clauses with SERDEPROPERTIES !!! CANCELED !!! (0 milliseconds)
      - script transformation - row format serde clauses without SERDEPROPERTIES !!! CANCELED !!! (0 milliseconds)
    ```
    
    ```
    ScriptTransformationSuite:
      - cat without SerDe !!! CANCELED !!! (62 milliseconds)
      - cat with LazySimpleSerDe !!! CANCELED !!! (0 milliseconds)
      - script transformation should not swallow errors from upstream operators (no serde) !!! CANCELED !!! (0 milliseconds)
      - script transformation should not swallow errors from upstream operators (with serde) !!! CANCELED !!! (0 milliseconds)
      - SPARK-14400 script transformation should fail for bad script command !!! CANCELED !!! (0 milliseconds)
    ```
    
    Jenkins tests
    
    Author: hyukjinkwon <gu...@gmail.com>
    
    Closes #16501 from HyukjinKwon/windows-bash.

commit a2c6adcc5d2702d2f0e9b239517353335e5f911e
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-10T13:27:55Z

    [SPARK-18857][SQL] Don't use `Iterator.duplicate` for `incrementalCollect` in Thrift Server
    
    ## What changes were proposed in this pull request?
    
    To support `FETCH_FIRST`, SPARK-16563 used Scala `Iterator.duplicate`. However,
    Scala `Iterator.duplicate` uses a **queue to buffer all items between both iterators**;
    this causes GC pressure and hangs for queries with a large number of rows. We should not use it,
    especially for `spark.sql.thriftServer.incrementalCollect`.
    
    https://github.com/scala/scala/blob/2.12.x/src/library/scala/collection/Iterator.scala#L1262-L1300
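
    As a minimal sketch in plain Scala (not the Thrift Server code), the two iterators returned by `duplicate` share an internal buffer, so everything consumed by one copy stays queued in memory until the other copy reads it:

    ```
    // The element count is arbitrary; imagine millions of result rows.
    val rows = Iterator.range(0, 10000000)

    val (first, second) = rows.duplicate

    // Draining `first` forces the shared buffer to retain all ten million
    // elements for `second`; this is what caused the GC pressure and hangs.
    first.foreach(_ => ())

    // Only as `second` is consumed is the buffered data released.
    println(second.length)
    ```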
    
    ## How was this patch tested?
    
    Pass the existing tests.
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16440 from dongjoon-hyun/SPARK-18857.

commit 3ef183a941d45b2f7ad167ea5133a93de0da5176
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-01-10T14:24:45Z

    [SPARK-19113][SS][TESTS] Set UncaughtExceptionHandler in onQueryStarted to ensure catching fatal errors during query initialization
    
    ## What changes were proposed in this pull request?
    
    StreamTest currently sets the `UncaughtExceptionHandler` after starting the query, so it may not be able to catch fatal errors thrown during query initialization. This PR uses the `onQueryStarted` callback to fix it.
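
    A hedged sketch of the idea (the listener body and variable names are illustrative, not the actual StreamTest code): register a `StreamingQueryListener` whose `onQueryStarted` installs the handler, so it is in place before the stream-execution thread runs the query:

    ```
    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // `spark` is assumed to be an existing SparkSession; in practice this would
    // be a field of the test class rather than a top-level variable.
    @volatile var streamDeathCause: Throwable = null

    val listener = new StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = {
        // Installed as soon as the query is reported started, so fatal errors
        // thrown while the query initializes are still captured.
        Thread.setDefaultUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
          override def uncaughtException(t: Thread, e: Throwable): Unit = {
            streamDeathCause = e
          }
        })
      }
      override def onQueryProgress(event: QueryProgressEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()
    }

    spark.streams.addListener(listener)
    ```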
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #16492 from zsxwing/SPARK-19113.

commit d5b1dc934a2482886c2c095de90e4c6a49ec42bd
Author: Dongjoon Hyun <do...@apache.org>
Date:   2017-01-10T18:49:44Z

    [SPARK-19137][SQL] Fix `withSQLConf` to reset `OptionalConfigEntry` correctly
    
    ## What changes were proposed in this pull request?
    
    `DataStreamReaderWriterSuite` creates test files in the source folder like the following. Interestingly, the root cause is that `withSQLConf` fails to reset an `OptionalConfigEntry` correctly. In other words, it resets the config to `Some(undefined)`.
    
    ```bash
    $ git status
    Untracked files:
      (use "git add <file>..." to include in what will be committed)
    
            sql/core/%253Cundefined%253E/
            sql/core/%3Cundefined%3E/
    ```
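
    A hedged sketch of the idea behind the fix (illustrative, not the actual test utility): when a key had no original value, restoring it must unset the key rather than write the stringified `Option` back.

    ```
    import org.apache.spark.sql.internal.SQLConf

    def withSQLConf[T](conf: SQLConf)(pairs: (String, String)*)(f: => T): T = {
      val keys = pairs.map(_._1)
      // Remember the raw string values; None means the key was not set at all.
      val originals = keys.map(k => if (conf.contains(k)) Some(conf.getConfString(k)) else None)
      pairs.foreach { case (k, v) => conf.setConfString(k, v) }
      try f finally {
        keys.zip(originals).foreach {
          case (k, Some(v)) => conf.setConfString(k, v)
          case (k, None)    => conf.unsetConf(k) // not setConfString(k, "<undefined>")
        }
      }
    }
    ```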
    
    ## How was this patch tested?
    
    Manual.
    ```
    build/sbt "project sql" test
    git status
    ```
    
    Author: Dongjoon Hyun <do...@apache.org>
    
    Closes #16522 from dongjoon-hyun/SPARK-19137.

commit 9bc3507e411b0ad9207e3053f80ac82f19b18f26
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-10T19:42:07Z

    [SPARK-19133][SPARKR][ML] fix glm for Gamma, clarify glm family supported
    
    ## What changes were proposed in this pull request?
    
    The list of glm families in R is longer than what Spark supports.
    
    ## How was this patch tested?
    
    manual
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16511 from felixcheung/rdocglmfamily.

commit 856bae6af64982ae0221948c58ff564887e54a70
Author: Sean Owen <so...@cloudera.com>
Date:   2017-01-10T20:40:21Z

    [SPARK-18997][CORE] Recommended upgrade libthrift to 0.9.3
    
    ## What changes were proposed in this pull request?
    
    Updates to libthrift 0.9.3 to address a CVE.
    
    ## How was this patch tested?
    
    Existing tests.
    
    Author: Sean Owen <so...@cloudera.com>
    
    Closes #16530 from srowen/SPARK-18997.

commit bc6c56e940fe93591a1e5ba45751f1b243b57e28
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2017-01-11T01:58:11Z

    [SPARK-19140][SS] Allow update mode for non-aggregation streaming queries
    
    ## What changes were proposed in this pull request?
    
    This PR allows update mode for non-aggregation streaming queries. It behaves the same as append mode if a query has no aggregations.
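
    A hedged sketch (the source and sink choices are illustrative) of a non-aggregation streaming query that this change newly allows to run in update mode:

    ```
    import org.apache.spark.sql.streaming.OutputMode

    // `spark` is assumed to be an existing SparkSession.
    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()

    val query = lines.select("value")   // no aggregation anywhere in the plan
      .writeStream
      .outputMode(OutputMode.Update())  // previously rejected; now behaves like append mode
      .format("console")
      .start()
    ```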
    
    ## How was this patch tested?
    
    Jenkins
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #16520 from zsxwing/update-without-agg.

commit 3b19c74e71fd6af18047747843e962b5401db4d9
Author: Wenchen Fan <we...@databricks.com>
Date:   2017-01-11T05:33:44Z

    [SPARK-19157][SQL] should be able to change spark.sql.runSQLOnFiles at runtime
    
    ## What changes were proposed in this pull request?
    
    The analyzer rule that supports querying files directly is added to `Analyzer.extendedResolutionRules` when the SparkSession is created, according to the `spark.sql.runSQLOnFiles` flag. If the flag is off when we create the `SparkSession`, this rule is not added and we cannot query files directly, even if we turn on the flag later.
    
    This PR fixes this bug by always adding that rule to `Analyzer.extendedResolutionRules`.
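
    A hedged illustration of the symptom (the path and session setup are made up):

    ```
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .master("local[*]")
      .config("spark.sql.runSQLOnFiles", "false")  // flag is off when the session is created
      .getOrCreate()

    // Turning the flag on afterwards...
    spark.conf.set("spark.sql.runSQLOnFiles", "true")

    // ...still failed before this fix, because the file-resolution rule was never
    // added to Analyzer.extendedResolutionRules for this session.
    spark.sql("SELECT * FROM parquet.`/tmp/some_table`").show()
    ```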
    
    ## How was this patch tested?
    
    new regression test
    
    Author: Wenchen Fan <we...@databricks.com>
    
    Closes #16531 from cloud-fan/sql-on-files.

commit a6155135690433988aa0cbf22f260f52a235e9f5
Author: wangzhenhua <wa...@huawei.com>
Date:   2017-01-11T06:34:44Z

    [SPARK-19149][SQL] Unify two sets of statistics in LogicalPlan
    
    ## What changes were proposed in this pull request?
    
    Currently we have two sets of statistics in LogicalPlan: simple stats and stats estimated by CBO. The computation logic and naming are quite confusing, so we need to unify these two sets of stats.
    
    ## How was this patch tested?
    
    Just modify existing tests.
    
    Author: wangzhenhua <wa...@huawei.com>
    Author: Zhenhua Wang <wz...@163.com>
    
    Closes #16529 from wzhfy/unifyStats.

commit 4239a1081ad96a503fbf9277e42b97422bb8af3e
Author: jerryshao <ss...@hortonworks.com>
Date:   2017-01-11T15:24:02Z

    [SPARK-19021][YARN] Generalize HDFSCredentialProvider to support non-HDFS security filesystems
    
    ## What changes were proposed in this pull request?

    Currently Spark can only get the token renewal interval from secure HDFS (hdfs://). If Spark runs with other secure filesystems like webHDFS (webhdfs://), wasb (wasb://), or ADLS, it will ignore these tokens and not get renewal intervals from them. This makes Spark unable to work with these secure clusters. So instead of only checking the HDFS token, we should generalize to support different `DelegationTokenIdentifier`s.
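
    A hedged sketch of the general direction (illustrative, not the actual credential provider code): instead of looking only for the HDFS token kind, iterate over whatever delegation tokens are present and work with their identifiers.

    ```
    import scala.collection.JavaConverters._

    import org.apache.hadoop.security.Credentials
    import org.apache.hadoop.security.token.delegation.AbstractDelegationTokenIdentifier

    // Collect delegation-token identifiers from any secure filesystem
    // (hdfs://, webhdfs://, wasb://, ADLS, ...), not just HDFS.
    def delegationTokenIdentifiers(creds: Credentials): Seq[AbstractDelegationTokenIdentifier] =
      creds.getAllTokens.asScala.toSeq.flatMap { token =>
        Option(token.decodeIdentifier()).collect {
          case id: AbstractDelegationTokenIdentifier => id
        }
      }
    ```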
    
    ## How was this patch tested?
    
    Manually verified in security cluster.
    
    Author: jerryshao <ss...@hortonworks.com>
    
    Closes #16432 from jerryshao/SPARK-19021.

commit d749c06677c2fd383733337f1c00f542da122b8d
Author: Felix Cheung <fe...@hotmail.com>
Date:   2017-01-11T16:29:09Z

    [SPARK-19130][SPARKR] Support setting literal value as column implicitly
    
    ## What changes were proposed in this pull request?
    
    ```
    df$foo <- 1
    ```
    
    instead of
    ```
    df$foo <- lit(1)
    ```
    
    ## How was this patch tested?
    
    unit tests
    
    Author: Felix Cheung <fe...@hotmail.com>
    
    Closes #16510 from felixcheung/rlitcol.

commit 3bc2eff8880a3ba8d4318118715ea1a47048e3de
Author: Bryan Cutler <cu...@gmail.com>
Date:   2017-01-11T19:57:38Z

    [SPARK-17568][CORE][DEPLOY] Add spark-submit option to override ivy settings used to resolve packages/artifacts
    
    ## What changes were proposed in this pull request?
    
    Adding an option in spark-submit to allow overriding the default IvySettings used to resolve artifacts as part of the Spark Packages functionality. This will allow all artifact resolution to go through a centrally managed repository, such as Nexus or Artifactory, where site admins can better approve and control what is used with Spark apps.

    This change restructures the creation of the IvySettings object in two distinct ways. First, if the `spark.ivy.settings` option is not defined, `buildIvySettings` will create a default settings instance, as before, with the defined repositories (Maven Central) included. Second, if the option is defined, the Ivy settings file will be loaded from the given path and only the repositories defined within it will be used for artifact resolution.

    ## How was this patch tested?

    Existing tests for the default behaviour; manual tests that load an ivysettings.xml file with local and Nexus repositories defined. Added a new test that loads a simple Ivy settings file with a local filesystem resolver.
    
    Author: Bryan Cutler <cu...@gmail.com>
    Author: Ian Hummel <ia...@themodernlife.net>
    
    Closes #15119 from BryanCutler/spark-custom-IvySettings.

commit 30a07071f099c0ebcf04c4df61f8d414dcbad7b5
Author: jiangxingbo <ji...@gmail.com>
Date:   2017-01-11T21:44:07Z

    [SPARK-18801][SQL] Support resolve a nested view
    
    ## What changes were proposed in this pull request?
    
    We should be able to resolve a nested view. The main advantage is that if you update an underlying view, the current view also gets updated.
    The new approach should be compatible with older versions of Spark/Hive, which means:
    1. The new approach should be able to resolve the views that were created by older versions of Spark/Hive;
    2. The new approach should be able to resolve the views that are currently supported by Spark SQL.
    
    The new approach mainly brings in the following changes:
    1. Add a new operator called `View` to keep track of the CatalogTable that describes the view, and the output attributes as well as the child of the view;
    2. Update the `ResolveRelations` rule to resolve the relations and views, note that a nested view should be resolved correctly;
    3. Add `viewDefaultDatabase` variable to `CatalogTable` to keep track of the default database name used to resolve a view, if the `CatalogTable` is not a view, then the variable should be `None`;
    4. Add `AnalysisContext` to enable us to still support a view created with CTE/Windows query;
    5. Enable view support without enabling Hive support (i.e., enableHiveSupport);
    6. Fix a weird behavior: the result of a view query may have a different schema if the referenced table has been changed. After this PR, we try to cast the child output attributes to those of the view schema, and throw an AnalysisException if the cast is not allowed.
    
    Note this is compatible with views defined by older versions of Spark (before 2.2), which have an empty `defaultDatabase` and have the database part defined for all the relations in `viewText`.
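
    A hedged example of the scenario this enables (the table and view names are made up):

    ```
    // `spark` is assumed to be an existing SparkSession; Hive support is not required.
    spark.sql("CREATE TABLE base_tbl (id INT) USING parquet")
    spark.sql("CREATE VIEW level1_view AS SELECT id FROM base_tbl")
    spark.sql("CREATE VIEW level2_view AS SELECT id FROM level1_view")  // a view on top of a view

    // Resolving level2_view must also resolve the nested level1_view; each level is
    // wrapped in a View node that carries its default database and expected schema.
    spark.sql("SELECT * FROM level2_view").show()
    ```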
    
    ## How was this patch tested?
    1. Add new tests in `SessionCatalogSuite` to test the function `lookupRelation`;
    2. Add new test case in `SQLViewSuite` to test resolve a nested view.
    
    Author: jiangxingbo <ji...@gmail.com>
    
    Closes #16233 from jiangxb1987/resolve-view.

commit 66fe819ada6435f3a351c2d257e73b8e6f6085cd
Author: Reynold Xin <rx...@databricks.com>
Date:   2017-01-11T22:25:36Z

    [SPARK-19149][SQL] Follow-up: simplify cache implementation.
    
    ## What changes were proposed in this pull request?
    This patch simplifies slightly the logical plan statistics cache implementation, as discussed in https://github.com/apache/spark/pull/16529
    
    ## How was this patch tested?
    N/A - this has no behavior change.
    
    Author: Reynold Xin <rx...@databricks.com>
    
    Closes #16544 from rxin/SPARK-19149.

commit 43fa21b3e62ee108bcecb74398f431f08c6b625c
Author: wangzhenhua <wa...@huawei.com>
Date:   2017-01-11T23:00:58Z

    [SPARK-19132][SQL] Add test cases for row size estimation and aggregate estimation
    
    ## What changes were proposed in this pull request?
    
    In this PR, we add more test cases for project and aggregate estimation.
    
    ## How was this patch tested?
    
    Add test cases.
    
    Author: wangzhenhua <wa...@huawei.com>
    
    Closes #16551 from wzhfy/addTests.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #16890: when colum is use alias ,the order by result is w...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/16890



[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16890
  
    Could you please close this and ask this question on the Spark user mailing list?



[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/16890
  
    @muyannian Could you click the "Close pull request" button below?



[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong

Posted by srowen <gi...@git.apache.org>.
Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/16890
  
    @muyannian Close this please



[GitHub] spark issue #16890: when colum is use alias ,the order by result is wrong

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/16890
  
    Can one of the admins verify this patch?

