Posted to reviews@spark.apache.org by pprado <gi...@git.apache.org> on 2016/05/13 18:52:15 UTC

[GitHub] spark pull request: Problem select empty ORC table

GitHub user pprado opened a pull request:

    https://github.com/apache/spark/pull/13103

    Problem select empty ORC table

    ## Error when selecting an empty ORC table
    
    ```
    [pprado@hadoop-m ~]$ beeline -u jdbc:hive2://
    WARNING: Use "yarn jar" to launch YARN applications.
    Connecting to jdbc:hive2://
    Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
    Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
    Transaction isolation: TRANSACTION_REPEATABLE_READ
    Beeline version 1.2.1000.2.4.2.0-258 by Apache Hive
    ```
    
    On beeline => `create table my_test (id int, name String) stored as orc;`
    On beeline => `select * from my_test;`
    
    ```
    16/05/13 18:18:57 [main]: ERROR hdfs.KeyProviderCache: Could not find uri with key [dfs.encryption.key.provider.uri] to create a keyProvider !!
    OK
    +-------------+---------------+--+
    | my_test.id  | my_test.name  |
    +-------------+---------------+--+
    +-------------+---------------+--+
    No rows selected (1.227 seconds)
    ```
    
    Hive is OK!
    
    Now, when I execute PySpark:
    
    ```
    Welcome to
        SPARK   version 1.6.1
    
    Using Python version 2.6.6 (r266:84292, Jul 23 2015 15:22:56)
    SparkContext available as sc, HiveContext available as sqlContext.
    ```
    
    PySpark => `sqlContext.sql("select * from my_test")`
    
    ```
    16/05/13 18:33:41 INFO ParseDriver: Parsing command: select * from my_test
    16/05/13 18:33:41 INFO ParseDriver: Parse Completed
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/context.py", line 580, in sql
        return DataFrame(self._ssql_ctx.sql(sqlQuery), self)
      File "/usr/hdp/2.4.2.0-258/spark/python/lib/py4j-0.9-src.zip/py4j/java_gateway.py", line 813, in __call__
      File "/usr/hdp/2.4.2.0-258/spark/python/pyspark/sql/utils.py", line 53, in deco
        raise IllegalArgumentException(s.split(': ', 1)[1], stackTrace)
    pyspark.sql.utils.IllegalArgumentException: u'orcFileOperator: path hdfs://hadoop-m.c.sva-0001.internal:8020/apps/hive/warehouse/my_test does not have valid orc files matching the pattern'
    ```
    
    When I create a Parquet table, everything works fine; I do not have this problem.
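    
    As an application-side stopgap (my sketch, not a fix for Spark itself): the failure surfaces as the `IllegalArgumentException` shown in the traceback above, so a caller can trap that specific case until the table directory has files. The helper name and the matched message substring are illustrative assumptions.
    
    ```python
    from pyspark.sql.utils import IllegalArgumentException  # the class raised in the traceback above
    
    def select_or_none(sql_context, query):
        """Run a query, returning None while the ORC table directory is still empty."""
        try:
            return sql_context.sql(query)
        except IllegalArgumentException as e:
            if "does not have valid orc files" in str(e):
                return None  # table exists, but no ORC files have been written yet
            raise
    
    df = select_or_none(sqlContext, "select * from my_test")
    ```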
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/apache/spark branch-1.6

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13103.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13103
    
----
commit bd33d4ee847973289a58032df35375f03e9f9865
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2015-12-18T22:05:06Z

    [SPARK-12404][SQL] Ensure objects passed to StaticInvoke is Serializable
    
    Now `StaticInvoke` receives `Any` as an object, and `StaticInvoke` itself can be serialized, but sometimes the object passed in is not serializable.
    
    For example, the following code raises an exception because `RowEncoder#extractorsFor`, invoked indirectly, creates a `StaticInvoke`.
    
    ```
    case class TimestampContainer(timestamp: java.sql.Timestamp)
    val rdd = sc.parallelize(1 to 2).map(_ => TimestampContainer(System.currentTimeMillis))
    val df = rdd.toDF
    val ds = df.as[TimestampContainer]
    val rdd2 = ds.rdd                                 <----------------- invokes extractorsFor indirectly
    ```
    
    I'll add test cases.
    
    Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
    Author: Michael Armbrust <mi...@databricks.com>
    
    Closes #10357 from sarutak/SPARK-12404.
    
    (cherry picked from commit 6eba655259d2bcea27d0147b37d5d1e476e85422)
    Signed-off-by: Michael Armbrust <mi...@databricks.com>

commit eca401ee5d3ae683cbee531c1f8bc981f9603fc8
Author: Burak Yavuz <br...@gmail.com>
Date:   2015-12-18T23:24:41Z

    [SPARK-11985][STREAMING][KINESIS][DOCS] Update Kinesis docs
    
     - Provide example on `message handler`
     - Provide bit on KPL record de-aggregation
     - Fix typos
    
    Author: Burak Yavuz <br...@gmail.com>
    
    Closes #9970 from brkyvz/kinesis-docs.
    
    (cherry picked from commit 2377b707f25449f4557bf048bb384c743d9008e5)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit d6a519ff20652494ac3aeba477526ad1fd810a3c
Author: Yanbo Liang <yb...@gmail.com>
Date:   2015-12-19T08:34:30Z

    [SQL] Fix mistaken doc of join type for dataframe.join
    
    Fix the mistaken doc of the join type for ```dataframe.join```.
    
    Author: Yanbo Liang <yb...@gmail.com>
    
    Closes #10378 from yanboliang/leftsemi.
    
    (cherry picked from commit a073a73a561e78c734119c8b764d37a4e5e70da4)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit c754a08793458813d608e48ad1b158da770cd992
Author: pshearer <ps...@massmutual.com>
Date:   2015-12-21T22:04:59Z

    Doc typo: ltrim = trim from left end, not right
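    
    For reference, a quick PySpark check of the corrected claim (illustrative only, not part of the commit; assumes the `sqlContext` of a PySpark shell):
    
    ```python
    from pyspark.sql import Row, functions as F
    
    df = sqlContext.createDataFrame([Row(s="  padded  ")])
    # ltrim strips whitespace from the left end only; trailing spaces survive.
    df.select(F.ltrim(df.s).alias("t")).first().t  # u'padded  '
    ```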
    
    Author: pshearer <ps...@massmutual.com>
    
    Closes #10414 from pshearer/patch-1.
    
    (cherry picked from commit fc6dbcc7038c2b030ef6a2dc8be5848499ccee1c)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit ca3998512dd7801379c96c9399d3d053ab7472cd
Author: Andrew Or <an...@databricks.com>
Date:   2015-12-21T22:09:04Z

    [SPARK-12466] Fix harmless NPE in tests
    
    ```
    [info] ReplayListenerSuite:
    [info] - Simple replay (58 milliseconds)
    java.lang.NullPointerException
    	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:982)
    	at org.apache.spark.deploy.master.Master$$anonfun$asyncRebuildSparkUI$1.applyOrElse(Master.scala:980)
    ```
    https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-Master-SBT/4316/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/consoleFull
    
    This was introduced in #10284. It's harmless because the NPE is caused by a race that occurs mainly in `local-cluster` tests (and doesn't actually fail the tests).
    
    Tested locally to verify that the NPE is gone.
    
    Author: Andrew Or <an...@databricks.com>
    
    Closes #10417 from andrewor14/fix-harmless-npe.
    
    (cherry picked from commit d655d37ddf59d7fb6db529324ac8044d53b2622a)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit 4062cda3087ae42c6c3cb24508fc1d3a931accdf
Author: Patrick Wendell <pw...@gmail.com>
Date:   2015-12-22T01:50:29Z

    Preparing Spark release v1.6.0-rc4

commit 5b19e7cfded0e2e41b6f427b4c3cfc3f06f85466
Author: Patrick Wendell <pw...@gmail.com>
Date:   2015-12-22T01:50:36Z

    Preparing development version 1.6.0-SNAPSHOT

commit 309ef355fc511b70765983358d5c92b5f1a26bce
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-12-22T06:28:18Z

    [MINOR] Fix typos in JavaStreamingContext
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #10424 from zsxwing/typo.
    
    (cherry picked from commit 93da8565fea42d8ac978df411daced4a9ea3a9c8)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 0f905d7df43b20d9335ec880b134d8d4f962c297
Author: Josh Rosen <jo...@databricks.com>
Date:   2015-12-22T07:12:05Z

    [SPARK-11823][SQL] Fix flaky JDBC cancellation test in HiveThriftBinaryServerSuite
    
    This patch fixes a flaky "test jdbc cancel" test in HiveThriftBinaryServerSuite. This test is prone to a race condition which causes it to block indefinitely while waiting for an extremely slow query to complete, which caused many Jenkins builds to time out.
    
    For more background, see my comments on #6207 (the PR which introduced this test).
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #10425 from JoshRosen/SPARK-11823.
    
    (cherry picked from commit 2235cd44407e3b6b401fb84a2096ade042c51d36)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit 94fb5e870403e19feca8faf7d98bba6d14f7a362
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-12-22T23:33:30Z

    [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #10439 from zsxwing/kafka-message-handler-doc.
    
    (cherry picked from commit 93db50d1c2ff97e6eb9200a995e4601f752968ae)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit 942c0577b201a08fffdcaf71e4d1867266ae309e
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-12-23T00:39:10Z

    [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming
    
    This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support checkpointing.
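    
    The pattern those examples revolve around, as a minimal Python sketch (names here are illustrative, not the ones in the PR): accumulators and broadcast variables are not saved in checkpoints, so the driver must re-create them lazily after recovery.
    
    ```python
    # Lazily-initialized singletons: re-created on first use after the driver
    # restarts from a checkpoint, since accumulators/broadcasts are not checkpointed.
    def get_dropped_words_counter(spark_context):
        if "dropped_words_counter" not in globals():
            globals()["dropped_words_counter"] = spark_context.accumulator(0)
        return globals()["dropped_words_counter"]
    
    def get_excluded_words(spark_context):
        if "excluded_words" not in globals():
            globals()["excluded_words"] = spark_context.broadcast(["a", "b", "c"])
        return globals()["excluded_words"]
    
    # Inside foreachRDD, fetch them through rdd.context so this also works
    # after recovery:  counter = get_dropped_words_counter(rdd.context)
    ```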
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #10385 from zsxwing/accumulator-broadcast-example.
    
    (cherry picked from commit 20591afd790799327f99485c5a969ed7412eca45)
    Signed-off-by: Tathagata Das <ta...@gmail.com>

commit c6c9bf99af0ee0559248ad772460e9b2efde5861
Author: pierre-borckmans <pi...@realimpactanalytics.com>
Date:   2015-12-23T07:00:42Z

    [SPARK-12477][SQL] - Tungsten projection fails for null values in array fields
    
    Accessing null elements in an array field fails when Tungsten is enabled.
    It works in Spark 1.3.1, and in Spark > 1.5 with Tungsten disabled.
    
    This PR solves this by checking if the accessed element in the array field is null, in the generated code.
    
    Example:
    ```
    // Array of String
    case class AS( as: Seq[String] )
    val dfAS = sc.parallelize( Seq( AS ( Seq("a",null,"b") ) ) ).toDF
    dfAS.registerTempTable("T_AS")
    for (i <- 0 to 2) { println(i + " = " + sqlContext.sql(s"select as[$i] from T_AS").collect.mkString(","))}
    ```
    
    With Tungsten disabled:
    ```
    0 = [a]
    1 = [null]
    2 = [b]
    ```
    
    With Tungsten enabled:
    ```
    0 = [a]
    15/12/22 09:32:50 ERROR Executor: Exception in task 7.0 in stage 1.0 (TID 15)
    java.lang.NullPointerException
    	at org.apache.spark.sql.catalyst.expressions.UnsafeRowWriters$UTF8StringWriter.getSize(UnsafeRowWriters.java:90)
    	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:90)
    	at org.apache.spark.sql.execution.TungstenProject$$anonfun$3$$anonfun$apply$3.apply(basicOperators.scala:88)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    	at scala.collection.Iterator$class.foreach(Iterator.scala:727)
    	at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
    ```
    
    Author: pierre-borckmans <pi...@realimpactanalytics.com>
    
    Closes #10429 from pierre-borckmans/SPARK-12477_Tungsten-Projection-Null-Element-In-Array.
    
    (cherry picked from commit 43b2a6390087b7ce262a54dc8ab8dd825db62e21)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit 5987b1658b837400691160c38ba6eedc47274ee4
Author: Adrian Bridgett <ad...@smop.co.uk>
Date:   2015-12-24T00:00:03Z

    [SPARK-12499][BUILD] don't force MAVEN_OPTS
    
    Allow the user to override MAVEN_OPTS (2GB wasn't sufficient for me).
    
    Author: Adrian Bridgett <ad...@smop.co.uk>
    
    Closes #10448 from abridgett/feature/do_not_force_maven_opts.
    
    (cherry picked from commit ead6abf7e7fc14b451214951d4991d497aa65e63)
    Signed-off-by: Josh Rosen <jo...@databricks.com>

commit b49856ae5983aca8ed7df2f478fc5f399ec34ce8
Author: Nong Li <no...@databricks.com>
Date:   2015-12-19T00:05:18Z

    [SPARK-12411][CORE] Decrease executor heartbeat timeout to match heartbeat interval
    
    Previously, the rpc timeout was the default network timeout, which is the same value
    the driver uses to determine dead executors. This means if there is a network issue,
    the executor is determined dead after one heartbeat attempt. There is a separate config
    for the heartbeat interval which is a better value to use for the heartbeat RPC. With
    this change, the executor will make multiple heartbeat attempts even with RPC issues.
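    
    Both values involved are ordinary user-settable configs; a brief sketch of tuning them (the two config keys are real Spark settings, the values shown are arbitrary examples):
    
    ```python
    from pyspark import SparkConf, SparkContext
    
    conf = (SparkConf()
            .set("spark.executor.heartbeatInterval", "10s")  # heartbeat interval (and, with this change, the heartbeat RPC timeout)
            .set("spark.network.timeout", "120s"))           # timeout the driver uses to declare executors dead
    sc = SparkContext(conf=conf)
    ```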
    
    Author: Nong Li <no...@databricks.com>
    
    Closes #10365 from nongli/spark-12411.

commit 4dd8712c1b64a64da0fa0413e2c9be68ad0ddc17
Author: Kazuaki Ishizaki <is...@jp.ibm.com>
Date:   2015-12-24T12:27:55Z

    [SPARK-12502][BUILD][PYTHON] Script /dev/run-tests fails when IBM Java is used
    
    Fix an exception under the IBM JDK by removing the update field from the JavaVersion tuple, because the IBM JDK does not report update information ('_xx').
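    
    In other words, `java -version` strings carry an `_update` suffix on Oracle/OpenJDK but not on the IBM JDK. A sketch of version parsing that tolerates the missing field (illustrative only, not the actual run-tests code):
    
    ```python
    import re
    
    def parse_java_version(version_string):
        # "1.8.0_51" (Oracle/OpenJDK) vs. "1.8.0" (IBM JDK: no update part)
        match = re.match(r"(\d+)\.(\d+)\.(\d+)(?:_\d+)?", version_string)
        major, minor, patch = match.groups()
        return int(major), int(minor), int(patch)  # update field intentionally dropped
    
    print(parse_java_version("1.8.0_51"))  # (1, 8, 0)
    print(parse_java_version("1.8.0"))     # (1, 8, 0) -- no crash on IBM JDK output
    ```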
    
    Author: Kazuaki Ishizaki <is...@jp.ibm.com>
    
    Closes #10463 from kiszk/SPARK-12502.
    
    (cherry picked from commit 9e85bb71ad2d7d3a9da0cb8853f3216d37e6ff47)
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.co.jp>

commit 865dd8bccfc994310ad6664151d469043706ef3b
Author: CK50 <ch...@oracle.com>
Date:   2015-12-24T13:39:11Z

    [SPARK-12010][SQL] Spark JDBC requires support for column-name-free INSERT syntax
    
    In the past Spark JDBC write only worked with technologies which support the following INSERT statement syntax (JdbcUtils.scala: insertStatement()):
    
    INSERT INTO $table VALUES ( ?, ?, ..., ? )
    
    But some technologies require a list of column names:
    
    INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
    
    This was blocking the use of e.g. the Progress JDBC Driver for Cassandra.
    
    Another limitation is that the first syntax relies on the dataframe field ordering matching that of the target table. This works fine as long as the target table has been created by writer.jdbc().
    
    If the target table contains more columns (i.e. was not created by writer.jdbc()), then the insert fails due to a mismatch in the number of columns or their data types.
    
    This PR switches to the recommended second INSERT syntax. Column names are taken from the dataframe field names.
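    
    A sketch of generating the second syntax from dataframe field names (illustrative Python; the real change is in the `JdbcUtils.scala: insertStatement()` mentioned above):
    
    ```python
    def insert_statement(table, column_names):
        # INSERT INTO $table ( $colNameList ) VALUES ( ?, ?, ..., ? )
        columns = ", ".join(column_names)
        placeholders = ", ".join("?" for _ in column_names)
        return "INSERT INTO %s (%s) VALUES (%s)" % (table, columns, placeholders)
    
    print(insert_statement("people", ["id", "name"]))
    # INSERT INTO people (id, name) VALUES (?, ?)
    ```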
    
    Author: CK50 <ch...@oracle.com>
    
    Closes #10380 from CK50/master-SPARK-12010-2.
    
    (cherry picked from commit 502476e45c314a1229b3bce1c61f5cb94a9fc04b)
    Signed-off-by: Sean Owen <so...@cloudera.com>

commit b8da77ef776ab9cdc130a70293d75e7bdcdf95b0
Author: gatorsmile <ga...@gmail.com>
Date:   2015-12-28T07:18:48Z

    [SPARK-12520] [PYSPARK] Correct Descriptions and Add Use Cases in Equi-Join
    
    After reading the JIRA https://issues.apache.org/jira/browse/SPARK-12520, I double checked the code.
    
    For example, users can do the Equi-Join like
      ```df.join(df2, 'name', 'outer').select('name', 'height').collect()```
    - There is a bug in 1.4 and 1.5: the code simply ignores the third parameter (join type) that users pass, so the join type actually used is `Inner`, even if the user specifies another type (e.g., `Outer`).
    - After PR https://github.com/apache/spark/pull/8600, 1.6 no longer has this issue, but the description was not updated.
    
    I plan to submit another PR to fix 1.5 and to issue an error message if users specify a non-inner join type when using Equi-Join.
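    
    A toy PySpark illustration of the call being documented (assumes a `sqlContext`; per the fix referenced above, 1.6 honors the third parameter):
    
    ```python
    df = sqlContext.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])
    df2 = sqlContext.createDataFrame([(2, 150), (3, 160)], ["id", "height"])
    
    # Equi-join on the shared column name; the third parameter is the join type.
    # 1.4/1.5 silently ran an inner join here; 1.6 performs the requested outer join.
    df.join(df2, "id", "outer").count()  # 3 rows: ids 1, 2 and 3
    ```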
    
    Author: gatorsmile <ga...@gmail.com>
    
    Closes #10477 from gatorsmile/pyOuterJoin.

commit 1fbcb6e7be9cd9fa5255837cfc5358f2283f4aaf
Author: Yaron Weinsberg <wy...@gmail.com>
Date:   2015-12-28T20:19:11Z

    [SPARK-12517] add default RDD name for one created via sc.textFile
    
    The feature was first added in commit 7b877b27053bfb7092e250e01a3b887e1b50a109 but was later removed (probably by mistake) in commit fc8b58195afa67fbb75b4c8303e022f703cbf007.
    This change sets the default name of RDDs created via sc.textFile(...) to the path argument.
    
    Here is the symptom:
    
    * Using spark-1.5.2-bin-hadoop2.6:
    
    ```
    scala> sc.textFile("/home/root/.bashrc").name
    res5: String = null
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res6: String = /home/root/.bashrc
    ```
    
    * While using Spark 1.3.1:
    
    ```
    scala> sc.textFile("/home/root/.bashrc").name
    res0: String = /home/root/.bashrc
    
    scala> sc.binaryFiles("/home/root/.bashrc").name
    res1: String = /home/root/.bashrc
    ```
    
    Author: Yaron Weinsberg <wy...@gmail.com>
    Author: yaron <ya...@il.ibm.com>
    
    Closes #10456 from wyaron/master.
    
    (cherry picked from commit 73b70f076d4e22396b7e145f2ce5974fbf788048)
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.co.jp>

commit 7c7d76f34c0e09aae12f03e7c2922d4eb50d1830
Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
Date:   2015-12-28T20:33:19Z

    [SPARK-12424][ML] The implementation of ParamMap#filter is wrong.
    
    ParamMap#filter uses `mutable.Map#filterKeys`. The return type of `filterKeys` is collection.Map, not mutable.Map, but the result is cast to mutable.Map using `asInstanceOf`, so we get a `ClassCastException`.
    Also, the return type of Map#filterKeys is not Serializable. It's the issue of Scala (https://issues.scala-lang.org/browse/SI-6654).
    
    Author: Kousuke Saruta <sa...@oss.nttdata.co.jp>
    
    Closes #10381 from sarutak/SPARK-12424.
    
    (cherry picked from commit 07165ca06fe0866677525f85fec25e4dbd336674)
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.co.jp>

commit a9c52d4954aa445ab751b38ddbfd8fb6f84d7c14
Author: Daoyuan Wang <da...@intel.com>
Date:   2015-12-28T22:02:30Z

    [SPARK-12222][CORE] Deserialize RoaringBitmap using Kryo serializer throw Buffer underflow exception
    
    Since we only need to implement `def skipBytes(n: Int)`,
    code in #10213 could be simplified.
    davies scwf
    
    Author: Daoyuan Wang <da...@intel.com>
    
    Closes #10253 from adrian-wang/kryo.
    
    (cherry picked from commit a6d385322e7dfaff600465fa5302010a5f122c6b)
    Signed-off-by: Kousuke Saruta <sa...@oss.nttdata.co.jp>

commit fd202485ace613d9930d0ede48ba8a65920004db
Author: Shixiong Zhu <sh...@databricks.com>
Date:   2015-12-28T23:01:51Z

    [SPARK-12489][CORE][SQL][MLIB] Fix minor issues found by FindBugs
    
    Include the following changes:
    
    1. Close `java.sql.Statement`
    2. Fix incorrect `asInstanceOf`.
    3. Remove unnecessary `synchronized` and `ReentrantLock`.
    
    Author: Shixiong Zhu <sh...@databricks.com>
    
    Closes #10440 from zsxwing/findbugs.
    
    (cherry picked from commit 710b41172958a0b3a2b70c48821aefc81893731b)
    Signed-off-by: Shixiong Zhu <sh...@databricks.com>

commit 85a871818ee1134deb29387c78c6ce21eb6d2acb
Author: Takeshi YAMAMURO <li...@gmail.com>
Date:   2015-12-29T05:28:32Z

    [SPARK-11394][SQL] Throw IllegalArgumentException for unsupported types in postgresql
    
    If a DataFrame has BYTE types, it throws an exception:
    org.postgresql.util.PSQLException: ERROR: type "byte" does not exist
    
    Author: Takeshi YAMAMURO <li...@gmail.com>
    
    Closes #9350 from maropu/FixBugInPostgreJdbc.
    
    (cherry picked from commit 73862a1eb9744c3c32458c9c6f6431c23783786a)
    Signed-off-by: Yin Huai <yh...@databricks.com>

commit c069ffc2b13879f471e6d888116f45f6a8902236
Author: Forest Fang <fo...@outlook.com>
Date:   2015-12-29T07:15:24Z

    [SPARK-12526][SPARKR] `ifelse`, `when`, `otherwise` unable to take Column as value
    
    `ifelse`, `when`, and `otherwise` are unable to take a `Column`-typed S4 object as a value.
    
    For example:
    ```r
    ifelse(lit(1) == lit(1), lit(2), lit(3))
    ifelse(df$mpg > 0, df$mpg, 0)
    ```
    will both fail with
    ```r
    attempt to replicate an object of type 'environment'
    ```
    
    The PR replaces `ifelse` calls with `if ... else ...` inside the function implementations to avoid the attempt to vectorize (i.e. `rep()`). It remains to be discussed whether we should instead support vectorization in these functions for consistency, because `ifelse` in base R is vectorized, but I cannot foresee any scenarios where these functions would need to be vectorized in SparkR.
    
    For reference, added test cases which trigger failures:
    ```r
    . Error: when(), otherwise() and ifelse() with column on a DataFrame ----------
    error in evaluating the argument 'x' in selecting a method for function 'collect':
      error in evaluating the argument 'col' in selecting a method for function 'select':
      attempt to replicate an object of type 'environment'
    Calls: when -> when -> ifelse -> ifelse
    
    1: withCallingHandlers(eval(code, new_test_environment), error = capture_calls, message = function(c) invokeRestart("muffleMessage"))
    2: eval(code, new_test_environment)
    3: eval(expr, envir, enclos)
    4: expect_equal(collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))[, 1], c(NA, 1)) at test_sparkSQL.R:1126
    5: expect_that(object, equals(expected, label = expected.label, ...), info = info, label = label)
    6: condition(object)
    7: compare(actual, expected, ...)
    8: collect(select(df, when(df$a > 1 & df$b > 2, lit(1))))
    Error: Test failures
    Execution halted
    ```
    
    Author: Forest Fang <fo...@outlook.com>
    
    Closes #10481 from saurfang/spark-12526.
    
    (cherry picked from commit d80cc90b5545cff82cd9b340f12d01eafc9ca524)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit 8dc65497152f2c8949b08fddad853d31c4bd9ae5
Author: Holden Karau <ho...@us.ibm.com>
Date:   2015-12-30T19:14:47Z

    [SPARK-12300] [SQL] [PYSPARK] fix schema inference on local collections
    
    Current schema inference for local Python collections halts as soon as there are no NullTypes left. This differs from specifying a sampling ratio of 1.0 on a distributed collection, and can result in incomplete schema information.
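    
    For contrast, the distributed path mentioned above (a small illustration; assumes the `sc` and `sqlContext` of a PySpark shell):
    
    ```python
    from pyspark.sql import Row
    
    # "age" is None in the first row; with samplingRatio=1.0 every row is scanned,
    # so the later integer refines the column past the initial NullType.
    rdd = sc.parallelize([Row(name="a", age=None), Row(name="b", age=7)])
    sqlContext.createDataFrame(rdd, samplingRatio=1.0).printSchema()
    # root
    #  |-- age: long (nullable = true)
    #  |-- name: string (nullable = true)
    ```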
    
    Author: Holden Karau <ho...@us.ibm.com>
    
    Closes #10275 from holdenk/SPARK-12300-fix-schmea-inferance-on-local-collections.
    
    (cherry picked from commit d1ca634db4ca9db7f0ba7ca38a0e03bcbfec23c9)
    Signed-off-by: Davies Liu <da...@gmail.com>

commit cd86075b52d6363f674dffc3eb71d90449563879
Author: Carson Wang <ca...@intel.com>
Date:   2015-12-30T21:49:10Z

    [SPARK-12399] Display correct error message when accessing REST API with an unknown app Id
    
    I got an exception when accessing the REST API below with an unknown application Id.
    `http://<server-url>:18080/api/v1/applications/xxx/jobs`
    Instead of an exception, I expect an error message "no such app: xxx", similar to the error message returned when I access `/api/v1/applications/xxx`.
    ```
    org.spark-project.guava.util.concurrent.UncheckedExecutionException: java.util.NoSuchElementException: no app with key xxx
    	at org.spark-project.guava.cache.LocalCache$Segment.get(LocalCache.java:2263)
    	at org.spark-project.guava.cache.LocalCache.get(LocalCache.java:4000)
    	at org.spark-project.guava.cache.LocalCache.getOrLoad(LocalCache.java:4004)
    	at org.spark-project.guava.cache.LocalCache$LocalLoadingCache.get(LocalCache.java:4874)
    	at org.apache.spark.deploy.history.HistoryServer.getSparkUI(HistoryServer.scala:116)
    	at org.apache.spark.status.api.v1.UIRoot$class.withSparkUI(ApiRootResource.scala:226)
    	at org.apache.spark.deploy.history.HistoryServer.withSparkUI(HistoryServer.scala:46)
    	at org.apache.spark.status.api.v1.ApiRootResource.getJobs(ApiRootResource.scala:66)
    ```
    
    Author: Carson Wang <ca...@intel.com>
    
    Closes #10352 from carsonwang/unknownAppFix.
    
    (cherry picked from commit b244297966be1d09f8e861cfe2d8e69f7bed84da)
    Signed-off-by: Marcelo Vanzin <va...@cloudera.com>

commit 4e9dd16987b3cba19dcf6437f3b6c8aeb59e2e39
Author: felixcheung <fe...@hotmail.com>
Date:   2016-01-03T15:23:35Z

    [SPARK-12327][SPARKR] fix code for lintr warning for commented code
    
    shivaram
    
    Author: felixcheung <fe...@hotmail.com>
    
    Closes #10408 from felixcheung/rcodecomment.
    
    (cherry picked from commit c3d505602de2fd2361633f90e4fff7e041849e28)
    Signed-off-by: Shivaram Venkataraman <sh...@cs.berkeley.edu>

commit f7a322382a3c1eed7088541add55a7813813a958
Author: Xiu Guo <xg...@gmail.com>
Date:   2016-01-04T04:48:56Z

    [SPARK-12562][SQL] DataFrame.write.format(text) requires the column name to be called value
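    
    Illustrating the constraint in the title: before this fix, the text source's single string column had to be named `value`, so a rename sidesteps the error on affected versions (hedged sketch; assumes a `sqlContext` and a writable path):
    
    ```python
    df = sqlContext.createDataFrame([("alice",), ("bob",)], ["name"])
    
    # Pre-fix workaround: give the single string column the required name.
    df.select(df.name.alias("value")).write.format("text").save("/tmp/names_as_text")
    ```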
    
    Author: Xiu Guo <xg...@gmail.com>
    
    Closes #10515 from xguo27/SPARK-12562.
    
    (cherry picked from commit 84f8492c1555bf8ab44c9818752278f61768eb16)
    Signed-off-by: Reynold Xin <rx...@databricks.com>

commit cd02038198fa57da816211d7bc65921ff9f1e9bb
Author: Nong Li <no...@databricks.com>
Date:   2016-01-04T18:37:56Z

    [SPARK-12486] Worker should kill the executors more forcefully if possible.
    
    This patch updates the ExecutorRunner's terminate path to use the new Java 8 API
    to terminate processes more forcefully if possible. If the executor is unhealthy,
    it would previously ignore the destroy() call. Presumably, the new Java API was
    added to handle cases like this.
    
    We could update the termination path in the future to use OS-specific commands
    for older Java versions.
    
    Author: Nong Li <no...@databricks.com>
    
    Closes #10438 from nongli/spark-12486-executors.
    
    (cherry picked from commit 8f659393b270c46e940c4e98af2d996bd4fd6442)
    Signed-off-by: Andrew Or <an...@databricks.com>

commit b5a1f564a3c099ef0b674599f0b012d9346115a3
Author: Pete Robbins <ro...@gmail.com>
Date:   2016-01-04T18:43:21Z

    [SPARK-12470] [SQL] Fix size reduction calculation
    
    also only allocate required buffer size
    
    Author: Pete Robbins <ro...@gmail.com>
    
    Closes #10421 from robbinspg/master.
    
    (cherry picked from commit b504b6a90a95a723210beb0031ed41a75d702f66)
    Signed-off-by: Davies Liu <da...@gmail.com>
    
    Conflicts:
    	sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/codegen/GenerateUnsafeRowJoiner.scala

commit 7f37c1e45d52b7823d566349e2be21366d73651f
Author: Josh Rosen <jo...@databricks.com>
Date:   2016-01-04T18:39:42Z

    [SPARK-12579][SQL] Force user-specified JDBC driver to take precedence
    
    Spark SQL's JDBC data source allows users to specify an explicit JDBC driver to load (using the `driver` argument), but in the current code it's possible that the user-specified driver will not be used when it comes time to actually create a JDBC connection.
    
    In a nutshell, the problem is that you might have multiple JDBC drivers on the classpath that claim to be able to handle the same subprotocol, so simply registering the user-provided driver class with our `DriverRegistry` and JDBC's `DriverManager` is not sufficient to ensure that it's actually used when creating the JDBC connection.
    
    This patch addresses this issue by first registering the user-specified driver with the DriverManager, then iterating over the driver manager's loaded drivers in order to obtain the correct driver and use it to create a connection (previously, we just called `DriverManager.getConnection()` directly).
    
    If a user did not specify a JDBC driver to use, then we call `DriverManager.getDriver` to figure out the class of the driver to use, then pass that class's name to executors; this guards against corner-case bugs in situations where the driver and executor JVMs might have different sets of JDBC drivers on their classpaths (previously, there was the (rare) potential for `DriverManager.getConnection()` to use different drivers on the driver and executors if the user had not explicitly specified a JDBC driver class and the classpaths were different).
    
    This patch is inspired by a similar patch that I made to the `spark-redshift` library (https://github.com/databricks/spark-redshift/pull/143), which contains its own modified fork of some of Spark's JDBC data source code (for cross-Spark-version compatibility reasons).
    
    Author: Josh Rosen <jo...@databricks.com>
    
    Closes #10519 from JoshRosen/jdbc-driver-precedence.
    
    (cherry picked from commit 6c83d938cc61bd5fabaf2157fcc3936364a83f02)
    Signed-off-by: Yin Huai <yh...@databricks.com>

----




[GitHub] spark pull request: Problem select empty ORC table

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219130690
  
    Can one of the admins verify this patch?




[GitHub] spark pull request: Problem select empty ORC table

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219193053
  
    According to https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports , it seems the guide suggests not opening a PR first but asking on the mailing list first.
    
    Also, it seems this is not a very serious bug according to the guide; the JIRA priority would likely be `minor` or `major`.
    
    And I think it is more important to follow the guide. Otherwise, it would be chaotic to deal with issues and PRs, because Spark is a really active and popular project.




[GitHub] spark pull request: Problem select empty ORC table

Posted by pprado <gi...@git.apache.org>.
Github user pprado commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219191363
  
    Sorry, I tried to follow this document:
    
    https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingBugReports
    
    JIRA is not the best tool for bug reporting.
    
    Again, sorry if I did not follow the documentation practices expected of me, but I think it is more important to report a very serious bug, which did not exist in version 1.6.0.
    
    Thanks,
    Pedro Prado
    
    2016-05-13 21:30 GMT-03:00 Hyukjin Kwon <no...@github.com>:
    
    > Because I guess this is a contribution to Spark and there is a guide for
    > this. It seems there are a lot of things wrong with this PR (e.g. no JIRA).





[GitHub] spark pull request: Problem select empty ORC table

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219188872
  
    Because I guess this is a contribution to Spark and there is a guide for this. It seems there are a lot of things wrong with this PR (e.g. no JIRA).




[GitHub] spark pull request #13103: Problem select empty ORC table

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/13103




[GitHub] spark pull request: Problem select empty ORC table

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219193377
  
    For me, I would say this might be better closed if the PR does not propose code changes. I think it would be nicer if the bug report were in JIRA.




[GitHub] spark pull request: Problem select empty ORC table

Posted by pprado <gi...@git.apache.org>.
Github user pprado commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219187681
  
    Why?
    On 13 May 2016 at 9:08 PM, "Hyukjin Kwon" <no...@github.com> wrote:
    
    > I think it would be good if you follow
    > https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide.





[GitHub] spark pull request: Problem select empty ORC table

Posted by andrewor14 <gi...@git.apache.org>.
Github user andrewor14 commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-220425159
  
    Let's close this PR. This patch isn't opened against the correct branch anyway.




[GitHub] spark pull request: Problem select empty ORC table

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the pull request:

    https://github.com/apache/spark/pull/13103#issuecomment-219187010
  
    I think it would be good if you follow  https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide.

