Posted to reviews@spark.apache.org by BryanCutler <gi...@git.apache.org> on 2017/12/04 23:36:13 UTC

[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

GitHub user BryanCutler opened a pull request:

    https://github.com/apache/spark/pull/19884

    [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

    ## What changes were proposed in this pull request?
    
    Upgrade Spark to Arrow 0.8.0 for Java and Python
    
    ## How was this patch tested?
    
    Existing tests

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/BryanCutler/spark arrow-upgrade-080-SPARK-22324

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/19884.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #19884
    
----

----


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    The Arrow 0.8.0 release vote just started today. Assuming it passes, the earliest you could see packages pushed to PyPI or conda-forge would be sometime on Thursday evening or Friday. 


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158206051
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     
            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
            >>> from pyspark.sql.types import IntegerType, StringType
    -       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
    -       >>> @pandas_udf(StringType())
    +       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
    +       >>> @pandas_udf(StringType())  # doctest: +SKIP
            ... def to_upper(s):
            ...     return s.str.upper()
            ...
    -       >>> @pandas_udf("integer", PandasUDFType.SCALAR)
    +       >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
            ... def add_one(x):
            ...     return x + 1
            ...
    -       >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
    +       >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", "age"))  # doctest: +SKIP
    --- End diff --
    
    Why change `John Doe` to `John`? And are we going to re-enable these doctests later?
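
For context on the `# doctest: +SKIP` markers in this diff: the directive tells Python's `doctest` runner not to execute that example at all. It also covers the `...` continuation lines of a multi-line example, such as a decorated function. The sketch below uses only the standard library. `shout` and `run_doctests` are hypothetical names, not PySpark code.

```python
import doctest


def shout(s):
    """Upper-case a string (hypothetical example function).

    >>> shout("john doe")
    'JOHN DOE'
    >>> undefined_name("never executed")  # doctest: +SKIP
    'not compared either'
    """
    return s.upper()


def run_doctests(func):
    """Run func's docstring examples and return (failed, attempted)."""
    finder = doctest.DocTestFinder()
    runner = doctest.DocTestRunner(verbose=False)
    failed = attempted = 0
    # Supply globals explicitly so the examples can refer to `func` by name.
    for test in finder.find(func, globs={func.__name__: func}):
        result = runner.run(test)
        failed += result.failed
        attempted += result.attempted
    return failed, attempted


print(run_doctests(shout))  # (0, 1): the skipped example is never attempted
```

The skipped example would raise `NameError` if it ran, yet the run reports no failures. That is also the cost of the marker: until it is removed, the example is no longer checked against the code.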


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85043/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85044/
    Test FAILed.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157957465
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala ---
    @@ -126,18 +121,14 @@ class ArrowPythonRunner(
           private var schema: StructType = _
           private var vectors: Array[ColumnVector] = _
     
    -      private var closed = false
    -
           context.addTaskCompletionListener { _ =>
             // todo: we need something like `reader.end()`, which release all the resources, but leave
    --- End diff --
    
    ok done


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157747561
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    +    """ Raise ImportError if minimum version of pyarrow is not installed
    +    """
    +    from distutils.version import LooseVersion
    +    import pyarrow
    +    if pyarrow.__version__ < LooseVersion('0.8.0'):
    --- End diff --
    
    I quickly checked other code in a few places I know of. Let's use `LooseVersion` on both sides, as @ueshin suggested, to reduce possible confusion, if you don't mind.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157738702
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    >  don't we need to use LooseVersion for pyarrow.__version__, too?
    
    It seems fine, judging by:
    
    https://github.com/python/cpython/blob/6f0eb93183519024cb360162bdd81b9faec97ba6/Lib/distutils/version.py#L331-L340
    



---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @wesm I was able to install pyarrow 0.8.0 to my local environment via conda. Thanks!


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158112470
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3356,6 +3356,7 @@ def test_schema_conversion_roundtrip(self):
             self.assertEquals(self.schema, schema_rt)
     
     
    +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
    --- End diff --
    
    @ueshin @HyukjinKwon, just confirming: this test should be conditional on pandas/pyarrow being installed, since we will check for a minimum pyarrow version when using `pandas_udf`?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Hi @shaneknapp, I think we are all ready here to try updating to pyarrow 0.8.0. The build here should pass once this version is available. If you want, you could try updating a single worker first to get an idea of whether all is well. Also, in case you didn't see https://github.com/apache/spark/pull/19884#issuecomment-351916074, I believe some workers don't have Pandas 0.19.2 and some don't have pyarrow installed at all. Thanks!!!


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    we should be good to go:
    
    ```
    $ pssh -h jenkins_workers.txt -t 0 "export PATH=/home/anaconda/envs/py3k/bin:$PATH; pip install pyarrow==0.8.0"
    [1] 05:55:00 [SUCCESS] amp-jenkins-worker-01
    [2] 05:55:00 [SUCCESS] amp-jenkins-worker-03
    [3] 05:55:00 [SUCCESS] amp-jenkins-worker-08
    [4] 05:55:00 [SUCCESS] amp-jenkins-worker-07
    [5] 05:55:00 [SUCCESS] amp-jenkins-worker-05
    [6] 05:55:00 [SUCCESS] amp-jenkins-worker-04
    [7] 05:55:00 [SUCCESS] amp-jenkins-worker-06
    [8] 05:55:00 [SUCCESS] amp-jenkins-worker-02
    ```
    
    ...and...
    
    ```
    $ pssh -h jenkins_workers.txt -t 0 -i "export PATH=/home/anaconda/envs/py3k/bin:$PATH; pip show pyarrow | grep ^Version"
    [1] 05:56:28 [SUCCESS] amp-jenkins-worker-02
    Version: 0.8.0
    [2] 05:56:28 [SUCCESS] amp-jenkins-worker-06
    Version: 0.8.0
    [3] 05:56:28 [SUCCESS] amp-jenkins-worker-03
    Version: 0.8.0
    [4] 05:56:28 [SUCCESS] amp-jenkins-worker-05
    Version: 0.8.0
    [5] 05:56:28 [SUCCESS] amp-jenkins-worker-08
    Version: 0.8.0
    [6] 05:56:28 [SUCCESS] amp-jenkins-worker-04
    Version: 0.8.0
    [7] 05:56:28 [SUCCESS] amp-jenkins-worker-07
    Version: 0.8.0
    [8] 05:56:28 [SUCCESS] amp-jenkins-worker-01
    Version: 0.8.0
    ```


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84738 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84738/testReport)** for PR 19884 at commit [`46ad595`](https://github.com/apache/spark/commit/46ad5951652c40de3c2c108c9b952b16dfcc3ad5).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    If you want to install pyarrow 0.8.0 via conda it's available now from the `-c conda-forge` channel (https://anaconda.org/conda-forge/pyarrow). I am not sure where we are at on PyPI / pip packages -- I will start the update process later today if no one else does cc @BryanCutler @siddharthteotia @xhochy 


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    As a quick reminder about Jenkins: roughly a month ago, I checked what we have on a specific Jenkins machine (simply by printing out the versions from within the PySpark tests):
    
    ```
    PyPy - No Pandas
    Python 2.7 Pandas [0.16.0]
    Python 3.4 Pandas [0.19.2]
    ```
    
    ```
    PyPy - No PyArrow
    Python 2.7 - No PyArrow
    Python 3.4 PyArrow [0.4.1]
    ```
    
    I think we should also confirm which Python installation has which Pandas and PyArrow versions.
    
    Also, we dropped support for Pandas older than 0.19.2 per http://apache-spark-developers-list.1001551.n3.nabble.com/discuss-PySpark-Can-we-drop-support-old-Pandas-lt-0-19-2-or-what-version-should-we-support-td22834.html and https://github.com/apache/spark/pull/19607. 
    I think each Python installation should now have at least Pandas 0.19.2, unless I've missed something.
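
One way to confirm which interpreter has which versions is to run a small report once under each Python executable on a worker. This is a sketch, not the script that was used on Jenkins. `installed_versions` is a hypothetical helper, and modules without a `__version__` attribute are reported as `unknown`.

```python
import importlib
import sys


def installed_versions(module_names):
    """Return {module name: version string}, or None for missing modules."""
    versions = {}
    for name in module_names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None  # not installed for this interpreter
        else:
            # Some modules do not define __version__ at all.
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


# Report which interpreter is running and what it can import.
print("%s (Python %s)" % (sys.executable, sys.version.split()[0]))
for name, version in sorted(installed_versions(["pandas", "pyarrow"]).items()):
    print("  %s: %s" % (name, version if version is not None else "not installed"))
```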


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157737242
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    Seems fine. All I know about `LooseVersion` is that it compares version strings correctly. Sure, we should add it there too.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r161391208
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -91,7 +91,7 @@ public long position() {
       }
     
       @Override
    -  public long transfered() {
    +  public long transferred() {
    --- End diff --
    
    It doesn't. The old method is implemented in `AbstractFileRegion.transfered`. In addition, the whole network module is private, so we don't need to maintain compatibility.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85246 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85246/testReport)** for PR 19884 at commit [`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).
     * This patch passes all tests.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158173623
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3356,6 +3356,7 @@ def test_schema_conversion_roundtrip(self):
             self.assertEquals(self.schema, schema_rt)
     
     
    +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
    --- End diff --
    
    I can't take a closer look now, but let's do this if it passes the tests. cc @ueshin 


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r161397435
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -91,7 +91,7 @@ public long position() {
       }
     
       @Override
    -  public long transfered() {
    +  public long transferred() {
    --- End diff --
    
    I see. Thanks!


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85100/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @BryanCutler, did we resolve https://github.com/apache/spark/pull/19884#issuecomment-353276931? If not, shall we file a JIRA?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85099/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84663 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)** for PR 19884 at commit [`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410).


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158212056
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     
            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
            >>> from pyspark.sql.types import IntegerType, StringType
    -       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
    -       >>> @pandas_udf(StringType())
    +       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
    +       >>> @pandas_udf(StringType())  # doctest: +SKIP
            ... def to_upper(s):
            ...     return s.str.upper()
            ...
    -       >>> @pandas_udf("integer", PandasUDFType.SCALAR)
    +       >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
            ... def add_one(x):
            ...     return x + 1
            ...
    -       >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
    +       >>> df = spark.createDataFrame([(1, "John Doe", 21)],
    +       ...                            ("id", "name", "age"))  # doctest: +SKIP
            >>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\
            ...     .show()  # doctest: +SKIP
            +----------+--------------+------------+
            |slen(name)|to_upper(name)|add_one(age)|
            +----------+--------------+------------+
    -       |         8|      JOHN DOE|          22|
    +       |         8|          JOHN|          22|
    --- End diff --
    
    oops, done!


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @HyukjinKwon @wesm @BryanCutler 
    
    alright.  here's my plan for right now:
    * python 3.4.5 -- upgrade pyarrow --> 0.8.0  (confirmed working on my staging environment)
    
    what i'm not going to do today:
    * install pyarrow for python 2.7 
    * mess with the pypy installation
    
    i should have pyarrow updated across all workers in ~15 mins, tops.
    
    and please note that spark is only built on centos and ubuntu *nix distros @ RISELab (née AMPLab).  we do not have, nor plan on having, any windows build nodes in the immediate future.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    cc @zsxwing as well; I saw you opened a JIRA about this: SPARK-22656



---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157960643
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -33,6 +33,10 @@ def _wrap_function(sc, func, returnType):
     
     
     def _create_udf(f, returnType, evalType):
    +    from pyspark.sql.utils import _require_minimum_pyarrow_version
    +
    +    _require_minimum_pyarrow_version()
    --- End diff --
    
    Yeah, that is not good! I was a little hesitant to put it in `def pandas_udf` because things are a little different when it is used as a decorator. How about leaving it in `_create_udf`, but only calling it when the eval type is a Pandas UDF type?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    LGTM
    
    Merged to master.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85195/
    Test FAILed.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158206309
  
    --- Diff: python/pyspark/sql/utils.py ---
    @@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
         for i in range(0, len(arr)):
             jarr[i] = arr[i]
         return jarr
    +
    +
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    @ueshin did we do the same thing for pandas?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85195 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85195/testReport)** for PR 19884 at commit [`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    pypy is 2.5.1, no pandas or pyarrow   (/usr/bin/pypy -- hand-rolled dist i put together ~3 years ago)
    
    python 3.4.5:  pyarrow 0.4.1, pandas 0.19.2  (managed by anaconda)
    
    python 2.7.13:  no pyarrow, pandas 0.16.0  (managed by anaconda)
    
    please correct me if i'm wrong, but i was under the impression that we're only supporting pyarrow w/python3.



---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test PASSed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85220/testReport)** for PR 19884 at commit [`423b68c`](https://github.com/apache/spark/commit/423b68cc2831106bcd7d59e84c86c4511e6fb347).


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155989817
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    Please don't forget that we also need to update `dev/deps/spark-deps-hadoop-2.x` files.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Ok I did some local testing with these changes and pyarrow 0.8.0 with different combinations of Python and Pandas:
    
    **python 3.6.3, pandas 0.19.2**
    
    ERROR: test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests)
    ```
    pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us, tz=UTC] would lose data: 28800000000001
    ```
    
    **python 3.6.3, pandas 0.21.0**
    
    All tests pass
    
    **python 2.7.14, pandas 0.21.1**
    
    All tests pass
    
    **python 2.7.14, pandas 0.19.2**
    
    ERROR: test_createDataFrame_does_not_modify_input (pyspark.sql.tests.ArrowTests)
    ```
    pyarrow.lib.ArrowInvalid: Casting from timestamp[ns] to timestamp[us, tz=UTC] would lose data: 28800000000001
    ```
    It seems like pandas 0.19.2 has a timestamp issue, but let's see if it is reproduced in the Jenkins environment here.
    
    cc @wesm 
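The error text suggests why the cast is rejected. 28800000000001 ns is 8 hours plus 1 nanosecond, and that trailing nanosecond has no exact representation at microsecond precision. The sketch below applies the same "convert losslessly or fail" rule in plain Python. It illustrates that assumption and is not pyarrow's actual cast code. `ns_to_us_checked` is a hypothetical name.

```python
NS_PER_US = 1000
NS_PER_SECOND = 10 ** 9


def ns_to_us_checked(value_ns):
    """Convert an epoch offset in nanoseconds to microseconds, refusing to truncate."""
    quotient, remainder = divmod(value_ns, NS_PER_US)
    if remainder != 0:
        # Integer division would silently drop the non-zero sub-microsecond part.
        raise ValueError("Casting from timestamp[ns] to timestamp[us] would lose data: %d"
                         % value_ns)
    return quotient


print(28800000000001 // NS_PER_SECOND / 3600.0)  # 8.0 (whole hours; the extra 1 ns is dropped here)
print(ns_to_us_checked(28800000000000))          # 28800000000
```

Calling `ns_to_us_checked(28800000000001)` raises `ValueError`, which mirrors the `ArrowInvalid` above.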


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r154810506
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    Is there any ETA for the official 0.8.0?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85165 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85165/testReport)** for PR 19884 at commit [`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85244/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    For https://github.com/apache/spark/pull/19884#issuecomment-352993779,
    
    > I have seen a few tests fail in Python 2 and PyPy with Pandas and PyArrow locally, if I remember correctly, but haven't had enough time to check whether it is an actual issue and file a JIRA. But I am pretty sure some tests will fail after the upgrade (or installation) of Pandas and PyArrow.
    
    Just double-checked locally with:
    
    Python 3: pandas (0.19.2)/ pyarrow (0.4.1) - all pass
    
    ```
    ./run-tests --python-executables=python3 --modules pyspark-sql
    ```
    
    PyPy 5.8.0: pandas (0.21.1) / no pyarrow - all pass
    
    ```
    ./run-tests --python-executables=pypy --modules pyspark-sql
    ```
    
    Python 2.7: pandas (0.20.2) / pyarrow (0.4.1) - several tests fail consistently (in all 3 runs).
    
    ```
    ./run-tests --python-executables=python2.7 --modules pyspark-sql
    ```
    
    For example:
    
    ```
    ..E.......................
    ======================================================================
    ERROR [2.557s]: test_createDataFrame_respect_session_timezone (pyspark.sql.tests.ArrowTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/...spark/python/pyspark/sql/tests.py", line 3284, in test_createDataFrame_respect_session_timezone
        self.assertEqual(result_la, result_arrow_la)
    AssertionError: Lists differ: [Row(1_str_t=u'a', 2_int_t=1, ... != [Row(1_str_t=u'a', 2_int_t=1, ...
    
    First differing element 0:
    Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.20000000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 1, 1, 1))
    Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.20000000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1968, 12, 31, 8, 1, 1))
    
    Diff is 2160 characters long. Set self.maxDiff to None to see it.
    
    ======================================================================
    ERROR [0.209s]: test_createDataFrame_toggle (pyspark.sql.tests.ArrowTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/...spark/python/pyspark/sql/tests.py", line 3270, in test_createDataFrame_toggle
        self.assertEquals(df_no_arrow.collect(), df_arrow.collect())
    AssertionError: Lists differ: [Row(1_str_t=u'a', 2_int_t=1, ... != [Row(1_str_t=u'a', 2_int_t=1, ...
    
    First differing element 0:
    Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.20000000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 18, 1, 1))
    Row(1_str_t=u'a', 2_int_t=1, 3_long_t=10, 4_float_t=0.20000000298023224, 5_double_t=2.0, 6_date_t=datetime.date(1969, 1, 1), 7_timestamp_t=datetime.datetime(1969, 1, 1, 1, 1, 1))
    
    Diff is 2160 characters long. Set self.maxDiff to None to see it.
    
    ======================================================================
    ERROR [0.166s]: test_toPandas_arrow_toggle (pyspark.sql.tests.ArrowTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/...spark/python/pyspark/sql/tests.py", line 3216, in test_toPandas_arrow_toggle
        self.assertFramesEqual(pdf_arrow, pdf)
      File "/...spark/python/pyspark/sql/tests.py", line 3178, in assertFramesEqual
        self.assertTrue(df_without.equals(df_with_arrow), msg=msg)
    AssertionError: DataFrame from Arrow is not equal
    
    With Arrow:
      1_str_t  2_int_t  3_long_t  4_float_t  5_double_t   6_date_t  \
    
            7_timestamp_t
    dtype: object
    
    Without:
      1_str_t  2_int_t  3_long_t  4_float_t  5_double_t   6_date_t  \
    
            7_timestamp_t
    dtype: object
    
    ======================================================================
    ERROR [0.182s]: test_toPandas_respect_session_timezone (pyspark.sql.tests.ArrowTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/...spark/python/pyspark/sql/tests.py", line 3227, in test_toPandas_respect_session_timezone
        self.assertFramesEqual(pdf_arrow_la, pdf_la)
      File "/...spark/python/pyspark/sql/tests.py", line 3178, in assertFramesEqual
        self.assertTrue(df_without.equals(df_with_arrow), msg=msg)
    AssertionError: DataFrame from Arrow is not equal
    
    With Arrow:
      1_str_t  2_int_t  3_long_t  4_float_t  5_double_t   6_date_t  \
    
            7_timestamp_t
    dtype: object
    
    Without:
      1_str_t  2_int_t  3_long_t  4_float_t  5_double_t   6_date_t  \
    
            7_timestamp_t
    dtype: object
    
    ======================================================================
    ERROR [0.015s]: test_vectorized_udf_check_config (pyspark.sql.tests.VectorizedUDFTests)
    ----------------------------------------------------------------------
    Traceback (most recent call last):
      File "/...spark/python/pyspark/sql/tests.py", line 3804, in test_vectorized_udf_check_config
        result = df.select(check_records_per_batch(col("id")))
      File "/...spark/python/pyspark/sql/udf.py", line 151, in wrapper
        return self(*args)
      File "/...spark/python/pyspark/sql/udf.py", line 132, in __call__
        judf = self._judf
      File "/...spark/python/pyspark/sql/udf.py", line 116, in _judf
        self._judf_placeholder = self._create_judf()
      File "/...spark/python/pyspark/sql/udf.py", line 125, in _create_judf
        wrapped_func = _wrap_function(sc, self.func, self.returnType)
      File "/...spark/python/pyspark/sql/udf.py", line 30, in _wrap_function
        pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
      File "/...spark/python/pyspark/rdd.py", line 2389, in _prepare_for_python_RDD
        pickled_command = ser.dumps(command)
      File "/...spark/python/pyspark/serializers.py", line 574, in dumps
        return cloudpickle.dumps(obj, 2)
      File "/...spark/python/pyspark/cloudpickle.py", line 918, in dumps
        cp.dump(obj)
      File "/...spark/python/pyspark/cloudpickle.py", line 235, in dump
        return Pickler.dump(self, obj)
    ...
      File "/...spark/python/pyspark/cloudpickle.py", line 835, in save_file
        raise pickle.PicklingError("Cannot pickle files that are not opened for reading: %s" % obj.mode)
    pickle.PicklingError: Cannot pickle files that are not opened for reading: w
    
    ----------------------------------------------------------------------
    Ran 226 tests in 245.263s
    
    FAILED (errors=5, skipped=3)
    
    ...
    ```
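The timestamp mismatches in the first two failures are not random. In both cases, the values with and without Arrow differ by exactly 17 hours. The arithmetic below uses only the datetimes printed in the log. A whole-hour gap like this is consistent with a time-zone offset being applied on one conversion path but not the other. That reading is an assumption; the thread does not pin down the cause.

```python
from datetime import datetime, timedelta

# (without Arrow, with Arrow) 7_timestamp_t values copied from the first differing Row
# of test_createDataFrame_respect_session_timezone and test_createDataFrame_toggle.
respect_session_timezone = (datetime(1969, 1, 1, 1, 1, 1), datetime(1968, 12, 31, 8, 1, 1))
toggle = (datetime(1969, 1, 1, 18, 1, 1), datetime(1969, 1, 1, 1, 1, 1))


def gap_hours(pair):
    """Absolute difference between two naive datetimes, in hours."""
    first, second = pair
    return abs(first - second) / timedelta(hours=1)


print(gap_hours(respect_session_timezone))  # 17.0
print(gap_hours(toggle))                    # 17.0
```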
    
    



---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    I used a workaround for timestamp casts that allows the tests to pass for me locally, and left a note to look into the root cause later.  Hopefully this should pass now and we will be good to merge.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85044 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85044/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85160 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85160/testReport)** for PR 19884 at commit [`0047f7a`](https://github.com/apache/spark/commit/0047f7a6560bfbb46d7ee28df0c2781f7538b907).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84743/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84447 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84447/testReport)** for PR 19884 at commit [`4b0790b`](https://github.com/apache/spark/commit/4b0790bfdd281719c2b5471c077e495523b28e3b).


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r156240909
  
    --- Diff: dev/deps/spark-deps-hadoop-2.6 ---
    @@ -144,7 +144,9 @@ metrics-json-3.1.5.jar
     metrics-jvm-3.1.5.jar
     minlog-1.3.0.jar
     netty-3.9.9.Final.jar
    -netty-all-4.0.47.Final.jar
    +netty-all-4.1.17.Final.jar
    +netty-buffer-4.1.17.Final.jar
    +netty-common-4.1.17.Final.jar
    --- End diff --
    
    @zsxwing do you think `netty-buffer` and `netty-common` can be safely excluded in the Spark pom because the same classes are also in `netty-all`?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    LGTM, I'm also fine to ignore some tests if they are hard to fix, to unblock other PRs sooner.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    I just uploaded the pip packages for Windows and Linux so they are available.  There is an error building the Mac packages, so those will come later after that is resolved.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85160 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85160/testReport)** for PR 19884 at commit [`0047f7a`](https://github.com/apache/spark/commit/0047f7a6560bfbb46d7ee28df0c2781f7538b907).


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158206546
  
    --- Diff: python/pyspark/sql/utils.py ---
    @@ -110,3 +110,12 @@ def toJArray(gateway, jtype, arr):
         for i in range(0, len(arr)):
             jarr[i] = arr[i]
         return jarr
    +
    +
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    No. I just checked if `ImportError` occurred or not. We should do the same thing for pandas later.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85044 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85044/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Just as a matter of prioritization -- do you need pip or conda packages to be able to proceed with finishing/verifying this patch? Getting pip packages up on PyPI shouldn't take too long after the release vote closes 


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Fix comment - Hey @BryanCutler, I am fine with skipping some tests for now if they take a while to fix.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85246 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85246/testReport)** for PR 19884 at commit [`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @wesm Yes, I'd like to use it asap to verify this patch and to confirm the behavior of my PR #18754 for `DecimalType` support. Thanks.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85242 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85242/testReport)** for PR 19884 at commit [`ae84c84`](https://github.com/apache/spark/commit/ae84c8454875906e488b895e18ad78ddf6e9fbc9).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Thanks all for reviewing and getting the Netty upgrade in also!


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85244 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85244/testReport)** for PR 19884 at commit [`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85242/
    Test FAILed.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157420989
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala ---
    @@ -86,21 +86,16 @@ private[sql] object ArrowConverters {
         val root = VectorSchemaRoot.create(arrowSchema, allocator)
         val arrowWriter = ArrowWriter.create(root)
     
    -    var closed = false
    -
         context.addTaskCompletionListener { _ =>
    -      if (!closed) {
    -        root.close()
    -        allocator.close()
    -      }
    +      root.close()
    +      allocator.close()
    --- End diff --
    
    Could we simplify 2 places in `ArrowPythonRunner` in the same way?


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84616/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84616 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84616/testReport)** for PR 19884 at commit [`93b1eb3`](https://github.com/apache/spark/commit/93b1eb37fa1f39d5d69853248f72808bf3b05a81).


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r161391299
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -91,7 +91,7 @@ public long position() {
       }
     
       @Override
    -  public long transfered() {
    +  public long transferred() {
    --- End diff --
    
    Oh, I see. `AbstractFileRegion.transfered` is `final` so it may break binary compatibility. However, this is fine since it's a private module.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r156097961
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -223,27 +223,13 @@ def _create_batch(series, timezone):
             series = [series]
         series = ((s, None) if not isinstance(s, (list, tuple)) else s for s in series)
     
    -    # If a nullable integer series has been promoted to floating point with NaNs, need to cast
    -    # NOTE: this is not necessary with Arrow >= 0.7
    -    def cast_series(s, t):
    -        if type(t) == pa.TimestampType:
    -            # NOTE: convert to 'us' with astype here, unit ignored in `from_pandas` see ARROW-1680
    -            return _check_series_convert_timestamps_internal(s.fillna(0), timezone)\
    -                .values.astype('datetime64[us]', copy=False)
    -        # NOTE: can not compare None with pyarrow.DataType(), fixed with Arrow >= 0.7.1
    -        elif t is not None and t == pa.date32():
    -            # TODO: this converts the series to Python objects, possibly avoid with Arrow >= 0.8
    -            return s.dt.date
    -        elif t is None or s.dtype == t.to_pandas_dtype():
    -            return s
    -        else:
    -            return s.fillna(0).astype(t.to_pandas_dtype(), copy=False)
    -
    -    # Some object types don't support masks in Arrow, see ARROW-1721
         def create_array(s, t):
    -        casted = cast_series(s, t)
    -        mask = None if casted.dtype == 'object' else s.isnull()
    -        return pa.Array.from_pandas(casted, mask=mask, type=t)
    +        mask = s.isnull()
    +        # Workaround for casting timestamp units with timezone, ARROW-1906
    --- End diff --
    
    Yes, just fixed in ARROW-1906 https://github.com/apache/arrow/pull/1411


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    The highlights that pertain to Spark for the update from Arrow version 0.4.1 to 0.8.0 include:
    
    * Java refactoring for a simpler API
    * Type support for DecimalType, ArrayType
    * Improved type casting support in Python


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    > Great, @BryanCutler . Could you put the highlight in the PR description, too?
    
    Sure, thanks @dongjoon-hyun! Will do, I just want to go back and check the release notes first.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    > We are supporting pyarrow on Python 2.7 except for Windows
    
    Hi @wesm, may I ask for the details about Windows? I think we should add a few asserts in the version checks separately later (not here) for sure. Does PyArrow work on other combinations, such as Python 3 on Windows, and fail only on Python 2.7 on Windows?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r156242013
  
    --- Diff: dev/deps/spark-deps-hadoop-2.6 ---
    @@ -144,7 +144,9 @@ metrics-json-3.1.5.jar
     metrics-jvm-3.1.5.jar
     minlog-1.3.0.jar
     netty-3.9.9.Final.jar
    -netty-all-4.0.47.Final.jar
    +netty-all-4.1.17.Final.jar
    +netty-buffer-4.1.17.Final.jar
    +netty-common-4.1.17.Final.jar
    --- End diff --
    
    Cool, thx just wanted to be sure


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157952778
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    +    """ Raise ImportError if minimum version of pyarrow is not installed
    +    """
    +    from distutils.version import LooseVersion
    +    import pyarrow
    +    if pyarrow.__version__ < LooseVersion('0.8.0'):
    --- End diff --
    
    sure, no prob


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84740 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84740/testReport)** for PR 19884 at commit [`c3d612f`](https://github.com/apache/spark/commit/c3d612fd1151271d4da027e3d252eed0aea4ea60).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    We are supporting pyarrow on Python 2.7 except for Windows


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84663 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84663/testReport)** for PR 19884 at commit [`fdba406`](https://github.com/apache/spark/commit/fdba406f29216b8ef592de45dc36461217113410).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    I saw that #18974 tried to upgrade Arrow but was closed because of a Jenkins issue. @ueshin do you have any idea what may block this PR? Jenkins cannot support to install multiple versions of PyArrow?


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    >Jenkins cannot support to install multiple versions of PyArrow?
    
    @zsxwing that's right, we will have to coordinate to make sure the Jenkins pyarrow is upgraded to version 0.8 as well. I'm not sure of the best way to coordinate all of this, because this PR, the Jenkins upgrade, and the Spark Netty upgrade all need to happen at the same time.
    
    @holdenk @shaneknapp will one of you be able to work on the pyarrow upgrade for Jenkins sometime around next week?  (assuming Arrow 0.8 is released in the next day or so)


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155647741
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1658,13 +1657,13 @@ def from_arrow_type(at):
             spark_type = FloatType()
         elif at == pa.float64():
             spark_type = DoubleType()
    -    elif type(at) == pa.DecimalType:
    +    elif pa.types.is_decimal(at):
             spark_type = DecimalType(precision=at.precision, scale=at.scale)
    -    elif at == pa.string():
    +    elif pa.types.is_string(at):
             spark_type = StringType()
         elif at == pa.date32():
             spark_type = DateType()
    -    elif type(at) == pa.TimestampType:
    +    elif pa.types.is_timestamp(at):
    --- End diff --
    
    Yep, this is right. I'm opening a JIRA to add more functions for testing exact types


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157761407
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    > just want to verify that we want to check for a minimum version of pyarrow and this is the right place to put this function?
    
    Could we put this under `pyspark.sql.utils` for now if we are all fine?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85205/testReport)** for PR 19884 at commit [`faa9f09`](https://github.com/apache/spark/commit/faa9f09faabdc8047b58283f9101142eecf1c754).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r156237523
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    done


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158208592
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2141,22 +2141,22 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     
            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
            >>> from pyspark.sql.types import IntegerType, StringType
    -       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
    -       >>> @pandas_udf(StringType())
    +       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
    +       >>> @pandas_udf(StringType())  # doctest: +SKIP
            ... def to_upper(s):
            ...     return s.str.upper()
            ...
    -       >>> @pandas_udf("integer", PandasUDFType.SCALAR)
    +       >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
            ... def add_one(x):
            ...     return x + 1
            ...
    -       >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
    +       >>> df = spark.createDataFrame([(1, "John", 21)], ("id", "name", "age"))  # doctest: +SKIP
    --- End diff --
    
    The name change shouldn't have been committed, I'll change it back. I don't think we can make the doctests conditional on whether pandas/pyarrow is installed, so unless we make these required dependencies and have them installed on all the workers, we need to skip them.
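For reference, the `# doctest: +SKIP` directive used in the docstring above tells the doctest runner not to execute that example at all, so a missing optional dependency cannot fail the run. The stdlib sketch below demonstrates this with a hypothetical module name (`hypothetical_optional_dep`) standing in for pandas/pyarrow; `count_failures` is an illustrative helper, not part of PySpark.

```python
import doctest

# Examples that need an optional dependency are marked +SKIP, mirroring the
# pandas_udf docstring. `hypothetical_optional_dep` does not exist.
EXAMPLES = """
>>> 1 + 1
2
>>> import hypothetical_optional_dep  # doctest: +SKIP
>>> hypothetical_optional_dep.answer()  # doctest: +SKIP
42
"""


def count_failures(source):
    """Run the doctest examples in `source` and return the number of failures."""
    test = doctest.DocTestParser().get_doctest(source, {}, "example", None, 0)
    runner = doctest.DocTestRunner(verbose=False)
    # Discard the runner's failure report; only the counts are of interest here.
    return runner.run(test, out=lambda text: None).failed
```

With the directives in place `count_failures(EXAMPLES)` reports no failures; stripping them makes both dependency lines fail.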


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85220/testReport)** for PR 19884 at commit [`423b68c`](https://github.com/apache/spark/commit/423b68cc2831106bcd7d59e84c86c4511e6fb347).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85205/
    Test FAILed.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157952725
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    sounds good


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157961467
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -33,6 +33,10 @@ def _wrap_function(sc, func, returnType):
     
     
     def _create_udf(f, returnType, evalType):
    +    from pyspark.sql.utils import _require_minimum_pyarrow_version
    +
    +    _require_minimum_pyarrow_version()
    --- End diff --
    
    Yeah, sounds good.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85205 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85205/testReport)** for PR 19884 at commit [`faa9f09`](https://github.com/apache/spark/commit/faa9f09faabdc8047b58283f9101142eecf1c754).


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by cloud-fan <gi...@git.apache.org>.
Github user cloud-fan commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158211101
  
    --- Diff: python/pyspark/sql/functions.py ---
    @@ -2141,22 +2141,23 @@ def pandas_udf(f=None, returnType=None, functionType=None):
     
            >>> from pyspark.sql.functions import pandas_udf, PandasUDFType
            >>> from pyspark.sql.types import IntegerType, StringType
    -       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())
    -       >>> @pandas_udf(StringType())
    +       >>> slen = pandas_udf(lambda s: s.str.len(), IntegerType())  # doctest: +SKIP
    +       >>> @pandas_udf(StringType())  # doctest: +SKIP
            ... def to_upper(s):
            ...     return s.str.upper()
            ...
    -       >>> @pandas_udf("integer", PandasUDFType.SCALAR)
    +       >>> @pandas_udf("integer", PandasUDFType.SCALAR)  # doctest: +SKIP
            ... def add_one(x):
            ...     return x + 1
            ...
    -       >>> df = spark.createDataFrame([(1, "John Doe", 21)], ("id", "name", "age"))
    +       >>> df = spark.createDataFrame([(1, "John Doe", 21)],
    +       ...                            ("id", "name", "age"))  # doctest: +SKIP
            >>> df.select(slen("name").alias("slen(name)"), to_upper("name"), add_one("age")) \\
            ...     .show()  # doctest: +SKIP
            +----------+--------------+------------+
            |slen(name)|to_upper(name)|add_one(age)|
            +----------+--------------+------------+
    -       |         8|      JOHN DOE|          22|
    +       |         8|          JOHN|          22|
    --- End diff --
    
    nit: we should revert this too


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155647982
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1658,13 +1657,13 @@ def from_arrow_type(at):
             spark_type = FloatType()
         elif at == pa.float64():
             spark_type = DoubleType()
    -    elif type(at) == pa.DecimalType:
    +    elif pa.types.is_decimal(at):
             spark_type = DecimalType(precision=at.precision, scale=at.scale)
    -    elif at == pa.string():
    +    elif pa.types.is_string(at):
             spark_type = StringType()
         elif at == pa.date32():
             spark_type = DateType()
    -    elif type(at) == pa.TimestampType:
    +    elif pa.types.is_timestamp(at):
    --- End diff --
    
    https://issues.apache.org/jira/browse/ARROW-1905


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/spark/pull/19884


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Jenkins, retest this please.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85099 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85099/testReport)** for PR 19884 at commit [`22c6b92`](https://github.com/apache/spark/commit/22c6b92fc6ab31c332715c9372cfc1fe27835ccb).


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84447/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @BryanCutler could you just pull my changes into this PR, since we need both changes to pass Jenkins? Thanks!


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85203 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85203/testReport)** for PR 19884 at commit [`715f83d`](https://github.com/apache/spark/commit/715f83dfb96823fc79bca0fcd904c1ddeddaf6d6).
     * This patch **fails Python style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85165 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85165/testReport)** for PR 19884 at commit [`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @shaneknapp afaik Spark supports pyarrow with Python 2.7 and we should be testing that too, but I'm not sure about PyPy. Maybe @ueshin or @HyukjinKwon can confirm before we start upgrading?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    We need to upgrade to 0.8.0 now because we changed the binary format.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @BryanCutler @HyukjinKwon 
    
    pandas and pyarrow are most definitely installed on all of the jenkins workers. the 'missing' packages showed up after we had a power outage at the colo, and the jenkins workers rebooted while the master (on UPS) didn't. this causes the PATH env var to be dropped, which means that instead of seeing the anaconda installation in PATH, jenkins defaults to system python (which has the absolute minimum of packages installed).
    
    regarding the pyarrow upgrade: let's schedule it for wednesday (tomorrow) morning, EST. i'm about to get on another plane and have a few more hours of traveling left today.
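One hypothetical way to catch this failure mode early is to have a build step report which interpreter it is running and whether the expected packages are importable, instead of discovering it through test failures. `describe_python_env` below is an illustrative helper, not part of Spark's Jenkins tooling; `importlib.util.find_spec` locates a top-level module without importing it.

```python
import importlib.util
import sys


def describe_python_env(packages):
    """Report the running interpreter and whether each top-level package is importable.

    Hypothetical helper: find_spec() only locates the module, it does not import it.
    """
    return {
        "executable": sys.executable,  # e.g. reveals a fallback to system python
        "prefix": sys.prefix,
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }
```

A build script could call it with `["pandas", "pyarrow"]` and abort when any entry is False, printing `executable` so a silent fallback to system python is obvious.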


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157686204
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala ---
    @@ -126,18 +121,14 @@ class ArrowPythonRunner(
           private var schema: StructType = _
           private var vectors: Array[ColumnVector] = _
     
    -      private var closed = false
    -
           context.addTaskCompletionListener { _ =>
             // todo: we need something like `reader.end()`, which release all the resources, but leave
    --- End diff --
    
    Awesome. Then we should use the API at the end of the Arrow stream, and also here just in case.
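Closing at the end of the stream and again in the task-completion listener "just in case" relies on close being safe to call twice. Below is a Python sketch of that pattern, assuming an idempotent `close()`. `StreamReaderStandIn` and `run_task` are hypothetical stand-ins, not Spark's `ArrowPythonRunner` or Arrow's reader API.

```python
class StreamReaderStandIn:
    """Hypothetical stand-in for a reader that owns buffers which must be freed once."""

    def __init__(self):
        self.closed = False
        self.release_count = 0  # how many times resources were actually freed

    def close(self):
        # Idempotent: the first call releases resources, later calls are no-ops.
        if self.closed:
            return
        self.closed = True
        self.release_count += 1


def run_task(reader, completion_listeners):
    """Consume a stream, close at end-of-stream, then fire the completion listeners."""
    # Safety net, analogous to registering a task-completion listener.
    completion_listeners.append(reader.close)
    # ... read batches until the end of the stream (omitted) ...
    reader.close()  # normal path: close as soon as the stream ends
    for listener in completion_listeners:
        listener()  # second close() is harmless
```

The listener still matters when the task fails before reaching the end of the stream; in that case it is the only close that runs.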


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by dongjoon-hyun <gi...@git.apache.org>.
Github user dongjoon-hyun commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Great, @BryanCutler . Could you put the highlight in the PR description, too?


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by gatorsmile <gi...@git.apache.org>.
Github user gatorsmile commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r161390222
  
    --- Diff: common/network-common/src/main/java/org/apache/spark/network/protocol/MessageWithHeader.java ---
    @@ -91,7 +91,7 @@ public long position() {
       }
     
       @Override
    -  public long transfered() {
    +  public long transferred() {
    --- End diff --
    
    This breaks binary compatibility. Is it OK? @zsxwing 


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157738359
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    +    """ Raise ImportError if minimum version of pyarrow is not installed
    +    """
    +    from distutils.version import LooseVersion
    +    import pyarrow
    +    if pyarrow.__version__ < LooseVersion('0.8.0'):
    --- End diff --
    
    BTW, I usually write it as `LooseVersion(pyarrow.__version__) < "0.8.0"`, though.
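The reason to parse the installed version rather than compare raw strings is that plain string comparison is lexicographic: `"0.10.0" < "0.8.0"` is true. `distutils` (and so `LooseVersion`) was removed in Python 3.12, so the sketch below uses a small hand-rolled parser instead. `version_tuple` and `require_minimum_version` are hypothetical helpers, and the parser is a simplification of `LooseVersion`, not a replacement for it.

```python
import re


def version_tuple(version):
    """Turn '0.8.0' into (0, 8, 0); stop at the first non-numeric piece,
    so '0.8.0rc1' -> (0, 8, 0) and '0.10.0.dev5' -> (0, 10, 0)."""
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
        if match.end() != len(piece):
            # Trailing suffix such as 'rc1': keep the number, ignore the rest.
            break
    return tuple(parts)


def require_minimum_version(installed, minimum="0.8.0"):
    """Raise ImportError if `installed` is older than `minimum`."""
    if version_tuple(installed) < version_tuple(minimum):
        raise ImportError(
            "pyarrow >= %s must be installed; found %s" % (minimum, installed))
```

In real code, `installed` would come from `pyarrow.__version__`; it is passed in here so the sketch runs without pyarrow.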


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    > @BryanCutler can you give me a minimal repro for the timestamp issue you cited above?
    
    Sure @wesm, I'll ping you with a repro


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84663/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85043 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)** for PR 19884 at commit [`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85244 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85244/testReport)** for PR 19884 at commit [`b0200ef`](https://github.com/apache/spark/commit/b0200efd30c6fe77ec6e57d65f3bc828be0e1802).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r158205751
  
    --- Diff: python/pyspark/sql/tests.py ---
    @@ -3356,6 +3356,7 @@ def test_schema_conversion_roundtrip(self):
             self.assertEquals(self.schema, schema_rt)
     
     
    +@unittest.skipIf(not _have_pandas or not _have_arrow, "Pandas or Arrow not installed")
    --- End diff --
    
    Sorry for the delay. Yeah, we should check the minimum pyarrow version.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @HyukjinKwon [ARROW-1949](https://issues.apache.org/jira/browse/ARROW-1949) was created to add an option to allow truncation when data will be lost.  Once that is in Arrow, we can remove the workaround if we want.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84743 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84743/testReport)** for PR 19884 at commit [`3a5e3c1`](https://github.com/apache/spark/commit/3a5e3c12da0bd8cdcb580067281b908422f35a6d).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Hi @zsxwing, is it okay to resolve SPARK-19552?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Jenkins, retest this please.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    retest this please


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85242/testReport)** for PR 19884 at commit [`ae84c84`](https://github.com/apache/spark/commit/ae84c8454875906e488b895e18ad78ddf6e9fbc9).


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    the install is done w/pip in a conda environment.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r154811610
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    We are still wrapping a few things up, should be later this week or early next week.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Alright, https://github.com/apache/spark/pull/18754 should now be unblocked. Let me know if there's anything else that needs to happen.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157683878
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowPythonRunner.scala ---
    @@ -126,18 +121,14 @@ class ArrowPythonRunner(
           private var schema: StructType = _
           private var vectors: Array[ColumnVector] = _
     
    -      private var closed = false
    -
           context.addTaskCompletionListener { _ =>
             // todo: we need something like `reader.end()`, which release all the resources, but leave
    --- End diff --
    
    @ueshin is this still a valid todo?  It is possible in Arrow now by calling `reader.close(false)` where the `false` flag will free resources but keep the stream open.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157677769
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/arrow/ArrowConverters.scala ---
    @@ -86,21 +86,16 @@ private[sql] object ArrowConverters {
         val root = VectorSchemaRoot.create(arrowSchema, allocator)
         val arrowWriter = ArrowWriter.create(root)
     
    -    var closed = false
    -
         context.addTaskCompletionListener { _ =>
    -      if (!closed) {
    -        root.close()
    -        allocator.close()
    -      }
    +      root.close()
    +      allocator.close()
    --- End diff --
    
    Yes thanks for the reminder, they are updated now also.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157683384
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    @ueshin @HyukjinKwon just want to verify that we want to check for a minimum version of pyarrow and this is the right place to put this function?


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test PASSed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85246/
    Test PASSed.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155626249
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1658,13 +1657,13 @@ def from_arrow_type(at):
             spark_type = FloatType()
         elif at == pa.float64():
             spark_type = DoubleType()
    -    elif type(at) == pa.DecimalType:
    +    elif pa.types.is_decimal(at):
             spark_type = DecimalType(precision=at.precision, scale=at.scale)
    -    elif at == pa.string():
    +    elif pa.types.is_string(at):
             spark_type = StringType()
         elif at == pa.date32():
             spark_type = DateType()
    -    elif type(at) == pa.TimestampType:
    +    elif pa.types.is_timestamp(at):
    --- End diff --
    
    @icexelloss @wesm is this the recommended way to check type id for the latest pyarrow?  For types with a single bit width, I am using the is_* functions, like `is_timestamp`, but for others I still need to check object equality such as `t == pa.date32()` because there is no `is_date32()` only `is_date()`


---

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscribe@spark.apache.org
For additional commands, e-mail: reviews-help@spark.apache.org


[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157958862
  
    --- Diff: python/pyspark/sql/udf.py ---
    @@ -33,6 +33,10 @@ def _wrap_function(sc, func, returnType):
     
     
     def _create_udf(f, returnType, evalType):
    +    from pyspark.sql.utils import _require_minimum_pyarrow_version
    +
    +    _require_minimum_pyarrow_version()
    --- End diff --
    
    We can't put this here because `_create_udf` is used for `udf` (not `pandas_udf`) as well. It is not related to pyarrow and we should be able to define normal udf without pyarrow.
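One way to follow this suggestion is to gate the version check on the evaluation type, so that only vectorized (pandas) UDFs require pyarrow. This is a minimal sketch: the constants, `create_udf`, and `require_minimum_pyarrow_version` are hypothetical, and the installed version is passed in rather than read from `pyarrow.__version__` so the sketch runs without pyarrow.

```python
import re

# Illustrative eval-type constants for this sketch only.
BATCHED_UDF = 100
PANDAS_SCALAR_UDF = 200
PANDAS_EVAL_TYPES = {PANDAS_SCALAR_UDF}


def _version_tuple(version):
    # Leading integer of each dot-separated piece; stop at the first piece with none.
    parts = []
    for piece in version.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def require_minimum_pyarrow_version(installed_version, minimum="0.8.0"):
    # None stands for "pyarrow is not installed".
    if installed_version is None or _version_tuple(installed_version) < _version_tuple(minimum):
        raise ImportError("pyarrow >= %s must be installed" % minimum)


def create_udf(f, eval_type, installed_pyarrow_version=None):
    # Only pandas UDFs exchange data through Arrow, so only they need the
    # check; plain row-at-a-time UDFs must keep working without pyarrow.
    if eval_type in PANDAS_EVAL_TYPES:
        require_minimum_pyarrow_version(installed_pyarrow_version)
    return f, eval_type
```

A plain UDF is created with no pyarrow at all, while a pandas UDF fails early with a clear `ImportError` when pyarrow is missing or too old.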


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84738 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84738/testReport)** for PR 19884 at commit [`46ad595`](https://github.com/apache/spark/commit/46ad5951652c40de3c2c108c9b952b16dfcc3ad5).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84740/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85100 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85100/testReport)** for PR 19884 at commit [`22c6b92`](https://github.com/apache/spark/commit/22c6b92fc6ab31c332715c9372cfc1fe27835ccb).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85048 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85048/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85043 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85043/testReport)** for PR 19884 at commit [`084d30b`](https://github.com/apache/spark/commit/084d30b6fae89fbcb7f47013f47b7848ff135387).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r157692179
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1679,6 +1678,15 @@ def from_arrow_schema(arrow_schema):
              for field in arrow_schema])
     
     
    +def _require_minimum_pyarrow_version():
    --- End diff --
    
    I'm not familiar with `LooseVersion`, but don't we need to use `LooseVersion` for `pyarrow.__version__`, too? cc @HyukjinKwon 
    Btw, shall we add the dependency to `setup.py`?
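If the dependency goes into `setup.py`, one option is an optional extra rather than a hard requirement, so that a plain PySpark install still works without Arrow. Below is a minimal sketch of the keyword argument. The extra name `sql` is an assumption, not something confirmed in this thread.

```python
# Hypothetical excerpt: keyword arguments that would be passed to
# setuptools.setup() in python/setup.py. An optional extra keeps pyarrow
# out of the default install.
extras_require = {
    "sql": [
        "pandas",          # pin a minimum version as appropriate
        "pyarrow>=0.8.0",  # minimum version discussed in this PR
    ],
}
# setup(name="pyspark", ..., extras_require=extras_require)
```

Users who want Arrow-based features would then run `pip install pyspark[sql]`.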


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85195 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85195/testReport)** for PR 19884 at commit [`d92ae90`](https://github.com/apache/spark/commit/d92ae90e05f55955eaad8e7f55e6324bf333a6bc).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    This is a WIP to start updating Spark to use Arrow 0.8.0 which will be released soon.
    
    TODO:
    
    - [ ] Update to reflect Java API changes
    - [ ] Update to reflect Python API changes
    - [ ] Use new Python type checking
    - [ ] Remove Python type casting workarounds


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85048 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85048/testReport)** for PR 19884 at commit [`ad8d5e2`](https://github.com/apache/spark/commit/ad8d5e26d040f282dea531785c41dd645b9f0a9e).
     * This patch **fails PySpark unit tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    I need to go to sleep now, and I guess @ueshin should be sleeping too. Let me leave my sign-off here: LGTM if the tests pass. I suspect builds in other PRs will now break without this PR.
    
    Let me cc @cloud-fan and @srowen here, who I believe live in a different timezone.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85099 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85099/testReport)** for PR 19884 at commit [`22c6b92`](https://github.com/apache/spark/commit/22c6b92fc6ab31c332715c9372cfc1fe27835ccb).
     * This patch **fails due to an unknown error code, -9**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85160/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85203/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @zsxwing, fyi after applying your Netty upgrade patch to Arrow, and then your other patch for Spark, all of the Spark Scala/Java tests pass


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85100 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85100/testReport)** for PR 19884 at commit [`22c6b92`](https://github.com/apache/spark/commit/22c6b92fc6ab31c332715c9372cfc1fe27835ccb).
     * This patch **fails from timeout after a configured wait of `250m`**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Hmm.. @shaneknapp, but if I remember and understood correctly, wouldn't this fail to build when the PATH is dropped? I ended up checking both Pandas and PyArrow at that time because I realised some tests were consistently skipped in a few specific Python versions.
    
    If you can easily run commands on some workers, it might be worth double-checking:
    
    ```bash
    python2.7 -c "import pandas; print(pandas.__version__)"
    python2.7 -c "import pyarrow; print(pyarrow.__version__)"
    python3.4 -c "import pandas; print(pandas.__version__)"
    python3.4 -c "import pyarrow; print(pyarrow.__version__)"
    pypy -c "import pandas; print(pandas.__version__)"
    pypy -c "import pyarrow; print(pyarrow.__version__)"
    ```
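The same check can be scripted so that missing interpreters and missing modules are reported instead of aborting the run. This is a sketch under assumptions: the interpreter names are the ones listed above, `report_versions` is a hypothetical helper, and the script needs Python 3.3+ on the worker running it.

```python
import shutil
import subprocess
import sys

INTERPRETERS = ["python2.7", "python3.4", "pypy"]
MODULES = ["pandas", "pyarrow"]


def report_versions(interpreters=INTERPRETERS, modules=MODULES):
    """Map each interpreter to {module: version or 'missing'}, or to None
    if the interpreter cannot be found."""
    report = {}
    for exe in interpreters:
        if shutil.which(exe) is None:
            report[exe] = None
            continue
        versions = {}
        for name in modules:
            code = "import {0}; print({0}.__version__)".format(name)
            proc = subprocess.run([exe, "-c", code],
                                  stdout=subprocess.PIPE, stderr=subprocess.PIPE,
                                  universal_newlines=True)
            # A non-zero exit means the import (or __version__ lookup) failed.
            versions[name] = proc.stdout.strip() if proc.returncode == 0 else "missing"
        report[exe] = versions
    return report


if __name__ == "__main__":
    # Also include the interpreter running this script.
    for exe, versions in report_versions(INTERPRETERS + [sys.executable]).items():
        print(exe, versions if versions is not None else "not found")
```

Running each import in a separate process matters here, because each interpreter has its own site-packages and its own view of `PATH`.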


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/84738/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @BryanCutler can you give me a minimal repro for the timestamp issue you cited above? 


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85048/
    Test FAILed.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by wesm <gi...@git.apache.org>.
Github user wesm commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155854009
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    Should be able to cut an RC at the beginning of next week. I would suggest mvn-installing from Arrow master for the time being.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84740/testReport)** for PR 19884 at commit [`c3d612f`](https://github.com/apache/spark/commit/c3d612fd1151271d4da027e3d252eed0aea4ea60).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84447 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84447/testReport)** for PR 19884 at commit [`4b0790b`](https://github.com/apache/spark/commit/4b0790bfdd281719c2b5471c077e495523b28e3b).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85165/
    Test FAILed.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    >yeah, i can do the upgrade next week. i'll be working remotely from the east coast, but unavailable at all on monday due to travel.
    
    Great, thanks @shaneknapp! I'll ping you when I think we are set to go.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r156241339
  
    --- Diff: dev/deps/spark-deps-hadoop-2.6 ---
    @@ -144,7 +144,9 @@ metrics-json-3.1.5.jar
     metrics-jvm-3.1.5.jar
     minlog-1.3.0.jar
     netty-3.9.9.Final.jar
    -netty-all-4.0.47.Final.jar
    +netty-all-4.1.17.Final.jar
    +netty-buffer-4.1.17.Final.jar
    +netty-common-4.1.17.Final.jar
    --- End diff --
    
    @BryanCutler Yes. It should be safe.
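The dependency manifest change above replaces netty-all 4.0.47 with 4.1.17 artifacts, because the newer Arrow depends on Netty 4.1.17. Having two Netty 4.x lines on one classpath is the kind of mismatch being checked here. Below is a minimal sketch that flags Netty 4.x artifacts whose versions disagree. It assumes the one-`artifact-version.jar`-per-line format shown in `dev/deps/spark-deps-hadoop-2.6`, and the helper names are hypothetical. Netty 3 (`netty-3.9.9.Final.jar`) uses a different package namespace, so the sketch ignores it.

```python
import re

# Matches entries such as "netty-all-4.1.17.Final.jar" or "netty-3.9.9.Final.jar".
NETTY_JAR = re.compile(r"^(netty(?:-[a-z]+)?)-(\d+(?:\.\d+)*)\.Final\.jar$")


def netty4_versions(manifest_lines):
    """Map each Netty 4.x artifact name to its version (hypothetical helper)."""
    versions = {}
    for line in manifest_lines:
        match = NETTY_JAR.match(line.strip())
        if match and match.group(2).startswith("4."):
            versions[match.group(1)] = match.group(2)
    return versions


def is_consistent(versions):
    """True when every Netty 4.x artifact uses the same version."""
    return len(set(versions.values())) <= 1


# Hypothetical mixed classpath: old netty-all next to a newer netty-buffer.
old = ["netty-3.9.9.Final.jar", "netty-all-4.0.47.Final.jar",
       "netty-buffer-4.1.17.Final.jar"]
# Entries from the updated manifest in the diff above.
new = ["netty-3.9.9.Final.jar", "netty-all-4.1.17.Final.jar",
       "netty-buffer-4.1.17.Final.jar", "netty-common-4.1.17.Final.jar"]
print(is_consistent(netty4_versions(old)))
print(is_consistent(netty4_versions(new)))
```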


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Test FAILed.
    Refer to this link for build results (access rights to CI server needed): 
    https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/85220/
    Test FAILed.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Thanks for doing the update, @shaneknapp, and thanks for looking into the details, @HyukjinKwon!
    I'll look into the test issues with Python 2.7; they look to be related to timestamps.
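Some context for the timestamp-related failures: Spark stores `TimestampType` values internally as microseconds since the Unix epoch in UTC, while pandas defaults to nanosecond resolution (`datetime64[ns]`). A conversion therefore has to change the unit and, for timezone-aware values, account for the offset. Below is a minimal stdlib-only sketch of that arithmetic. The helper names are hypothetical, and it relies on Python 3 features (`datetime.timezone`, `timedelta // timedelta`), so it does not reproduce any Python 2.7-specific behavior.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_micros(dt):
    """Microseconds since the Unix epoch in UTC (hypothetical helper).

    Naive datetimes are treated as UTC only to keep this sketch deterministic;
    PySpark itself interprets naive values in the local timezone.
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    # Dividing a timedelta by a timedelta is exact, unlike float seconds * 1e6.
    return (dt - EPOCH) // timedelta(microseconds=1)


def nanos_to_micros(nanos):
    """Convert nanoseconds to microseconds with floor division.

    Pre-1970 (negative) values round toward negative infinity.
    """
    return nanos // 1000


pst = timezone(timedelta(hours=-8))
local = datetime(2017, 12, 11, 13, 59, 0, 123456, tzinfo=pst)
utc = datetime(2017, 12, 11, 21, 59, 0, 123456, tzinfo=timezone.utc)
print(to_epoch_micros(local) == to_epoch_micros(utc))  # True: same instant
```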


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    test this please


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    >When I tried to run tests locally, I got OutOfMemoryException
    
    @ueshin, you got that error because the latest Arrow has upgraded Netty to 4.1.17, but Spark has an older version on the classpath. If you apply #19829 on top of this PR, the tests should pass.


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84616 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84616/testReport)** for PR 19884 at commit [`93b1eb3`](https://github.com/apache/spark/commit/93b1eb37fa1f39d5d69853248f72808bf3b05a81).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155983224
  
    --- Diff: python/pyspark/serializers.py ---
    @@ -223,27 +223,13 @@ def _create_batch(series, timezone):
             series = [series]
         series = ((s, None) if not isinstance(s, (list, tuple)) else s for s in series)
     
    -    # If a nullable integer series has been promoted to floating point with NaNs, need to cast
    -    # NOTE: this is not necessary with Arrow >= 0.7
    -    def cast_series(s, t):
    -        if type(t) == pa.TimestampType:
    -            # NOTE: convert to 'us' with astype here, unit ignored in `from_pandas` see ARROW-1680
    -            return _check_series_convert_timestamps_internal(s.fillna(0), timezone)\
    -                .values.astype('datetime64[us]', copy=False)
    -        # NOTE: can not compare None with pyarrow.DataType(), fixed with Arrow >= 0.7.1
    -        elif t is not None and t == pa.date32():
    -            # TODO: this converts the series to Python objects, possibly avoid with Arrow >= 0.8
    -            return s.dt.date
    -        elif t is None or s.dtype == t.to_pandas_dtype():
    -            return s
    -        else:
    -            return s.fillna(0).astype(t.to_pandas_dtype(), copy=False)
    -
    -    # Some object types don't support masks in Arrow, see ARROW-1721
         def create_array(s, t):
    -        casted = cast_series(s, t)
    -        mask = None if casted.dtype == 'object' else s.isnull()
    -        return pa.Array.from_pandas(casted, mask=mask, type=t)
    +        mask = s.isnull()
    +        # Workaround for casting timestamp units with timezone, ARROW-1906
    --- End diff --
    
    Will the fix for this workaround be included in Arrow 0.8?
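The removed `cast_series` helper existed because pandas promotes a nullable integer column to floats with `NaN` in the missing slots. The old code filled those slots with 0 and cast back to the integer dtype, then passed a null mask computed from the original series. The new code simply passes `mask = s.isnull()` to Arrow. Below is a stdlib-only sketch of that fill-and-mask idea, with plain lists standing in for pandas Series. The helper names are hypothetical; this is neither the pyspark nor the pyarrow API.

```python
import math


def is_null(value):
    """Treat None and float NaN as missing, roughly like pandas' isnull()."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def fill_and_mask(values, fill=0, cast=int):
    """Return (filled, mask), where True in the mask means "null".

    Mirrors the old workaround: the filled values keep a concrete integer
    type, and the mask tells the consumer which slots are really null.
    """
    mask = [is_null(v) for v in values]
    filled = [cast(fill) if m else cast(v) for v, m in zip(values, mask)]
    return filled, mask


# An integer column with one missing entry, after pandas promoted it to float.
promoted = [1.0, float("nan"), 3.0]
filled, mask = fill_and_mask(promoted)
print(filled, mask)  # [1, 0, 3] [False, True, False]
```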


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by zsxwing <gi...@git.apache.org>.
Github user zsxwing commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    @HyukjinKwon yeah, I closed the ticket.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by HyukjinKwon <gi...@git.apache.org>.
Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    I think we should test PyArrow and Pandas with PyPy and Python 2.x too. Why not?
    
    @shaneknapp, how about upgrading only the current PyArrow (python 3.4.5: pyarrow 0.4.1 -> python 3.4.5: pyarrow 0.8.0) for now, and then proceeding separately with the others later, if that is relatively easy for you (assuming you are currently only ready to upgrade PyArrow)?
    
    It looks like this upgrade blocks #18754, so upgrading python 3.4.5: pyarrow 0.4.1 -> python 3.4.5: pyarrow 0.8.0 alone should be relatively safe and should not break the tests.
    
    If I am not mistaken, I have seen a few tests fail locally in Python 2 and PyPy with Pandas and PyArrow, but I haven't had enough time to check whether it is an actual issue and file a JIRA. Still, I am pretty sure some tests will fail after the upgrade.
    (^ Please let me know if anyone has tried this before as well.)
    
    If the plan below works for you, @shaneknapp, I would like to suggest:
    
    1. Upgrade python 3.4.5: pyarrow 0.4.1 -> python 3.4.5: pyarrow 0.8.0 alone, and unblock #18754.
    2. Investigate whether the tests with PyPy and Python 2 actually pass.
      2.1. If not, file a JIRA and do our best to fix them first.
    3. Upgrade the others.
    
    If you'd prefer to do it in one go, I (and probably some others here) will first quickly investigate whether the tests pass with Pandas 0.19.2 and PyArrow under PyPy and Python 2, and will let you know.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    > Quick comment - Hey @BryanCutler, to me I am fine to skip some tests for now if they take a while to fix.
    
    The last failure was the same one I was seeing locally from https://github.com/apache/spark/pull/19884#issuecomment-353155446 with pandas 0.19.2.  Let me take a quick look and see if I can understand why only this one test is having an issue.
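Temporarily skipping tests until the environment catches up is commonly done with `unittest.skipIf`. Below is a minimal sketch that gates Arrow-dependent tests on the installed PyArrow version. It assumes a hypothetical `MIN_PYARROW = (0, 8, 0)` threshold and uses a deliberately simple version parser; it is not pyspark's own helper. `ArrowTests` is a placeholder test class.

```python
import re
import unittest

MIN_PYARROW = (0, 8, 0)  # hypothetical threshold matching this upgrade


def parse_version(text):
    """Turn '0.8.0' or '0.4.1.dev12' into a comparable tuple of leading ints."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if not match:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def pyarrow_ok(minimum=MIN_PYARROW):
    """True if pyarrow can be imported and is at least `minimum`."""
    try:
        import pyarrow
    except ImportError:
        return False
    return parse_version(pyarrow.__version__) >= minimum


@unittest.skipIf(not pyarrow_ok(),
                 "PyArrow >= %s not installed" % ".".join(map(str, MIN_PYARROW)))
class ArrowTests(unittest.TestCase):
    def test_pyarrow_version(self):
        import pyarrow
        self.assertGreaterEqual(parse_version(pyarrow.__version__), MIN_PYARROW)
```

Comparing tuples of integers avoids the string-comparison trap where "0.10.0" would sort before "0.8.0".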


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #84743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/84743/testReport)** for PR 19884 at commit [`3a5e3c1`](https://github.com/apache/spark/commit/3a5e3c12da0bd8cdcb580067281b908422f35a6d).
     * This patch **fails Scala style tests**.
     * This patch merges cleanly.
     * This patch adds no public classes.


---



[GitHub] spark issue #19884: [SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by SparkQA <gi...@git.apache.org>.
Github user SparkQA commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    **[Test build #85203 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/85203/testReport)** for PR 19884 at commit [`715f83d`](https://github.com/apache/spark/commit/715f83dfb96823fc79bca0fcd904c1ddeddaf6d6).


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by shaneknapp <gi...@git.apache.org>.
Github user shaneknapp commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    yeah, i can do the upgrade next week.  i'll be working remotely from the
    east coast, but unavailable at all on monday due to travel.
    
    On Mon, Dec 11, 2017 at 1:59 PM, Bryan Cutler <no...@github.com> wrote:
    
    > Jenkins cannot support installing multiple versions of PyArrow?
    >
    > @zsxwing <https://github.com/zsxwing> that's right, we will have to
    > coordinate to make sure the Jenkins pyarrow is upgraded to version 0.8 as
    > well. I'm not sure of the best way to coordinate all of this because this PR,
    > jenkins upgrade, and Spark Netty upgrade all need to happen at the same
    > time.
    >
    > @holdenk <https://github.com/holdenk> @shaneknapp
    > <https://github.com/shaneknapp> will one of you be able to work on the
    > pyarrow upgrade for Jenkins sometime around next week? (assuming Arrow 0.8
    > is released in the next day or so)
    >



---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by BryanCutler <gi...@git.apache.org>.
Github user BryanCutler commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155668850
  
    --- Diff: python/pyspark/sql/types.py ---
    @@ -1658,13 +1657,13 @@ def from_arrow_type(at):
             spark_type = FloatType()
         elif at == pa.float64():
             spark_type = DoubleType()
    -    elif type(at) == pa.DecimalType:
    +    elif pa.types.is_decimal(at):
             spark_type = DecimalType(precision=at.precision, scale=at.scale)
    -    elif at == pa.string():
    +    elif pa.types.is_string(at):
             spark_type = StringType()
         elif at == pa.date32():
             spark_type = DateType()
    -    elif type(at) == pa.TimestampType:
    +    elif pa.types.is_timestamp(at):
    --- End diff --
    
    Sounds good, thanks for confirming!
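The diff replaces exact comparisons such as `type(at) == pa.DecimalType` and `at == pa.string()` with the predicate helpers in `pa.types` (`is_decimal`, `is_string`, `is_timestamp`). Predicates suit parameterized types: a decimal carries precision and scale, and a timestamp carries a unit and an optional timezone, so neither can be matched against a single fixed instance. Below is a stdlib-only sketch of the same dispatch idea over type strings. It assumes hypothetical renderings such as `decimal(10, 2)` and `timestamp[us, tz=UTC]` rather than real pyarrow objects, and `spark_type_name` is an illustrative name, not Spark's `from_arrow_type`.

```python
import re

# Fixed, non-parameterized types can be matched exactly (hypothetical names).
SIMPLE = {
    "int8": "ByteType", "int16": "ShortType", "int32": "IntegerType",
    "int64": "LongType", "float32": "FloatType", "float64": "DoubleType",
    "string": "StringType", "date32": "DateType",
}

DECIMAL = re.compile(r"^decimal\((\d+),\s*(\d+)\)$")


def is_decimal(name):
    """Predicate for any precision/scale, analogous in spirit to pa.types.is_decimal."""
    return DECIMAL.match(name) is not None


def is_timestamp(name):
    """Predicate for any unit/timezone, analogous in spirit to pa.types.is_timestamp."""
    return name.startswith("timestamp[")


def spark_type_name(name):
    """Map a hypothetical Arrow type string to a Spark SQL type name."""
    if name in SIMPLE:
        return SIMPLE[name]
    if is_decimal(name):
        precision, scale = DECIMAL.match(name).groups()
        return "DecimalType(%s,%s)" % (precision, scale)
    if is_timestamp(name):
        return "TimestampType"
    raise TypeError("unsupported Arrow type: " + name)


print(spark_type_name("decimal(10, 2)"))         # DecimalType(10,2)
print(spark_type_name("timestamp[us, tz=UTC]"))  # TimestampType
```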


---



[GitHub] spark pull request #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to ...

Posted by ueshin <gi...@git.apache.org>.
Github user ueshin commented on a diff in the pull request:

    https://github.com/apache/spark/pull/19884#discussion_r155708588
  
    --- Diff: pom.xml ---
    @@ -185,7 +185,7 @@
         <paranamer.version>2.8</paranamer.version>
         <maven-antrun.version>1.8</maven-antrun.version>
         <commons-crypto.version>1.0.0</commons-crypto.version>
    -    <arrow.version>0.4.0</arrow.version>
    +    <arrow.version>0.8.0-SNAPSHOT</arrow.version>
    --- End diff --
    
    Can we download the snapshot from somewhere for our local tests?


---



[GitHub] spark issue #19884: [WIP][SPARK-22324][SQL][PYTHON] Upgrade Arrow to 0.8.0

Posted by AmplabJenkins <gi...@git.apache.org>.
Github user AmplabJenkins commented on the issue:

    https://github.com/apache/spark/pull/19884
  
    Merged build finished. Test FAILed.


---
