Posted to dev@zeppelin.apache.org by Leemoonsoo <gi...@git.apache.org> on 2015/06/29 22:26:57 UTC

[GitHub] incubator-zeppelin pull request: [ZEPPELIN-97][ZEPPELIN-134] pyspa...

GitHub user Leemoonsoo opened a pull request:

    https://github.com/apache/incubator-zeppelin/pull/129

    [ZEPPELIN-97][ZEPPELIN-134] pyspark issue with mllib api

    There was an issue, [ZEPPELIN-97](https://issues.apache.org/jira/browse/ZEPPELIN-97), with PySpark 1.4. The cause is that, starting with PySpark 1.4, the Java gateway is created with the `auto_convert = True` option, and PySpark's own Python code relies on it. This PR fixes the problem.
    
    This PR also handles [ZEPPELIN-134](https://issues.apache.org/jira/browse/ZEPPELIN-134): injecting `sqlContext`.
    
    Finally, it prints a more verbose stack trace message. For example,
    
    from
    
    ```
    (<type 'exceptions.AttributeError'>, AttributeError("'list' object has no attribute '_get_object_id'",), <traceback object at 0x392b638>)
    ```
    
    to
    
    ```
    Traceback (most recent call last):
      File "/var/folders/zt/nd4j13y14jjg7_5pc4xgy7t80000gn/T//zeppelin_pyspark.py", line 110, in <module>
        eval(compiledCode)
      File "<string>", line 3, in <module>
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 1200, in withColumn
        return self.select('*', col.alias(colName))
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 738, in select
        jdf = self._jdf.select(self._jcols(*cols))
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 630, in _jcols
        return self._jseq(cols, _to_java_column)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/dataframe.py", line 617, in _jseq
        return _to_seq(self.sql_ctx._sc, cols, converter)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/pyspark/sql/column.py", line 60, in _to_seq
        return sc._jvm.PythonUtils.toSeq(cols)
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", line 529, in __call__
        [get_command_part(arg, self.pool) for arg in new_args])
      File "/Users/moon/Projects/zeppelin/spark-1.4.0-bin-hadoop2.3/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", line 265, in get_command_part
        command_part = REFERENCE_TYPE + parameter._get_object_id()
    AttributeError: 'list' object has no attribute '_get_object_id'
    ```
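
The `'list' object has no attribute '_get_object_id'` failure at the end of the trace can be illustrated with a toy model of how arguments cross the Python-to-JVM gateway. This is a sketch under stated assumptions, not py4j's code: `FakeJavaObject`, `ListConverter` and `encode_argument` are hypothetical names, and the real wire format is replaced by plain tuples. It assumes what the trace suggests: a non-primitive argument must already be a proxy for a JVM object unless a registered converter turns it into one, which, as we understand it, is what `auto_convert=True` on py4j's `JavaGateway` enables.

```python
# Toy model of passing Python arguments to a JVM method through a
# py4j-style gateway. All names are hypothetical; this is not py4j's code.


class FakeJavaObject:
    """Stands in for a Python-side proxy to an object living in the JVM."""

    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class ListConverter:
    """Stands in for a converter that copies a Python list into a JVM list."""

    def __init__(self):
        self.created = 0

    def can_convert(self, obj):
        return isinstance(obj, list)

    def convert(self, obj):
        # Pretend the JVM built a java.util.ArrayList and returned its id.
        self.created += 1
        return FakeJavaObject("list%d" % self.created)


def encode_argument(arg, converters):
    """Encode one argument: primitives by value, everything else by reference.

    With auto conversion off, `converters` is empty, so a plain Python list
    falls through to the by-reference branch and fails.
    """
    if arg is None or isinstance(arg, (bool, int, float, str)):
        return ("value", arg)
    for converter in converters:
        if converter.can_convert(arg):
            arg = converter.convert(arg)
            break
    return ("ref", arg._get_object_id())


if __name__ == "__main__":
    # Conversion off: the same error as the last line of the trace above.
    try:
        encode_argument(["a", "b"], converters=[])
    except AttributeError as err:
        print(err)  # 'list' object has no attribute '_get_object_id'

    # Conversion on: the list is turned into a JVM-side object reference.
    print(encode_argument(["a", "b"], converters=[ListConverter()]))  # ('ref', 'list1')
```

In this model the fix amounts to registering the converter, so that code written against a converting gateway (such as a call that hands a Python list to `PythonUtils.toSeq`) no longer reaches `_get_object_id` on a raw list.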

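The two stack-trace formats in the PR description match what Python's standard library gives for the same exception: printing `sys.exc_info()` yields the terse `(type, value, traceback)` tuple, while `traceback.format_exc()` yields the full trace. A minimal sketch, assuming an interpreter loop that compiles each paragraph and runs it with `eval(compiledCode)`, as the trace shows; `run_paragraph` is a hypothetical helper, not the change made in this PR. It targets Python 3, so the terse form prints `<class 'AttributeError'>` rather than Python 2's `<type 'exceptions.AttributeError'>`.

```python
import sys
import traceback


def run_paragraph(source):
    """Run one paragraph of user code; return (terse, verbose) on error, else None.

    Hypothetical helper modelled on the `eval(compiledCode)` frame in the
    trace above; it is not the code from this PR.
    """
    namespace = {}
    try:
        # "<string>" is the file name that appears in the trace above.
        compiledCode = compile(source, "<string>", "exec")
        eval(compiledCode, namespace)
        return None
    except Exception:
        # Old style: the repr of the (type, value, traceback) tuple.
        terse = str(sys.exc_info())
        # New style: the full stack trace, one entry per frame.
        verbose = traceback.format_exc()
        return terse, verbose


if __name__ == "__main__":
    code = "cols = ['a', 'b']\nn = len(cols)\ncols._get_object_id()\n"
    terse, verbose = run_paragraph(code)
    print(terse)
    print(verbose)
```

The verbose form names the failing line of the paragraph (`File "<string>", line 3`) as well as every library frame below it, which is what makes the second output in the PR description actionable.
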
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/Leemoonsoo/incubator-zeppelin ZEPPELIN-97

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/incubator-zeppelin/pull/129.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #129
    
----
commit bce3c1d33e5ab48146c2d70e81935e361fcff9c2
Author: Lee moon soo <mo...@apache.org>
Date:   2015-06-29T19:53:10Z

    Print more stacktrace

commit ab01a665781a9b1399eb000ec480ed1ed4d9b715
Author: Lee moon soo <mo...@apache.org>
Date:   2015-06-29T20:20:36Z

    Add testcase for auto_convert option

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] incubator-zeppelin pull request: [ZEPPELIN-97][ZEPPELIN-134] pyspa...

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/129#issuecomment-116857501
  
    Ready to merge.


[GitHub] incubator-zeppelin pull request: [ZEPPELIN-97][ZEPPELIN-134] pyspa...

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/incubator-zeppelin/pull/129


[GitHub] incubator-zeppelin pull request: [ZEPPELIN-97][ZEPPELIN-134] pyspa...

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the pull request:

    https://github.com/apache/incubator-zeppelin/pull/129#issuecomment-117051643
  
    LGTM
