Posted to dev@zeppelin.apache.org by m30m <gi...@git.apache.org> on 2016/11/24 16:07:52 UTC

[GitHub] zeppelin pull request #1677: Add doc for exchanging data frames

GitHub user m30m opened a pull request:

    https://github.com/apache/zeppelin/pull/1677

    Add doc for exchanging data frames

    ### What is this PR for?
    ZeppelinContext can be used to exchange DataFrames but there are some nasty tricks and typecasts.
    It's good to provide some examples.
    
    
    ### What type of PR is it?
    Documentation
    
    ### Questions:
    * Do the license files need an update? no
    * Are there breaking changes for older versions? no
    * Does this need documentation? no

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/m30m/zeppelin patch-3

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/zeppelin/pull/1677.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1677
    
----
commit a039d5c4ac2e4887ce013b658e25cd1f43861a5b
Author: Mohammad Amin Khashkhashi Moghaddam <am...@gmail.com>
Date:   2016-11-24T16:05:58Z

    Add doc for exchanging data frames
    
    ZeppelinContext can be used to exchange DataFrames but there are some nasty tricks and typecasts.
    It's good to provide some examples.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastructure@apache.org or file a JIRA ticket
with INFRA.
---

[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by m30m <gi...@git.apache.org>.
Github user m30m commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    It's not possible to put the DataFrame directly because of this error:
    ```
    Exception: Traceback (most recent call last):
      File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1124, in __call__
        args_command, temp_args = self._build_args(*args)

      File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/java_gateway.py", line 1094, in _build_args
        [get_command_part(arg, self.pool) for arg in new_args])

      File "/spark-2.0.1-bin-hadoop2.7/python/lib/py4j-0.10.3-src.zip/py4j/protocol.py", line 289, in get_command_part
        command_part = REFERENCE_TYPE + parameter._get_object_id()

      File "/spark-2.0.1-bin-hadoop2.7/python/pyspark/sql/dataframe.py", line 841, in __getattr__
        "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))

    AttributeError: 'DataFrame' object has no attribute '_get_object_id'
    ```



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by m30m <gi...@git.apache.org>.
Github user m30m commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Yes, that's a good idea. Shall I add a commit to this branch?



[GitHub] zeppelin pull request #1677: Add doc for exchanging data frames

Posted by asfgit <gi...@git.apache.org>.
Github user asfgit closed the pull request at:

    https://github.com/apache/zeppelin/pull/1677



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    If we support the feature I mentioned above in a separate PR, the documentation added here will become outdated and we will have to update it again. So, IMHO, it would be better to do it in this PR.



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Let's keep this PR documentation-only and open a JIRA (with another PR) for the DataFrame support?




[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Yes, and you also need to update the `__getitem__` method so that users don't need to construct the DataFrame by hand as shown below. `z.get("myScalaDataFrame")` should return a DataFrame directly.
    ```
    myScalaDataFrame = DataFrame(z.get("myScalaDataFrame"), sqlContext)
    ```
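
A rough sketch of that idea, for illustration only. It runs without Spark: `FakeJavaDataFrame`, `FakeDataFrame`, `FakeZeppelinContext`, and `PyZeppelinContextSketch` are hypothetical stand-ins, not Zeppelin or PySpark classes, and the `isinstance` check is only a placeholder for however a real implementation would recognise a JVM DataFrame handle.

```python
class FakeJavaDataFrame:
    """Hypothetical stand-in for a JVM-side DataFrame handle."""

    def __init__(self, name):
        self.name = name


class FakeDataFrame:
    """Hypothetical stand-in for the wrapper built as DataFrame(jdf, sqlContext)."""

    def __init__(self, jdf, sql_ctx):
        self._jdf = jdf
        self.sql_ctx = sql_ctx


class FakeZeppelinContext:
    """Hypothetical stand-in for the shared pool behind z.put / z.get."""

    def __init__(self):
        self._pool = {}

    def put(self, key, value):
        self._pool[key] = value

    def get(self, key):
        return self._pool.get(key)


class PyZeppelinContextSketch:
    """Wraps JVM DataFrame handles on the way out so callers get a DataFrame."""

    def __init__(self, z, sql_ctx):
        self.z = z
        self.sql_ctx = sql_ctx

    def __getitem__(self, key):
        value = self.z.get(key)
        if isinstance(value, FakeJavaDataFrame):
            # Placeholder check: wrap the raw handle for the caller.
            return FakeDataFrame(value, self.sql_ctx)
        # Anything else is returned unchanged.
        return value

    def get(self, key):
        return self[key]


# Values as they might have been stored from a Scala paragraph.
jvm_side = FakeZeppelinContext()
jvm_side.put("myScalaDataFrame", FakeJavaDataFrame("posts"))
jvm_side.put("threshold", 10)

z = PyZeppelinContextSketch(jvm_side, sql_ctx="sqlContext placeholder")
wrapped = z.get("myScalaDataFrame")  # already wrapped, no manual DataFrame(...)
plain = z.get("threshold")           # non-DataFrame values pass through
```

With something along these lines, the manual `DataFrame(z.get(...), sqlContext)` call above would no longer be needed in user code.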



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Will merge to master if there is no further discussion.



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Should we do this implicitly for the user in `ZeppelinContext`? The syntax below is hard to understand for users who don't know the internal implementation of PySpark, and I think we should not expose such internals to users.
    
    ```
    z.put("myPythonDataFrame", postsDf._jdf)
    ```
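
For readers unfamiliar with the internals referred to here, the following is a minimal sketch of why the wrapper cannot be passed as-is while its `_jdf` can. It does not import py4j or pyspark. `FakeJavaObject`, `FakeDataFrame`, `fake_get_command_part`, and the `"ref:"` prefix are hypothetical stand-ins that only imitate what the traceback quoted in this thread shows: the gateway asks the argument for `_get_object_id()`, and the DataFrame's `__getattr__` raises `AttributeError`.

```python
class FakeJavaObject:
    """Hypothetical stand-in for a gateway handle to a JVM object."""

    def __init__(self, object_id):
        self._object_id = object_id

    def _get_object_id(self):
        return self._object_id


class FakeDataFrame:
    """Hypothetical stand-in for the Python DataFrame wrapper around a JVM handle."""

    def __init__(self, jdf):
        self._jdf = jdf

    def __getattr__(self, name):
        # Reached only when normal attribute lookup fails.
        raise AttributeError(
            "'%s' object has no attribute '%s'" % (self.__class__.__name__, name)
        )


def fake_get_command_part(parameter):
    """Imitates the traceback step that asks the argument for its object id."""
    return "ref:" + parameter._get_object_id()


posts_df = FakeDataFrame(FakeJavaObject("o42"))

# Passing the Python wrapper fails, as in the reported error.
try:
    fake_get_command_part(posts_df)
    wrapper_error = None
except AttributeError as exc:
    wrapper_error = str(exc)

# Passing the underlying handle works.
handle_result = fake_get_command_part(posts_df._jdf)
```

In this sketch the wrapper has no object id of its own, so only the inner handle can be serialised for the call.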



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by m30m <gi...@git.apache.org>.
Github user m30m commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    I'm not sure whether it's a good idea to hide this complexity behind special-casing, and I would need to check whether these changes are backward compatible. So I guess a doc-only PR, followed by a JIRA issue for handling some Spark-specific types, is the better solution.



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by felixcheung <gi...@git.apache.org>.
Github user felixcheung commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    Well, it's a lot quicker to get a doc-only PR in :)
    Besides, we should have a JIRA for changes like this. It's your call, @m30m



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by zjffdu <gi...@git.apache.org>.
Github user zjffdu commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    I mean we can do this internally in `PyZeppelinContext` as follows:
    ```
    def __setitem__(self, key, item):
        if isinstance(item, DataFrame):
            self.z.put(key, item._jdf)
        else:
            self.z.put(key, item)
    ```
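
To see the effect of that branch without a running Spark interpreter, here is a small sketch using hypothetical stand-ins: `FakeDataFrame`, `FakeZeppelinContext`, and `PyZeppelinContextSketch` are made up for illustration, and the string `"jvm-handle-1"` stands in for the real JVM handle. It is an assumption about how the proposal would behave, not the merged implementation.

```python
class FakeDataFrame:
    """Hypothetical stand-in for a Python DataFrame holding a JVM handle in _jdf."""

    def __init__(self, jdf):
        self._jdf = jdf


class FakeZeppelinContext:
    """Hypothetical stand-in for the JVM-side context; records what it receives."""

    def __init__(self):
        self.stored = {}

    def put(self, key, value):
        self.stored[key] = value


class PyZeppelinContextSketch:
    """Unwraps DataFrames before handing them to the JVM-side context."""

    def __init__(self, z):
        self.z = z

    def __setitem__(self, key, item):
        if isinstance(item, FakeDataFrame):
            # Store the underlying handle, not the Python wrapper.
            self.z.put(key, item._jdf)
        else:
            self.z.put(key, item)


jvm_side = FakeZeppelinContext()
z = PyZeppelinContextSketch(jvm_side)

z["myPythonDataFrame"] = FakeDataFrame(jdf="jvm-handle-1")
z["limit"] = 5

stored = jvm_side.stored
```

The user writes the same assignment for both values, and only the DataFrame is unwrapped on the way in.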



[GitHub] zeppelin issue #1677: Add doc for exchanging data frames

Posted by Leemoonsoo <gi...@git.apache.org>.
Github user Leemoonsoo commented on the issue:

    https://github.com/apache/zeppelin/pull/1677
  
    @m30m Awesome!
    
    LGTM. Will merge to master if there are no more comments.

