Posted to github@arrow.apache.org by "simicd (via GitHub)" <gi...@apache.org> on 2023/02/19 21:11:24 UTC

[GitHub] [arrow-datafusion-python] simicd opened a new pull request, #197: Implement `to_pandas()`

simicd opened a new pull request, #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197

   # Which issue does this PR close?
   Closes #139.
   
   # Rationale for this change
   Convert a DataFusion DataFrame directly to a pandas DataFrame.
   
   # What changes are included in this PR?
   Implement a `to_pandas()` method using the pyarrow library.
   
   # Are there any user-facing changes?
   A new `to_pandas()` method.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion-python] andygrove merged pull request #197: Implement `to_pandas()`

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove merged PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197


[GitHub] [arrow-datafusion-python] andygrove commented on pull request #197: Implement `to_pandas()`

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#issuecomment-1436119287

   This is looking good so far. Thanks @simicd 


[GitHub] [arrow-datafusion-python] simicd commented on pull request #197: Implement `to_pandas()`

Posted by "simicd (via GitHub)" <gi...@apache.org>.
simicd commented on PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#issuecomment-1439146797

   Thanks for already looking into the PR, @andygrove. I resolved your feedback and marked the PR as ready for final review.
   Also, thanks for the additional hint, @krzysztof-kwitt; the docs are now updated as well.


[GitHub] [arrow-datafusion-python] simicd commented on pull request #197: Implement `to_pandas()`

Posted by "simicd (via GitHub)" <gi...@apache.org>.
simicd commented on PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#issuecomment-1445137528

   @krzysztof-kwitt Good catch! Indeed, it fails with an error; I opened #234 to track the issue.


[GitHub] [arrow-datafusion-python] krzysztof-kwitt commented on pull request #197: Implement `to_pandas()`

Posted by "krzysztof-kwitt (via GitHub)" <gi...@apache.org>.
krzysztof-kwitt commented on PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#issuecomment-1439591571

   I wonder if this pandas conversion will still work for an empty result (0 rows/batches). @simicd, what do you think?


[GitHub] [arrow-datafusion-python] andygrove commented on a diff in pull request #197: Implement `to_pandas()`

Posted by "andygrove (via GitHub)" <gi...@apache.org>.
andygrove commented on code in PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#discussion_r1111337115


##########
src/dataframe.rs:
##########
@@ -313,6 +313,24 @@ impl PyDataFrame {
         Ok(())
     }
 
+    // Convert to pandas dataframe with pyarrow
+    // Collect the batches, pass to Arrow Table & then convert to Pandas DataFrame

Review Comment:
   ```suggestion
       /// Convert to pandas dataframe with pyarrow
       /// Collect the batches, pass to Arrow Table & then convert to Pandas DataFrame
   ```



[GitHub] [arrow-datafusion-python] krzysztof-kwitt commented on pull request #197: Implement `to_pandas()`

Posted by "krzysztof-kwitt (via GitHub)" <gi...@apache.org>.
krzysztof-kwitt commented on PR #197:
URL: https://github.com/apache/arrow-datafusion-python/pull/197#issuecomment-1437117775

   Can you also update the documentation?
   https://github.com/apache/arrow-datafusion-python/blame/main/README.md#L73-L77
   ```diff
   -# collect as list of pyarrow.RecordBatch
   -results = df.collect()
   -# get first batch
   -batch = results[0]
   -# convert to Pandas
   -df = batch.to_pandas()
   +# collect as pandas
   +df = df.to_pandas()
   ```
   and
   https://github.com/apache/arrow-datafusion-python/blob/950a5789b612f97794ff7250310ae3289227590f/examples/sql-to-pandas.py#L36-L43

