Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2020/07/13 18:32:46 UTC

[GitHub] [arrow] wesm opened a new pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

wesm opened a new pull request #7735:
URL: https://github.com/apache/arrow/pull/7735


   Using a fairly large IPC stream file:
   
   Before:
   
   ```
   In [1]: timeit pa.ipc.open_stream('nyctaxi.arrow').read_all()
   129 ms ± 1.51 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   After:
   
   ```
   In [1]: timeit pa.ipc.open_stream('nyctaxi.arrow').read_all()
   87.6 ms ± 1.68 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
   ```
   
   I also found other performance concerns in the internals of this operation, which I reported as ARROW-9441.
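
Outside IPython, a similar measurement can be approximated with the standard `timeit` module. The sketch below is only an illustration: `time_call` is a hypothetical helper name, and its defaults mirror the "7 runs, 10 loops each" reported above. The pyarrow call is shown only in a comment and assumes pyarrow is installed and that `nyctaxi.arrow` exists locally.

```python
import statistics
import timeit


def time_call(func, repeat=7, number=10):
    """Return (mean, stdev) in seconds per call across `repeat` runs.

    Hypothetical helper: each run calls `func` `number` times, and the
    per-call time of each run feeds the mean and standard deviation.
    """
    totals = timeit.repeat(func, repeat=repeat, number=number)
    per_call = [total / number for total in totals]
    return statistics.mean(per_call), statistics.stdev(per_call)


# Assumed usage (requires pyarrow and a local 'nyctaxi.arrow' file):
#   import pyarrow as pa
#   mean, std = time_call(lambda: pa.ipc.open_stream('nyctaxi.arrow').read_all())
#   print(f"{mean * 1e3:.1f} ms ± {std * 1e3:.2f} ms per loop")
```

Running the helper on builds from before and after the change should give numbers comparable to the two `timeit` results quoted above, although absolute timings depend on the machine and the file.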


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow] wesm closed pull request #7735: ARROW-9442: [Python] Do not call Validate() in pyarrow_wrap_table

Posted by GitBox <gi...@apache.org>.
wesm closed pull request #7735:
URL: https://github.com/apache/arrow/pull/7735


   





[GitHub] [arrow] wesm commented on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657722389


   We could also add a `validate` option to `read_all` if there are concerns about the result being invalid.
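
Until such an option exists, a caller can get the same effect with a small wrapper. This is a minimal sketch, not part of pyarrow: `read_all_checked` is a hypothetical name, and it only assumes that the reader has a `read_all()` method and that the returned table has a `validate(full=...)` method that raises on invalid data, as `pyarrow.Table.validate` does.

```python
def read_all_checked(reader, validate=False, full=False):
    """Read everything from `reader`, validating the result only on request.

    Hypothetical helper. `reader` is any object with a read_all() method,
    for example the reader returned by pa.ipc.open_stream(...).
    """
    table = reader.read_all()
    if validate:
        # Raises if the table is invalid; `full=True` asks for the more
        # thorough (and slower) checks.
        table.validate(full=full)
    return table
```

With `validate=False` the fast path is unchanged, so only callers that do not trust their input pay for the check.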





[GitHub] [arrow] wesm commented on pull request #7735: ARROW-9442: [Python] Do not call Validate() in pyarrow_wrap_table

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657863743


   +1





[GitHub] [arrow] github-actions[bot] commented on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657728533


   https://issues.apache.org/jira/browse/ARROW-9442





[GitHub] [arrow] wesm commented on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657755356


   Good point. I will look to see why the validation check was added (presumably it was to fix some segfault)





[GitHub] [arrow] wesm edited a comment on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
wesm edited a comment on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657755356


   Good point. I will look to see why the validation check was added (presumably it was to fix some segfault)





[GitHub] [arrow] pitrou commented on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
pitrou commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657749727


   `pyarrow_wrap_array` doesn't validate, so perhaps we should drop validation from `pyarrow_wrap_table` as well (and instead validate in interested consumers).





[GitHub] [arrow] wesm commented on pull request #7735: ARROW-9442: [Python] Add pyarrow_wrap_table_no_validate to improve performance in cases where the Table is known to already be valid

Posted by GitBox <gi...@apache.org>.
wesm commented on pull request #7735:
URL: https://github.com/apache/arrow/pull/7735#issuecomment-657819341


   Looks like the check was added in https://github.com/apache/arrow/commit/ef993877ac05bc840aa4670d6e9aa2da9f4eb8d6
   
   I'll remove the unconditional `Validate()` call and move the validation into the places where it's needed.
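
The design being discussed can be illustrated with a toy model in pure Python. Everything here is hypothetical and is not the pyarrow implementation: `ToyTable`, `wrap_table` and `table_from_untrusted` are stand-ins showing a wrapper that never validates and a consumer that validates because it cannot trust its input.

```python
class ToyTable:
    """Stand-in for a table: named columns that should have equal length."""

    def __init__(self, columns):
        self.columns = columns

    def validate(self):
        # Raise if the columns disagree on length.
        lengths = {len(values) for values in self.columns.values()}
        if len(lengths) > 1:
            raise ValueError(f"column lengths differ: {sorted(lengths)}")


def wrap_table(columns):
    """Cheap wrapper: performs no validation, like the proposed change."""
    return ToyTable(columns)


def table_from_untrusted(columns):
    """A consumer that cannot trust its input validates explicitly."""
    table = wrap_table(columns)
    table.validate()
    return table
```

Trusted producers call `wrap_table` directly and skip the check, while entry points that accept arbitrary data go through something like `table_from_untrusted`.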

