Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/05/05 12:49:09 UTC

[GitHub] [arrow-datafusion] gandronchik commented on pull request #2177: User Defined Table Function (udtf) support

gandronchik commented on PR #2177:
URL: https://github.com/apache/arrow-datafusion/pull/2177#issuecomment-1118507143

   > @gandronchik thank you for the explanation in this PR's description. It helps, though I will admit I still don't fully understand what is going on.
   > 
   > I agree with @doki23 -- I expect a table function to logically return a table (that is, something with both rows and columns)
   > 
   > > Regarding signature, I decided to use a single vector and vector with sizes of sections instead of vec of vecs to have better performance. If we use Vec, this will require a lot of memory in case of a request for millions of rows.
   > 
   > The way the rest of DataFusion avoids buffering all the intermediate results in memory at once is with `Stream`s, but that requires interacting with Rust's `async` ecosystem, which is non-trivial.
   > 
   > If you wanted a streaming solution, that would mean the signature might look something like the following (maybe)
   > 
   > ```rust
   > Arc<dyn Fn(Box<dyn SendableRecordBatchStream>) -> Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>;
   > ```
   
   Looks like I got the title wrong. I have implemented a function that returns many rows, so it is probably not a table function. If I rename it, will that be fine?
   
   Regarding the function signature, I think my solution is a compromise between a vec of vecs and streaming. Actually, I don't think such a function will return that many rows. However, of course, I will rewrite it if you want. So which solution do we choose: the current `Result<(ArrayRef, Vec<usize>)> + Send + Sync>`, `Result<Vec<ColumnarValue>> + Send + Sync>`, or `Result<Box<dyn SendableRecordBatchStream>> + Send + Sync>`?
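
To make the first option concrete, here is a minimal sketch of the "single flat vector plus section sizes" layout using only the standard library. It is an illustration, not the PR's code: `FlatOutput`, `generate_series`, and `sections` are hypothetical names, and a plain `Vec<i64>` stands in for Arrow's `ArrayRef`.

```rust
/// Hypothetical stand-in for the `(ArrayRef, Vec<usize>)` shape:
/// every output value lives in one flat buffer, and `section_sizes[i]`
/// records how many output rows input row `i` produced.
struct FlatOutput {
    values: Vec<i64>,
    section_sizes: Vec<usize>,
}

/// Toy multi-row function: for each input `n`, emit the values 1..=n.
/// An input of zero or less produces an empty section.
fn generate_series(inputs: &[i64]) -> FlatOutput {
    let mut values = Vec::new();
    let mut section_sizes = Vec::with_capacity(inputs.len());
    for &n in inputs {
        let before = values.len();
        values.extend(1..=n);
        section_sizes.push(values.len() - before);
    }
    FlatOutput { values, section_sizes }
}

/// Recover one slice per input row by walking the section sizes.
/// No values are copied; each slice borrows from the flat buffer.
fn sections(out: &FlatOutput) -> Vec<&[i64]> {
    let mut result = Vec::with_capacity(out.section_sizes.len());
    let mut start = 0;
    for &len in &out.section_sizes {
        result.push(&out.values[start..start + len]);
        start += len;
    }
    result
}
```

Compared with a vec of vecs, this keeps a single allocation for the values and one `usize` per input row, which is the trade-off described above. It still materializes the whole result at once, so it does not address the buffering concern that a stream-based signature would.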

