Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/10 09:51:36 UTC

[GitHub] [arrow-datafusion] gandronchik commented on pull request #2177: User Defined Table Function (udtf) support

gandronchik commented on PR #2177:
URL: https://github.com/apache/arrow-datafusion/pull/2177#issuecomment-1152184255

   @alamb Hello! Sorry for the slow response.
   
   I apologize for such a big PR with such a poor description.
   
   Let me try to explain what is happening here.
   Honestly, I made a mistake with the naming: what this PR actually adds is support for set-returning functions, i.e. functions that can return more than one row per call (https://www.postgresql.org/docs/current/functions-srf.html).
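   To make "set-returning" concrete: each input row maps to zero or more output rows. A minimal plain-Rust sketch (no Arrow), assuming the result is represented as one flat vector of values plus one section size per input row, which is the same shape the `generate_series` macro in this comment returns. The helper name `apply_srf` is hypothetical.

   ```rust
   /// Applies a per-row set-returning function `f` to every input row,
   /// flattens all produced values into one vector, and records how many
   /// values each input row produced (its "section size").
   fn apply_srf<T, U>(rows: &[T], f: impl Fn(&T) -> Vec<U>) -> (Vec<U>, Vec<usize>) {
       let mut values = Vec::new();
       let mut section_sizes = Vec::with_capacity(rows.len());
       for row in rows {
           let out = f(row);
           // A row may legitimately produce zero values.
           section_sizes.push(out.len());
           values.extend(out);
       }
       (values, section_sizes)
   }
   ```

   For example, applying it to the rows `(1, 3)`, `(5, 4)`, `(7, 7)` with `|&(s, e)| (s..=e).collect()` yields the values `[1, 2, 3, 7]` and the section sizes `[3, 0, 1]`: the middle row produces nothing, which is why the sizes must be tracked per row.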
   
   As far as I know, DataFusion follows PostgreSQL behavior, so the functionality provided here mirrors PostgreSQL's.
   
   We already use this in Cube.js, where we have implemented several functions:
   - **generate_series** (https://www.postgresql.org/docs/current/functions-srf.html)
   - **generate_subscripts** (https://www.postgresql.org/docs/current/functions-srf.html)
   - **unnest** (https://www.postgresql.org/docs/current/functions-array.html)
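   `unnest` and `generate_subscripts` follow the same pattern of emitting a variable number of rows per input row. A plain-Rust sketch of their PostgreSQL semantics for one-dimensional arrays, assuming a list column is modeled as `Option<Vec<i64>>` and subscripts start at the default lower bound of 1. A NULL or empty array yields no rows. This is an illustration only, not the Cube.js implementation.

   ```rust
   /// Sketch of `unnest` on a one-dimensional list column: every element
   /// becomes its own output row; NULL or empty lists produce no rows.
   fn unnest(lists: &[Option<Vec<i64>>]) -> (Vec<i64>, Vec<usize>) {
       let mut values = Vec::new();
       let mut sizes = Vec::with_capacity(lists.len());
       for list in lists {
           // Treat a NULL list like an empty one.
           let items = list.as_deref().unwrap_or(&[]);
           sizes.push(items.len());
           values.extend_from_slice(items);
       }
       (values, sizes)
   }

   /// Sketch of `generate_subscripts(arr, 1)` for one-dimensional lists with
   /// lower bound 1: emits the subscripts 1..=len for each list.
   fn generate_subscripts(lists: &[Option<Vec<i64>>]) -> (Vec<i64>, Vec<usize>) {
       let mut values = Vec::new();
       let mut sizes = Vec::with_capacity(lists.len());
       for list in lists {
           let len = list.as_ref().map_or(0, |l| l.len());
           sizes.push(len);
           values.extend(1..=len as i64);
       }
       (values, sizes)
   }
   ```

   Both functions return the same (values, section sizes) pair, so a single execution path can serve every set-returning function.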
   
   Please take a closer look at my PR. I am happy to improve it, rename some structures, etc.
   
   
   Below is the implementation of the `generate_series` function (a real PostgreSQL function):
   
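   ```rust
   // Imports omitted. `TableUDF`, `make_table_function` and the
   // `downcast_primitive_arg!` helper are defined elsewhere in this PR;
   // `Signature`, `TypeSignature`, `Volatility` and `ReturnTypeFunction`
   // come from DataFusion, and the array types come from Arrow.

   /// Generates the series for one numeric type. For each input row it appends
   /// start, start + step, ... (while <= end) to a single builder and records
   /// how many values that row produced in `section_sizes`.
   macro_rules! generate_series_udtf {
       ($ARGS:expr, $TYPE: ident, $PRIMITIVE_TYPE: ident) => {{
           if let Some(l_arr) = $ARGS[0].as_any().downcast_ref::<PrimitiveArray<$TYPE>>() {
               let r_arr = downcast_primitive_arg!($ARGS[1], "right", $TYPE);
               // Default step of 1 when the optional third argument is absent.
               let default_step = PrimitiveArray::<$TYPE>::from_value(1 as $PRIMITIVE_TYPE, 1);
               let step_arr = if $ARGS.len() > 2 {
                   downcast_primitive_arg!($ARGS[2], "step", $TYPE)
               } else {
                   &default_step
               };

               let mut section_sizes: Vec<usize> = Vec::with_capacity(l_arr.len());
               let mut builder = PrimitiveBuilder::<$TYPE>::new(l_arr.len());
               for (i, (start, end)) in l_arr.iter().zip(r_arr.iter()).enumerate() {
                   // A step array shorter than the input (e.g. the default)
                   // reuses its first value for every row.
                   let step = if step_arr.len() > i {
                       step_arr.value(i)
                   } else {
                       step_arr.value(0)
                   };

                   // NULL bounds produce no rows instead of panicking on unwrap().
                   let (start, end) = match (start, end) {
                       (Some(start), Some(end)) => (start, end),
                       _ => {
                           section_sizes.push(0);
                           continue;
                       }
                   };

                   // Only ascending series (positive step) are generated here;
                   // PostgreSQL also accepts a negative step.
                   let mut section_size: usize = 0;
                   if step > 0 as $PRIMITIVE_TYPE {
                       let mut current = start;
                       while current <= end {
                           builder.append_value(current).unwrap();
                           section_size += 1;
                           current += step;
                       }
                   }
                   section_sizes.push(section_size);
               }

               return Ok((Arc::new(builder.finish()) as ArrayRef, section_sizes));
           }
       }};
   }

   pub fn create_generate_series_udtf() -> TableUDF {
       let fun = make_table_function(move |args: &[ArrayRef]| {
           // The signature below only admits two or three arguments.
           assert!(args.len() == 2 || args.len() == 3);

           // Dispatch on the type of the first argument; each macro
           // invocation returns early on success.
           if args[0].as_any().downcast_ref::<Int64Array>().is_some() {
               generate_series_udtf!(args, Int64Type, i64)
           } else if args[0].as_any().downcast_ref::<Float64Array>().is_some() {
               generate_series_udtf!(args, Float64Type, f64)
           }

           Err(DataFusionError::Execution(format!(
               "Unsupported type for generate_series: {:?}",
               args[0].data_type()
           )))
       });

       // The output type follows the first argument (Int64 or Float64).
       let return_type: ReturnTypeFunction = Arc::new(move |tp| {
           if !tp.is_empty() {
               Ok(Arc::new(tp[0].clone()))
           } else {
               Ok(Arc::new(DataType::Int64))
           }
       });

       TableUDF::new(
           "generate_series",
           &Signature::one_of(
               vec![
                   TypeSignature::Exact(vec![DataType::Int64, DataType::Int64]),
                   TypeSignature::Exact(vec![DataType::Int64, DataType::Int64, DataType::Int64]),
                   TypeSignature::Exact(vec![DataType::Float64, DataType::Float64]),
                   TypeSignature::Exact(vec![
                       DataType::Float64,
                       DataType::Float64,
                       DataType::Float64,
                   ]),
               ],
               Volatility::Immutable,
           ),
           &return_type,
           &fun,
       )
   }
   ```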
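   The `section_sizes` vector returned next to the values says how many output rows each input row produced. One way a caller could use it (a sketch, not necessarily what this PR does) is to expand it into row indices and gather the other input columns with Arrow's `take` kernel, so every generated value lines up with the row it came from. The helper name `section_sizes_to_indices` is hypothetical.

   ```rust
   /// Expands per-input-row section sizes into row indices: input row `i`
   /// is repeated `section_sizes[i]` times, so a row that produced no
   /// output disappears from the result.
   fn section_sizes_to_indices(section_sizes: &[usize]) -> Vec<u32> {
       section_sizes
           .iter()
           .enumerate()
           .flat_map(|(row, &size)| std::iter::repeat(row as u32).take(size))
           .collect()
   }
   ```

   For section sizes `[3, 0, 1]` this gives `[0, 0, 0, 2]`: input row 0 is repeated three times, row 1 is dropped, and row 2 appears once.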


-- 
This is an automated message from the Apache Git Service.