Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/25 05:41:12 UTC

[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2330: [Question] how to add expr inline

Ted-Jiang opened a new issue, #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330

   **Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
   Hi, I have run into a problem. I want to implement an `inline` expression, which is defined as follows:
   
   inline(expr) - Explodes an array of structs into a table:
   ```
   > SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
    1  a
    2  b
   ```
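   
   To make the intended semantics concrete, here is a minimal sketch in plain Rust, with no Arrow or DataFusion types involved. The `Row` struct and the free function `inline` are hypothetical stand-ins for illustration only; the field names `col1` and `col2` follow the default column names mentioned in the Spark description quoted further down.
   
   ```rust
   // Hypothetical stand-in for one struct element, e.g. struct(1, 'a').
   #[derive(Debug, Clone, PartialEq)]
   struct Row {
       col1: i64,
       col2: String,
   }
   
   // Explode an array of structs into one output column per struct field.
   // Each element of the input array becomes one output row.
   fn inline(array: &[Row]) -> (Vec<i64>, Vec<String>) {
       let col1 = array.iter().map(|r| r.col1).collect();
       let col2 = array.iter().map(|r| r.col2.clone()).collect();
       (col1, col2)
   }
   ```
   
   The point of the sketch is that a single input value fans out into several output columns, which is what distinguishes `inline` from an ordinary scalar function.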
   
   I have two possible solutions:
   1. Add `inline` as a `ScalarFunction`. But I found:
   ```
   pub type ScalarFunctionImplementation =
       Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;
   ```
   
   In this case, I cannot produce more than one column with a `ScalarFunction`, am I right?
   
   2. Implement an `Expr` like `Wildcard`.
   I don't think this is a good approach 😂
   
   Could anyone give me some advice?
   
   This is what I found in Spark. Is there something similar in DataFusion?
   ```
   /**
    * Explodes an array of structs into a table.
    */
   // scalastyle:off line.size.limit line.contains.tab
   @ExpressionDescription(
     usage = "_FUNC_(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.",
     examples = """
       Examples:
         > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
          1	a
          2	b
     """,
     since = "2.0.0",
     group = "generator_funcs")
   // scalastyle:on line.size.limit line.contains.tab
   case class Inline(child: Expression) extends UnaryExpression with CollectionGenerator {
     override val inline: Boolean = true
     override val position: Boolean = false
   
     override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
       case ArrayType(st: StructType, _) =>
         TypeCheckResult.TypeCheckSuccess
       case _ =>
         TypeCheckResult.TypeCheckFailure(
           s"input to function $prettyName should be array of struct type, " +
             s"not ${child.dataType.catalogString}")
     }
   
     override def elementSchema: StructType = child.dataType match {
       case ArrayType(st: StructType, _) => st
     }
   
     override def collectionType: DataType = child.dataType
   
     private lazy val numFields = elementSchema.fields.length
   
     override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
       val inputArray = child.eval(input).asInstanceOf[ArrayData]
       if (inputArray == null) {
         Nil
       } else {
         for (i <- 0 until inputArray.numElements())
           yield inputArray.getStruct(i, numFields)
       }
     }
   
     override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
       child.genCode(ctx)
     }
   
     override protected def withNewChildInternal(newChild: Expression): Inline = copy(child = newChild)
   }
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109792696

   @andygrove Another question: what if we add a `TableFunction` variant alongside `ScalarFunction`?
   ```rust
       /// Represents the call of a built-in scalar function with a set of arguments.
       ScalarFunction {
           /// The function
           fun: built_in_function::BuiltinScalarFunction,
           /// List of expressions to feed to the functions as arguments
           args: Vec<Expr>,
       },
       TableFunction {
           /// The function
           fun: built_in_function::BuiltinTableFunction,
           /// List of expressions to feed to the functions as arguments
           args: Vec<Expr>,
       },
   ```
   
   If we treat it as an `Expr`, we need to convert it to a `PhysicalExpr`, but
   ``` rust
   /// Evaluate an expression against a RecordBatch
       fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;
   ```
   
   ```rust
   pub enum ColumnarValue {
       /// Array of values
       Array(ArrayRef),
       /// A single value
       Scalar(ScalarValue),
   }
   ```
   Because it returns a `ColumnarValue`, we cannot return the result as a table. Am I right?
   
   Should I implement a `TablePhysicalExpr` using:
   ```rust
     fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>>;
   ```
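   
   A rough sketch of what such a trait could look like is shown below. Everything here is an assumption for illustration: `RecordBatch`, `ColumnarValue` and `Result` are simplified stand-ins rather than the real Arrow and DataFusion types, and `SelectColumns` is a toy implementation invented for this example.
   
   ```rust
   // Simplified stand-ins for illustration only; the real DataFusion and
   // Arrow types are richer than these.
   #[allow(dead_code)]
   #[derive(Debug, Clone, PartialEq)]
   enum ColumnarValue {
       Array(Vec<i64>),
       Scalar(i64),
   }
   
   struct RecordBatch {
       columns: Vec<Vec<i64>>,
   }
   
   type Result<T> = std::result::Result<T, String>;
   
   // Hypothetical trait from the question above: like `PhysicalExpr`, but
   // `evaluate` returns one `ColumnarValue` per output column.
   trait TablePhysicalExpr {
       fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>>;
   }
   
   // Toy implementation: returns the selected input columns as separate outputs.
   struct SelectColumns {
       indices: Vec<usize>,
   }
   
   impl TablePhysicalExpr for SelectColumns {
       fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>> {
           self.indices
               .iter()
               .map(|&i| {
                   batch
                       .columns
                       .get(i)
                       .cloned()
                       .map(ColumnarValue::Array)
                       .ok_or_else(|| format!("column index {} out of bounds", i))
               })
               .collect()
       }
   }
   ```
   
   Note that this sketch does not address how the planner would map the returned vector onto the output schema, which is the harder part of the question.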




[GitHub] [arrow-datafusion] gandronchik commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
gandronchik commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109797732

   @andygrove @Ted-Jiang I have already implemented it. I will create a PR today or tomorrow. Thanks.




[GitHub] [arrow-datafusion] andygrove commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109238122

   I think we will need to introduce a new type of function to cover `inline` since it is not a scalar function.
   
   I don't have a good name for it yet but I think we need something like this?
   
   ``` rust
   pub type MultiColumnProducingFunctionImplementation =
       Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;
   ```
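   
   As a minimal sketch of how a function with this shape might be defined and called, the example below uses a simplified `ColumnarValue` over `i64` and a `String` error type as stand-ins for the real DataFusion types. The `make_div_mod` function is hypothetical and exists only to show one call producing two output columns.
   
   ```rust
   use std::sync::Arc;
   
   // Simplified stand-in for DataFusion's `ColumnarValue` (illustration only).
   #[derive(Debug, Clone, PartialEq)]
   enum ColumnarValue {
       Array(Vec<i64>),
       Scalar(i64),
   }
   
   // Stand-in for DataFusion's `Result`, using a plain `String` error.
   type Result<T> = std::result::Result<T, String>;
   
   // Same shape as the proposed alias: many inputs in, many columns out.
   type MultiColumnProducingFunctionImplementation =
       Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;
   
   // Hypothetical example function: given an array and a scalar divisor,
   // produce two columns, the quotients and the remainders.
   fn make_div_mod() -> MultiColumnProducingFunctionImplementation {
       Arc::new(|args: &[ColumnarValue]| match args {
           [ColumnarValue::Array(values), ColumnarValue::Scalar(d)] if *d != 0 => {
               let quotients = values.iter().map(|v| v / d).collect();
               let remainders = values.iter().map(|v| v % d).collect();
               Ok(vec![
                   ColumnarValue::Array(quotients),
                   ColumnarValue::Array(remainders),
               ])
           }
           _ => Err("expected (array, non-zero scalar)".to_string()),
       })
   }
   ```
   
   Calling `make_div_mod()` with the array `[7, 8, 9]` and the scalar `4` yields two arrays, `[1, 2, 2]` and `[3, 0, 1]`, which a planner would then have to expose as two separate output columns.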




[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109315229

   > I think we will need to introduce a new type of function to cover `inline` since it is not a scalar function.
   > 
   > I don't have a good name for it yet but I think we need something like this?
   > 
   > ```rust
   > pub type MultiColumnProducingFunctionImplementation =
   >     Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;
   > ```
   
   Thanks ❤️ @andygrove, I agree we need a function that returns `Vec<ColumnarValue>`. I will work on this.




[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109889647

   @gandronchik Thanks a lot! ❤️




[GitHub] [arrow-datafusion] alamb commented on issue #2330: [Question] how to add expr inline

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1111365450

   As @gandronchik implies, this type of `expr` that yields a table is typically called a `TableFunction` -- he has an implementation at https://github.com/apache/arrow-datafusion/pull/2177

