Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/04/25 05:41:12 UTC
[GitHub] [arrow-datafusion] Ted-Jiang opened a new issue, #2330: [Question] how to add expr inline
Ted-Jiang opened a new issue, #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330
**Is your feature request related to a problem or challenge? Please describe what you are trying to do.**
Hi, I have run into a problem. I want to implement an expr `inline`, which works as follows:
`inline(expr)` - Explodes an array of structs into a table:
```
> SELECT inline(array(struct(1, 'a'), struct(2, 'b')));
1 a
2 b
```
I have two possible solutions:
1. Add `inline` as a `ScalarFunction`. But I found:
```rust
pub type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;
```
In this case, I cannot produce more than one column by using a `ScalarFunction`, am I right?
2. Implement an `Expr` like `Wildcard`.
I think this is not a good way 😂
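The one-column limitation can be seen in a minimal sketch. The `ColumnarValue` and `Result` types below are simplified stand-ins written for this example, not the real DataFusion types, and `add_one` is a hypothetical function; only the shape of the `ScalarFunctionImplementation` alias is taken from the snippet above.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only (NOT the DataFusion types).
#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    /// A column of values
    Array(Vec<i64>),
    /// A single value
    Scalar(i64),
}

pub type Result<T> = std::result::Result<T, String>;

// Same shape as the alias quoted above: many inputs, exactly one output value.
pub type ScalarFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<ColumnarValue> + Send + Sync>;

/// Hypothetical scalar function that adds 1 to its single argument.
/// Whatever it computes, it can only hand back ONE `ColumnarValue`,
/// which is one output column (or one scalar).
pub fn add_one() -> ScalarFunctionImplementation {
    Arc::new(|args: &[ColumnarValue]| match args {
        [ColumnarValue::Array(values)] => {
            Ok(ColumnarValue::Array(values.iter().map(|v| v + 1).collect()))
        }
        [ColumnarValue::Scalar(v)] => Ok(ColumnarValue::Scalar(v + 1)),
        _ => Err("add_one expects exactly one argument".to_string()),
    })
}
```

Because the return type is a single `ColumnarValue`, a function such as `inline`, which needs one output column per struct field, does not fit this signature.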
Could anyone give me some advice?
I found the following in Spark. Is there something similar?
```scala
/**
 * Explodes an array of structs into a table.
 */
// scalastyle:off line.size.limit line.contains.tab
@ExpressionDescription(
  usage = "_FUNC_(expr) - Explodes an array of structs into a table. Uses column names col1, col2, etc. by default unless specified otherwise.",
  examples = """
    Examples:
      > SELECT _FUNC_(array(struct(1, 'a'), struct(2, 'b')));
       1  a
       2  b
  """,
  since = "2.0.0",
  group = "generator_funcs")
// scalastyle:on line.size.limit line.contains.tab
case class Inline(child: Expression) extends UnaryExpression with CollectionGenerator {
  override val inline: Boolean = true
  override val position: Boolean = false

  override def checkInputDataTypes(): TypeCheckResult = child.dataType match {
    case ArrayType(st: StructType, _) =>
      TypeCheckResult.TypeCheckSuccess
    case _ =>
      TypeCheckResult.TypeCheckFailure(
        s"input to function $prettyName should be array of struct type, " +
          s"not ${child.dataType.catalogString}")
  }

  override def elementSchema: StructType = child.dataType match {
    case ArrayType(st: StructType, _) => st
  }

  override def collectionType: DataType = child.dataType

  private lazy val numFields = elementSchema.fields.length

  override def eval(input: InternalRow): TraversableOnce[InternalRow] = {
    val inputArray = child.eval(input).asInstanceOf[ArrayData]
    if (inputArray == null) {
      Nil
    } else {
      for (i <- 0 until inputArray.numElements())
        yield inputArray.getStruct(i, numFields)
    }
  }

  override protected def doGenCode(ctx: CodegenContext, ev: ExprCode): ExprCode = {
    child.genCode(ctx)
  }

  override protected def withNewChildInternal(newChild: Expression): Inline = copy(child = newChild)
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109792696
@andygrove Another question: suppose we add a `TableFunction` variant alongside `ScalarFunction`:
```rust
/// Represents the call of a built-in scalar function with a set of arguments.
ScalarFunction {
    /// The function
    fun: built_in_function::BuiltinScalarFunction,
    /// List of expressions to feed to the functions as arguments
    args: Vec<Expr>,
},
TableFunction {
    /// The function
    fun: built_in_function::BuiltinTableFunction,
    /// List of expressions to feed to the functions as arguments
    args: Vec<Expr>,
},
```
If we treat it as an `Expr`, we need to convert it to a `PhysicalExpr`, but
``` rust
/// Evaluate an expression against a RecordBatch
fn evaluate(&self, batch: &RecordBatch) -> Result<ColumnarValue>;
```
```rust
pub enum ColumnarValue {
    /// Array of values
    Array(ArrayRef),
    /// A single value
    Scalar(ScalarValue),
}
```
Because it returns a `ColumnarValue`, we cannot return the result as a table, am I right?
Should I implement a `TablePhysicalExpr`
using
```rust
fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>>;
```
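To make the idea concrete, here is a minimal sketch of what such a trait could look like. Everything in it is an assumption for illustration: `RecordBatch` and `ColumnarValue` are simplified stand-ins rather than the Arrow/DataFusion types, `TablePhysicalExpr` is only the name proposed above, and `QuotRem` is a hypothetical expression chosen because it naturally yields two output columns.

```rust
// Simplified stand-ins for illustration only (NOT the Arrow/DataFusion types).
pub struct RecordBatch {
    pub columns: Vec<Vec<i64>>,
}

#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Array(Vec<i64>),
    Scalar(i64),
}

pub type Result<T> = std::result::Result<T, String>;

/// Hypothetical trait: like `PhysicalExpr::evaluate`, but it may return
/// several output columns instead of exactly one.
pub trait TablePhysicalExpr {
    fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>>;
}

/// Hypothetical expression producing two columns (quotient and remainder)
/// from one input column.
pub struct QuotRem {
    pub column: usize,
    pub divisor: i64,
}

impl TablePhysicalExpr for QuotRem {
    fn evaluate(&self, batch: &RecordBatch) -> Result<Vec<ColumnarValue>> {
        if self.divisor == 0 {
            return Err("division by zero".to_string());
        }
        let input = batch
            .columns
            .get(self.column)
            .ok_or_else(|| format!("no column at index {}", self.column))?;
        // Wrapping arithmetic avoids a panic on i64::MIN / -1.
        let quotients: Vec<i64> = input.iter().map(|v| v.wrapping_div(self.divisor)).collect();
        let remainders: Vec<i64> = input.iter().map(|v| v.wrapping_rem(self.divisor)).collect();
        Ok(vec![
            ColumnarValue::Array(quotients),
            ColumnarValue::Array(remainders),
        ])
    }
}
```

In this sketch, evaluating `QuotRem { column: 0, divisor: 4 }` against a batch yields two `ColumnarValue::Array` entries, one per output column.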
[GitHub] [arrow-datafusion] gandronchik commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
gandronchik commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109797732
@andygrove @Ted-Jiang I have already implemented it. I will create a PR today or tomorrow. Thanks
[GitHub] [arrow-datafusion] andygrove commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
andygrove commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109238122
I think we will need to introduce a new type of function to cover `inline` since it is not a scalar function.
I don't have a good name for it yet but I think we need something like this?
``` rust
pub type MultiColumnProducingFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;
```
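As a rough illustration of how `inline` could fit a signature of that shape, the sketch below transposes an array of structs into one output column per struct field. The `ScalarValue` and `ColumnarValue` types are simplified stand-ins invented for this example (not the DataFusion types), `inline_impl` and `make_inline` are hypothetical names, and the sketch only handles a single literal argument rather than a whole input column.

```rust
use std::sync::Arc;

// Simplified stand-ins for illustration only (NOT the DataFusion types).
#[derive(Debug, Clone, PartialEq)]
pub enum ScalarValue {
    Int64(i64),
    Utf8(String),
    /// The field values of one struct
    Struct(Vec<ScalarValue>),
    /// The elements of one array
    List(Vec<ScalarValue>),
}

#[derive(Debug, Clone, PartialEq)]
pub enum ColumnarValue {
    Array(Vec<ScalarValue>),
    Scalar(ScalarValue),
}

pub type Result<T> = std::result::Result<T, String>;

// Same shape as the alias proposed above: the output is a list of columns.
pub type MultiColumnProducingFunctionImplementation =
    Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;

/// Hypothetical `inline`: one output column per struct field,
/// one output row per array element.
pub fn inline_impl(args: &[ColumnarValue]) -> Result<Vec<ColumnarValue>> {
    let structs = match args {
        [ColumnarValue::Scalar(ScalarValue::List(items))] => items,
        _ => return Err("inline expects a single array-of-structs argument".to_string()),
    };
    // The first element decides how many output columns there are.
    let width = match structs.first() {
        Some(ScalarValue::Struct(fields)) => fields.len(),
        _ => 0,
    };
    let mut columns: Vec<Vec<ScalarValue>> = vec![Vec::new(); width];
    for item in structs {
        match item {
            ScalarValue::Struct(fields) if fields.len() == width => {
                for (column, field) in columns.iter_mut().zip(fields) {
                    column.push(field.clone());
                }
            }
            other => {
                return Err(format!("expected a struct with {} fields, got {:?}", width, other))
            }
        }
    }
    Ok(columns.into_iter().map(ColumnarValue::Array).collect())
}

pub fn make_inline() -> MultiColumnProducingFunctionImplementation {
    Arc::new(inline_impl)
}
```

Under these assumptions, passing the equivalent of `array(struct(1, 'a'), struct(2, 'b'))` returns two columns, `[1, 2]` and `['a', 'b']`, matching the table in the original question. A real implementation would also have to handle array-valued inputs, where each input row can expand to a different number of output rows.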
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109315229
> I think we will need to introduce a new type of function to cover `inline` since it is not a scalar function.
>
> I don't have a good name for it yet but I think we need something like this?
>
> ```rust
> pub type MultiColumnProducingFunctionImplementation =
> Arc<dyn Fn(&[ColumnarValue]) -> Result<Vec<ColumnarValue>> + Send + Sync>;
> ```
Thanks ❤️ @andygrove, I agree we need a function that returns `Vec<ColumnarValue>`. I will work on this.
[GitHub] [arrow-datafusion] Ted-Jiang commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
Ted-Jiang commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1109889647
@gandronchik Thanks a lot !❤️
[GitHub] [arrow-datafusion] alamb commented on issue #2330: [Question] how to add expr inline
Posted by GitBox <gi...@apache.org>.
alamb commented on issue #2330:
URL: https://github.com/apache/arrow-datafusion/issues/2330#issuecomment-1111365450
As @gandronchik implies, this type of `expr` that yields a table is typically called a `TableFunction` -- he has an implementation at https://github.com/apache/arrow-datafusion/pull/2177