You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by "lidavidm (via GitHub)" <gi...@apache.org> on 2023/06/02 13:40:02 UTC

[GitHub] [arrow] lidavidm commented on a diff in pull request #35570: GH-34252: [Java] Support ScannerBuilder::Project or ScannerBuilder::Filter as a Substrait proto extended expression

lidavidm commented on code in PR #35570:
URL: https://github.com/apache/arrow/pull/35570#discussion_r1214373161


##########
docs/source/java/dataset.rst:
##########
@@ -158,6 +156,21 @@ Or use shortcut construtor:
 
 Then all columns will be emitted during scanning.
 
+Projection (Produce New Columns) and Filters
+============================================
+
+User can specify projections (new columns) or filters in ScanOptions. For example:
+
+.. code-block:: Java
+
+   ByteBuffer substraitExtendedExpressions = ...; // createExtendedExpresionMessageUsingSubstraitPOJOClasses

Review Comment:
   remove the comment? Or write it out as a sentence on a new line (`// Use Substrait APIs to create an Expression and serialize to a ByteBuffer`)



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.

Review Comment:
   ```suggestion
   Dataset also supports projections and filters with Substrait's extended expressions.
   This requires the substrait-java library.
   ```



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.
+
+.. code-block:: Java
+
+    import java.nio.ByteBuffer;
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Base64;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Optional;
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+
+    import com.google.protobuf.InvalidProtocolBufferException;
+    import com.google.protobuf.util.JsonFormat;
+
+    import io.substrait.proto.Expression;
+    import io.substrait.proto.ExpressionReference;
+    import io.substrait.proto.ExtendedExpression;
+    import io.substrait.proto.FunctionArgument;
+    import io.substrait.proto.SimpleExtensionDeclaration;
+    import io.substrait.proto.SimpleExtensionURI;
+    import io.substrait.type.NamedStruct;
+    import io.substrait.type.Type;
+    import io.substrait.type.TypeCreator;
+    import io.substrait.type.proto.TypeProtoConverter;
+
+    public class ClientSubstraitExtendedExpressions {
+      public static void main(String[] args) throws Exception {
+        // create extended expression for: project two new columns + one filter
+        String binaryExtendedExpressions = createExtendedExpresionMessageUsingPOJOClasses();

Review Comment:
   Um, why use a String? Just pass it around as a ByteBuffer in the first place.



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.
+
+.. code-block:: Java
+
+    import java.nio.ByteBuffer;
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Base64;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Optional;
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+
+    import com.google.protobuf.InvalidProtocolBufferException;
+    import com.google.protobuf.util.JsonFormat;
+
+    import io.substrait.proto.Expression;
+    import io.substrait.proto.ExpressionReference;
+    import io.substrait.proto.ExtendedExpression;
+    import io.substrait.proto.FunctionArgument;
+    import io.substrait.proto.SimpleExtensionDeclaration;
+    import io.substrait.proto.SimpleExtensionURI;
+    import io.substrait.type.NamedStruct;
+    import io.substrait.type.Type;
+    import io.substrait.type.TypeCreator;
+    import io.substrait.type.proto.TypeProtoConverter;
+
+    public class ClientSubstraitExtendedExpressions {
+      public static void main(String[] args) throws Exception {
+        // create extended expression for: project two new columns + one filter
+        String binaryExtendedExpressions = createExtendedExpresionMessageUsingPOJOClasses();
+        // project and filter dataset using extended expression definition - 03 Expressions:
+        // Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3
+        // Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10
+        // Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18
+        projectAndFilterDataset(binaryExtendedExpressions);
+      }
+
+      public static void projectAndFilterDataset(String binaryExtendedExpressions) {
+        String uri = "file:///data/tpch_parquet/nation.parquet";
+        byte[] extendedExpressions = Base64.getDecoder().decode(
+            binaryExtendedExpressions);
+        ByteBuffer substraitExtendedExpressions = ByteBuffer.allocateDirect(
+            extendedExpressions.length);
+        substraitExtendedExpressions.put(extendedExpressions);
+        ScanOptions options = new ScanOptions(/*batchSize*/ 32768,
+            Optional.empty(),
+            Optional.of(substraitExtendedExpressions));
+        try (
+            BufferAllocator allocator = new RootAllocator();
+            DatasetFactory datasetFactory = new FileSystemDatasetFactory(
+                allocator, NativeMemoryPool.getDefault(),
+                FileFormat.PARQUET, uri);
+            Dataset dataset = datasetFactory.finish();
+            Scanner scanner = dataset.newScan(options);
+            ArrowReader reader = scanner.scanBatches()
+        ) {
+          while (reader.loadNextBatch()) {
+            System.out.println(
+                reader.getVectorSchemaRoot().contentToTSVString());
+          }
+        } catch (Exception e) {
+          e.printStackTrace();
+        }
+      }
+
+      private static String createExtendedExpresionMessageUsingPOJOClasses() throws InvalidProtocolBufferException {
+        // Expression: N_REGIONKEY + 10 = col 3 + 10
+        Expression.Builder selectionBuilderProjectOne = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    2)
+                            )
+                    )
+            );
+        Expression.Builder literalBuilderProjectOne = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setI32(10)
+            );
+        io.substrait.proto.Type outputProjectOne = TypeCreator.NULLABLE.I32.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderProjectOne = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(0).
+                    setOutputType(outputProjectOne).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectOne)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            literalBuilderProjectOne)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderProjectOne = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderProjectOne)
+            .addOutputNames("ADD_TEN_TO_COLUMN_N_REGIONKEY");
+
+        // Expression: name || name = N_NAME || "-" || N_COMMENT = col 1 || col 3
+        Expression.Builder selectionBuilderProjectTwo = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    1)
+                            )
+                    )
+            );
+        Expression.Builder selectionBuilderProjectTwoConcatLiteral = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setString(" - ")
+            );
+        Expression.Builder selectionBuilderProjectOneToConcat = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    3)
+                            )
+                    )
+            );
+        io.substrait.proto.Type outputProjectTwo = TypeCreator.NULLABLE.STRING.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderProjectTwo = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(1).
+                    setOutputType(outputProjectTwo).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectTwo)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectTwoConcatLiteral)
+                    ).
+                    addArguments(
+                        2,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectOneToConcat)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderProjectTwo = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderProjectTwo)
+            .addOutputNames("CONCAT_COLUMNS_N_NAME_AND_N_COMMENT");
+
+        // Expression: Filter: N_NATIONKEY > 18 = col 1 > 18
+        Expression.Builder selectionBuilderFilterOne = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    0)
+                            )
+                    )
+            );
+        Expression.Builder literalBuilderFilterOne = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setI32(18)
+            );
+        io.substrait.proto.Type outputFilterOne = TypeCreator.NULLABLE.BOOLEAN.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderFilterOne = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(2).
+                    setOutputType(outputFilterOne).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderFilterOne)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            literalBuilderFilterOne)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderFilterOne = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderFilterOne)
+            .addOutputNames("COLUMN_N_NATIONKEY_GREATER_THAN_18");
+
+        List<String> columnNames = Arrays.asList("N_NATIONKEY", "N_NAME",
+            "N_REGIONKEY", "N_COMMENT");
+        List<Type> dataTypes = Arrays.asList(
+            TypeCreator.NULLABLE.I32,
+            TypeCreator.NULLABLE.STRING,
+            TypeCreator.NULLABLE.I32,
+            TypeCreator.NULLABLE.STRING
+        );
+        //
+        NamedStruct of = NamedStruct.of(
+            columnNames,
+            Type.Struct.builder().fields(dataTypes).nullable(false).build()
+        );
+
+        // Extensions URI
+        HashMap<String, SimpleExtensionURI> extensionUris = new HashMap<>();
+        extensionUris.put(
+            "key-001",
+            SimpleExtensionURI.newBuilder()
+                .setExtensionUriAnchor(1)
+                .setUri("/functions_arithmetic.yaml")
+                .build()
+        );
+        extensionUris.put(
+            "key-002",
+            SimpleExtensionURI.newBuilder()
+                .setExtensionUriAnchor(2)
+                .setUri("/functions_comparison.yaml")
+                .build()
+        );
+
+        // Extensions
+        ArrayList<SimpleExtensionDeclaration> extensions = new ArrayList<>();
+        SimpleExtensionDeclaration extensionFunctionAdd = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(0)
+                    .setName("add:i32_i32")
+                    .setExtensionUriReference(1))
+            .build();
+        SimpleExtensionDeclaration extensionFunctionGreaterThan = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(1)
+                    .setName("concat:vchar")
+                    .setExtensionUriReference(2))
+            .build();
+        SimpleExtensionDeclaration extensionFunctionLowerThan = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(2)
+                    .setName("gt:any_any")
+                    .setExtensionUriReference(2))
+            .build();
+        extensions.add(extensionFunctionAdd);
+        extensions.add(extensionFunctionGreaterThan);
+        extensions.add(extensionFunctionLowerThan);
+
+        // Extended Expression
+        ExtendedExpression.Builder extendedExpressionBuilder =
+            ExtendedExpression.newBuilder().
+                addReferredExpr(0,
+                    expressionReferenceBuilderProjectOne).
+                addReferredExpr(1,
+                    expressionReferenceBuilderProjectTwo).
+                addReferredExpr(2,
+                    expressionReferenceBuilderFilterOne).
+                setBaseSchema(of.toProto());
+        extendedExpressionBuilder.addAllExtensionUris(extensionUris.values());
+        extendedExpressionBuilder.addAllExtensions(extensions);
+
+        ExtendedExpression extendedExpression = extendedExpressionBuilder.build();

Review Comment:
   How stable is this? If the format is likely to change then I wonder if it's worth having so much code that might get stale quickly



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.
+
+.. code-block:: Java
+
+    import java.nio.ByteBuffer;
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Base64;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Optional;
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+
+    import com.google.protobuf.InvalidProtocolBufferException;
+    import com.google.protobuf.util.JsonFormat;
+
+    import io.substrait.proto.Expression;
+    import io.substrait.proto.ExpressionReference;
+    import io.substrait.proto.ExtendedExpression;
+    import io.substrait.proto.FunctionArgument;
+    import io.substrait.proto.SimpleExtensionDeclaration;
+    import io.substrait.proto.SimpleExtensionURI;
+    import io.substrait.type.NamedStruct;
+    import io.substrait.type.Type;
+    import io.substrait.type.TypeCreator;
+    import io.substrait.type.proto.TypeProtoConverter;
+
+    public class ClientSubstraitExtendedExpressions {
+      public static void main(String[] args) throws Exception {
+        // create extended expression for: project two new columns + one filter
+        String binaryExtendedExpressions = createExtendedExpresionMessageUsingPOJOClasses();
+        // project and filter dataset using extended expression definition - 03 Expressions:
+        // Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3
+        // Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10
+        // Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18
+        projectAndFilterDataset(binaryExtendedExpressions);
+      }
+
+      public static void projectAndFilterDataset(String binaryExtendedExpressions) {
+        String uri = "file:///data/tpch_parquet/nation.parquet";
+        byte[] extendedExpressions = Base64.getDecoder().decode(
+            binaryExtendedExpressions);
+        ByteBuffer substraitExtendedExpressions = ByteBuffer.allocateDirect(
+            extendedExpressions.length);
+        substraitExtendedExpressions.put(extendedExpressions);
+        ScanOptions options = new ScanOptions(/*batchSize*/ 32768,
+            Optional.empty(),
+            Optional.of(substraitExtendedExpressions));
+        try (
+            BufferAllocator allocator = new RootAllocator();
+            DatasetFactory datasetFactory = new FileSystemDatasetFactory(
+                allocator, NativeMemoryPool.getDefault(),
+                FileFormat.PARQUET, uri);
+            Dataset dataset = datasetFactory.finish();
+            Scanner scanner = dataset.newScan(options);
+            ArrowReader reader = scanner.scanBatches()
+        ) {
+          while (reader.loadNextBatch()) {
+            System.out.println(
+                reader.getVectorSchemaRoot().contentToTSVString());
+          }
+        } catch (Exception e) {
+          e.printStackTrace();
+        }
+      }
+
+      private static String createExtendedExpresionMessageUsingPOJOClasses() throws InvalidProtocolBufferException {
+        // Expression: N_REGIONKEY + 10 = col 3 + 10
+        Expression.Builder selectionBuilderProjectOne = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    2)
+                            )
+                    )
+            );
+        Expression.Builder literalBuilderProjectOne = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setI32(10)
+            );
+        io.substrait.proto.Type outputProjectOne = TypeCreator.NULLABLE.I32.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderProjectOne = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(0).
+                    setOutputType(outputProjectOne).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectOne)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            literalBuilderProjectOne)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderProjectOne = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderProjectOne)
+            .addOutputNames("ADD_TEN_TO_COLUMN_N_REGIONKEY");
+
+        // Expression: name || name = N_NAME || "-" || N_COMMENT = col 1 || col 3
+        Expression.Builder selectionBuilderProjectTwo = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    1)
+                            )
+                    )
+            );
+        Expression.Builder selectionBuilderProjectTwoConcatLiteral = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setString(" - ")
+            );
+        Expression.Builder selectionBuilderProjectOneToConcat = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    3)
+                            )
+                    )
+            );
+        io.substrait.proto.Type outputProjectTwo = TypeCreator.NULLABLE.STRING.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderProjectTwo = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(1).
+                    setOutputType(outputProjectTwo).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectTwo)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectTwoConcatLiteral)
+                    ).
+                    addArguments(
+                        2,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderProjectOneToConcat)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderProjectTwo = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderProjectTwo)
+            .addOutputNames("CONCAT_COLUMNS_N_NAME_AND_N_COMMENT");
+
+        // Expression: Filter: N_NATIONKEY > 18 = col 1 > 18
+        Expression.Builder selectionBuilderFilterOne = Expression.newBuilder().
+            setSelection(
+                Expression.FieldReference.newBuilder().
+                    setDirectReference(
+                        Expression.ReferenceSegment.newBuilder().
+                            setStructField(
+                                Expression.ReferenceSegment.StructField.newBuilder().setField(
+                                    0)
+                            )
+                    )
+            );
+        Expression.Builder literalBuilderFilterOne = Expression.newBuilder()
+            .setLiteral(
+                Expression.Literal.newBuilder().setI32(18)
+            );
+        io.substrait.proto.Type outputFilterOne = TypeCreator.NULLABLE.BOOLEAN.accept(
+            new TypeProtoConverter());
+        Expression.Builder expressionBuilderFilterOne = Expression.
+            newBuilder().
+            setScalarFunction(
+                Expression.
+                    ScalarFunction.
+                    newBuilder().
+                    setFunctionReference(2).
+                    setOutputType(outputFilterOne).
+                    addArguments(
+                        0,
+                        FunctionArgument.newBuilder().setValue(
+                            selectionBuilderFilterOne)
+                    ).
+                    addArguments(
+                        1,
+                        FunctionArgument.newBuilder().setValue(
+                            literalBuilderFilterOne)
+                    )
+            );
+        ExpressionReference.Builder expressionReferenceBuilderFilterOne = ExpressionReference.newBuilder().
+            setExpression(expressionBuilderFilterOne)
+            .addOutputNames("COLUMN_N_NATIONKEY_GREATER_THAN_18");
+
+        List<String> columnNames = Arrays.asList("N_NATIONKEY", "N_NAME",
+            "N_REGIONKEY", "N_COMMENT");
+        List<Type> dataTypes = Arrays.asList(
+            TypeCreator.NULLABLE.I32,
+            TypeCreator.NULLABLE.STRING,
+            TypeCreator.NULLABLE.I32,
+            TypeCreator.NULLABLE.STRING
+        );
+        //
+        NamedStruct of = NamedStruct.of(
+            columnNames,
+            Type.Struct.builder().fields(dataTypes).nullable(false).build()
+        );
+
+        // Extensions URI
+        HashMap<String, SimpleExtensionURI> extensionUris = new HashMap<>();
+        extensionUris.put(
+            "key-001",
+            SimpleExtensionURI.newBuilder()
+                .setExtensionUriAnchor(1)
+                .setUri("/functions_arithmetic.yaml")
+                .build()
+        );
+        extensionUris.put(
+            "key-002",
+            SimpleExtensionURI.newBuilder()
+                .setExtensionUriAnchor(2)
+                .setUri("/functions_comparison.yaml")
+                .build()
+        );
+
+        // Extensions
+        ArrayList<SimpleExtensionDeclaration> extensions = new ArrayList<>();
+        SimpleExtensionDeclaration extensionFunctionAdd = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(0)
+                    .setName("add:i32_i32")
+                    .setExtensionUriReference(1))
+            .build();
+        SimpleExtensionDeclaration extensionFunctionGreaterThan = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(1)
+                    .setName("concat:vchar")
+                    .setExtensionUriReference(2))
+            .build();
+        SimpleExtensionDeclaration extensionFunctionLowerThan = SimpleExtensionDeclaration.newBuilder()
+            .setExtensionFunction(
+                SimpleExtensionDeclaration.ExtensionFunction.newBuilder()
+                    .setFunctionAnchor(2)
+                    .setName("gt:any_any")
+                    .setExtensionUriReference(2))
+            .build();
+        extensions.add(extensionFunctionAdd);
+        extensions.add(extensionFunctionGreaterThan);
+        extensions.add(extensionFunctionLowerThan);
+
+        // Extended Expression
+        ExtendedExpression.Builder extendedExpressionBuilder =
+            ExtendedExpression.newBuilder().
+                addReferredExpr(0,
+                    expressionReferenceBuilderProjectOne).
+                addReferredExpr(1,
+                    expressionReferenceBuilderProjectTwo).
+                addReferredExpr(2,
+                    expressionReferenceBuilderFilterOne).
+                setBaseSchema(of.toProto());
+        extendedExpressionBuilder.addAllExtensionUris(extensionUris.values());
+        extendedExpressionBuilder.addAllExtensions(extensions);
+
+        ExtendedExpression extendedExpression = extendedExpressionBuilder.build();
+
+        // Print JSON
+        System.out.println(
+            JsonFormat.printer().includingDefaultValueFields().print(
+                extendedExpression));
+        // Print binary representation
+        System.out.println(Base64.getEncoder().encodeToString(
+            extendedExpression.toByteArray()));

Review Comment:
   ...there's not really much point printing this



##########
java/dataset/src/main/cpp/jni_wrapper.cc:
##########
@@ -696,3 +722,44 @@ JNIEXPORT void JNICALL
   JniAssertOkOrThrow(arrow::ExportRecordBatchReader(reader_out, arrow_stream_out));
   JNI_METHOD_END()
 }
+
+/*
+ * Class:     org_apache_arrow_dataset_substrait_JniWrapper
+ * Method:    executeDeserializeExpressions
+ * Signature: (Ljava/nio/ByteBuffer;)[Ljava/lang/String;
+ */
+JNIEXPORT jobjectArray JNICALL
+    Java_org_apache_arrow_dataset_substrait_JniWrapper_executeDeserializeExpressions (
+    JNIEnv* env, jobject, jobject expression) {
+  JNI_METHOD_START
+  auto *buff = reinterpret_cast<jbyte*>(env->GetDirectBufferAddress(expression));
+  int length = env->GetDirectBufferCapacity(expression);
+  std::shared_ptr<arrow::Buffer> buffer = JniGetOrThrow(arrow::AllocateBuffer(length));
+  std::memcpy(buffer->mutable_data(), buff, length);
+  // execute expression
+      arrow::engine::BoundExpressions round_tripped =
+    JniGetOrThrow(arrow::engine::DeserializeExpressions(*buffer));
+  // validate is not empty!
+  // create response
+  int totalExpression = round_tripped.named_expressions.size();
+  jobjectArray extendedExpressionOutput = (jobjectArray)env->NewObjectArray(totalExpression*2,env->FindClass("java/lang/String"),0);
+  int i; int j = 0;
+  for (i=0; i<totalExpression; i++) {
+    env->SetObjectArrayElement(
+      extendedExpressionOutput,
+      j++,
+      env->NewStringUTF(
+        round_tripped.named_expressions[i].name.c_str()
+      )
+    );
+    env->SetObjectArrayElement(
+      extendedExpressionOutput,
+      j++,
+      env->NewStringUTF(
+        round_tripped.named_expressions[i].expression.ToString().c_str()
+      )
+    );
+  }

Review Comment:
   ```suggestion
     jobjectArray extendedExpressionOutput = (jobjectArray)env->NewObjectArray(totalExpression*2,env->FindClass("java/lang/String"),0);
     int j = 0;
     for (const auto& expression : round_tripped.named_expressions) {
       env->SetObjectArrayElement(
         extendedExpressionOutput,
         j++,
         env->NewStringUTF(expression.name.c_str())
       );
       env->SetObjectArrayElement(
         extendedExpressionOutput,
         j++,
         env->NewStringUTF(expression.expression.ToString().c_str())
       );
     }
   ```



##########
java/dataset/src/test/java/org/apache/arrow/dataset/substrait/TestAceroSubstraitConsumer.java:
##########
@@ -204,4 +206,132 @@ public void testRunBinaryQueryNamedTableNation() throws Exception {
       }
     }
   }
+
+  @Test
+  public void testDeserializeExtendedExpressions() {
+    // Extended Expression 01 (`add` `2` to column `id`): id + 2
+    // Extended Expression 02 (`concatenate` column `name` || column `name`): name || name
+    // Extended Expression 03 (`filter` 'id' < 20): id < 20
+    // Extended expression result: [add_two_to_column_a, add(FieldPath(0), 2),
+    // concat_column_a_and_b, binary_join_element_wise(FieldPath(1), FieldPath(1), ""),
+    // filter_one, (FieldPath(0) < 20)]
+    String binaryExtendedExpressions = "Ch4IARIaL2Z1bmN0aW9uc19hcml0aG1ldGljLnlhbWwKHggCEhovZnVuY3Rpb25zX2NvbXBhcmlz" +
+        "b24ueWFtbBIRGg8IARoLYWRkOmkzMl9pMzISFBoSCAIQARoMY29uY2F0OnZjaGFyEhIaEAgCEAIaCmx0OmFueV9hbnkaMQoaGhgaBCoCEAE" +
+        "iCBoGEgQKAhIAIgYaBAoCKAIaE2FkZF90d29fdG9fY29sdW1uX2EaOwoiGiAIARoEYgIQASIKGggSBgoEEgIIASIKGggSBgoEEgIIARoVY2" +
+        "9uY2F0X2NvbHVtbl9hX2FuZF9iGjcKHBoaCAIaBAoCEAEiCBoGEgQKAhIAIgYaBAoCKBQaF2ZpbHRlcl9pZF9sb3dlcl90aGFuXzIwIhoKA" +
+        "klECgROQU1FEg4KBCoCEAEKBGICEAEYAg==";
+    // get binary plan
+    byte[] expression = Base64.getDecoder().decode(binaryExtendedExpressions);
+    ByteBuffer substraitExpression = ByteBuffer.allocateDirect(expression.length);
+    substraitExpression.put(expression);
+    // deserialize extended expression
+    List<String> extededExpressionList =
+        new AceroSubstraitConsumer(rootAllocator()).runDeserializeExpressions(substraitExpression);
+    assertEquals(3, extededExpressionList.size() / 2);
+    assertEquals("add_two_to_column_a", extededExpressionList.get(0));
+    assertEquals("add(FieldPath(0), 2)", extededExpressionList.get(1));
+    assertEquals("concat_column_a_and_b", extededExpressionList.get(2));
+    assertEquals("binary_join_element_wise(FieldPath(1), FieldPath(1), \"\")", extededExpressionList.get(3));
+    assertEquals("filter_id_lower_than_20", extededExpressionList.get(4));
+    assertEquals("(FieldPath(0) < 20)", extededExpressionList.get(5));
+  }

Review Comment:
   I don't think this test is useful. We're just testing the C++ code without any purpose.



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.
+
+.. code-block:: Java
+
+    import java.nio.ByteBuffer;
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Base64;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Optional;
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+
+    import com.google.protobuf.InvalidProtocolBufferException;
+    import com.google.protobuf.util.JsonFormat;
+
+    import io.substrait.proto.Expression;
+    import io.substrait.proto.ExpressionReference;
+    import io.substrait.proto.ExtendedExpression;
+    import io.substrait.proto.FunctionArgument;
+    import io.substrait.proto.SimpleExtensionDeclaration;
+    import io.substrait.proto.SimpleExtensionURI;
+    import io.substrait.type.NamedStruct;
+    import io.substrait.type.Type;
+    import io.substrait.type.TypeCreator;
+    import io.substrait.type.proto.TypeProtoConverter;
+
+    public class ClientSubstraitExtendedExpressions {
+      public static void main(String[] args) throws Exception {
+        // create extended expression for: project two new columns + one filter
+        String binaryExtendedExpressions = createExtendedExpresionMessageUsingPOJOClasses();
+        // project and filter dataset using extended expression definition - 03 Expressions:
+        // Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3
+        // Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10
+        // Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18

Review Comment:
   ```suggestion
   ```



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.
+
+.. code-block:: Java
+
+    import java.nio.ByteBuffer;
+    import java.util.ArrayList;
+    import java.util.Arrays;
+    import java.util.Base64;
+    import java.util.HashMap;
+    import java.util.List;
+    import java.util.Optional;
+
+    import org.apache.arrow.dataset.file.FileFormat;
+    import org.apache.arrow.dataset.file.FileSystemDatasetFactory;
+    import org.apache.arrow.dataset.jni.NativeMemoryPool;
+    import org.apache.arrow.dataset.scanner.ScanOptions;
+    import org.apache.arrow.dataset.scanner.Scanner;
+    import org.apache.arrow.dataset.source.Dataset;
+    import org.apache.arrow.dataset.source.DatasetFactory;
+    import org.apache.arrow.memory.BufferAllocator;
+    import org.apache.arrow.memory.RootAllocator;
+    import org.apache.arrow.vector.ipc.ArrowReader;
+
+    import com.google.protobuf.InvalidProtocolBufferException;
+    import com.google.protobuf.util.JsonFormat;
+
+    import io.substrait.proto.Expression;
+    import io.substrait.proto.ExpressionReference;
+    import io.substrait.proto.ExtendedExpression;
+    import io.substrait.proto.FunctionArgument;
+    import io.substrait.proto.SimpleExtensionDeclaration;
+    import io.substrait.proto.SimpleExtensionURI;
+    import io.substrait.type.NamedStruct;
+    import io.substrait.type.Type;
+    import io.substrait.type.TypeCreator;
+    import io.substrait.type.proto.TypeProtoConverter;
+
+    public class ClientSubstraitExtendedExpressions {
+      public static void main(String[] args) throws Exception {
+        // create extended expression for: project two new columns + one filter

Review Comment:
   ```suggestion
   ```



##########
docs/source/java/substrait.rst:
##########
@@ -102,6 +104,335 @@ Here is an example of a Java program that queries a Parquet file using Java Subs
     0	ALGERIA	0	 haggle. carefully final deposits detect slyly agai
     1	ARGENTINA	1	al foxes promise slyly according to the regular accounts. bold requests alon
 
+Executing Projections and Filters Using Extended Expressions
+============================================================
+
+Using `Extended Expression`_ we could leverage our current Dataset operations to
+also support Projections and Filters by. To gain access to Projections and Filters
+is needed to define that operations using current Extended Expression Java POJO
+classes defined into `Substrait Java`_ project.
+
+Here is an example of a Java program that queries a Parquet file to project new
+columns and also filter then based on Extended Expression definitions. This example
+show us:
+
+- Load TPCH parquet file Nation.parquet.
+- Produce new Projections and apply Filter into dataset using extended expression definition.
+    - Expression 01 - CONCAT: N_NAME || ' - ' || N_COMMENT = col 1 || ' - ' || col 3.
+    - Expression 02 - ADD: N_REGIONKEY + 10 = col 1 + 10.
+    - Expression 03 - FILTER: N_NATIONKEY > 18 = col 3 > 18.

Review Comment:
   ```suggestion
   This Java program:
   
   - Loads a Parquet file containing the "nation" table from the TPC-H benchmark.
   - Projects two new columns:
       - ``N_NAME || ' - ' || N_COMMENT``
       - ``N_REGIONKEY + 10``
   - Applies a filter: ``N_NATIONKEY > 18``
   ```



##########
java/dataset/src/main/java/org/apache/arrow/dataset/substrait/AceroSubstraitConsumer.java:
##########
@@ -90,6 +91,15 @@ public ArrowReader runQuery(ByteBuffer plan, Map<String, ArrowReader> namedTable
     return execute(plan, namedTables);
   }
 
+  public List<String> runDeserializeExpressions(ByteBuffer plan) {

Review Comment:
   Docstrings? I don't think we want these at all though? 



##########
java/dataset/src/test/java/org/apache/arrow/dataset/TestDataset.java:
##########
@@ -79,6 +79,7 @@ protected List<ArrowRecordBatch> collectTaskData(Scanner scan) {
       List<ArrowRecordBatch> batches = new ArrayList<>();
       while (reader.loadNextBatch()) {
         VectorSchemaRoot root = reader.getVectorSchemaRoot();
+        System.out.println(root.getSchema());

Review Comment:
   ```suggestion
   ```



##########
java/dataset/src/main/java/org/apache/arrow/dataset/jni/NativeDataset.java:
##########
@@ -36,12 +36,14 @@ public NativeDataset(NativeContext context, long datasetId) {
   }
 
   @Override
+  @SuppressWarnings("ArrayToString")

Review Comment:
   What are we suppressing here?



##########
java/dataset/src/main/java/org/apache/arrow/dataset/scanner/ScanOptions.java:
##########
@@ -56,10 +58,26 @@ public ScanOptions(long batchSize, Optional<String[]> columns) {
     Preconditions.checkNotNull(columns);
     this.batchSize = batchSize;
     this.columns = columns;
+    this.projectExpression = Optional.empty();
+  }
+
+  /**
+   * Constructor.
+   * @param batchSize Maximum row number of each returned {@link org.apache.arrow.vector.ipc.message.ArrowRecordBatch}
+   * @param columns (Optional) Projected columns. {@link Optional#empty()} for scanning all columns. Otherwise,
+   *                Only columns present in the Array will be scanned.
+   * @param projectExpression (Optional) Expressions to evaluate to produce columns
+   */
+  public ScanOptions(long batchSize, Optional<String[]> columns, Optional<ByteBuffer> projectExpression) {

Review Comment:
   We should just use a builder at this point...



##########
java/dataset/src/main/java/org/apache/arrow/dataset/jni/JniWrapper.java:
##########
@@ -67,14 +69,16 @@ private JniWrapper() {
   /**
    * Create Scanner from a Dataset and get the native pointer of the Dataset.
    * @param datasetId the native pointer of the arrow::dataset::Dataset instance.
-   * @param columns desired column names.
-   *                Columns not in this list will not be emitted when performing scan operation. Null equals
-   *                to "all columns".
+   * @param columnsSubset desired column names. Columns not in this list will not be emitted when performing scan
+   *                      operation. Null equals to "all columns".
+   * @param columnsToProduce Set expressions which will be evaluated to produce the materialized columns. Null equals
+   *                         to "no produce".

Review Comment:
   ```suggestion
      * @param columnsToProduce Expressions to materialize new columns (if desired).
   ```



##########
java/dataset/src/main/java/org/apache/arrow/dataset/substrait/JniWrapper.java:
##########
@@ -69,5 +69,8 @@ public native void executeSerializedPlan(String planInput, String[] mapTableToMe
    * @param memoryAddressOutput the memory address where RecordBatchReader is exported.
    */
   public native void executeSerializedPlan(ByteBuffer planInput, String[] mapTableToMemoryAddressInput,
-                                                      long memoryAddressOutput);
+                                           long memoryAddressOutput);
+
+  // add description

Review Comment:
   This needs to be done?



##########
java/dataset/src/main/cpp/jni_wrapper.cc:
##########
@@ -696,3 +722,44 @@ JNIEXPORT void JNICALL
   JniAssertOkOrThrow(arrow::ExportRecordBatchReader(reader_out, arrow_stream_out));
   JNI_METHOD_END()
 }
+
+/*
+ * Class:     org_apache_arrow_dataset_substrait_JniWrapper
+ * Method:    executeDeserializeExpressions
+ * Signature: (Ljava/nio/ByteBuffer;)[Ljava/lang/String;
+ */
+JNIEXPORT jobjectArray JNICALL
+    Java_org_apache_arrow_dataset_substrait_JniWrapper_executeDeserializeExpressions (
+    JNIEnv* env, jobject, jobject expression) {
+  JNI_METHOD_START
+  auto *buff = reinterpret_cast<jbyte*>(env->GetDirectBufferAddress(expression));
+  int length = env->GetDirectBufferCapacity(expression);
+  std::shared_ptr<arrow::Buffer> buffer = JniGetOrThrow(arrow::AllocateBuffer(length));
+  std::memcpy(buffer->mutable_data(), buff, length);
+  // execute expression
+      arrow::engine::BoundExpressions round_tripped =

Review Comment:
   C++ code needs to be formatted



##########
java/dataset/src/test/java/org/apache/arrow/dataset/substrait/TestAceroSubstraitConsumer.java:
##########
@@ -204,4 +206,132 @@ public void testRunBinaryQueryNamedTableNation() throws Exception {
       }
     }
   }
+
+  @Test
+  public void testDeserializeExtendedExpressions() {
+    // Extended Expression 01 (`add` `2` to column `id`): id + 2
+    // Extended Expression 02 (`concatenate` column `name` || column `name`): name || name
+    // Extended Expression 03 (`filter` 'id' < 20): id < 20
+    // Extended expression result: [add_two_to_column_a, add(FieldPath(0), 2),
+    // concat_column_a_and_b, binary_join_element_wise(FieldPath(1), FieldPath(1), ""),
+    // filter_one, (FieldPath(0) < 20)]
+    String binaryExtendedExpressions = "Ch4IARIaL2Z1bmN0aW9uc19hcml0aG1ldGljLnlhbWwKHggCEhovZnVuY3Rpb25zX2NvbXBhcmlz" +
+        "b24ueWFtbBIRGg8IARoLYWRkOmkzMl9pMzISFBoSCAIQARoMY29uY2F0OnZjaGFyEhIaEAgCEAIaCmx0OmFueV9hbnkaMQoaGhgaBCoCEAE" +
+        "iCBoGEgQKAhIAIgYaBAoCKAIaE2FkZF90d29fdG9fY29sdW1uX2EaOwoiGiAIARoEYgIQASIKGggSBgoEEgIIASIKGggSBgoEEgIIARoVY2" +
+        "9uY2F0X2NvbHVtbl9hX2FuZF9iGjcKHBoaCAIaBAoCEAEiCBoGEgQKAhIAIgYaBAoCKBQaF2ZpbHRlcl9pZF9sb3dlcl90aGFuXzIwIhoKA" +
+        "klECgROQU1FEg4KBCoCEAEKBGICEAEYAg==";
+    // get binary plan
+    byte[] expression = Base64.getDecoder().decode(binaryExtendedExpressions);
+    ByteBuffer substraitExpression = ByteBuffer.allocateDirect(expression.length);
+    substraitExpression.put(expression);
+    // deserialize extended expression
+    List<String> extededExpressionList =
+        new AceroSubstraitConsumer(rootAllocator()).runDeserializeExpressions(substraitExpression);
+    assertEquals(3, extededExpressionList.size() / 2);
+    assertEquals("add_two_to_column_a", extededExpressionList.get(0));
+    assertEquals("add(FieldPath(0), 2)", extededExpressionList.get(1));
+    assertEquals("concat_column_a_and_b", extededExpressionList.get(2));
+    assertEquals("binary_join_element_wise(FieldPath(1), FieldPath(1), \"\")", extededExpressionList.get(3));
+    assertEquals("filter_id_lower_than_20", extededExpressionList.get(4));
+    assertEquals("(FieldPath(0) < 20)", extededExpressionList.get(5));
+  }
+
+  @Test
+  public void testBaseParquetReadWithExtendedExpressionsProjectAndFilter() throws Exception {
+    // Extended Expression 01 (`add` `2` to column `id`): id + 2
+    // Extended Expression 02 (`concatenate` column `name` || column `name`): name || name
+    // Extended Expression 03 (`filter` 'id' < 20): id < 20
+    // Extended expression result: [add_two_to_column_a, add(FieldPath(0), 2),
+    // concat_column_a_and_b, binary_join_element_wise(FieldPath(1), FieldPath(1), ""),
+    // filter_one, (FieldPath(0) < 20)]
+    // Base64.getEncoder().encodeToString(plan.toByteArray()): Generated throughout Substrait POJO Extended Expressions
+    String binaryExtendedExpressions = "Ch4IARIaL2Z1bmN0aW9uc19hcml0aG1ldGljLnlhbWwKHggCEhovZnVuY3Rpb25zX2NvbXBhcmlz" +
+        "b24ueWFtbBIRGg8IARoLYWRkOmkzMl9pMzISFBoSCAIQARoMY29uY2F0OnZjaGFyEhIaEAgCEAIaCmx0OmFueV9hbnkaMQoaGhgaBCoCEAE" +
+        "iCBoGEgQKAhIAIgYaBAoCKAIaE2FkZF90d29fdG9fY29sdW1uX2EaOwoiGiAIARoEYgIQASIKGggSBgoEEgIIASIKGggSBgoEEgIIARoVY2" +
+        "9uY2F0X2NvbHVtbl9hX2FuZF9iGjcKHBoaCAIaBAoCEAEiCBoGEgQKAhIAIgYaBAoCKBQaF2ZpbHRlcl9pZF9sb3dlcl90aGFuXzIwIhoKA" +
+        "klECgROQU1FEg4KBCoCEAEKBGICEAEYAg==";
+    Map<String, String> metadataSchema = new HashMap<>();
+    metadataSchema.put("parquet.avro.schema", "{\"type\":\"record\",\"name\":\"Users\"," +
+        "\"namespace\":\"org.apache.arrow.dataset\",\"fields\":[{\"name\":\"id\"," +
+        "\"type\":[\"int\",\"null\"]},{\"name\":\"name\",\"type\":[\"string\",\"null\"]}]}");
+    metadataSchema.put("writer.model.name", "avro");

Review Comment:
   Why do we need this?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org