You are viewing a plain text version of this content. The canonical link for it is here.
Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/10/05 20:08:31 UTC

[GitHub] [arrow] vibhatha opened a new pull request, #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

vibhatha opened a new pull request, #14295:
URL: https://github.com/apache/arrow/pull/14295

   The existing test cases didn't cover multi-expressions and the faulty logic wasn't covered because of that. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1265821810

   Thanks!
   
   I now get a different issue (also probably related to the Substrait that I'm generating!)
   
   ``` r
   library(arrow, warn.conflicts = FALSE)
   #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   
   plan_as_json <- '{
     "extensionUris": [
       {
         "extensionUriAnchor": 1,
         "uri": "https://github.com/apache/arrow/blob/master/format/substrait/extension_types.yaml"
       }
     ],
     "relations": [
       {
         "rel": {
           "project": {
             "common": {
               "emit": {"outputMapping": [2]
               }
             },
             "input": {
               "read": {
                 "baseSchema": {
                   "names": ["int"],
                   "struct": {
                     "types": [{"i32": {"nullability": "NULLABILITY_NULLABLE"}}]
                   }
                 },
                 "localFiles": {
                   "items": [
                     {
                       "uriFile": "file:///var/folders/gt/l87wjg8s7312zs9s7c1fgs900000gn/T//Rtmph5wiTQ/file1260f24ace0d2",
                       "parquet": {
   
                       }
                     }
                   ]
                 }
               }
             },
             "expressions": [
               {"selection": {"directReference": {"structField": {"field": 0}}}}
             ]
           }
         }
       }
     ]
   }'
   
   temp_parquet <- tempfile()
   write_parquet(data.frame(int = 1:5), temp_parquet)
   plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
   arrow:::do_exec_plan_substrait(plan_as_json)
   #> Error: Invalid: No match for FieldRef.FieldPath(2) in FieldPath(0): int32
   #> FieldPath(0): int32
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/type.h:1796  CheckNonEmpty(matches, root)
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:437  ref->FindOne(in)
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/project_node.cc:68  expr.Bind(*inputs[0]->output_schema(), plan->exec_context())
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:535  MakeExecNode(this->factory_name, plan, std::move(inputs), *this->options, registry)
   #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:530  std::get<Declaration>(input).AddToPlan(plan, registry)
   ```
   
   <sup>Created on 2022-10-03 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)</sup>


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] ursabot commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
ursabot commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1279924300

   Benchmark runs are scheduled for baseline = 2a5e47c16091de4d3151d1c2176685df3dd78c88 and contender = c600dd603e0850eff8c01417dffe90c197d1e6cd. c600dd603e0850eff8c01417dffe90c197d1e6cd is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
   Conbench compare runs links:
   [Finished :arrow_down:0.0% :arrow_up:0.0%] [ec2-t3-xlarge-us-east-2](https://conbench.ursa.dev/compare/runs/3674f5749d44434f83225fcd6f08775c...a08230201a564770a15c8c7302fc319f/)
   [Failed :arrow_down:0.56% :arrow_up:0.0%] [test-mac-arm](https://conbench.ursa.dev/compare/runs/d689ae16b38a4ec9b147194eea221bb0...1d5d811342f04e638775c0a3347b50d8/)
   [Failed :arrow_down:0.0% :arrow_up:0.0%] [ursa-i9-9960x](https://conbench.ursa.dev/compare/runs/2f8fcb9146874a038569acdb1f6115aa...532de7039a4849159a78a723c2fb76c3/)
   [Finished :arrow_down:0.07% :arrow_up:0.0%] [ursa-thinkcentre-m75q](https://conbench.ursa.dev/compare/runs/647851d7413b4ed7a2119fdf58bd18f8...fcf265890dd8485e956bc0a645b8018a/)
   Buildkite builds:
   [Finished] [`c600dd60` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1706)
   [Failed] [`c600dd60` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1725)
   [Failed] [`c600dd60` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1708)
   [Finished] [`c600dd60` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1719)
   [Finished] [`2a5e47c1` ec2-t3-xlarge-us-east-2](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ec2-t3-xlarge-us-east-2/builds/1705)
   [Failed] [`2a5e47c1` test-mac-arm](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-test-mac-arm/builds/1724)
   [Failed] [`2a5e47c1` ursa-i9-9960x](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-i9-9960x/builds/1707)
   [Finished] [`2a5e47c1` ursa-thinkcentre-m75q](https://buildkite.com/apache-arrow/arrow-bci-benchmark-on-ursa-thinkcentre-m75q/builds/1718)
   Supported benchmarks:
   ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
   test-mac-arm: Supported benchmark langs: C++, Python, R
   ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
   ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1265795980

   https://issues.apache.org/jira/browse/ARROW-17915


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] paleolimbot commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
paleolimbot commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1270031191

   Woohoo! All our tests pass with Arrow built off of this branch. I have no idea why the CI needed approval (but I hit the button).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1269542129

   > You are totally right...sorry for the indirection! That is indeed a tempfile path on a Mac (I forgot to restart my session before reprexing that one 😬 ).
   > 
   > After fixing the issues you noted, I'm down to one failing test after building on your branch! Again, it's an index error but I'm _pretty sure_ it's not an off-by-one (or maybe you can help me spot my error!). What I _think_ this plan does is:
   > 
   > * Reads a 1-column Parquet
   > * Computes two new columns: a copy of the first column, and the first column plus 10, then uses emit to only select the second and third columns of the output.
   > * Computes one new column: a copy of the second column, then uses emit to only select the first column
   > 
   > Both of these steps work in isolation, but together, they fail.
   > 
   > ```r
   > library(arrow, warn.conflicts = FALSE)
   > #> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
   > 
   > plan_as_json <- '{
   >   "extensionUris": [
   >     {
   >       "extensionUriAnchor": 1,
   >       "uri": "https://github.com/substrait-io/substrait/blob/main/extensions/functions_arithmetic.yaml"
   >     }
   >   ],
   >   "extensions": [
   >     {
   >       "extensionFunction": {
   >         "extensionUriReference": 1,
   >         "functionAnchor": 2,
   >         "name": "add"
   >       }
   >     }
   >   ],
   >   "relations": [
   >     {
   >       "rel": {
   >         "project": {
   >           "common": {"emit": {"outputMapping": [2]}},
   >           "input": {
   >             "project": {
   >               "common": {"emit": {"outputMapping": [1, 2]}},
   >               "input": {
   >                 "read": {
   >                   "baseSchema": {
   >                     "names": ["int"],
   >                     "struct": {"types": [{"i32": {}}]}
   >                   },
   >                   "localFiles": {
   >                     "items": [
   >                       {"uriFile": "file://THIS_IS_THE_TEMP_FILE", "parquet": {}}
   >                     ]
   >                   }
   >                 }
   >               },
   >               "expressions": [
   >                 {"selection": {"directReference": {"structField": {"field": 0}}}},
   >                 {
   >                   "scalarFunction": {
   >                     "functionReference": 2,
   >                     "outputType": {"i32": {}},
   >                     "arguments": [
   >                       {"enum": {"unspecified": {}}},
   >                       {"value": {"selection": {"directReference": {"structField": {"field": 0}}}}},
   >                       {"value": {"literal": {"fp64": 10}}}
   >                     ]
   >                   }
   >                 }
   >               ]
   >             }
   >           },
   >           "expressions": [
   >             {"selection": {"directReference": {"structField": {"field": 0}}}}
   >           ]
   >         }
   >       }
   >     }
   >   ]
   > }'
   > 
   > temp_parquet <- tempfile()
   > write_parquet(data.frame(int = 1:5), temp_parquet)
   > plan_as_json <- gsub("THIS_IS_THE_TEMP_FILE", temp_parquet, plan_as_json)
   > arrow:::do_exec_plan_substrait(plan_as_json)
   > #> Error: Invalid: No match for FieldRef.FieldPath(2) in FieldPath(1): int32
   > #> FieldPath(2): double
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/type.h:1796  CheckNonEmpty(matches, root)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:437  ref->FindOne(in)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/project_node.cc:68  expr.Bind(*inputs[0]->output_schema(), plan->exec_context())
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:535  MakeExecNode(this->factory_name, plan, std::move(inputs), *this->options, registry)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:530  std::get<Declaration>(input).AddToPlan(plan, registry)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:530  std::get<Declaration>(input).AddToPlan(plan, registry)
   > ```
   > 
   > Created on 2022-10-04 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
   > 
   > The more readable dplyr synatax is:
   > 
   > ```r
   > # remotes::install_github("paleolimbot/substrait-r@enable-arrow")
   > # (requires libarrow built against #14295)
   > library(substrait, warn.conflicts = FALSE)
   > library(dplyr, warn.conflicts = FALSE)
   > 
   > # Adds a column to the end
   > data.frame(int = 1:5) |> 
   >   arrow_substrait_compiler() |> 
   >   mutate(int10 = int + 10) |> 
   >   collect()
   > #> # A tibble: 5 × 2
   > #>     int int10
   > #>   <int> <dbl>
   > #> 1     1    11
   > #> 2     2    12
   > #> 3     3    13
   > #> 4     4    14
   > #> 5     5    15
   > 
   > # Selects the first column
   > data.frame(int = 1:5) |> 
   >   arrow_substrait_compiler() |> 
   >   select(int) |> 
   >   collect()
   > #> # A tibble: 5 × 1
   > #>     int
   > #>   <int>
   > #> 1     1
   > #> 2     2
   > #> 3     3
   > #> 4     4
   > #> 5     5
   > 
   > # Chaining together fails
   > data.frame(int = 1:5) |> 
   >   arrow_substrait_compiler() |> 
   >   mutate(int10 = int + 10) |> 
   >   select(int) |> 
   >   collect()
   > #> Error: Invalid: No match for FieldRef.FieldPath(2) in FieldPath(1): int32
   > #> FieldPath(2): double
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/type.h:1796  CheckNonEmpty(matches, root)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/expression.cc:437  ref->FindOne(in)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/project_node.cc:68  expr.Bind(*inputs[0]->output_schema(), plan->exec_context())
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:535  MakeExecNode(this->factory_name, plan, std::move(inputs), *this->options, registry)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:530  std::get<Declaration>(input).AddToPlan(plan, registry)
   > #> /Users/deweydunnington/Desktop/rscratch/arrow/cpp/src/arrow/compute/exec/exec_plan.cc:530  std::get<Declaration>(input).AddToPlan(plan, registry)
   > ```
   > 
   > Created on 2022-10-04 by the [reprex package](https://reprex.tidyverse.org) (v2.0.1)
   
   First of all thanks for finding out another issue. I added a fix for it. 
   
   Your understanding about the plan is accurate. 
   
   Flow of Operators: Read ==> Project -> Emit(Project) -> Project -> Emit(Project)
   Schema Field changing  ==>  1 Field -> 3 Fields -> 2 Fields -> 3 Fields -> 1 Field


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] bkietz merged pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
bkietz merged PR #14295:
URL: https://github.com/apache/arrow/pull/14295


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1265771306

   cc @paleolimbot could you please take a look. This was a trivial change (which was actually a typo in the original code which wasn't covered by the tests). Extremely sorry about the inconvenience. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1271131557

   @paleolimbot great!
   And thanks for running the CIs.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] github-actions[bot] commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
github-actions[bot] commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1265796019

   :warning: Ticket **has not been started in JIRA**, please click 'Start Progress'.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] vibhatha commented on pull request #14295: ARROW-17915: [C++] Error when using Substrait ProjectRel

Posted by GitBox <gi...@apache.org>.
vibhatha commented on PR #14295:
URL: https://github.com/apache/arrow/pull/14295#issuecomment-1269905043

   @westonpace CIs says the approval is needed. But I am not sure why. This is not my first contribution though :/ 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org