Posted to github@arrow.apache.org by "nseekhao (via GitHub)" <gi...@apache.org> on 2023/06/08 22:17:52 UTC

[GitHub] [arrow-datafusion] nseekhao commented on issue #6410: Substrait: InList support for strings

nseekhao commented on issue #6410:
URL: https://github.com/apache/arrow-datafusion/issues/6410#issuecomment-1583461966

   Thank you @jayzhan211 for looking into this issue. First of all, I want to apologize for the late reply. I also reread my issue description, and there was a bit of a mix-up in the example because I was testing multiple cases. Having said that, I think there are a few things I should clarify.
   
   It seems the error I got was not caused by the list elements being of type string, but rather by the column being a cast-to-varchar expression. If you would like to reproduce this, you can add these test cases:
   ```rust
   // PASS
   #[tokio::test]
   async fn roundtrip_inlist_3() -> Result<()> {
       // Use `assert_expected_plan` here due to alias expression by-passing
       assert_expected_plan(
           "SELECT * FROM data WHERE CAST(b AS int) IN (1, 2, 3)",
           "Filter: data.b = Decimal128(Some(100),5,2) OR data.b = Decimal128(Some(200),5,2) OR data.b = Decimal128(Some(300),5,2)\
           \n  TableScan: data projection=[a, b, c, d, e, f], partial_filters=[data.b = Decimal128(Some(100),5,2) OR data.b = Decimal128(Some(200),5,2) OR data.b = Decimal128(Some(300),5,2)]"
       ).await
   }
   
   // FAIL
   #[tokio::test]
   async fn roundtrip_inlist_4() -> Result<()> {
       roundtrip("SELECT * FROM data WHERE CAST(f AS varchar) IN ('a', 'b', 'c')").await
   }
   // ERROR LOG
   // Error: NotImplemented("Unsupported expression: CAST(data.f AS Utf8) IN ([Utf8(\"a\"), Utf8(\"b\"), Utf8(\"c\")])")
   ```
   This is what caused my confusion about what the actual problem was, so for this issue, **please ignore these two cases**.
   
   To answer question 2, I think we should support `InList` in Substrait when the list length is over `THRESHOLD_INLINE_INLIST`. However, I do think that a threshold of `3` is a little restrictive. **Do you know if there is a way to override that value while the logical plan is being optimized (before we feed it into the Substrait producer)?**
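
For anyone following along, here is a rough, self-contained sketch of the kind of threshold-based rewrite being discussed. It is a toy model and not DataFusion's actual optimizer code: the `Expr` enum, `inline_in_list`, and `in_list` are hypothetical names made up for illustration, and only the constant name `THRESHOLD_INLINE_INLIST` and its value of `3` come from the discussion above. It shows why a short list such as `IN (1, 2, 3)` can reach the Substrait producer as a chain of `OR`ed equalities (as in the expected plan of `roundtrip_inlist_3`), while a longer list stays an `InList`.

```rust
// Toy expression tree for illustration only; this is NOT DataFusion's `Expr`.
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Column(String),
    Literal(i64),
    Eq(Box<Expr>, Box<Expr>),
    Or(Box<Expr>, Box<Expr>),
    InList { expr: Box<Expr>, list: Vec<Expr> },
}

// Name and value taken from the discussion; here it is just a plain constant.
const THRESHOLD_INLINE_INLIST: usize = 3;

/// Hypothetical rewrite: turns `e IN (a, b, c)` into `e = a OR e = b OR e = c`
/// when the list is non-empty and has at most `threshold` elements.
/// Any other expression is returned unchanged.
fn inline_in_list(e: Expr, threshold: usize) -> Expr {
    match e {
        Expr::InList { expr, list } if !list.is_empty() && list.len() <= threshold => {
            // Build one equality per list element.
            let mut eqs = list
                .into_iter()
                .map(|item| Expr::Eq(expr.clone(), Box::new(item)));
            let first = eqs.next().expect("list is non-empty");
            // Fold the remaining equalities into a left-deep OR chain.
            eqs.fold(first, |acc, eq| Expr::Or(Box::new(acc), Box::new(eq)))
        }
        other => other,
    }
}

/// Hypothetical helper that builds `col IN (values...)`.
fn in_list(col: &str, values: &[i64]) -> Expr {
    Expr::InList {
        expr: Box::new(Expr::Column(col.to_string())),
        list: values.iter().map(|v| Expr::Literal(*v)).collect(),
    }
}

fn main() {
    // Three elements: at or below the threshold, so the list is inlined.
    let short = inline_in_list(in_list("b", &[1, 2, 3]), THRESHOLD_INLINE_INLIST);
    assert!(matches!(short, Expr::Or(_, _)));

    // Four elements: above the threshold, so the InList is kept as-is.
    let long = inline_in_list(in_list("b", &[1, 2, 3, 4]), THRESHOLD_INLINE_INLIST);
    assert!(matches!(long, Expr::InList { .. }));
}
```

In this sketch the threshold is a function parameter, which is the kind of knob I am asking about; whether the real optimizer exposes anything similar is exactly the open question.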

