Posted to issues@arrow.apache.org by "collimarco (via GitHub)" <gi...@apache.org> on 2023/06/05 15:10:15 UTC

[GitHub] [arrow] collimarco opened a new issue, #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

collimarco opened a new issue, #35915:
URL: https://github.com/apache/arrow/issues/35915

   ### Describe the enhancement requested
   
   I have this code:
   
   ```ruby
   table = Arrow::Table.load(s3_uri, format: :parquet)
   puts table.slice { |slicer| (slicer['status'] == 200) & (slicer['message'].match_substring? 'foo') }
   ```
   
   Now I would like to rewrite it more efficiently using condition pushdown:
   
   ```ruby
   table = Arrow::Table.load(s3_uri, format: :parquet, filter: [[:equal, :status, 200], [:match_substring, :message, 'foo']])
   puts table
   ```
   
   However, this code doesn't work: it fails with `invalid argument Array` for the filter.
   
   Any idea how to rewrite it correctly? I can't find any documentation about multiple filter conditions.
   
   
   
   ### Component(s)
   
   Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35915:
URL: https://github.com/apache/arrow/issues/35915#issuecomment-1578049287

   Oh, #35927 fixes it.


[GitHub] [arrow] kou commented on issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35915:
URL: https://github.com/apache/arrow/issues/35915#issuecomment-1577518604

   Ah, we need to create an option object for `match_substring`:
   
   ```ruby
   match_substring_options = Arrow::MatchSubstringOptions.new
   match_substring_options.pattern = 'foo'
   table = Arrow::Table.load(s3_uri, format: :parquet, filter: [:and, [:equal, :status, 200], [:match_substring, :message, match_substring_options]])
   ```
   
   But a `filter: [:and, [:equal, :status, 200], [:match_substring, :message, {pattern: 'foo'}]]` shortcut should be implemented.
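
   As an illustration of the nested prefix form used above, here is a minimal pure-Ruby sketch that combines several conditions into one filter array. The `and_filter` helper is hypothetical (it is not part of red-arrow), the `method` and `bytes` columns are made up, and the sketch assumes that `:and` takes exactly two operands, so additional conditions are nested rather than appended.

   ```ruby
   # Hypothetical helper (not part of red-arrow): combine any number of
   # prefix-notation conditions into nested, two-operand :and expressions.
   def and_filter(*conditions)
     raise ArgumentError, 'at least one condition is required' if conditions.empty?

     # Fold left so that every :and node has exactly two operands.
     # A single condition is returned unchanged.
     conditions.reduce { |left, right| [:and, left, right] }
   end

   filter = and_filter(
     [:equal, :status, 200],
     [:equal, :method, 'GET'],  # `method` is a made-up column for illustration
     [:greater, :bytes, 1024]   # `bytes` is likewise made up
   )
   p filter
   ```

   The resulting array could then be passed as the `filter:` argument to `Arrow::Table.load`, assuming each individual condition is one the loader accepts.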


[GitHub] [arrow] collimarco commented on issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco commented on issue #35915:
URL: https://github.com/apache/arrow/issues/35915#issuecomment-1577503068

   @kou It seems to work, but there is another problem with the function in the filter. Now I get this error:
   
   ```
   /Users/collimarco/.rbenv/versions/3.2.2/lib/ruby/gems/3.2.0/gems/gobject-introspection-4.1.6/lib/gobject-introspection/loader.rb:705:in `invoke': 
   [scanner-builder][filter][set]: Invalid: Function 'match_substring' accepts 1 arguments but 2 passed (Arrow::Error::Invalid)
   ```
   


[GitHub] [arrow] collimarco commented on issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "collimarco (via GitHub)" <gi...@apache.org>.
collimarco commented on issue #35915:
URL: https://github.com/apache/arrow/issues/35915#issuecomment-1578027805

   @kou Unfortunately, the new code gives a segmentation fault:
   
   ```
   searchparquet.rb: [BUG] Segmentation fault at 0x0000000000000000
   ruby 3.2.2 (2023-03-30 revision e51014f9c0) [x86_64-darwin22]
   ```
   


[GitHub] [arrow] kou commented on issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #35915:
URL: https://github.com/apache/arrow/issues/35915#issuecomment-1577489174

   Could you try `filter: [:and, [:equal, :status, 200], [:match_substring, :message, 'foo']]`?


[GitHub] [arrow] kou closed issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #35915: [Ruby] Multiple filter conditions in Arrow::Table.load
URL: https://github.com/apache/arrow/issues/35915

