Posted to issues@arrow.apache.org by "collimarco (via GitHub)" <gi...@apache.org> on 2023/06/25 11:39:19 UTC
[GitHub] [arrow] collimarco opened a new issue, #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)
collimarco opened a new issue, #36291:
URL: https://github.com/apache/arrow/issues/36291
### Describe the enhancement requested
Currently you can filter the rows in a Parquet file using code like this:
```ruby
Arrow::FileInputStream.open(file) do |input|
  reader = Parquet::ArrowFileReader.new(input)
  reader.n_row_groups.times do |i|
    table = reader.read_row_group(i)
    result = table.slice do |slicer|
      (slicer['status'] == 200) & slicer['message'].match_substring?('test')
    end
    puts result if result.n_rows > 0
  end
end
```
The problem is that the filter (the `slice` condition) is defined statically, in the source code.
If the filter criteria come from user input, there is no obvious way to build the query.
With SQL, for example, you would build the query string dynamically from the user's input.
The only solutions I see for building queries at runtime are to make `slice` accept either:
1. a string of conditions (e.g. `status = 200 AND message MATCHES 'test'`), or
2. an array of conditions (e.g. `[[:equal, :status, 200], [:matches, :message, 'test']]`)
Building a string or an array of conditions dynamically would be easy, and it would allow queries to be built at runtime. This is not possible right now.
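To illustrate the array form, here is a minimal sketch of building such a list from user input in plain Ruby. The helper name `build_conditions`, the `params` hash and its keys are hypothetical, and the operator names are assumed to mirror the slicer methods used in the example above (`==` and `match_substring?`) rather than the `:equal` / `:matches` spelling.

```ruby
# Hypothetical helper: turn raw user input (e.g. request parameters) into
# [operator, column, *args] triples. Names and keys are illustrative only.
def build_conditions(params)
  conditions = []
  # Add a condition only when the user actually supplied that filter.
  if params.key?("status")
    conditions << [:==, "status", Integer(params["status"])]
  end
  if params.key?("message")
    conditions << [:match_substring?, "message", params["message"]]
  end
  conditions
end

p build_conditions("status" => "200", "message" => "test")
# prints [[:==, "status", 200], [:match_substring?, "message", "test"]]
```

`Integer(...)` raises `ArgumentError` on non-numeric input, so a malformed status is rejected before any query is built.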
### Component(s)
Ruby
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org
For queries about this service, please contact Infrastructure at:
users@infra.apache.org
[GitHub] [arrow] kou closed issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)
URL: https://github.com/apache/arrow/issues/36291
[GitHub] [arrow] kou commented on issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)
Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #36291:
URL: https://github.com/apache/arrow/issues/36291#issuecomment-1606251661
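The condition is already dynamic: the block passed to `slice` is ordinary Ruby evaluated at runtime. Here are examples that build the conditions from method arguments: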
```ruby
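# `file` is not defined in this snippet; it is assumed to be available,
# e.g. as a helper method returning the Parquet file path.
def read1(status, message)
  Arrow::FileInputStream.open(file) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    reader.n_row_groups.times do |i|
      table = reader.read_row_group(i)
      # Both values come from the method arguments at call time.
      result = table.slice do |slicer|
        (slicer['status'] == status) & slicer['message'].match_substring?(message)
      end
      puts result if result.n_rows > 0
    end
  end
end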
```
```ruby
def read2(status: nil, message: nil)
  Arrow::FileInputStream.open(file) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    reader.n_row_groups.times do |i|
      table = reader.read_row_group(i)
      result = table.slice do |slicer|
        # Collect only the conditions for the filters that were given.
        conditions = []
        conditions << (slicer['status'] == status) if status
        conditions << slicer['message'].match_substring?(message) if message
        # AND them together; this is nil when neither filter is given.
        conditions.inject(:&)
      end
      puts result if result.n_rows > 0
    end
  end
end
```
```ruby
def read3(conditions)
  Arrow::FileInputStream.open(file) do |input|
    reader = Parquet::ArrowFileReader.new(input)
    reader.n_row_groups.times do |i|
      table = reader.read_row_group(i)
      result = table.slice do |slicer|
        # Use a separate local so the `conditions` argument is not overwritten;
        # reassigning it here would break the second row group.
        slicer_conditions = conditions.collect do |operator, target, *args|
          slicer[target].public_send(operator, *args)
        end
        slicer_conditions.inject(:&)
      end
      puts result if result.n_rows > 0
    end
  end
end
```
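Because `read3` passes the operator straight to `public_send`, conditions that come from user input probably should be checked first. The following is a minimal sketch of that idea, not part of red-arrow: `ALLOWED_OPERATORS` and `build_filter` are hypothetical names, and the allowlist contains only the two operators that appear in this thread.

```ruby
# Operators a caller may request; anything else is rejected before dispatch.
# This list is an assumption for illustration, not a red-arrow constant.
ALLOWED_OPERATORS = %i[== match_substring?].freeze

# Hypothetical helper: combine [operator, target, *args] triples with AND.
# `slicer` is whatever object `table.slice` yields to its block.
def build_filter(slicer, conditions)
  raise ArgumentError, "at least one condition is required" if conditions.empty?

  conditions.map { |operator, target, *args|
    operator = operator.to_sym
    unless ALLOWED_OPERATORS.include?(operator)
      raise ArgumentError, "unsupported operator: #{operator.inspect}"
    end
    slicer[target].public_send(operator, *args)
  }.inject(:&)
end
```

Inside `read3`, the block body could then be reduced to `build_filter(slicer, conditions)`; the allowlist can be extended with whichever other slicer methods you want to expose.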