Posted to issues@arrow.apache.org by "collimarco (via GitHub)" <gi...@apache.org> on 2023/06/25 11:39:19 UTC

[GitHub] [arrow] collimarco opened a new issue, #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)

collimarco opened a new issue, #36291:
URL: https://github.com/apache/arrow/issues/36291

   ### Describe the enhancement requested
   
   Currently you can filter the rows in a Parquet file using code like this:
   
   ```ruby
   Arrow::FileInputStream.open(file) do |input|
     reader = Parquet::ArrowFileReader.new(input)
     reader.n_row_groups.times do |i|
       table = reader.read_row_group(i)
       result = table.slice { |slicer| (slicer['status'] == 200) & (slicer['message'].match_substring? 'test') }
       puts result if result.n_rows > 0
     end
   end
   ```
   
   The problem is that the filter (the `slice` condition) is defined statically, at "compile" time.
   
   If the filter criteria come from user input, you cannot build the query from them.
   
   With SQL, for example, you would build the query string dynamically from the user's inputs.
   
   The only solutions I see for building queries at runtime are to make `slice` accept either:
   
   1. a string of conditions (e.g. `status = 200 AND message MATCHES 'test'`), or
   2. an array of conditions (e.g. `[[:equal, :status, 200], [:matches, :message, 'test']]`)
   
   A string or an array of conditions is easy to build dynamically, so either form would allow queries to be built at runtime. This is not possible right now.
   
   
   
   ### Component(s)
   
   Ruby


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou closed issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou closed issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)
URL: https://github.com/apache/arrow/issues/36291




[GitHub] [arrow] kou commented on issue #36291: [Ruby] Allow dynamic queries defined at runtime (e.g. based on user input)

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on issue #36291:
URL: https://github.com/apache/arrow/issues/36291#issuecomment-1606251661

   It's dynamic. Here are examples that build the conditions from method arguments:
   
   ```ruby
   def read1(status, message)
     Arrow::FileInputStream.open(file) do |input|
       reader = Parquet::ArrowFileReader.new(input)
       reader.n_row_groups.times do |i|
         table = reader.read_row_group(i)
         result = table.slice { |slicer| (slicer['status'] == status) & (slicer['message'].match_substring? message) }
         puts result if result.n_rows > 0
       end
     end
   end
   ```
   
   ```ruby
   def read2(status: nil, message: nil)
     Arrow::FileInputStream.open(file) do |input|
       reader = Parquet::ArrowFileReader.new(input)
       reader.n_row_groups.times do |i|
         table = reader.read_row_group(i)
         result = table.slice do |slicer|
           conditions = []
           conditions << (slicer['status'] == status) if status
           conditions << (slicer['message'].match_substring? message) if message
           conditions.inject(:&)
         end
         puts result if result.n_rows > 0
       end
     end
   end
   ```
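   
   With no filters given, `conditions` in `read2` stays empty, and in plain Ruby `[].inject(:&)` returns `nil`, so the block hands `nil` back to `slice`. The sketch below only demonstrates that plain-Ruby behaviour. `Cond`, `build_conditions` and `combine` are hypothetical stand-ins so the example runs without Arrow; they are not part of red-arrow, and how `Table#slice` treats a `nil` block result is not covered here.
   
   ```ruby
   # Hypothetical stand-in for a slicer condition; NOT the red-arrow API.
   # It only records a description and supports `&` so inject(:&) works.
   Cond = Struct.new(:description) do
     def &(other)
       Cond.new("(#{description} AND #{other.description})")
     end
   end
   
   # Collect only the conditions whose argument was actually given.
   def build_conditions(status: nil, message: nil)
     conditions = []
     conditions << Cond.new("status == #{status}") if status
     conditions << Cond.new("message contains #{message.inspect}") if message
     conditions
   end
   
   # AND all conditions together; returns nil when the list is empty.
   def combine(conditions)
     conditions.inject(:&)
   end
   
   none = combine(build_conditions)
   both = combine(build_conditions(status: 200, message: "test"))
   ```
   
   A caller could check for `nil` and skip the `slice` call (keeping the whole table) when no filter was requested.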
   
   ```ruby
   def read3(conditions)
     Arrow::FileInputStream.open(file) do |input|
       reader = Parquet::ArrowFileReader.new(input)
       reader.n_row_groups.times do |i|
         table = reader.read_row_group(i)
         result = table.slice do |slicer|
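           # Use a separate local name: reassigning `conditions` here would
           # overwrite the method argument before the next row group is read.
           slicer_conditions = conditions.collect do |operator, target, *args|
             slicer[target].public_send(operator, *args)
           end
           slicer_conditions.inject(:&)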
         end
         puts result if result.n_rows > 0
       end
     end
   end
   ```
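   
   If the `conditions` array in `read3` comes straight from user input, the operator name reaches `public_send` unchecked, so a caller can invoke any public method on the column object. Below is a minimal sketch of one way to restrict that. It assumes a hypothetical `ALLOWED_OPERATORS` table that maps the names from the original proposal (`:equal`, `:matches`) to the methods used above (`==`, `match_substring?`). `FakeColumn` and `translate_conditions` are also hypothetical; `FakeColumn` just records calls so the example runs without Arrow.
   
   ```ruby
   # Hypothetical whitelist: user-facing names => methods called on the column.
   ALLOWED_OPERATORS = {
     equal: :==,
     matches: :match_substring?,
   }.freeze
   
   # Stand-in for slicer[target]; NOT the red-arrow API.
   # Each method returns a plain array describing the call.
   class FakeColumn
     def initialize(name)
       @name = name
     end
   
     def ==(value)
       [@name, :==, value]
     end
   
     def match_substring?(pattern)
       [@name, :match_substring?, pattern]
     end
   end
   
   # Translate [[name, target, *args], ...] into conditions.
   # The block supplies the column object for a target; unknown names are rejected.
   def translate_conditions(raw_conditions)
     raw_conditions.collect do |name, target, *args|
       operator = ALLOWED_OPERATORS.fetch(name.to_sym) do
         raise ArgumentError, "unsupported operator: #{name.inspect}"
       end
       yield(target).public_send(operator, *args)
     end
   end
   
   raw = [[:equal, :status, 200], [:matches, :message, "test"]]
   translated = translate_conditions(raw) { |target| FakeColumn.new(target) }
   ```
   
   Inside `read3`, the block would return `slicer[target]` instead of a `FakeColumn`, and the translated conditions could then be combined with `inject(:&)` as before.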

