Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/08/14 12:32:49 UTC

[GitHub] [arrow-datafusion] praveentiru opened a new issue #879: [Help] - Floating point operations with Integer columns

praveentiru opened a new issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879


   Posted from stackoverflow: [Original post](https://stackoverflow.com/questions/68750346/rust-datafusion-float-operations-in-dataframe-select)
   I am calculating the fraction of orders that were not filled from a large list of order lines, using the datafusion crate for the analysis. I want to build a table like the one shown below:
   
   ```
   +--------+--------------+---------------+--------------+
   | Month  | Total Orders | Missed Orders | Missed Ratio |
   +--------+--------------+---------------+--------------+
   | 201803 | 10           | 3             | 0.3          |
   +--------+--------------+---------------+--------------+
   ```
   
   To achieve this I have written the following code:
   ```rust
       let result = record_count
           .select(vec![
               col("Month"),
               col("Total Orders"),
               col("Missed Orders"),
               // Cast both integer columns to Float64 before dividing.
               (col("Missed Orders")
                   .cast_to(&DataType::Float64, &m_order_schema)
                   .unwrap()
                   / col("Total Orders")
                       .cast_to(&DataType::Float64, &t_order_schema)
                       .unwrap())
               .alias("Service Level"),
           ])?;
   ```
   The Total Orders and Missed Orders columns are integers, so I am casting them to float to get a fraction. But the Service Level column comes out as integer with all zeros. The result looks as shown below:
   ```
   +--------+--------------+---------------+--------------+
   | Month  | Total Orders | Missed Orders | Missed Ratio |
   +--------+--------------+---------------+--------------+
   | 201803 | 10           | 3             | 0            |
   +--------+--------------+---------------+--------------+
   ```
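
As a side note on the arithmetic involved: the gap between the expected `0.3` and the observed `0` matches what integer division does in plain Rust. The sketch below is independent of DataFusion and does not claim to explain the behaviour reported here. The function names `missed_ratio_int` and `missed_ratio_float` are hypothetical and exist only for illustration.

```rust
/// Integer division: the fractional part is discarded, so 3 / 10 is 0.
/// Note that this panics if `total` is 0.
fn missed_ratio_int(missed: i64, total: i64) -> i64 {
    missed / total
}

/// Cast both operands to f64 before dividing so the fraction is kept.
/// Here a zero `total` yields infinity or NaN instead of a panic.
fn missed_ratio_float(missed: i64, total: i64) -> f64 {
    missed as f64 / total as f64
}
```

Calling `missed_ratio_int(3, 10)` gives `0`, while `missed_ratio_float(3, 10)` gives `0.3`, which is the value the cast in the `select` expression is meant to produce.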
   
   Question: How to perform floating point operations with integer columns?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org



[GitHub] [arrow-datafusion] praveentiru commented on issue #879: [Help] - Floating point operations with Integer columns

Posted by GitBox <gi...@apache.org>.
praveentiru commented on issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879#issuecomment-932736766


   @alamb My bad. Service level = 1/Missed Ratio. I have edited the question to replace Service Level with Missed Ratio.





[GitHub] [arrow-datafusion] alamb commented on issue #879: [Help] - Floating point operations with Integer columns

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879#issuecomment-932727559


   Sorry for the late response
   
   > Service Level column comes out as integer with all zeros.
   
   I am not sure I understand the question -- there is no `Service Level` column in the output 
   
   If you could provide a self-contained reproducer, we might be able to help more.





[GitHub] [arrow-datafusion] houqp commented on issue #879: [Help] - Floating point operations with Integer columns

Posted by GitBox <gi...@apache.org>.
houqp commented on issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879#issuecomment-933000038


   @praveentiru Did you install your Python binding from source? If not, it might be really out of date.





[GitHub] [arrow-datafusion] alamb commented on issue #879: [Help] - Floating point operations with Integer columns

Posted by GitBox <gi...@apache.org>.
alamb commented on issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879#issuecomment-932901002


   @praveentiru I don't normally use the dataframe API (and for your case, the SQL interface might work better).
   
   But in any event, I tried to reproduce the problem you are having, and I was not able to. Here is the program I used:
   
   ```rust
   async fn test_cast() {
       let mut ctx = ExecutionContext::new();
   
       let month: Date32Array = vec![Some(1000)].into_iter().collect();
       let total_orders: Int64Array = vec![Some(10)].into_iter().collect();
       let missed_orders: Int64Array = vec![Some(3)].into_iter().collect();
   
       let batch = RecordBatch::try_from_iter(vec![
           ("Month", Arc::new(month) as ArrayRef),
           ("Total Orders", Arc::new(total_orders) as ArrayRef),
           ("Missed Orders", Arc::new(missed_orders) as ArrayRef),
       ]).unwrap();
   
       let m_order_schema = DFSchema::try_from_qualified_schema(
           "m_orders",
           batch.schema().as_ref()
       ).unwrap();
   
       let t_order_schema = DFSchema::try_from_qualified_schema(
           "t_orders",
           batch.schema().as_ref()
       ).unwrap();
   
       let table = MemTable::try_new(batch.schema(), vec![vec![batch]]).unwrap();
   
       let record_count = ctx.read_table(Arc::new(table)).unwrap();
   
       let result = record_count
           .select(vec![
               col("Month"),
               col("Total Orders"),
               col("Missed Orders"),
               (col("Missed Orders")
                   .cast_to(&DataType::Float64, &m_order_schema)
                   .unwrap()
                   / col("Total Orders")
                       .cast_to(&DataType::Float64, &t_order_schema)
                       .unwrap())
               .alias("Service Level"),
           ])
           .unwrap();
   
       result.show().await.unwrap();
   
   }
   ```
   
   And when I ran that code, it appears to produce the output you expected:
   ```
   Starting tests
   +------------+--------------+---------------+---------------+
   | Month      | Total Orders | Missed Orders | Service Level |
   +------------+--------------+---------------+---------------+
   | 1972-09-27 | 10           | 3             | 0.3           |
   +------------+--------------+---------------+---------------+
   
   ```





[GitHub] [arrow-datafusion] praveentiru commented on issue #879: [Help] - Floating point operations with Integer columns

Posted by GitBox <gi...@apache.org>.
praveentiru commented on issue #879:
URL: https://github.com/apache/arrow-datafusion/issues/879#issuecomment-933126461


   @houqp I am working in Rust directly. I will try the code from @alamb and get back to you. I could not investigate over the weekend.

