Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2022/06/17 16:55:03 UTC

[GitHub] [arrow] arjunsr1 opened a new issue, #13396: Is the Arrow::Table.merge function in a working state?

arjunsr1 opened a new issue, #13396:
URL: https://github.com/apache/arrow/issues/13396

   I'm trying to use the merge function in table.rb (Line 358) and it's not giving me the functionality I am expecting. From what I interpreted, the function should take the table that is passed in as a parameter, and append the rows to the original table. However, I tried to use this function where the table that is calling the function had 53 rows, and the parameter table had 1 row, and the resulting table had only 1 row instead of what I assumed should be 54 rows. I also noticed the TODO comment above the method.
   
   Is this function a work in progress?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162676527

   Apache Parquet doesn't support "seconds" as unit: https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp
   So "seconds" timestamp type is converted to "milliseconds" timestamp type when we write Apache Arrow data as Apache Parquet.
   
   We can cast type by the following:
   
   ```ruby
   s3_existing_table.merge(
     "History Completed Time" => Arrow::ChunkedArray.new(s3_existing_table["History Completed Time"].data.chunks.collect {|chunk| chunk.cast(Arrow::TimestampDataType.new(:second))}),
     "History Created Time" => Arrow::ChunkedArray.new(s3_existing_table["History Created Time"].data.chunks.collect {|chunk| chunk.cast(Arrow::TimestampDataType.new(:second))})
   )
   ```
   
   But this is too inconvenient. I'll add more convenient APIs so that we can write something like the following:
   
   ```ruby
   s3_existing_table.merge(
     "History Completed Time" => s3_existing_table["History Completed Time"].cast(unit: :second),
     "History Created Time" => s3_existing_table["History Created Time"].cast(unit: :second),
   )
   ```
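
To make the unit change described above concrete, here is a minimal plain-Ruby sketch. It does not use Arrow or Parquet at all: the helper names `seconds_to_millis` and `millis_to_seconds` and the sample values are hypothetical, and the code only illustrates that whole-second values survive a trip through a millisecond unit even though the declared unit differs afterwards.

```ruby
# Plain-Ruby illustration of changing a timestamp's unit.
# No Arrow or Parquet calls are involved; all names here are hypothetical.

# Epoch seconds -> epoch milliseconds (the scale a seconds value ends up in
# when it is stored with a millisecond unit).
def seconds_to_millis(seconds)
  seconds * 1_000
end

# Epoch milliseconds -> epoch seconds. Integer division drops any
# sub-second part, so this is only lossless for whole-second values.
def millis_to_seconds(millis)
  millis.div(1_000)
end

created_at_s = [1_655_484_903, 1_655_484_904] # hypothetical epoch seconds
stored_ms = created_at_s.map { |s| seconds_to_millis(s) }
restored_s = stored_ms.map { |ms| millis_to_seconds(ms) }
```

The values round-trip unchanged, which is why casting the loaded columns back to seconds is a reasonable workaround when the data was written with second precision in the first place.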
   


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1163596808

   Could you use the former code in https://github.com/apache/arrow/issues/13396#issuecomment-1162676527 for now?
   We'll release a new version in 2022-07 or 2022-08.
   See also: Preparing for version 9.0.0 release https://lists.apache.org/thread/8b7yyzgmtb6mq7od0jbntvfflm0vv72o


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162318465

   Ah, sorry. Could you try `s3_existing_table.concatenate([table])`?


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162421271

   Hi @kou - through using my debugger and stepping through code, it seems that when I save an arrow table with schema fields
   ```
   History Completed Time: timestamp[s]
   History Created Time: timestamp[s]
   ```
   to a tempfile, then use `Arrow::Table.load` to read the file back into an Arrow table, the type changes from `s` to `ms`. Is there a quick workaround for this issue?


[GitHub] [arrow] arjunsr1 closed issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 closed issue #13396: Is the Arrow::Table.merge function in a working state?
URL: https://github.com/apache/arrow/issues/13396


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1163318378

   @kou - thanks so much for getting back to me and introducing this functionality. This will help a lot for our application! Is there a new gem version now that I must update? Has this change gone live?


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1209701761

   Hello @kou - revisiting the same problem, I now noticed that 


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162803044

   With https://github.com/apache/arrow/pull/13418, we can write it as:
   
   ```ruby
   s3_existing_table.merge(
     "History Completed Time" => s3_existing_table["History Completed Time"].cast({type: :timestamp, unit: :second}),
     "History Created Time" => s3_existing_table["History Created Time"].cast({type: :timestamp, unit: :second}),
   )
   ```


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162355789

   Hi @kou - it seems like `s3_existing_table.concatenate([table])` should work fine. However, I'm getting an error that says `Invalid schema at index 1 was different`. The steps I'm taking are as follows:
   
   - Construct a table based on a CSV file (variable is called tempfile) using `table = Arrow::Table.load(tempfile.path)`
   - Pull a parquet file from S3 into a temporary file location (A parquet file that was generated by previously calling `Arrow::Table.save()` on an arrow table of the same type).
   - Call `s3_existing_table = Arrow::Table.load(_temporary_file_path_)`
   - Finally, try to merge tables by doing `consolidated_table = s3_existing_table.concatenate([table])`
   
   The only schema changes I can verify via diffchecker are as follows
   `
   History Completed Time: timestamp[ms]
   History Created Time: timestamp[ms]
   vs
   History Completed Time: timestamp[s]
   History Created Time: timestamp[s]
   `
   (The rest of the schema is omitted since it has no other changes.)
   Apologies if this goes beyond the scope of what you know, but do you think there could be some schema change issue that occurs when saving arrow tables as parquet files?
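
The manual schema comparison described above can also be done in code. The following is a minimal sketch that assumes each schema has already been reduced to a Hash of field name to type string; the helper `schema_diff` and the `"Status"` field are hypothetical and are not part of red-arrow.

```ruby
# Hypothetical helper: compare two schemas given as Hash of
# field name => type string. Returns only the fields whose types differ
# (a field missing on one side shows up with nil for that side).
def schema_diff(left, right)
  (left.keys | right.keys).each_with_object({}) do |name, diff|
    next if left[name] == right[name]
    diff[name] = [left[name], right[name]]
  end
end

# Schema of the table loaded from the Parquet file (as reported above).
parquet_schema = {
  "History Completed Time" => "timestamp[ms]",
  "History Created Time" => "timestamp[ms]",
  "Status" => "string", # hypothetical unchanged field
}

# Schema of the table loaded from the CSV file (as reported above).
csv_schema = {
  "History Completed Time" => "timestamp[s]",
  "History Created Time" => "timestamp[s]",
  "Status" => "string", # hypothetical unchanged field
}

mismatches = schema_diff(parquet_schema, csv_schema)
mismatches.each do |name, (left_type, right_type)|
  puts "#{name}: #{left_type} vs #{right_type}"
end
```

Listing only the differing fields makes it easier to see which columns need a cast before the tables can be concatenated.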


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1163632271

   This works, thanks! I'll close the issue then.


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162086123

   I tried to do something like this:
   
   `consolidated_table = Arrow::Table.concatenate(s3_existing_table, table)` and I got an error as follows:
   
   `Caused by NoMethodError: undefined method `concatenate' for Arrow::Table:Class`
   


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1159465522

   It's for merging "columns" not "rows".
   You can use `Arrow::Table#concatenate(table1, ...)` for appending rows.
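
The difference between merging columns and appending rows can be sketched with a toy model. This is not red-arrow code: a table is modeled as a plain Hash of column name to Array of values, and `merge_columns` and `concatenate_rows` are hypothetical helpers written only for this illustration.

```ruby
# Toy stand-in for a table: a Hash mapping column name => Array of values.
# This is NOT the red-arrow implementation; it only models the two operations.

# Column-wise merge: columns from `other` are added, and a column with the
# same name as an existing one replaces it.
def merge_columns(table, other)
  table.merge(other)
end

# Row-wise concatenation: every table must have the same column names;
# values are appended column by column.
def concatenate_rows(table, others)
  others.each do |other|
    unless other.keys == table.keys
      raise ArgumentError, "schema mismatch: #{other.keys} vs #{table.keys}"
    end
  end
  table.to_h do |name, values|
    [name, values + others.flat_map { |other| other[name] }]
  end
end

existing = {"id" => [1, 2, 3], "name" => ["a", "b", "c"]}
incoming = {"id" => [4], "name" => ["d"]}

merged = merge_columns(existing, incoming)       # same-named columns replaced
appended = concatenate_rows(existing, [incoming]) # rows appended
```

In this toy model, merging a table that has the same column names replaces every column, so only the incoming row remains. That matches the 1-row result reported in the issue, although the thread itself does not confirm that this is the exact mechanism inside red-arrow.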


[GitHub] [arrow] arjunsr1 commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
arjunsr1 commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162298341

   I'm getting this error now:  `Caused by TypeError: no implicit conversion of Arrow::Table into Array`


[GitHub] [arrow] kou commented on issue #13396: Is the Arrow::Table.merge function in a working state?

Posted by GitBox <gi...@apache.org>.
kou commented on issue #13396:
URL: https://github.com/apache/arrow/issues/13396#issuecomment-1162289915

   Could you try `s3_existing_table.concatenate(table)`?

