Posted to commits@arrow.apache.org by "prasburst (via GitHub)" <gi...@apache.org> on 2024/02/11 01:15:23 UTC

[PR] Update powered_by.md [arrow-site]

prasburst opened a new pull request, #474:
URL: https://github.com/apache/arrow-site/pull/474

   Included details about iceburst.io


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscribe@arrow.apache.org

For queries about this service, please contact Infrastructure at:
users@infra.apache.org


Re: [PR] Add iceburst to powered by list [arrow-site]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on code in PR #474:
URL: https://github.com/apache/arrow-site/pull/474#discussion_r1485483982


##########
powered_by.md:
##########
@@ -129,6 +129,10 @@ short description of your use case.
   natural language processing, and tabular tasks. Dataset objects are wrappers around 
   Arrow Tables and memory-mapped from disk to support out-of-core parallel processing 
   for machine learning workflows.
+* **[iceburst][53]:** A real-time data lake for monitoring and security built 
+  directly on top of Amazon S3. Our approach is simple: ingest the OpenTelemetry data in an S3 bucket as
+  Parquet files in Iceberg table format and query them using DuckDB with milliseond retrieval and zero egress cost.
+  Parquet is converted to Arrow format in-memory enhancing both speed and efficiency.

Review Comment:
   Is this done by DuckDB or iceburst? If you mean that DuckDB does it, it may be wrong. I think that DuckDB doesn't use Apache Arrow as its internal data format.





Re: [PR] Add iceburst to powered by list [arrow-site]

Posted by "prasburst (via GitHub)" <gi...@apache.org>.
prasburst commented on PR #474:
URL: https://github.com/apache/arrow-site/pull/474#issuecomment-1937443197

   Hi,
   
   This is done by iceburst, and it is one of our core value propositions.
   
   Hope this clarifies the question.
   
   Let me know if I can help with any additional information.
   
   Thanks and regards,
   Prasanna.
   
   




Re: [PR] Add iceburst to powered by list [arrow-site]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou commented on PR #474:
URL: https://github.com/apache/arrow-site/pull/474#issuecomment-1937446680

   Does iceburst use DuckDB's Arrow integration feature https://duckdb.org/2021/12/03/duck-arrow.html ?




Re: [PR] Add iceburst to powered by list [arrow-site]

Posted by "prasburst (via GitHub)" <gi...@apache.org>.
prasburst commented on PR #474:
URL: https://github.com/apache/arrow-site/pull/474#issuecomment-1937474600

   Yes, the zero-copy integration makes a lot of our work easier.
   
   We export query results to an Arrow table using the `arrow` function. In some cases, especially for aggregation queries made through DuckDB's relational API, we use the `to_arrow_table` function to export the results, and we keep everything in Arrow format in memory.
   
   Here's a reference to Arrow export: https://duckdb.org/docs/guides/python/export_arrow




Re: [PR] Add iceburst to powered by list [arrow-site]

Posted by "kou (via GitHub)" <gi...@apache.org>.
kou merged PR #474:
URL: https://github.com/apache/arrow-site/pull/474

