Posted to github@arrow.apache.org by "joosthooz (via GitHub)" <gi...@apache.org> on 2023/02/13 18:38:56 UTC

[GitHub] [arrow] joosthooz opened a new pull request, #34168: GH-33880: [C++] Improve I/O tracing

joosthooz opened a new pull request, #34168:
URL: https://github.com/apache/arrow/pull/34168

   ### Rationale for this change
   The structure of the OpenTelemetry traces has changed significantly recently; this PR aims to improve some things that were left out of scope at that time. Specifically, it improves tracing on the I/O side. We add spans for low-level I/O calls (e.g. `write`) and for portions of the Parquet writer and the DatasetWriter. The goal is to be able to distinguish work performed on the I/O thread pool from work on the regular thread pool, and to see how much time was actually spent on I/O versus on (de/en)coding and (de)compression.
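   For illustration, the kind of span being added looks like this (taken from the `FileRead` change in this PR's diff; just a sketch of the pattern):
   
   ```cpp
   // A span around a low-level I/O call, tagged with the file descriptor, so
   // time spent on the I/O thread pool can be told apart from the decoding and
   // compression work happening on the CPU thread pool.
   util::tracing::Span span;
   START_SPAN(span, "FileRead", {{"fd", fd}});
   ```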
   
   ### What changes are included in this PR?
   
   
   ### Are these changes tested?
   This is a very early WIP, so no tests have been added yet.
   
   ### Are there any user-facing changes?
   Arrow releases do not have telemetry enabled, so these changes should not affect normal users. However, users who are interested in the performance of Acero will likely want to use the telemetry; for them, the structure and details of the spans will change with this PR. The current telemetry documentation does not include any details about this yet.
   




[GitHub] [arrow] github-actions[bot] commented on pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1428464712

   * Closes: #33880




[GitHub] [arrow] joosthooz commented on pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "joosthooz (via GitHub)" <gi...@apache.org>.
joosthooz commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1576637148

   > What's the status of this patch? Can the patch be reviewed now?
   
   @mapleFU Sorry for the delayed response, and thanks for checking in. I didn't have time to work on this but I'll try to get this out of draft by the end of the week!




[GitHub] [arrow] westonpace commented on a diff in pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "westonpace (via GitHub)" <gi...@apache.org>.
westonpace commented on code in PR #34168:
URL: https://github.com/apache/arrow/pull/34168#discussion_r1270799408


##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -202,7 +207,18 @@ class DatasetWriterFileQueue {
   Status Push(std::shared_ptr<RecordBatch> batch) {
     uint64_t delta_staged = batch->num_rows();
     rows_currently_staged_ += delta_staged;
-    staged_batches_.push_back(std::move(batch));
+    {
+      util::tracing::Span span;
+      START_SPAN(span, "DatasetWriter::Push",
+                 {{"batch.size_rows", batch->num_rows()},
+                  {"rows_currently_staged", rows_currently_staged_},
+                  // staged_rows_count is updated at the end, after this push and possibly
+                  // multiple pops
+                  //        {"staged_rows_count", writer_state_->staged_rows_count},

Review Comment:
   ```suggestion
   ```



##########
cpp/src/arrow/csv/reader.cc:
##########
@@ -911,9 +964,25 @@ class StreamingReaderImpl : public ReaderMixin,
     auto rb_gen = MakeMappedGenerator(std::move(parsed_block_gen), std::move(decoder_op));
 
     auto self = shared_from_this();
-    return rb_gen().Then([self, rb_gen, max_readahead](const DecodedBlock& first_block) {
-      return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
+    auto init_finished = rb_gen().Then([self, rb_gen, max_readahead
+#ifdef ARROW_WITH_OPENTELEMETRY
+                                        ,
+                                        init_span = std::move(init_span),
+                                        read_span = std::move(read_span)
+#endif
+    ](const DecodedBlock& first_block) {
+      auto fut = self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
+#ifdef ARROW_WITH_OPENTELEMETRY
+      opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> raw_span =
+          ::arrow::internal::tracing::UnwrapSpan(read_span.details.get());
+      raw_span->SetAttribute("batch.size_bytes",
+                             util::TotalBufferSize(*first_block.record_batch));
+#endif

Review Comment:
   Why not `ATTRIBUTE_ON_CURRENT_SPAN`?
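   For example, something like this (a sketch; it assumes the read span is the active one at this point in the async callback, which may not hold):
   
   ```cpp
   // Set the attribute on whatever span is currently active instead of
   // unwrapping the raw OpenTelemetry span by hand.
   ATTRIBUTE_ON_CURRENT_SPAN("batch.size_bytes",
                             util::TotalBufferSize(*first_block.record_batch));
   ```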



##########
cpp/src/arrow/csv/reader.cc:
##########
@@ -456,6 +463,23 @@ class BlockDecodingOperator {
     auto bytes_parsed_or_skipped = block.bytes_parsed_or_skipped;
     auto decoded_arrays_fut = All(std::move(decoded_array_futs));
     auto state = state_;
+#ifdef ARROW_WITH_OPENTELEMETRY
+    opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> raw_span =
+        ::arrow::internal::tracing::UnwrapSpan(span.details.get());
+    return decoded_arrays_fut.Then(
+        [state, bytes_parsed_or_skipped, raw_span](
+            const std::vector<Result<std::shared_ptr<Array>>>& maybe_decoded_arrays)
+            -> Result<DecodedBlock> {
+          ARROW_ASSIGN_OR_RAISE(auto decoded_arrays,
+                                arrow::internal::UnwrapOrRaise(maybe_decoded_arrays));
+
+          ARROW_ASSIGN_OR_RAISE(auto batch,
+                                state->DecodedArraysToBatch(std::move(decoded_arrays)));
+          raw_span->SetAttribute("arrow.csv.output_batch_size_bytes",
+                                 util::TotalBufferSize(*batch));
+          return DecodedBlock{std::move(batch), bytes_parsed_or_skipped};
+        });

Review Comment:
   I think there is too much duplicated code in these alternate paths. Is there any way we can just pass `span` to the callback's capture list and use `ATTRIBUTE_ON_CURRENT_SPAN`?  Also, what happened to `END_SPAN_ON_FUTURE_COMPLETION`?
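   For illustration only, a rough sketch of the single-path version I have in mind (the span itself could then be ended with `END_SPAN_ON_FUTURE_COMPLETION` where it is created; this assumes the relevant span is active when the callback runs):
   
   ```cpp
   return decoded_arrays_fut.Then(
       [state, bytes_parsed_or_skipped](
           const std::vector<Result<std::shared_ptr<Array>>>& maybe_decoded_arrays)
           -> Result<DecodedBlock> {
         ARROW_ASSIGN_OR_RAISE(auto decoded_arrays,
                               arrow::internal::UnwrapOrRaise(maybe_decoded_arrays));
         ARROW_ASSIGN_OR_RAISE(auto batch,
                               state->DecodedArraysToBatch(std::move(decoded_arrays)));
         // Attribute goes on whichever span is active when the callback runs,
         // so no raw span needs to be unwrapped and captured.
         ATTRIBUTE_ON_CURRENT_SPAN("arrow.csv.output_batch_size_bytes",
                                   util::TotalBufferSize(*batch));
         return DecodedBlock{std::move(batch), bytes_parsed_or_skipped};
       });
   ```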



##########
cpp/src/arrow/util/async_util.cc:
##########
@@ -168,6 +146,25 @@ class AsyncTaskSchedulerImpl : public AsyncTaskScheduler {
     if (IsAborted()) {
       return false;
     }
+#ifdef ARROW_WITH_OPENTELEMETRY
+    // Wrap the task to propagate a parent tracing span to it
+    struct WrapperTask : public Task {
+      WrapperTask(
+          std::unique_ptr<Task> target,
+          opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span)
+          : target(std::move(target)), parent_span(std::move(parent_span)) {}
+      Result<Future<>> operator()() override {
+        auto scope = arrow::internal::tracing::GetTracer()->WithActiveSpan(parent_span);
+        return (*target)();
+      }
+      int cost() const override { return target->cost(); }
+      std::string_view name() const override { return target->name(); }
+      std::unique_ptr<Task> target;
+      opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span;
+    };
+    task = std::make_unique<WrapperTask>(
+        std::move(task), arrow::internal::tracing::GetTracer()->GetCurrentSpan());
+#endif

Review Comment:
   I'm not sure this works the same. `task` here is perhaps a bit of a misnomer: it returns a future which does the actual work. So you've wrapped the "submission" part of a task, which is normally not the very interesting part (usually it just schedules the task on an executor and returns the future).
   
   Before, I was starting the task here and then marking it finished when the task's future actually completed, so the span wrapped both the submission and the task itself.
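   Roughly what I mean, as a hand-wavy sketch (the span name, the attributes, and the exact behavior of `END_SPAN_ON_FUTURE_COMPLETION` here are assumptions on my part, not tested code):
   
   ```cpp
   Result<Future<>> operator()() override {
     // Start a span that covers the whole task, not just its submission.
     util::tracing::Span span;
     START_SPAN(span, "AsyncTaskScheduler::Task",
                {{"task.name", ::opentelemetry::nostd::string_view(
                                   target->name().data(), target->name().size())},
                 {"task.cost", target->cost()}});
     ARROW_ASSIGN_OR_RAISE(Future<> task_fut, (*target)());
     // Keep the span open until the task's future completes, so its duration
     // reflects the actual work rather than only the scheduling call.
     END_SPAN_ON_FUTURE_COMPLETION(span, task_fut);
     return task_fut;
   }
   ```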



##########
cpp/src/parquet/arrow/reader.cc:
##########
@@ -286,13 +287,21 @@ class FileReaderImpl : public FileReader {
     std::string phys_type =
         TypeToString(reader_->metadata()->schema()->Column(i)->physical_type());
     ::arrow::util::tracing::Span span;
-    START_SPAN(span, "parquet::arrow::read_column",
+    START_SPAN(span, "parquet::arrow::ReadColumn",
                {{"parquet.arrow.columnindex", i},
                 {"parquet.arrow.columnname", column_name},
                 {"parquet.arrow.physicaltype", phys_type},
-                {"parquet.arrow.records_to_read", records_to_read}});
-#endif
+                { "parquet.arrow.records_to_read",
+                  records_to_read }});
+
+    auto status = reader->NextBatch(records_to_read, out);
+
+    uint64_t size_bytes = ::arrow::util::TotalBufferSize(*out->get());
+    ATTRIBUTE_ON_CURRENT_SPAN("parquet.arrow.output_batch_size_bytes", size_bytes);

Review Comment:
   Why do we need to guard `ATTRIBUTE_ON_CURRENT_SPAN` with an ifdef?  Couldn't we have left the ifdef where it was and moved `::arrow::util::TotalBufferSize` inside the macro?
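   Something like this (a sketch; it assumes `ATTRIBUTE_ON_CURRENT_SPAN` expands to a no-op when `ARROW_WITH_OPENTELEMETRY` is not defined, so the size computation would simply disappear in non-tracing builds):
   
   ```cpp
   auto status = reader->NextBatch(records_to_read, out);
   // TotalBufferSize() is only evaluated in tracing-enabled builds because the
   // whole macro call compiles away otherwise.
   ATTRIBUTE_ON_CURRENT_SPAN("parquet.arrow.output_batch_size_bytes",
                             ::arrow::util::TotalBufferSize(*out->get()));
   ```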



##########
cpp/src/arrow/util/async_util.cc:
##########
@@ -306,7 +313,10 @@ class ThrottledAsyncTaskSchedulerImpl
     std::optional<Future<>> maybe_backoff = throttle_->TryAcquire(latched_cost);
     if (maybe_backoff) {
 #ifdef ARROW_WITH_OPENTELEMETRY
-      TraceTaskQueued(task.get(), span());
+      EVENT(span(), "Task submission throttled",
+            {{"task.name", ::opentelemetry::nostd::string_view(task->name().data(),
+                                                               task->name().size())},
+             {"task.cost", task->cost()}});

Review Comment:
   `span()` is the scheduler's span.  This puts an event on the scheduler's span that submission was throttled.  I don't think that's terribly interesting.
   
   What I was aiming for with `TraceTaskQueued` was to actually start the task's span.  Then, later, in `TraceTaskSubmitted`, it would register a "task submitted" event.
   
   So a throttled task (that is queued for 1 time unit and takes 2 time units to run) would look something like this...
   
   ```
   T+0 Task Span Begins
   T+1 Task Submitted
   T+3 Task Ends
   ```
   
   A non-throttled task would look something like:
   
   ```
   T+0 Task Begins
   T+2 Task Ends
   ```



##########
cpp/src/arrow/util/async_util.cc:
##########
@@ -168,6 +146,25 @@ class AsyncTaskSchedulerImpl : public AsyncTaskScheduler {
     if (IsAborted()) {
       return false;
     }
+#ifdef ARROW_WITH_OPENTELEMETRY
+    // Wrap the task to propagate a parent tracing span to it
+    struct WrapperTask : public Task {
+      WrapperTask(
+          std::unique_ptr<Task> target,
+          opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span)
+          : target(std::move(target)), parent_span(std::move(parent_span)) {}
+      Result<Future<>> operator()() override {
+        auto scope = arrow::internal::tracing::GetTracer()->WithActiveSpan(parent_span);
+        return (*target)();
+      }
+      int cost() const override { return target->cost(); }
+      std::string_view name() const override { return target->name(); }
+      std::unique_ptr<Task> target;
+      opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span;
+    };
+    task = std::make_unique<WrapperTask>(
+        std::move(task), arrow::internal::tracing::GetTracer()->GetCurrentSpan());

Review Comment:
   Note that I was intentionally not using `GetCurrentSpan` as the parent.  We are submitting a new task here.  I would like that new task to appear as its own span and not as a child of the thing that submitted it.  For example, in Acero, in most plans, the source node is the one that schedules all the tasks.  If we use `GetCurrentSpan` here then it will look like all the work being done is a part of the source node.



##########
docs/source/cpp/opentelemetry.rst:
##########
@@ -85,3 +85,88 @@ at http://localhost:16686. Note that unlike with other methods of exporting
 traces, no output will be made to stdout/stderr. However, if you tail your
 Docker container logs, you should see output when traces are received by the
 all-in-one container.
+
+Note that the volume of spans produced by Acero can quickly become overwhelming
+for many tracing frameworks. Several spans are produced per input file, per
+input batch, per internal chunk of data (called a morsel, consisting of 128k
+rows by default), and per output file (possibly also subdivided by column).
+In practice, this means that Acero produces 10-20 spans for each MB of data it
+processes. Choose a suitably sized dataset that strikes a balance between being
+representative of the real-world workload and not being too large to inspect
+with (or even ingest into!) a span visualizer such as Jaeger.
+
+Additional background on tracing
+--------------------------------
+Traces produced by Acero are conceptually similar to the information produced
+by profiling tools, but they are not the same.
+For example, the spans produced by Acero do not necessarily follow the
+structure of the code, unlike the call stacks and flame graphs produced by
+profiling.
+The spans aim to highlight:
+
+- code sections that perform significant work on the CPU
+- code sections that perform I/O operations (reading from or writing to disk)
+- the way blocks of data flow through the execution graph
+- the way data is being reorganized (e.g. a file being split into blocks)
+
+Each span instance can have various attributes added to it when it is created.
+This allows us to capture the exact size of each block of data and the amount
+of time each node in the execution graph has spent on it.
+
+Span hierarchy
+----------------------
+Traces are organized in a hierarchical fashion, where each span except the root
+span has a parent and can have any number of children.
+If a span has a child span active during its lifetime, this usually means that
+the parent span is not actually in control of the CPU. Thus, calculating the
+total CPU time is not as simple as adding up all of the span durations; only
+the time during which a span has no active children (often referred to as the
+"self-time") should count.
+However, Acero is a multi-threaded engine, so it is likely that multiple spans
+are in fact performing work on a CPU at any given time!
+
+To achieve this multi-threaded behavior, many sections of code are executed
+through a task scheduling mechanism. When these tasks are scheduled, they can
+start execution immediately or at some time in the future.
+Often, the active span represents the lifetime of some resource (such as a
+scanner, or a particular batch of data) and functions as the parent of a set of
+spans where the actual compute happens.
+Care must be taken when aggregating the durations of these spans.
+
+Structure of Acero traces
+-------------------------
+Acero traces are structured to allow following pieces of data as they flow
+through the graph. Each node's function (a kernel) is represented as a child
+span of the preceding node.
+Acero uses "morsel-driven parallelism", where batches of data called "morsels"
+flow through the graph.
+The morsels are produced by, for example, a DatasetScanner.
+First, the DatasetScanner reads files (called Fragments) into Batches.
+Depending on the size of the Fragments, it will produce several Batches per
+Fragment.
+Then, it may slice the Batches so they conform to the maximum size of a morsel.
+Each morsel has a top-level span called ProcessMorsel.
+Currently, the DatasetScanner cannot connect its output to the ProcessMorsel
+spans due to the asynchronous structure of the code.
+The dataset writer gathers batches of data in its staging area and issues a
+write operation once it has enough rows.
+This is represented by the DatasetWriter::Push and DatasetWriter::Pop spans,
+which also carry the current fill level of the staging area.
+This means that some morsels will not trigger a write; only when a morsel
+causes the staging area to exceed its threshold is a DatasetWriter::Pop
+triggered that performs the actual write operation.

Review Comment:
   Acero has a task scheduler.  I've had the most luck when I structured spans in this way:
   
    * There is one span for the plan itself that starts when `StartProducing` is called and ends when the plan finishes.
    * Within that span there is a span for the scheduler. It should have more or less the same lifetime (these top two spans can be combined if desired).
    * As a child of that, each thread task has its own span.
    * Various operations (filtering / projection / etc.) are children of their thread task and may be nested as needed (e.g. a filtering span may have child spans for the expression evaluation).
   
   This is sort of what I had in `async_util.cc`, but I think these changes have altered that (see the rough sketch below).
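   Roughly like this (an illustrative sketch of the hierarchy, not exact span names):
   
   ```
   Plan (StartProducing -> plan finished)
     Scheduler (more or less the same lifetime as the plan)
       Thread task
         Filter
           Expression evaluation
         Project
       Thread task
         ...
   ```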



##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -202,7 +207,18 @@ class DatasetWriterFileQueue {
   Status Push(std::shared_ptr<RecordBatch> batch) {
     uint64_t delta_staged = batch->num_rows();
     rows_currently_staged_ += delta_staged;
-    staged_batches_.push_back(std::move(batch));
+    {
+      util::tracing::Span span;
+      START_SPAN(span, "DatasetWriter::Push",
+                 {{"batch.size_rows", batch->num_rows()},
+                  {"rows_currently_staged", rows_currently_staged_},
+                  // staged_rows_count is updated at the end, after this push and possibly
+                  // multiple pops

Review Comment:
   I'm not sure I understand this comment



##########
cpp/src/arrow/util/async_util.cc:
##########
@@ -294,6 +282,25 @@ class ThrottledAsyncTaskSchedulerImpl
   }
 
   bool AddTask(std::unique_ptr<Task> task) override {
+#ifdef ARROW_WITH_OPENTELEMETRY
+    // Wrap the task to propagate a parent tracing span to it
+    struct WrapperTask : public Task {
+      WrapperTask(
+          std::unique_ptr<Task> target,
+          opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span)
+          : target(std::move(target)), parent_span(std::move(parent_span)) {}
+      Result<Future<>> operator()() override {
+        auto scope = arrow::internal::tracing::GetTracer()->WithActiveSpan(parent_span);
+        return (*target)();
+      }
+      int cost() const override { return target->cost(); }
+      std::string_view name() const override { return target->name(); }
+      std::unique_ptr<Task> target;
+      opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span;
+    };
+    task = std::make_unique<WrapperTask>(
+        std::move(task), arrow::internal::tracing::GetTracer()->GetCurrentSpan());
+#endif

Review Comment:
   This is exactly the same `WrapperTask` implementation as above.  Can we remove the duplication?
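   For example, a sketch of factoring it out (`WrapWithParentSpan` is a hypothetical helper name; it assumes `Task` here is the nested `AsyncTaskScheduler::Task` type):
   
   ```cpp
   #ifdef ARROW_WITH_OPENTELEMETRY
   namespace {
   // Shared by AsyncTaskSchedulerImpl and ThrottledAsyncTaskSchedulerImpl so the
   // wrapper is only written once.
   std::unique_ptr<AsyncTaskScheduler::Task> WrapWithParentSpan(
       std::unique_ptr<AsyncTaskScheduler::Task> target) {
     struct WrapperTask : public AsyncTaskScheduler::Task {
       WrapperTask(
           std::unique_ptr<Task> target,
           opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span)
           : target(std::move(target)), parent_span(std::move(parent_span)) {}
       Result<Future<>> operator()() override {
         auto scope = arrow::internal::tracing::GetTracer()->WithActiveSpan(parent_span);
         return (*target)();
       }
       int cost() const override { return target->cost(); }
       std::string_view name() const override { return target->name(); }
       std::unique_ptr<Task> target;
       opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> parent_span;
     };
     return std::make_unique<WrapperTask>(
         std::move(target), arrow::internal::tracing::GetTracer()->GetCurrentSpan());
   }
   }  // namespace
   #endif
   ```
   
   Then both `AddTask` overrides could simply call `task = WrapWithParentSpan(std::move(task));`.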



##########
cpp/src/arrow/csv/reader.cc:
##########
@@ -911,9 +964,25 @@ class StreamingReaderImpl : public ReaderMixin,
     auto rb_gen = MakeMappedGenerator(std::move(parsed_block_gen), std::move(decoder_op));
 
     auto self = shared_from_this();
-    return rb_gen().Then([self, rb_gen, max_readahead](const DecodedBlock& first_block) {
-      return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
+    auto init_finished = rb_gen().Then([self, rb_gen, max_readahead
+#ifdef ARROW_WITH_OPENTELEMETRY
+                                        ,
+                                        init_span = std::move(init_span),
+                                        read_span = std::move(read_span)
+#endif

Review Comment:
   Why is this `ifdef` block needed? If tracing is disabled then `init_span` and `read_span` each just hold an empty unique_ptr, so copying them around should be trivial.
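   For example (a sketch; the attribute handling inside the callback is elided):
   
   ```cpp
   auto init_finished = rb_gen().Then(
       [self, rb_gen, max_readahead, init_span = std::move(init_span),
        read_span = std::move(read_span)](const DecodedBlock& first_block) {
         // ... set span attributes / end spans as in the current diff ...
         return self->InitFromBlock(first_block, std::move(rb_gen), max_readahead, 0);
       });
   ```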



##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -611,12 +639,14 @@ class DatasetWriter::DatasetWriterImpl {
       backpressure =
           writer_state_.rows_in_flight_throttle.Acquire(next_chunk->num_rows());
       if (!backpressure.is_finished()) {
+        EVENT(scheduler_->span(), "DatasetWriter::Backpressure::TooManyRowsQueued");

Review Comment:
   Why both `EVENT` and `EVENT_ON_CURRENT_SPAN`?



##########
cpp/src/arrow/dataset/dataset_writer.cc:
##########
@@ -233,6 +249,18 @@ class DatasetWriterFileQueue {
     return DeferNotOk(options_.filesystem->io_context().executor()->Submit(
         [self = this, batch = std::move(next)]() {
           int64_t rows_to_release = batch->num_rows();
+#ifdef ARROW_WITH_OPENTELEMETRY
+          uint64_t size_bytes = util::TotalBufferSize(*batch);
+          uint64_t num_buffers = 0;
+          for (auto column : batch->columns()) {
+            num_buffers += column->data()->buffers.size();
+          }
+          util::tracing::Span span;
+          START_SPAN(span, "DatasetWriter::WriteNext",
+                     {{"threadpool", "IO"},
+                      {"batch.size_bytes", size_bytes},
+                      {"batch.num_buffers", num_buffers}});

Review Comment:
   Why do I need to know `num_buffers`?



##########
cpp/src/arrow/util/io_util.cc:
##########
@@ -1628,6 +1629,8 @@ Result<int64_t> FileRead(int fd, uint8_t* buffer, int64_t nbytes) {
   HANDLE handle = reinterpret_cast<HANDLE>(_get_osfhandle(fd));
 #endif
   int64_t total_bytes_read = 0;
+  util::tracing::Span span;
+  START_SPAN(span, "FileRead", {{"fd", fd}});

Review Comment:
   This is very deep in the I/O & filesystem hierarchy. Maybe it would be better to put these spans in `LocalFileSystem` instead?
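   For example, something at this level (a sketch only; the exact method and placement would need checking):
   
   ```cpp
   // Attach the span at the filesystem layer, e.g. near the start of a
   // LocalFileSystem method, rather than inside the low-level FileRead() helper.
   util::tracing::Span span;
   START_SPAN(span, "LocalFileSystem::OpenInputStream", {{"path", path}});
   ```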





[GitHub] [arrow] joosthooz commented on pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "joosthooz (via GitHub)" <gi...@apache.org>.
joosthooz commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1638313332

   @mapleFU I think this is ready for some feedback. This PR needs some additional context: I've focused mostly on the dataset writers and readers, and I think some of the "normal" readers/writers are still not instrumented with spans; I think we should leave those out of scope. As noted in the description above, this PR is a follow-up to a previous PR that changed the structure of traces. Before, there were top-level spans for each graph node, and each item of data would have a child span underneath each of those.
   Here, the goal is to use spans' parent-child relation to show how morsels flow through the graph:
   ![image](https://github.com/apache/arrow/assets/1442581/ae897152-7276-4223-92c7-0d19bae78fea)
   These morsels are produced by the readers:
   ![image](https://github.com/apache/arrow/assets/1442581/384ca71b-e733-40c9-93d1-16aca3afd743)
   But because of how the code is written, I have not been able to connect these together. In an ideal world, there would be a connection between ReadNextAsync -> MakeChunkedBatchGenerator -> ProcessMorsel, but that would require more invasive code changes (passing a span context along with an ExecBatch, for example).
   ![image](https://github.com/apache/arrow/assets/1442581/891ba9cd-c0b2-4536-9015-bfc23c5b1881)
   What I'd still like to do is somehow add an attribute to certain spans whose durations can be aggregated into some sensible total (such as CPU time spent on the read side, on each kernel, and on the write side). I've also been working on a notebook that visualizes a trace as a per-thread activity overview; it would be nice if we could add something like that, but I'm not sure how.
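   For illustration, the kind of attribute I mean (a hypothetical sketch; the attribute name is made up, and it assumes the `GetThreadId()` helper from `arrow/util/io_util.h` is usable here):
   
   ```cpp
   // Tag each compute span with the executing thread so a post-processing step
   // (e.g. the notebook) can sum up per-thread self-time.
   util::tracing::Span span;
   START_SPAN(span, "ProcessMorsel",
              {{"thread.id", ::arrow::internal::GetThreadId()}});
   ```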
   Looking forward to any feedback
   




[GitHub] [arrow] mapleFU commented on pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "mapleFU (via GitHub)" <gi...@apache.org>.
mapleFU commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1500873638

   What's the status of this patch? Can the patch be reviewed now?




[GitHub] [arrow] github-actions[bot] commented on pull request #34168: GH-33880: [C++] Improve I/O tracing

Posted by "github-actions[bot] (via GitHub)" <gi...@apache.org>.
github-actions[bot] commented on PR #34168:
URL: https://github.com/apache/arrow/pull/34168#issuecomment-1428464773

   :warning: GitHub issue #33880 **has been automatically assigned in GitHub** to PR creator.

