Posted to github@arrow.apache.org by GitBox <gi...@apache.org> on 2021/12/17 21:10:46 UTC

[GitHub] [arrow] westonpace commented on a change in pull request #11964: ARROW-15067: [C++] Add tracing spans to the scanner

westonpace commented on a change in pull request #11964:
URL: https://github.com/apache/arrow/pull/11964#discussion_r771693311



##########
File path: cpp/src/arrow/util/tracing_internal.h
##########
@@ -97,6 +98,57 @@ AsyncGenerator<T> WrapAsyncGenerator(AsyncGenerator<T> wrapped,
     return fut;
   };
 }
+
+/// \brief Start a new span for each invocation of a generator.
+///
+/// The parent span of the new span will be the currently active span
+/// (if any) as of when WrapAsyncGenerator was itself called.
+template <typename T>
+AsyncGenerator<T> WrapAsyncGenerator(AsyncGenerator<T> wrapped,
+                                     const std::string& span_name) {
+  opentelemetry::trace::StartSpanOptions options;
+  options.parent = GetTracer()->GetCurrentSpan()->GetContext();
+  return WrapAsyncGenerator(std::move(wrapped), std::move(options), span_name);
+}
+
+/// \brief End the given span when the given async generator ends.
+///
+/// The span will be made the active span each time the generator is called.
+template <typename T>
+AsyncGenerator<T> TieSpanToAsyncGenerator(
+    AsyncGenerator<T> wrapped,
+    opentelemetry::nostd::shared_ptr<opentelemetry::trace::Span> span) {
+  return [=]() mutable -> Future<T> {
+    auto scope = GetTracer()->WithActiveSpan(span);
+    auto fut = wrapped();
+    fut.AddCallback([span](const Result<T>& result) {
+      if (!result.ok() || IsIterationEnd(*result)) {
+        MarkSpan(result.status(), span.get());
+        span->End();
+      }
+    });
+    return fut;

Review comment:
       Maybe use `Then`? The timing can't really be guaranteed otherwise: consuming callbacks could run before this callback does. But I suppose that isn't too critical. I don't have enough headspace around spans yet to know whether we need to worry about the timing (e.g. will the consumer start a new active span, and then this callback end up ending the wrong one?)

##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -171,11 +177,20 @@ static inline Future<std::shared_ptr<csv::StreamingReader>> OpenReaderAsync(
       }));
   return reader_fut.Then(
       // Adds the filename to the error
-      [](const std::shared_ptr<csv::StreamingReader>& reader)
-          -> Result<std::shared_ptr<csv::StreamingReader>> { return reader; },
-      [source](const Status& err) -> Result<std::shared_ptr<csv::StreamingReader>> {
-        return err.WithMessage("Could not open CSV input source '", source.path(),
-                               "': ", err);
+      [=](const std::shared_ptr<csv::StreamingReader>& reader)
+          -> Result<std::shared_ptr<csv::StreamingReader>> {
+#ifdef ARROW_WITH_OPENTELEMETRY
+        span->SetStatus(opentelemetry::trace::StatusCode::kOk);

Review comment:
       This seems peculiar.  Wouldn't OT assume a span that ended without error was ok?

##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -276,7 +291,12 @@ Result<RecordBatchGenerator> CsvFileFormat::ScanBatchesAsync(
   auto source = file->source();
   auto reader_fut =
       OpenReaderAsync(source, *this, scan_options, ::arrow::internal::GetCpuThreadPool());
-  return GeneratorFromReader(std::move(reader_fut), scan_options->batch_size);
+  auto generator = GeneratorFromReader(std::move(reader_fut), scan_options->batch_size);
+#ifdef ARROW_WITH_OPENTELEMETRY
+  generator = arrow::internal::tracing::WrapAsyncGenerator(
+      std::move(generator), "arrow::dataset::CsvFileFormat::ScanBatchesAsync::Next");

Review comment:
       Having both `FileFormat::ScanBatchesAsync::Next` and `CsvFileFormat::ScanBatchesAsync::Next` seems a little redundant. I wouldn't expect there to be much difference between the two. Am I missing something?

##########
File path: cpp/src/arrow/dataset/file_csv.cc
##########
@@ -148,9 +149,14 @@ static inline Result<csv::ReadOptions> GetReadOptions(
 static inline Future<std::shared_ptr<csv::StreamingReader>> OpenReaderAsync(
     const FileSource& source, const CsvFileFormat& format,
     const std::shared_ptr<ScanOptions>& scan_options, Executor* cpu_executor) {
+#ifdef ARROW_WITH_OPENTELEMETRY
+  auto tracer = arrow::internal::tracing::GetTracer();
+  auto span = tracer->StartSpan("arrow::dataset::CsvFileFormat::OpenReaderAsync");
+#endif

Review comment:
       Could we push this `ifdef` into `StartSpan` by returning a dummy span object with no-op methods?



