[Bug]: Span dispatcher does not handle WorkflowHandler.pending correctly. #16283

Open
B-Step62 opened this issue Sep 29, 2024 · 1 comment · May be fixed by #16290
Labels
bug Something isn't working triage Issue needs to be triaged/prioritized

Comments

@B-Step62
Contributor

Bug Description

Hi team, I am an MLflow maintainer and found an issue in the instrumentation mechanism for the new LlamaIndex workflow. The issue is not specific to MLflow; it also affects other observability providers such as LlamaTrace.

MLflow Tracing implements the new instrumentation mechanism (span + event handlers), and it worked perfectly for tracing LlamaIndex workflows until this change was introduced in 0.11.10. Since that async context was introduced, the span dispatcher invokes the prepare_to_exit_span handler with a still-pending WorkflowHandler. This is problematic because:

  1. Users cannot see the actual result as the span/trace output, only a pending future object.
  2. Users cannot measure the latency of the actual operation; only the Future creation time is recorded.
  3. The parent span (run) exits before the child task is scheduled, so the span handler's self.open_spans attribute is already empty when the child span starts; as a result, the span handler cannot determine the parent-child relationship.
[Screenshot 2024-09-29 at 16 31 24]

The dispatcher should await the result before calling prepare_to_exit_span. I'm not very familiar with the dispatching logic for workflows, but the issue appears to be that the dispatcher.span decorator invokes the span handler immediately with the Future object.
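As a rough sketch of what I mean (this is not the actual Dispatcher code, and the span_exit call signature below is assumed from the span-handler hooks shown in this issue): when the decorated call returns a still-pending WorkflowHandler, the wrapper could defer the span exit until the future resolves, since WorkflowHandler is an asyncio.Future subclass:

from llama_index.core.workflow.handler import WorkflowHandler


def exit_span_with_result(dispatcher, span_id, bound_args, instance, result):
    # Hypothetical helper: report the span exit with the real result.
    if isinstance(result, WorkflowHandler) and not result.done():
        # Defer the exit until the workflow actually finishes, so the span
        # output and duration reflect the real operation.
        result.add_done_callback(
            lambda fut: dispatcher.span_exit(
                id_=span_id,
                bound_args=bound_args,
                instance=instance,
                result=None if fut.cancelled() or fut.exception() else fut.result(),
            )
        )
    else:
        # Already-resolved results exit the span immediately, as today.
        dispatcher.span_exit(
            id_=span_id, bound_args=bound_args, instance=instance, result=result
        )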

I also validated that the same issue happens in LlamaTrace:

[Screenshot 2024-09-29 at 16 46 05]

(The parent-child relationship is maintained there, probably because of some difference in implementation detail, but you can see that the output and span duration are off for the root span.)

Version

0.11.10 (or later)

Steps to Reproduce

Reproduce with a dummy span handler (no dependencies)

  1. Define a dummy handler
import inspect
from typing import Any, Dict, Optional
from llama_index.core.bridge.pydantic import Field
from llama_index.core.instrumentation.span.base import BaseSpan
from llama_index.core.instrumentation.span_handlers import BaseSpanHandler
from llama_index.core.workflow.handler import WorkflowHandler


class TestSpanHandler(BaseSpanHandler[BaseSpan]):
    def new_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        parent_span_id: Optional[str] = None,
        tags: Optional[Dict[str, Any]] = None,
        **kwargs: Any,
    ) -> Optional[BaseSpan]:
        """Create a span."""
        pass

    def prepare_to_exit_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        result: Optional[Any] = None,
        **kwargs: Any,
    ) -> Any:
        """Logic for preparing to exit a span."""
        if isinstance(result, WorkflowHandler) and not result.is_done():
            print(f"prepare_to_exit_span called with pending WorkflowHandler")


    def prepare_to_drop_span(
        self,
        id_: str,
        bound_args: inspect.BoundArguments,
        instance: Optional[Any] = None,
        err: Optional[BaseException] = None,
        **kwargs: Any,
    ) -> Any:
        """Logic for preparing to drop a span."""
        pass
  2. Define a simple workflow
from llama_index.core.workflow import StartEvent, StopEvent, Workflow, step
import time

class MyWorkflow(Workflow):
    @step
    async def my_step(self, ev: StartEvent) -> StopEvent:
        time.sleep(3)
        return StopEvent(result="Hi, world!")

w = MyWorkflow(timeout=10, verbose=False)
  3. Run the workflow with the span handler
import llama_index.core.instrumentation as instrument

dispatcher = instrument.get_dispatcher()
dispatcher.add_span_handler(TestSpanHandler())

result = await w.run()
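(Note: result = await w.run() uses top-level await, which works in a notebook or IPython. In a plain Python script you can drive the same repro with asyncio.run; the main wrapper below is just illustrative.)

import asyncio

async def main():
    result = await w.run()
    print(result)

asyncio.run(main())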

Output:

prepare_to_exit_span called with pending WorkflowHandler

Reproduce with MLflow

  1. Install the dev branch: pip install git+https://github.com/B-Step62/mlflow.git@llama-workflow-trace (installing the dev branch is only needed to pick up a fix for recent async handling; the stable release works with LlamaIndex workflows earlier than 0.11.10)
  2. Enable MLflow tracing for LlamaIndex
import mlflow

mlflow.llama_index.autolog()
  3. Run the same workflow as above.

Relevant Logs/Tracebacks

No response

B-Step62 added the bug and triage labels on Sep 29, 2024

dosubot bot commented Sep 29, 2024

To address the issue where the span dispatcher in LlamaIndex version 0.11.10 invokes the prepare_to_exit_span handler with a pending WorkflowHandler, you need to ensure that the prepare_to_exit_span method correctly handles the state of the span before it is exited. This can be done by adding a check to see if the span is in a valid state to be exited.

Here is a possible solution:

  1. Modify the prepare_to_exit_span method to include a check for the span's state.
  2. Ensure that the span is only removed from open_spans if it is in a valid state to be exited.

Here is the updated code:

def span_exit(
    self,
    id_: str,
    bound_args: inspect.BoundArguments,
    instance: Optional[Any] = None,
    result: Optional[Any] = None,
    **kwargs: Any,
) -> None:
    """Logic for exiting a span."""
    span = self.prepare_to_exit_span(
        id_=id_, bound_args=bound_args, instance=instance, result=result
    )
    if span:
        with self.lock:
            if id_ in self.open_spans:
                del self.open_spans[id_]
            else:
                # Handle the case where the span is not in open_spans.
                # This could involve logging an error or taking other appropriate action.
                pass

In the prepare_to_exit_span method, you should ensure that the span is in a valid state to be exited. This might involve checking certain attributes of the span or ensuring that it has completed its intended operations.
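One way to express that check (a sketch only, not the actual LlamaIndex implementation; it mirrors the reporter's dummy handler above) is to treat an unresolved WorkflowHandler result as not ready to exit:

from llama_index.core.workflow.handler import WorkflowHandler

def result_is_ready(result) -> bool:
    # WorkflowHandler subclasses asyncio.Future, so done() tells us whether
    # the workflow has actually produced its result yet.
    if isinstance(result, WorkflowHandler):
        return result.done()
    return True

prepare_to_exit_span could then skip (or defer) finalizing the span whenever result_is_ready(result) is False.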

Here is the relevant part of the prepare_to_exit_span method from the AgentOpsSpanHandler class:

def prepare_to_exit_span(
    self,
    id_: str,
    bound_args: inspect.BoundArguments,
    instance: Optional[Any] = None,
    result: Optional[Any] = None,
    **kwargs: Any,
) -> SimpleSpan:
    self._shared_handler_state.remove_span_id(id_)
    return super().prepare_to_exit_span(id_, bound_args, instance, result, **kwargs)

Additionally, you can refer to the test cases provided in the LlamaIndex core tests to ensure that your changes are correctly handling the span lifecycle. For example, the test_dispatcher_span_drop_args and test_dispatcher_async_span_drop_args_with_instance tests show how to mock and assert the behavior of span enter, drop, and exit methods.

By adding these checks and referring to the provided test cases, you can prevent the prepare_to_exit_span handler from being invoked with a pending WorkflowHandler, thereby avoiding incorrect span/trace outputs and latency measurements [1][2][3][4].

