
OpenAIAgent chat returns empty sourceNodes and metadata #1015

Open
irevived1 opened this issue Jul 3, 2024 · 6 comments

Comments

@irevived1
Contributor

Hello LlamaIndexTS,
I am currently using OpenAIAgent with a QueryEngineTool built from a custom VectorStoreIndex.
The agent responds with correct information from the query engine, but sourceNodes and metadata are empty.
I'm not sure if this is a bug or if I'm misusing the tool.

Thanks in advance,

import {
  OpenAI,
  OpenAIAgent,
  QueryEngineTool,
  VectorStoreIndex,
  storageContextFromDefaults,
} from "llamaindex";

  // Load the persisted vector store from disk.
  const menuContext = await storageContextFromDefaults({
    persistDir: "./menuDB",
  });

  const menuIndex = await VectorStoreIndex.fromDocuments([], {
    storageContext: menuContext,
  });
  const menuRetriever = await menuIndex.asRetriever();

  const menuRetrieverQueryEngine = await menuIndex.asQueryEngine({
    retriever: menuRetriever,
  });

////////////////////////////////

  const openaiLLM = new OpenAI({ model: "gpt-4o", temperature: 0 });

  const agent = new OpenAIAgent({
    systemPrompt: systemMessage,
    verbose: true,
    model: openaiLLM,
    tools: [
      // ..... more tools here
      new QueryEngineTool({
        queryEngine: menuRetrieverQueryEngine,
        metadata: {
          name: "menu_tool",
          description: `This tool can answer about items on the menu`,
        },
      })
    ]
  });


  const response = await agent.chat({ message: 'do you burgers?', verbose: true });

// response
EngineResponse {
  sourceNodes: undefined,
  metadata: {},
  message: {
    content: 'We have two burgers:\n' +
    ......

@marcusschiesser
Collaborator

Yes, the sourceNodes are not forwarded to the agent's response; you'll need a CallbackManager:

const callbackManager = new CallbackManager();

callbackManager.on("retrieve-end", (data) => {
  const { nodes, query } = data.detail.payload;
  // ... do something with nodes
});

const response = await Settings.withCallbackManager(callbackManager, () => {
  return agent.chat({ message: 'do you burgers?', verbose: true });
});

An agent could also have multiple QueryEngineTools, so which sourceNodes should be forwarded?

@logan-markewich I think we should deprecate sourceNodes and metadata
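To make the "which sourceNodes to forward?" question concrete, here is a minimal, library-free sketch (all names hypothetical, not the llamaindex API) of collecting retrieved nodes keyed by the tool that produced them, which is what a per-event payload makes possible:

```typescript
// Hypothetical sketch: track which tool produced which retrieved nodes.
// This does not use the llamaindex API.

interface RetrievedNode {
  id: string;
  text: string;
}

// Collect nodes per tool name, mirroring what per-tool event payloads allow.
class SourceNodeCollector {
  private byTool = new Map<string, RetrievedNode[]>();

  record(toolName: string, nodes: RetrievedNode[]): void {
    const existing = this.byTool.get(toolName) ?? [];
    this.byTool.set(toolName, existing.concat(nodes));
  }

  // Nodes for one tool -- unambiguous.
  forTool(toolName: string): RetrievedNode[] {
    return this.byTool.get(toolName) ?? [];
  }

  // Flat aggregate across all tools -- what a single sourceNodes
  // field on the response would have to be.
  all(): RetrievedNode[] {
    return [...this.byTool.values()].flat();
  }
}

const collector = new SourceNodeCollector();
collector.record("menu_tool", [{ id: "n1", text: "Classic burger" }]);
collector.record("drinks_tool", [{ id: "n2", text: "Cola" }]);

console.log(collector.forTool("menu_tool").length); // 1
console.log(collector.all().length); // 2
```

A single response-level field can only carry the aggregate (`all()`); the per-tool mapping is exactly the information it loses.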

@logan-markewich
Contributor

logan-markewich commented Jul 4, 2024

@marcusschiesser what would the alternative look like? This is a very widely used method for tracking down sources.

Take a look at Python to see how we bubble up source nodes in agents. We have both sources (for tool calls) and source_nodes (in case any tool call was a query engine returning a response object).

@irevived1
Contributor Author

Thanks for the response, everyone. We can take advantage of CallbackManager for now.
Is it possible to update this thread if you plan to change the usage in the future?

I've also noticed another issue, though it's not directly related to the source nodes:
the verbose flag doesn't really do anything. I tried both true and false and couldn't spot a difference.

@marcusschiesser
Collaborator

@logan-markewich

source_nodes (in case any tool calls were a query engine with a response object)

I see two problems with just having source_nodes in the response object:

  1. if we stream a response, in which chunk of the stream do we put the source_nodes?
  2. if it's an agent and we have multiple QueryEngineTools, to which tool do the source_nodes belong?

With the callbacks, we can solve both problems:

  1. we send the retrieve-end event with the source_nodes as payload. The user even gets the time of the retrieval that way
  2. similarly, we have an llm-tool-result event which contains the result of the QueryEngineTool
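A library-free sketch of the streaming problem (all names hypothetical, not the llamaindex API): with a side-channel callback, the stream stays a pure sequence of text chunks while the source nodes are delivered exactly once, outside the chunk sequence:

```typescript
// Hypothetical sketch: text chunks stream one by one, while source nodes
// are delivered once via a side-channel callback instead of being
// attached to any particular chunk. Not the llamaindex API.

interface SourceNode {
  id: string;
  text: string;
}

async function* streamAnswer(
  chunks: string[],
  nodes: SourceNode[],
  onRetrieveEnd: (nodes: SourceNode[]) => void,
): AsyncGenerator<string> {
  // Retrieval finishes before generation starts: fire the event once.
  onRetrieveEnd(nodes);
  for (const chunk of chunks) {
    yield chunk;
  }
}

async function main() {
  let received: SourceNode[] = [];
  const stream = streamAnswer(
    ["We have ", "two burgers."],
    [{ id: "n1", text: "Classic burger" }],
    (nodes) => { received = nodes; },
  );

  let answer = "";
  for await (const chunk of stream) {
    answer += chunk;
  }
  console.log(answer);          // "We have two burgers."
  console.log(received.length); // 1
}

main();
```

No chunk has to carry the nodes, which sidesteps the "which chunk?" question entirely.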

@logan-markewich
Contributor

  1. I guess in Python, the response object and the stream are two separate objects
  2. Technically this would be tracked under response.sources, which has each tool call. .source_nodes is an aggregate of the nodes the agent used

Callbacks or instrumentation are ok-ish, as long as you can (as you mentioned) trace the retrieved nodes back to a specific tool/query engine. Hooking into custom callbacks is also slightly less user-friendly, at least in Python land.
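A small sketch of the shape described above (hypothetical types, not the actual Python or LlamaIndexTS API): sources lists each tool call with its own nodes, and source_nodes is computed as the flat aggregate across them:

```typescript
// Hypothetical shape of the Python-style response object described above:
// `sources` lists each tool call, and the aggregate is derived from it.

interface SourceNode {
  id: string;
  text: string;
}

interface ToolOutput {
  toolName: string;
  rawOutput: string;
  sourceNodes: SourceNode[];
}

interface AgentResponse {
  message: string;
  sources: ToolOutput[];
}

// Aggregate nodes across every tool call -- the source_nodes view.
function sourceNodes(response: AgentResponse): SourceNode[] {
  return response.sources.flatMap((s) => s.sourceNodes);
}

const response: AgentResponse = {
  message: "We have two burgers.",
  sources: [
    {
      toolName: "menu_tool",
      rawOutput: "...",
      sourceNodes: [{ id: "n1", text: "Classic burger" }],
    },
    { toolName: "drinks_tool", rawOutput: "...", sourceNodes: [] },
  ],
};

console.log(sourceNodes(response).length); // 1
```

Because the aggregate is derived from per-tool entries, nothing is lost: callers who need provenance read `sources`, callers who just want the nodes read the aggregate.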

@marcusschiesser
Collaborator

I agree that callbacks are ok-ish.

source_nodes is an aggregate of the nodes the agent used

If using the aggregate is OK for each use case, we could also add it to LITS (that would also solve the original issue of this ticket).
