Tool calling (aka function calling)

Learn how to implement tool calling, manage the agentic loop, and integrate a human-in-the-loop with the Firebase AI Logic SDK.

While LLMs really are trained on practically the entire internet, they're not all-knowing. They know what was on the public internet as of their training day, but nothing newer. They don't know anything that's private to you or your organization. And even the things they do know are easily confused with other things they know.

For these scenarios and many others, we typically provide the LLM with one or more tools.

Tool definitions

A tool is a name, a description, and a JSON schema for the format of the input data used when the LLM "calls" that tool. For example, when prompted to "reduce the carbs in Grandma's All American Breakfast recipe," the LLM has no idea what Grandma's recipe is unless we provide a "lookupRecipe" tool that takes a query string for finding recipes.

Conceptually, a tool is something we give the LLM to call when it needs that data or service. The way an LLM calls a tool is to respond to the app's request with a specially formatted message that represents a "tool call." The tool-call message includes the name of the tool and the JSON arguments. The app handles the tool call and bundles the results into another LLM request, which the LLM then responds to.

This can go on for a while. An app can configure a model instance with any number of tools (although LLMs prefer small, targeted sets of tools with non-overlapping functionality). The LLM can bundle any number of tool calls in its responses, and it can take any number of tool results in a request. The LLM integrates multiple round trips of prompts and tool-call results via the stack of messages that forms the request/response pair history.

When it's done making tool calls, the LLM returns its final response, such as "Here's a version of Grandma's All American Breakfast recipe that's high in protein and low in carbs...".
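The back-and-forth described above is just a growing list of messages. Here's a minimal sketch of that message stack using illustrative map shapes (these are not the Firebase AI Logic SDK's actual types, and the lookupRecipe tool is the hypothetical one from the example above):

dart
// A sketch of the message history built up during tool calling.
// The shapes below are illustrative, not the SDK's real message types.
void main() {
  final history = <Map<String, Object>>[
    // 1. The app sends the user's prompt.
    {'role': 'user', 'text': "Reduce the carbs in Grandma's recipe"},
    // 2. The model responds with a tool call instead of text.
    {
      'role': 'model',
      'toolCall': {
        'name': 'lookupRecipe',
        'args': {'query': "Grandma's All American Breakfast"},
      },
    },
    // 3. The app executes the tool and sends the result back.
    {
      'role': 'user',
      'toolResult': {
        'name': 'lookupRecipe',
        'result': {
          'ingredients': ['eggs', 'bacon', 'toast'],
        },
      },
    },
    // 4. With the data in hand, the model produces its final text answer.
    {'role': 'model', 'text': "Here's a low-carb version of the recipe..."},
  ];

  // The conversation ends with a model message that has text and no tool call.
  final finalText = history.lastWhere(
    (m) => m['role'] == 'model' && m.containsKey('text'),
  )['text'];
  print(finalText);
}

The agentic loop later in this section implements this exchange for real against the SDK's chat API.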

Gemini functions

In the Firebase AI Logic SDK, tools are called "functions," but they mean the same thing. In the sample, the clue-solving model is configured with a function that looks up details about a word. If the LLM wants details about a word to help in the clue-solving process, calling the function gets data from the Free Dictionary API:

json
[
  {
    "word": "tool",
    "phonetic": "/tuːl/",
    "phonetics": [
      {
        "text": "/tuːl/",
        "audio": "https://api.dictionaryapi.dev/media/pronunciations/en/tool-uk.mp3",
        "sourceUrl": "https://commons.wikimedia.org/w/index.php?curid=94709459",
        "license": {
          "name": "BY-SA 4.0",
          "url": "https://creativecommons.org/licenses/by-sa/4.0"
        }
      }
    ],
    "meanings": [
      {
        "partOfSpeech": "noun",
        "definitions": [
          {
            "definition": "A mechanical device intended to make a task easier.",
            "synonyms": [],
            "antonyms": [],
            "example": "Hand me that tool, would you?   I don't have the right tools to start fiddling around with the engine."
          },
...

The app has a Dart function that performs the lookup:

dart
// Look up the metadata for a word in the dictionary API.
Future<Map<String, dynamic>> _getWordMetadataFromApi(String word) async {
  final url = Uri.parse(
    'https://api.dictionaryapi.dev/api/v2/entries/en/${Uri.encodeComponent(word)}',
  );

  final response = await http.get(url);
  return response.statusCode == 200
      ? {'result': jsonDecode(response.body)}
      : {'error': 'Could not find a definition for "$word".'};
}

The model is configured with the lookup function during initialization:

dart
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      FunctionDeclaration(
        'getWordMetadata',
        'Gets grammatical metadata for a word, like its part of speech. '
        'Best used to verify a candidate answer against a clue that implies a '
        'grammatical constraint.',
        parameters: {
          'word': Schema(SchemaType.string, description: 'The word to look up.'),
        },
      ),
    ]),
  ],
);

For reliability, it's also a good idea to list the tools in the system instructions:

dart
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `getWordMetadata`

You have a tool to get grammatical information about a word.

**When to use:**
- This tool is most helpful as a verification step after you have a likely answer.
- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
- **Good candidates for verification:**
  - Clues that seem to be verbs (e.g., "To run," "Waving").
  - Clues that are adverbs (e.g., "Happily," "Quickly").
  - Clues that specify a plural form.
- **Try to avoid using the tool for:**
  - Simple definitions (e.g., "A small dog").
  - Fill-in-the-blank clues (e.g., "___ and flow").
  - Proper nouns (e.g., "Capital of France").

**Function signature:**
```json
${jsonEncode(_getWordMetadataFunction.toJson())}
```
''';

Now when the app makes a request, the model has a tool available to use whenever it thinks that would be helpful. To support tool calls, we need to implement an agentic loop.

The agentic loop

LLMs are functionally stateless, which means that you must provide all of the necessary data with every request. For requests that consist of just a prompt and any files to send along, the Firebase AI Logic SDK exposes the generateContent method on your model instance.

Tool calling, however, requires the message history formed by the initial prompt plus the request/response pairs that make up the tool calls and tool results. To support this, Firebase AI Logic provides a "chat" object to gather the history. We use that to build an agentic loop that does the following:

  • Start a chat to hold the message history across multiple request/response pairs
  • Gather tool results for any tool calls the model provides
  • Bundle the tool results into a new request
  • Loop until the model provides a response without any tool calls
  • Return the text accumulated across all of the responses

Here's that algorithm expressed as an extension method on the GenerativeModel class so that we can call it just like we call generateContent:

dart
extension on GenerativeModel {
  Future<String> generateContentWithFunctions({
    required String prompt,
    required Future<Map<String, dynamic>> Function(FunctionCall) onFunctionCall,
  }) async {
    // Use a chat session to support multiple request/response pairs, which is
    // needed to support function calls.
    final chat = startChat();
    final buffer = StringBuffer();
    var response = await chat.sendMessage(Content.text(prompt));

    while (true) {
      // Append the response text to the buffer.
      buffer.write(response.text ?? '');

      // If no function calls were collected, we're done
      if (response.functionCalls.isEmpty) break;

      // Append a newline to separate responses.
      buffer.write('\n');

      // Execute all function calls
      final functionResponses = <FunctionResponse>[];
      for (final functionCall in response.functionCalls) {
        try {
          functionResponses.add(
            FunctionResponse(
              functionCall.name,
              await onFunctionCall(functionCall),
            ),
          );
        } catch (ex) {
          functionResponses.add(
            FunctionResponse(functionCall.name, {'error': ex.toString()}),
          );
        }
      }

      // Send the function results and get the next response.
      response = await chat.sendMessage(
        Content.functionResponses(functionResponses),
      );
    }

    return buffer.toString();
  }
}

This method takes a prompt and a callback for handling specific tool calls; the sample calls it to handle the word-lookup function:

dart
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    'getWordMetadata' => await _getWordMetadataFromApi(
      functionCall.args['word'] as String,
    ),
    _ => throw Exception('Unknown function call: ${functionCall.name}'),
  },
);

Structured output makes an LLM usable for programming, but it's tools that turn an LLM into an "agent" (see the Interaction Patterns section for more on that).

Structured output and tool calls

Combining structured output and tool calls makes for a powerful combination. In the sample, the clue solver has a tool to look up word details. It's also asked to return JSON that bundles a solution together with a confidence score, both of which are shown in the app's task list:

[Image: App task list showing crossword clues followed by bold answers and confidence scores in parentheses]

Unfortunately, as of the time of this writing, combining structured output and functions with the Firebase AI Logic SDK produces an exception:

Function calling with a response mime type: 'application/json' is unsupported

As a (hopefully temporary) workaround for this issue, the sample drops the structured output configuration and instead simulates structured output using a tool named returnResult:

dart
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...,
      FunctionDeclaration(
        'returnResult',
        'Returns the final result of the clue solving process.',
        parameters: {
          'answer': Schema(
            SchemaType.string,
            description: 'The answer to the clue.',
          ),
          'confidence': Schema(
            SchemaType.number,
            description: 'The confidence score in the answer from 0.0 to 1.0.',
          ),
        },
      ),
    ]),
  ],
);

The returnResult function is also mentioned in the system instructions:

dart
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `returnResult`

You have a tool to return the final result of the clue solving process.

**When to use:**
- Use this tool when you have a final answer and confidence score to return. You
must use this tool exactly once, and only once, to return the final result.

**Function signature:**
```json
${jsonEncode(_returnResultFunction.toJson())}
```
''';

When the model calls returnResult, the sample caches the result, which solveClue then looks up after the call to generateContentWithFunctions:

dart
// Buffer for the result of the clue solving process.
final _returnResult = <String, dynamic>{};

// Cache the return result of the clue solving process via a function call.
// This is how we get JSON responses from the model with functions, since the
// model cannot return JSON directly when tools are used.
Map<String, dynamic> _cacheReturnResult(Map<String, dynamic> returnResult) {
  assert(_returnResult.isEmpty);
  _returnResult.addAll(returnResult);
  return {'status': 'success'};
}

Future<ClueAnswer?> solveClue(Clue clue, int length, String pattern) async {
  // Clear the return result cache; this is where the result will be stored.
  _returnResult.clear();

  // Generate JSON response with functions and schema.
  await _clueSolverModel.generateContentWithFunctions(
    prompt: getSolverPrompt(clue, length, pattern),
    onFunctionCall: (functionCall) async => switch (functionCall.name) {
      'getWordMetadata' => ...,
      'returnResult' => _cacheReturnResult(functionCall.args),
      _ => throw Exception('Unknown function call: ${functionCall.name}'),
    },
  );

  // Use the structured output that the LLM has returned via the function call.
  assert(_returnResult.isNotEmpty);
  return ClueAnswer(
    answer: _returnResult['answer'] as String,
    confidence: (_returnResult['confidence'] as num).toDouble(),
  );
}

We have to work a little harder to get the combination of structured output and tool calls with Firebase AI Logic, but the results are worth it!

Human-in-the-loop

So far, we've seen tools used for gathering data and formatting output. We can also use them to bring a human into the mix.

For example, sometimes when the sample passes in the pattern that a solution should match (like "_R_Y"), the model wants to suggest an answer that doesn't fit that pattern (like "RENT"). A conflict like this is a great time to ask the user for help:
[Image: Crossword Companion app displaying a Conflict Detected dialog asking for user input to resolve a clue pattern]

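The conflict check itself is plain string logic. Here's a sketch of how a candidate answer can be tested against a letter pattern, where "_" matches any letter (matchesPattern is a hypothetical helper for illustration, not part of the sample's code):

dart
// Hypothetical helper: does a candidate answer fit a crossword letter
// pattern, where '_' matches any letter? Illustrative, not from the sample.
bool matchesPattern(String answer, String pattern) {
  if (answer.length != pattern.length) return false;
  for (var i = 0; i < pattern.length; i++) {
    if (pattern[i] != '_' && pattern[i] != answer[i]) return false;
  }
  return true;
}

void main() {
  print(matchesPattern('GRAY', '_R_Y')); // true: fits the pattern
  print(matchesPattern('RENT', '_R_Y')); // false: 'E' vs 'R' at index 1
}

When a proposed answer fails a check like this, the model (guided by its system instructions) calls the conflict-resolution tool instead of returning the answer directly.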
This is called bringing a "human into the loop," and it's another way for humans and LLMs to collaborate. Flutter and the Firebase AI Logic SDK make this easy. First, the sample defines a function and configures the model with it:

dart

// The new function to let the LLM resolve solution conflicts
static final _resolveConflictFunction = FunctionDeclaration(
  'resolveConflict',
  'Asks the user to resolve a conflict between the letter pattern and the '
  'proposed answer. Use this BEFORE calling returnResult if the answer you '
  'want to propose does not match the letter pattern.',
  parameters: {
    'proposedAnswer': Schema(
      SchemaType.string,
      description: 'The answer the LLM wants to suggest.',
    ),
    'pattern': Schema(
      SchemaType.string,
      description: 'The current letter pattern from the grid.',
    ),
    'clue': Schema(SchemaType.string, description: 'The clue text.'),
  },
);

// Pass the new tool to the model for solving clues.
final _clueSolverModel = FirebaseAI.googleAI().generativeModel(
  model: 'gemini-2.5-flash',
  systemInstruction: Content.text(clueSolverSystemInstruction),
  tools: [
    Tool.functionDeclarations([
      ...
      _resolveConflictFunction,
    ]),
  ],
);
// Let the LLM know that it has a new tool.
static String get clueSolverSystemInstruction =>
    '''
You are an expert crossword puzzle solver.

...

### Tool: `resolveConflict`

You have a tool to ask the user to resolve a conflict.

**When to use:**
- Use this tool **BEFORE** `returnResult` if your proposed answer conflicts with the provided letter pattern.
- For example, if the pattern is `_ R _ Y` and you want to suggest `RENT` (which fits the clue), there is a conflict at the second letter (`R` vs `E`). You should call `resolveConflict(proposedAnswer: "RENT", pattern: "_ R _ Y", clue: "...")`.
- The tool will return the user's decision (either your proposed answer or a new one). You should then use that result to call `returnResult`.

**Function signature:**
```json
${jsonEncode(_resolveConflictFunction.toJson())}
```
''';

Now when the model sees a conflict, it will call the tool:

dart
// Handle the LLM's request to resolve the conflict.
await _clueSolverModel.generateContentWithFunctions(
  prompt: getSolverPrompt(clue, length, pattern),
  onFunctionCall: (functionCall) async => switch (functionCall.name) {
    ...
    'resolveConflict' => await _handleResolveConflict(
      functionCall.args,
      onConflict,
    ),
  },
);

// Show the dialog to gather the user's input
Future<Map<String, dynamic>> _handleResolveConflict(
  Map<String, dynamic> args,
  Future<String> Function(String clue, String proposedAnswer, String pattern)?
  onConflict,
) async {
  final proposedAnswer = args['proposedAnswer'] as String;
  final pattern = args['pattern'] as String;
  final clue = args['clue'] as String;

  if (onConflict != null) {
    final result = await onConflict(clue, proposedAnswer, pattern);
    return {'result': result};
  }

  return {'result': proposedAnswer};
}

The sample handles the tool with an implementation of the onConflict method that calls showDialog to gather data from the user. All of this happens in the middle of the agentic loop, but that's OK: the model isn't waiting; it has already sent its response back to the app's initial request. The user can take their time with the UI while the sample awaits the Future returned by showDialog. When that's done, the model continues with the message history and the latest request, which in this case happens to contain the data gathered interactively from the user.

A modal dialog is one easy way to bring a human into the loop, but it's not the only way in Flutter. If you prefer, an instance of Completer lets you set some state in your app that puts it into "gathering data from the user" mode. When the app has the data, it can call complete on the Completer and resume the agentic loop.
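Here's a minimal sketch of that Completer pattern; the function names are illustrative, not from the sample:

dart
import 'dart:async';

// Hypothetical sketch of pausing an agentic loop with a Completer until the
// user supplies data; the names here are illustrative, not the sample's.
Completer<String>? _pendingUserInput;

// Called from the tool-call handler: puts the app into "gathering data from
// the user" mode and suspends until the UI calls resumeWithUserInput.
Future<String> waitForUserInput() {
  _pendingUserInput = Completer<String>();
  return _pendingUserInput!.future;
}

// Called from the UI (e.g. a dialog's submit button) to resume the loop.
void resumeWithUserInput(String answer) {
  _pendingUserInput?.complete(answer);
  _pendingUserInput = null;
}

Future<void> main() async {
  // Simulate the UI answering shortly after the loop pauses.
  Future.delayed(const Duration(milliseconds: 10), () {
    resumeWithUserInput('GRAY');
  });

  final answer = await waitForUserInput(); // the agentic loop pauses here
  print(answer);
}

Because the tool-call handler simply awaits the returned Future, the agentic loop suspends without blocking the UI, just as it does with showDialog.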

Or, since you own the agentic loop, you can check for calls to "special" functions that indicate that you need to gather data from the user. Such a special function is sometimes called an "interrupt," and you "resume" the conversation when you have the user's data.

Remember that the LLM is stateless. It's not waiting on you, so you can handle the agentic loop in whatever way makes the most sense for your app. You can return to the LLM with the updated message history and a new prompt whenever you like, whether that's in a minute or a month.