Tool calling (aka function calling)
Learn how to implement tool calls, manage the agentic loop, and bring a human into the loop using the Firebase AI Logic SDK.
While LLMs really are trained on practically the entire internet, they aren't all-knowing. They know what was on the public internet as of the day they were trained, but nothing newer. They know nothing that's private to you or your organization. And even the things they do know are easily confused with other things they know.
For these scenarios and many others, we typically provide the LLM with one or more tools.
Tool definitions
A tool is a name, a description, and a JSON schema for the input data the LLM provides when it "calls" the tool. For example, if you prompt an LLM to "reduce the carbs in Grandma's All American breakfast recipe," it has no idea what Grandma's recipe is unless we provide a "lookupRecipe" tool that takes a query string for looking up the recipe.
Conceptually, a tool is something we make available to the LLM to call whenever it needs that data or service. The way an LLM calls a tool is by responding to the app's request with a specially formatted message that represents a "tool call." The tool-call message includes the tool's name and JSON arguments. The app handles the tool call and bundles the result into another request to the LLM, which the LLM then responds to.
This can go on for a while. The app can configure a model instance with any number of tools (although LLMs prefer small, targeted sets of tools with non-overlapping functionality). The LLM can bundle any number of tool calls into its responses, and it can receive any number of tool results in a request. The LLM integrates the round trips of multiple prompts and tool-call results via the stack of messages that forms the request/response pair history.
When it's done making tool calls, the LLM returns its final response, e.g. "Here's a version of Grandma's All American breakfast recipe that's high in protein and low in carbs…".
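To make that concrete, here's roughly what a tool-call message looks like on the wire in the Gemini API (the exact shape varies by SDK, and lookupRecipe with its query argument is the hypothetical recipe tool from above):

```json
{
  "role": "model",
  "parts": [
    {
      "functionCall": {
        "name": "lookupRecipe",
        "args": { "query": "Grandma's All American breakfast" }
      }
    }
  ]
}
```

The app executes the lookup and replies with a matching functionResponse part carrying the tool's name and its JSON result, and the conversation continues from there.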
Gemini functions
In the Firebase AI Logic SDK, tools are called "functions," but the idea is the same. In the sample, the clue-solving model is configured with a function that looks up the details of a word. If the LLM wants details about a word to help with the puzzle-solving process, calling the function fetches data from the Free Dictionary API.
[
{
"word": "tool",
"phonetic": "/tuːl/",
"phonetics": [
{
"text": "/tuːl/",
"audio": "https://api.dictionaryapi.dev/media/pronunciations/en/tool-uk.mp3",
"sourceUrl": "https://commons.wikimedia.org/w/index.php?curid=94709459",
"license": {
"name": "BY-SA 4.0",
"url": "https://creativecommons.org/licenses/by-sa/4.0"
}
}
],
"meanings": [
{
"partOfSpeech": "noun",
"definitions": [
{
"definition": "A mechanical device intended to make a task easier.",
"synonyms": [],
"antonyms": [],
"example": "Hand me that tool, would you? I don't have the right tools to start fiddling around with the engine."
},
...
The app has a Dart function that performs the lookup:
// Look up the metadata for a word in the dictionary API.
Future<Map<String, dynamic>> _getWordMetadataFromApi(String word) async {
final url = Uri.parse(
'https://api.dictionaryapi.dev/api/v2/entries/en/${Uri.encodeComponent(word)}',
);
final response = await http.get(url);
return response.statusCode == 200
? {'result': jsonDecode(response.body)}
: {'error': 'Could not find a definition for "$word".'};
}
The model is configured with the lookup function during initialization:
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
FunctionDeclaration(
'getWordMetadata',
'Gets grammatical metadata for a word, like its part of speech. '
'Best used to verify a candidate answer against a clue that implies a '
'grammatical constraint.',
parameters: {
'word': Schema(SchemaType.string, description: 'The word to look up.'),
},
),
]),
],
);
For reliability, it's also a good idea to list the tools in the system instructions:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `getWordMetadata`
You have a tool to get grammatical information about a word.
**When to use:**
- This tool is most helpful as a verification step after you have a likely answer.
- Consider using this tool when a clue contains a grammatical hint that could be ambiguous.
- **Good candidates for verification:**
- Clues that seem to be verbs (e.g., "To run," "Waving").
- Clues that are adverbs (e.g., "Happily," "Quickly").
- Clues that specify a plural form.
- **Try to avoid using the tool for:**
- Simple definitions (e.g., "A small dog").
- Fill-in-the-blank clues (e.g., "___ and flow").
- Proper nouns (e.g., "Capital of France").
**Function signature:**
```json
${jsonEncode(_getWordMetadataFunction.toJson())}
```
''';
When the app makes a request, the model now has a tool it can use whenever it decides that would be helpful. To support tool calls, we need to implement an agentic loop.
The agentic loop
An LLM is functionally stateless, which means you must provide it with all of the necessary data on every request. For requests that consist of just a prompt and any files to send along, the Firebase AI Logic SDK exposes the generateContent method on your model instance.
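For instance, a minimal one-shot request (a sketch, assuming a model instance like the _clueSolverModel configured above) looks like this:

```dart
// One-shot request: no history, no tools. The prompt must carry
// everything the model needs, because nothing is remembered between calls.
final response = await _clueSolverModel.generateContent([
  Content.text('Give me a five-letter word meaning "joy".'),
]);
print(response.text);
```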
However, tool calling requires a message history formed from the initial prompt plus the request/response pairs that make up the tool calls and tool results. To support this, the Firebase AI Logic SDK provides a "chat" object to gather that history. We use it to build the agentic loop:
- Start a chat to hold the message history across multiple request/response pairs
- Gather tool results for any tool calls the model provides
- Bundle the tool results into a new request
- Loop until the model provides a response without any tool calls
- Return the text accumulated across all of the responses
Here's that algorithm expressed as an extension method on the GenerativeModel class, so we can call it just like we call generateContent:
extension on GenerativeModel {
Future<String> generateContentWithFunctions({
required String prompt,
required Future<Map<String, dynamic>> Function(FunctionCall) onFunctionCall,
}) async {
// Use a chat session to support multiple request/response pairs, which is
// needed to support function calls.
final chat = startChat();
final buffer = StringBuffer();
var response = await chat.sendMessage(Content.text(prompt));
while (true) {
// Append the response text to the buffer.
buffer.write(response.text ?? '');
// If no function calls were collected, we're done
if (response.functionCalls.isEmpty) break;
// Append a newline to separate responses.
buffer.write('\n');
// Execute all function calls
final functionResponses = <FunctionResponse>[];
for (final functionCall in response.functionCalls) {
try {
functionResponses.add(
FunctionResponse(
functionCall.name,
await onFunctionCall(functionCall),
),
);
} catch (ex) {
functionResponses.add(
FunctionResponse(functionCall.name, {'error': ex.toString()}),
);
}
}
// Get the next response stream with function results
response = await chat.sendMessage(
Content.functionResponses(functionResponses),
);
}
return buffer.toString();
}
}
This method takes a prompt and a callback for handling specific tool calls; the sample invokes the callback to handle the word lookup function:
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
'getWordMetadata' => await _getWordMetadataFromApi(
functionCall.args['word'] as String,
),
_ => throw Exception('Unknown function call: ${functionCall.name}'),
},
);
Structured output makes it possible to use LLMs programmatically, but it's tools that turn an LLM into an "agent" (see the section on interaction patterns for more).
Structured output and tool calls
Combining structured output with tool calling is powerful. In the sample, the clue solver has a tool for looking up the details of a word. It's also asked to return JSON that bundles a solution together with a confidence score, both of which are shown in the app's task list.
Unfortunately, as of this writing, combining structured output and functions with the Firebase AI Logic SDK produces an exception:
Function calling with a response mime type: 'application/json' is unsupported
As a (hopefully temporary) workaround for this issue, the sample drops the structured output configuration and instead simulates structured output with a tool named returnResult:
// The model for solving clues.
_clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
...,
FunctionDeclaration(
'returnResult',
'Returns the final result of the clue solving process.',
parameters: {
'answer': Schema(
SchemaType.string,
description: 'The answer to the clue.',
),
'confidence': Schema(
SchemaType.number,
description: 'The confidence score in the answer from 0.0 to 1.0.',
),
},
),
]),
],
);
The returnResult tool is also mentioned in the system instructions:
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `returnResult`
You have a tool to return the final result of the clue solving process.
**When to use:**
- Use this tool when you have a final answer and confidence score to return. You
must use this tool exactly once, and only once, to return the final result.
**Function signature:**
```json
${jsonEncode(_returnResultFunction.toJson())}
```
''';
When the model calls returnResult, the sample caches the result, which solveClue then looks up after the call to generateContentWithFunctions:
// Buffer for the result of the clue solving process.
final _returnResult = <String, dynamic>{};
// Cache the return result of the clue solving process via a function call.
// This is how we get JSON responses from the model with functions, since the
// model cannot return JSON directly when tools are used.
Map<String, dynamic> _cacheReturnResult(Map<String, dynamic> returnResult) {
assert(_returnResult.isEmpty);
_returnResult.addAll(returnResult);
return {'status': 'success'};
}
Future<ClueAnswer?> solveClue(Clue clue, int length, String pattern) async {
// Clear the return result cache; this is where the result will be stored.
_returnResult.clear();
// Generate JSON response with functions and schema.
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
'getWordMetadata' => ...,
'returnResult' => _cacheReturnResult(functionCall.args),
_ => throw Exception('Unknown function call: ${functionCall.name}'),
},
);
// Use the structured output that the LLM passed to the returnResult function.
assert(_returnResult.isNotEmpty);
return ClueAnswer(
answer: _returnResult['answer'] as String,
confidence: (_returnResult['confidence'] as num).toDouble(),
);
}
We have to work a little harder to get the combination of structured output and tool calls with Firebase AI Logic, but the results are worth it!
Human-in-the-loop
So far, we've seen tools used to gather data and to format output. We can also use them to bring a human into the process.
For example, sometimes when the sample passes in a pattern that the solution should match (e.g. "_R_Y"), the model wants to suggest an answer that doesn't fit that pattern (e.g. "RENT"). A conflict like this is a great time to ask the user for help.
This is called bringing a "human into the loop," and it's another way for humans and LLMs to collaborate. Flutter and the Firebase AI Logic SDK make it easy. First, the sample defines a function and configures the model:
// The new function to let the LLM resolve solution conflicts
static final _resolveConflictFunction = FunctionDeclaration(
'resolveConflict',
'Asks the user to resolve a conflict between the letter pattern and the '
'proposed answer. Use this BEFORE calling returnResult if the answer you '
'want to propose does not match the letter pattern.',
parameters: {
'proposedAnswer': Schema(
SchemaType.string,
description: 'The answer the LLM wants to suggest.',
),
'pattern': Schema(
SchemaType.string,
description: 'The current letter pattern from the grid.',
),
'clue': Schema(SchemaType.string, description: 'The clue text.'),
},
);
// Pass the new tool to the model for solving clues.
final _clueSolverModel = FirebaseAI.googleAI().generativeModel(
model: 'gemini-2.5-flash',
systemInstruction: Content.text(clueSolverSystemInstruction),
tools: [
Tool.functionDeclarations([
...
_resolveConflictFunction,
]),
],
);
// Let the LLM know that it has a new tool.
static String get clueSolverSystemInstruction =>
'''
You are an expert crossword puzzle solver.
...
### Tool: `resolveConflict`
You have a tool to ask the user to resolve a conflict.
**When to use:**
- Use this tool **BEFORE** `returnResult` if your proposed answer conflicts with the provided letter pattern.
- For example, if the pattern is `_ R _ Y` and you want to suggest `RENT` (which fits the clue), there is a conflict at the second letter (`R` vs `E`). You should call `resolveConflict(proposedAnswer: "RENT", pattern: "_ R _ Y", clue: "...")`.
- The tool will return the user's decision (either your proposed answer or a new one). You should then use that result to call `returnResult`.
**Function signature:**
```json
${jsonEncode(_resolveConflictFunction.toJson())}
```
''';
Now, when the model sees a conflict, it calls the tool:
// handle the LLM's request to resolve the conflict
await _clueSolverModel.generateContentWithFunctions(
prompt: getSolverPrompt(clue, length, pattern),
onFunctionCall: (functionCall) async => switch (functionCall.name) {
...
'resolveConflict' => await _handleResolveConflict(
functionCall.args,
onConflict,
),
},
);
// Show the dialog to gather the user's input
Future<Map<String, dynamic>> _handleResolveConflict(
Map<String, dynamic> args,
Future<String> Function(String clue, String proposedAnswer, String pattern)?
onConflict,
) async {
final proposedAnswer = args['proposedAnswer'] as String;
final pattern = args['pattern'] as String;
final clue = args['clue'] as String;
if (onConflict != null) {
final result = await onConflict(clue, proposedAnswer, pattern);
return {'result': result};
}
return {'result': proposedAnswer};
}
The sample handles the tool with an implementation of the onConflict method, which calls showDialog to gather data from the user. All of this happens in the middle of the agentic loop, but that's OK: the model isn't waiting; it has already sent its response back to the app's initial request. The user can take their time with the UI while the sample awaits the Future returned by showDialog. When that's done, the model continues with the message history plus the latest request, which in this case happens to contain the data gathered interactively from the user.
A modal dialog is one easy way to bring a human into the loop, but it's not the only way in Flutter. If you prefer, an instance of Completer lets you set some state in your app that puts it into "gathering data from the user" mode. When the app has the data, it can call complete on the Completer and resume the agentic loop.
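Here's a sketch of that Completer-based approach (the _pendingConflict field and the UI wiring are hypothetical, not from the sample):

```dart
import 'dart:async';

// Hypothetical state: non-null while the app is in
// "gathering data from the user" mode.
Completer<String>? _pendingConflict;

// Called from the agentic loop's onFunctionCall handler.
Future<Map<String, dynamic>> _handleResolveConflict(
  Map<String, dynamic> args,
) async {
  final completer = Completer<String>();
  _pendingConflict = completer;
  // Flip whatever app state drives the "ask the user" UI here,
  // e.g. a ValueNotifier or a setState call.
  final answer = await completer.future; // suspends the loop, not the LLM
  _pendingConflict = null;
  return {'result': answer};
}

// Called by the UI when the user makes a choice.
void onUserResolvedConflict(String answer) {
  _pendingConflict?.complete(answer);
}
```

Because the loop simply awaits the Completer's future, the rest of the app keeps running normally while the user decides.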
Or, since you own the agentic loop, you can check for calls to "special" functions that indicate you need to gather data from the user. Such a special function is sometimes called an "interrupt," and when you have the user's data, you "resume" the conversation.
Remember, the LLM is stateless. It isn't waiting on you, so you can handle the agentic loop in whatever way works best for your app. You can come back to the LLM with an updated message history and a new prompt whenever you like, whether that's in a minute or a month.