|
60 | 60 | "```\n",
|
61 | 61 | "check to see if directories exist before making files\n",
|
62 | 62 | "```\n",
|
63 | | - "For high volume and well defined tasks, we can make it even more robust by outlining the sequence of function to call explicitly, for example:\n",
| 63 | + "For high-volume, well-defined tasks, we can make it even more robust by outlining the sequence of functions to call explicitly, for example:\n",
64 | 64 | "```\n",
|
65 | 65 | "To Process a refund for a delivered order, follow the following steps:\n",
|
66 | 66 | "1. Confirm the order was delivered. Use: `order_status_check`\n",
|
|
154 | 154 | "Do NOT promise to call a function later. If a function call is required, emit it now; otherwise respond normally.\n",
|
155 | 155 | "```\n",
|
156 | 156 | "\n",
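Instructions like the one above belong in the developer prompt. A minimal sketch of wiring it up with the Responses API Python SDK (the model name, task, and `tools` variable are illustrative, not from the guide):

```python
from openai import OpenAI

client = OpenAI()

# The anti-laziness instruction rides along in the developer prompt.
instructions = (
    "Do NOT promise to call a function later. If a function call is "
    "required, emit it now; otherwise respond normally."
)

response = client.responses.create(
    model="o4-mini",
    instructions=instructions,
    input="Create release notes for v2.1 and save them in the docs folder.",
    tools=tools,  # your function schemas, assumed defined elsewhere
)
```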
|
157 | | - "2. Catch bad arguments early: \n",
158 | | - "Setting `strict` to `true` will ensure function calls reliably adhere to the [function schema](https://platform.openai.com/docs/guides/function-calling?api-mode=responses#strict-mode). We recommend turning it on whenever possible.\n",
| 157 | + "2. Catch bad arguments early:\n",
| 158 | + "setting `strict` to `true` will ensure function calls reliably adhere to the [function schema](https://platform.openai.com/docs/guides/function-calling?api-mode=responses#strict-mode). We recommend turning it on whenever possible.\n",
159 | 159 | "\n",
|
160 | 160 | "If your arguments have additional complex format requirements (e.g valid python code etc), adding the following instruction can remind the model of the expected format. \n",
|
161 | 161 | "\n",
|
162 | 162 | "```\n",
|
163 | 163 | "Validate arguments against the format before sending the call; if you are unsure, ask for clarification instead of guessing.\n",
|
164 | 164 | "```\n",
|
165 | 165 | "\n",
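For reference, a minimal sketch of a strict function schema in the Responses API (the `refund_order` tool is invented for illustration; strict mode requires `additionalProperties: false` and every property listed in `required`):

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical refund tool with "strict": true, so the model's
# arguments must conform exactly to this JSON schema.
tools = [
    {
        "type": "function",
        "name": "refund_order",
        "description": "Process a refund for a delivered order.",
        "strict": True,
        "parameters": {
            "type": "object",
            "properties": {
                "order_id": {"type": "string", "description": "Order to refund."},
                "reason": {"type": "string", "description": "Why the refund was requested."},
            },
            "required": ["order_id", "reason"],
            "additionalProperties": False,
        },
    }
]

response = client.responses.create(
    model="o4-mini",
    input="Please refund order 1234; it arrived damaged.",
    tools=tools,
)
```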
|
166 | | - "3. Another note on lazy behavior\n",
167 | | - "We are aware of rare instances of lazy behavior from o3, such as stating it does not have enough time to complete a task, promising to follow up separately, or giving terse answers even when explicitly prompted to provide more detail. We have found that the following steps help ameliorate this behavior:\n",
| 166 | + "3. Another note on lazy behavior:\n",
| 167 | + "we are aware of rare instances of lazy behavior from o3, such as stating it does not have enough time to complete a task, promising to follow up separately, or giving terse answers even when explicitly prompted to provide more detail. We have found that the following steps help ameliorate this behavior:\n",
168 | 168 | "\n",
|
169 | | - " a. Start a new conversation for unrelated topics:\n",
| 169 | + " a. Start a new conversation for unrelated topics: \n",
170 | 170 | " When switching to a new or unrelated topic, begin a fresh conversation thread rather than continuing in the same context. This helps the model focus on the current subject and prevents it from being influenced by previous, irrelevant context, which can sometimes lead to incomplete or lazy responses. For example, if you were previously discussing code debugging and now want to ask about documentation best practices, which does not require previous conversation context, start a new conversation to ensure clarity and focus.\n",
|
171 | 171 | "\n",
|
172 | | - " b. Discard irrelevant past tool calls/outputs when the list gets too long, and summarize them as context in the user message:\n",
| 172 | + " b. Discard irrelevant past tool calls/outputs when the list gets too long, and summarize them as context in the user message: \n",
173 | 173 | " If the conversation history contains a long list of previous tool calls or outputs that are no longer relevant, remove them from the context. Instead, provide a concise summary of the important information as part of the user message. This keeps the context manageable and ensures the model has access to only the most pertinent information. For instance, if you have a lengthy sequence of tool outputs, you can summarize the key results and include only that summary in your next message.\n",
|
174 | 174 | "\n",
|
175 | 175 | " c. We are constantly improving our models and expect to have this issue addressed in future versions.\n",
|
176 | 176 | "\n",
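To make point (a) above concrete: with the Responses API, "starting a new conversation" simply means not chaining the new request to the old thread. A minimal sketch (model and prompts are illustrative):

```python
from openai import OpenAI

client = OpenAI()

# Ongoing debugging thread: each turn chains to the previous response.
r1 = client.responses.create(model="o3", input="Why does this test fail? ...")
r2 = client.responses.create(
    model="o3",
    previous_response_id=r1.id,
    input="Apply the fix you suggested.",
)

# New, unrelated topic: omit previous_response_id so stale context
# cannot drag the model toward terse or incomplete answers.
r3 = client.responses.create(
    model="o3",
    input="What are best practices for structuring API documentation?",
)
```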
|
177 | 177 | "\n",
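And a minimal sketch of point (b), trimming old tool traffic and replacing it with a summary. The item shapes loosely follow Responses API conversation items, but the summarization rule here is a crude placeholder you would tune for your own tools:

```python
def compact_history(items: list[dict], keep_last: int = 6) -> list[dict]:
    """Keep the most recent items verbatim; fold older tool outputs
    into one short summary message so the context stays small."""
    old, recent = items[:-keep_last], items[-keep_last:]
    facts = [
        str(item.get("output", ""))[:200]  # truncate each old tool result
        for item in old
        if item.get("type") == "function_call_output"
    ]
    summary = {
        "role": "user",
        "content": "Summary of earlier tool results: " + " | ".join(facts),
    }
    return [summary] + recent
```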
|
178 | 178 | "### Avoid Chain of Thought Prompting\n",
|
179 | | - "Since these models are reasoning models and produce an internal chain of thought, they do not have to be explicitly prompted to plan and reason between toolcalls. Therefore, a developer should not try to induce additional reasoning before each function call by asking the model to plan more extensively. Asking a reasoning model to reason more may actually hurt the performance. \n",
| 179 | + "Since these models are reasoning models and produce an internal chain of thought, they do not have to be explicitly prompted to plan and reason between tool calls. Therefore, a developer should not try to induce additional reasoning before each function call by asking the model to plan more extensively. Asking a reasoning model to reason more may actually hurt performance. \n",
180 | 180 | "\n",
|
181 | 181 | "A quick side note on reasoning summaries: the models will output reasoning tokens before calling tools. However, these will not always be accompanied by a summary, since our reasoning summaries require a minimum number of material reasoning tokens to produce a summary.\n"
|
182 | 182 | ]
|
|
188 | 188 | "# Responses API\n",
|
189 | 189 | "\n",
|
190 | 190 | "### Reasoning Items for Better Performance\n",
|
191 | | - "We’ve released a [cookbook](https://cookbook.openai.com/examples/responses_api/reasoning_items) detailing the benefits of using the responses API. It is worth restating a few of the main points in this guide as well. o3/o4-mini are both trained with its internal reasoning persisted between toolcalls within a single turn. Persisting these reasoning items between toolcalls during inference will therefore lead to higher intelligence and performance in the form of better decision in when and how a tool gets called. Responses allow you to persist these reasoning items (maintained either by us or yourself through encrypted content if you do not want us to handle state-management) while Chat Completion doesn’t. Switching to the responses API and allowing the model access to reasoning items between function calls is the easiest way to squeeze out as much performance as possible for function calls. Here is an the example in the cookbook, reproduced for convenience, showing how you can pass back the reasoning item using `encrypted_content` in a way which we do not retain any state on our end:\n"
| 191 | + "We’ve released a [cookbook](https://cookbook.openai.com/examples/responses_api/reasoning_items) detailing the benefits of using the Responses API. It is worth restating a few of the main points in this guide as well. o3/o4-mini are both trained with their internal reasoning persisted between tool calls within a single turn. Persisting these reasoning items between tool calls during inference will therefore lead to higher intelligence and performance in the form of better decisions about when and how a tool gets called. The Responses API allows you to persist these reasoning items (maintained either by us, or by yourself through encrypted content if you do not want us to handle state management), while Chat Completions doesn’t. Switching to the Responses API and allowing the model access to reasoning items between function calls is the easiest way to squeeze out as much performance as possible from function calls. Here is the example in the cookbook, reproduced for convenience, showing how you can pass back the reasoning item using `encrypted_content` such that we do not retain any state on our end:\n"
192 | 192 | ]
|
193 | 193 | },
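The full cookbook example sits in the next cell, which this diff hunk elides. A minimal sketch of the pattern it demonstrates (the model name, user message, `tools` variable, and `get_weather` executor are illustrative):

```python
import json
from openai import OpenAI

client = OpenAI()

context = [{"role": "user", "content": "What's the weather like in Paris today?"}]

# store=False plus include=["reasoning.encrypted_content"] returns the
# reasoning items in encrypted form, so you persist them yourself and
# no state is retained server-side.
response = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,  # function schemas as defined earlier, assumed in scope
    store=False,
    include=["reasoning.encrypted_content"],
)

# Append everything the model produced (encrypted reasoning items included)
# plus your tool results, then call the API again in the same turn.
context += response.output
for item in response.output:
    if item.type == "function_call":
        args = json.loads(item.arguments)
        context.append({
            "type": "function_call_output",
            "call_id": item.call_id,
            "output": get_weather(**args),  # hypothetical local executor
        })

response_2 = client.responses.create(
    model="o4-mini",
    input=context,
    tools=tools,
    store=False,
    include=["reasoning.encrypted_content"],
)
```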
|
194 | 194 | {
|
|
366 | 366 | "**A:** Not guaranteed. The guidance in this document assumes you’re using the standard `tools` model parameter to pass your function schemas, as shown in our [general guide](https://platform.openai.com/docs/guides/function-calling) on function calling. Our o3/o4-mini models are trained to understand and use these schemas natively for tool selection and argument construction.\n",
|
367 | 367 | "\n",
|
368 | 368 | "If you’re instead providing custom tool definitions via natural language in a developer-authored prompt (e.g., defining tools inline in the developer message or user message), this guidance may not fully apply. In those cases:\n",
|
369 | | - "The model is not relying on its internal tool-schema priors\n",
370 | | - "You may need to be more explicit with few-shot examples, output formats, and tool selection criteria\n",
371 | | - "Argument construction reliability may degrade without schema-level anchoring\n",
| 369 | + "The model is not relying on its internal tool-schema priors. \n",
| 370 | + "You may need to be more explicit with few-shot examples, output formats, and tool selection criteria. \n",
| 371 | + "Argument construction reliability may degrade without schema-level anchoring.\n",
372 | 372 | "\n",
|
373 | 373 | "Use the structured tools parameter when possible. If you must define tools in free text, treat it as a custom protocol and test accordingly.\n"
|
374 | 374 | ]
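To illustrate the "custom protocol" case the FAQ warns about: everything below, including the `CALL` convention and the `get_weather` tool, is invented for the sketch. Note how you must spell out the calling convention yourself and parse the model's free-text reply, since no schema anchors it:

```python
import json
import re

# Free-text tool definition embedded in a developer-authored prompt.
prompt = """You may use this tool:
- get_weather(city: string) -> string

To call it, reply with exactly one line:
CALL get_weather {"city": "<city name>"}
"""

def parse_tool_call(reply: str):
    """Extract a tool call from the model's free-text reply, if present."""
    match = re.search(r'^CALL (\w+) (\{.*\})$', reply, re.MULTILINE)
    if not match:
        return None
    name, args = match.group(1), json.loads(match.group(2))
    return name, args
```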
|
|