Commit df6cac1

feat(genapi): add parallel tool calling support (#5323)

1 parent 3c7ebec

File tree

1 file changed (+27 −0 lines changed)

pages/managed-inference/reference-content/model-catalog.mdx

Lines changed: 27 additions & 0 deletions
@@ -91,6 +91,7 @@ google/gemma-3-27b-it:bf16
 ```
 | Attribute | Value |
 |-----------|-------|
+| Supports parallel tool calling | No |
 | Supported image formats | PNG, JPEG, WEBP, and non-animated GIFs |
 | Maximum image resolution (pixels) | 896x896 |
 | Token dimension (pixels)| 56x56 |
@@ -103,6 +104,7 @@ This model was optimized to have a dense knowledge and faster tokens throughput
 
 | Attribute | Value |
 |-----------|-------|
+| Supports parallel tool calling | Yes |
 | Supported images formats | PNG, JPEG, WEBP, and non-animated GIFs |
 | Maximum image resolution (pixels) | 1540x1540 |
 | Token dimension (pixels)| 28x28 |
@@ -123,6 +125,7 @@ It can analyze images and offer insights from visual content alongside text.
 
 | Attribute | Value |
 |-----------|-------|
+| Supports parallel tool calling | Yes |
 | Supported images formats | PNG, JPEG, WEBP, and non-animated GIFs |
 | Maximum image resolution (pixels) | 1024x1024 |
 | Token dimension (pixels)| 16x16 |
@@ -148,6 +151,10 @@ allenai/molmo-72b-0924:fp8
 Released December 6, 2024, Meta’s Llama 3.3 70b is a fine-tune of the [Llama 3.1 70b](/managed-inference/reference-content/model-catalog/#llama-31-70b-instruct) model.
 This model is still text-only (text in/text out). However, Llama 3.3 was designed to approach the performance of Llama 3.1 405B on some applications.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+
 #### Model name
 ```
 meta/llama-3.3-70b-instruct:fp8
@@ -158,6 +165,10 @@ meta/llama-3.3-70b-instruct:bf16
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+
 #### Model names
 ```
 meta/llama-3.1-70b-instruct:fp8
@@ -168,6 +179,10 @@ meta/llama-3.1-70b-instruct:bf16
 Released July 23, 2024, Meta’s Llama 3.1 is an iteration of the open-access Llama family.
 Llama 3.1 was designed to match the best proprietary models and outperform many of the available open source on common industry benchmarks.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+
 #### Model names
 ```
 meta/llama-3.1-8b-instruct:fp8
@@ -197,6 +212,10 @@ nvidia/llama-3.1-nemotron-70b-instruct:fp8
 Released January 21, 2025, Deepseek’s R1 Distilled Llama 70B is a distilled version of the Llama model family based on Deepseek R1.
 DeepSeek R1 Distill Llama 70B is designed to improve the performance of Llama models on reasoning use cases such as mathematics and coding tasks.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | No |
+
 #### Model name
 ```
 deepseek/deepseek-r1-distill-llama-70b:fp8
@@ -247,6 +266,10 @@ Mistral Nemo is a state-of-the-art transformer model of 12B parameters, built by
 This model is open-weight and distributed under the Apache 2.0 license.
 It was trained on a large proportion of multilingual and code data.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | Yes |
+
 #### Model name
 ```
 mistral/mistral-nemo-instruct-2407:fp8
@@ -302,6 +325,10 @@ kyutai/moshika-0.1-8b:fp8
 Qwen2.5-coder is your intelligent programming assistant familiar with more than 40 programming languages.
 With Qwen2.5-coder deployed at Scaleway, your company can benefit from code generation, AI-assisted code repair, and code reasoning.
 
+| Attribute | Value |
+|-----------|-------|
+| Supports parallel tool calling | No |
+
 #### Model name
 ```
 qwen/qwen2.5-coder-32b-instruct:int8
