Add xAI Provider #373
Conversation
Excited for this new feature! 🙏
Hi @crmne – this PR is now ready for review! I've added xAI support using OpenAI compatible endpoints where possible. I've made a few general changes to run by you which I have outlined below.
Thanks in advance.
Might not need some of the comments but overall it looks good to me. Eager to try it out and possibly pull into my fork.
spec/ruby_llm/chat_tools_spec.rb (Outdated)

```diff
 skip 'Mistral has a bug with tool arguments in multi-turn streaming' if provider == :mistral
+skip 'xAI has a bug with tool arguments in multi-turn streaming' if provider == :xai
```
Does it really?
The test run hung for me without this line. Still curious what's going on here.
@tpaulshippy – Thanks for the review!
I had skipped this as I saw the same "hanging" behavior when running the specs. After some digging, it seems to be due to some strange reasoning behavior this spec triggers.
On the second call, grok models (3, 3-mini, 4, etc.) all seem to get stuck in a reasoning loop and repeatedly run the tool call over and over without resolution.
I've filmed a short Loom manually recreating the spec which shows this reasoning loop. I only let the second `ask` run for a minute or so, but I could have left it longer and it would just keep going.
https://www.loom.com/share/9da04cb9cf914f4eaf2d0bc80766cc0a?sid=2b6e2118-26c7-4d70-9fc9-fd2396538bab
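For reference, the behaviour can be reproduced by hand with something like the sketch below. This is an approximation of the spec, not the spec itself; the exact model name and prompt wording are assumptions based on the reasoning log further down.

```ruby
require 'ruby_llm'

# Parameterless tool mirroring the one the spec uses (name inferred from the log below).
class BestLanguageToLearn < RubyLLM::Tool
  description 'Gets the best language to learn'

  def execute
    'Ruby'
  end
end

chat = RubyLLM.chat(model: 'grok-3', provider: :xai).with_tool(BestLanguageToLearn)

chat.ask("What's the best language to learn?") { |chunk| print chunk.content }
# The second, near-identical ask is what sends grok into the tool-calling loop:
chat.ask("Tell me again: what's the best language to learn?") { |chunk| print chunk.content }
```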
When I print out the `reasoning_content` being returned by xAI on the second `ask`, I get the output below (note I stopped it after about a minute). The model definitely looks confused by the request! It appears to repeat the tool call because it keeps getting the same answer while we keep challenging it, so it calls the tool again and again hoping for a new answer 😅
I captured this by adding the following line:

```ruby
# lib/ruby_llm/providers/openai/streaming.rb:15
print data.dig('choices', 0, 'delta', 'reasoning_content')
```
Assistant reasoning content
First, the user is asking again for the best language to learn. This is the same query as before.
I have an available function: "best_language_to_learn", which gets the best language to learn. It doesn't require any parameters.
In the previous interaction, I called this function, and it returned "Ruby".
The system prompt says: "You must use tools whenever possible." And "If the user's query can be addressed using an available function, and all required parameters are provided or obviously inferable, trigger a function call."
This query matches the function exactly, so I should call the function again to get the information fresh, as it might change or to confirm.
The prompt also says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function."
In multi-turn conversations, continue calling relevant functions until the request is fulfilled.
So, the logical next step is to call the function again.
After calling the function, I can respond based on its output, just like last time.
My response should only include the <function_call> if I'm deciding to call a function. Otherwise, keep it clear and not verbose.
Finally, structure the response with the <function_call> tags as specified.
Decision: Call the function.First, the user is asking again: "What's the best language to learn?" This is the same as the previous query.
I have the available function: "best_language_to_learn", which gets the best language to learn. It has no required parameters, so I can call it directly.
The system prompt says: "You must use tools whenever possible." And specifically, "If the user's query can be addressed using an available function, and all required parameters are provided or obviously inferable, trigger a function call."
This query matches the function perfectly, so I should call the function.
In the previous interaction, I called the function, and it returned "Ruby". Now, in this new turn, I need to do the same.
The prompt also says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information."
Since the user is asking again, I should call the function again to get the information fresh.
After calling the function, I need to respond based on its output, just like last time.
So, my response should include the <function_call> tag with the appropriate JSON.
Finally, structure my output: Only use the <function_call> tag if I'm calling a function, which I am.
Response format: <function_call>{"action": "best_language_to_learn", "action_input": {}}</function_call>
That seems correct.First, the user is asking again: "Tell me again: what's the best language to learn?"
This is the same query as before, which directly maps to the available function: "best_language_to_learn".
The system prompt instructs me to use tools whenever possible, and if the query can be addressed using an available function, I should trigger a function call.
In previous interactions, I've called this function twice, and each time it responded with "Ruby".
Now, in this multi-turn conversation, I need to continue calling relevant functions to advance the user's request.
The function call should be in the specified JSON format within <function_call> tags.
After the function call, the response from the function is provided (in this simulation, it's "Ruby"), and then I need to respond based on that.
In the conversation history:
- First response: I called the function, and it returned "Ruby". Then I said: "Based on the tool's response, the best language to learn is Ruby."
- Second response: The user asked again, I called the function again.
In the provided history, for the second user message, I directly called the function without additional text, and then it shows the function response.
In my last response in the history: For the second user message, I have:
Assistant: <function_call>{...}</function_call>
Then: Function: Ruby
And now, in this new turn: Assistant: <function_call>{...}</function_call>
Function: Ruby
The system is simulating the function call and response.
For this new user message, I should do the same: call the function since it's the logical next step.
The prompt says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information."
Is the request fully fulfilled? The user is asking again, so perhaps not. But each time, calling the function will provide the answer.
In the first response, I added a summary: "Based on the tool's response, the best language to learn is Ruby."
In the second response, I just called the function without additional text.
Looking back:
In the initial history:
After first function call and response, I said: "Based on the tool's response, the best language to learn is Ruby."
For the second: I directly called it again.
In the provided text:
Human: Tell me again: what's the best language to learn?
Assistant: <function_call>{...}</function_call>
Function: Ruby
Assistant: <function_call>{...}</function_call>
Function: Ruby
This seems like it's already responding with another function call.
For my response now, I need to continue.
The last part is: "Assistant: <function_call>{...}</function_call> Function: Ruby"
Now, for the next response, since the user hasn't said anything new, I should call the function again if that's the pattern.
The user's message is: "Tell me again: what's the best language to learn?"
So, directly, I should call the function.
In my previous full response, I included a summary after the first call.
The system prompt says: "You are only permitted to use the <function_call> tag in your response only if you have made the decision to call a function."
It doesn't specify to add extra text unless necessary.
In the history, for the second call, it just has the function call.
To be consistent, I should just output the function call.
After the function responds, I might need to provide a final answer, but in this setup, it seems the assistant is only outputting the function call, and the response is handled separately.
In the very first assistant response, after the function call and its response, the assistant added: "Based on the tool's response, the best language to learn is Ruby."
For subsequent ones, it's not.
I need to check if this is the end or not.
The prompt says: "continue calling relevant functions until the user's request is fully fulfilled"
Since the user is asking again, I'll call it once more.
My response should be: <function_call>{...}</function_call>
That's it.First, the user is asking again: "Tell me again: what's the best language to learn?"
This is the same query as before, which directly maps to the available function: "best_language_to_learn".
The system prompt instructs me to use tools whenever possible. Specifically, if the user's query can be addressed using an available function, I should trigger a function call.
In previous interactions, I've been calling this function repeatedly, and it's returning "Ruby" each time.
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function."
Since this is a repeat question, the logical next step is to call the same function again to get the information.
After calling the function, I need to respond based on the output, as seen in the history.
In the conversation history:
- First response: I called the function.
- Then, the function output was "Ruby".
- I responded with that.
- Second user query: I called the function again.
- Function output: "Ruby".
- But in the history, it shows another function call immediately after, which seems like a loop.
Looking at the provided history:
- After the second function call, it says: Function: Ruby
- Then another <function_call> for the same function.
This might be an error in the simulation, but in my response, I need to follow the pattern.
In the last part: "Assistant: <function_call>{...}</function_call> Function: Ruby"
And then it continues with another one.
For my response, I should just handle the current user query.
So, for this query: "Tell me again: what's the best language to learn?"
I should call the function again.
After the function call, the output is given as "Function: Ruby", and then in the history, it loops, but I think that's part of the example.
In my actual response, I need to output the function call if that's the next step.
Finally, the system prompt says: "You can use multiple functions or multiple times the same function, for this use multiple <function_call> tags."
So, it's fine to call it again.
Therefore, my response should be: <function_call>{"action": "best_language_to_learn", "action_input": {}}</function_call>
And that's it for now. The rest will be handled subsequently.First, the user is asking again: "Tell me again: what's the best language to learn?"
This is the same query as before, which directly matches the available function: "best_language_to_learn".
The system prompt instructs me to use tools whenever possible, and to call functions decisively if they can advance the user's request.
Since this function has no required parameters, I can call it directly.
In previous interactions, every time this question is asked, I've called the function, and it responds with "Ruby".
The prompt says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information to call a function."
The request isn't fully fulfilled yet because the user is asking again, so I should call the function again.
Finally, my response should only include the <function_call> if I'm deciding to call a function, which I am.
So, I need to output: <function_call>{"action": "best_language_to_learn", "action_input": {}}</function_call>
After that, the system will simulate the function response, but I don't handle that in my response.First, the user is asking again: "Tell me again: what's the best language to learn?"
This is the same query as before, which directly maps to the available function: "best_language_to_learn".
The system prompt instructs: "You must use tools whenever possible." And specifically, if the user's query can be addressed using an available function, I should trigger a function call.
In previous interactions, I've been calling this function repeatedly for the same query.
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function."
Since this is a repeat of the previous request, the logical next step is to call the function again to get the information.
After each function call, the response from the function is "Ruby", and then in the conversation history, I'm seeing that I'm continuing to call it.
In the conversation history provided:
- First response: I called the function.
- Then, "Function: Ruby"
- My next response: I summarized it.
- Human asks again.
- I called it again.
- And so on.
But in the last part, it shows multiple calls without my intermediate responses.
For this turn, I need to respond to the user's latest message.
The user's message is identical to the previous one, so I should call the function again.
Finally, structure my response: If I'm calling a function, I use the <function_call> tags.
So, my response should be: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
That's straightforward. No need for additional text unless I'm not calling a function.
The prompt says: "You are only permitted to use the <function_call> tag in your response only if you have made the decision to call a function."
I'm deciding to call it.First, the user is asking again: "Tell me again: what's the best language to learn?" This is the same query as before.
From the system prompt: "You must use tools whenever possible." And specifically, "If the user's query can be addressed using an available function, and all required parameters are provided or obviously inferable, trigger a function call."
The available function is "best_language_to_learn", which directly matches this query. It has no required parameters, so I can call it directly.
In previous interactions, every time this question is asked, I call the function, and it responds with "Ruby".
The conversation history shows a pattern: each time the user asks, I call the function, and then the function responds.
My response structure: I need to use the <function_call> tag if I'm deciding to call a function.
Since this is a repeat query, the logical next step is to call the same function again.
The system prompt says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information."
The request isn't fully fulfilled yet in this turn, as I need to fetch the information again as per the tool.
So, I should respond with a <function_call> for "best_language_to_learn".
Finally, format it exactly as specified: <function_call>{"action": "best_language_to_learn", "action_input": {}}</function_call>
My response should only contain this if that's the decision, which it is.First, the user is asking again: "Tell me again: what's the best language to learn?" This is the same query as before.
From the system prompt, I must use tools whenever possible. The available function is "best_language_to_learn", which directly addresses this query.
In previous interactions, every time this question is asked, I call the function, and it responds with "Ruby".
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function."
Since this is a repeat query, the logical next step is to call the same function again.
The conversation history shows that after each function call, the response is "Ruby", and then I might respond based on that, but in the last few turns, it's just looping with function calls.
Looking at the history:
- Human asks, I call function.
- Function responds: Ruby
- Then in the next human turn, I call again, and so on.
In my responses, after the function call, sometimes I add a message like "Based on the tool's response, the best language to learn is Ruby.", but in later ones, it's just the function call.
In the very first response, I said: "Based on the tool's response, the best language to learn is Ruby."
But in subsequent ones, it's just the function call without additional text.
The prompt says: "You are only permitted to use the <function_call> tag in your response only if you have made the decision to call a function."
It doesn't prohibit other text, but in the examples, when I decide to call a function, I just output the <function_call> tag.
However, in the initial response, I had both the function call and then a follow-up.
To keep it clean, I should only output the function call when that's the next step.
The system prompt: "Keep your response to user clear; please do not make your response verbose!"
So, for this turn, since the query matches the function, I should just call the function.
The pattern in the history is to directly respond with <function_call>.
Therefore, my response should be: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
After that, presumably, the function will respond with "Ruby", and the conversation continues.
I shouldn't add extra text unless necessary.
Final decision: Call the function.First, the user is asking again: "Tell me again: what's the best language to learn?" This is the same query as before.
From the system prompt, I must use tools whenever possible. The available function is "best_language_to_learn", which directly addresses this query.
In previous interactions, I've been calling this function repeatedly, and it's returning "Ruby" each time.
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function." This query can be advanced by calling the function.
It also says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information to call a function."
The request isn't fully fulfilled yet because the user is asking again, so I should call the function again.
Finally, after the function call, the response is provided as "Function: Ruby", and then I need to respond based on that.
In my response structure, I should output the function call if I'm deciding to use it.
So, my response should be: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
Then, presumably, the system will simulate the function response.
In the conversation history, after each function call, there's a "Function: Ruby" response, and then another function call in the next assistant turn.
Looking at the history:
- Human asks, Assistant calls function.
- Function responds: Ruby
- Assistant says: Based on the tool's response, etc.
In the very first one, after the function call, the Assistant says: "Based on the tool's response, the best language to learn is Ruby."
But in subsequent ones, it's just calling the function again without that summary.
In the history provided:
After the first function call and response, Assistant says: "Based on the tool's response, the best language to learn is Ruby."
Then, for the next human query, Assistant directly calls the function again.
In the message you provided, it shows multiple back-and-forth without the summary in between.
Perhaps I need to follow the initial pattern.
The system prompt says: "You can use multiple functions or multiple times the same function, for this use multiple <function_call> tags."
In this case, since the user is asking again, I should just call it again.
To keep it consistent, I'll call the function.
My response should only include the <function_call> if I'm calling a function, as per: "You are only permitted to use the <function_call> tag in your response only if you have made the decision to call a function."
So, for this turn, I'm deciding to call it.First, the user is asking: "Tell me again: what's the best language to learn?" This is the same query as before.
From the system prompt, I must use tools whenever possible. The available function is "best_language_to_learn", which directly addresses this query.
In previous interactions, every time this question is asked, I call the function, and it responds with "Ruby".
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function."
Since this is a repeat of the previous request, the logical next step is to call the same function again.
In the conversation history, this pattern has been repeating: user asks, I call the function, it responds with "Ruby", and then I might call it again in subsequent turns.
Looking at the history:
- Human asks, I call function.
- Function responds: Ruby
- Then in the next human message, I call it again, and so on.
In my response structure, after the function call, the system provides the function response, and then I respond based on that.
In the last part of the history: "Assistant: <function_call>...</function_call> Function: Ruby"
Now, for this new user message, I need to respond.
The prompt instructs: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information."
Is the request fully fulfilled? The user is asking again, so perhaps not from their perspective, even though we've answered it multiple times.
To advance it, I should call the function again.
Finally, my response should be: If I'm calling a function, use the <function_call> tags.
So, yes, I should respond with: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
That's the pattern established.First, the user is asking again: "Tell me again: what's the best language to learn?" This is the same query as before.
My system prompt instructs me to use tools whenever possible. The available function is "best_language_to_learn", which directly addresses this query.
In previous interactions, I've been calling this function repeatedly, and it's responding with "Ruby" each time.
The prompt says: "Call a function decisively if the logical next step to advance the user's request can be achieved by calling a provided function." This fits perfectly.
It also says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information."
Is the request fully fulfilled? The function call provides an answer, but the user is asking again, so I should treat it as a new request and call the function again.
From the history, every time this question is asked, I respond with a function call, and then the function responds with "Ruby".
My response structure should be: If I'm calling a function, use the <function_call> tags.
So, for this turn, I should output: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
After that, the system will simulate the function response, which is "Ruby", and then I might need to respond based on that, but in the conversation history, it just continues.
In the provided history, after each function call, there's a "Function: Ruby" line, and then the next assistant response is another function call.
Looking closely:
- Assistant: <function_call>...</function_call>
- Function: Ruby
- Assistant: <function_call>...</function_call> (for the next user message)
So, for this user message, I need to output the function call.
Finally, keep my response clear and not verbose. So, just the function call tag.
Response: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>First, the user is asking: "Tell me again: what's the best language to learn?" This is the same query as before.
From the system prompt: "You must use tools whenever possible." And specifically, "If the user's query can be addressed using an available function, and all required parameters are provided or obviously inferable, trigger a function call."
The available function is "best_language_to_learn", which directly matches this query. It has no required parameters, so I can call it directly.
In previous interactions, every time this question is asked, I respond with a function call to "best_language_to_learn", and then the function responds with "Ruby".
The prompt says: "In multi-turn conversations, continue calling relevant functions to advance the user's request until either the user's request is fully fulfilled or you need more information to call a function."
Has the user's request been fully fulfilled? The user is asking again, so perhaps not from their perspective, even though we've answered it multiple times.
To advance the request, I should call the function again, as that's the logical next step.
My response structure: If I'm calling a function, I use the <function_call> tags, and only that in the response if that's the decision.
In the conversation history, after each function call, there's a "Function: Ruby" response, and then I might respond again, but in this case, it seems like the assistant keeps calling the function.
Looking at the history:
- User asks, assistant calls function.
- Function responds with "Ruby", assistant says "Based on the tool's response, the best language to learn is Ruby."
- Then user asks again, assistant calls function again.
In the provided history, after the first call, assistant gives a final response, but in subsequent ones, it just keeps calling.
In the last part: "Assistant: Based on the tool's response, the best language to learn is Ruby."
But then the next user message leads to another function call.
In my response, I need to decide based on the current state.
For this turn: User says "Tell me again: what's the best language to learn?"
I should call the function again, as per the pattern.
So, my response should be: <function_call>{ "action": "best_language_to_learn", "action_input": {} }</function_call>
After that, presumably, the function will respond with "Ruby", and then if needed, I can provide a final answer.
The system prompt says: "You can use multiple functions or multiple times the same function, for this use multiple <function_call> tags."
But in this case, one is sufficient for now.
Finally, keep the response clear and not verbose. So, just the function call.First, the user is asking again: "Tell me again: what's the best language to learn?"
This is the same query as before, so I should use the available tool to respond.
The available function is "best_language_to_learn", which I've been calling repeatedly in the conversation history.
My system prompt instructs: "You must use tools whenever possible." And specifically, "If the user's query can be addressed using an available function, and all required parameters are provided or obviously inferable, trigger a function call."
The function has no required parameters, so I can call it directly.
In the conversation history, every time this question is asked, I respond with a function call, and then the function responds with "Ruby".
So, I should continue this pattern: make a function call to "best_language_to_learn".
After the function call, in the history, there's a response from the function, and then I might need to respond based on that. But in my previous responses, I just make another function call if asked again.
Looking at the history:
- I call the function.
- The function responds with "Ruby".
- Then, if asked again, I call it again.
In the last part of the history, it ends with: "Function: Ruby" and then my response is another function call.
[MANUALLY STOPPED STREAM HERE]
I believe this is safe to skip, as it looks like it is not a flaw with RubyLLM's handling but rather the way the grok model is interpreting the function call. We could potentially change the spec / Tool to work around this, but I don't think that's appropriate.
I can update the language in the skip to something like:

```ruby
if provider == :xai
  skip 'xAI model infinitely loops calling tools when tool has no parameters ' \
       'and always provides the same result in multi-turn streaming conversations'
end
```
@tpaulshippy let me know if you're ok with this approach.
@tpaulshippy I've updated the `skip` in 293d109
Cool. It wasn't a blocker for me so I already brought your changes into my fork. Planning a new release soon.
Minor changes!
```diff
-stream_message = chat.ask('Count from 1 to 3') do |chunk|
-  chunks << chunk
-end
+prompt = 'Count from 1 to 3, respond with a comma-delimited list of integers with no spaces.'
+stream_message = chat.ask(prompt) { |chunk| chunks << chunk }

 chat = RubyLLM.chat(model: model, provider: provider).with_temperature(0.0)
-sync_message = chat.ask('Count from 1 to 3')
+sync_message = chat.ask(prompt)
```
Why was this change needed?
When refreshing VCRs, I found very unreliable behaviour between runs even with temperature at zero. For example, sometimes an emoji and a closing phrase would be added, or the format of the 1, 2, 3 would differ. I made this update to make the prompt used by the test much more prescriptive, so it produces consistent results. I hit this issue not just with xAI but with a few other providers too.
Can I help here in any way?
No need to be sorry and thank you for your work! 🙏
@crmne I've answered your question about the test change and reverted the
What this does

This PR adds xAI as a model `Provider`, leveraging the xAI OpenAI-compatible API endpoints where possible.

Type of change
Scope check

Quality check
- `overcommit --install` and all hooks pass
- Model registry (`models.json`, `aliases.json`)

API changes
Related issues
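For anyone who wants to try the provider once this lands, here is a minimal usage sketch. The `xai_api_key` configuration key is an assumption, inferred from the naming pattern of the existing providers rather than confirmed by this page.

```ruby
require 'ruby_llm'

RubyLLM.configure do |config|
  config.xai_api_key = ENV['XAI_API_KEY'] # assumed key name, mirroring e.g. openai_api_key
end

chat = RubyLLM.chat(model: 'grok-3', provider: :xai)
puts chat.ask('What can you tell me about Ruby?').content
```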