Red Candle Provider #404
Conversation
I do see 4 rubocop offenses. Can you clean those up?
Love the test helper.
```ruby
# Red Candle doesn't provide token counts, but we can estimate them
content = result[:content]
# Rough estimation: ~4 characters per token
estimated_output_tokens = (content.length / 4.0).round
```
Is this just for funsies?
I noticed a few things like this (infinity tokens per dollar) that I don't see in the ollama provider. While adding these lines of code may have value, I'm not really seeing it.
You're definitely right about the infinity tokens per dollar, that was a little too cute. I removed the whole pricing bit (I don't think it's necessary).
As for the `estimated_output_tokens`, the specs require this in these two places:
- https://github.com/crmne/ruby_llm/blob/main/spec/ruby_llm/chat_spec.rb#L18-L19
- https://github.com/crmne/ruby_llm/blob/main/spec/ruby_llm/chat_streaming_spec.rb#L43-L44
We can get real token counts from `red-candle`, but we'd need to retokenize, which seems wasteful (and I couldn't figure out how to reasonably get access to the underlying `Candle::LLM` right here), so we decided to estimate. I'm open to other methods of estimating; we could split on a regex or something, this just seemed simple and efficient.
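For comparison, here is a minimal sketch of the two estimation approaches discussed above. The 4-characters-per-token ratio is the heuristic from this PR; `estimate_by_words` is a hypothetical regex-split alternative, not code from the branch:

```ruby
# Heuristic from this PR: roughly 4 characters per token.
def estimate_by_chars(content)
  (content.length / 4.0).round
end

# Hypothetical alternative: count word-ish chunks and punctuation with a regex.
def estimate_by_words(content)
  content.scan(/\w+|[^\w\s]/).length
end

text = "Red Candle runs GGUF models locally, with no API calls."
estimate_by_chars(text) # => 14
estimate_by_words(text) # => 12
```

Both are rough; which is closer to the real tokenizer depends on the model's vocabulary.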
```ruby
end

def render_payload(messages, tools:, temperature:, model:, stream:, schema:) # rubocop:disable Metrics/ParameterLists
  # Red Candle doesn't support tools
```
Sad. At least it has structured generation.
This is a planned red-candle feature, just not there yet.
spec/spec_helper.rb
```ruby
require_relative 'support/streaming_error_helpers'
require_relative 'support/provider_capabilities_helper'

# Handle Red Candle provider based on availability and environment
```
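The guard in `spec_helper.rb` presumably follows the standard optional-dependency pattern. A minimal sketch, assuming the require name matches the gem name (the actual helper code in the branch may differ):

```ruby
# Attempt to load the optional gem; fall back gracefully if it isn't bundled.
red_candle_available =
  begin
    require 'red-candle'
    true
  rescue LoadError
    false
  end

# Specs can then branch on availability instead of failing at load time.
puts red_candle_available ? 'Red Candle specs enabled' : 'Red Candle specs skipped'
```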
May consider putting this in a separate file to follow the pattern set.
Done.
Looks like this won't work with Ruby 3.1 (which is currently part of the CI for RubyLLM). Probably need to figure this out. I tried it out though, and love it!

@tpaulshippy Thank you for the review and the feedback! We've made some changes and I think this is ready for another look.

This is unfortunately a blocker. I'm not gonna drop Ruby 3.1 support soon, as I know many users of RubyLLM are still running that. I think this patch may need to wait.

9ab992d resolved this.

@crmne With the 3.1 blocker removed (it turned out to be already working and just needed testing), is there anything else keeping this from moving forward, in your opinion?
What this does
This PR adds support for the Red Candle provider, enabling local LLM execution using quantized GGUF models directly in Ruby without requiring external API calls.
Key Implementation Details
Red Candle is fundamentally different from other providers: while all other RubyLLM providers communicate via HTTP APIs, Red Candle runs models locally using the Candle Rust crate. This brings true local inference to Ruby, with no network latency or API costs.
Dependency Management
Since Red Candle requires a Rust toolchain at build time, we've made it optional at two levels:

- `red-candle` is NOT a gemspec dependency. Users must explicitly add `gem 'red-candle'` to their Gemfile to use this provider.
- For development, enable the optional bundler group with `bundle config set --local with red_candle`.
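A sketch of those two opt-in levels as Gemfile fragments. The optional group name `red_candle` matches the bundle config commands in this PR; the exact Gemfile layout is an assumption:

```ruby
# Gemfile of an application using the provider: an ordinary explicit dependency.
gem 'red-candle'

# Gemfile of the RubyLLM repo itself: an optional Bundler group, enabled via
#   bundle config set --local with red_candle
group :red_candle, optional: true do
  gem 'red-candle'
end
```

Bundler skips optional groups unless they are listed in the `with` config, so CI and contributors without a Rust toolchain are unaffected by default.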
Testing Strategy
We implemented a comprehensive mocking system to keep tests fast:

- Uses `MockCandleModel` to simulate responses without actual inference
- Set `RED_CANDLE_REAL_INFERENCE=true` to run actual model inference (downloads models on first run, ~4.5 GB)

Changes Made
- Added `RubyLLM::Providers::RedCandle` with full chat support including streaming
- Added `red_candle_test_helper.rb`
- Updated `ruby_llm.rb` and `spec_helper.rb` to handle the optional dependency
- Updated `models_to_test.rb` to conditionally include Red Candle models
- Updated `CONTRIBUTING.md` for managing the optional dependency

How to Test
Once `red-candle` is enabled, turn it back off with:

```shell
bundle config unset with
```

And turn it BACK on with:

```shell
bundle config set --local with red_candle
```

Try it out:

```shell
bundle exec irb
```
Type of change

Scope check

Quality check

- Ran `overcommit --install` and all hooks pass
- Tested with `bundle exec rake vcr:record[provider_name]` and `bundle exec rspec`
- No manual edits to auto-generated files (`models.json`, `aliases.json`)

API changes
Related issues
Fixes #394