
Conversation

tzh21 (Collaborator) commented Sep 14, 2025

The initial version of the OOC scheduler is mainly based on DisaggPDScheduler, except that offline decoding requests are handled locally on P nodes rather than being sent to D nodes.
I chose to copy DisaggPDScheduler rather than inherit from it, considering the potentially large gap that may open up between the OOC scheduler and DisaggPDScheduler.
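
Roughly, the idea is the following (a sketch only; the method name and enqueue logic here are illustrative, not the actual implementation):

  // Sketch: offline requests stay on the P node and are decoded locally,
  // while online requests follow the DisaggPDScheduler path (prefill here,
  // then forward to a D node for decode).
  void PDOOCScheduler::enqueue_request(std::shared_ptr<Request> request) {
    if (request->offline()) {
      waiting_priority_queue_offline_.push(request);
    } else {
      waiting_priority_queue_.push(request);
    }
  }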

Copilot AI left a comment

Pull Request Overview

This PR adds an initial version of the Online-Offline Co-location (OOC) scheduler, which enables handling both online and offline decoding requests on prefill nodes (P nodes) locally rather than always sending them to decode nodes (D nodes). The implementation is based on the existing DisaggPDScheduler but copied instead of inherited to allow for significant architectural divergence.

Key changes:

  • Added new PDOOCScheduler class with offline request handling capabilities
  • Extended configuration options to support the new OOC mode
  • Added debug logging for offline request processing

Reviewed Changes

Copilot reviewed 26 out of 27 changed files in this pull request and generated 1 comment.

File | Description
xllm/core/scheduler/pd_ooc_scheduler.h | New header defining PDOOCScheduler class with offline request handling
xllm/core/scheduler/pd_ooc_scheduler.cpp | Implementation of PDOOCScheduler with local offline processing logic
xllm/core/scheduler/scheduler_factory.cpp | Factory method updated to create PDOOCScheduler when OOC is enabled
xllm/core/scheduler/continuous_scheduler.h | Added enable_pd_ooc option to scheduler configuration
xllm/core/scheduler/continuous_scheduler.cpp | Added debug logging for offline request tracking
xllm/core/distributed_runtime/* | New PDOOCService RPC service implementation
xllm/proto/disagg_pd.proto | Added PDOOCService protocol definition
xllm/core/common/* | Added enable_pd_ooc flag and option throughout configuration
xllm/pybind/* | Extended Python bindings to expose OOC configuration
xllm/server/* | Added server support for PDOOCService
Comments suppressed due to low confidence (3)

xllm/core/scheduler/pd_ooc_scheduler.cpp:1

  • The log message incorrectly states that an online request is being put into the offline queue. This should be 'waiting_priority_queue_' based on the context (line 630).

xllm/core/scheduler/pd_ooc_scheduler.cpp:1

  • This debug message suggests an 'unknown' priority strategy, but the else branch handles all non-FCFS strategies normally. The message should be more accurate, such as 'Using non-FCFS priority_strategy', or be removed entirely if this is expected behavior.

xllm/core/scheduler/pd_ooc_scheduler.cpp:1

  • The StepStatus enum is defined but never meaningfully used in the implementation. The step_status member is initialized to IDLE but not updated to reflect actual processing states, making this enum effectively dead code.


Comment on lines +158 to +197
static constexpr size_t kOutputTheadNum_ = 128; // magic num
size_t next_thread_idx = 0;
ThreadPool output_threadpools_[kOutputTheadNum_];
Copilot AI Sep 14, 2025
The variable name has a typo: 'kOutputTheadNum_' should be 'kOutputThreadNum_'.

Suggested change
-  static constexpr size_t kOutputTheadNum_ = 128; // magic num
-  size_t next_thread_idx = 0;
-  ThreadPool output_threadpools_[kOutputTheadNum_];
+  static constexpr size_t kOutputThreadNum_ = 128; // magic num
+  size_t next_thread_idx = 0;
+  ThreadPool output_threadpools_[kOutputThreadNum_];


virtual ~PDOOCService() = default;

// for prefill recv decode response
void Generation(::google::protobuf::RpcController* controller,
Collaborator

Function names should be lowercase, to stay consistent with the rest of the code.

Collaborator

Function names should be lowercase, to stay consistent with the rest of the code.

This function is inherited from proto::PDOOCService, so its name cannot be modified.

PDOOCServiceImpl::PDOOCServiceImpl(PDOOCScheduler* scheduler, Engine* engine)
: PDOOCServiceImplInterface(), scheduler_(scheduler), engine_(engine) {}

std::shared_ptr<Request> PDOOCServiceImpl::generate_request(
Collaborator

Shouldn't this function be more appropriately placed in the request builder?

std::shared_ptr<Request> request = *it;
request->update_connection_status();
if (request->finished() || request->cancelled()) {
// DVLOG << "Found a finished request in running_requests_";
Collaborator

delete useless comments.

std::shared_ptr<Request> request = nullptr;
int request_thread_idx = -1;
{
std::lock_guard<std::mutex> lock(remote_requests_map_mutex_);
liutongxuan (Collaborator) Sep 15, 2025

It is necessary to create a concurrent map/thread_local map to encapsulate the mutex and two maps.
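
A minimal sketch of what such a wrapper could look like (the type and method names here are just placeholders, not a final design):

  #include <mutex>
  #include <optional>
  #include <unordered_map>
  #include <utility>

  // Hypothetical wrapper that owns the mutex, so callers never lock it directly.
  template <typename K, typename V>
  class ConcurrentMap {
   public:
    void insert_or_assign(const K& key, V value) {
      std::lock_guard<std::mutex> lock(mutex_);
      map_.insert_or_assign(key, std::move(value));
    }

    // Returns a copy of the value if present.
    std::optional<V> find(const K& key) const {
      std::lock_guard<std::mutex> lock(mutex_);
      auto it = map_.find(key);
      if (it == map_.end()) return std::nullopt;
      return it->second;
    }

    bool erase(const K& key) {
      std::lock_guard<std::mutex> lock(mutex_);
      return map_.erase(key) > 0;
    }

   private:
    mutable std::mutex mutex_;
    std::unordered_map<K, V> map_;
  };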

PDOOCServiceImplInterface() = default;
virtual ~PDOOCServiceImplInterface() = default;

virtual void decode_recv_new_requests(const proto::DisaggRequests* request,
Collaborator

nit: we'd best use pure virtual (abstract) functions in the Interface class. And class DisaggPDServiceImplInterface also needs to be modified.

example: virtual void decode_recv_new_requests(...) = 0;
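
For example, the interface could look roughly like this (a sketch; the parameter types are taken from the snippet above):

  // Sketch: pure virtual interface, so every implementation must override it.
  class PDOOCServiceImplInterface {
   public:
    virtual ~PDOOCServiceImplInterface() = default;
    virtual void decode_recv_new_requests(const proto::DisaggRequests* request,
                                          proto::DisaggResponses* response) = 0;
  };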

const proto::DisaggRequest& req) {
// create a new request
// TODO: Should to support best_of > 1 case, now we only consider
// to allocate blocks for the first sequence in the request.
Collaborator

This function is the same as DisaggPDServiceImpl::generate_request. Maybe we need to make it a common util function.

void PDOOCServiceImpl::decode_recv_new_requests(
const proto::DisaggRequests* request,
proto::DisaggResponses* response) {
// link prefill cluster
Collaborator

This function is also the same as DisaggPDServiceImpl::decode_recv_new_requests.

OK, maybe we need to create a base class like PDServiceImpl that implements the common functions such as generate_request, decode_recv_new_requests, ...

Classes DisaggPDServiceImpl and PDOOCServiceImpl should then inherit from PDServiceImpl.
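
Something along these lines, as a rough sketch (the class layout is only a suggestion; signatures are copied from the existing code):

  // Hypothetical shared base class holding the common helpers.
  class PDServiceImpl {
   public:
    virtual ~PDServiceImpl() = default;

   protected:
    std::shared_ptr<Request> generate_request(const proto::DisaggRequest& req);
    void decode_recv_new_requests(const proto::DisaggRequests* request,
                                  proto::DisaggResponses* response);
  };

  class DisaggPDServiceImpl : public PDServiceImpl { /* ... */ };
  class PDOOCServiceImpl : public PDServiceImpl { /* ... */ };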


// TODO: support embedding later, now we only support tokens
void PDOOCServiceImpl::decode_recv_first_generation(
const proto::DisaggGenerations* request,
Collaborator

same as above.

}

bool PDOOCServiceImpl::prefill_recv_generation(
const proto::DisaggStreamGeneration* request,
Collaborator

same as above.

}

void PDOOCServiceImpl::prefill_recv_generations(
const proto::DisaggStreamGenerations* requests,
Collaborator

same as above


std::vector<Token> Sequence::get_generated_tokens() const {
std::vector<Token> generated_tokens;

Collaborator

We can use the Slice type instead of vector, like Slice<int32_t> tokens() const { return {tokens_, num_tokens_}; },
or use const std::vector<Token>&; both will avoid copying the tokens.
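
For example (a sketch; generated_tokens_ is a placeholder name for whatever member actually stores the generated tokens):

  // Option 1: return a lightweight view instead of a copy.
  Slice<Token> Sequence::get_generated_tokens() const {
    return {generated_tokens_.data(), generated_tokens_.size()};
  }

  // Option 2: return a const reference, which also avoids the copy.
  const std::vector<Token>& Sequence::get_generated_tokens() const {
    return generated_tokens_;
  }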


// if (request->offline()) {
// DVLOG << "Read an offline request from request_queue_";
// }
Collaborator

delete useless comments.

GAUGE_SET(num_free_blocks, util::max(block_manager_pool_->num_free_blocks()));
GAUGE_SET(num_used_blocks, util::min(block_manager_pool_->num_used_blocks()));
if (!batches[0].empty()) {
DVLOG << "Built a batch";
Collaborator

It seems that it's not needed either.


enum class StepStatus { ONLINE_PREFILL, OFFLINE_PREFILL, OFFLINE_DECODE, IDLE };

class PDOOCScheduler : public ContinuousScheduler {
Collaborator

nit: maybe we can add some descriptive comments here to introduce the OOC scheduler. :)
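
For example, something like this (wording only a suggestion, based on the PR description):

  // PDOOCScheduler: scheduler for Online-Offline Co-location (OOC) mode.
  // Based on DisaggPDScheduler, but offline decoding requests are kept and
  // decoded locally on the prefill (P) node instead of being forwarded to
  // decode (D) nodes.
  class PDOOCScheduler : public ContinuousScheduler {
    // ...
  };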

virtual ~PDOOCScheduler();

virtual uint32_t get_waiting_requests_num() const override {
return waiting_priority_queue_.size();
Collaborator

Note that currently we have two waiting queues:

  RequestPriorityQueue waiting_priority_queue_;
  RequestPriorityQueue waiting_priority_queue_offline_;
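
So this probably needs to count both queues, e.g. (a sketch, assuming that is the intended semantics of this metric):

  virtual uint32_t get_waiting_requests_num() const override {
    // Count online and offline waiting requests together.
    return waiting_priority_queue_.size() +
           waiting_priority_queue_offline_.size();
  }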

std::vector<std::shared_ptr<Request>>,
std::function<bool(const std::shared_ptr<Request>&,
const std::shared_ptr<Request>&)>>;
RequestPriorityQueue waiting_priority_queue_;
Collaborator

waiting_priority_queue_ and waiting_priority_queue_offline_ have already been defined in class ContinuousScheduler; the protected members can be used here.

// thread.
std::unordered_map<proto::PDOOCService_Stub*, size_t>
remote_prefill_thread_map_;
size_t next_prefill_thread_idx = 0;
Collaborator

nit: next_prefill_thread_idx -> next_prefill_thread_idx_

// TODO: maybe we should consider update info case even if info already exists
// in local.
bool PDOOCScheduler::check_remote_instance_info(
const std::string& instance_name) {
Collaborator

Like the service impl, maybe we need to create a base class for PDOOCScheduler and DisaggPDScheduler.

This is the same function as DisaggPDScheduler::check_remote_instance_info(...).


proto::PDOOCService_Stub* PDOOCScheduler::create_rpc_channel(
const std::string& instance_name) {
std::lock_guard<std::mutex> lock(instance_channel_map_mutex_);
Collaborator

Same as above, except for this line of code:
proto::PDOOCService_Stub* stub = new proto::PDOOCService_Stub(channel);
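
If we do factor out a shared base class, the differing stub type could be handled by a small templated helper, roughly like this (a sketch; the helper name and template parameters are assumptions):

  // Hypothetical helper in the shared base; StubT would be the
  // DisaggPDService or PDOOCService stub type, depending on the scheduler.
  template <typename StubT, typename ChannelT>
  StubT* create_rpc_stub(ChannelT* channel) {
    return new StubT(channel);
  }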

# local files
/local

CLAUDE.md
Collaborator

I have no idea how these files were created.
