This repository was archived by the owner on Dec 3, 2024. It is now read-only.

Cannot finetune the model using SFT? #5

@liuhe1305

Description


Thanks for your wonderful project!
I run the following command to finetune the model on our dataset with SFT:

```shell
python src/train_bash.py \
    --deepspeed /home/xx/llm/conf/deepspeed_config.json \
    --stage sft \
    --model_name_or_path /home/llm/models \
    --do_train \
    --report_to 'tensorboard' \
    --dataset our_dataset \
    --template chatml \
    --finetuning_type full \
    --output_dir /home/xx/llm/models/ \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.20 \
    --save_strategy epoch \
    --evaluation_strategy epoch \
    --num_train_epochs 3 \
    --logging_steps 10 \
    --learning_rate 5e-6 \
    --plot_loss \
    --max_source_length=2048 \
    --max_target_length=2048 \
    --dataloader_num_workers 8 \
    --val_size 0.01 \
    --fp16 \
    --overwrite_output_dir \
    --max_grad_norm 1.0
```

However, training fails with the following error, as shown in the screenshot:

```
RuntimeError: One or more of the tensors passed to `gather` were not on the GPU while the `Accelerator` is configured for CUDA. Please move it to the GPU before calling `gather`.
```

How can I fix it? Looking forward to your reply, thanks a lot!

(screenshot of the error attached)
