This repository was archived by the owner on Dec 3, 2024. It is now read-only.
Thanks for your wonderful project!
I use the following command to fine-tune the model on our dataset with SFT:

```
python src/train_bash.py \
    --deepspeed /home/xx/llm/conf/deepspeed_config.json \
    --stage sft \
    --model_name_or_path /home/llm/models \
    --do_train \
    --report_to 'tensorboard' \
    --dataset our_dataset \
    --template chatml \
    --finetuning_type full \
    --output_dir /home/xx/llm/models/ \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.20 \
    --save_strategy epoch \
    --evaluation_strategy epoch \
    --num_train_epochs 3 \
    --logging_steps 10 \
    --learning_rate 5e-6 \
    --plot_loss \
    --max_source_length=2048 \
    --max_target_length=2048 \
    --dataloader_num_workers 8 \
    --val_size 0.01 \
    --fp16 \
    --overwrite_output_dir \
    --max_grad_norm 1.0
```

However, training fails with the following error (see screenshot):

> RuntimeError: One or more of the tensors passed to `gather` were not on the GPU while the `Accelerator` is configured for CUDA. Please move it to the GPU before calling `gather`.
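For context, the message means that some tensor reaching `Accelerator.gather` is still on the CPU while the run is configured for CUDA. A minimal sketch of the device check involved (the helper name `to_gather_device` is hypothetical, not part of this repo, and the demo below runs on CPU only):

```python
import torch

def to_gather_device(t: torch.Tensor, device: torch.device) -> torch.Tensor:
    """Return `t` on `device`, copying only if needed.

    Accelerate's `Accelerator.gather` expects every tensor it receives to
    already live on the accelerator's device (e.g. cuda:0); handing it a
    CPU tensor raises the RuntimeError quoted above.
    """
    return t if t.device == device else t.to(device)

# CPU-only demonstration of the principle (no GPU required to run this):
metric = torch.tensor([0.5])                  # e.g. a loss value created on CPU
moved = to_gather_device(metric, torch.device("cpu"))
assert moved.device.type == "cpu"             # tensor is on the target device
```

In a real run the target would be `accelerator.device` rather than a hard-coded CPU device.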
How can I fix it? Looking forward to your reply, thanks a lot!