This repository was archived by the owner on Dec 3, 2024. It is now read-only.

Cannot finetune the model using SFT? #5

@liuhe1305

Description


Thanks for your wonderful project!
I run the following command to finetune the model on our dataset with SFT:

```shell
python src/train_bash.py \
    --deepspeed /home/xx/llm/conf/deepspeed_config.json \
    --stage sft \
    --model_name_or_path /home/llm/models \
    --do_train \
    --report_to 'tensorboard' \
    --dataset our_dataset \
    --template chatml \
    --finetuning_type full \
    --output_dir /home/xx/llm/models/ \
    --overwrite_cache \
    --per_device_train_batch_size 8 \
    --per_device_eval_batch_size 8 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --warmup_ratio 0.20 \
    --save_strategy epoch \
    --evaluation_strategy epoch \
    --num_train_epochs 3 \
    --logging_steps 10 \
    --learning_rate 5e-6 \
    --plot_loss \
    --max_source_length=2048 \
    --max_target_length=2048 \
    --dataloader_num_workers 8 \
    --val_size 0.01 \
    --fp16 \
    --overwrite_output_dir \
    --max_grad_norm 1.0
```

However, training fails with the following error, as shown in the screenshot:

```
RuntimeError: One or more of the tensors passed to `gather` were not on the GPU while the `Accelerator` is configured for CUDA. Please move it to the GPU before calling `gather`.
```

How can I fix it? Looking forward to your reply, thanks a lot!

(screenshot of the error attached)
