After running the following command:
deepspeed --num_gpus=2 xtuner train qwen1_5_4b_chat_qlora_alpaca_e3.py --deepspeed deepspeed_zero2
I get this error:
[2025-08-22 20:57:15,242] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-08-22 20:57:17,020] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
[2025-08-22 20:57:17,619] [WARNING] [runner.py:220:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2025-08-22 20:57:17,619] [INFO] [runner.py:610:main] cmd = /root/autodl-tmp/xtunerenv/bin/python3.10 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMCwgMV19 --master_addr=127.0.0.1 --master_port=29500 --enable_each_rank_log=None xtuner train qwen1_5_4b_chat_qlora_alpaca_e3.py --deepspeed deepspeed_zero2
[2025-08-22 20:57:18,856] [INFO] [real_accelerator.py:260:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[2025-08-22 20:57:20,645] [INFO] [logging.py:107:log_dist] [Rank -1] [TorchCheckpointEngine] Initialized with serialization = False
[2025-08-22 20:57:21,243] [INFO] [launch.py:146:main] WORLD INFO DICT: {'localhost': [0, 1]}
[2025-08-22 20:57:21,243] [INFO] [launch.py:152:main] nnodes=1, num_local_procs=2, node_rank=0
[2025-08-22 20:57:21,243] [INFO] [launch.py:163:main] global_rank_mapping=defaultdict(<class 'list'>, {'localhost': [0, 1]})
[2025-08-22 20:57:21,243] [INFO] [launch.py:164:main] dist_world_size=2
[2025-08-22 20:57:21,243] [INFO] [launch.py:168:main] Setting CUDA_VISIBLE_DEVICES=0,1
[2025-08-22 20:57:21,243] [INFO] [launch.py:256:main] process 12257 spawned with command: ['/root/autodl-tmp/xtunerenv/bin/python3.10', '-u', 'xtuner', '--local_rank=0', 'train', 'qwen1_5_4b_chat_qlora_alpaca_e3.py', '--deepspeed', 'deepspeed_zero2']
[2025-08-22 20:57:21,244] [INFO] [launch.py:256:main] process 12258 spawned with command: ['/root/autodl-tmp/xtunerenv/bin/python3.10', '-u', 'xtuner', '--local_rank=1', 'train', 'qwen1_5_4b_chat_qlora_alpaca_e3.py', '--deepspeed', 'deepspeed_zero2']
/root/autodl-tmp/xtunerenv/bin/python3.10: can't find '__main__' module in '/root/autodl-tmp/xtuner/xtuner'
/root/autodl-tmp/xtunerenv/bin/python3.10: can't find '__main__' module in '/root/autodl-tmp/xtuner/xtuner'
[2025-08-22 20:57:22,245] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 12257
[2025-08-22 20:57:22,247] [INFO] [launch.py:319:sigkill_handler] Killing subprocess 12258
[2025-08-22 20:57:22,247] [ERROR] [launch.py:325:sigkill_handler] ['/root/autodl-tmp/xtunerenv/bin/python3.10', '-u', 'xtuner', '--local_rank=1', 'train', 'qwen1_5_4b_chat_qlora_alpaca_e3.py', '--deepspeed', 'deepspeed_zero2'] exits with return code = 1
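A possible reading of the log, offered as a guess rather than a confirmed diagnosis: the launcher spawns `python3.10 -u xtuner --local_rank=0 train ...`, so Python treats `xtuner` as a script path rather than as the installed console command. The working directory appears to be /root/autodl-tmp/xtuner, so that name resolves to the package directory /root/autodl-tmp/xtuner/xtuner, which has no `__main__.py`. The minimal sketch below reproduces the same kind of message using a temporary stand-in directory. The helper name `run_directory_as_script` and the temporary layout are hypothetical and only mimic the situation; nothing here touches XTuner or DeepSpeed.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_directory_as_script() -> subprocess.CompletedProcess:
    """Run `python -u xtuner ...` where `xtuner` is a directory without __main__.py."""
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for the package directory that shadows the `xtuner` command name.
        pkg_dir = Path(tmp) / "xtuner"
        pkg_dir.mkdir()
        (pkg_dir / "__init__.py").write_text("")  # a package, but no __main__.py

        # Same argument shape as the spawned command in the log above.
        return subprocess.run(
            [sys.executable, "-u", "xtuner", "--local_rank=0", "train"],
            cwd=tmp,
            capture_output=True,
            text=True,
        )


result = run_directory_as_script()
print("return code:", result.returncode)
print(result.stderr.strip())
```

If this reading is right, the failure happens before any XTuner or DeepSpeed training code runs, which would match both subprocesses exiting within about a second of being spawned.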
Could someone please take a look at this problem? Thanks.