Hi, does evaluating a 7B or 8B model with OpenCompass really require high-end GPUs? I can't get an evaluation to run even with 2x RTX 3090s: in debug mode it just hangs at "Starting inference process...". If I switch to a 1.8B model, inference starts almost immediately. If this really needs A100-class cards, how am I supposed to evaluate 70B+ models later on?
python run.py --models hf_deepseek_r1_distill_qwen_7b \
--custom-dataset-path /root/autodl-fs/models/FinCorpus/opencompass_eval.jsonl \
--custom-dataset-data-type mcq \
--custom-dataset-infer-method gen \
--max-out-len 16 \
--hf-num-gpus 2 \
--generation-kwargs do_sample=True temperature=0.6 \
--debug
07/18 17:39:09 - OpenCompass - INFO - Loading hf_deepseek_r1_distill_qwen_7b: /root/autodl-tmp/opencompass/opencompass/configs/./models/deepseek/hf_deepseek_r1_distill_qwen_7b.py
07/18 17:39:09 - OpenCompass - INFO - Loading example: /root/autodl-tmp/opencompass/opencompass/configs/./summarizers/example.py
07/18 17:39:09 - OpenCompass - INFO - Current exp folder: outputs/default/20250718_173909
07/18 17:39:09 - OpenCompass - WARNING - SlurmRunner is not used, so the partition argument is ignored.
07/18 17:39:09 - OpenCompass - INFO - Partitioned into 1 tasks.
07/18 17:39:11 - OpenCompass - WARNING - Only use 1 GPUs for total 2 available GPUs in debug mode.
07/18 17:39:11 - OpenCompass - INFO - Task [deepseek-r1-distill-qwen-7b-hf/opencompass_eval]
Sliding Window Attention is enabled but not implemented for `sdpa`; unexpected results may be encountered.
Loading checkpoint shards: 100%|██████████| 2/2 [03:55<00:00, 117.90s/it]
07/18 17:43:09 - OpenCompass - INFO - using stop words: ['<|end▁of▁sentence|>']
Map: 100%|██████████| 1743/1743 [00:00<00:00, 14292.70 examples/s]
07/18 17:43:09 - OpenCompass - INFO - Start inferencing [deepseek-r1-distill-qwen-7b-hf/opencompass_eval]
[2025-07-18 17:43:09,617] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting build dataloader
[2025-07-18 17:43:09,618] [opencompass.openicl.icl_inferencer.icl_gen_inferencer] [INFO] Starting inference process...
0%| | 0/218 [00:00<?, ?it/s]07/18 17:43:09 - OpenCompass - INFO - Generation Args of Huggingface:
07/18 17:43:09 - OpenCompass - INFO - {'stopping_criteria': [<opencompass.models.huggingface_above_v4_33._get_stopping_criteria.<locals>.MultiTokenEOSCriteria object at 0x7fd50750b0d0>], 'max_new_tokens': 16384, 'pad_token_id': 151643}
/root/autodl-tmp/conda/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:631: UserWarning: `do_sample` is set to `False`. However, `temperature` is set to `0.6` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `temperature`.
warnings.warn(
/root/autodl-tmp/conda/envs/opencompass/lib/python3.10/site-packages/transformers/generation/configuration_utils.py:636: UserWarning: `do_sample` is set to `False`. However, `top_p` is set to `0.95` -- this flag is only used in sample-based generation modes. You should set `do_sample=True` or unset `top_p`.
warnings.warn(
0%|
At that point I killed it with Ctrl+C. I've tried two models, DeepSeek-R1-Distill-Qwen-7B and DeepSeek-R1-Distill-Llama-8B, and neither ever got past this point. One thing I noticed in the log: even though I passed --max-out-len 16 and --generation-kwargs do_sample=True, the printed generation args show max_new_tokens: 16384 and the warnings say do_sample is False, so my flags don't seem to have taken effect.
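For context on why I expected 2x 24 GB to be enough, here is my back-of-the-envelope estimate (my own rough numbers: fp16/bf16 weights only, ignoring KV cache and activations):

```python
def weights_gib(n_params_billion: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed for model weights in GiB (fp16/bf16 = 2 bytes/param)."""
    return n_params_billion * 1e9 * bytes_per_param / 1024**3

# Rough weight footprint vs. the 2x24 GiB available on two 3090s:
for name, n in [("1.8B", 1.8), ("7B", 7), ("8B", 8), ("70B", 70)]:
    print(f"{name}: ~{weights_gib(n):.1f} GiB of weights")
```

By this estimate a 7B model's weights (~13 GiB) should fit on even a single 3090, so I assumed memory wasn't the bottleneck, but maybe the KV cache for 16384 new tokens changes that?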
If you need the dataset to reproduce this, you can grab opencompass_eval.jsonl from here: https://pan.baidu.com/s/1-rc8N-ZzkyjIzHPFqs6DkA?pwd=578v — the file has already been preprocessed to match the framework's required format.