Question: what could cause the loss to stop decreasing during training?
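1. Situation: training has reached iteration 2000 and the loss has stayed around 3.1-3.2 with no clear downward trend. Is it worth continuing the training?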
2. A training log excerpt follows:
08/13 02:39:13 - mmengine - INFO - Iter(train) [ 1910/17010000] lr: 1.1253e-06 eta: 859 days, 19:20:50 time: 19.4571 data_time: 16.2798 memory: 9112 loss: 3.0636 grad_norm: 0.7492
08/13 02:39:45 - mmengine - INFO - Iter(train) [ 1920/17010000] lr: 1.1312e-06 eta: 858 days, 14:46:51 time: 3.2073 data_time: 0.0119 memory: 9121 loss: 3.2175 grad_norm: 0.7430
08/13 02:40:18 - mmengine - INFO - Iter(train) [ 1930/17010000] lr: 1.1370e-06 eta: 857 days, 10:40:29 time: 3.2140 data_time: 0.0119 memory: 9120 loss: 3.2109 grad_norm: 0.7430
08/13 02:40:50 - mmengine - INFO - Iter(train) [ 1940/17010000] lr: 1.1429e-06 eta: 856 days, 7:13:46 time: 3.2293 data_time: 0.0120 memory: 9112 loss: 3.2188 grad_norm: 0.7447
08/13 02:41:22 - mmengine - INFO - Iter(train) [ 1950/17010000] lr: 1.1488e-06 eta: 855 days, 4:15:39 time: 3.2373 data_time: 0.0120 memory: 9113 loss: 3.2292 grad_norm: 0.7447
08/13 02:41:55 - mmengine - INFO - Iter(train) [ 1960/17010000] lr: 1.1547e-06 eta: 854 days, 1:34:26 time: 3.2376 data_time: 0.0119 memory: 9122 loss: 3.1011 grad_norm: 0.7422
08/13 02:42:27 - mmengine - INFO - Iter(train) [ 1970/17010000] lr: 1.1605e-06 eta: 852 days, 23:44:22 time: 3.2619 data_time: 0.0120 memory: 9119 loss: 3.2294 grad_norm: 0.7395
08/13 02:43:00 - mmengine - INFO - Iter(train) [ 1980/17010000] lr: 1.1664e-06 eta: 851 days, 22:35:19 time: 3.2796 data_time: 0.0120 memory: 9118 loss: 3.1764 grad_norm: 0.7395
08/13 02:43:33 - mmengine - INFO - Iter(train) [ 1990/17010000] lr: 1.1723e-06 eta: 850 days, 21:58:56 time: 3.2919 data_time: 0.0124 memory: 9121 loss: 3.0835 grad_norm: 0.7363
08/13 02:44:06 - mmengine - INFO - Exp name: qwen1_5_1_8b_chat_qlora_alpaca_e3_20250813_001901
08/13 02:44:06 - mmengine - INFO - Iter(train) [ 2000/17010000] lr: 1.1782e-06 eta: 849 days, 21:43:03 time: 3.2959 data_time: 0.0120 memory: 9117 loss: 3.1795 grad_norm: 0.7314
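To read the log excerpt numerically, here is a minimal Python sketch (standard library only) that pulls the iteration, learning rate, and loss out of two of the lines above and derives a few figures from them. The regex is an assumption based on the line format shown in this log, not an mmengine API, and `parse` and `summarize` are hypothetical helper names used only for illustration.

```python
import re

# Two lines copied verbatim from the log excerpt above (first and last iteration shown).
LOG = """\
08/13 02:39:13 - mmengine - INFO - Iter(train) [ 1910/17010000] lr: 1.1253e-06 eta: 859 days, 19:20:50 time: 19.4571 data_time: 16.2798 memory: 9112 loss: 3.0636 grad_norm: 0.7492
08/13 02:44:06 - mmengine - INFO - Iter(train) [ 2000/17010000] lr: 1.1782e-06 eta: 849 days, 21:43:03 time: 3.2959 data_time: 0.0120 memory: 9117 loss: 3.1795 grad_norm: 0.7314
"""

# Assumed pattern, written to match the line layout seen in this log only.
PATTERN = re.compile(
    r"Iter\(train\)\s+\[\s*(\d+)/(\d+)\]\s+lr:\s*([\d.eE+-]+).*?loss:\s*([\d.]+)"
)


def parse(log_text):
    """Return one dict per training line: iteration, total iterations, lr, loss."""
    rows = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            rows.append({
                "iter": int(m.group(1)),
                "total": int(m.group(2)),
                "lr": float(m.group(3)),
                "loss": float(m.group(4)),
            })
    return rows


def summarize(rows):
    """Compare the first and last parsed rows."""
    first, last = rows[0], rows[-1]
    return {
        # Fraction of the scheduled iterations completed so far.
        "progress": last["iter"] / last["total"],
        # Average learning-rate increase per iteration between the two rows.
        "lr_per_iter": (last["lr"] - first["lr"]) / (last["iter"] - first["iter"]),
        # Loss difference between the two rows (positive means it went up).
        "loss_change": last["loss"] - first["loss"],
    }


if __name__ == "__main__":
    stats = summarize(parse(LOG))
    print(f"progress:    {stats['progress']:.4%}")
    print(f"lr per iter: {stats['lr_per_iter']:.3e}")
    print(f"loss change: {stats['loss_change']:+.4f}")
```

On these two lines, the run is roughly 0.01% of the way through its 17,010,000 scheduled iterations, and the learning rate is still near 1e-6 and rising by about 5.9e-10 per iteration. That pattern looks like a warmup phase, though the scheduler settings are not shown here, so this is only a reading of the log.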
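3. Configuration
Fine-tuned model: Qwen2.5-1.5B-Instruct
Training data: both the questions and the replies are emotional-dialogue text generated by calling Deepseek.
Total number of QA pairs: 13700
A data sample follows: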
