The Hugging Face Transformers library is built around three core modules:

- Tokenizer: converts text into numeric tensors the model can consume, with automatic padding and truncation
- Model: a loading interface for pretrained models, covering mainstream architectures such as BERT, GPT, and T5
- Pipeline: wraps end-to-end workflows for tasks such as text classification, generation, and question answering
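As a concrete illustration of the Tokenizer module's padding and truncation, here is a minimal sketch (the `bert-base-uncased` checkpoint and the sentences are illustrative choices, not from the original):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

# Two sentences of different lengths: padding aligns them into one batch,
# truncation caps each sequence at max_length tokens
batch = tokenizer(
    ["short text", "a much longer piece of example text"],
    padding=True,
    truncation=True,
    max_length=8,
    return_tensors="pt",
)
print(batch["input_ids"].shape)
```

Both rows of `input_ids` come back the same length, with the shorter one padded and the longer one truncated.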
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
```

Accelerate inference with mixed precision:

```python
import torch

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)  # example prompt
with torch.cuda.amp.autocast():
    outputs = model.generate(**inputs, max_length=100)
```
Export the PyTorch model to ONNX format:

```python
# dummy_input: a sample tensor matching the model's expected input
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)
```

After deployment, inference is typically 2-3x faster.
2.2 Triton Inference Server
Build a Docker image to deploy the service:

```dockerfile
FROM nvcr.io/nvidia/tritonserver:22.07-py3
COPY model_repository /models
```
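Triton additionally expects each model under `model_repository` to carry a `config.pbtxt`. A minimal example for an ONNX-backed model (the model name and batch size here are illustrative; Triton can auto-complete many fields for ONNX models):

```
name: "gpt2_onnx"
platform: "onnxruntime_onnx"
max_batch_size: 8
```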
```python
from datasets import load_dataset

dataset = (
    load_dataset("imdb")
    .filter(lambda x: len(x["text"]) > 50)  # drop very short reviews
    .map(remove_html_tags)                  # remove_html_tags: user-defined cleaning function
)
```

```python
def back_translate(example):
    # zh -> en -> zh round trip produces a paraphrased variant
    en_text = translator(example["text"], src="zh", tgt="en")
    return {"text": translator(en_text, src="en", tgt="zh")}

augmented_dataset = dataset.map(back_translate)
```

The DeepSpeed configuration file, ds_config.json:
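The `translator` helper above is left abstract. One way to realize it with translation pipelines (the Helsinki-NLP OPUS-MT checkpoints named here are an assumption, not part of the original):

```python
from transformers import pipeline

# Assumption: OPUS-MT checkpoints for the zh<->en pair (require sentencepiece)
zh_en = pipeline("translation", model="Helsinki-NLP/opus-mt-zh-en")
en_zh = pipeline("translation", model="Helsinki-NLP/opus-mt-en-zh")

def back_translate(example):
    # zh -> en -> zh round trip yields a paraphrase of the input
    en = zh_en(example["text"])[0]["translation_text"]
    return {"text": en_zh(en)[0]["translation_text"]}

print(back_translate({"text": "今天天气很好"})["text"])
```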
```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```

4.2 Multi-Node Training Launch Command
```shell
deepspeed --num_nodes 4 --num_gpus 8 train.py
```
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["query_key_value"],
)
```
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",  # added here; not in the original snippet
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    fp16=True,
)
```
```python
from mergekit import merge

merged_model = merge(
    [model1, model2],
    weights=[0.7, 0.3],
    method="linear",
)
```
Convert the model with the Core ML export tool (Hugging Face's `exporters` package, which mirrors the `transformers.onnx` interface):

```shell
python -m exporters.coreml --model=distilbert-base-uncased \
    --feature=sequence-classification \
    coreml/
```
```python
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_8bit=True,   # requires the bitsandbytes package
    device_map="auto",
)
```
```python
from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(bits=4, dataset="c4")
quantized_model = quantizer.quantize_model(model, tokenizer)
```
```python
# Distillation step: the student mimics the teacher's output distribution
student_outputs = student_model(**inputs)
loss = distillation_loss(            # distillation_loss: user-defined helper
    student_outputs.logits,
    teacher_outputs.logits,
    temperature=2.0,
)
```
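The `distillation_loss` helper is not shown; a common implementation is temperature-scaled KL divergence between softened logits (one standard choice, not necessarily what the original used):

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Soften both distributions with the temperature, then match them with KL
    # divergence; the T^2 factor keeps gradients comparable across temperatures
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2

loss = distillation_loss(torch.randn(4, 10), torch.randn(4, 10))
print(loss.item())
```

A higher temperature exposes more of the teacher's "dark knowledge" in the relative probabilities of non-top classes.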
```python
from evaluate import load

# Illustrative predictions/references; real values come from your model's output
preds = ["the cat sat on the mat"]
refs = [["the cat sat on the mat"]]

bleu = load("bleu")
score = bleu.compute(
    predictions=preds,
    references=refs,
)
```

Use Hugging Face's Ethics evaluation suite:
```python
from evaluate import EvaluationSuite

suite = EvaluationSuite.load("ethics")
results = suite.run(model)
```