The Hugging Face Transformers library is built around three core modules:

- Tokenizer: converts text into numeric tensors the model can consume, with support for automatic padding and truncation
- Model: provides loading interfaces for pretrained models, covering mainstream architectures such as BERT, GPT, and T5
- Pipeline: wraps tasks such as text classification, generation, and question answering into end-to-end workflows
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
```
Accelerate inference with mixed precision:

```python
with torch.cuda.amp.autocast():
    outputs = model.generate(**inputs, max_length=100)
```
Export the PyTorch model to ONNX format:

```python
# dummy_input: an example input tensor matching the model's expected shape
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=13)
```
After deployment, inference is typically 2-3x faster, though the exact speedup depends on hardware and workload.
2.2 Triton Inference Server
Build a Docker image to deploy the service:

```dockerfile
FROM nvcr.io/nvidia/tritonserver:22.07-py3
COPY model_repository /models
```
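Triton expects each model in the copied repository to follow a fixed directory layout with a `config.pbtxt` describing its inputs and outputs. An illustrative sketch (the model name, backend, tensor names, and shapes below are hypothetical, not from the original):

```
model_repository/
└── my_model/
    ├── config.pbtxt
    └── 1/
        └── model.onnx
```

```protobuf
# config.pbtxt: declares how Triton should serve this model
name: "my_model"
platform: "onnxruntime_onnx"
max_batch_size: 8
input [
  {
    name: "input_ids"
    data_type: TYPE_INT64
    dims: [ -1 ]
  }
]
output [
  {
    name: "logits"
    data_type: TYPE_FP32
    dims: [ -1 ]
  }
]
```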
```python
from datasets import load_dataset

# remove_html_tags is assumed to be defined elsewhere (strips HTML from "text")
dataset = (
    load_dataset("imdb")
    .filter(lambda x: len(x["text"]) > 50)  # drop very short samples
    .map(remove_html_tags)
)
```
```python
# Data augmentation via back-translation; `translator` is assumed to be a
# zh<->en translation function defined elsewhere.
def back_translate(example):
    en_text = translator(example["text"], src="zh", tgt="en")
    return {"text": translator(en_text, src="en", tgt="zh")}

augmented_dataset = dataset.map(back_translate)
```
DeepSpeed ZeRO-3 configuration (`ds_config.json`):

```json
{
  "zero_optimization": {
    "stage": 3,
    "offload_optimizer": {
      "device": "cpu"
    }
  }
}
```
4.2 Multi-Node Training Launch Command
```shell
# Multi-node launches also expect a hostfile describing the worker machines
deepspeed --num_nodes 4 --num_gpus 8 train.py
```
```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                 # rank of the low-rank update
    target_modules=["query_key_value"],  # attention projections to adapt
    lora_alpha=32,                       # scaling factor
)
```
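For intuition on what `r` and `lora_alpha` control: LoRA leaves the frozen weight W untouched and adds a scaled low-rank product, W' = W + (alpha / r) · B·A. A toy pure-Python sketch (not the PEFT implementation; names are illustrative):

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]

def lora_update(W, A, B, r, alpha):
    """Apply the scaled low-rank update W + (alpha / r) * B @ A."""
    scale = alpha / r
    delta = matmul(B, A)  # (d_out x d_in) low-rank update
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(W, delta)]

# 2x2 frozen weight, rank-1 adapters A (r x d_in) and B (d_out x r)
W = [[1.0, 0.0], [0.0, 1.0]]
A = [[1.0, 1.0]]
B = [[1.0], [2.0]]
print(lora_update(W, A, B, r=1, alpha=2))  # → [[3.0, 2.0], [4.0, 5.0]]
```

Only A and B are trained, so the number of trainable parameters scales with r rather than with the full weight dimensions.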
```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="output",            # required by TrainingArguments
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    learning_rate=2e-5,
    fp16=True,                      # mixed-precision training
)
```
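The batch size and accumulation settings above combine into an effective global batch size: per-device batch × accumulation steps × number of GPUs. A quick sketch of the arithmetic (helper name is illustrative):

```python
def effective_batch_size(per_device, accum_steps, num_gpus):
    """Global batch size seen by each optimizer step."""
    return per_device * accum_steps * num_gpus

print(effective_batch_size(4, 8, 1))  # → 32 on a single GPU
print(effective_batch_size(4, 8, 8))  # → 256 across 8 GPUs
```

Gradient accumulation thus trades extra forward/backward passes for a larger effective batch without increasing per-step memory.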
```python
# Illustrative call; in practice mergekit merges are usually driven by a
# YAML config and the mergekit-yaml CLI rather than a direct Python call.
from mergekit import merge

merged_model = merge([model1, model2], weights=[0.7, 0.3], method="linear")
```
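Conceptually, a "linear" merge is just a weighted average of corresponding parameters. A minimal pure-Python sketch, using plain dicts of lists as stand-ins for real state dicts:

```python
def linear_merge(state_dicts, weights):
    """Weighted average of corresponding parameters across models."""
    merged = {}
    for name in state_dicts[0]:
        merged[name] = [
            sum(w * sd[name][i] for sd, w in zip(state_dicts, weights))
            for i in range(len(state_dicts[0][name]))
        ]
    return merged

m1 = {"layer.weight": [1.0, 2.0]}
m2 = {"layer.weight": [3.0, 6.0]}
print(linear_merge([m1, m2], weights=[0.7, 0.3]))  # ≈ [1.6, 3.2]
```

Weights are commonly chosen to sum to 1 so the merged parameters stay on the same scale as the inputs.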
Export with the `transformers.onnx` conversion tool (the resulting ONNX model can then be converted to Core ML, e.g. with Apple's coremltools):

```shell
python -m transformers.onnx --model=distilbert-base-uncased \
    --feature=sequence-classification \
    coreml/
```
```python
from transformers import AutoModelForCausalLM

# 8-bit loading requires the bitsandbytes package
model = AutoModelForCausalLM.from_pretrained(
    "gpt2",
    load_in_8bit=True,
    device_map="auto",
)
```
```python
from optimum.gptq import GPTQQuantizer

quantizer = GPTQQuantizer(bits=4, dataset="c4")
quantized_model = quantizer.quantize_model(model, tokenizer)
```
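GPTQ itself uses calibration data and error-compensating rounding, but the core idea of low-bit quantization can be shown with plain round-to-nearest uniform quantization. A toy sketch for intuition (helper names are illustrative, not the optimum API):

```python
def quantize(values, bits=4):
    """Symmetric round-to-nearest quantization of one weight row."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for 4-bit signed
    scale = max(abs(v) for v in values) / qmax
    q = [round(v / scale) for v in values]  # small signed integers
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from integers and a scale."""
    return [x * scale for x in q]

w = [0.12, -0.5, 0.33, 0.07]
q, s = quantize(w, bits=4)
w_hat = dequantize(q, s)  # reconstruction error bounded by scale / 2
```

Each weight is stored as a 4-bit integer plus a shared scale, which is where the memory savings come from; GPTQ improves on this by adjusting remaining weights to compensate for each rounding error.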
```python
# distillation_loss is a user-defined soft-target loss; teacher_outputs come
# from a frozen teacher forward pass.
with torch.no_grad():
    teacher_outputs = teacher_model(**inputs)

student_outputs = student_model(**inputs)
loss = distillation_loss(
    student_outputs.logits,
    teacher_outputs.logits,
    temperature=2.0,
)
```
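One common formulation of such a loss is the temperature-scaled KL divergence from Hinton et al. (2015). A minimal pure-Python sketch over single logit vectors (the real version operates on batched tensors):

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    exps = [math.exp(l / temperature) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p = softmax(teacher_logits, temperature)  # soft teacher targets
    q = softmax(student_logits, temperature)  # soft student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    return kl * temperature ** 2

print(distillation_loss([1.0, 2.0], [1.0, 2.0]))  # identical logits → 0.0
print(distillation_loss([3.0, 0.0], [0.0, 3.0]))  # mismatched logits → positive
```

Raising the temperature flattens both distributions, letting the student learn from the teacher's relative rankings of wrong answers, not just its top prediction.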
```python
from evaluate import load

bleu = load("bleu")
score = bleu.compute(
    predictions=preds,  # list of generated strings
    references=refs,    # list of lists of reference strings
)
```
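The `evaluate` metric computes full BLEU (clipped n-gram precisions up to 4-grams plus a brevity penalty). Its core ingredient, modified unigram precision, can be sketched in a few lines (helper name is illustrative):

```python
from collections import Counter

def modified_unigram_precision(prediction, reference):
    """Clipped unigram precision: each predicted token counts at most as
    often as it appears in the reference."""
    pred_counts = Counter(prediction.split())
    ref_counts = Counter(reference.split())
    clipped = sum(min(c, ref_counts[tok]) for tok, c in pred_counts.items())
    return clipped / sum(pred_counts.values())

print(modified_unigram_precision("the the the cat", "the cat sat"))  # → 0.5
```

The clipping is what stops degenerate outputs like "the the the the" from scoring highly against any reference containing "the".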
Using Hugging Face's `EvaluationSuite` mechanism (the suite name must correspond to a suite published on the Hub):

```python
from evaluate import EvaluationSuite

suite = EvaluationSuite.load("ethics")
results = suite.run(model)
```