Stable Diffusion Fine-Tuning Summary

Published: 2024-02-02

Source: CSDN

Author: 江小皮不皮

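Stable Diffusion

Model Types

SD

SD is a latent diffusion model that generates images from text by injecting a text condition into its UNet. Its core comes from the Latent Diffusion work. Conventional diffusion models generate in pixel space, whereas Latent Diffusion generates in latent space: an autoencoder first compresses the image into a latent representation, the diffusion model then generates the image latents, and the autoencoder's decoder finally turns those latents into the output image.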
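
As a rough illustration of why working in latent space is cheaper, the sketch below compares the number of values in a pixel-space image with the number in its latent. The downsampling factor of 8 and the 4 latent channels are the settings of the SD 1.x/2.x autoencoder; the function names are hypothetical and exist only for this example.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Return the (channels, height, width) of the latent for an RGB image.

    Assumes the SD autoencoder settings: 8x spatial downsampling, 4 channels.
    """
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, downsample=8, latent_channels=4):
    """Ratio of pixel-space values (3 x H x W) to latent-space values."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (3 * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the diffusion model operates on 48 times fewer values than a pixel-space model would at the same image size.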

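SD2

SD 2.0 differs from SD 1.x mainly in model architecture and training data. On the architecture side, SD 1.x uses OpenAI's CLIP ViT-L/14 as its text encoder, with 123.65M parameters. SD 2.0 switches to a larger text encoder, the CLIP ViT-H/14 model trained with OpenCLIP on the laion-2b dataset, with 354.03M parameters, roughly 3 times the size of the original text encoder.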

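SDXL

Stable Diffusion XL (SDXL) is a powerful text-to-image model that iterates on earlier Stable Diffusion models in three key ways:
1. The UNet is 3 times larger, and SDXL combines a second text encoder (OpenCLIP ViT-bigG/14) with the original text encoder, which significantly increases the parameter count.
2. It introduces size and crop conditioning, which keeps training data from being discarded and gives more control over how generated images are cropped.
3. It introduces a two-stage process: the base model (which can also run standalone) generates an image that is fed to a refiner model, which adds extra high-quality detail.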

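SDXL LCM (Latent Consistency Model)

The SDXL latent consistency model (LCM), proposed in "Latent Consistency Models: Synthesizing High-Resolution Images with Few-Step Inference", speeds up image generation by cutting the number of sampling steps. It distills the original SDXL model into a version that needs only 4 to 8 steps instead of 25 to 50. This suits applications that need fast generation without a large loss of quality. The speedup comes from the lower step count; the figures "50% smaller and 60% faster than the original SDXL" that appear in some descriptions of LCM are the ones reported for SSD-1B (see SDXL Distilled), not for LCM.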

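SDXL Distilled

SDXL Distilled refers to versions of SDXL that have been distilled for a specific purpose. For example, the Segmind Stable Diffusion model SSD-1B is a distilled version of SDXL that is 50% smaller and 60% faster while retaining high-quality text-to-image generation. It is especially useful where speed is critical but image quality cannot be compromised.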

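SDXL Turbo

SDXL Turbo is a new version of SDXL 1.0 developed for "real-time synthesis": it generates images very quickly, enabled by a new training method called Adversarial Diffusion Distillation (ADD). Like other latent diffusion models, it relies on a lossy autoencoder, so some information is lost when images are encoded and decoded.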

Installing accelerate

Install with pip
pip install accelerate

Configuration

accelerate config
In which compute environment are you running?
This machine    (choose "This machine" when running on a local server)
Which type of machine are you using?
multi-GPU    (choose multi-GPU for one machine with several GPUs; choose the first option for a single GPU)
How many different machines will you use (use more than 1 for multi-node training)? [1]: 1    (number of machines used for training)
Do you wish to optimize your script with torch dynamo?[yes/NO]:
Do you want to use DeepSpeed? [yes/NO]:
Do you want to use FullyShardedDataParallel? [yes/NO]:
Do you want to use Megatron-LM ? [yes/NO]:
How many GPU(s) should be used for distributed training? [1]:2    (number of GPUs to use)
What GPU(s) (by id) should be used for training on this machine as a comma-seperated list? [all]:all    (use all of them for training)
Do you wish to use FP16 or BF16 (mixed precision)?
fp16    (training precision)
accelerate configuration saved at /root/.cache/huggingface/accelerate/default_config.yaml    (location of the saved config file; it can be edited)

Viewing the Configuration

accelerate env

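Installing diffusers

Note: install the latest version from source; otherwise the example training scripts fail their minimum-version check.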
```
git clone https://github.com/huggingface/diffusers
cd diffusers
pip install -e .
```
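Data Processing

Filter out images that are low resolution or low quality (for example, 768*768 images smaller than 100 KB), corrupted, or unrelated to the task. Then remove watermarks, distracting text, and similar artifacts from the remaining data. After that, annotation can begin. Annotation is either automatic or manual: automatic annotation relies mainly on models such as BLIP and Waifu Diffusion 1.4, while manual annotation relies on human annotators.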
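
The filtering rule above can be sketched as a small predicate. The thresholds (768 pixels on the shorter side, 100 KB minimum file size) come from the example in the text; the function name and the record layout are assumptions made for illustration, and reading the real width, height, and file size from disk is left to whichever image library you use.

```python
def keep_image(width, height, size_bytes, min_side=768, min_bytes=100 * 1024):
    """Return True if an image passes the resolution and file-size filters.

    min_side and min_bytes are illustrative thresholds, not fixed rules.
    """
    if min(width, height) < min_side:
        return False  # resolution too low
    if size_bytes < min_bytes:
        return False  # very small file, likely heavily compressed
    return True


# Hypothetical records: (file name, width, height, size in bytes)
records = [
    ("a.png", 768, 768, 350_000),
    ("b.png", 768, 768, 60_000),   # under 100 KB
    ("c.png", 512, 512, 400_000),  # shorter side under 768 pixels
]
kept = [name for name, w, h, size in records if keep_image(w, h, size)]
print(kept)  # ['a.png']
```

Checks for corrupted files, watermarks, and task relevance are not covered by this sketch and still need separate handling.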

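BLIP

BLIP supports:
1. Image captioning
2. Open-ended visual question answering
3. Multimodal / unimodal feature extraction
4. Image-text matching

Data considerations
1. For a character subject, 10-20 high-quality images are usually enough; for an art style, 100-200 high-quality images; for an abstract concept, at least 200 images.
2. Whether the subject is a character, an art style, or an abstract concept, the dataset must be diverse. For a cat girl, for example, vary the pose, the angle, and full-body versus half-body framing.
3. Every image should meet our own aesthetic and quality standards.

Model considerations
1. The choice of base model is critical: an SDXL LoRA inherits many of its low-level capabilities and basic concepts from the base model. The base model's strengths also need to match the training subject, whether that is a character, an art style, or an abstract concept. To train an anime-style LoRA, choose an anime-style base model; to train a photorealistic LoRA, choose a photorealistic base model; and so on.
2. Model files come with either a .safetensors or a .ckpt suffix. safetensors is a serialization format that stores only tensors and cannot execute code when loaded. A .ckpt file is a pickle-based PyTorch checkpoint, which can run arbitrary code on load, so prefer safetensors for files from untrusted sources.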

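Model Optimization

1. Pruning: pruned checkpoints (tagged pruned) take less storage and are reported to generalize well.
2. EMA: the exponential moving average (EMA) is a common technique for optimizing neural networks. It smooths parameter updates, damping fluctuation and oscillation during training and improving the model's robustness and generalization.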
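
The EMA update itself is a one-line rule, sketched below in plain Python. The decay values and the toy parameter sequence are assumptions chosen for illustration; real training code applies the same rule to every weight tensor after each optimizer step.

```python
def ema_update(ema_params, params, decay=0.999):
    """Return new EMA parameters: decay * ema + (1 - decay) * current."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


# A toy parameter that oscillates around 1.0 during "training".
raw_values = [1.5, 0.5, 1.5, 0.5, 1.5, 0.5]
ema = [1.0]
for value in raw_values:
    ema = ema_update(ema, [value], decay=0.9)

# The averaged value stays much closer to 1.0 than the raw values do.
print(ema[0])
```

A decay closer to 1 averages over a longer history and smooths more strongly, at the cost of reacting more slowly to genuine changes in the weights.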

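Fine-Tuning Methods

The mainstream methods for training Stable Diffusion models today are:
1. Full FineTune: full-parameter training on data in image + caption form.
2. DreamBooth: a training technique that updates the entire diffusion model by training on just a few images of a subject or style. It works by associating a special word in the prompt with the example images.
3. Textual Inversion: a training technique for personalizing an image generation model with a few example images of what you want it to learn. It works by learning and updating text embeddings (the new embedding is tied to a special word that you must use in the prompt) to match the example images you provide.
4. LoRA: LoRA (Low-Rank Adaptation of Large Language Models) is a popular lightweight training technique that significantly reduces the number of trainable parameters. It works by inserting a small number of new weights into the model and training only those. This makes LoRA faster to train and more memory-efficient, and it produces smaller weight files (a few hundred MB) that are easier to store and share. LoRA can also be combined with other training techniques such as DreamBooth to speed up training.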
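
To make the LoRA parameter saving concrete, the sketch below counts trainable parameters for a single weight matrix. LoRA keeps the original matrix W (d_out x d_in) frozen and learns a low-rank update B A, with B of shape d_out x r and A of shape r x d_in. The layer size and rank used here are illustrative assumptions, not values taken from any particular Stable Diffusion layer.

```python
def full_params(d_out, d_in):
    """Parameters in a dense weight matrix of shape (d_out, d_in)."""
    return d_out * d_in


def lora_trainable_params(d_out, d_in, rank):
    """Parameters in the low-rank factors B (d_out x rank) and A (rank x d_in)."""
    return rank * (d_out + d_in)


d_out, d_in, rank = 1024, 1024, 4  # illustrative sizes
full = full_params(d_out, d_in)
lora = lora_trainable_params(d_out, d_in, rank)
print(full, lora, lora / full)  # 1048576 8192 0.0078125
```

With these numbers LoRA trains under 1% of the parameters of the layer it adapts, which is why the saved weight files are so much smaller than a full checkpoint.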

DreamBooth Fine-Tuning

Preparing the Data

https://huggingface.co/datasets/diffusers/dog-example

Training Script

export MODEL_NAME="stable-diffusion-2"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="path-to-save-model"
accelerate launch train_dreambooth.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--learning_rate=5e-6 \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=400

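Parameter notes:
train_dreambooth.py: the script is at https://github.com/huggingface/diffusers/blob/main/examples/dreambooth/train_dreambooth.py
pretrained_model_name_or_path: path to the base model.
instance_data_dir: location of the training images.
output_dir: where the fine-tuned model is saved.
instance_prompt: the prompt for the training images. It contains a rare token (sks here) that identifies the new subject. When prior preservation is enabled with --with_prior_preservation (it is not enabled in the command above), Stable Diffusion first generates images of the existing class as prior knowledge, and training combines the loss on the new instance with a prior preservation loss on those class images. This keeps features of the new instance from leaking into other generations.
resolution: image size; it should match the resolution of the base model (768 in the command above).
train_batch_size: batch size per device.
gradient_accumulation_steps: accumulates gradients to work around insufficient GPU memory. Suppose the original batch size is 6 and there are 24 samples: the parameters are updated 24/6 = 4 times per epoch. If GPU memory cannot hold a batch of 6 and the batch size is lowered to 3, the number of updates becomes 24/3 = 8. Setting train_batch_size=3 with gradient_accumulation_steps=2 restores the original behavior: each forward and backward pass still processes 3 samples, but the gradients of two consecutive batches are accumulated before the optimizer steps, so the effective batch size is 3 × 2 = 6 and the number of updates is 24/3/2 = 4. loss.backward() is called on every batch as usual; the weights are updated only once every gradient_accumulation_steps batches.
learning_rate: learning rate.
lr_scheduler: learning-rate schedule.
lr_warmup_steps: number of warmup steps.
max_train_steps: total number of training steps.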
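
The gradient accumulation arithmetic can be checked with a small, framework-free sketch. It uses a one-parameter linear model with a mean-squared-error loss as a stand-in for the real network; the function names and the toy data are assumptions for illustration only. The key point is that averaging the gradients of two micro-batches of 3 gives the same gradient as one batch of 6.

```python
import math


def updates_per_epoch(num_samples, micro_batch_size, accumulation_steps=1):
    """Number of optimizer updates in one pass over the data."""
    micro_batches = math.ceil(num_samples / micro_batch_size)
    return math.ceil(micro_batches / accumulation_steps)


def grad_mse(w, xs, ys):
    """Gradient of mean((w * x - y) ** 2) with respect to the scalar weight w."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def accumulated_grad(w, xs, ys, micro_batch_size):
    """Accumulate micro-batch gradients, each scaled by 1 / number of micro-batches."""
    steps = len(xs) // micro_batch_size
    total = 0.0
    for i in range(steps):
        lo, hi = i * micro_batch_size, (i + 1) * micro_batch_size
        total += grad_mse(w, xs[lo:hi], ys[lo:hi]) / steps
    return total


print(updates_per_epoch(24, 6))     # 4
print(updates_per_epoch(24, 3))     # 8
print(updates_per_epoch(24, 3, 2))  # 4

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0]
w = 0.5
full = grad_mse(w, xs, ys)            # one batch of 6
acc = accumulated_grad(w, xs, ys, 3)  # two micro-batches of 3
print(full, acc)
```

The equality holds exactly here because both micro-batches have the same size; only the peak memory per step differs between the two approaches.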

Inference

from diffusers import StableDiffusionPipeline
import torch
model_id = "stable_finetine/path-to-save-model"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to("cuda")
prompt = "A photo of sks dog in a bucket"  # include the rare token from --instance_prompt
image = pipe(prompt, num_inference_steps=50, guidance_scale=7.5).images[0]
image.save("dog-bucket2.png")

Model Conversion Script

python  convert_diffusers_to_original_stable_diffusion.py --model_path path-to-save-model --checkpoint_path dreambooth_dog.safetensors --use_safetensors

convert_diffusers_to_original_stable_diffusion.py: the script is at https://github.com/huggingface/diffusers/blob/main/scripts/convert_diffusers_to_original_stable_diffusion.py
model_path: the model produced by DreamBooth training.
checkpoint_path: an output file name of your choice.

DreamBooth + LoRA Fine-Tuning

Training Script

export MODEL_NAME="stable-diffusion-2"
export INSTANCE_DIR="dog"
export OUTPUT_DIR="path-to-save-lora-model"
accelerate launch train_dreambooth_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME  \
--instance_data_dir=$INSTANCE_DIR \
--output_dir=$OUTPUT_DIR \
--instance_prompt="a photo of sks dog" \
--resolution=768 \
--train_batch_size=1 \
--gradient_accumulation_steps=1 \
--checkpointing_steps=100 \
--learning_rate=1e-4 \
--report_to="wandb" \
--lr_scheduler="constant" \
--lr_warmup_steps=0 \
--max_train_steps=500 \
--validation_prompt="A photo of sks dog in a bucket" \
--validation_epochs=50 \
--seed="0" \
--mixed_precision "no"

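mixed_precision: when this is fp16 (the value selected in accelerate config above), training fails with "ValueError: Attempting to unscale FP16 gradients". Set it to no, which trains in fp32.
train_dreambooth_lora.py: the script already saves the LoRA weights in a loadable format, so no separate conversion step is needed.

Inference Script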

from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
import torch
pipe = DiffusionPipeline.from_pretrained("stable-diffusion-2", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.to("cuda")
# pipe.unet.load_attn_procs("path-to-save-lora-model")
pipe.load_lora_weights("path-to-save-lora-model")
image = pipe("A picture of a sks dog in a bucket", num_inference_steps=25).images[0]
image.save("dog-bucket-lora.png")

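Note: the original documentation uses:

pipe.unet.load_attn_procs("path-to-save-lora-model")

However, inference after loading the weights this way showed no noticeable effect, which suggests they were never actually loaded. The diffusers authors later added a new method, and testing confirms that it works: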

pipe.load_lora_weights("path-to-save-lora-model")

Full FineTune

Data Format

folder/train/metadata.jsonl
folder/train/0001.png
folder/train/0002.png
folder/train/0003.png

metadata.jsonl

{"file_name": "0001.png", "additional_feature": "This is a first value of a text feature you added to your images"}
{"file_name": "0002.png", "additional_feature": "This is a second value of a text feature you added to your images"}
{"file_name": "0003.png", "additional_feature": "This is a third value of a text feature you added to your images"}
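
A metadata.jsonl file in this layout can be generated with the standard library, as in the sketch below. The helper name, the captions, and the temporary directory are hypothetical. The caption key defaults to "text" here because the LoRA command later in this document passes --caption_column="text"; if you keep a different key such as "additional_feature", the caption column given to the training script presumably has to match it.

```python
import json
import tempfile
from pathlib import Path


def write_metadata(train_dir, captions, caption_key="text"):
    """Write one JSON object per line: {"file_name": ..., caption_key: ...}."""
    train_dir = Path(train_dir)
    train_dir.mkdir(parents=True, exist_ok=True)
    out_path = train_dir / "metadata.jsonl"
    with out_path.open("w", encoding="utf-8") as f:
        for file_name, caption in sorted(captions.items()):
            record = {"file_name": file_name, caption_key: caption}
            f.write(json.dumps(record, ensure_ascii=False) + "\n")
    return out_path


# Hypothetical captions; the image files must already sit next to metadata.jsonl.
captions = {
    "0001.png": "a photo of a red dragon",
    "0002.png": "a drawing of a blue turtle",
}
root = Path(tempfile.mkdtemp())  # stand-in for your dataset folder
path = write_metadata(root / "train", captions)
lines = path.read_text(encoding="utf-8").splitlines()
print(lines[0])
```

Each line is an independent JSON object, so the file can be appended to or filtered line by line without parsing it as a whole.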

Alternatively, use a ready-made dataset from Hugging Face:

pokemon-blip-captions
data
train-00000-of-00001-566cc9b19d7203f8.parquet
dataset_infos.json

Training Script

export MODEL_NAME="stable-diffusion-2"
export DATASET_NAME="pokemon-blip-captions"
accelerate launch --mixed_precision="fp16"  train_full_finetune.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME \
--use_ema \
--resolution=768 --center_crop --random_flip \
--train_batch_size=1 \
--gradient_accumulation_steps=4 \
--gradient_checkpointing \
--max_train_steps=15000 \
--learning_rate=1e-05 \
--max_grad_norm=1 \
--lr_scheduler="constant" --lr_warmup_steps=0 \
--output_dir="sd-pokemon-model"

Inference Script

import torch
from diffusers import StableDiffusionPipeline
model_path = "sd-pokemon-model"
pipe = StableDiffusionPipeline.from_pretrained(model_path, torch_dtype=torch.float16)
pipe.to("cuda")
image = pipe(prompt="a drawing of a pokemon stuffed animal", num_inference_steps=50).images[0]
image.save("yoda-pokemon.png")

LoRA Fine-Tuning

Data Format

Same as Full FineTune.

Training Script

export MODEL_NAME="stable-diffusion-2"
export DATASET_NAME="pokemon-blip-captions"
accelerate launch --mixed_precision="no" train_lora.py \
--pretrained_model_name_or_path=$MODEL_NAME \
--dataset_name=$DATASET_NAME --caption_column="text" \
--resolution=768 --random_flip \
--train_batch_size=2 \
--num_train_epochs=100 --checkpointing_steps=5000 \
--learning_rate=1e-04 --lr_scheduler="constant" --lr_warmup_steps=0 \
--seed=42 \
--output_dir="sd-pokemon-model-lora" \
--validation_prompt="cute dragon creature"

Inference Script

from diffusers import StableDiffusionPipeline
import torch
model_path = "sd-pokemon-model-lora/checkpoint-10000"
pipe = StableDiffusionPipeline.from_pretrained("stable-diffusion-2", torch_dtype=torch.float16)
pipe.load_lora_weights(model_path)
pipe.to("cuda")
prompt = "A pokemon with green eyes and red legs."
image = pipe(prompt, num_inference_steps=30, guidance_scale=7.5).images[0]
image.save("pokemon.png")
