(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8# python src/main.py --base src/configs/diffusion/matfuse-ldm-vq_f8.yaml --train --gpus 0,
python: can't open file '/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/src/main.py': [Errno 2] No such file or directory
(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8# cd ..
(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs# cd ..
(sdiff) root@193df87d3047:/workspace/matfuse-sd# python src/main.py --base src/configs/diffusion/matfuse-ldm-vq_f8.yaml --train --gpus 0,
Global seed set to 23
Running on GPUs 0,
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 395.03 M params.
Keeping EMAs of 628.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
making attention of type 'vanilla' with 512 in_channels
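As an aside, the "262144 dimensions" reported above is just the product of the latent shape the autoencoder works with, and the 256 → 32 spatial reduction matches the "f8" in the config name (256 / 32 = 8):

```python
import math

# The log reports "Working with z of shape (1, 256, 32, 32) = 262144 dimensions";
# that count is simply the product of the latent tensor's dimensions.
z_shape = (1, 256, 32, 32)
print(math.prod(z_shape))  # 262144
print(256 // 32)           # 8, the downsampling factor in "vq_f8"
```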
Restored from /workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/checkpoints/epoch=000003.ckpt with 576 missing and 7 unexpected keys out of a total of 7
Missing Keys: ['decoder.conv_in.weight', 'decoder.conv_in.bias', 'decoder.mid.block_1.norm1.weight', 'decoder.mid.block_1.norm1.bias', 'decoder.mid.block_1.conv1.weight', 'decoder.mid.block_1.conv1.bias', 'decoder.mid.block_1.norm2.weight', 'decoder.mid.block_1.norm2.bias', 'decoder.mid.block_1.conv2.weight', 'decoder.mid.block_1.conv2.bias', 'decoder.mid.attn_1.norm.weight', 'decoder.mid.attn_1.norm.bias', 'decoder.mid.attn_1.q.weight', 'decoder.mid.attn_1.q.bias', 'decoder.mid.attn_1.k.weight', 'decoder.mid.attn_1.k.bias', 'decoder.mid.attn_1.v.weight', 'decoder.mid.attn_1.v.bias', 'decoder.mid.attn_1.proj_out.weight', 'decoder.mid.attn_1.proj_out.bias', 'decoder.mid.block_2.norm1.weight', 'decoder.mid.block_2.norm1.bias', 'decoder.mid.block_2.conv1.weight', 'decoder.mid.block_2.conv1.bias', 'decoder.mid.block_2.norm2.weight', 'decoder.mid.block_2.norm2.bias', 'decoder.mid.block_2.conv2.weight', 'decoder.mid.block_2.conv2.bias', 'decoder.up.0.block.0.norm1.weight', 'decoder.up.0.block.0.norm1.bias', 'decoder.up.0.block.0.conv1.weight', 'decoder.up.0.block.0.conv1.bias', 'decoder.up.0.block.0.norm2.weight', 'decoder.up.0.block.0.norm2.bias', 'decoder.up.0.block.0.conv2.weight', 'decoder.up.0.block.0.conv2.bias', 'decoder.up.0.block.1.norm1.weight', 'decoder.up.0.block.1.norm1.bias', 'decoder.up.0.block.1.conv1.weight', 'decoder.up.0.block.1.conv1.bias', 'decoder.up.0.block.1.norm2.weight', 'decoder.up.0.block.1.norm2.bias', 'decoder.up.0.block.1.conv2.weight', 'decoder.up.0.block.1.conv2.bias', 'decoder.up.0.block.2.norm1.weight', 'decoder.up.0.block.2.norm1.bias', 'decoder.up.0.block.2.conv1.weight', 'decoder.up.0.block.2.conv1.bias', 'decoder.up.0.block.2.norm2.weight', 'decoder.up.0.block.2.norm2.bias', 'decoder.up.0.block.2.conv2.weight', 'decoder.up.0.block.2.conv2.bias', 'decoder.up.1.block.0.norm1.weight', 'decoder.up.1.block.0.norm1.bias', 'decoder.up.1.block.0.conv1.weight', 'decoder.up.1.block.0.conv1.bias', 'decoder.up.1.block.0.norm2.weight', 
'decoder.up.1.block.0.norm2.bias', 'decoder.up.1.block.0.conv2.weight', 'decoder.up.1.block.0.conv2.bias', 'decoder.up.1.block.0.nin_shortcut.weight', 'decoder.up.1.block.0.nin_shortcut.bias', 'decoder.up.1.block.1.norm1.weight', 'decoder.up.1.block.1.norm1.bias', 'decoder.up.1.block.1.conv1.weight', 'decoder.up.1.block.1.conv1.bias', 'decoder.up.1.block.1.norm2.weight', 'decoder.up.1.block.1.norm2.bias', 'decoder.up.1.block.1.conv2.weight', 'decoder.up.1.block.1.conv2.bias', 'decoder.up.1.block.2.norm1.weight', 'decoder.up.1.block.2.norm1.bias', 'decoder.up.1.block.2.conv1.weight', 'decoder.up.1.block.2.conv1.bias', 'decoder.up.1.block.2.norm2.weight', 'decoder.up.1.block.2.norm2.bias', 'decoder.up.1.block.2.conv2.weight', 'decoder.up.1.block.2.conv2.bias', 'decoder.up.1.upsample.conv.weight', 'decoder.up.1.upsample.conv.bias', 'decoder.up.2.block.0.norm1.weight', 'decoder.up.2.block.0.norm1.bias', 'decoder.up.2.block.0.conv1.weight', 'decoder.up.2.block.0.conv1.bias', 'decoder.up.2.block.0.norm2.weight', 'decoder.up.2.block.0.norm2.bias', 'decoder.up.2.block.0.conv2.weight', 'decoder.up.2.block.0.conv2.bias', 'decoder.up.2.block.0.nin_shortcut.weight', 'decoder.up.2.block.0.nin_shortcut.bias', 'decoder.up.2.block.1.norm1.weight', 'decoder.up.2.block.1.norm1.bias', 'decoder.up.2.block.1.conv1.weight', 'decoder.up.2.block.1.conv1.bias', 'decoder.up.2.block.1.norm2.weight', 'decoder.up.2.block.1.norm2.bias', 'decoder.up.2.block.1.conv2.weight', 'decoder.up.2.block.1.conv2.bias', 'decoder.up.2.block.2.norm1.weight', 'decoder.up.2.block.2.norm1.bias', 'decoder.up.2.block.2.conv1.weight', 'decoder.up.2.block.2.conv1.bias', 'decoder.up.2.block.2.norm2.weight', 'decoder.up.2.block.2.norm2.bias', 'decoder.up.2.block.2.conv2.weight', 'decoder.up.2.block.2.conv2.bias', 'decoder.up.2.upsample.conv.weight', 'decoder.up.2.upsample.conv.bias', 'decoder.up.3.block.0.norm1.weight', 'decoder.up.3.block.0.norm1.bias', 'decoder.up.3.block.0.conv1.weight', 
'decoder.up.3.block.0.conv1.bias', 'decoder.up.3.block.0.norm2.weight', 'decoder.up.3.block.0.norm2.bias', 'decoder.up.3.block.0.conv2.weight', 'decoder.up.3.block.0.conv2.bias', 'decoder.up.3.block.1.norm1.weight', 'decoder.up.3.block.1.norm1.bias', 'decoder.up.3.block.1.conv1.weight', 'decoder.up.3.block.1.conv1.bias', 'decoder.up.3.block.1.norm2.weight', 'decoder.up.3.block.1.norm2.bias', 'decoder.up.3.block.1.conv2.weight', 'decoder.up.3.block.1.conv2.bias', 'decoder.up.3.block.2.norm1.weight', 'decoder.up.3.block.2.norm1.bias', 'decoder.up.3.block.2.conv1.weight', 'decoder.up.3.block.2.conv1.bias', 'decoder.up.3.block.2.norm2.weight', 'decoder.up.3.block.2.norm2.bias', 'decoder.up.3.block.2.conv2.weight', 'decoder.up.3.block.2.conv2.bias', 'decoder.up.3.upsample.conv.weight', 'decoder.up.3.upsample.conv.bias', 'decoder.norm_out.weight', 'decoder.norm_out.bias', 'decoder.conv_out.weight', 'decoder.conv_out.bias', 'post_quant_conv.weight', 'post_quant_conv.bias', 'encoder_0.conv_in.weight', 'encoder_0.conv_in.bias', 'encoder_0.down.0.block.0.norm1.weight', 'encoder_0.down.0.block.0.norm1.bias', 'encoder_0.down.0.block.0.conv1.weight', 'encoder_0.down.0.block.0.conv1.bias', 'encoder_0.down.0.block.0.norm2.weight', 'encoder_0.down.0.block.0.norm2.bias', 'encoder_0.down.0.block.0.conv2.weight', 'encoder_0.down.0.block.0.conv2.bias', 'encoder_0.down.0.block.1.norm1.weight', 'encoder_0.down.0.block.1.norm1.bias', 'encoder_0.down.0.block.1.conv1.weight', 'encoder_0.down.0.block.1.conv1.bias', 'encoder_0.down.0.block.1.norm2.weight', 'encoder_0.down.0.block.1.norm2.bias', 'encoder_0.down.0.block.1.conv2.weight', 'encoder_0.down.0.block.1.conv2.bias', 'encoder_0.down.0.downsample.conv.weight', 'encoder_0.down.0.downsample.conv.bias', 'encoder_0.down.1.block.0.norm1.weight', 'encoder_0.down.1.block.0.norm1.bias', 'encoder_0.down.1.block.0.conv1.weight', 'encoder_0.down.1.block.0.conv1.bias', 'encoder_0.down.1.block.0.norm2.weight', 'encoder_0.down.1.block.0.norm2.bias', 
'encoder_0.down.1.block.0.conv2.weight', 'encoder_0.down.1.block.0.conv2.bias', 'encoder_0.down.1.block.1.norm1.weight', 'encoder_0.down.1.block.1.norm1.bias', 'encoder_0.down.1.block.1.conv1.weight', 'encoder_0.down.1.block.1.conv1.bias', 'encoder_0.down.1.block.1.norm2.weight', 'encoder_0.down.1.block.1.norm2.bias', 'encoder_0.down.1.block.1.conv2.weight', 'encoder_0.down.1.block.1.conv2.bias', 'encoder_0.down.1.downsample.conv.weight', 'encoder_0.down.1.downsample.conv.bias', 'encoder_0.down.2.block.0.norm1.weight', 'encoder_0.down.2.block.0.norm1.bias', 'encoder_0.down.2.block.0.conv1.weight', 'encoder_0.down.2.block.0.conv1.bias', 'encoder_0.down.2.block.0.norm2.weight', 'encoder_0.down.2.block.0.norm2.bias', 'encoder_0.down.2.block.0.conv2.weight', 'encoder_0.down.2.block.0.conv2.bias', 'encoder_0.down.2.block.0.nin_shortcut.weight', 'encoder_0.down.2.block.0.nin_shortcut.bias', 'encoder_0.down.2.block.1.norm1.weight', 'encoder_0.down.2.block.1.norm1.bias', 'encoder_0.down.2.block.1.conv1.weight', 'encoder_0.down.2.block.1.conv1.bias', 'encoder_0.down.2.block.1.norm2.weight', 'encoder_0.down.2.block.1.norm2.bias', 'encoder_0.down.2.block.1.conv2.weight', 'encoder_0.down.2.block.1.conv2.bias', 'encoder_0.down.2.downsample.conv.weight', 'encoder_0.down.2.downsample.conv.bias', 'encoder_0.down.3.block.0.norm1.weight', 'encoder_0.down.3.block.0.norm1.bias', 'encoder_0.down.3.block.0.conv1.weight', 'encoder_0.down.3.block.0.conv1.bias', 'encoder_0.down.3.block.0.norm2.weight', 'encoder_0.down.3.block.0.norm2.bias', 'encoder_0.down.3.block.0.conv2.weight', 'encoder_0.down.3.block.0.conv2.bias', 'encoder_0.down.3.block.0.nin_shortcut.weight', 'encoder_0.down.3.block.0.nin_shortcut.bias', 'encoder_0.down.3.block.1.norm1.weight', 'encoder_0.down.3.block.1.norm1.bias', 'encoder_0.down.3.block.1.conv1.weight', 'encoder_0.down.3.block.1.conv1.bias', 'encoder_0.down.3.block.1.norm2.weight', 'encoder_0.down.3.block.1.norm2.bias', 'encoder_0.down.3.block.1.conv2.weight', 
'encoder_0.down.3.block.1.conv2.bias', 'encoder_0.mid.block_1.norm1.weight', 'encoder_0.mid.block_1.norm1.bias', 'encoder_0.mid.block_1.conv1.weight', 'encoder_0.mid.block_1.conv1.bias', 'encoder_0.mid.block_1.norm2.weight', 'encoder_0.mid.block_1.norm2.bias', 'encoder_0.mid.block_1.conv2.weight', 'encoder_0.mid.block_1.conv2.bias', 'encoder_0.mid.attn_1.norm.weight', 'encoder_0.mid.attn_1.norm.bias', 'encoder_0.mid.attn_1.q.weight', 'encoder_0.mid.attn_1.q.bias', 'encoder_0.mid.attn_1.k.weight', 'encoder_0.mid.attn_1.k.bias', 'encoder_0.mid.attn_1.v.weight', 'encoder_0.mid.attn_1.v.bias', 'encoder_0.mid.attn_1.proj_out.weight', 'encoder_0.mid.attn_1.proj_out.bias', 'encoder_0.mid.block_2.norm1.weight', 'encoder_0.mid.block_2.norm1.bias', 'encoder_0.mid.block_2.conv1.weight', 'encoder_0.mid.block_2.conv1.bias', 'encoder_0.mid.block_2.norm2.weight', 'encoder_0.mid.block_2.norm2.bias', 'encoder_0.mid.block_2.conv2.weight', 'encoder_0.mid.block_2.conv2.bias', 'encoder_0.norm_out.weight', 'encoder_0.norm_out.bias', 'encoder_0.conv_out.weight', 'encoder_0.conv_out.bias', 'encoder_1.conv_in.weight', 'encoder_1.conv_in.bias', 'encoder_1.down.0.block.0.norm1.weight', 'encoder_1.down.0.block.0.norm1.bias', 'encoder_1.down.0.block.0.conv1.weight', 'encoder_1.down.0.block.0.conv1.bias', 'encoder_1.down.0.block.0.norm2.weight', 'encoder_1.down.0.block.0.norm2.bias', 'encoder_1.down.0.block.0.conv2.weight', 'encoder_1.down.0.block.0.conv2.bias', 'encoder_1.down.0.block.1.norm1.weight', 'encoder_1.down.0.block.1.norm1.bias', 'encoder_1.down.0.block.1.conv1.weight', 'encoder_1.down.0.block.1.conv1.bias', 'encoder_1.down.0.block.1.norm2.weight', 'encoder_1.down.0.block.1.norm2.bias', 'encoder_1.down.0.block.1.conv2.weight', 'encoder_1.down.0.block.1.conv2.bias', 'encoder_1.down.0.downsample.conv.weight', 'encoder_1.down.0.downsample.conv.bias', 'encoder_1.down.1.block.0.norm1.weight', 'encoder_1.down.1.block.0.norm1.bias', 'encoder_1.down.1.block.0.conv1.weight', 
'encoder_1.down.1.block.0.conv1.bias', 'encoder_1.down.1.block.0.norm2.weight', 'encoder_1.down.1.block.0.norm2.bias', 'encoder_1.down.1.block.0.conv2.weight', 'encoder_1.down.1.block.0.conv2.bias', 'encoder_1.down.1.block.1.norm1.weight', 'encoder_1.down.1.block.1.norm1.bias', 'encoder_1.down.1.block.1.conv1.weight', 'encoder_1.down.1.block.1.conv1.bias', 'encoder_1.down.1.block.1.norm2.weight', 'encoder_1.down.1.block.1.norm2.bias', 'encoder_1.down.1.block.1.conv2.weight', 'encoder_1.down.1.block.1.conv2.bias', 'encoder_1.down.1.downsample.conv.weight', 'encoder_1.down.1.downsample.conv.bias', 'encoder_1.down.2.block.0.norm1.weight', 'encoder_1.down.2.block.0.norm1.bias', 'encoder_1.down.2.block.0.conv1.weight', 'encoder_1.down.2.block.0.conv1.bias', 'encoder_1.down.2.block.0.norm2.weight', 'encoder_1.down.2.block.0.norm2.bias', 'encoder_1.down.2.block.0.conv2.weight', 'encoder_1.down.2.block.0.conv2.bias', 'encoder_1.down.2.block.0.nin_shortcut.weight', 'encoder_1.down.2.block.0.nin_shortcut.bias', 'encoder_1.down.2.block.1.norm1.weight', 'encoder_1.down.2.block.1.norm1.bias', 'encoder_1.down.2.block.1.conv1.weight', 'encoder_1.down.2.block.1.conv1.bias', 'encoder_1.down.2.block.1.norm2.weight', 'encoder_1.down.2.block.1.norm2.bias', 'encoder_1.down.2.block.1.conv2.weight', 'encoder_1.down.2.block.1.conv2.bias', 'encoder_1.down.2.downsample.conv.weight', 'encoder_1.down.2.downsample.conv.bias', 'encoder_1.down.3.block.0.norm1.weight', 'encoder_1.down.3.block.0.norm1.bias', 'encoder_1.down.3.block.0.conv1.weight', 'encoder_1.down.3.block.0.conv1.bias', 'encoder_1.down.3.block.0.norm2.weight', 'encoder_1.down.3.block.0.norm2.bias', 'encoder_1.down.3.block.0.conv2.weight', 'encoder_1.down.3.block.0.conv2.bias', 'encoder_1.down.3.block.0.nin_shortcut.weight', 'encoder_1.down.3.block.0.nin_shortcut.bias', 'encoder_1.down.3.block.1.norm1.weight', 'encoder_1.down.3.block.1.norm1.bias', 'encoder_1.down.3.block.1.conv1.weight', 'encoder_1.down.3.block.1.conv1.bias', 
'encoder_1.down.3.block.1.norm2.weight', 'encoder_1.down.3.block.1.norm2.bias', 'encoder_1.down.3.block.1.conv2.weight', 'encoder_1.down.3.block.1.conv2.bias', 'encoder_1.mid.block_1.norm1.weight', 'encoder_1.mid.block_1.norm1.bias', 'encoder_1.mid.block_1.conv1.weight', 'encoder_1.mid.block_1.conv1.bias', 'encoder_1.mid.block_1.norm2.weight', 'encoder_1.mid.block_1.norm2.bias', 'encoder_1.mid.block_1.conv2.weight', 'encoder_1.mid.block_1.conv2.bias', 'encoder_1.mid.attn_1.norm.weight', 'encoder_1.mid.attn_1.norm.bias', 'encoder_1.mid.attn_1.q.weight', 'encoder_1.mid.attn_1.q.bias', 'encoder_1.mid.attn_1.k.weight', 'encoder_1.mid.attn_1.k.bias', 'encoder_1.mid.attn_1.v.weight', 'encoder_1.mid.attn_1.v.bias', 'encoder_1.mid.attn_1.proj_out.weight', 'encoder_1.mid.attn_1.proj_out.bias', 'encoder_1.mid.block_2.norm1.weight', 'encoder_1.mid.block_2.norm1.bias', 'encoder_1.mid.block_2.conv1.weight', 'encoder_1.mid.block_2.conv1.bias', 'encoder_1.mid.block_2.norm2.weight', 'encoder_1.mid.block_2.norm2.bias', 'encoder_1.mid.block_2.conv2.weight', 'encoder_1.mid.block_2.conv2.bias', 'encoder_1.norm_out.weight', 'encoder_1.norm_out.bias', 'encoder_1.conv_out.weight', 'encoder_1.conv_out.bias', 'encoder_2.conv_in.weight', 'encoder_2.conv_in.bias', 'encoder_2.down.0.block.0.norm1.weight', 'encoder_2.down.0.block.0.norm1.bias', 'encoder_2.down.0.block.0.conv1.weight', 'encoder_2.down.0.block.0.conv1.bias', 'encoder_2.down.0.block.0.norm2.weight', 'encoder_2.down.0.block.0.norm2.bias', 'encoder_2.down.0.block.0.conv2.weight', 'encoder_2.down.0.block.0.conv2.bias', 'encoder_2.down.0.block.1.norm1.weight', 'encoder_2.down.0.block.1.norm1.bias', 'encoder_2.down.0.block.1.conv1.weight', 'encoder_2.down.0.block.1.conv1.bias', 'encoder_2.down.0.block.1.norm2.weight', 'encoder_2.down.0.block.1.norm2.bias', 'encoder_2.down.0.block.1.conv2.weight', 'encoder_2.down.0.block.1.conv2.bias', 'encoder_2.down.0.downsample.conv.weight', 'encoder_2.down.0.downsample.conv.bias', 
'encoder_2.down.1.block.0.norm1.weight', 'encoder_2.down.1.block.0.norm1.bias', 'encoder_2.down.1.block.0.conv1.weight', 'encoder_2.down.1.block.0.conv1.bias', 'encoder_2.down.1.block.0.norm2.weight', 'encoder_2.down.1.block.0.norm2.bias', 'encoder_2.down.1.block.0.conv2.weight', 'encoder_2.down.1.block.0.conv2.bias', 'encoder_2.down.1.block.1.norm1.weight', 'encoder_2.down.1.block.1.norm1.bias', 'encoder_2.down.1.block.1.conv1.weight', 'encoder_2.down.1.block.1.conv1.bias', 'encoder_2.down.1.block.1.norm2.weight', 'encoder_2.down.1.block.1.norm2.bias', 'encoder_2.down.1.block.1.conv2.weight', 'encoder_2.down.1.block.1.conv2.bias', 'encoder_2.down.1.downsample.conv.weight', 'encoder_2.down.1.downsample.conv.bias', 'encoder_2.down.2.block.0.norm1.weight', 'encoder_2.down.2.block.0.norm1.bias', 'encoder_2.down.2.block.0.conv1.weight', 'encoder_2.down.2.block.0.conv1.bias', 'encoder_2.down.2.block.0.norm2.weight', 'encoder_2.down.2.block.0.norm2.bias', 'encoder_2.down.2.block.0.conv2.weight', 'encoder_2.down.2.block.0.conv2.bias', 'encoder_2.down.2.block.0.nin_shortcut.weight', 'encoder_2.down.2.block.0.nin_shortcut.bias', 'encoder_2.down.2.block.1.norm1.weight', 'encoder_2.down.2.block.1.norm1.bias', 'encoder_2.down.2.block.1.conv1.weight', 'encoder_2.down.2.block.1.conv1.bias', 'encoder_2.down.2.block.1.norm2.weight', 'encoder_2.down.2.block.1.norm2.bias', 'encoder_2.down.2.block.1.conv2.weight', 'encoder_2.down.2.block.1.conv2.bias', 'encoder_2.down.2.downsample.conv.weight', 'encoder_2.down.2.downsample.conv.bias', 'encoder_2.down.3.block.0.norm1.weight', 'encoder_2.down.3.block.0.norm1.bias', 'encoder_2.down.3.block.0.conv1.weight', 'encoder_2.down.3.block.0.conv1.bias', 'encoder_2.down.3.block.0.norm2.weight', 'encoder_2.down.3.block.0.norm2.bias', 'encoder_2.down.3.block.0.conv2.weight', 'encoder_2.down.3.block.0.conv2.bias', 'encoder_2.down.3.block.0.nin_shortcut.weight', 'encoder_2.down.3.block.0.nin_shortcut.bias', 'encoder_2.down.3.block.1.norm1.weight', 
'encoder_2.down.3.block.1.norm1.bias', 'encoder_2.down.3.block.1.conv1.weight', 'encoder_2.down.3.block.1.conv1.bias', 'encoder_2.down.3.block.1.norm2.weight', 'encoder_2.down.3.block.1.norm2.bias', 'encoder_2.down.3.block.1.conv2.weight', 'encoder_2.down.3.block.1.conv2.bias', 'encoder_2.mid.block_1.norm1.weight', 'encoder_2.mid.block_1.norm1.bias', 'encoder_2.mid.block_1.conv1.weight', 'encoder_2.mid.block_1.conv1.bias', 'encoder_2.mid.block_1.norm2.weight', 'encoder_2.mid.block_1.norm2.bias', 'encoder_2.mid.block_1.conv2.weight', 'encoder_2.mid.block_1.conv2.bias', 'encoder_2.mid.attn_1.norm.weight', 'encoder_2.mid.attn_1.norm.bias', 'encoder_2.mid.attn_1.q.weight', 'encoder_2.mid.attn_1.q.bias', 'encoder_2.mid.attn_1.k.weight', 'encoder_2.mid.attn_1.k.bias', 'encoder_2.mid.attn_1.v.weight', 'encoder_2.mid.attn_1.v.bias', 'encoder_2.mid.attn_1.proj_out.weight', 'encoder_2.mid.attn_1.proj_out.bias', 'encoder_2.mid.block_2.norm1.weight', 'encoder_2.mid.block_2.norm1.bias', 'encoder_2.mid.block_2.conv1.weight', 'encoder_2.mid.block_2.conv1.bias', 'encoder_2.mid.block_2.norm2.weight', 'encoder_2.mid.block_2.norm2.bias', 'encoder_2.mid.block_2.conv2.weight', 'encoder_2.mid.block_2.conv2.bias', 'encoder_2.norm_out.weight', 'encoder_2.norm_out.bias', 'encoder_2.conv_out.weight', 'encoder_2.conv_out.bias', 'encoder_3.conv_in.weight', 'encoder_3.conv_in.bias', 'encoder_3.down.0.block.0.norm1.weight', 'encoder_3.down.0.block.0.norm1.bias', 'encoder_3.down.0.block.0.conv1.weight', 'encoder_3.down.0.block.0.conv1.bias', 'encoder_3.down.0.block.0.norm2.weight', 'encoder_3.down.0.block.0.norm2.bias', 'encoder_3.down.0.block.0.conv2.weight', 'encoder_3.down.0.block.0.conv2.bias', 'encoder_3.down.0.block.1.norm1.weight', 'encoder_3.down.0.block.1.norm1.bias', 'encoder_3.down.0.block.1.conv1.weight', 'encoder_3.down.0.block.1.conv1.bias', 'encoder_3.down.0.block.1.norm2.weight', 'encoder_3.down.0.block.1.norm2.bias', 'encoder_3.down.0.block.1.conv2.weight', 
'encoder_3.down.0.block.1.conv2.bias', 'encoder_3.down.0.downsample.conv.weight', 'encoder_3.down.0.downsample.conv.bias', 'encoder_3.down.1.block.0.norm1.weight', 'encoder_3.down.1.block.0.norm1.bias', 'encoder_3.down.1.block.0.conv1.weight', 'encoder_3.down.1.block.0.conv1.bias', 'encoder_3.down.1.block.0.norm2.weight', 'encoder_3.down.1.block.0.norm2.bias', 'encoder_3.down.1.block.0.conv2.weight', 'encoder_3.down.1.block.0.conv2.bias', 'encoder_3.down.1.block.1.norm1.weight', 'encoder_3.down.1.block.1.norm1.bias', 'encoder_3.down.1.block.1.conv1.weight', 'encoder_3.down.1.block.1.conv1.bias', 'encoder_3.down.1.block.1.norm2.weight', 'encoder_3.down.1.block.1.norm2.bias', 'encoder_3.down.1.block.1.conv2.weight', 'encoder_3.down.1.block.1.conv2.bias', 'encoder_3.down.1.downsample.conv.weight', 'encoder_3.down.1.downsample.conv.bias', 'encoder_3.down.2.block.0.norm1.weight', 'encoder_3.down.2.block.0.norm1.bias', 'encoder_3.down.2.block.0.conv1.weight', 'encoder_3.down.2.block.0.conv1.bias', 'encoder_3.down.2.block.0.norm2.weight', 'encoder_3.down.2.block.0.norm2.bias', 'encoder_3.down.2.block.0.conv2.weight', 'encoder_3.down.2.block.0.conv2.bias', 'encoder_3.down.2.block.0.nin_shortcut.weight', 'encoder_3.down.2.block.0.nin_shortcut.bias', 'encoder_3.down.2.block.1.norm1.weight', 'encoder_3.down.2.block.1.norm1.bias', 'encoder_3.down.2.block.1.conv1.weight', 'encoder_3.down.2.block.1.conv1.bias', 'encoder_3.down.2.block.1.norm2.weight', 'encoder_3.down.2.block.1.norm2.bias', 'encoder_3.down.2.block.1.conv2.weight', 'encoder_3.down.2.block.1.conv2.bias', 'encoder_3.down.2.downsample.conv.weight', 'encoder_3.down.2.downsample.conv.bias', 'encoder_3.down.3.block.0.norm1.weight', 'encoder_3.down.3.block.0.norm1.bias', 'encoder_3.down.3.block.0.conv1.weight', 'encoder_3.down.3.block.0.conv1.bias', 'encoder_3.down.3.block.0.norm2.weight', 'encoder_3.down.3.block.0.norm2.bias', 'encoder_3.down.3.block.0.conv2.weight', 'encoder_3.down.3.block.0.conv2.bias', 
'encoder_3.down.3.block.0.nin_shortcut.weight', 'encoder_3.down.3.block.0.nin_shortcut.bias', 'encoder_3.down.3.block.1.norm1.weight', 'encoder_3.down.3.block.1.norm1.bias', 'encoder_3.down.3.block.1.conv1.weight', 'encoder_3.down.3.block.1.conv1.bias', 'encoder_3.down.3.block.1.norm2.weight', 'encoder_3.down.3.block.1.norm2.bias', 'encoder_3.down.3.block.1.conv2.weight', 'encoder_3.down.3.block.1.conv2.bias', 'encoder_3.mid.block_1.norm1.weight', 'encoder_3.mid.block_1.norm1.bias', 'encoder_3.mid.block_1.conv1.weight', 'encoder_3.mid.block_1.conv1.bias', 'encoder_3.mid.block_1.norm2.weight', 'encoder_3.mid.block_1.norm2.bias', 'encoder_3.mid.block_1.conv2.weight', 'encoder_3.mid.block_1.conv2.bias', 'encoder_3.mid.attn_1.norm.weight', 'encoder_3.mid.attn_1.norm.bias', 'encoder_3.mid.attn_1.q.weight', 'encoder_3.mid.attn_1.q.bias', 'encoder_3.mid.attn_1.k.weight', 'encoder_3.mid.attn_1.k.bias', 'encoder_3.mid.attn_1.v.weight', 'encoder_3.mid.attn_1.v.bias', 'encoder_3.mid.attn_1.proj_out.weight', 'encoder_3.mid.attn_1.proj_out.bias', 'encoder_3.mid.block_2.norm1.weight', 'encoder_3.mid.block_2.norm1.bias', 'encoder_3.mid.block_2.conv1.weight', 'encoder_3.mid.block_2.conv1.bias', 'encoder_3.mid.block_2.norm2.weight', 'encoder_3.mid.block_2.norm2.bias', 'encoder_3.mid.block_2.conv2.weight', 'encoder_3.mid.block_2.conv2.bias', 'encoder_3.norm_out.weight', 'encoder_3.norm_out.bias', 'encoder_3.conv_out.weight', 'encoder_3.conv_out.bias', 'quantize_0.embedding.weight', 'quantize_1.embedding.weight', 'quantize_2.embedding.weight', 'quantize_3.embedding.weight', 'quant_conv_0.weight', 'quant_conv_0.bias', 'quant_conv_1.weight', 'quant_conv_1.bias', 'quant_conv_2.weight', 'quant_conv_2.bias', 'quant_conv_3.weight', 'quant_conv_3.bias']
Unexpected Keys: ['epoch', 'global_step', 'pytorch-lightning_version', 'state_dict', 'callbacks', 'optimizer_states', 'lr_schedulers']
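The "7 unexpected keys out of a total of 7" message means the loader was handed a full PyTorch Lightning checkpoint rather than a bare weights dict: the seven top-level keys listed above are Lightning bookkeeping, and the actual weights sit nested under state_dict. A minimal sketch of the unwrapping involved, using plain dicts with placeholder values (this is not the repo's loading code):

```python
# Hypothetical sketch: a PyTorch Lightning checkpoint nests the model weights
# under "state_dict" next to bookkeeping entries -- which are exactly the
# "unexpected keys" listed in the log above. Values here are placeholders.
lightning_ckpt = {
    "epoch": 3,
    "global_step": 1000,
    "pytorch-lightning_version": "1.4.2",
    "state_dict": {"decoder.conv_in.weight": ..., "post_quant_conv.weight": ...},
}

def unwrap(ckpt: dict) -> dict:
    """Return the inner weights dict; a plain weights dict passes through unchanged."""
    return ckpt.get("state_dict", ckpt)

weights = unwrap(lightning_ckpt)
print("decoder.conv_in.weight" in weights)  # True
```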
Conditional model: Multiconditional
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg:
{'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs/2024-06-10T17-26-39_matfuse-ldm-vq_f8/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}}
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
#### Data #####
train, MatFuseDataset, 173319
validation, MatFuseDataset, 63
accumulate_grad_batches = 1
Setting learning rate to 8.00e-06 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 8 (batchsize) * 1.00e-06 (base_lr)
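The learning-rate line above follows the linear scaling rule it spells out; reproduced as arithmetic:

```python
# The effective learning rate is scaled linearly from the base rate,
# exactly as the log line states.
accumulate_grad_batches = 1
num_gpus = 1
batch_size = 8
base_lr = 1.0e-06

lr = accumulate_grad_batches * num_gpus * batch_size * base_lr
print(f"{lr:.2e}")  # 8.00e-06
```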
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/core/datamodule.py:423: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
rank_zero_deprecation(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LatentDiffusion: Also optimizing conditioner params!
Project config
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: packed
    cond_stage_key:
    - image_embed
    - sketch
    - palette
    - text
    image_size: 32
    channels: 12
    cond_stage_trainable: true
    conditioning_key: hybrid
    monitor: val/loss_simple_ema
    ucg_training:
      image_embed:
        p: 0.5
        val: 0.0
      palette:
        p: 0.5
        val: 0.0
      sketch:
        p: 0.5
        val: 0.0
      text:
        p: 0.5
        val: ''
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 16
        out_channels: 12
        model_channels: 256
        attention_resolutions:
        - 4
        - 2
        - 1
        num_res_blocks: 2
        channel_mult:
        - 1
        - 2
        - 4
        num_head_channels: 32
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 512
        use_checkpoint: true
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.VQModelMulti
      params:
        embed_dim: 3
        n_embed: 4096
        ckpt_path: /workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/checkpoints/epoch=000003.ckpt
        ddconfig:
          double_z: false
          z_channels: 256
          resolution: 256
          in_channels: 3
          out_ch: 12
          ch: 128
          ch_mult:
          - 1
          - 1
          - 2
          - 4
          num_res_blocks: 2
          attn_resolutions:
          - null
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.multicondition.MultiConditionEncoder
      params:
        image_embed_config:
          target: ldm.modules.encoders.modules.FrozenCLIPImageEmbedder
          params:
            model: ViT-B/16
        text_embed_config:
          target: ldm.modules.encoders.modules.FrozenCLIPSentenceEmbedder
          params:
            version: sentence-transformers/clip-ViT-B-16
        binary_encoder_config:
          target: ldm.modules.encoders.modules.SimpleEncoder
          params:
            in_channels: 1
            out_channels: 4
        palette_proj_config:
          target: ldm.modules.encoders.multicondition.PaletteEncoder
          params:
            in_ch: 3
            hid_ch: 64
            out_ch: 512
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 8
    num_workers: 0
    wrap: false
    train:
      target: ldm.data.matfuse.MatFuseDataset
      params:
        data_root: data/train
        size: 256
        output_names:
        - diffuse
        - normal
        - roughness
        - specular
    validation:
      target: ldm.data.matfuse.MatFuseDataset
      params:
        data_root: data/test
        size: 256
        output_names:
        - diffuse
        - normal
        - roughness
        - specular
Lightning config
callbacks:
  image_logger:
    target: main.ImageLogger
    params:
      batch_frequency: 6
      max_images: 4
      increase_log_steps: false
      log_images_kwargs:
        ddim_steps: 50
trainer:
  strategy: ddp
  replace_sampler_ddp: false
  gpus: 0,
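Note the trailing comma in gpus: 0, (and in the --gpus 0, flag at the top of the log): PyTorch Lightning treats a comma-separated string as explicit device indices, while a bare integer means "use this many GPUs". A rough stand-in for that parsing, not Lightning's actual implementation:

```python
def parse_gpus(value: str):
    """Rough stand-in for Lightning's --gpus string parsing (assumption, not the real code)."""
    if "," in value:
        # "0," -> run on device 0 only; "0,1" -> devices 0 and 1
        return [int(v) for v in value.split(",") if v.strip()]
    # a bare integer such as "2" means "use any 2 GPUs"
    return int(value)

print(parse_gpus("0,"))  # [0]
print(parse_gpus("2"))   # 2
```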
| Name | Type | Params
------------------------------------------------------------
0 | model | DiffusionWrapper | 395 M
1 | model_ema | LitEma | 0
2 | first_stage_model | VQModelMulti | 132 M
3 | cond_stage_model | MultiConditionEncoder | 299 M
------------------------------------------------------------
545 M Trainable params
281 M Non-trainable params
827 M Total params
3,308.097 Total estimated model params size (MB)
Validation sanity check: 0it [00:00, ?it/s]/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
Global seed set to 23
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
rank_zero_warn(
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/callbacks/lr_monitor.py:112: RuntimeWarning: You are using `LearningRateMonitor` callback with models that have no learning rate schedulers. Please see documentation for `configure_optimizers` method.
rank_zero_warn(
Epoch 0: 0%|▏ | 14/6258 [00:13<1:30:16, 1.15it/s, loss=0.992, v_num=fdaz, train/loss_simple_step=0.985, train/loss_vlb_step=0.00564, train/loss_step=0.985, global_step=13.00]