(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8# python src/main.py --base src/configs/diffusion/matfuse-ldm-vq_f8.yaml --train --gpus 0,
python: can't open file '/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/src/main.py': [Errno 2] No such file or directory
(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8# cd ..
(sdiff) root@193df87d3047:/workspace/matfuse-sd/logs# cd ..
(sdiff) root@193df87d3047:/workspace/matfuse-sd# python src/main.py --base src/configs/diffusion/matfuse-ldm-vq_f8.yaml --train --gpus 0,
Global seed set to 23
Running on GPUs 0,
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 395.03 M params.
Keeping EMAs of 628.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 256, 32, 32) = 262144 dimensions.
making attention of type 'vanilla' with 512 in_channels Restored from /workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/checkpoints/epoch=000003.ckpt with 576 missing and 7 unexpected keys out of a total of 7 Missing Keys: ['decoder.conv_in.weight', 'decoder.conv_in.bias', 'decoder.mid.block_1.norm1.weight', 'decoder.mid.block_1.norm1.bias', 'decoder.mid.block_1.conv1.weight', 'decoder.mid.block_1.conv1.bias', 'decoder.mid.block_1.norm2.weight', 'decoder.mid.block_1.norm2.bias', 'decoder.mid.block_1.conv2.weight', 'decoder.mid.block_1.conv2.bias', 'decoder.mid.attn_1.norm.weight', 'decoder.mid.attn_1.norm.bias', 'decoder.mid.attn_1.q.weight', 'decoder.mid.attn_1.q.bias', 'decoder.mid.attn_1.k.weight', 'decoder.mid.attn_1.k.bias', 'decoder.mid.attn_1.v.weight', 'decoder.mid.attn_1.v.bias', 'decoder.mid.attn_1.proj_out.weight', 'decoder.mid.attn_1.proj_out.bias', 'decoder.mid.block_2.norm1.weight', 'decoder.mid.block_2.norm1.bias', 'decoder.mid.block_2.conv1.weight', 'decoder.mid.block_2.conv1.bias', 'decoder.mid.block_2.norm2.weight', 'decoder.mid.block_2.norm2.bias', 'decoder.mid.block_2.conv2.weight', 'decoder.mid.block_2.conv2.bias', 'decoder.up.0.block.0.norm1.weight', 'decoder.up.0.block.0.norm1.bias', 'decoder.up.0.block.0.conv1.weight', 'decoder.up.0.block.0.conv1.bias', 'decoder.up.0.block.0.norm2.weight', 'decoder.up.0.block.0.norm2.bias', 'decoder.up.0.block.0.conv2.weight', 'decoder.up.0.block.0.conv2.bias', 'decoder.up.0.block.1.norm1.weight', 'decoder.up.0.block.1.norm1.bias', 'decoder.up.0.block.1.conv1.weight', 'decoder.up.0.block.1.conv1.bias', 'decoder.up.0.block.1.norm2.weight', 'decoder.up.0.block.1.norm2.bias', 'decoder.up.0.block.1.conv2.weight', 'decoder.up.0.block.1.conv2.bias', 'decoder.up.0.block.2.norm1.weight', 'decoder.up.0.block.2.norm1.bias', 'decoder.up.0.block.2.conv1.weight', 'decoder.up.0.block.2.conv1.bias', 'decoder.up.0.block.2.norm2.weight', 'decoder.up.0.block.2.norm2.bias', 'decoder.up.0.block.2.conv2.weight', 
'decoder.up.0.block.2.conv2.bias', 'decoder.up.1.block.0.norm1.weight', 'decoder.up.1.block.0.norm1.bias', 'decoder.up.1.block.0.conv1.weight', 'decoder.up.1.block.0.conv1.bias', 'decoder.up.1.block.0.norm2.weight', 'decoder.up.1.block.0.norm2.bias', 'decoder.up.1.block.0.conv2.weight', 'decoder.up.1.block.0.conv2.bias', 'decoder.up.1.block.0.nin_shortcut.weight', 'decoder.up.1.block.0.nin_shortcut.bias', 'decoder.up.1.block.1.norm1.weight', 'decoder.up.1.block.1.norm1.bias', 'decoder.up.1.block.1.conv1.weight', 'decoder.up.1.block.1.conv1.bias', 'decoder.up.1.block.1.norm2.weight', 'decoder.up.1.block.1.norm2.bias', 'decoder.up.1.block.1.conv2.weight', 'decoder.up.1.block.1.conv2.bias', 'decoder.up.1.block.2.norm1.weight', 'decoder.up.1.block.2.norm1.bias', 'decoder.up.1.block.2.conv1.weight', 'decoder.up.1.block.2.conv1.bias', 'decoder.up.1.block.2.norm2.weight', 'decoder.up.1.block.2.norm2.bias', 'decoder.up.1.block.2.conv2.weight', 'decoder.up.1.block.2.conv2.bias', 'decoder.up.1.upsample.conv.weight', 'decoder.up.1.upsample.conv.bias', 'decoder.up.2.block.0.norm1.weight', 'decoder.up.2.block.0.norm1.bias', 'decoder.up.2.block.0.conv1.weight', 'decoder.up.2.block.0.conv1.bias', 'decoder.up.2.block.0.norm2.weight', 'decoder.up.2.block.0.norm2.bias', 'decoder.up.2.block.0.conv2.weight', 'decoder.up.2.block.0.conv2.bias', 'decoder.up.2.block.0.nin_shortcut.weight', 'decoder.up.2.block.0.nin_shortcut.bias', 'decoder.up.2.block.1.norm1.weight', 'decoder.up.2.block.1.norm1.bias', 'decoder.up.2.block.1.conv1.weight', 'decoder.up.2.block.1.conv1.bias', 'decoder.up.2.block.1.norm2.weight', 'decoder.up.2.block.1.norm2.bias', 'decoder.up.2.block.1.conv2.weight', 'decoder.up.2.block.1.conv2.bias', 'decoder.up.2.block.2.norm1.weight', 'decoder.up.2.block.2.norm1.bias', 'decoder.up.2.block.2.conv1.weight', 'decoder.up.2.block.2.conv1.bias', 'decoder.up.2.block.2.norm2.weight', 'decoder.up.2.block.2.norm2.bias', 'decoder.up.2.block.2.conv2.weight', 
'decoder.up.2.block.2.conv2.bias', 'decoder.up.2.upsample.conv.weight', 'decoder.up.2.upsample.conv.bias', 'decoder.up.3.block.0.norm1.weight', 'decoder.up.3.block.0.norm1.bias', 'decoder.up.3.block.0.conv1.weight', 'decoder.up.3.block.0.conv1.bias', 'decoder.up.3.block.0.norm2.weight', 'decoder.up.3.block.0.norm2.bias', 'decoder.up.3.block.0.conv2.weight', 'decoder.up.3.block.0.conv2.bias', 'decoder.up.3.block.1.norm1.weight', 'decoder.up.3.block.1.norm1.bias', 'decoder.up.3.block.1.conv1.weight', 'decoder.up.3.block.1.conv1.bias', 'decoder.up.3.block.1.norm2.weight', 'decoder.up.3.block.1.norm2.bias', 'decoder.up.3.block.1.conv2.weight', 'decoder.up.3.block.1.conv2.bias', 'decoder.up.3.block.2.norm1.weight', 'decoder.up.3.block.2.norm1.bias', 'decoder.up.3.block.2.conv1.weight', 'decoder.up.3.block.2.conv1.bias', 'decoder.up.3.block.2.norm2.weight', 'decoder.up.3.block.2.norm2.bias', 'decoder.up.3.block.2.conv2.weight', 'decoder.up.3.block.2.conv2.bias', 'decoder.up.3.upsample.conv.weight', 'decoder.up.3.upsample.conv.bias', 'decoder.norm_out.weight', 'decoder.norm_out.bias', 'decoder.conv_out.weight', 'decoder.conv_out.bias', 'post_quant_conv.weight', 'post_quant_conv.bias', 'encoder_0.conv_in.weight', 'encoder_0.conv_in.bias', 'encoder_0.down.0.block.0.norm1.weight', 'encoder_0.down.0.block.0.norm1.bias', 'encoder_0.down.0.block.0.conv1.weight', 'encoder_0.down.0.block.0.conv1.bias', 'encoder_0.down.0.block.0.norm2.weight', 'encoder_0.down.0.block.0.norm2.bias', 'encoder_0.down.0.block.0.conv2.weight', 'encoder_0.down.0.block.0.conv2.bias', 'encoder_0.down.0.block.1.norm1.weight', 'encoder_0.down.0.block.1.norm1.bias', 'encoder_0.down.0.block.1.conv1.weight', 'encoder_0.down.0.block.1.conv1.bias', 'encoder_0.down.0.block.1.norm2.weight', 'encoder_0.down.0.block.1.norm2.bias', 'encoder_0.down.0.block.1.conv2.weight', 'encoder_0.down.0.block.1.conv2.bias', 'encoder_0.down.0.downsample.conv.weight', 'encoder_0.down.0.downsample.conv.bias', 
'encoder_0.down.1.block.0.norm1.weight', 'encoder_0.down.1.block.0.norm1.bias', 'encoder_0.down.1.block.0.conv1.weight', 'encoder_0.down.1.block.0.conv1.bias', 'encoder_0.down.1.block.0.norm2.weight', 'encoder_0.down.1.block.0.norm2.bias', 'encoder_0.down.1.block.0.conv2.weight', 'encoder_0.down.1.block.0.conv2.bias', 'encoder_0.down.1.block.1.norm1.weight', 'encoder_0.down.1.block.1.norm1.bias', 'encoder_0.down.1.block.1.conv1.weight', 'encoder_0.down.1.block.1.conv1.bias', 'encoder_0.down.1.block.1.norm2.weight', 'encoder_0.down.1.block.1.norm2.bias', 'encoder_0.down.1.block.1.conv2.weight', 'encoder_0.down.1.block.1.conv2.bias', 'encoder_0.down.1.downsample.conv.weight', 'encoder_0.down.1.downsample.conv.bias', 'encoder_0.down.2.block.0.norm1.weight', 'encoder_0.down.2.block.0.norm1.bias', 'encoder_0.down.2.block.0.conv1.weight', 'encoder_0.down.2.block.0.conv1.bias', 'encoder_0.down.2.block.0.norm2.weight', 'encoder_0.down.2.block.0.norm2.bias', 'encoder_0.down.2.block.0.conv2.weight', 'encoder_0.down.2.block.0.conv2.bias', 'encoder_0.down.2.block.0.nin_shortcut.weight', 'encoder_0.down.2.block.0.nin_shortcut.bias', 'encoder_0.down.2.block.1.norm1.weight', 'encoder_0.down.2.block.1.norm1.bias', 'encoder_0.down.2.block.1.conv1.weight', 'encoder_0.down.2.block.1.conv1.bias', 'encoder_0.down.2.block.1.norm2.weight', 'encoder_0.down.2.block.1.norm2.bias', 'encoder_0.down.2.block.1.conv2.weight', 'encoder_0.down.2.block.1.conv2.bias', 'encoder_0.down.2.downsample.conv.weight', 'encoder_0.down.2.downsample.conv.bias', 'encoder_0.down.3.block.0.norm1.weight', 'encoder_0.down.3.block.0.norm1.bias', 'encoder_0.down.3.block.0.conv1.weight', 'encoder_0.down.3.block.0.conv1.bias', 'encoder_0.down.3.block.0.norm2.weight', 'encoder_0.down.3.block.0.norm2.bias', 'encoder_0.down.3.block.0.conv2.weight', 'encoder_0.down.3.block.0.conv2.bias', 'encoder_0.down.3.block.0.nin_shortcut.weight', 'encoder_0.down.3.block.0.nin_shortcut.bias', 'encoder_0.down.3.block.1.norm1.weight', 
'encoder_0.down.3.block.1.norm1.bias', 'encoder_0.down.3.block.1.conv1.weight', 'encoder_0.down.3.block.1.conv1.bias', 'encoder_0.down.3.block.1.norm2.weight', 'encoder_0.down.3.block.1.norm2.bias', 'encoder_0.down.3.block.1.conv2.weight', 'encoder_0.down.3.block.1.conv2.bias', 'encoder_0.mid.block_1.norm1.weight', 'encoder_0.mid.block_1.norm1.bias', 'encoder_0.mid.block_1.conv1.weight', 'encoder_0.mid.block_1.conv1.bias', 'encoder_0.mid.block_1.norm2.weight', 'encoder_0.mid.block_1.norm2.bias', 'encoder_0.mid.block_1.conv2.weight', 'encoder_0.mid.block_1.conv2.bias', 'encoder_0.mid.attn_1.norm.weight', 'encoder_0.mid.attn_1.norm.bias', 'encoder_0.mid.attn_1.q.weight', 'encoder_0.mid.attn_1.q.bias', 'encoder_0.mid.attn_1.k.weight', 'encoder_0.mid.attn_1.k.bias', 'encoder_0.mid.attn_1.v.weight', 'encoder_0.mid.attn_1.v.bias', 'encoder_0.mid.attn_1.proj_out.weight', 'encoder_0.mid.attn_1.proj_out.bias', 'encoder_0.mid.block_2.norm1.weight', 'encoder_0.mid.block_2.norm1.bias', 'encoder_0.mid.block_2.conv1.weight', 'encoder_0.mid.block_2.conv1.bias', 'encoder_0.mid.block_2.norm2.weight', 'encoder_0.mid.block_2.norm2.bias', 'encoder_0.mid.block_2.conv2.weight', 'encoder_0.mid.block_2.conv2.bias', 'encoder_0.norm_out.weight', 'encoder_0.norm_out.bias', 'encoder_0.conv_out.weight', 'encoder_0.conv_out.bias', 'encoder_1.conv_in.weight', 'encoder_1.conv_in.bias', 'encoder_1.down.0.block.0.norm1.weight', 'encoder_1.down.0.block.0.norm1.bias', 'encoder_1.down.0.block.0.conv1.weight', 'encoder_1.down.0.block.0.conv1.bias', 'encoder_1.down.0.block.0.norm2.weight', 'encoder_1.down.0.block.0.norm2.bias', 'encoder_1.down.0.block.0.conv2.weight', 'encoder_1.down.0.block.0.conv2.bias', 'encoder_1.down.0.block.1.norm1.weight', 'encoder_1.down.0.block.1.norm1.bias', 'encoder_1.down.0.block.1.conv1.weight', 'encoder_1.down.0.block.1.conv1.bias', 'encoder_1.down.0.block.1.norm2.weight', 'encoder_1.down.0.block.1.norm2.bias', 'encoder_1.down.0.block.1.conv2.weight', 
'encoder_1.down.0.block.1.conv2.bias', 'encoder_1.down.0.downsample.conv.weight', 'encoder_1.down.0.downsample.conv.bias', 'encoder_1.down.1.block.0.norm1.weight', 'encoder_1.down.1.block.0.norm1.bias', 'encoder_1.down.1.block.0.conv1.weight', 'encoder_1.down.1.block.0.conv1.bias', 'encoder_1.down.1.block.0.norm2.weight', 'encoder_1.down.1.block.0.norm2.bias', 'encoder_1.down.1.block.0.conv2.weight', 'encoder_1.down.1.block.0.conv2.bias', 'encoder_1.down.1.block.1.norm1.weight', 'encoder_1.down.1.block.1.norm1.bias', 'encoder_1.down.1.block.1.conv1.weight', 'encoder_1.down.1.block.1.conv1.bias', 'encoder_1.down.1.block.1.norm2.weight', 'encoder_1.down.1.block.1.norm2.bias', 'encoder_1.down.1.block.1.conv2.weight', 'encoder_1.down.1.block.1.conv2.bias', 'encoder_1.down.1.downsample.conv.weight', 'encoder_1.down.1.downsample.conv.bias', 'encoder_1.down.2.block.0.norm1.weight', 'encoder_1.down.2.block.0.norm1.bias', 'encoder_1.down.2.block.0.conv1.weight', 'encoder_1.down.2.block.0.conv1.bias', 'encoder_1.down.2.block.0.norm2.weight', 'encoder_1.down.2.block.0.norm2.bias', 'encoder_1.down.2.block.0.conv2.weight', 'encoder_1.down.2.block.0.conv2.bias', 'encoder_1.down.2.block.0.nin_shortcut.weight', 'encoder_1.down.2.block.0.nin_shortcut.bias', 'encoder_1.down.2.block.1.norm1.weight', 'encoder_1.down.2.block.1.norm1.bias', 'encoder_1.down.2.block.1.conv1.weight', 'encoder_1.down.2.block.1.conv1.bias', 'encoder_1.down.2.block.1.norm2.weight', 'encoder_1.down.2.block.1.norm2.bias', 'encoder_1.down.2.block.1.conv2.weight', 'encoder_1.down.2.block.1.conv2.bias', 'encoder_1.down.2.downsample.conv.weight', 'encoder_1.down.2.downsample.conv.bias', 'encoder_1.down.3.block.0.norm1.weight', 'encoder_1.down.3.block.0.norm1.bias', 'encoder_1.down.3.block.0.conv1.weight', 'encoder_1.down.3.block.0.conv1.bias', 'encoder_1.down.3.block.0.norm2.weight', 'encoder_1.down.3.block.0.norm2.bias', 'encoder_1.down.3.block.0.conv2.weight', 'encoder_1.down.3.block.0.conv2.bias', 
'encoder_1.down.3.block.0.nin_shortcut.weight', 'encoder_1.down.3.block.0.nin_shortcut.bias', 'encoder_1.down.3.block.1.norm1.weight', 'encoder_1.down.3.block.1.norm1.bias', 'encoder_1.down.3.block.1.conv1.weight', 'encoder_1.down.3.block.1.conv1.bias', 'encoder_1.down.3.block.1.norm2.weight', 'encoder_1.down.3.block.1.norm2.bias', 'encoder_1.down.3.block.1.conv2.weight', 'encoder_1.down.3.block.1.conv2.bias', 'encoder_1.mid.block_1.norm1.weight', 'encoder_1.mid.block_1.norm1.bias', 'encoder_1.mid.block_1.conv1.weight', 'encoder_1.mid.block_1.conv1.bias', 'encoder_1.mid.block_1.norm2.weight', 'encoder_1.mid.block_1.norm2.bias', 'encoder_1.mid.block_1.conv2.weight', 'encoder_1.mid.block_1.conv2.bias', 'encoder_1.mid.attn_1.norm.weight', 'encoder_1.mid.attn_1.norm.bias', 'encoder_1.mid.attn_1.q.weight', 'encoder_1.mid.attn_1.q.bias', 'encoder_1.mid.attn_1.k.weight', 'encoder_1.mid.attn_1.k.bias', 'encoder_1.mid.attn_1.v.weight', 'encoder_1.mid.attn_1.v.bias', 'encoder_1.mid.attn_1.proj_out.weight', 'encoder_1.mid.attn_1.proj_out.bias', 'encoder_1.mid.block_2.norm1.weight', 'encoder_1.mid.block_2.norm1.bias', 'encoder_1.mid.block_2.conv1.weight', 'encoder_1.mid.block_2.conv1.bias', 'encoder_1.mid.block_2.norm2.weight', 'encoder_1.mid.block_2.norm2.bias', 'encoder_1.mid.block_2.conv2.weight', 'encoder_1.mid.block_2.conv2.bias', 'encoder_1.norm_out.weight', 'encoder_1.norm_out.bias', 'encoder_1.conv_out.weight', 'encoder_1.conv_out.bias', 'encoder_2.conv_in.weight', 'encoder_2.conv_in.bias', 'encoder_2.down.0.block.0.norm1.weight', 'encoder_2.down.0.block.0.norm1.bias', 'encoder_2.down.0.block.0.conv1.weight', 'encoder_2.down.0.block.0.conv1.bias', 'encoder_2.down.0.block.0.norm2.weight', 'encoder_2.down.0.block.0.norm2.bias', 'encoder_2.down.0.block.0.conv2.weight', 'encoder_2.down.0.block.0.conv2.bias', 'encoder_2.down.0.block.1.norm1.weight', 'encoder_2.down.0.block.1.norm1.bias', 'encoder_2.down.0.block.1.conv1.weight', 'encoder_2.down.0.block.1.conv1.bias', 
'encoder_2.down.0.block.1.norm2.weight', 'encoder_2.down.0.block.1.norm2.bias', 'encoder_2.down.0.block.1.conv2.weight', 'encoder_2.down.0.block.1.conv2.bias', 'encoder_2.down.0.downsample.conv.weight', 'encoder_2.down.0.downsample.conv.bias', 'encoder_2.down.1.block.0.norm1.weight', 'encoder_2.down.1.block.0.norm1.bias', 'encoder_2.down.1.block.0.conv1.weight', 'encoder_2.down.1.block.0.conv1.bias', 'encoder_2.down.1.block.0.norm2.weight', 'encoder_2.down.1.block.0.norm2.bias', 'encoder_2.down.1.block.0.conv2.weight', 'encoder_2.down.1.block.0.conv2.bias', 'encoder_2.down.1.block.1.norm1.weight', 'encoder_2.down.1.block.1.norm1.bias', 'encoder_2.down.1.block.1.conv1.weight', 'encoder_2.down.1.block.1.conv1.bias', 'encoder_2.down.1.block.1.norm2.weight', 'encoder_2.down.1.block.1.norm2.bias', 'encoder_2.down.1.block.1.conv2.weight', 'encoder_2.down.1.block.1.conv2.bias', 'encoder_2.down.1.downsample.conv.weight', 'encoder_2.down.1.downsample.conv.bias', 'encoder_2.down.2.block.0.norm1.weight', 'encoder_2.down.2.block.0.norm1.bias', 'encoder_2.down.2.block.0.conv1.weight', 'encoder_2.down.2.block.0.conv1.bias', 'encoder_2.down.2.block.0.norm2.weight', 'encoder_2.down.2.block.0.norm2.bias', 'encoder_2.down.2.block.0.conv2.weight', 'encoder_2.down.2.block.0.conv2.bias', 'encoder_2.down.2.block.0.nin_shortcut.weight', 'encoder_2.down.2.block.0.nin_shortcut.bias', 'encoder_2.down.2.block.1.norm1.weight', 'encoder_2.down.2.block.1.norm1.bias', 'encoder_2.down.2.block.1.conv1.weight', 'encoder_2.down.2.block.1.conv1.bias', 'encoder_2.down.2.block.1.norm2.weight', 'encoder_2.down.2.block.1.norm2.bias', 'encoder_2.down.2.block.1.conv2.weight', 'encoder_2.down.2.block.1.conv2.bias', 'encoder_2.down.2.downsample.conv.weight', 'encoder_2.down.2.downsample.conv.bias', 'encoder_2.down.3.block.0.norm1.weight', 'encoder_2.down.3.block.0.norm1.bias', 'encoder_2.down.3.block.0.conv1.weight', 'encoder_2.down.3.block.0.conv1.bias', 'encoder_2.down.3.block.0.norm2.weight', 
'encoder_2.down.3.block.0.norm2.bias', 'encoder_2.down.3.block.0.conv2.weight', 'encoder_2.down.3.block.0.conv2.bias', 'encoder_2.down.3.block.0.nin_shortcut.weight', 'encoder_2.down.3.block.0.nin_shortcut.bias', 'encoder_2.down.3.block.1.norm1.weight', 'encoder_2.down.3.block.1.norm1.bias', 'encoder_2.down.3.block.1.conv1.weight', 'encoder_2.down.3.block.1.conv1.bias', 'encoder_2.down.3.block.1.norm2.weight', 'encoder_2.down.3.block.1.norm2.bias', 'encoder_2.down.3.block.1.conv2.weight', 'encoder_2.down.3.block.1.conv2.bias', 'encoder_2.mid.block_1.norm1.weight', 'encoder_2.mid.block_1.norm1.bias', 'encoder_2.mid.block_1.conv1.weight', 'encoder_2.mid.block_1.conv1.bias', 'encoder_2.mid.block_1.norm2.weight', 'encoder_2.mid.block_1.norm2.bias', 'encoder_2.mid.block_1.conv2.weight', 'encoder_2.mid.block_1.conv2.bias', 'encoder_2.mid.attn_1.norm.weight', 'encoder_2.mid.attn_1.norm.bias', 'encoder_2.mid.attn_1.q.weight', 'encoder_2.mid.attn_1.q.bias', 'encoder_2.mid.attn_1.k.weight', 'encoder_2.mid.attn_1.k.bias', 'encoder_2.mid.attn_1.v.weight', 'encoder_2.mid.attn_1.v.bias', 'encoder_2.mid.attn_1.proj_out.weight', 'encoder_2.mid.attn_1.proj_out.bias', 'encoder_2.mid.block_2.norm1.weight', 'encoder_2.mid.block_2.norm1.bias', 'encoder_2.mid.block_2.conv1.weight', 'encoder_2.mid.block_2.conv1.bias', 'encoder_2.mid.block_2.norm2.weight', 'encoder_2.mid.block_2.norm2.bias', 'encoder_2.mid.block_2.conv2.weight', 'encoder_2.mid.block_2.conv2.bias', 'encoder_2.norm_out.weight', 'encoder_2.norm_out.bias', 'encoder_2.conv_out.weight', 'encoder_2.conv_out.bias', 'encoder_3.conv_in.weight', 'encoder_3.conv_in.bias', 'encoder_3.down.0.block.0.norm1.weight', 'encoder_3.down.0.block.0.norm1.bias', 'encoder_3.down.0.block.0.conv1.weight', 'encoder_3.down.0.block.0.conv1.bias', 'encoder_3.down.0.block.0.norm2.weight', 'encoder_3.down.0.block.0.norm2.bias', 'encoder_3.down.0.block.0.conv2.weight', 'encoder_3.down.0.block.0.conv2.bias', 'encoder_3.down.0.block.1.norm1.weight', 
'encoder_3.down.0.block.1.norm1.bias', 'encoder_3.down.0.block.1.conv1.weight', 'encoder_3.down.0.block.1.conv1.bias', 'encoder_3.down.0.block.1.norm2.weight', 'encoder_3.down.0.block.1.norm2.bias', 'encoder_3.down.0.block.1.conv2.weight', 'encoder_3.down.0.block.1.conv2.bias', 'encoder_3.down.0.downsample.conv.weight', 'encoder_3.down.0.downsample.conv.bias', 'encoder_3.down.1.block.0.norm1.weight', 'encoder_3.down.1.block.0.norm1.bias', 'encoder_3.down.1.block.0.conv1.weight', 'encoder_3.down.1.block.0.conv1.bias', 'encoder_3.down.1.block.0.norm2.weight', 'encoder_3.down.1.block.0.norm2.bias', 'encoder_3.down.1.block.0.conv2.weight', 'encoder_3.down.1.block.0.conv2.bias', 'encoder_3.down.1.block.1.norm1.weight', 'encoder_3.down.1.block.1.norm1.bias', 'encoder_3.down.1.block.1.conv1.weight', 'encoder_3.down.1.block.1.conv1.bias', 'encoder_3.down.1.block.1.norm2.weight', 'encoder_3.down.1.block.1.norm2.bias', 'encoder_3.down.1.block.1.conv2.weight', 'encoder_3.down.1.block.1.conv2.bias', 'encoder_3.down.1.downsample.conv.weight', 'encoder_3.down.1.downsample.conv.bias', 'encoder_3.down.2.block.0.norm1.weight', 'encoder_3.down.2.block.0.norm1.bias', 'encoder_3.down.2.block.0.conv1.weight', 'encoder_3.down.2.block.0.conv1.bias', 'encoder_3.down.2.block.0.norm2.weight', 'encoder_3.down.2.block.0.norm2.bias', 'encoder_3.down.2.block.0.conv2.weight', 'encoder_3.down.2.block.0.conv2.bias', 'encoder_3.down.2.block.0.nin_shortcut.weight', 'encoder_3.down.2.block.0.nin_shortcut.bias', 'encoder_3.down.2.block.1.norm1.weight', 'encoder_3.down.2.block.1.norm1.bias', 'encoder_3.down.2.block.1.conv1.weight', 'encoder_3.down.2.block.1.conv1.bias', 'encoder_3.down.2.block.1.norm2.weight', 'encoder_3.down.2.block.1.norm2.bias', 'encoder_3.down.2.block.1.conv2.weight', 'encoder_3.down.2.block.1.conv2.bias', 'encoder_3.down.2.downsample.conv.weight', 'encoder_3.down.2.downsample.conv.bias', 'encoder_3.down.3.block.0.norm1.weight', 'encoder_3.down.3.block.0.norm1.bias', 
'encoder_3.down.3.block.0.conv1.weight', 'encoder_3.down.3.block.0.conv1.bias', 'encoder_3.down.3.block.0.norm2.weight', 'encoder_3.down.3.block.0.norm2.bias', 'encoder_3.down.3.block.0.conv2.weight', 'encoder_3.down.3.block.0.conv2.bias', 'encoder_3.down.3.block.0.nin_shortcut.weight', 'encoder_3.down.3.block.0.nin_shortcut.bias', 'encoder_3.down.3.block.1.norm1.weight', 'encoder_3.down.3.block.1.norm1.bias', 'encoder_3.down.3.block.1.conv1.weight', 'encoder_3.down.3.block.1.conv1.bias', 'encoder_3.down.3.block.1.norm2.weight', 'encoder_3.down.3.block.1.norm2.bias', 'encoder_3.down.3.block.1.conv2.weight', 'encoder_3.down.3.block.1.conv2.bias', 'encoder_3.mid.block_1.norm1.weight', 'encoder_3.mid.block_1.norm1.bias', 'encoder_3.mid.block_1.conv1.weight', 'encoder_3.mid.block_1.conv1.bias', 'encoder_3.mid.block_1.norm2.weight', 'encoder_3.mid.block_1.norm2.bias', 'encoder_3.mid.block_1.conv2.weight', 'encoder_3.mid.block_1.conv2.bias', 'encoder_3.mid.attn_1.norm.weight', 'encoder_3.mid.attn_1.norm.bias', 'encoder_3.mid.attn_1.q.weight', 'encoder_3.mid.attn_1.q.bias', 'encoder_3.mid.attn_1.k.weight', 'encoder_3.mid.attn_1.k.bias', 'encoder_3.mid.attn_1.v.weight', 'encoder_3.mid.attn_1.v.bias', 'encoder_3.mid.attn_1.proj_out.weight', 'encoder_3.mid.attn_1.proj_out.bias', 'encoder_3.mid.block_2.norm1.weight', 'encoder_3.mid.block_2.norm1.bias', 'encoder_3.mid.block_2.conv1.weight', 'encoder_3.mid.block_2.conv1.bias', 'encoder_3.mid.block_2.norm2.weight', 'encoder_3.mid.block_2.norm2.bias', 'encoder_3.mid.block_2.conv2.weight', 'encoder_3.mid.block_2.conv2.bias', 'encoder_3.norm_out.weight', 'encoder_3.norm_out.bias', 'encoder_3.conv_out.weight', 'encoder_3.conv_out.bias', 'quantize_0.embedding.weight', 'quantize_1.embedding.weight', 'quantize_2.embedding.weight', 'quantize_3.embedding.weight', 'quant_conv_0.weight', 'quant_conv_0.bias', 'quant_conv_1.weight', 'quant_conv_1.bias', 'quant_conv_2.weight', 'quant_conv_2.bias', 'quant_conv_3.weight', 'quant_conv_3.bias'] 
Unexpected Keys: ['epoch', 'global_step', 'pytorch-lightning_version', 'state_dict', 'callbacks', 'optimizer_states', 'lr_schedulers']
Conditional model: Multiconditional
Monitoring val/loss_simple_ema as checkpoint metric.
Merged modelckpt-cfg: {'target': 'pytorch_lightning.callbacks.ModelCheckpoint', 'params': {'dirpath': 'logs/2024-06-10T17-26-39_matfuse-ldm-vq_f8/checkpoints', 'filename': '{epoch:06}', 'verbose': True, 'save_last': True, 'monitor': 'val/loss_simple_ema', 'save_top_k': 3}}
GPU available: True, used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
#### Data #####
train, MatFuseDataset, 173319
validation, MatFuseDataset, 63
accumulate_grad_batches = 1
Setting learning rate to 8.00e-06 = 1 (accumulate_grad_batches) * 1 (num_gpus) * 8 (batchsize) * 1.00e-06 (base_lr)
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/core/datamodule.py:423: LightningDeprecationWarning: DataModule.setup has already been called, so it will not be called again. In v1.6 this behavior will change to always call DataModule.setup.
  rank_zero_deprecation(
LOCAL_RANK: 0 - CUDA_VISIBLE_DEVICES: [0]
LatentDiffusion: Also optimizing conditioner params!
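The seven Unexpected Keys above ('epoch', 'global_step', 'state_dict', ...) are exactly the top-level entries of a PyTorch Lightning checkpoint, and the 576 missing keys are the model weights that were never found: this pattern usually means the full checkpoint dict was passed to `load_state_dict` instead of its `state_dict` entry. A minimal sketch of the usual unwrapping (the helper name and the `strict=False` policy are illustrative, not the repository's code):

```python
import torch
import torch.nn as nn


def load_first_stage_weights(model: nn.Module, ckpt_path: str) -> None:
    """Load weights from either a raw state dict or a Lightning checkpoint.

    A Lightning checkpoint stores the weights under the 'state_dict' key,
    next to bookkeeping entries such as 'epoch' and 'optimizer_states'.
    Passing the whole dict to load_state_dict reports exactly those
    bookkeeping keys as "unexpected" and every real weight as "missing".
    """
    ckpt = torch.load(ckpt_path, map_location="cpu")
    # Unwrap a Lightning checkpoint; fall back to the dict itself otherwise.
    state_dict = ckpt.get("state_dict", ckpt)
    missing, unexpected = model.load_state_dict(state_dict, strict=False)
    print(f"Restored with {len(missing)} missing and {len(unexpected)} unexpected keys")
```

With the unwrapping in place, a matching checkpoint should restore with zero missing and zero unexpected keys rather than the counts shown above.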
Project config
model:
  base_learning_rate: 1.0e-06
  target: ldm.models.diffusion.ddpm.LatentDiffusion
  params:
    linear_start: 0.0015
    linear_end: 0.0195
    num_timesteps_cond: 1
    log_every_t: 200
    timesteps: 1000
    first_stage_key: packed
    cond_stage_key:
      - image_embed
      - sketch
      - palette
      - text
    image_size: 32
    channels: 12
    cond_stage_trainable: true
    conditioning_key: hybrid
    monitor: val/loss_simple_ema
    ucg_training:
      image_embed:
        p: 0.5
        val: 0.0
      palette:
        p: 0.5
        val: 0.0
      sketch:
        p: 0.5
        val: 0.0
      text:
        p: 0.5
        val: ''
    unet_config:
      target: ldm.modules.diffusionmodules.openaimodel.UNetModel
      params:
        image_size: 32
        in_channels: 16
        out_channels: 12
        model_channels: 256
        attention_resolutions:
          - 4
          - 2
          - 1
        num_res_blocks: 2
        channel_mult:
          - 1
          - 2
          - 4
        num_head_channels: 32
        use_spatial_transformer: true
        transformer_depth: 1
        context_dim: 512
        use_checkpoint: true
        legacy: false
    first_stage_config:
      target: ldm.models.autoencoder.VQModelMulti
      params:
        embed_dim: 3
        n_embed: 4096
        ckpt_path: /workspace/matfuse-sd/logs/2024-06-07T19-26-44_multi-vq_f8/checkpoints/epoch=000003.ckpt
        ddconfig:
          double_z: false
          z_channels: 256
          resolution: 256
          in_channels: 3
          out_ch: 12
          ch: 128
          ch_mult:
            - 1
            - 1
            - 2
            - 4
          num_res_blocks: 2
          attn_resolutions:
            - null
          dropout: 0.0
        lossconfig:
          target: torch.nn.Identity
    cond_stage_config:
      target: ldm.modules.encoders.multicondition.MultiConditionEncoder
      params:
        image_embed_config:
          target: ldm.modules.encoders.modules.FrozenCLIPImageEmbedder
          params:
            model: ViT-B/16
        text_embed_config:
          target: ldm.modules.encoders.modules.FrozenCLIPSentenceEmbedder
          params:
            version: sentence-transformers/clip-ViT-B-16
        binary_encoder_config:
          target: ldm.modules.encoders.modules.SimpleEncoder
          params:
            in_channels: 1
            out_channels: 4
        palette_proj_config:
          target: ldm.modules.encoders.multicondition.PaletteEncoder
          params:
            in_ch: 3
            hid_ch: 64
            out_ch: 512
data:
  target: main.DataModuleFromConfig
  params:
    batch_size: 8
    num_workers: 0
    wrap: false
    train:
      target: ldm.data.matfuse.MatFuseDataset
      params:
        data_root: data/train
        size: 256
        output_names:
          - diffuse
          - normal
          - roughness
          - specular
    validation:
      target: ldm.data.matfuse.MatFuseDataset
      params:
        data_root: data/test
        size: 256
        output_names:
          - diffuse
          - normal
          - roughness
          - specular

Lightning config
callbacks:
  image_logger:
    target: main.ImageLogger
    params:
      batch_frequency: 6
      max_images: 4
      increase_log_steps: false
      log_images_kwargs:
        ddim_steps: 50
trainer:
  strategy: ddp
  replace_sampler_ddp: false
  gpus: 0,

  | Name              | Type                  | Params
------------------------------------------------------------
0 | model             | DiffusionWrapper      | 395 M
1 | model_ema         | LitEma                | 0
2 | first_stage_model | VQModelMulti          | 132 M
3 | cond_stage_model  | MultiConditionEncoder | 299 M
------------------------------------------------------------
545 M     Trainable params
281 M     Non-trainable params
827 M     Total params
3,308.097 Total estimated model params size (MB)

Validation sanity check: 0it [00:00, ?it/s]
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, val dataloader 0, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
Global seed set to 23
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/trainer/data_loading.py:105: UserWarning: The dataloader, train dataloader, does not have many workers which may be a bottleneck. Consider increasing the value of the `num_workers` argument` (try 64 which is the number of cpus on this machine) in the `DataLoader` init to improve performance.
  rank_zero_warn(
/root/anaconda3/envs/sdiff/lib/python3.10/site-packages/pytorch_lightning/callbacks/lr_monitor.py:112: RuntimeWarning: You are using `LearningRateMonitor` callback with models that have no learning rate schedulers. Please see documentation for `configure_optimizers` method.
  rank_zero_warn(
Epoch 0:   0%|▏ | 14/6258 [00:13<1:30:16, 1.15it/s, loss=0.992, v_num=fdaz, train/loss_simple_step=0.985, train/loss_vlb_step=0.00564, train/loss_step=0.985, global_step=13.00]
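The "Setting learning rate" line earlier in the log applies linear scaling: lr = accumulate_grad_batches * num_gpus * batch_size * base_lr, which with this run's values gives 1 * 1 * 8 * 1.00e-06 = 8.00e-06. A quick sketch of that arithmetic (the function name is illustrative, not the project's API):

```python
def effective_lr(base_lr: float, batch_size: int,
                 num_gpus: int = 1, accumulate_grad_batches: int = 1) -> float:
    """Linear learning-rate scaling as printed in the training log:

        lr = accumulate_grad_batches * num_gpus * batch_size * base_lr

    The rationale: every factor that grows the effective batch per
    optimizer step scales the learning rate proportionally.
    """
    return accumulate_grad_batches * num_gpus * batch_size * base_lr


# The values from this run: 1 * 1 * 8 * 1.00e-06 = 8.00e-06
print(effective_lr(base_lr=1.0e-06, batch_size=8))
```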