gadi-gpu-v100-0092.gadi.nci.org.au
gadi-gpu-v100-0092.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
gadi-gpu-v100-0095.gadi.nci.org.au
mpirun -output-filename llama.nodes2.GBS128.MBS32.125448206.gadi-pbs -report-bindings -x NCCL_DEBUG=INFO -x NCCL_NET_GDR_LEVEL=6 litgpt finetune_full /scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf --access_token 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 --out_dir /scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full --data JSON --data.json_path /scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024 --config /scratch/pi13/cl2868/litgptpy312_venv/full.yaml --eval.final_validation=false --train.epochs=1 --devices=4 --num_nodes=2 --train.global_batch_size=128 --train.micro_batch_size=32
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 20 bound to socket 0[core 20[hwt 0]]: [././././././././././././././././././././B/././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 21 bound to socket 0[core 21[hwt 0]]: [./././././././././././././././././././././B/./.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 22 bound to socket 0[core 22[hwt 0]]: [././././././././././././././././././././././B/.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 23 bound to socket 0[core 23[hwt 0]]: [./././././././././././././././././././././././B][./././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 24 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 25 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 26 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 27 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 28 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 29 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 30 bound to socket 1[core 30[hwt 0]]: [./././././././././././././././././././././././.][././././././B/././././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 31 bound to socket 1[core 31[hwt 0]]: [./././././././././././././././././././././././.][./././././././B/./././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 32 bound to socket 1[core 32[hwt 0]]: [./././././././././././././././././././././././.][././././././././B/././././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 33 bound to socket 1[core 33[hwt 0]]: [./././././././././././././././././././././././.][./././././././././B/./././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 34 bound to socket 1[core 34[hwt 0]]: [./././././././././././././././././././././././.][././././././././././B/././././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 35 bound to socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././B/./././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 36 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 37 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 38 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 39 bound to socket 1[core 39[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././B/./././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 40 bound to socket 1[core 40[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././B/././././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 41 bound to socket 1[core 41[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././B/./././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 42 bound to socket 1[core 42[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././B/././././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 43 bound to socket 1[core 43[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././B/./././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 44 bound to socket 1[core 44[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././B/././.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 45 bound to socket 1[core 45[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././B/./.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 46 bound to socket 1[core 46[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././././B/.]
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 47 bound to socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././././B]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 48 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 49 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 50 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 51 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 52 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 53 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 54 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 55 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 56 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 57 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 58 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 59 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 60 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 61 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 62 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 63 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 64 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 65 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 66 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 67 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 68 bound to socket 0[core 20[hwt 0]]: [././././././././././././././././././././B/././.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 69 bound to socket 0[core 21[hwt 0]]: [./././././././././././././././././././././B/./.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 70 bound to socket 0[core 22[hwt 0]]: [././././././././././././././././././././././B/.][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 71 bound to socket 0[core 23[hwt 0]]: [./././././././././././././././././././././././B][./././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 72 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 73 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 74 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 75 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 76 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 77 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 78 bound to socket 1[core 30[hwt 0]]: [./././././././././././././././././././././././.][././././././B/././././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 79 bound to socket 1[core 31[hwt 0]]: [./././././././././././././././././././././././.][./././././././B/./././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 80 bound to socket 1[core 32[hwt 0]]: [./././././././././././././././././././././././.][././././././././B/././././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 81 bound to socket 1[core 33[hwt 0]]: [./././././././././././././././././././././././.][./././././././././B/./././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 82 bound to socket 1[core 34[hwt 0]]: [./././././././././././././././././././././././.][././././././././././B/././././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 83 bound to socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././B/./././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 84 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 85 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 86 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 87 bound to socket 1[core 39[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././B/./././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 88 bound to socket 1[core 40[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././B/././././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 89 bound to socket 1[core 41[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././B/./././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 90 bound to socket 1[core 42[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././B/././././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 91 bound to socket 1[core 43[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././B/./././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 92 bound to socket 1[core 44[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././B/././.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 93 bound to socket 1[core 45[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././B/./.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 94 bound to socket 1[core 46[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././././B/.]
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 95 bound to socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././././B]
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1506763a3d10>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15224ec748f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x145a56763170>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14d0f1dc12e0>,
              ignore_index=-100,
              {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15044b882450>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x146da84e54f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x151a1b8611f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15011f976f00>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1500ffa18cb0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14f8c1e5bda0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x149d13751b50>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b827e0d130>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1535224a91f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14bdd1ec5b50>,
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1484d0f49850>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14ffa0702960>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150ce9af5c40>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1552ce5f5310>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14e8b38f7440>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15290f603d70>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14f1c4629a90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14e2176c3500>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14e7847bd970>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1507e866cef0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b1f14faa50>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14fe1b67a150>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150c194a0110>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x152cff5a4da0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1529bb6251f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b0e7b1d1f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x148acc21b050>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14e297d8e7e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14837a3b5d30>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x151a4e3a9ac0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b0887109b0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14f3d194e990>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x145c9d1e6990>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x153ed86f91f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x154c2ec7da90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1476ab4c6150>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1514743c2db0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14fabe66b830>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1470d75d38c0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x146dd18f3050>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15218883c230>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15529e994bf0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 max_norm=None,
                    min_lr=6e-05)}
'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14824b729f10>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1532d922bcb0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1484bf1ea180>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14ef1ce1b9e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15528c491a90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x152aadb3a120>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150d2d622f90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14f5877edaf0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x145ee199b290>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14fafaaad6d0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x148c2fb714c0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x153e67f94320>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x153bbd407ad0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14a74aa18260>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x148484edfc80>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14638eb2e1b0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 {'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14a3398e83e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14a4a64b0c80>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1475deff1df0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1522615b2090>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1515e69ebf80>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14d9eff85670>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14be83836e10>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x151982230260>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150e9d74d070>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x153a7e4609b0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x146ca7ef3a70>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x154e44bca4b0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14675b23bce0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x147919a94e60>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1519d1c43e60>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x147592862750>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b553a1f830>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b7b0d61910>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150cc0d71f40>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15009b2d6180>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x1490c8496180>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15294058a7e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14fac64818e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15059117e8a0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b41c8b4b90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x145b46a05b80>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14c140a654f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14c41fbb3ce0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x150d142039e0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14b859194890>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x15540dbd0170>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x146593274080>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x154cc170da90>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
{'access_token': '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',
 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'),
 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'),
              mask_prompt=False,
              val_split_fraction=None,
              prompt_style=<litgpt.prompts.Alpaca object at 0x14843bc5a0f0>,
              ignore_index=-100,
              seed=42,
              num_workers=4),
 'devices': 4,
 'eval': EvalArgs(interval=25000,
                  max_new_tokens=100,
                  max_iters=100,
                  initial_validation=False,
                  final_validation=False),
 'logger_name': 'csv',
 'num_nodes': 2,
 'optimizer': {'class_path': 'torch.optim.Adadelta',
               'init_args': {'lr': 0.001}},
 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'),
 'precision': 'bf16-true',
 'resume': False,
 'seed': 1337,
 'train': TrainArgs(save_interval=20000,
                    log_interval=1,
                    global_batch_size=128,
                    micro_batch_size=32,
                    lr_warmup_steps=25,
                    lr_warmup_fraction=None,
                    epochs=1,
                    max_tokens=None,
                    max_steps=None,
                    max_seq_length=512,
                    tie_embeddings=None,
                    max_norm=None,
                    min_lr=6e-05)}
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO comm 0xc0ddaa0 rank 7 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO comm 0xbbe1100 rank 10 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO comm 0xb7d90d0 rank 15 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO comm 0xcae2f30 rank 13 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO comm 0xc05b400 rank 14 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO comm 0xc06a910 rank 16 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO comm 0xbf7fbf0 rank 9 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO comm 0xd318380 rank 6 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO comm 0xd296050 rank 11 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO comm 0xbf239e0 rank 12 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO comm 0xd3cbbc0 rank 8 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO comm 0xc77a9e0 rank 21 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO comm 0xb8d4ba0 rank 28 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO comm 0xcef28e0 rank 46 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO comm 0xce13180 rank 41 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO comm 0xbd21030 rank 43 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO comm 0xcee30f0 rank 40 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO comm 0xc2cb8a0 rank 45 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO comm 0xcf8f530 rank 22 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO comm 0xcad80d0 rank 29 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO comm 0xd16efc0 rank 26 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO comm 0xbebae40 rank 20 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO comm 0xcba79c0 rank 42 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO comm 0xcbf5ba0 rank 27 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO comm 0xbea08d0 rank 19 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO comm 0xced6210 rank 17 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO comm 0xcb59410 rank 18 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO comm 0xd128500 rank 44 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO comm 0xd3f6a50 rank 47 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO comm 0xbc42c30 rank 23 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO comm 0xbf0aef0 rank 25 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO comm 0xbd07a30 rank 24 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO comm 0xbdbdd40 rank 35 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO comm 0xc15bab0 rank 38 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO comm 0xcf0b210 rank 34 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO comm 0xc113f30 rank 32 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO comm 0xc0f0b60 rank 37 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO comm 0xd69f9c0 rank 31 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO comm 0xd5d6890 rank 30 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO comm 0xc49fd90 rank 36 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO comm 0xc1671a0 rank 33 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO comm 0xcf97ce0 rank 39 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO comm 0xc287540 rank 5 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO comm 0xbd1eae0 rank 4 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO comm 0xcf5d6b0 rank 3 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO comm 0xc24cb20 rank 2 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO comm 0xc71a300 rank 1 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO comm 0xbcb9150 rank 0 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO comm 0xc814c90 rank 91 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO comm 0xc2e3860 rank 92 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO comm 0xc59a600 rank 89 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO comm 0xc274ff0 rank 90 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO comm 0xd179c40 rank 82 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO comm 0xd6b8950 rank 93 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO comm 0xd1f8590 rank 83 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO comm 0xc71edb0 rank 77 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO comm 0xc8f2f90 rank 75 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO comm 0xbfbd9b0 rank 84 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO comm 0xc7d1f00 rank 81 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO comm 0xbc2a810 rank 59 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO comm 0xd5ab930 rank 78 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO comm 0xd4e09f0 rank 63 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO comm 0xce6c7a0 rank 88 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO comm 0xc9aee40 rank 80 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO comm 0xbd84510 rank 61 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO comm 0xd2e01a0 rank 69 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO comm 0xbfcdc80 rank 57 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO comm 0xc8b9a40 rank 67 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO comm 0xd3d8100 rank 79 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO comm 0xbfae7c0 rank 94 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO comm 0xbccfb60 rank 56 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO comm 0xce61430 rank 74 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO comm 0xbcb8b50 rank 86 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO comm 0xd230240 rank 58 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO comm 0xc533d00 rank 73 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO comm 0xc975300 rank 71 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO comm 0xd0bda60 rank 87 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO comm 0xc088a10 rank 68 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO comm 0xc73aaf0 rank 70 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO comm 0xb94c3b0 rank 72 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO comm 0xbbbd700 rank 62 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO comm 0xc7e7a50 rank 85 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO comm 0xcd02020 rank 95 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO comm 0xbd87d20 rank 64 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO comm 0xc39ac90 rank 60 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO comm 0xca15b20 rank 76 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO comm 0xbe54f00 rank 55 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO comm 0xb7dbe20 rank 65 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO comm 0xcb46b20 rank 66 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO comm 0xd201d60 rank 48 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO comm 0xd5f7e80 rank 53 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO comm 0xc396db0 rank 54 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO comm 0xbe83a00 rank 51 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO comm 0xcfe0a00 rank 52 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO comm 0xcdb7730 rank 50 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO comm 0xc8856d0 rank 49 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START

gadi-gpu-v100-0092:524970:524970 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524969:524969 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524968:524968 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524967:524967 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524966:524966 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524964:524964 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524965:524965 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524963:524963 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524962:524962 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524961:524961 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524960:524960 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 37 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524959:524959 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524958:524958 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524957:524957 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524956:524956 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524955:524955 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524954:524954 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524952:524952 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524951:524951 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524953:524953 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524950:524950 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524949:524949 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524948:524948 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524947:524947 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524946:524946 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524945:524945 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524943:524943 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524944:524944 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524942:524942 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524941:524941 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524940:524940 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524939:524939 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524938:524938 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524937:524937 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524936:524936 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524935:524935 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524934:524934 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524933:524933 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524932:524932 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0092:524931:524931 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524924:524924 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524928:524928 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524930:524930 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524927:524927 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 4 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524926:524926 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524929:524929 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 6 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524925:524925 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524923:524923 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114806:2114806 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 48 and rank 49 both on CUDA device 3d000
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114807:2114807 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 49 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114812:2114812 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 54 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114824:2114824 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 66 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114825:2114825 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 67 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114837:2114837 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 79 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114852:2114852 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 94 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114814:2114814 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 56 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114832:2114832 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 74 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114809:2114809 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 51 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114844:2114844 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 86 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114849:2114849 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 91 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114816:2114816 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 58 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114831:2114831 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 73 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114829:2114829 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 71 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114810:2114810 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 52 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114845:2114845 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 87 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114826:2114826 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 68 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114828:2114828 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 70 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114830:2114830 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 72 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114820:2114820 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 62 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114843:2114843 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 85 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114853:2114853 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 95 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114822:2114822 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 64 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114808:2114808 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 50 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114850:2114850 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 92 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114818:2114818 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 60 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114847:2114847 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 89 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114848:2114848 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 90 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114834:2114834 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 76 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114813:2114813 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 55 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114840:2114840 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 82 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114851:2114851 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 93 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114841:2114841 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 83 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114835:2114835 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 77 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114823:2114823 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 65 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114833:2114833 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 75 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114842:2114842 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 84 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114839:2114839 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 81 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114817:2114817 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 59 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114836:2114836 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 78 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114821:2114821 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 63 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114811:2114811 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 53 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114846:2114846 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 88 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114838:2114838 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 80 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114819:2114819 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 61 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114827:2114827 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 69 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114815:2114815 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 57 and rank 48 both on CUDA device 3d000
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using network IB
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using network IB
NCCL version 2.20.5+cuda12.4
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using network IB
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using non-device net plugin version 0
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO comm 0x10a92750 rank 26 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO comm 0x104cb1f0 rank 42 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO comm 0xf62b1c0 rank 24 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO comm 0xfa92df0 rank 45 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO comm 0x105193e0 rank 27 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO comm 0x108b2d70 rank 22 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO comm 0x10816150 rank 46 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO comm 0xf4e8500 rank 43 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO comm 0x10736890 rank 41 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO comm 0x10a4bd30 rank 44 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO comm 0xf82e600 rank 25 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO comm 0xfb70350 rank 2 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO comm 0x10bc39c0 rank 47 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO comm 0x10efa0c0 rank 30 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO comm 0xf566410 rank 23 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO comm 0x1009e210 rank 21 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO comm 0xf1f82c0 rank 28 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO comm 0x10806800 rank 40 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO comm 0xf7de670 rank 20 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO comm 0xfa7f1d0 rank 38 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO comm 0x103fb7f0 rank 29 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO comm 0xf5049d0 rank 10 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO comm 0xf92df70 rank 33 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO comm 0x10cef3f0 rank 8 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO comm 0xfdc35d0 rank 36 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO comm 0x1047cc10 rank 18 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO comm 0xf0fc910 rank 15 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO comm 0xf8a4ef0 rank 7 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO comm 0xf642200 rank 4 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO comm 0xf847210 rank 12 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO comm 0xf5dc800 rank 0 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO comm 0xf98e140 rank 16 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO comm 0x104066c0 rank 13 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO comm 0x107f9a50 rank 17 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO comm 0xfa14390 rank 37 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO comm 0x1003da20 rank 1 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO comm 0xfa5b6b0 rank 32 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO comm 0xf7c3fe0 rank 19 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO comm 0x108bb520 rank 39 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO comm 0x10bb97e0 rank 11 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO comm 0x1082e9c0 rank 34 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO comm 0xfa4eab0 rank 5 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO comm 0xf8a3420 rank 9 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO comm 0xf6e1580 rank 35 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO comm 0xf97ec40 rank 14 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO comm 0x10c3bbb0 rank 6 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO comm 0x10880ee0 rank 3 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO comm 0x10fc31f0 rank 31 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START

gadi-gpu-v100-0092:524928:524928 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524929:524929 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 6 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524926:524926 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524931:524931 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524930:524930 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524927:524927 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 4 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524924:524924 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524925:524925 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524957:524957 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524932:524932 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524958:524958 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524949:524949 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524943:524943 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524961:524961 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524952:524952 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524933:524933 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524965:524965 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524947:524947 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524956:524956 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524968:524968 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524959:524959 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524941:524941 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524938:524938 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524950:524950 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524945:524945 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524935:524935 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524923:524923 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524939:524939 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524969:524969 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524966:524966 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524936:524936 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524940:524940 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524960:524960 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 37 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524955:524955 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524942:524942 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524962:524962 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524964:524964 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524967:524967 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524934:524934 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524948:524948 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524970:524970 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524953:524953 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524954:524954 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1501 -> 5

gadi-gpu-v100-0092:524937:524937 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524946:524946 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524944:524944 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524951:524951 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0092:524963:524963 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using network IB
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
All GPUs are fully connected via NVLink.
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
    CLI(parser_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
    CLI(parser_data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return _run_component(component, init.get(subcommand))
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/96
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return _run_component(component, init.get(subcommand))
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return component(**cfg)
    self._strategy.setup_environment()
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return _run_component(component, init.get(subcommand))
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/96
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/96
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/96
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO comm 0x10b25590 rank 0 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO comm 0xfebde40 rank 41 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO comm 0x101a8df0 rank 1 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO comm 0x10cfb8a0 rank 31 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO comm 0xf8d1e30 rank 46 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO comm 0xf7a7230 rank 3 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO comm 0x10784c60 rank 26 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO comm 0xf6a7c30 rank 13 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO comm 0xf8f13a0 rank 9 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO comm 0x10339350 rank 28 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO comm 0xfe57530 rank 25 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO comm 0x10904230 rank 4 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO comm 0x10138470 rank 43 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO comm 0x10c039d0 rank 21 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO comm 0xf9ac1f0 rank 20 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO comm 0xf5f3390 rank 8 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO comm 0x102989c0 rank 23 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO comm 0x109e1240 rank 39 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO comm 0x1010b280 rank 37 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO comm 0x10e04230 rank 15 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO comm 0x106daec0 rank 2 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO comm 0x10626fb0 rank 47 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO comm 0xf5dc370 rank 38 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO comm 0x10b53a70 rank 10 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO comm 0x10216610 rank 27 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO comm 0x1046a350 rank 18 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO comm 0xf778580 rank 7 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO comm 0xfb98670 rank 42 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO comm 0x10a9d350 rank 34 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO comm 0x10fdc180 rank 45 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO comm 0xfc06f70 rank 44 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO comm 0x10b1bdc0 rank 35 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO comm 0xf0ff650 rank 17 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO comm 0x102d2560 rank 32 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO comm 0x100425e0 rank 29 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO comm 0xf905240 rank 36 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO comm 0xfcbe420 rank 12 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO comm 0xf26fb00 rank 24 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO comm 0xf4e0f00 rank 14 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO comm 0x10f1b610 rank 5 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO comm 0x100f5730 rank 33 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO comm 0x10ecf160 rank 30 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO comm 0xf54dfa0 rank 11 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO comm 0x101dd270 rank 19 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO comm 0x1078feb0 rank 40 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO comm 0xfcba5e0 rank 6 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO comm 0x1005e330 rank 22 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO comm 0xf6cf4a0 rank 16 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START

gadi-gpu-v100-0095:2114844:2114844 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114848:2114848 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114851:2114851 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114850:2114850 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114841:2114841 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114842:2114842 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114846:2114846 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114847:2114847 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114852:2114852 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114832:2114832 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114819:2114819 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114815:2114815 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114834:2114834 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114831:2114831 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114810:2114810 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 4 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114849:2114849 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114827:2114827 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114826:2114826 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114814:2114814 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114829:2114829 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114845:2114845 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114843:2114843 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 37 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114821:2114821 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114822:2114822 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114853:2114853 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114816:2114816 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114833:2114833 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114824:2114824 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114813:2114813 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114840:2114840 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114823:2114823 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114838:2114838 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114835:2114835 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114818:2114818 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114830:2114830 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114820:2114820 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114811:2114811 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114839:2114839 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114836:2114836 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114817:2114817 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114825:2114825 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114812:2114812 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 6 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114828:2114828 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114837:2114837 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1784 -> 5

gadi-gpu-v100-0095:2114806:2114806 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000

gadi-gpu-v100-0095:2114807:2114807 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114809:2114809 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1746 -> 5

gadi-gpu-v100-0095:2114808:2114808 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1784 -> 5
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return _run_component(component, init.get(subcommand))
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return component(**cfg)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    self._strategy.setup_environment()
    return to_run(*args, **kwargs)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
IndexError: list index out of range
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    sys.exit(main())
    return component(**cfg)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    CLI(parser_data)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return to_run(*args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return to_run(*args, **kwargs)
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return _run_component(component, init.get(subcommand))
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
IndexError: list index out of range
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
    CLI(parser_data)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
IndexError: list index out of range
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    CLI(parser_data)
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return _run_component(component, init.get(subcommand))
    return component(**cfg)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    CLI(parser_data)
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return self.parallel_devices[self.local_rank]
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
IndexError: list index out of range
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    CLI(parser_data)
    return _run_component(component, init.get(subcommand))
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
    return component(**cfg)
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return _run_component(component, init.get(subcommand))
    return component(**cfg)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
    self._strategy.setup_environment()
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    CLI(parser_data)
    self.accelerator.setup_device(self.root_device)
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
IndexError: list index out of range
    return _run_component(component, init.get(subcommand))
    self.accelerator.setup_device(self.root_device)
    sys.exit(main())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
IndexError: list index out of range
    CLI(parser_data)
    return _run_component(component, init.get(subcommand))
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
    sys.exit(main())
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    CLI(parser_data)
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return self._wrap_and_launch(function, self, *args, **kwargs)
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self._strategy.setup_environment()
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return _run_component(component, init.get(subcommand))
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
IndexError: list index out of range
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    CLI(parser_data)
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    return _run_component(component, init.get(subcommand))
    return _run_component(component, init.get(subcommand))
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    super().setup_environment()
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self.parallel_devices[self.local_rank]
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
IndexError: list index out of range
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return to_run(*args, **kwargs)
    return to_run(*args, **kwargs)
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    return self._wrap_and_launch(function, self, *args, **kwargs)
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return self.parallel_devices[self.local_rank]
    return self.parallel_devices[self.local_rank]
    super().setup_environment()
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
    return to_run(*args, **kwargs)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    super().setup_environment()
    self.accelerator.setup_device(self.root_device)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self._strategy.setup_environment()
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    self.accelerator.setup_device(self.root_device)
    super().setup_environment()
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
    super().setup_environment()
    return self.parallel_devices[self.local_rank]
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    self.accelerator.setup_device(self.root_device)
    self.accelerator.setup_device(self.root_device)
    return self.parallel_devices[self.local_rank]
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
    return self.parallel_devices[self.local_rank]
    return self.parallel_devices[self.local_rank]
    return self.parallel_devices[self.local_rank]
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
IndexError: list index out of range
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
All GPUs are fully connected via NVLink.
Initializing distributed: GLOBAL_RANK: 48, MEMBER: 49/96
Initializing distributed: GLOBAL_RANK: 49, MEMBER: 50/96
Initializing distributed: GLOBAL_RANK: 50, MEMBER: 51/96
Initializing distributed: GLOBAL_RANK: 51, MEMBER: 52/96
=>> PBS: job killed: walltime 602 exceeded limit 600
[rank51]: Traceback (most recent call last):
[rank51]:   File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
[rank51]:     sys.exit(main())
[rank51]:              ^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
[rank51]:     CLI(parser_data)
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
[rank51]:     return _run_component(component, init.get(subcommand))
[rank51]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
[rank51]:     return component(**cfg)
[rank51]:            ^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
[rank51]:     fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
[rank51]:     return self._wrap_and_launch(function, self, *args, **kwargs)
[rank51]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
[rank51]:     return to_run(*args, **kwargs)
[rank51]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup
[rank51]:     return to_run(*args, **kwargs)
[rank51]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main
[rank51]:     train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train)
[rank51]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders
[rank51]:     with fabric.rank_zero_first():
[rank51]:   File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__
[rank51]:     return next(self.gen)
[rank51]:            ^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first
[rank51]:     with _InfiniteBarrier() as barrier:
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__
[rank51]:     self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000))
[rank51]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper
[rank51]:     func_return = func(*args, **kwargs)
[rank51]:                   ^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group
[rank51]:     return _new_group_with_tag(
[rank51]:            ^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag
[rank51]:     pg, pg_store = _new_process_group_helper(
[rank51]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper
[rank51]:     backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
[rank51]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank51]: torch.distributed.DistNetworkError: Connection reset by peer
[rank49]: Traceback (most recent call last):
[rank49]:   File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
[rank49]:     sys.exit(main())
[rank49]:              ^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
[rank49]:     CLI(parser_data)
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
[rank49]:     return _run_component(component, init.get(subcommand))
[rank49]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
[rank49]:     return component(**cfg)
[rank49]:            ^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
[rank49]:     fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
[rank49]:     return self._wrap_and_launch(function, self, *args, **kwargs)
[rank49]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
[rank49]:     return to_run(*args, **kwargs)
[rank49]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup
[rank49]:     return to_run(*args, **kwargs)
[rank49]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main
[rank49]:     train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train)
[rank49]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders
[rank49]:     with fabric.rank_zero_first():
[rank49]:   File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__
[rank49]:     return next(self.gen)
[rank49]:            ^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first
[rank49]:     with _InfiniteBarrier() as barrier:
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__
[rank49]:     self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000))
[rank49]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper
[rank49]:     func_return = func(*args, **kwargs)
[rank49]:                   ^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group
[rank49]:     return _new_group_with_tag(
[rank49]:            ^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag
[rank49]:     pg, pg_store = _new_process_group_helper(
[rank49]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper
[rank49]:     backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
[rank49]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank49]: torch.distributed.DistNetworkError: Connection reset by peer
[rank50]: Traceback (most recent call last):
[rank50]:   File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
[rank50]:     sys.exit(main())
[rank50]:              ^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
[rank50]:     CLI(parser_data)
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
[rank50]:     return _run_component(component, init.get(subcommand))
[rank50]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
[rank50]:     return component(**cfg)
[rank50]:            ^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
[rank50]:     fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
[rank50]:     return self._wrap_and_launch(function, self, *args, **kwargs)
[rank50]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
[rank50]:     return to_run(*args, **kwargs)
[rank50]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup
[rank50]:     return to_run(*args, **kwargs)
[rank50]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main
[rank50]:     train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train)
[rank50]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders
[rank50]:     with fabric.rank_zero_first():
[rank50]:   File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__
[rank50]:     return next(self.gen)
[rank50]:            ^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first
[rank50]:     with _InfiniteBarrier() as barrier:
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__
[rank50]:     self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000))
[rank50]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper
[rank50]:     func_return = func(*args, **kwargs)
[rank50]:                   ^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group
[rank50]:     return _new_group_with_tag(
[rank50]:            ^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag
[rank50]:     pg, pg_store = _new_process_group_helper(
[rank50]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper
[rank50]:     backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
[rank50]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank50]: torch.distributed.DistNetworkError: Connection reset by peer
[rank48]: Traceback (most recent call last):
[rank48]:   File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
[rank48]:     sys.exit(main())
[rank48]:              ^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
[rank48]:     CLI(parser_data)
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
[rank48]:     return _run_component(component, init.get(subcommand))
[rank48]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
[rank48]:     return component(**cfg)
[rank48]:            ^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
[rank48]:     fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
[rank48]:     return self._wrap_and_launch(function, self, *args, **kwargs)
[rank48]:            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
[rank48]:     return to_run(*args, **kwargs)
[rank48]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup
[rank48]:     return to_run(*args, **kwargs)
[rank48]:            ^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main
[rank48]:     train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train)
[rank48]:                                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders
[rank48]:     with fabric.rank_zero_first():
[rank48]:   File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__
[rank48]:     return next(self.gen)
[rank48]:            ^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first
[rank48]:     with _InfiniteBarrier() as barrier:
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__
[rank48]:     self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000))
[rank48]:                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper
[rank48]:     func_return = func(*args, **kwargs)
[rank48]:                   ^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group
[rank48]:     return _new_group_with_tag(
[rank48]:            ^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag
[rank48]:     pg, pg_store = _new_process_group_helper(
[rank48]:                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]:   File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper
[rank48]:     backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout)
[rank48]:                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
[rank48]: torch.distributed.DistNetworkError: Connection reset by peer

======================================================================================
                  Resource Usage on 2024-09-25 22:32:29:
   Job Id:             125448206.gadi-pbs
   Project:            pi13
   Exit Status:        -29 (Job failed due to exceeding walltime)
   Service Units:      49.04
   NCPUs Requested:    96                     NCPUs Used: 96
                                           CPU Time Used: 01:45:13
   Memory Requested:   256.0GB               Memory Used: 82.02GB
   NGPUs Requested:    8                 GPU Utilisation: 6%
                                         GPU Memory Used: 2.49GB
   Walltime requested: 00:10:00            Walltime Used: 00:10:13
   JobFS requested:    200.0MB                JobFS used: 0B
======================================================================================