gadi-gpu-v100-0092.gadi.nci.org.au gadi-gpu-v100-0092.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au gadi-gpu-v100-0095.gadi.nci.org.au mpirun -output-filename llama.nodes2.GBS128.MBS32.125448206.gadi-pbs -report-bindings -x NCCL_DEBUG=INFO -x NCCL_NET_GDR_LEVEL=6 litgpt finetune_full /scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf --access_token 
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 --out_dir /scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full --data JSON --data.json_path /scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024 --config /scratch/pi13/cl2868/litgptpy312_venv/full.yaml --eval.final_validation=false --train.epochs=1 --devices=4 --num_nodes=2 --train.global_batch_size=128 --train.micro_batch_size=32 [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 2 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 3 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 4 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 5 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 6 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 7 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.] 
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 8 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 9 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 10 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 11 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 12 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 13 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 14 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 15 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 16 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 17 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././.][./././././././././././././././././././././././.] 
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 18 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 19 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 20 bound to socket 0[core 20[hwt 0]]: [././././././././././././././././././././B/././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 21 bound to socket 0[core 21[hwt 0]]: [./././././././././././././././././././././B/./.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 22 bound to socket 0[core 22[hwt 0]]: [././././././././././././././././././././././B/.][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 23 bound to socket 0[core 23[hwt 0]]: [./././././././././././././././././././././././B][./././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 24 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 25 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 26 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 27 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.] 
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 28 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 29 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 30 bound to socket 1[core 30[hwt 0]]: [./././././././././././././././././././././././.][././././././B/././././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 31 bound to socket 1[core 31[hwt 0]]: [./././././././././././././././././././././././.][./././././././B/./././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 32 bound to socket 1[core 32[hwt 0]]: [./././././././././././././././././././././././.][././././././././B/././././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 33 bound to socket 1[core 33[hwt 0]]: [./././././././././././././././././././././././.][./././././././././B/./././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 34 bound to socket 1[core 34[hwt 0]]: [./././././././././././././././././././././././.][././././././././././B/././././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 35 bound to socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././B/./././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 36 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 37 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.] 
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 38 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 39 bound to socket 1[core 39[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././B/./././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 40 bound to socket 1[core 40[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././B/././././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 41 bound to socket 1[core 41[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././B/./././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 42 bound to socket 1[core 42[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././B/././././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 43 bound to socket 1[core 43[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././B/./././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 44 bound to socket 1[core 44[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././B/././.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 45 bound to socket 1[core 45[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././B/./.] [gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 46 bound to socket 1[core 46[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././././B/.] 
[gadi-gpu-v100-0092.gadi.nci.org.au:524879] MCW rank 47 bound to socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././././B] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 48 bound to socket 0[core 0[hwt 0]]: [B/././././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 49 bound to socket 0[core 1[hwt 0]]: [./B/./././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 50 bound to socket 0[core 2[hwt 0]]: [././B/././././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 51 bound to socket 0[core 3[hwt 0]]: [./././B/./././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 52 bound to socket 0[core 4[hwt 0]]: [././././B/././././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 53 bound to socket 0[core 5[hwt 0]]: [./././././B/./././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 54 bound to socket 0[core 6[hwt 0]]: [././././././B/././././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 55 bound to socket 0[core 7[hwt 0]]: [./././././././B/./././././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 56 bound to socket 0[core 8[hwt 0]]: [././././././././B/././././././././././././././.][./././././././././././././././././././././././.] 
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 57 bound to socket 0[core 9[hwt 0]]: [./././././././././B/./././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 58 bound to socket 0[core 10[hwt 0]]: [././././././././././B/././././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 59 bound to socket 0[core 11[hwt 0]]: [./././././././././././B/./././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 60 bound to socket 0[core 12[hwt 0]]: [././././././././././././B/././././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 61 bound to socket 0[core 13[hwt 0]]: [./././././././././././././B/./././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 62 bound to socket 0[core 14[hwt 0]]: [././././././././././././././B/././././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 63 bound to socket 0[core 15[hwt 0]]: [./././././././././././././././B/./././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 64 bound to socket 0[core 16[hwt 0]]: [././././././././././././././././B/././././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 65 bound to socket 0[core 17[hwt 0]]: [./././././././././././././././././B/./././././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 66 bound to socket 0[core 18[hwt 0]]: [././././././././././././././././././B/././././.][./././././././././././././././././././././././.] 
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 67 bound to socket 0[core 19[hwt 0]]: [./././././././././././././././././././B/./././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 68 bound to socket 0[core 20[hwt 0]]: [././././././././././././././././././././B/././.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 69 bound to socket 0[core 21[hwt 0]]: [./././././././././././././././././././././B/./.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 70 bound to socket 0[core 22[hwt 0]]: [././././././././././././././././././././././B/.][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 71 bound to socket 0[core 23[hwt 0]]: [./././././././././././././././././././././././B][./././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 72 bound to socket 1[core 24[hwt 0]]: [./././././././././././././././././././././././.][B/././././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 73 bound to socket 1[core 25[hwt 0]]: [./././././././././././././././././././././././.][./B/./././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 74 bound to socket 1[core 26[hwt 0]]: [./././././././././././././././././././././././.][././B/././././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 75 bound to socket 1[core 27[hwt 0]]: [./././././././././././././././././././././././.][./././B/./././././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 76 bound to socket 1[core 28[hwt 0]]: [./././././././././././././././././././././././.][././././B/././././././././././././././././././.] 
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 77 bound to socket 1[core 29[hwt 0]]: [./././././././././././././././././././././././.][./././././B/./././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 78 bound to socket 1[core 30[hwt 0]]: [./././././././././././././././././././././././.][././././././B/././././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 79 bound to socket 1[core 31[hwt 0]]: [./././././././././././././././././././././././.][./././././././B/./././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 80 bound to socket 1[core 32[hwt 0]]: [./././././././././././././././././././././././.][././././././././B/././././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 81 bound to socket 1[core 33[hwt 0]]: [./././././././././././././././././././././././.][./././././././././B/./././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 82 bound to socket 1[core 34[hwt 0]]: [./././././././././././././././././././././././.][././././././././././B/././././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 83 bound to socket 1[core 35[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././B/./././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 84 bound to socket 1[core 36[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././B/././././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 85 bound to socket 1[core 37[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././B/./././././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 86 bound to socket 1[core 38[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././B/././././././././.] 
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 87 bound to socket 1[core 39[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././B/./././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 88 bound to socket 1[core 40[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././B/././././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 89 bound to socket 1[core 41[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././B/./././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 90 bound to socket 1[core 42[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././B/././././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 91 bound to socket 1[core 43[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././B/./././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 92 bound to socket 1[core 44[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././B/././.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 93 bound to socket 1[core 45[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././B/./.] [gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 94 bound to socket 1[core 46[hwt 0]]: [./././././././././././././././././././././././.][././././././././././././././././././././././B/.] 
[gadi-gpu-v100-0095.gadi.nci.org.au:2114790] MCW rank 95 bound to socket 1[core 47[hwt 0]]: [./././././././././././././././././././././././.][./././././././././././././././././././././././B] {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1506763a3d10>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15224ec748f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 
0x145a56763170>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14d0f1dc12e0>, ignore_index=-100, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), {'access_token': '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', 'checkpoint_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15044b882450>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x146da84e54f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, 
final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x151a1b8611f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15011f976f00>, ignore_index=-100, seed=42, num_workers=4), 
'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1500ffa18cb0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, {'access_token': 
'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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14f8c1e5bda0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x149d13751b50>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, 
max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14b827e0d130>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1535224a91f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, 
max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14bdd1ec5b50>, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1484d0f49850>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14ffa0702960>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, 
max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x150ce9af5c40>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1552ce5f5310>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, 
log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14e8b38f7440>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15290f603d70>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14f1c4629a90>, 
ignore_index=-100, seed=42, num_workers=4), {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14e2176c3500>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14e7847bd970>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, 
lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1507e866cef0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14b1f14faa50>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 
1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14fe1b67a150>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x150c194a0110>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x152cff5a4da0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1529bb6251f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 
'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14b0e7b1d1f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x148acc21b050>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14e297d8e7e0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14837a3b5d30>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 
'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x151a4e3a9ac0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, 
global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14b0887109b0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14f3d194e990>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 
'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x145c9d1e6990>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x153ed86f91f0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 
'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x154c2ec7da90>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1476ab4c6150>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, 
final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1514743c2db0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14fabe66b830>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, 
final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1470d75d38c0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x146dd18f3050>, ignore_index=-100, seed=42, num_workers=4), 
'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15218883c230>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': 
JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15529e994bf0>, ignore_index=-100, seed=42, num_workers=4), max_norm=None, min_lr=6e-05)} 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14824b729f10>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': 
'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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1532d922bcb0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1484bf1ea180>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14ef1ce1b9e0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, {'access_token': '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', 'checkpoint_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x15528c491a90>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x152aadb3a120>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x150d2d622f90>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14f5877edaf0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, {'access_token': 
'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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x145ee199b290>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14fafaaad6d0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x148c2fb714c0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 
'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x153e67f94320>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x153bbd407ad0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14a74aa18260>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, 
final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x148484edfc80>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14638eb2e1b0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, {'access_token': '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', 'checkpoint_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14a3398e83e0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14a4a64b0c80>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, 
max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1475deff1df0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1522615b2090>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, 
lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1515e69ebf80>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14d9eff85670>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': 
TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14be83836e10>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x151982230260>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': 
PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x150e9d74d070>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x153a7e4609b0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 
'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x146ca7ef3a70>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x154e44bca4b0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, 
max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x14675b23bce0>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 
0x147919a94e60>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x1519d1c43e60>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, tie_embeddings=None, max_norm=None, min_lr=6e-05)} {'access_token': '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', 'checkpoint_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/meta-llama/Llama-2-7b-hf'), 'data': 
JSON(json_path=PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/model/alpaca1024'), mask_prompt=False, val_split_fraction=None, prompt_style=<litgpt.prompts.Alpaca object at 0x147592862750>, ignore_index=-100, seed=42, num_workers=4), 'devices': 4, 'eval': EvalArgs(interval=25000, max_new_tokens=100, max_iters=100, initial_validation=False, final_validation=False), 'logger_name': 'csv', 'num_nodes': 2, 'optimizer': {'class_path': 'torch.optim.Adadelta', 'init_args': {'lr': 0.001}}, 'out_dir': PosixPath('/scratch/pi13/cl2868/litgptpy312_venv/out/finetune/full'), 'precision': 'bf16-true', 'resume': False, 'seed': 1337, 'train': TrainArgs(save_interval=20000, log_interval=1, global_batch_size=128, micro_batch_size=32, lr_warmup_steps=25, lr_warmup_fraction=None, epochs=1, max_tokens=None, max_steps=None, max_seq_length=512, 
tie_embeddings=None, max_norm=None, min_lr=6e-05)}
[LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p
[LOG_CAT_ML] ml_discover_hierarchy exited with error
[LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] component basesmuma is not available but requested in hierarchy: basesmuma,basesmuma,ucx_p2p:basesmsocket,basesmuma,p2p [LOG_CAT_ML] ml_discover_hierarchy exited with error [LOG_CAT_ML] ml_discover_hierarchy exited with error gadi-gpu-v100-0092:524923:524923 
[0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO cudaDriverVersion 12040
NCCL version 2.20.5+cuda12.4
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.7<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO cudaDriverVersion 12040
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Bootstrap : Using ib0:10.6.28.4<0>
gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation
gadi-gpu-v100-0092:524948:524948
[0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), 
using internal implementation gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO NET/Plugin : dlerror=libnccl-net.so: cannot open shared object file: No such file or directory No plugin found (libnccl-net.so), using internal implementation gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> 
gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB 
[RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using non-device net plugin version 0 
gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using non-device net plugin version 0 
gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using 
non-device net plugin version 0 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB 
ib0:10.6.28.4<0> gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.4<0> gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114813:2114813 [0] 
NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO NET/IB : Using 
[0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using 
network IB gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using non-device net plugin version 0 
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using non-device net plugin version 0 
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> 
gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO NET/IB : Using [0]mlx5_0:1/IB [RO]; OOB ib0:10.6.28.7<0> gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO comm 0xc0ddaa0 rank 7 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO comm 0xbbe1100 rank 10 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO comm 0xb7d90d0 rank 15 nranks 96 cudaDev 0 nvmlDev 
0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO comm 0xcae2f30 rank 13 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO comm 0xc05b400 rank 14 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO comm 0xc06a910 rank 16 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO comm 0xbf7fbf0 rank 9 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO comm 0xd318380 rank 6 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO comm 0xd296050 rank 11 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO comm 0xbf239e0 rank 12 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO comm 0xd3cbbc0 rank 8 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO comm 0xc77a9e0 rank 21 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO comm 0xb8d4ba0 rank 28 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO comm 0xcef28e0 rank 46 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO comm 0xce13180 rank 41 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO comm 0xbd21030 rank 43 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 
- Init START gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO comm 0xcee30f0 rank 40 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO comm 0xc2cb8a0 rank 45 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO comm 0xcf8f530 rank 22 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO comm 0xcad80d0 rank 29 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO comm 0xd16efc0 rank 26 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO comm 0xbebae40 rank 20 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO comm 0xcba79c0 rank 42 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO comm 0xcbf5ba0 rank 27 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO comm 0xbea08d0 rank 19 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO comm 0xced6210 rank 17 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO comm 0xcb59410 rank 18 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO comm 0xd128500 rank 44 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO comm 0xd3f6a50 rank 47 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START 
gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO comm 0xbc42c30 rank 23 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO comm 0xbf0aef0 rank 25 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO comm 0xbd07a30 rank 24 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO comm 0xbdbdd40 rank 35 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO comm 0xc15bab0 rank 38 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO comm 0xcf0b210 rank 34 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO comm 0xc113f30 rank 32 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO comm 0xc0f0b60 rank 37 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO comm 0xd69f9c0 rank 31 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO comm 0xd5d6890 rank 30 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO comm 0xc49fd90 rank 36 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO comm 0xc1671a0 rank 33 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO comm 0xcf97ce0 rank 39 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524928:524928 [0] 
NCCL INFO comm 0xc287540 rank 5 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO comm 0xbd1eae0 rank 4 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO comm 0xcf5d6b0 rank 3 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO comm 0xc24cb20 rank 2 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO comm 0xc71a300 rank 1 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO comm 0xbcb9150 rank 0 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO comm 0xc814c90 rank 91 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO comm 0xc2e3860 rank 92 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO comm 0xc59a600 rank 89 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO comm 0xc274ff0 rank 90 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO comm 0xd179c40 rank 82 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO comm 0xd6b8950 rank 93 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO comm 0xd1f8590 rank 83 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO comm 0xc71edb0 
rank 77 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO comm 0xc8f2f90 rank 75 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO comm 0xbfbd9b0 rank 84 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO comm 0xc7d1f00 rank 81 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO comm 0xbc2a810 rank 59 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO comm 0xd5ab930 rank 78 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO comm 0xd4e09f0 rank 63 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO comm 0xce6c7a0 rank 88 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO comm 0xc9aee40 rank 80 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO comm 0xbd84510 rank 61 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO comm 0xd2e01a0 rank 69 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO comm 0xbfcdc80 rank 57 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO comm 0xc8b9a40 rank 67 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO comm 0xd3d8100 rank 79 
nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO comm 0xbfae7c0 rank 94 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO comm 0xbccfb60 rank 56 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO comm 0xce61430 rank 74 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO comm 0xbcb8b50 rank 86 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO comm 0xd230240 rank 58 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO comm 0xc533d00 rank 73 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO comm 0xc975300 rank 71 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO comm 0xd0bda60 rank 87 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO comm 0xc088a10 rank 68 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO comm 0xc73aaf0 rank 70 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO comm 0xb94c3b0 rank 72 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO comm 0xbbbd700 rank 62 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO comm 0xc7e7a50 rank 85 nranks 96 
cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO comm 0xcd02020 rank 95 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO comm 0xbd87d20 rank 64 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO comm 0xc39ac90 rank 60 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO comm 0xca15b20 rank 76 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO comm 0xbe54f00 rank 55 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO comm 0xb7dbe20 rank 65 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO comm 0xcb46b20 rank 66 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO comm 0xd201d60 rank 48 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO comm 0xd5f7e80 rank 53 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO comm 0xc396db0 rank 54 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO comm 0xbe83a00 rank 51 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO comm 0xcfe0a00 rank 52 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO comm 0xcdb7730 rank 50 nranks 96 cudaDev 0 
nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO comm 0xc8856d0 rank 49 nranks 96 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xcf455ffd94a07d98 - Init START gadi-gpu-v100-0092:524970:524970 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524969:524969 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524968:524968 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524967:524967 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524966:524966 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524964:524964 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524965:524965 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524965:524965 [0] 
NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524963:524963 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524962:524962 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524961:524961 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524960:524960 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 37 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524959:524959 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524958:524958 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524957:524957 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1784 -> 5 
gadi-gpu-v100-0092:524956:524956 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524955:524955 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524954:524954 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524952:524952 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524951:524951 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524953:524953 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524950:524950 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO 
init.cc:1746 -> 5 gadi-gpu-v100-0092:524949:524949 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524948:524948 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524947:524947 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524946:524946 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524945:524945 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524943:524943 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524944:524944 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524944:524944 [0] 
NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524942:524942 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524941:524941 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524940:524940 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524939:524939 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524938:524938 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524937:524937 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524936:524936 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000 
gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524935:524935 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524934:524934 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524933:524933 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524932:524932 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524931:524931 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524924:524924 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524928:524928 [0] init.cc:871 NCCL WARN 
Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524930:524930 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524927:524927 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 4 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524926:524926 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524929:524929 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 6 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524925:524925 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524923:524923 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO 
init.cc:1784 -> 5 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 48 and rank 49 both on CUDA device 3d000 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 49 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 54 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] init.cc:871 NCCL WARN Duplicate GPU detected 
: rank 66 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 67 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 79 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 94 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 56 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 74 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1784 -> 5 
gadi-gpu-v100-0095:2114809:2114809 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 51 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114844:2114844 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 86 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 91 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 58 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114831:2114831 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 73 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 71 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 52 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1784 -> 5 
gadi-gpu-v100-0095:2114845:2114845 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 87 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 68 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 70 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 72 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 62 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 85 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 95 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] init.cc:871 NCCL WARN Duplicate 
GPU detected : rank 64 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 50 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114850:2114850 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 92 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 60 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114847:2114847 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 89 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114848:2114848 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 90 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114834:2114834 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 76 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1501 -> 5 
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 55 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 82 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114851:2114851 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 93 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114841:2114841 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 83 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 77 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 65 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114833:2114833 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 75 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1501 -> 5 
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114842:2114842 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 84 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 81 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 59 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 78 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 63 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 53 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114846:2114846 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 88 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1746 -> 5 
gadi-gpu-v100-0095:2114838:2114838 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 80 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 61 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 69 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 57 and rank 48 both on CUDA device 3d000 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1784 -> 
5 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524936:524936 [0] 
NCCL INFO Using network IB gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524933:524933 
[0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using non-device net plugin version 0 
gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO Using network 
IB gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO Using network IB gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO Using network IB NCCL version 2.20.5+cuda12.4 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO Using network IB 
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO Using network IB 
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO Using network IB 
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO Using network IB 
gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO Using network IB gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using non-device net plugin version 0 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO comm 0x10a92750 rank 26 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO comm 0x104cb1f0 rank 42 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO comm 0xf62b1c0 rank 24 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO comm 0xfa92df0 rank 45 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO comm 0x105193e0 rank 27 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO comm 0x108b2d70 rank 22 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO comm 0x10816150 rank 46 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO comm 0xf4e8500 rank 43 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO comm 0x10736890 rank 41 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO comm 0x10a4bd30 rank 44 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO comm 0xf82e600 rank 25 nranks 48 cudaDev 0 
nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO comm 0xfb70350 rank 2 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO comm 0x10bc39c0 rank 47 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO comm 0x10efa0c0 rank 30 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO comm 0xf566410 rank 23 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO comm 0x1009e210 rank 21 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO comm 0xf1f82c0 rank 28 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO comm 0x10806800 rank 40 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO comm 0xf7de670 rank 20 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO comm 0xfa7f1d0 rank 38 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO comm 0x103fb7f0 rank 29 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO comm 0xf5049d0 rank 10 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO comm 0xf92df70 rank 33 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO comm 0x10cef3f0 rank 8 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 
0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO comm 0xfdc35d0 rank 36 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO comm 0x1047cc10 rank 18 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO comm 0xf0fc910 rank 15 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO comm 0xf8a4ef0 rank 7 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO comm 0xf642200 rank 4 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO comm 0xf847210 rank 12 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO comm 0xf5dc800 rank 0 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO comm 0xf98e140 rank 16 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO comm 0x104066c0 rank 13 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO comm 0x107f9a50 rank 17 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO comm 0xfa14390 rank 37 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO comm 0x1003da20 rank 1 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO comm 0xfa5b6b0 rank 32 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START 
gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO comm 0xf7c3fe0 rank 19 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO comm 0x108bb520 rank 39 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO comm 0x10bb97e0 rank 11 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO comm 0x1082e9c0 rank 34 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO comm 0xfa4eab0 rank 5 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO comm 0xf8a3420 rank 9 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO comm 0xf6e1580 rank 35 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO comm 0xf97ec40 rank 14 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO comm 0x10c3bbb0 rank 6 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO comm 0x10880ee0 rank 3 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO comm 0x10fc31f0 rank 31 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xef48ffd73c9f8756 - Init START gadi-gpu-v100-0092:524928:524928 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524929:524929 [0] init.cc:871 NCCL WARN Duplicate GPU 
detected : rank 6 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524926:524926 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524931:524931 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524930:524930 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524927:524927 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 4 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524924:524924 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524925:524925 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524957:524957 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1746 -> 5 
gadi-gpu-v100-0092:524932:524932 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524958:524958 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524949:524949 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524943:524943 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524961:524961 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524952:524952 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524933:524933 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524965:524965 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO 
init.cc:1501 -> 5 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524947:524947 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524956:524956 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524968:524968 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524959:524959 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524941:524941 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524938:524938 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524950:524950 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524945:524945 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 
both on CUDA device 3d000 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524935:524935 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524923:524923 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524939:524939 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524969:524969 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524966:524966 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524936:524936 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524940:524940 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1746 -> 5 
gadi-gpu-v100-0092:524960:524960 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 37 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524955:524955 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524942:524942 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524962:524962 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524964:524964 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524967:524967 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524934:524934 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524948:524948 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO 
init.cc:1501 -> 5 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524970:524970 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524953:524953 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524954:524954 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524937:524937 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524946:524946 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524944:524944 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524951:524951 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524963:524963 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0092:524963:524963 [0] 
NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524928:524928 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0092:524927:524927 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524929:524929 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524926:524926 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524931:524931 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524932:524932 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524925:524925 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524930:524930 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524957:524957 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524958:524958 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524933:524933 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524924:524924 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524934:524934 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524969:524969 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524939:524939 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524970:524970 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524937:524937 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524940:524940 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524967:524967 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524936:524936 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524960:524960 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524941:524941 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524935:524935 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524953:524953 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524965:524965 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524945:524945 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524962:524962 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524951:524951 [0] NCCL INFO init.cc:1784 -> 5 
gadi-gpu-v100-0092:524950:524950 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524938:524938 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524946:524946 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524956:524956 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524944:524944 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524968:524968 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524955:524955 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524943:524943 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524948:524948 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524961:524961 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524963:524963 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524942:524942 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524964:524964 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524949:524949 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524959:524959 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524947:524947 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524923:524923 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524954:524954 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524952:524952 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0092:524966:524966 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO Using network IB Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup 
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
All GPUs are fully connected via NVLink.
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component CLI(parser_data) return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return _run_component(component, init.get(subcommand)) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup super().setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, 
init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return self._wrap_and_launch(function, self, *args, **kwargs) super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> 
self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range self._strategy.setup_environment() sys.exit(main()) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) Traceback (most recent call last): File 
"/home/552/cl2868/.local/bin/litgpt", line 8, in <module> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) CLI(parser_data) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return _run_component(component, init.get(subcommand)) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component 
return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, 
in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) CLI(parser_data) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return _run_component(component, init.get(subcommand)) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in 
_wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup 
self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Initializing distributed: GLOBAL_RANK: 0, MEMBER: 1/96 Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in 
_wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return _run_component(component, init.get(subcommand)) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return component(**cfg) self._strategy.setup_environment() ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return _run_component(component, init.get(subcommand)) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return self.parallel_devices[self.local_rank] 
~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, 
in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Initializing distributed: GLOBAL_RANK: 2, MEMBER: 3/96 Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 
259, in setup_environment Initializing distributed: GLOBAL_RANK: 3, MEMBER: 4/96 super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range Initializing distributed: GLOBAL_RANK: 1, MEMBER: 2/96 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO comm 0x10b25590 rank 0 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO comm 0xfebde40 rank 41 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO comm 0x101a8df0 rank 1 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO comm 0x10cfb8a0 rank 31 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO comm 0xf8d1e30 rank 46 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO comm 0xf7a7230 rank 3 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO comm 0x10784c60 rank 26 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO comm 0xf6a7c30 rank 13 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO comm 0xf8f13a0 rank 9 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START 
gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO comm 0x10339350 rank 28 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO comm 0xfe57530 rank 25 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO comm 0x10904230 rank 4 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO comm 0x10138470 rank 43 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO comm 0x10c039d0 rank 21 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO comm 0xf9ac1f0 rank 20 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO comm 0xf5f3390 rank 8 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO comm 0x102989c0 rank 23 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO comm 0x109e1240 rank 39 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO comm 0x1010b280 rank 37 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO comm 0x10e04230 rank 15 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO comm 0x106daec0 rank 2 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO comm 0x10626fb0 rank 47 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO comm 0xf5dc370 rank 38 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO comm 0x10b53a70 rank 10 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO comm 0x10216610 rank 27 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO comm 0x1046a350 rank 18 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO comm 0xf778580 rank 7 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO comm 0xfb98670 rank 42 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO comm 0x10a9d350 rank 34 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO comm 0x10fdc180 rank 45 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO comm 0xfc06f70 rank 44 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO comm 0x10b1bdc0 rank 35 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO comm 0xf0ff650 rank 17 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO comm 0x102d2560 rank 32 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO comm 0x100425e0 rank 29 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO comm 0xf905240 rank 36 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO comm 0xfcbe420 rank 12 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO comm 0xf26fb00 rank 24 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO comm 0xf4e0f00 rank 14 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO comm 0x10f1b610 rank 5 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO comm 0x100f5730 rank 33 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO comm 0x10ecf160 rank 30 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO comm 0xf54dfa0 rank 11 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO comm 0x101dd270 rank 19 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO comm 0x1078feb0 rank 40 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO comm 0xfcba5e0 rank 6 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO comm 0x1005e330 rank 22 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO comm 0xf6cf4a0 rank 16 nranks 48 cudaDev 0 nvmlDev 0 busId 3d000 commId 0xf3d32ee4a40c9c2c - Init START
gadi-gpu-v100-0095:2114844:2114844 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 38 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114844:2114844 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 42 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 45 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 44 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 35 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 36 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 40 and rank 0 both on CUDA device 3d000
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1501 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1746 -> 5
gadi-gpu-v100-0095:2114846:2114846 [0] NCCL INFO init.cc:1784 -> 5
gadi-gpu-v100-0095:2114847:2114847 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 41 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114847:2114847 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 46 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 26 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 13 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 9 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114834:2114834 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 28 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114831:2114831 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 25 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] init.cc:871 NCCL WARN Duplicate GPU 
detected : rank 4 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 43 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114849:2114849 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 21 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 20 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 8 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 23 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114845:2114845 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 39 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114845:2114845 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] init.cc:871 NCCL WARN Duplicate GPU detected : 
rank 37 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114843:2114843 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 15 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 16 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 47 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 10 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114816:2114816 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114833:2114833 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 27 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 18 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 7 and 
rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 34 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114848:2114848 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114850:2114850 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 17 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114823:2114823 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114838:2114838 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 32 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 29 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 12 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114818:2114818 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 24 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] 
NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 14 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114820:2114820 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 5 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114811:2114811 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 33 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 30 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 11 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 19 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114825:2114825 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114842:2114842 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 6 and rank 0 both on 
CUDA device 3d000 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 22 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114828:2114828 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 31 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114852:2114852 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114819:2114819 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114815:2114815 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114810:2114810 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114826:2114826 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114814:2114814 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114822:2114822 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114821:2114821 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114824:2114824 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114813:2114813 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114851:2114851 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114841:2114841 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114835:2114835 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114830:2114830 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114839:2114839 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114836:2114836 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114817:2114817 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114812:2114812 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114837:2114837 [0] NCCL 
INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114832:2114832 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114834:2114834 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114831:2114831 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114827:2114827 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114829:2114829 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114833:2114833 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114840:2114840 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114838:2114838 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114853:2114853 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 0 and rank 1 both on CUDA device 3d000 gadi-gpu-v100-0095:2114807:2114807 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 1 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114809:2114809 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 3 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] init.cc:871 NCCL WARN Duplicate GPU detected : rank 2 and rank 0 both on CUDA device 3d000 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114807:2114807 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114809:2114809 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114808:2114808 [0] NCCL INFO init.cc:1784 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1501 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1746 -> 5 gadi-gpu-v100-0095:2114806:2114806 [0] NCCL INFO init.cc:1784 -> 5 Traceback (most recent call last): File 
"/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return _run_component(component, init.get(subcommand)) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup 
CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return self._wrap_and_launch(function, self, *args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return component(**cfg) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup self._strategy.setup_environment() return to_run(*args, **kwargs) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment 
^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return to_run(*args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> self.accelerator.setup_device(self.root_device) 
self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range 
super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment IndexError: list index out of range return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range sys.exit(main()) return component(**cfg) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return 
component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup CLI(parser_data) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return to_run(*args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return 
component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return to_run(*args, **kwargs) self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self._strategy.setup_environment() self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return _run_component(component, init.get(subcommand)) self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: 
list index out of range IndexError: list index out of range self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) CLI(parser_data) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI IndexError: list index out of range return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) CLI(parser_data) return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in 
_run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return _run_component(component, init.get(subcommand)) return component(**cfg) return component(**cfg) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main 
sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return to_run(*args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup CLI(parser_data) CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return to_run(*args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return component(**cfg) ^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return self.parallel_devices[self.local_rank] super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in 
setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range IndexError: list index out of range return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range self._strategy.setup_environment() self._strategy.setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range
Traceback (most recent call last):
  File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main
    CLI(parser_data)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI
    return _run_component(component, init.get(subcommand))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component
    return component(**cfg)
           ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup
    fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer)
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch
    return self._wrap_and_launch(function, self, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch
    return to_run(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup
    self._strategy.setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment
    super().setup_environment()
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment
    self.accelerator.setup_device(self.root_device)
                                  ^^^^^^^^^^^^^^^^
  File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device
    return self.parallel_devices[self.local_rank]
           ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
IndexError: list index out of range
Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self.accelerator.setup_device(self.root_device) self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> CLI(parser_data) return _run_component(component, 
init.get(subcommand)) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) return component(**cfg) CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return _run_component(component, init.get(subcommand)) return component(**cfg) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, 
out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) self._strategy.setup_environment() return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment CLI(parser_data) 
self.accelerator.setup_device(self.root_device) Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device IndexError: list index out of range return _run_component(component, init.get(subcommand)) self.accelerator.setup_device(self.root_device) sys.exit(main()) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device IndexError: list index out of range Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) ^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI IndexError: list index out of range CLI(parser_data) return _run_component(component, init.get(subcommand)) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) sys.exit(main()) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self._wrap_and_launch(function, self, *args, **kwargs) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> CLI(parser_data) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, 
resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 
120, in setup super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return self._wrap_and_launch(function, self, *args, **kwargs) Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch 
self._strategy.setup_environment() return to_run(*args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return _run_component(component, init.get(subcommand)) CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self._wrap_and_launch(function, self, *args, **kwargs) self._strategy.setup_environment() self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, 
in setup_environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment Traceback (most recent call last): File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return self._wrap_and_launch(function, self, *args, **kwargs) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI sys.exit(main()) ^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI CLI(parser_data) File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup IndexError: list index out of range File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup CLI(parser_data) return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component CLI(parser_data) File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component return _run_component(component, init.get(subcommand)) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 
File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment return _run_component(component, init.get(subcommand)) return _run_component(component, init.get(subcommand)) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component super().setup_environment() return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in 
_run_component File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) return component(**cfg) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", 
line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self.parallel_devices[self.local_rank] self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return 
self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch IndexError: list index out of range return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return to_run(*args, **kwargs) return to_run(*args, **kwargs) return self._wrap_and_launch(function, self, *args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ return self._wrap_and_launch(function, self, *args, **kwargs) return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in 
_wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch return self.parallel_devices[self.local_rank] return self.parallel_devices[self.local_rank] super().setup_environment() ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() self._strategy.setup_environment() self._strategy.setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() return to_run(*args, **kwargs) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup return to_run(*args, **kwargs) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 932, in _wrap_with_setup super().setup_environment() self.accelerator.setup_device(self.root_device) File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment super().setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment self._strategy.setup_environment() self._strategy.setup_environment() File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 259, in setup_environment self.accelerator.setup_device(self.root_device) super().setup_environment() self.accelerator.setup_device(self.root_device) self.accelerator.setup_device(self.root_device) self.accelerator.setup_device(self.root_device) self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ super().setup_environment() return self.parallel_devices[self.local_rank] self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device self.accelerator.setup_device(self.root_device) ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/strategy.py", line 109, in setup_environment ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range self.accelerator.setup_device(self.root_device) self.accelerator.setup_device(self.root_device) return self.parallel_devices[self.local_rank] ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ ^^^^^^^^^^^^^^^^ File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/strategies/fsdp.py", line 203, in root_device return self.parallel_devices[self.local_rank] return self.parallel_devices[self.local_rank] return self.parallel_devices[self.local_rank] return self.parallel_devices[self.local_rank] IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: 
list index out of range IndexError: list index out of range IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range return self.parallel_devices[self.local_rank] ~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^ IndexError: list index out of range All GPUs are fully connected via NVLink. Initializing distributed: GLOBAL_RANK: 48, MEMBER: 49/96 Initializing distributed: GLOBAL_RANK: 49, MEMBER: 50/96 Initializing distributed: GLOBAL_RANK: 50, MEMBER: 51/96 Initializing distributed: GLOBAL_RANK: 51, MEMBER: 52/96 =>> PBS: job killed: walltime 602 exceeded limit 600 [rank51]: Traceback (most recent call last): [rank51]: File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> [rank51]: sys.exit(main()) [rank51]: ^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main [rank51]: CLI(parser_data) [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI [rank51]: return _run_component(component, init.get(subcommand)) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component [rank51]: return component(**cfg) [rank51]: ^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup [rank51]: fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch [rank51]: return self._wrap_and_launch(function, self, *args, **kwargs) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch [rank51]: return 
to_run(*args, **kwargs) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup [rank51]: return to_run(*args, **kwargs) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main [rank51]: train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders [rank51]: with fabric.rank_zero_first(): [rank51]: File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__ [rank51]: return next(self.gen) [rank51]: ^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first [rank51]: with _InfiniteBarrier() as barrier: [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__ [rank51]: self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000)) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper [rank51]: func_return = func(*args, **kwargs) [rank51]: ^^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group [rank51]: return _new_group_with_tag( [rank51]: ^^^^^^^^^^^^^^^^^^^^ [rank51]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag [rank51]: pg, pg_store = _new_process_group_helper( [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: File 
"/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper [rank51]: backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout) [rank51]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank51]: torch.distributed.DistNetworkError: Connection reset by peer [rank49]: Traceback (most recent call last): [rank49]: File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> [rank49]: sys.exit(main()) [rank49]: ^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main [rank49]: CLI(parser_data) [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI [rank49]: return _run_component(component, init.get(subcommand)) [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component [rank49]: return component(**cfg) [rank49]: ^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup [rank49]: fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch [rank49]: return self._wrap_and_launch(function, self, *args, **kwargs) [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch [rank49]: return to_run(*args, **kwargs) [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup [rank49]: return to_run(*args, **kwargs) [rank49]: 
^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main [rank49]: train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train) [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders [rank49]: with fabric.rank_zero_first(): [rank49]: File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__ [rank49]: return next(self.gen) [rank49]: ^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first [rank49]: with _InfiniteBarrier() as barrier: [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__ [rank49]: self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000)) [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper [rank49]: func_return = func(*args, **kwargs) [rank49]: ^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group [rank49]: return _new_group_with_tag( [rank49]: ^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag [rank49]: pg, pg_store = _new_process_group_helper( [rank49]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper [rank49]: backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout) [rank49]: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank49]: torch.distributed.DistNetworkError: Connection reset by peer [rank50]: Traceback (most recent call last): [rank50]: File "/home/552/cl2868/.local/bin/litgpt", line 8, in <module> [rank50]: sys.exit(main()) [rank50]: ^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main [rank50]: CLI(parser_data) [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI [rank50]: return _run_component(component, init.get(subcommand)) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component [rank50]: return component(**cfg) [rank50]: ^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup [rank50]: fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch [rank50]: return self._wrap_and_launch(function, self, *args, **kwargs) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch [rank50]: return to_run(*args, **kwargs) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup [rank50]: return to_run(*args, **kwargs) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main [rank50]: train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train) [rank50]: 
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders [rank50]: with fabric.rank_zero_first(): [rank50]: File "/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__ [rank50]: return next(self.gen) [rank50]: ^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first [rank50]: with _InfiniteBarrier() as barrier: [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__ [rank50]: self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000)) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper [rank50]: func_return = func(*args, **kwargs) [rank50]: ^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group [rank50]: return _new_group_with_tag( [rank50]: ^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag [rank50]: pg, pg_store = _new_process_group_helper( [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper [rank50]: backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout) [rank50]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank50]: torch.distributed.DistNetworkError: Connection reset by peer [rank48]: Traceback (most recent call last): [rank48]: File 
"/home/552/cl2868/.local/bin/litgpt", line 8, in <module> [rank48]: sys.exit(main()) [rank48]: ^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/__main__.py", line 71, in main [rank48]: CLI(parser_data) [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 119, in CLI [rank48]: return _run_component(component, init.get(subcommand)) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/jsonargparse/_cli.py", line 204, in _run_component [rank48]: return component(**cfg) [rank48]: ^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 120, in setup [rank48]: fabric.launch(main, devices, resume, seed, config, data, checkpoint_dir, out_dir, train, eval, optimizer) [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 843, in launch [rank48]: return self._wrap_and_launch(function, self, *args, **kwargs) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 929, in _wrap_and_launch [rank48]: return to_run(*args, **kwargs) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 934, in _wrap_with_setup [rank48]: return to_run(*args, **kwargs) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 139, in main [rank48]: train_dataloader, val_dataloader = get_dataloaders(fabric, data, tokenizer, train) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/litgpt/finetune/full.py", line 376, in get_dataloaders [rank48]: with fabric.rank_zero_first(): [rank48]: File 
"/apps/python3/3.12.1/lib/python3.12/contextlib.py", line 137, in __enter__ [rank48]: return next(self.gen) [rank48]: ^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/fabric.py", line 635, in rank_zero_first [rank48]: with _InfiniteBarrier() as barrier: [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/lightning/fabric/utilities/distributed.py", line 422, in __enter__ [rank48]: self.group = torch.distributed.new_group(backend="gloo", timeout=timedelta(days=10000)) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/c10d_logger.py", line 93, in wrapper [rank48]: func_return = func(*args, **kwargs) [rank48]: ^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4125, in new_group [rank48]: return _new_group_with_tag( [rank48]: ^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 4205, in _new_group_with_tag [rank48]: pg, pg_store = _new_process_group_helper( [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: File "/home/552/cl2868/.local/lib/python3.12/site-packages/torch/distributed/distributed_c10d.py", line 1569, in _new_process_group_helper [rank48]: backend_class = ProcessGroupGloo(backend_prefix_store, group_rank, group_size, timeout=timeout) [rank48]: ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ [rank48]: torch.distributed.DistNetworkError: Connection reset by peer ====================================================================================== Resource Usage on 2024-09-25 22:32:29: Job Id: 125448206.gadi-pbs Project: pi13 Exit Status: -29 (Job failed due to exceeding walltime) Service Units: 49.04 NCPUs Requested: 96 NCPUs Used: 96 CPU Time Used: 01:45:13 Memory Requested: 
256.0GB Memory Used: 82.02GB NGPUs Requested: 8 GPU Utilisation: 6% GPU Memory Used: 2.49GB Walltime requested: 00:10:00 Walltime Used: 00:10:13 JobFS requested: 200.0MB JobFS used: 0B ======================================================================================