It has something to do with the CUDA version, I think.

ishfaqha@login-1:~/code/LangevinMCRL/output/atari1_fgts$ cat 1.txt
Current working directory: /home/mila/i/ishfaqha/code/LangevinMCRL
Starting run at: Wed Jul 26 15:40:35 EDT 2023
Job Array ID / Job ID: 3431041 / 3431041
This is job 1 out of 1 jobs
SLURM_TMPDIR: /Tmp/slurm.3431041.0
SLURM_JOB_NODELIST: cn-g017
[=== Module miniconda/3 loaded ===]
/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM4-80GB MIG 3g.40gb with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-80GB MIG 3g.40gb GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM4-80GB MIG 3g.40gb with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-80GB MIG 3g.40gb GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM4-80GB MIG 3g.40gb with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-80GB MIG 3g.40gb GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 58, in <module>
    main(sys.argv)
  File "main.py", line 55, in main
    exp.run()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/experiment.py", line 43, in run
    self.agent.run_steps()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/agents/FGTSDQN.py", line 81, in run_steps
    self.collectors['Train'].collect(n_step=self.batch_size * self.cfg['env']['train_num'])
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/tianshou/data/collector.py", line 297, in collect
    result = self.policy(self.data, last_state)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/policy.py", line 97, in forward
    logits, hidden = model(obs_next, state=state, info=batch.info, head=head)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 82, in forward
    logits, state = self.nets[head](obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 104, in forward
    q_values = self._network(obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "main.py", line 58, in <module>
    main(sys.argv)
  File "main.py", line 55, in main
    exp.run()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/experiment.py", line 43, in run
    self.agent.run_steps()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/agents/FGTSDQN.py", line 81, in run_steps
    self.collectors['Train'].collect(n_step=self.batch_size * self.cfg['env']['train_num'])
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/tianshou/data/collector.py", line 297, in collect
    result = self.policy(self.data, last_state)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/policy.py", line 97, in forward
    logits, hidden = model(obs_next, state=state, info=batch.info, head=head)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 82, in forward
    logits, state = self.nets[head](obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 104, in forward
    q_values = self._network(obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Traceback (most recent call last):
  File "main.py", line 58, in <module>
    main(sys.argv)
  File "main.py", line 55, in main
    exp.run()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/experiment.py", line 43, in run
    self.agent.run_steps()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/agents/FGTSDQN.py", line 81, in run_steps
    self.collectors['Train'].collect(n_step=self.batch_size * self.cfg['env']['train_num'])
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/tianshou/data/collector.py", line 297, in collect
    result = self.policy(self.data, last_state)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/policy.py", line 97, in forward
    logits, hidden = model(obs_next, state=state, info=batch.info, head=head)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 82, in forward
    logits, state = self.nets[head](obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 104, in forward
    q_values = self._network(obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/cuda/__init__.py:146: UserWarning:
NVIDIA A100-SXM4-80GB MIG 3g.40gb with CUDA capability sm_80 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70.
If you want to use the NVIDIA A100-SXM4-80GB MIG 3g.40gb GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
Traceback (most recent call last):
  File "main.py", line 58, in <module>
    main(sys.argv)
  File "main.py", line 55, in main
    exp.run()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/experiment.py", line 43, in run
    self.agent.run_steps()
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/agents/FGTSDQN.py", line 81, in run_steps
    self.collectors['Train'].collect(n_step=self.batch_size * self.cfg['env']['train_num'])
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/tianshou/data/collector.py", line 297, in collect
    result = self.policy(self.data, last_state)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/policy.py", line 97, in forward
    logits, hidden = model(obs_next, state=state, info=batch.info, head=head)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 82, in forward
    logits, state = self.nets[head](obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/code/LangevinMCRL/components/ensemble_langevin_dqn/network.py", line 104, in forward
    q_values = self._network(obs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/home/mila/i/ishfaqha/envs/tianshou/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Job finished with exit code 4 at: Wed Jul 26 15:40:48 EDT 2023
Copy log files from temporary directory
Source directory: /Tmp/slurm.3431041.0/atari1_fgts/.
Destination directory: ./logs/atari1_fgts/
======== GPU REPORT ========

==============NVSMI LOG==============

Timestamp : Wed Jul 26 15:40:49 2023
Driver Version : 515.65.01
CUDA Version : 11.7

Attached GPUs : 1
GPU 00000000:81:00.0
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Accounted Processes : None

Wed Jul 26 15:40:49 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.65.01    Driver Version: 515.65.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A100-SXM...  On   | 00000000:81:00.0 Off |                   On |
| N/A   28C    P0    80W / 500W |     24MiB / 81920MiB |     N/A      Default |
|                               |                      |              Enabled |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| MIG devices:                                                                |
+------------------+----------------------+-----------+-----------------------+
| GPU  GI  CI  MIG |         Memory-Usage |        Vol|         Shared        |
|      ID  ID  Dev |           BAR1-Usage | SM     Unc| CE  ENC  DEC  OFA  JPG|
|                  |                      |        ECC|                       |
|==================+======================+===========+=======================|
|  0    2   0   0  |     10MiB / 40448MiB | 42      0 |  3   0    2    0    0 |
|                  |      0MiB / 65535MiB |           |                       |
+------------------+----------------------+-----------+-----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
(tianshou) ishfaqha@login-1:~/code/LangevinMCRL/output/atari1_fgts$
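
The warning in the log reports that this PyTorch build ships kernels only for sm_37, sm_50, sm_60 and sm_70, while the A100 MIG slice reports sm_80, which would explain the "no kernel image is available" error. Below is a minimal sketch of that compatibility check, not PyTorch's actual implementation. The helper name sm_supported and the rule it applies (a compiled sm_XY binary is usable only on a device with the same major version) are assumptions made for illustration, and the rule ignores any PTX (compute_XX) entries. The two constants are copied from the warning text. torch.cuda.get_arch_list() and torch.cuda.get_device_capability() are the calls that report the real values on a node.

```python
from typing import Iterable, Tuple


def sm_supported(capability: Tuple[int, int], arch_list: Iterable[str]) -> bool:
    """Return True if any compiled sm_XY arch shares the device's major version.

    Simplified heuristic (an assumption of this sketch): binary kernels are
    treated as usable only within the same major compute capability.
    """
    major, _minor = capability
    compiled = [int(arch.split("_")[1]) for arch in arch_list if arch.startswith("sm_")]
    return any(sm // 10 == major for sm in compiled)


# Values copied from the UserWarning in the log.
LOG_ARCH_LIST = ["sm_37", "sm_50", "sm_60", "sm_70"]
A100_CAPABILITY = (8, 0)  # sm_80

if __name__ == "__main__":
    # Check the values taken from the log.
    print("log values supported:", sm_supported(A100_CAPABILITY, LOG_ARCH_LIST))

    # On a GPU node, query the live installation instead (skipped if torch is absent).
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
        live_archs = torch.cuda.get_arch_list()
        live_cap = torch.cuda.get_device_capability(0)
        print(live_archs, live_cap, sm_supported(live_cap, live_archs))
```

If the check returns False on the compute node, the likely fix is to install a PyTorch build whose arch list includes sm_80 (the selector at https://pytorch.org/get-started/locally/ lists the CUDA 11.x builds), rather than changing anything in the training code.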