Docker image doesn't work #7

Open
Skyy93 opened this issue Jul 14, 2023 · 2 comments

Skyy93 commented Jul 14, 2023

Hello, thank you for your amazing work! I wanted to try it and followed the Docker instructions you provide here:
https://github.com/nianticlabs/nerf-object-removal/blob/main/docker/README.md

The image builds and runs correctly, but when I try your example command I get the following message in the logs:

[2023-07-14 13:53:18,408][saicinpainting.training.trainers.base][INFO] - BaseInpaintingTrainingModule init done
[2023-07-14 13:53:18,627][__main__][CRITICAL] - Prediction failed due to Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination:
Traceback (most recent call last):
  File "bin/predict.py", line 59, in main
    model.to(device)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/pytorch_lightning/core/decorators.py", line 89, in inner_fn
    module = fn(self, *args, **kwargs)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/pytorch_lightning/utilities/device_dtype_mixin.py", line 120, in to
    return super().to(*args, **kwargs)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1145, in to
    return self._apply(convert)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 797, in _apply
    module._apply(fn)
  [Previous line repeated 2 more times]
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 820, in _apply
    param_applied = fn(param)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "/opt/conda/envs/object-removal/lib/python3.8/site-packages/torch/cuda/__init__.py", line 247, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 803: system has unsupported display driver / cuda driver combination

Because of this, the following error occurs:

FileNotFoundError: [Errno 2] No such file or directory: '/app/object-removal/experiments/real/001/data/../lama_depth_output_real/000_mask001.png'

JAX also fails to find a GPU:

W0714 13:53:32.249409 140354252236608 xla_bridge.py:363] No GPU/TPU found, falling back to CPU. (Set TF_CPP_MIN_LOG_LEVEL=0 and rerun for more info.)

I have an RTX 4090 with this driver and CUDA version in the Docker container: Driver Version: 535.54.03, CUDA Version: 11.8

Could you please look into it? I also tried a CUDA 12.0 container as the base image. That resolves the PyTorch error, but not the JAX warning, which implies that JAX still does not find the GPU.

Thank you

@520xyxyzq

Hi Skyy93, did you solve the problem? I encountered the exact same problem.

sbhavani commented Nov 9, 2023

NVIDIA's nightly JAX containers are available here: https://github.com/NVIDIA/JAX-Toolbox with open Dockerfiles. I'd recommend starting from a base image here and adding PyTorch and other libs.
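Whichever base image is used, it may help to confirm inside the container that both frameworks can see the GPU before running the full pipeline, since the thread reports PyTorch and JAX failing independently. The following is a minimal sketch rather than anything taken from the repository: the function name `gpu_report` is hypothetical, and it only relies on `torch.cuda.is_available()`, `torch.version.cuda`, and `jax.devices()`.

```python
import importlib


def gpu_report():
    """Report whether PyTorch and JAX can each see a GPU.

    Hypothetical helper, not part of nerf-object-removal. A framework that
    is missing or fails to initialise is recorded as a string instead of
    raising, so one broken framework does not hide the state of the other.
    """
    report = {}

    # PyTorch: is_available() is False when no usable CUDA device is found.
    try:
        torch = importlib.import_module("torch")
        report["torch"] = {
            "cuda_available": bool(torch.cuda.is_available()),
            "built_for_cuda": torch.version.cuda,  # None for CPU-only builds
        }
    except Exception as exc:
        report["torch"] = f"unavailable: {type(exc).__name__}: {exc}"

    # JAX: devices() lists the devices of the default backend.
    try:
        jax = importlib.import_module("jax")
        report["jax"] = sorted({d.platform for d in jax.devices()})
    except Exception as exc:
        report["jax"] = f"unavailable: {type(exc).__name__}: {exc}"

    return report


if __name__ == "__main__":
    for name, status in gpu_report().items():
        print(f"{name}: {status}")
```

If the JAX entry lists only `cpu`, JAX has fallen back to the CPU as in the warning above. If `cuda_available` is false for PyTorch, the `model.to(device)` call in `bin/predict.py` can be expected to fail in the same way as in the traceback.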
