
CUDA technical summary

driver

./NVIDIA-Linux-x86_64-550.54.14.run --check
check sums and md5 sums are ok
./NVIDIA-Linux-x86_64-550.54.14.run --info

  Identification    : NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.54.14
  Target directory  : NVIDIA-Linux-x86_64-550.54.14
  Uncompressed size : 1048037 KB
  Compression       : zstd
  Date of packaging : Thu Feb 22 02:54:13 UTC 2024
  Application run after extraction : ./nvidia-installer 

  The directory NVIDIA-Linux-x86_64-550.54.14 will be removed after extraction.
./NVIDIA-Linux-x86_64-550.54.14.run --version
Verifying archive integrity... OK
Uncompressing NVIDIA Accelerated Graphics Driver for Linux-x86_64 550.54.14...

nvidia-installer:  version 550.54.14
  The NVIDIA Software Installer for Unix/Linux.

  This program is used to install, upgrade and uninstall The NVIDIA Accelerated Graphics Driver Set for Linux-x86_64.
The target kernel has CONFIG_MODULE_SIG set, which means that it supports cryptographic signatures on kernel modules. On some systems, the kernel may refuse to load modules without a valid signature from a
  trusted key. This system also has UEFI Secure Boot enabled; many distributions enforce module signature verification on UEFI systems when Secure Boot is enabled. Would you like to sign the NVIDIA kernel
  module?  
Are you sure you want to continue?    --> continue installation
building kernel modules......
would you like to sign the NVIDIA kernel module?   --> sign the kernel module
would you like to generate a new one?   --> generate a new key pair
would you like to delete the private signing key?  --> no
certification  --> ok
key  --> ok
this will likely require rebooting your computer.  --> install signed kernel module
...  --> ok
would you like to run the nvidia-xconfig utility... update?  --> no
see for details  --> ok
ERROR: The kernel module failed to load. Secure boot is enabled on this system, so this is likely because it was not signed by a key that is trusted by the kernel. Please try installing the driver again, and sign the kernel module when prompted to do so.


ERROR: Unable to load the kernel module 'nvidia.ko'.  This happens most frequently when this kernel module was built against the wrong or improperly configured kernel sources, with a version of gcc that differs from the one used to build the target kernel, or if another
       driver, such as nouveau, is present and prevents the NVIDIA kernel module from obtaining ownership of the NVIDIA device(s), or no NVIDIA device installed in this system is supported by this NVIDIA Linux graphics driver release.

       Please see the log entries 'Kernel module load error' and 'Kernel messages' at the end of the file '/var/log/nvidia-installer.log' for more information.


ERROR: Installation has failed.  Please see the file '/var/log/nvidia-installer.log' for details.  You may find suggestions on fixing installation problems in the README available on the Linux driver download page at www.nvidia.com.

modprobe -vvv nvidia
modprobe: INFO: ../libkmod/libkmod.c:367 kmod_set_log_fn() custom logging function 0x55878361dd70 registered
modprobe: DEBUG: ../libkmod/libkmod-index.c:757 index_mm_open() file=/lib/modules/6.1.0-18-amd64/modules.dep.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:757 index_mm_open() file=/lib/modules/6.1.0-18-amd64/modules.alias.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:757 index_mm_open() file=/lib/modules/6.1.0-18-amd64/modules.symbols.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:757 index_mm_open() file=/lib/modules/6.1.0-18-amd64/modules.builtin.alias.bin
modprobe: DEBUG: ../libkmod/libkmod-index.c:757 index_mm_open() file=/lib/modules/6.1.0-18-amd64/modules.builtin.bin
modprobe: DEBUG: ../libkmod/libkmod-module.c:579 kmod_module_new_from_lookup() input alias=nvidia, normalized=nvidia
modprobe: DEBUG: ../libkmod/libkmod.c:597 kmod_search_moddep() use mmaped index 'modules.dep' modname=nvidia
modprobe: DEBUG: ../libkmod/libkmod.c:405 kmod_pool_get_module() get module name='nvidia' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:413 kmod_pool_add_module() add 0x5587839d9fd0 key='nvidia'
modprobe: DEBUG: ../libkmod/libkmod.c:405 kmod_pool_get_module() get module name='drm' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:405 kmod_pool_get_module() get module name='drm' found=(nil)
modprobe: DEBUG: ../libkmod/libkmod.c:413 kmod_pool_add_module() add 0x5587839da0e0 key='drm'
modprobe: DEBUG: ../libkmod/libkmod-module.c:196 kmod_module_parse_depline() add dep: /lib/modules/6.1.0-18-amd64/kernel/drivers/gpu/drm/drm.ko
modprobe: DEBUG: ../libkmod/libkmod-module.c:202 kmod_module_parse_depline() 1 dependencies for nvidia
modprobe: DEBUG: ../libkmod/libkmod-module.c:584 kmod_module_new_from_lookup() lookup=nvidia found=1
modprobe: DEBUG: ../libkmod/libkmod.c:502 lookup_builtin_file() use mmaped index 'modules.builtin' modname=nvidia
modprobe: DEBUG: ../libkmod/libkmod-module.c:1817 kmod_module_get_initstate() could not open '/sys/module/nvidia/initstate': No such file or directory
modprobe: DEBUG: ../libkmod/libkmod-module.c:1827 kmod_module_get_initstate() could not open '/sys/module/nvidia': No such file or directory
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_pcsp mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=cx88_alsa mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_atiixp_modem mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_intel8x0m mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_via82xx_modem mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=nouveau mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=bonding mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=dummy mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=ifb mod->name=drm mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1373 kmod_module_probe_insert_module() Ignoring module 'drm': already loaded
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_pcsp mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=cx88_alsa mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_atiixp_modem mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_intel8x0m mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=snd_via82xx_modem mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=nouveau mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=bonding mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=dummy mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1461 kmod_module_get_options() modname=ifb mod->name=nvidia mod->alias=(null)
modprobe: DEBUG: ../libkmod/libkmod-module.c:1817 kmod_module_get_initstate() could not open '/sys/module/nvidia/initstate': No such file or directory
modprobe: DEBUG: ../libkmod/libkmod-module.c:1827 kmod_module_get_initstate() could not open '/sys/module/nvidia': No such file or directory
modprobe: DEBUG: ../libkmod/libkmod-module.c:802 kmod_module_get_path() name='nvidia' path='/lib/modules/6.1.0-18-amd64/updates/dkms/nvidia.ko'
modprobe: DEBUG: ../libkmod/libkmod-module.c:802 kmod_module_get_path() name='nvidia' path='/lib/modules/6.1.0-18-amd64/updates/dkms/nvidia.ko'
insmod /lib/modules/6.1.0-18-amd64/updates/dkms/nvidia.ko 
modprobe: DEBUG: ../libkmod/libkmod-module.c:802 kmod_module_get_path() name='nvidia' path='/lib/modules/6.1.0-18-amd64/updates/dkms/nvidia.ko'
modprobe: INFO: ../libkmod/libkmod-module.c:949 kmod_module_insert_module() Failed to insert module '/lib/modules/6.1.0-18-amd64/updates/dkms/nvidia.ko': Key was rejected by service
modprobe: ERROR: could not insert 'nvidia': Key was rejected by service
modprobe: DEBUG: ../libkmod/libkmod-module.c:469 kmod_module_unref() kmod_module 0x5587839d9fd0 released
modprobe: DEBUG: ../libkmod/libkmod.c:421 kmod_pool_del_module() del 0x5587839d9fd0 key='nvidia'
modprobe: DEBUG: ../libkmod/libkmod-module.c:469 kmod_module_unref() kmod_module 0x5587839da0e0 released
modprobe: DEBUG: ../libkmod/libkmod.c:421 kmod_pool_del_module() del 0x5587839da0e0 key='drm'
modprobe: INFO: ../libkmod/libkmod.c:334 kmod_unref() context 0x5587839d9480 released
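
The `Key was rejected by service` message above usually means the kernel refused the module's signature. Under Secure Boot this is commonly because the signing certificate has not been enrolled as a Machine Owner Key (MOK). The following is a minimal sketch for triaging such messages, not a definitive diagnosis: `classify_modprobe_error` is a hypothetical helper, the category names are made up for illustration, and the certificate path in the comments is a placeholder.

```shell
# Hypothetical helper: map a modprobe error message to a likely cause.
# $1: the error text printed by modprobe.
classify_modprobe_error() {
    case "$1" in
        *"Key was rejected by service"*)
            echo "signature-rejected" ;;   # signature present but not accepted
        *"Required key not available"*)
            echo "key-not-available" ;;    # typically unsigned or key not enrolled
        *)
            echo "unknown" ;;
    esac
}

# Typical follow-up checks on a real machine (run as root):
#   mokutil --sb-state                  # is Secure Boot enabled?
#   modinfo nvidia | grep -i signer     # who signed the module, if anyone
#   mokutil --import /path/to/nvidia-modsign-crt.der   # placeholder path
#   reboot                              # then confirm enrollment in the MOK manager
```

After the certificate is enrolled and the machine rebooted, `modprobe nvidia` can be retried.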

Mar 12 10:25:40 debian kernel: nvidia_uvm: module uses symbols nvUvmInterfaceDisableAccessCntr from proprietary module nvidia, inheriting taint.
Mar 12 10:25:40 debian kernel: nvidia-uvm: Loaded the UVM driver, major device number 236.
Mar 12 10:25:44 debian kernel: ACPI BIOS Error (bug): Failure creating named object [\_SB.PC00.PEG1.PEGP._DSM.USRG], AE_ALREADY_EXISTS (20220331/dsfield-184)
Mar 12 10:25:44 debian kernel: ACPI Error: AE_ALREADY_EXISTS, CreateBufferField failure (20220331/dswload2-477)
Mar 12 10:25:44 debian kernel: ACPI Error: Aborting method \_SB.PC00.PEG1.PEGP._DSM due to previous error (AE_ALREADY_EXISTS) (20220331/psparse-529)
cat /proc/driver/nvidia/version 
NVRM version: NVIDIA UNIX x86_64 Kernel Module  550.54.14  Thu Feb 22 01:44:30 UTC 2024
GCC version:  gcc version 12.2.0 (Debian 12.2.0-14) 
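
When scripting later steps it can help to extract just the driver version from that file. A small sketch, assuming the proprietary-module line format shown above (other driver flavors may format the line differently); `nvidia_driver_version` is a hypothetical helper name.

```shell
# Extract the driver version from text in the format of
# /proc/driver/nvidia/version. Reads stdin and prints the version number
# that follows "Kernel Module" on the NVRM line.
nvidia_driver_version() {
    sed -n 's/^NVRM version:.*Kernel Module  *\([0-9][0-9.]*\).*/\1/p' | head -n 1
}

# Usage on a machine with the driver loaded:
#   nvidia_driver_version < /proc/driver/nvidia/version
```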

cuda

wget https://developer.download.nvidia.com/compute/cuda/repos/debian12/x86_64/cuda-keyring_1.1-1_all.deb
dpkg -i cuda-keyring_1.1-1_all.deb
apt update
apt-get -y install cuda


Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 libcuda1 : PreDepends: nvidia-legacy-check (>= 396) but it is not going to be installed
            Depends: nvidia-support but it is not installable
            Recommends: nvidia-kernel-dkms (= 550.54.14-1) but it is not installable or
                        nvidia-kernel-550.54.14 or
                        nvidia-open-kernel-550.54.14 but it is not installable or
                        nvidia-open-kernel-550.54.14 but it is not installable
            Recommends: libnvidia-cfg1 (= 550.54.14-1) but it is not going to be installed
            Recommends: nvidia-persistenced but it is not going to be installed
            Recommends: libcuda1:i386 (= 550.54.14-1)
 nvidia-alternative : PreDepends: nvidia-legacy-check (>= 396) but it is not going to be installed
                      Depends: glx-alternative-nvidia (>= 1.2) but it is not installable
 nvidia-driver : PreDepends: nvidia-installer-cleanup but it is not installable
                 PreDepends: nvidia-legacy-check (>= 396) but it is not going to be installed
                 Depends: xserver-xorg-video-nvidia (= 550.54.14-1) but it is not going to be installed
                 Depends: nvidia-vdpau-driver (= 550.54.14-1) but it is not going to be installed
                 Depends: nvidia-kernel-dkms (= 550.54.14-1) but it is not installable or
                          nvidia-kernel-550.54.14 or
                          nvidia-open-kernel-550.54.14 but it is not installable or
                          nvidia-open-kernel-550.54.14 but it is not installable
                 Depends: nvidia-support but it is not installable
                 Recommends: libnvidia-cfg1 (= 550.54.14-1) but it is not going to be installed
                 Recommends: nvidia-persistenced but it is not going to be installed
 nvidia-settings : PreDepends: nvidia-installer-cleanup but it is not installable
                   Recommends: nvidia-vdpau-driver but it is not going to be installed
 nvidia-xconfig : PreDepends: nvidia-installer-cleanup but it is not installable
E: Unable to correct problems, you have held broken packages.


wget https://developer.download.nvidia.com/compute/cuda/12.4.0/local_installers/cuda_12.4.0_550.54.14_linux.run
sudo sh cuda_12.4.0_550.54.14_linux.run


cat /var/log/cuda-installer.log
[INFO]: Driver not installed.
[INFO]: Checking compiler version...
[INFO]: gcc location: /usr/bin/gcc

[INFO]: gcc version: gcc version 12.2.0 (Debian 12.2.0-14) 

[INFO]: Initializing menu
[INFO]: nvidia-fs.setKOVersion(2.19.6)
[INFO]: Setup complete
[INFO]: Installing: Driver
[INFO]: Installing: 550.54.14
[INFO]: Executing NVIDIA-Linux-x86_64-550.54.14.run --ui=none --no-questions --accept-license --disable-nouveau --no-cc-version-check --install-libglvnd  2>&1
[INFO]: Finished with code: 256
[ERROR]: Install of driver component failed. Consult the driver log at /var/log/nvidia-installer.log for more details.
[ERROR]: Install of 550.54.14 failed, quitting
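
The log above shows the CUDA runfile trying to install its bundled driver and failing. If the driver is already installed separately, one option is to ask the runfile for the toolkit only. The sketch below picks the flags based on whether the driver's procfs file exists; `cuda_runfile_flags` is a hypothetical helper, and the choice of a silent install is an assumption about what is wanted here.

```shell
# Pick runfile flags depending on whether an NVIDIA kernel module is loaded.
# $1: path to the driver version file (defaults to the real procfs location).
cuda_runfile_flags() {
    version_file="${1:-/proc/driver/nvidia/version}"
    if [ -r "$version_file" ]; then
        echo "--silent --toolkit"            # driver present: toolkit only
    else
        echo "--silent --driver --toolkit"   # no driver: install both
    fi
}

# Example (as root):
#   sh cuda_12.4.0_550.54.14_linux.run $(cuda_runfile_flags)
```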


cat > /etc/ld.so.conf.d/cuda.conf<<EOF
/usr/local/cuda-12.4/lib64
EOF
ldconfig

cat >> ~/.bashrc <<'EOF'
# add nvcc compiler to path
export PATH=$PATH:/usr/local/cuda-12.4/bin
EOF


cat > /etc/profile.d/myenv.sh<<'EOF'
PATH=$PATH:/usr/local/cuda-12.4/bin
EOF
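
Appending to PATH unconditionally, as in the snippets above, adds a duplicate entry each time the file is sourced again. A possible guard is sketched below; `path_append` is a hypothetical helper, not something the CUDA installer provides.

```shell
# Append a directory to a PATH-style string only if it is not already listed.
# $1: current value, $2: directory to add. Prints the resulting value.
path_append() {
    case ":$1:" in
        *":$2:"*) echo "$1" ;;           # already present: unchanged
        *)        echo "${1:+$1:}$2" ;;  # otherwise append
    esac
}

# Usage in a profile script:
#   PATH=$(path_append "$PATH" /usr/local/cuda-12.4/bin); export PATH
```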

nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Tue_Feb_27_16:19:38_PST_2024
Cuda compilation tools, release 12.4, V12.4.99
Build cuda_12.4.r12.4/compiler.33961263_0


cuDNN

wget https://developer.download.nvidia.com/compute/cudnn/redist/cudnn/linux-x86_64/cudnn-linux-x86_64-9.0.0.312_cuda12-archive.tar.xz
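
The download alone does not install anything. A common way to install the tar archive is to copy its headers and libraries into the CUDA tree. The sketch below assumes the archive unpacks into `include/` and `lib/` subdirectories and that the toolkit lives under `/usr/local/cuda-12.4`; `install_cudnn` is a hypothetical helper.

```shell
# Copy cuDNN headers and libraries from an extracted archive into a CUDA tree.
# $1: extracted archive directory, $2: CUDA install prefix.
install_cudnn() {
    src="$1"
    dest="$2"
    mkdir -p "$dest/include" "$dest/lib64" || return 1
    cp "$src"/include/cudnn*.h "$dest/include/" || return 1
    # -P keeps the libcudnn*.so symlinks as symlinks instead of copying targets
    cp -P "$src"/lib/libcudnn* "$dest/lib64/" || return 1
    chmod a+r "$dest"/include/cudnn*.h "$dest"/lib64/libcudnn*
}

# Example (as root), using the archive downloaded above:
#   tar -xf cudnn-linux-x86_64-9.0.0.312_cuda12-archive.tar.xz
#   install_cudnn cudnn-linux-x86_64-9.0.0.312_cuda12-archive /usr/local/cuda-12.4
#   ldconfig
```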

nvidia-docker

curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey |  gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
tee /etc/apt/sources.list.d/nvidia-container-toolkit.list

apt update
apt-get install -y nvidia-container-toolkit

nvidia-ctk runtime configure --runtime=docker && systemctl restart docker
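
To confirm that the configure step registered the runtime, the Docker daemon config can be inspected. A rough sketch, assuming the config is at the usual `/etc/docker/daemon.json` and that the runtime appears there as a key named `nvidia`; `has_nvidia_runtime` is a hypothetical helper and uses a plain text match rather than a JSON parser.

```shell
# Report whether a Docker daemon config declares a runtime named "nvidia".
# $1: path to daemon.json (defaults to the usual location).
# Exit status 0 means a matching key was found.
has_nvidia_runtime() {
    config="${1:-/etc/docker/daemon.json}"
    [ -r "$config" ] && grep -q '"nvidia"[[:space:]]*:' "$config"
}

# Usage:
#   has_nvidia_runtime && echo "nvidia runtime configured"
```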



docker pull nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04

docker run --runtime=nvidia --rm nvidia/cuda:12.3.2-cudnn9-devel-ubuntu22.04 nvidia-smi


docker run --runtime=nvidia -it --privileged --rm cuda:dev bash

anaconda

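Anaconda is an open-source distribution of Python and R for scientific computing (data science, machine learning, big-data processing, and predictive analytics) that aims to simplify package management and deployment. Anaconda manages packages through Conda and ships many data science packages for Windows, Linux, and macOS.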

wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
bash Anaconda3-2024.02-1-Linux-x86_64.sh
Do you accept the license terms? [yes|no]
>>> yes

Anaconda3 will now be installed into this location:
/root/anaconda3

  - Press ENTER to confirm the location
  - Press CTRL-C to abort the installation
  - Or specify a different location below

[/root/anaconda3] >>> /opt/anaconda3
PREFIX=/opt/anaconda3
Unpacking payload ...

Installing base environment...


Downloading and Extracting Packages:


Downloading and Extracting Packages:

Preparing transaction: done
Executing transaction: - 

    Installed package of scikit-learn can be accelerated using scikit-learn-intelex.
    More details are available here: https://intel.github.io/scikit-learn-intelex

    For example:

        $ conda install scikit-learn-intelex
        $ python -m sklearnex my_application.py



done
installation finished.
Do you wish to update your shell profile to automatically initialize conda?
This will activate conda on startup and change the command prompt when activated.
If you'd prefer that conda's base environment not be activated on startup,
   run the following command when conda is activated:

conda config --set auto_activate_base false

You can undo this by running `conda init --reverse $SHELL`? [yes|no]


[no] >>> no

You have chosen to not have conda modify your shell scripts at all.
To activate conda's base environment in your current shell session:

eval "$(/opt/anaconda3/bin/conda shell.YOUR_SHELL_NAME hook)" 

To install conda's shell functions for easier access, first activate, then:

conda init

Thank you for installing Anaconda3!



eval "$(/opt/anaconda3/bin/conda shell.bash hook)"

conda install pytorch torchvision torchaudio pytorch-cuda=12.1 -c pytorch -c nvidia


ollama

curl -fsSL https://ollama.com/install.sh | sh


./ollama-linux-amd64 --help
Large language model runner

Usage:
  ollama [flags]
  ollama [command]

Available Commands:
  serve       Start ollama
  create      Create a model from a Modelfile
  show        Show information for a model
  run         Run a model
  pull        Pull a model from a registry
  push        Push a model to a registry
  list        List models
  cp          Copy a model
  rm          Remove a model
  help        Help about any command

Flags:
  -h, --help      help for ollama
  -v, --version   Show version information

Use "ollama [command] --help" for more information about a command.



ollama pull codellama:13b
ollama pull dolphin-mixtral:v2.7
ollama pull gemma:7b
ollama pull gemma:7b-instruct
ollama pull llama2:13b
ollama pull mistral:v0.2
ollama pull qwen:14b
ollama pull openchat:7b
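
The pulls above can also be driven from a loop that stops at the first failure. A small sketch; `pull_models` is a hypothetical helper, and `OLLAMA_BIN` is a made-up override that is only there to allow dry runs.

```shell
# Pull every model named on the command line, stopping at the first failure.
# OLLAMA_BIN is a hypothetical override; it defaults to the real ollama binary.
pull_models() {
    for model in "$@"; do
        "${OLLAMA_BIN:-ollama}" pull "$model" || return 1
    done
}

# Real run:
#   pull_models codellama:13b gemma:7b mistral:v0.2
# Dry run that only prints the arguments:
#   OLLAMA_BIN=echo pull_models gemma:7b mistral:v0.2
```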
