This page details the build of our [[DIGITS DevBox]]. There's also a page giving information on [[Using the DevBox]]. nVIDIA, famous for their incredibly poor supply-chain and inventory management, have been saying [https://developer.nvidia.com/devbo "Please note that we are sold out of our inventory of the DIGITS DevBox, and no new systems are being built"] since shortly after the [https://en.wikipedia.org/wiki/GeForce_10_series Titan X] was the latest and greatest thing (i.e., somewhere around 2016). But it's pretty straightforward to update [https://www.azken.com/download/DIGITS_DEVBOX_DESIGN_GUIDE.pdf their spec].
==Introduction==
===Specification===
<onlyinclude>[[File:Top1000.jpg|right|300px]] Our [[DIGITS DevBox]], affectionately named "Bastard" after Lois McMaster Bujold's fifth God, has a Xeon E5-2620 v3 processor, 256GB of DDR4 RAM, two GPUs - one Titan RTX and one Titan Xp - with room for two more, a 500GB SSD (mounting /), and an 8TB RAID5 array bcached with a 512GB M.2 drive (mounting the /bulk share, which is available over Samba). It runs Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.1, Anaconda3-2019.03, Python 3.7, TensorFlow 1.13, DIGITS 6, and other useful machine learning tools/libraries.</onlyinclude>
===Documentation===
Give the box a reboot!
 
===X Windows===
 
If you install the video driver before installing X Windows, you will need to manually edit the X Windows config files, so install the X window system now. The easiest way is:
tasksel
And choose your favorite. We used Ubuntu Desktop.
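If you'd rather skip the menu, the desktop task can also be installed non-interactively (assuming the standard ubuntu-desktop task name):
 sudo tasksel install ubuntu-desktop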
 
And reboot again to make sure that everything is working nicely.
===Video Drivers===
The first build of this box was done with an installation of CUDA 10.1, which automatically installed version 418.67 of the NVIDIA driver. We then installed CUDA 10.0 under conda to support Tensorflow 1.13. All went mostly well, and the history of this page contains the instructions. However, at some point, likely because of an OS update, the video driver(s) stopped working. This page now describes the second build (as if it were a build from scratch). [[Addressing Ubuntu NVIDIA Issues]] provides additional information.
====Hardware check====
Check that the hardware is being seen and what driver is being used with:
 lspci -vk
 05:00.0 VGA compatible controller: NVIDIA Corporation GP102 [TITAN Xp] (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation GP102 [TITAN Xp]
        Flags: bus master, fast devsel, latency 0, IRQ 78, NUMA node 0
        Memory at fa000000 (32-bit, non-prefetchable) [size=16M]
        Memory at c0000000 (64-bit, prefetchable) [size=256M]
        Memory at d0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at d000 [size=128]
        Expansion ROM at 000c0000 [disabled] [size=128K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable+ Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
        Capabilities: [900] #19
        Kernel driver in use: nouveau
        Kernel modules: nvidiafb, nouveau
 06:00.0 VGA compatible controller: NVIDIA Corporation Device 1e02 (rev a1) (prog-if 00 [VGA controller])
        Subsystem: NVIDIA Corporation Device 12a3
        Flags: fast devsel, IRQ 24, NUMA node 0
        Memory at f8000000 (32-bit, non-prefetchable) [size=16M]
        Memory at a0000000 (64-bit, prefetchable) [size=256M]
        Memory at b0000000 (64-bit, prefetchable) [size=32M]
        I/O ports at c000 [size=128]
        Expansion ROM at f9000000 [disabled] [size=512K]
        Capabilities: [60] Power Management version 3
        Capabilities: [68] MSI: Enable- Count=1/1 Maskable- 64bit+
        Capabilities: [78] Express Legacy Endpoint, MSI 00
        Capabilities: [100] Virtual Channel
        Capabilities: [250] Latency Tolerance Reporting
        Capabilities: [258] L1 PM Substates
        Capabilities: [128] Power Budgeting <?>
        Capabilities: [420] Advanced Error Reporting
        Capabilities: [600] Vendor Specific Information: ID=0001 Rev=1 Len=024
        Capabilities: [900] #19
        Capabilities: [bb0] #15
        Kernel modules: nvidiafb, nouveau
This looks good. The second card is the Titan RTX (see https://devicehunt.com/view/type/pci/vendor/10DE/device/1E02).
Currently we are using the nouveau driver for the Xp, and have no driver loaded for the RTX.
driver : xserver-xorg-video-nouveau - distro free builtin
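That output line looks like it comes from ubuntu-drivers; to list the candidate drivers for both cards yourself (just a quick check, not a required step):
 ubuntu-drivers devices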
You could install the driver directly now using, say, apt install nvidia-driver-430, but don't do that yet. First install build-essential, which gets you gcc. Then blacklist the nouveau driver (see https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html#runfile-nouveau) and reboot to a text terminal so that it isn't loaded.
apt-get install build-essential
gcc --version
Download the CUDA 10.0 runfile (cuda_10.0.130_410.48_linux.run) from the archive linked in the CUDA section below.
vi /etc/modprobe.d/blacklist-nouveau.conf
blacklist nouveau
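The NVIDIA guide linked above also disables nouveau's kernel modesetting, so the complete file would normally be:
 blacklist nouveau
 options nouveau modeset=0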
update-initramfs -u
shutdown -r now
Reboot to a text terminal
lspci -vk
Shows no kernel driver in use!
Install the driver!
 apt install nvidia-driver-430
====CUDA====
Get CUDA 10.0, rather than 10.1. Although 10.1 is the latest version at the time of writing, it won't work with Tensorflow 1.13, so you'll just end up installing 10.0 under conda anyway.
*The installation instructions are here: https://docs.nvidia.com/cuda/cuda-installation-guide-linux/index.html
*You can download CUDA 10.0 from here: https://developer.nvidia.com/cuda-10.0-download-archive?target_os=Linux&target_arch=x86_64&target_distro=Ubuntu&target_version=1804&target_type=runfilelocal
Run the installer script and DO NOT install the driver (don't worry about the warning, it will work fine!):
 sh cuda_10.0.130_410.48_linux.run
 Do you accept the previously read EULA?
 accept/decline/quit: accept
 Install NVIDIA Accelerated Graphics Driver for Linux-x86_64 410.48? (y)es/(n)o/(q)uit: n
 Install the CUDA 10.0 Toolkit? (y)es/(n)o/(q)uit: y
 Enter Toolkit Location [ default is /usr/local/cuda-10.0 ]:
 Do you want to install a symbolic link at /usr/local/cuda? (y)es/(n)o/(q)uit: y
 Install the CUDA 10.0 Samples? (y)es/(n)o/(q)uit: y
 Enter CUDA Samples Location [ default is /home/ed ]:
 Installing the CUDA Toolkit in /usr/local/cuda-10.0 ...
 Missing recommended library: libGLU.so
 Missing recommended library: libX11.so
 Missing recommended library: libXi.so
 Missing recommended library: libXmu.so
 Missing recommended library: libGL.so
 Installing the CUDA Samples in /home/ed ...
 Copying samples to /home/ed/NVIDIA_CUDA-10.0_Samples now...
 Finished copying samples.
 ===========
 = Summary =
 ===========
 Driver:   Not Selected
 Toolkit:  Installed in /usr/local/cuda-10.0
 Samples:  Installed in /home/ed, but missing recommended libraries
 Please make sure that
  - PATH includes /usr/local/cuda-10.0/bin
  - LD_LIBRARY_PATH includes /usr/local/cuda-10.0/lib64, or, add /usr/local/cuda-10.0/lib64 to /etc/ld.so.conf and run ldconfig as root
 To uninstall the CUDA Toolkit, run the uninstall script in /usr/local/cuda-10.0/bin
 Please see CUDA_Installation_Guide_Linux.pdf in /usr/local/cuda-10.0/doc/pdf for detailed information on setting up CUDA.
 ***WARNING: Incomplete installation! This installation did not install the CUDA Driver. A driver of version at least 384.00 is required for CUDA 10.0 functionality to work. To install the driver using this installer, run the following command, replacing <CudaInstaller> with the name of this run file: sudo <CudaInstaller>.run -silent -driver
 Logfile is /tmp/cuda_install_2807.log
Now fix the paths. To do this for a single user do:
 export PATH=/usr/local/cuda-10.0/bin${PATH:+:${PATH}}
 export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
But it is better to fix it for everyone by editing your environment file:
 vi /etc/environment
 PATH="/usr/local/cuda-10.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games:/usr/local/games"
 LD_LIBRARY_PATH="/usr/local/cuda-10.0/lib64"
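/etc/environment is only read at login, so log back in (or source it in the current shell) and check that the toolkit is found. A quick sanity check, assuming the paths above:
 # after logging back in (or: source /etc/environment; export PATH LD_LIBRARY_PATH)
 nvcc --version
 echo $LD_LIBRARY_PATH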
With CUDA 10.0, you don't need to edit rc.local to start the persistence daemon:
/usr/bin/nvidia-persistenced --verbose
Instead, it runs as a service. Verify the driver:
 cat /proc/driver/nvidia/version
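If you want to confirm the daemon is actually running, the packaged systemd unit can be checked (the unit name here is the usual one, not something recorded in these notes):
 systemctl status nvidia-persistenced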
====Test the installation====
Make the samples:
 cd /usr/local/cuda-10.0/samples
make
Change into the sample directory and run the tests:
 cd /usr/local/cuda-10.0/samples/bin/x86_64/linux/release
./deviceQuery
./bandwidthTest
And yes, it's a thing of beauty. Everything should be good at this point.
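As an extra sanity check (not part of the original run-through), nvidia-smi should list both the Titan Xp and the Titan RTX along with the driver version:
 nvidia-smi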
===Bcache===
===DIGITS===
This section follows https://developer.nvidia.com/rdp/digits-download. Install Docker CE first, following https://docs.docker.com/install/linux/docker-ce/ubuntu/
Then follow https://github.com/NVIDIA/nvidia-docker#quick-start to install nvidia-docker2, but change the last command to use cuda 10.0:
...
sudo apt-get install -y nvidia-docker2
sudo pkill -SIGHUP dockerd
# Test nvidia-smi with the latest official CUDA image
docker run --runtime=nvidia --rm nvidia/cuda:10.0-base nvidia-smi
Then pull DIGITS using docker (https://hub.docker.com/r/nvidia/digits/), as sketched below:
*https://developer.nvidia.com/digits
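A minimal pull-and-run sketch, assuming the nvidia/digits image from the Docker Hub page above and the default DIGITS port of 5000 (check the page for current tags):
 docker pull nvidia/digits
 docker run --runtime=nvidia --name digits -d -p 5000:5000 nvidia/digits
 # DIGITS should then be reachable at http://localhost:5000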
Note: you can clean up stopped docker containers (and other unused docker data) with
docker system prune
====cuDNN====
First, make an installs directory in bulk and copy the installation files over from the RDP (E:\installs\DIGITS DevBox). Then:
cd /bulk/install/
 dpkg -i libcudnn7_7.6.1.34-1+cuda10.0_amd64.deb
 dpkg -i libcudnn7-dev_7.6.1.34-1+cuda10.0_amd64.deb
 dpkg -i libcudnn7-doc_7.6.1.34-1+cuda10.0_amd64.deb
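The -doc package includes the cuDNN samples, so the install can be verified the way the cuDNN installation guide suggests (the samples directory name below is the usual one for cuDNN 7; adjust if yours differs):
 cp -r /usr/src/cudnn_samples_v7/ $HOME
 cd $HOME/cudnn_samples_v7/mnistCUDNN
 make clean && make
 ./mnistCUDNN
 # should end with: Test passed!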
And test it:
pip install --upgrade tensorflow-gpu
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
 
And this doesn't work. It turns out that tensorflow 1.13.1 doesn't work with CUDA 10.1! But there is a workaround, which is to install CUDA 10 under conda only (see https://github.com/tensorflow/tensorflow/issues/26182). We are also going to leave the installation of CUDA 10.1 because tensorflow will catch up at some point.
 
Still as researcher (and in the venv):
conda install cudatoolkit
conda install cudnn
conda install tensorflow-gpu
export LD_LIBRARY_PATH=/home/researcher/anaconda3/lib/${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
AND IT WORKS!
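If conda ever resolves incompatible versions, pinning them explicitly should give the same working combination (the exact pins below are an assumption based on the spec at the top of this page, not a command we recorded):
 conda install cudatoolkit=10.0 cudnn=7.6 tensorflow-gpu=1.13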
Note: to deactivate the virtual environment:
deactivate
 
Note that adding the anaconda path to /etc/environment makes the virtual environment redundant.
=====PyTorch and SciKit=====
*http://deeplearning.net/software/theano/install_ubuntu.html
===VNC===
In order to use the graphical interface for Matlab and other applications, we need a VNC server.
First, install a VNC client on the remote machine. We use the standalone exe from TigerVNC.
Now install TightVNC, following these instructions: https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-18-04
As root:
 apt-get install xfce4 xfce4-goodies
As user researcher:
 sudo apt-get install tightvncserver
 vncserver
Set a password for the user when prompted. Then kill the initial instance and replace the default xstartup:
 vncserver -kill :1
 mv ~/.vnc/xstartup ~/.vnc/xstartup.bak
 vi ~/.vnc/xstartup
 #!/bin/bash
 xrdb $HOME/.Xresources
 startxfce4 &
 vncserver
Then set it up as a systemd service:
 sudo vi /etc/systemd/system/vncserver@.service
 [Unit]
 Description=Start TightVNC server at startup
 After=syslog.target network.target
 [Service]
 Type=forking
 User=uname
 Group=uname
 WorkingDirectory=/home/uname
 PIDFile=/home/ed/.vnc/%H:%i.pid
 ExecStartPre=-/usr/bin/vncserver -kill :%i > /dev/null 2>&1
 ExecStart=/usr/bin/vncserver -depth 24 -geometry 1280x800 :%i
 ExecStop=/usr/bin/vncserver -kill :%i
 [Install]
 WantedBy=multi-user.target
Note that changing the color depth breaks it!
To make changes (or after the edit):
 sudo systemctl daemon-reload
 sudo systemctl enable vncserver@2.service
 vncserver -kill :2
 sudo systemctl start vncserver@2
 sudo systemctl status vncserver@2
Stop the server with:
 sudo systemctl stop vncserver@2
Note that we are using :2 because :1 is running our regular X Windows GUI.
Instructions on how to set up an IP tunnel using PuTTY: https://helpdeskgeek.com/how-to/tunnel-vnc-over-ssh/
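For reference, the equivalent tunnel from a command-line OpenSSH client (using the address and ports mentioned elsewhere on this page; substitute your own user) would be something like:
 ssh -L 5901:localhost:5902 researcher@192.168.2.202
 # then point the VNC client at localhost:5901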
====Connection Issues====
Coming back to this, I had issues connecting. I set up the tunnel using the saved profile in puTTY.exe and checked that the local port was listening (it was 5901) and not firewalled, using the listening ports tab under network in resmon.exe (it said allowed, not restricted, under firewall status). VNC seemed to be running fine on Bastard, and I tried connecting to localhost:1 (that is, 5901 on localhost, through the tunnel to 5902 on Bastard) using VNC Connect by RealVNC. The connection was refused.
I checked it was listening and there was no firewall:
 netstat -tlpn
 tcp 0 0 0.0.0.0:5902 0.0.0.0:* LISTEN 2025/Xtightvnc
 ufw status
 Status: inactive
python -c "import tensorflow as tf; tf.enabl e_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))" 2019-07-09 15:20:40.085877: E tensorflow/stream_executor/cuda/cuda_driver.cc:300 ] failed call The localhost port seems to cuInitbe open and listening just fine: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detecte d 2019Test-07-09 15:20:40NetConnection 127.0.085978: I tensorflow/stream_executor/cuda/cuda_diagnostics0.c c:148] kernel driver does not appear to be running on this host (bastard): /proc /driver/nvidia/version does not exist1 -p 5901
So, presumably, there must be something wrong with the tunnel itself.
'''Ignoring the SSH tunnel worked fine: connect to 192.168.2.202::5902 using the TightVNC (or RealVNC, etc.) client.'''
===RDP===
I also installed xrdp:
 apt install xrdp
 adduser xrdp ssl-cert
 #Check the status and that it is listening on 3389
 systemctl status xrdp
 netstat -utln
 #It is listening...
 vi /etc/xrdp/xrdp.ini
 #See https://linux.die.net/man/5/xrdp.ini
 systemctl restart xrdp
This gave a dead session (a flat light blue screen with nothing on it), which finally yielded a connection log which said "login successful for display 10, start connecting, connection problems, giving up, some problem."
 cat /var/log/xrdp-sesman.log
There could be some conflict between VNC and RDP. systemctl status xrdp shows "xrdp_wm_log_msg: connection problem, giving up".
I tried the following without success:
 gsettings set org.gnome.Vino require-encryption false
https://askubuntu.com/questions/797973/error-problem-connecting-windows-10-rdp-into-xrdp
 vi /etc/X11/Xwrapper.config
 allowed_users = anybody
This was promising as it was previously set to console.
https://www.linuxquestions.org/questions/linux-software-2/xrdp-under-debian-9-connection-problem-4175623357/#post5817508
 apt-get install xorgxrdp-hwe-18.04
Couldn't find the package... This lead was promising as it applies to 18.04.02 HWE, which is what I'm running.
https://www.nakivo.com/blog/how-to-use-remote-desktop-connection-ubuntu-linux-walkthrough/
 dpkg -l |grep xserver-xorg-core
 ii xserver-xorg-core 2:1.19.6-1ubuntu4.3 amd64 Xorg X server - core server
Which seems ok, despite the problem with XRDP and Ubuntu 18.04 HWE documented very clearly here: http://c-nergy.be/blog/?p=13972
There is clearly an issue with Ubuntu 18.04 and XRDP. The solution seems to be to downgrade xserver-xorg-core and some related packages, which can be done with an install script (https://c-nergy.be/blog/?p=13933) or manually. But I don't want to do that, so I removed xrdp and went back to VNC!
 apt remove xrdp
===Other Software===
I installed the community edition of PyCharm:
 snap install pycharm-community --classic
 #Restart the local terminal so that it has updated paths (after a snap install, etc.)
 /snap/pycharm-community/214/bin/pycharm.sh
On launch, you get some config options. I chose to install and enable:
*IdeaVim (a VI editor emulator)
*R
*AWS Toolkit
Make a launcher. In /usr/share/applications:
 vi pycharm.desktop
 [Desktop Entry]
 Version=2020.2.3
 Type=Application
 Name=PyCharm
 Icon=/snap/pycharm-community/214/bin/pycharm.png
 Exec="/snap/pycharm-community/214/bin/pycharm.sh" %f
 Comment=The Drive to Develop
 Categories=Development;IDE;
 Terminal=false
 StartupWMClass=jetbrains-pycharm
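The entry can be sanity-checked with desktop-file-validate from the desktop-file-utils package (an optional extra step, not something from the original notes):
 desktop-file-validate /usr/share/applications/pycharm.desktop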
Also, create a launcher on the desktop with the same info.
