9,447 bytes added ,  17:07, 9 August 2024
This page details the build of our [[DIGITS DevBox]]. There's also a page giving information on [[Using the DevBox]]. nVIDIA, famous for their incredibly poor supply-chain and inventory management, have been saying [https://developer.nvidia.com/devbo "Please note that we are sold out of our inventory of the DIGITS DevBox, and no new systems are being built"] since shortly after the [https://en.wikipedia.org/wiki/GeForce_10_series Titan X] was the latest and greatest thing (i.e., somewhere around 2016). But it's pretty straightforward to update [https://www.azken.com/download/DIGITS_DEVBOX_DESIGN_GUIDE.pdf their spec].
==Introduction==
===Specification===
<onlyinclude>[[File:Top1000.jpg|right|300px]] Our [[DIGITS DevBox]], affectionately named "Bastard" after Lois McMaster Bujold's fifth God, has a Xeon E5-2620 v3 processor, 256GB of DDR4 RAM, two GPUs - one Titan RTX and one Titan Xp - with room for two more, a 500GB SSD (mounting /), and an 8TB RAID5 array bcached with a 512GB M.2 drive (mounting the /bulk share, which is available over Samba). It runs Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.1, Anaconda3-2019.03, Python 3.7, TensorFlow 1.13, DIGITS 6, and other useful machine learning tools/libraries.</onlyinclude>
===Documentation===
The DevBox is currently unavailable from Amazon [https://www.amazon.com/Lambda-Deep-Learning-DevBox-Preinstalled/dp/B01BCDK1KC], and at around $15k buying one is prohibitive for most people. Some firms, including Lambda Labs [https://lambdalabs.com/deep-learning/workstations/4-gpu] and Bizon-tech [https://bizon-tech.com/us/bizon-g3000], are selling variants of it, but their prices are high too and the details on their specs are limited (the MoBo and config details are missing entirely).
But the parts' cost is perhaps $4-5k now for a massive update to the original spec! So this page goes through everything required to put one together and get it up and running.
==Hardware==
Then change into the samples directory and run the tests:
cd /usr/local/cuda-10.10/samples/bin/x86_64/linux/release
./deviceQuery
./bandwidthTest
Everything should be good at this point!
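
Both sample binaries print a final "Result = PASS" line when they succeed, so the check can be scripted rather than eyeballed. A minimal sketch; the helper name sample_passed is hypothetical and not part of the CUDA samples:

```shell
# Hypothetical helper: succeeds if the sample output on stdin reports a pass.
sample_passed() {
    grep -q 'Result = PASS'
}

# Usage on the box (from the samples release directory):
#   ./deviceQuery | sample_passed && echo "deviceQuery OK"
#   ./bandwidthTest | sample_passed && echo "bandwidthTest OK"
```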
sudo apt-get install tightvncserver
vncserver
#Set a password for the user (ailia) when prompted
vncserver -kill :1
mv ~/.vnc/xstartup ~/.vnc/xstartup.bak
Instructions on how to set up an IP tunnel using PuTTY:
https://helpdeskgeek.com/how-to/tunnel-vnc-over-ssh/
 
====Connection Issues====
 
Coming back to this, I had issues connecting. I set up the tunnel using the saved profile in PuTTY and checked which local port was listening (it was 5901). The Listening Ports table on the Network tab of resmon.exe showed that the port was not firewalled (its firewall status was "Allowed, not restricted"). VNC seemed to be running fine on Bastard, and I tried connecting to localhost::1 (that is, 5901 on the localhost, through the tunnel to 5902 on Bastard) using VNC Connect by RealVNC. The connection was refused.
 
I checked it was listening and there was no firewall:
netstat -tlpn
tcp 0 0 0.0.0.0:5902 0.0.0.0:* LISTEN 2025/Xtightvnc
ufw status
Status: inactive
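
The listening check can also be scripted by filtering the netstat output for a given port. This is a sketch only: is_listening is a hypothetical helper name, and it assumes the column layout shown above (local address in column 4, state in column 6).

```shell
# Hypothetical helper: reads `netstat -tln` (or -tlpn) output on stdin and
# succeeds if some socket in LISTEN state is bound to the given TCP port.
is_listening() {
    awk -v port="$1" '
        $6 == "LISTEN" && $4 ~ (":" port "$") { found = 1 }
        END { exit !found }
    '
}

# Usage on Bastard:
#   netstat -tln | is_listening 5902 && echo "VNC display :2 is listening"
```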
 
The localhost port seems to be open and listening just fine:
Test-NetConnection 127.0.0.1 -Port 5901
 
So, presumably, there must be something wrong with the tunnel itself.
 
'''Ignoring the SSH tunnel worked fine: Connect to 192.168.2.202::5902 using the TightVNC (or RealVNC, etc.) client.'''
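
For reference, VNC display :N listens on TCP port 5900+N, which is why display :2 on Bastard is port 5902. TightVNC and RealVNC viewers read host:N as a display number and host::N as a port number, so localhost::1 would mean TCP port 1 rather than 5901. That may be why the tunnelled attempt was refused, but it is a guess and has not been verified on this setup. The sketch below shows the port arithmetic and an equivalent tunnel using OpenSSH in place of the PuTTY profile; the helper names are hypothetical and the user name in the usage line is an assumption.

```shell
# VNC display :N listens on TCP port 5900 + N.
vnc_port() {
    echo $((5900 + $1))
}

# Hypothetical helper: print an OpenSSH command that forwards the port of a
# local display to the port of a remote display. OpenSSH stands in for the
# PuTTY profile here; the user name passed in is an assumption.
tunnel_cmd() {
    local_display=$1
    remote_display=$2
    user=$3
    host=$4
    echo "ssh -N -L $(vnc_port "$local_display"):localhost:$(vnc_port "$remote_display") $user@$host"
}

# Usage:
#   tunnel_cmd 1 2 ed 192.168.2.202
# then point the viewer at localhost:1 (display) or localhost::5901 (port).
```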
 
====Later Notes====
 
=====Change the resolution=====
 
I came back and changed the resolution to make it work on one of my portrait desktop monitors.
See https://www.tightvnc.com/vncserver.1.php
 
As root:
vi /etc/systemd/system/vncserver@.service
Change line:
ExecStart=/usr/bin/vncserver -depth 24 -geometry 1440x2560 :%i
(Note that the size is 2160x3840 divided by 150%.) Leave the color depth alone, as it says elsewhere that changing it is bad.
systemctl daemon-reload
systemctl enable vncserver@2.service
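
The geometry arithmetic can be reproduced for other monitors or scaling factors. A small sketch; scaled_geometry is a hypothetical helper, and it uses integer division, so sizes that do not divide evenly are rounded down:

```shell
# Hypothetical helper: divide a native WxH resolution by a scaling percentage
# and print the result in the WxH form that vncserver -geometry expects.
scaled_geometry() {
    width=$1
    height=$2
    percent=$3
    echo "$((width * 100 / percent))x$((height * 100 / percent))"
}

# Usage:
#   scaled_geometry 2160 3840 150
```

For the portrait monitor above this prints 1440x2560, matching the ExecStart line.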
 
As Ed:
vncserver -kill :2
sudo systemctl start vncserver@2
sudo systemctl status vncserver@2
 
Exit full screen with ctrl-alt-shift-f.
 
=====Cut And Paste=====
 
I also tried to fix the cut-and-paste issue. See, for example, https://unix.stackexchange.com/questions/35030/how-can-i-copy-paste-data-to-and-from-the-windows-clipboard-to-an-opensuse-clipb
 
As root:
apt-get install autocutsel
vi ~/.vnc/xstartup
#!/bin/bash
xrdb $HOME/.Xresources
autocutsel -fork
startxfce4 &
 
Though this might have been working fine anyway. Just change the terminal and all will be well.
 
=====Use XFCE terminal=====
 
Change Settings: Preferred Applications -> Utilities -> Terminal to XFCE
 
Note that this seems to fix everything, but in case the menu needs changing, the instructions for customizing it are here: https://wiki.xfce.org/howto/customize-menu
cat /etc/xdg/menus/xfce-applications.menu
 
===RDP===
 
I also installed xrdp:
apt install xrdp
adduser xrdp ssl-cert
#Check the status and that it is listening on 3389
systemctl status xrdp
netstat -tln
#It is listening...
vi /etc/xrdp/xrdp.ini
#See https://linux.die.net/man/5/xrdp.ini
systemctl restart xrdp
 
This gave a dead session (a flat light blue screen with nothing on it), which finally yielded a connection log which said "login successful for display 10, start connecting, connection problems, giving up, some problem."
cat /var/log/xrdp-sesman.log
 
There could be some conflict between VNC and RDP. systemctl status xrdp shows "xrdp_wm_log_msg: connection problem, giving up".
 
I tried without success:
gsettings set org.gnome.Vino require-encryption false
https://askubuntu.com/questions/797973/error-problem-connecting-windows-10-rdp-into-xrdp
vi /etc/X11/Xwrapper.config
allowed_users = anybody
This was promising as it was previously set to console.
https://www.linuxquestions.org/questions/linux-software-2/xrdp-under-debian-9-connection-problem-4175623357/#post5817508
apt-get install xorgxrdp-hwe-18.04
Couldn't find the package... This lead was promising as it applies to 18.04.02 HWE, which is what I'm running
https://www.nakivo.com/blog/how-to-use-remote-desktop-connection-ubuntu-linux-walkthrough/
dpkg -l |grep xserver-xorg-core
ii xserver-xorg-core 2:1.19.6-1ubuntu4.3 amd64 Xorg X server - core server
Which seems OK, even though a problem with XRDP and Ubuntu 18.04 HWE is documented very clearly here: http://c-nergy.be/blog/?p=13972
 
There is clearly an issue with Ubuntu 18.04 and XRDP. The solution seems to be to downgrade xserver-xorg-core and some related packages, which can be done with an install script (https://c-nergy.be/blog/?p=13933) or manually. But I don't want to do that, so I removed xrdp and went back to VNC!
apt remove xrdp
 
===Other Software===
 
I installed the community edition of PyCharm:
snap install pycharm-community --classic
#Restart the local terminal so that it has updated paths (after a snap install, etc.)
/snap/pycharm-community/214/bin/pycharm.sh
 
On launch, you get some config options. I chose to install and enable:
*IdeaVim (a VI editor emulator)
*R
*AWS Toolkit
 
Make a launcher: In /usr/share/applications:
vi pycharm.desktop
[Desktop Entry]
Version=1.0
Type=Application
Name=PyCharm
Icon=/snap/pycharm-community/214/bin/pycharm.png
Exec="/snap/pycharm-community/214/bin/pycharm.sh" %f
Comment=The Drive to Develop
Categories=Development;IDE;
Terminal=false
StartupWMClass=jetbrains-pycharm
 
Also, create a launcher on the desktop with the same info.
 
Note that when I came back to the box the launcher didn't work...
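
A likely cause (an assumption, not something confirmed on this box) is that the launcher hard-codes snap revision 214. snapd keeps each revision in its own numbered directory and repoints the /snap/pycharm-community/current symlink when the snap refreshes, so the 214 path stops existing once that revision is removed. The sketch below prints a launcher that goes through current instead; the helper name is hypothetical, and the Version key holds the Desktop Entry Specification version rather than the PyCharm version.

```shell
# Hypothetical helper: print a desktop entry whose paths go through snapd's
# "current" symlink instead of a fixed revision number such as 214.
pycharm_desktop_entry() {
    snap_dir=/snap/pycharm-community/current
    cat <<EOF
[Desktop Entry]
Version=1.0
Type=Application
Name=PyCharm
Icon=$snap_dir/bin/pycharm.png
Exec="$snap_dir/bin/pycharm.sh" %f
Comment=The Drive to Develop
Categories=Development;IDE;
Terminal=false
StartupWMClass=jetbrains-pycharm
EOF
}

# Usage (as root):
#   pycharm_desktop_entry > /usr/share/applications/pycharm.desktop
```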
 
==== MATLAB ====
 
I installed MATLAB R2024a by downloading the zip, running
sudo ./install
 
and using the defaults of /usr/local/MATLAB/R2024 etc. The license number is 41201644.
 
===Upgrading the nVIDIA Drivers===
 
In MATLAB, I ran:
gpuDevice
Error using gpuDevice (line 26)
Graphics driver is out of date. Download and install the latest graphics driver for your GPU from NVIDIA.
 
Some quick checks showed that I was using driver version 430.26 on Ubuntu 18.04.02.
nvidia-smi
lsb_release -a
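
The installed driver version can be compared against a required minimum before launching MATLAB. This is a sketch under assumptions: version_ge is a hypothetical helper, and the minimum of 535 in the usage lines is only a placeholder, to be replaced with whatever the MathWorks requirements page lists for the release in use.

```shell
# Hypothetical helper: succeeds if dotted version $1 is >= dotted version $2.
# Fields are compared numerically, left to right; missing fields count as 0.
version_ge() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        na = split(a, x, ".")
        nb = split(b, y, ".")
        n = (na > nb) ? na : nb
        for (i = 1; i <= n; i++) {
            xi = (i <= na) ? x[i] + 0 : 0
            yi = (i <= nb) ? y[i] + 0 : 0
            if (xi > yi) exit 0
            if (xi < yi) exit 1
        }
        exit 0
    }'
}

# Usage (the minimum of 535 is a placeholder, not a documented requirement):
#   installed=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   version_ge "$installed" 535 || echo "driver $installed is too old"
```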
 
I couldn't quite get MATLAB to tell me what I needed:
* https://www.mathworks.com/help/parallel-computing/gpu-computing-requirements.html
* https://www.mathworks.com/help/parallel-computing/run-mex-functions-containing-cuda-code.html#mw_20acaa78-994d-4695-ab4b-bca1cfc3dbac
 
For MEX, I have version 10.2 of the CUDA toolkit and need 12.2:
{| class="wikitable"
! MATLAB Release !! CUDA Toolkit Version
|-
| R2024a || 12.2
|-
| ... || ...
|-
| R2020b || 10.2
|}
 
However:
* nVIDIA said the latest version was https://www.nvidia.com/Download/driverResults.aspx/230357/en-us/
* The repo said the highest version for 18.04 is 545: https://launchpad.net/~graphics-drivers/+archive/ubuntu/ppa
 
As root:
runlevel
#5
systemctl get-default
#graphical.target
systemctl set-default multi-user.target
systemctl reboot
 
As ed:
vncserver -kill :2
Killing Xtightvnc process ID 1844
 
As root:
#sh ./NVIDIA-Linux-x86_64-550.107.02.run
# The distribution-provided pre-install script failed!
#cat /var/log/nvidia-installer.log
 
apt-get update
apt install nvidia-driver-545
systemctl set-default graphical.target
systemctl reboot
 
Run MATLAB
gpuDevice
Name: 'NVIDIA TITAN RTX'
Index: 1
ComputeCapability: '7.5'
GraphicsDriverVersion: '545.29.06'
ToolkitVersion: 12.2000
 
gpuDevice(2)
Name: 'NVIDIA TITAN Xp'
Index: 2
ComputeCapability: '6.1'
SupportsDouble: 1
GraphicsDriverVersion: '545.29.06'
ToolkitVersion: 12.2000
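
The same confirmation can be done from the shell without opening MATLAB, using nvidia-smi's CSV query output. A sketch only; all_gpus_on_driver is a hypothetical helper, and it assumes the "name, driver_version" layout that the query in the usage lines produces:

```shell
# Hypothetical helper: reads "name, driver_version" CSV lines on stdin and
# succeeds only if there is at least one GPU and every GPU reports the
# expected driver version (compared as strings).
all_gpus_on_driver() {
    awk -F', ' -v want="$1" '
        { seen++; if (($2 "") != (want "")) bad = 1 }
        END { exit (seen == 0 || bad) }
    '
}

# Usage:
#   nvidia-smi --query-gpu=name,driver_version --format=csv,noheader |
#       all_gpus_on_driver 545.29.06 && echo "all GPUs on 545.29.06"
```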
 
The messages were:
apt install nvidia-driver-545
The following additional packages will be installed:
libnvidia-cfg1-545 libnvidia-common-545 libnvidia-compute-545 libnvidia-compute-545:i386 libnvidia-decode-545
libnvidia-decode-545:i386 libnvidia-encode-545 libnvidia-encode-545:i386 libnvidia-extra-545 libnvidia-fbc1-545
libnvidia-fbc1-545:i386 libnvidia-gl-545 libnvidia-gl-545:i386 nvidia-compute-utils-545 nvidia-dkms-545
nvidia-firmware-545-545.29.06 nvidia-kernel-common-545 nvidia-kernel-source-545 nvidia-utils-545
xserver-xorg-video-nvidia-545
The following packages will be REMOVED:
libnvidia-cfg1-430 libnvidia-common-430 libnvidia-compute-430 libnvidia-compute-430:i386 libnvidia-decode-430
libnvidia-decode-430:i386 libnvidia-encode-430 libnvidia-encode-430:i386 libnvidia-fbc1-430 libnvidia-fbc1-430:i386
libnvidia-gl-430 libnvidia-gl-430:i386 libnvidia-ifr1-430 libnvidia-ifr1-430:i386 nvidia-compute-utils-430 nvidia-dkms-430
nvidia-driver-430 nvidia-kernel-common-430 nvidia-kernel-source-430 nvidia-utils-430 xserver-xorg-video-nvidia-430
The following NEW packages will be installed:
libnvidia-cfg1-545 libnvidia-common-545 libnvidia-compute-545 libnvidia-compute-545:i386 libnvidia-decode-545
libnvidia-decode-545:i386 libnvidia-encode-545 libnvidia-encode-545:i386 libnvidia-extra-545 libnvidia-fbc1-545
libnvidia-fbc1-545:i386 libnvidia-gl-545 libnvidia-gl-545:i386 nvidia-compute-utils-545 nvidia-dkms-545 nvidia-driver-545
nvidia-firmware-545-545.29.06 nvidia-kernel-common-545 nvidia-kernel-source-545 nvidia-utils-545
xserver-xorg-video-nvidia-545
0 upgraded, 21 newly installed, 21 to remove and 2 not upgraded.
