Research Computing Infrastructure

From edegan.com

Note: Those with access can view more details and help guides on our infrastructure from the Administration page. The Research Computing Configuration for both Father and Mother is publicly available. Many visitors to this page are likely looking for the page on Addressing Ubuntu NVIDIA Issues.

Bastard

Bastard is our dedicated multi-GPU compute environment. It is set up to support the latest GPU-based artificial intelligence, data science, and statistical analysis techniques. See also Using the DevBox, which provides examples of how to connect to and use this blisteringly fast parallel compute machine.

Summary

Our DIGITS DevBox, affectionately named after Lois McMaster Bujold's fifth God, has a Xeon E5-2620 v3 processor, 256GB of DDR4 RAM, two GPUs (one Titan RTX and one Titan Xp) with room for two more, a 500GB SSD (mounting /), and an 8TB RAID 5 array bcached with a 512GB M.2 drive (mounting the /bulk share, which is available over Samba). It runs Ubuntu 18.04, CUDA 10.0, cuDNN 7.6.1, Anaconda3-2019.03, Python 3.7, TensorFlow 1.13, DIGITS 6, and other useful machine learning tools and libraries.
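
A software stack like this drifts over time, so it can be handy to check an environment against the versions quoted above. The following is only a minimal sketch: the helper names (matches, check_stack, installed_versions) are hypothetical, and only the Python and TensorFlow versions from the summary are checked.

```python
import sys

# Versions quoted in the summary (major.minor only).
EXPECTED = {"python": "3.7", "tensorflow": "1.13"}


def matches(expected: str, found: str) -> bool:
    """True when `found` begins with the same dotted components as `expected`."""
    exp_parts = expected.split(".")
    return found.split(".")[: len(exp_parts)] == exp_parts


def check_stack(found_versions: dict) -> dict:
    """Map each expected package to True/False; a missing package counts as False."""
    return {
        name: matches(version, found_versions.get(name, ""))
        for name, version in EXPECTED.items()
    }


def installed_versions() -> dict:
    """Collect the versions visible to the running interpreter."""
    found = {"python": "{}.{}.{}".format(*sys.version_info[:3])}
    try:
        import tensorflow as tf  # only present if the ML environment is active

        found["tensorflow"] = tf.__version__
    except ImportError:
        pass
    return found
```

Calling check_stack(installed_versions()) inside the Anaconda environment would report which of the two entries still match. Comparing dotted components rather than string prefixes avoids treating 1.130 as a match for 1.13.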

Father and Mother

Father is our Windows Server 2019 machine, which provides bulk storage on a RAID array and Remote Desktop Protocol (RDP) based computing and applications. The RDP Software Configuration page describes the software installed on Father.

Mother is our Linux server, running Ubuntu 20.04. It provides the main structured data research computing environment (through Postgres 12) and runs the Apache2 web server, which makes it our public-facing research computing platform.

Summary

The component lists for our current Research Computing Hardware are provided below. These parts work nicely together, and finding a combination that does can be a challenge. Both machines share many components: the same Supermicro boards, the same RAM, the same drives (more or less), etc. The boards were chosen because they support dual Intel Scalable CPUs on socket 3647 and DDR4 at 2666MHz, have NVMe connections for the solid state drives (provided you remember to buy the OCuLink cables!), and have room for multiple GPUs using 16 lanes of PCI-E 3.0 (though the BIOS of this board seems to prevent them from working). The chips all have fast enough clock speeds to match the RAM, and sufficient lanes for the drives and GPUs. Each machine has a RAID 10 array made up initially of four 6TB NAS drives, which are in a hot-swappable bay.

See Research Computing Configuration for how we set them up.
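
For a sense of scale, the theoretical peak bandwidths behind the DDR4-2666 and PCI-E 3.0 x16 figures in the summary can be worked out directly. This is a rough sketch rather than a benchmark: it assumes a standard 64-bit DDR4 module and the standard PCI-E 3.0 signalling rate (8 GT/s per lane with 128b/130b encoding), and the function names are made up for illustration. Real-world throughput will be lower.

```python
def ddr4_peak_gb_per_s(mega_transfers_per_s: float, bus_width_bits: int = 64) -> float:
    """Peak bandwidth of one DDR4 module in GB/s (decimal)."""
    bytes_per_transfer = bus_width_bits / 8
    return mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


def pcie3_peak_gb_per_s(lanes: int) -> float:
    """Peak one-direction bandwidth of a PCI-E 3.0 link in GB/s (decimal)."""
    gigatransfers_per_s = 8.0   # per lane, PCI-E 3.0
    encoding = 128 / 130        # 128b/130b line-code overhead
    return gigatransfers_per_s * encoding * lanes / 8  # bits -> bytes


if __name__ == "__main__":
    # DDR4-2666 works out to about 21.3 GB/s per module, hence "PC4-21300".
    print(round(ddr4_peak_gb_per_s(2666), 3))
    # A 16-lane PCI-E 3.0 slot works out to roughly 15.75 GB/s each way.
    print(round(pcie3_peak_gb_per_s(16), 2))
```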

RDP Hardware Components

The RDP has dual 12-core CPUs. We compromised on clock speed to save on price, but this is a good "all-purpose" configuration. The OS lives on the 400GB NVMe SSD. Currently this box has 512GB of DDR4 at 2666MHz, but it is expandable to 1TB. The board supports 2TB, but you need 128GB sticks, which are currently prohibitively expensive.

Quantity Part
1 Supermicro Motherboard MBD-X11DAI-N-O Xeon Dual Socket S3647 C621 Max.2TB PCI Express EATX (MBD-X11DAI-N-O)
2 Intel CD8067303405900 Xeon Gold 6126, 12 Cores, 2.6 GHz, 19.25 MB Cache, DDR4 up to 2666 MHz, 125W TDP - OEM
1 512GB (8x64GB) DDR4-2666MHz PC4-21300 4Rx4 288-Pin 1.2V ECC Load Reduced LRDIMM Memory by NEMIX RAM
1 Intel 750 Series 2.5" 400GB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW400G4X1
2 Noctua NH-D9 DX-3647 4U Premium CPU Cooler for Intel Xeon LGA3647
1 EVGA SuperNOVA 1600 T2 220-T2-1600-X1 80+ TITANIUM 1600W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply
4 WD Red 6TB NAS Hard Disk Drive - 5400 RPM Class SATA 6Gb/s 64MB Cache 3.5 Inch - WD60EFRX
1 Rosewill RSV-L4000 - 4U Rackmount Server Case / Chassis - 8 Internal Bays, 7 Cooling Fans Included
1 Rosewill RSV-SATA-Cage-34 - Hard Disk Drives - Black, 3 x 5.25" to 4 x 3.5" Hot-Swap - SATA III / SAS - Cage
1 ASUS 24X DVD Burner - Bulk 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM Black SATA Model DRW-24B1ST/BLK/B/AS - OEM
1 Rosewill RDRD-11003 2.5" SSD / HDD Mounting Kit for 3.5" Drive Bay with 60mm Fan
1 Arctic Silver 5 High-Density Polysynthetic Silver Thermal Compound AS5-3.5G
1 AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack
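
The memory figures for this box are simple slot arithmetic, sketched below. The slot count is an assumption rather than something stated on this page: eight DIMM slots per CPU socket is inferred from the 8x64GB kit giving 512GB and the stated 1TB expansion path, and the function names are hypothetical.

```python
# Assumption: 8 DIMM slots per CPU socket, inferred from the figures on this page.
SLOTS_PER_SOCKET = 8


def installed_gb(sticks: int, stick_gb: int) -> int:
    """Total memory for a given number of identical sticks."""
    return sticks * stick_gb


def max_gb(populated_sockets: int, stick_gb: int,
           slots_per_socket: int = SLOTS_PER_SOCKET) -> int:
    """Ceiling when every slot attached to a populated socket holds one stick."""
    return populated_sockets * slots_per_socket * stick_gb


if __name__ == "__main__":
    print(installed_gb(8, 64))  # current kit: 512
    print(max_gb(2, 64))        # both sockets fully populated: 1024
```

Only slots wired to a populated socket are usable, which is why the number of CPUs is a parameter.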

Dbase Server Components

The database server has a single 4-core 3.6GHz Skylake chip, as clock speed matters much more than core count in this set-up. The OS lives on a 400GB NVMe SSD and the PostgreSQL installation lives on the 1.2TB NVMe SSD. The 12TB RAID 10 array is for deep bulk storage. Because we only have a single CPU on the board, we are maxed out at 512GB with the 64GB sticks of DDR4 at 2666MHz.

Quantity Part
1 Supermicro Motherboard MBD-X11DAI-N-O Xeon Dual Socket S3647 C621 Max.2TB PCI Express EATX (MBD-X11DAI-N-O)
1 Intel Xeon Scalable Gold 5122 SkyLake 4-Core 3.6 GHz (3.7 GHz Turbo) LGA 3647 105W BX806735122 Server Processor
1 512GB (8x64GB) DDR4-2666MHz PC4-21300 4Rx4 288-Pin 1.2V ECC Load Reduced LRDIMM Memory by NEMIX RAM
1 Intel 750 Series 2.5" 400GB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW400G4X1
1 Intel 750 Series 2.5" 1.2TB PCI-Express 3.0 x4 MLC Internal Solid State Drive (SSD) SSDPE2MW012T4X1
1 Noctua NH-D9 DX-3647 4U Premium CPU Cooler for Intel Xeon LGA3647
1 EVGA SuperNOVA 1600 T2 220-T2-1600-X1 80+ TITANIUM 1600W Fully Modular EVGA ECO Mode Includes FREE Power On Self Tester Power Supply
4 WD Red 6TB NAS Hard Disk Drive - 5400 RPM Class SATA 6Gb/s 64MB Cache 3.5 Inch - WD60EFRX
1 Rosewill RSV-L4000 - 4U Rackmount Server Case / Chassis - 8 Internal Bays, 7 Cooling Fans Included
1 Rosewill RSV-SATA-Cage-34 - Hard Disk Drives - Black, 3 x 5.25" to 4 x 3.5" Hot-Swap - SATA III / SAS - Cage
1 ASUS 24X DVD Burner - Bulk 24X DVD+R 8X DVD+RW 8X DVD+R DL 24X DVD-R 6X DVD-RW 16X DVD-ROM 48X CD-R 24X CD-RW 48X CD-ROM Black SATA Model DRW-24B1ST/BLK/B/AS - OEM
1 Rosewill RDRD-11003 2.5" SSD / HDD Mounting Kit for 3.5" Drive Bay with 60mm Fan
1 Arctic Silver 5 High-Density Polysynthetic Silver Thermal Compound AS5-3.5G
1 AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack
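
The 12TB figure for the RAID 10 array follows from the four 6TB drives in the list above. As a quick sketch of that arithmetic (the function name is hypothetical, and the result is raw decimal capacity before any filesystem overhead):

```python
def usable_capacity_tb(level: str, n_drives: int, drive_tb: float) -> float:
    """Raw usable capacity of an array of identical drives."""
    if level == "raid10":
        # Striped mirrors: half the drives hold copies.
        if n_drives < 4 or n_drives % 2:
            raise ValueError("RAID 10 needs an even number of drives, at least 4")
        return n_drives / 2 * drive_tb
    if level == "raid5":
        # One drive's worth of space goes to parity.
        if n_drives < 3:
            raise ValueError("RAID 5 needs at least 3 drives")
        return (n_drives - 1) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(usable_capacity_tb("raid10", 4, 6))  # 12.0
```

Because the array was described as "initially" four drives, the same function covers later growth: six 6TB drives in RAID 10 would give 18TB.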