Difference between revisions of "GPU Build"

Project
GPU Build
Project Information
Has title	GPU Build
Has owner	Oliver Chang, Kyran Adams
Has start date
Has deadline date
Has project status	Active
Has sponsor	McNair Center
Has project output	Content

Latest revision as of 13:39, 21 September 2020

Final Decision

We decided to clone the NVIDIA DIGITS DevBox: https://developer.nvidia.com/devbox

To start, we are trying to use our existing ASUS Z10 server board rather than switching to the Asus X99-E WS workstation-class motherboard, and instead of four TITAN X GPUs we have a TITAN XP and a TITAN RTX.

Note that the Asus X99-E WS is available from NewEgg for $500 now.

Single vs. Multi GPU

GTX 1080 Ti Specs
We are using TensorFlow, which does not scale well to multiple GPUs for a single model
Which GPU for deep learning (04/09/2017)

"I quickly found that it is not only very difficult to parallelize neural networks on multiple GPUs efficiently, but also that the speedup was only mediocre for dense neural networks. Small neural networks could be parallelized rather efficiently using data parallelism, but larger neural networks... received almost no speedup."
Possible other use of multiple GPUs: training multiple different models simultaneously, "very useful for researchers, who want try multiple versions of a new algorithm at the same time."
This source recommends the GTX 1080 Ti and includes a cost analysis of it
If the network doesn't fit in the memory of one GPU (11 GB),

Advice on single vs multi-GPU system

We want two graphics cards: one for development and one (a cheap or onboard card) for the operating system [1]
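One common way to keep the display card out of compute work is to hide it from CUDA with the CUDA_VISIBLE_DEVICES environment variable, set before the deep learning framework starts. The following is a minimal sketch, not a tested setup for this machine: the helper name env_for_gpu and the assumption that the compute card is CUDA device 0 are illustrative only.

```python
import os

def env_for_gpu(gpu_index, base_env=None):
    """Return a copy of the environment that exposes only one CUDA device."""
    env = dict(os.environ if base_env is None else base_env)
    # CUDA_VISIBLE_DEVICES restricts which GPUs a CUDA process can see.
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_index)
    return env

# Assumption: device 0 is the compute card on this machine.
compute_env = env_for_gpu(0)
```

The returned dictionary can be passed as the env argument of subprocess.run when launching a training script. Calling the helper with different indices is also one way to run separate experiments on separate GPUs at the same time.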

Different uses of multiple GPUs

Intra-model parallelism: If a model has long, independent computation paths, then you can split the model across multiple GPUs and have each compute a part of it. This requires careful understanding of the model and the computational dependencies.
Replicated training: Start up multiple copies of the model, train them, and then synchronize their learning (the gradients applied to their weights & biases).
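Replicated training can be illustrated without any GPU at all. The sketch below is a toy, pure-Python version under stated assumptions: a one-parameter model y = w * x, two equal-sized data shards standing in for two GPUs, and plain gradient averaging as the synchronization step. The function names are hypothetical and this is not how TensorFlow implements it internally.

```python
def gradient(w, shard):
    """Mean gradient of squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def replicated_step(w, shards, lr=0.01):
    """One synchronous step: each replica computes a gradient, then all are averaged."""
    grads = [gradient(w, shard) for shard in shards]  # one replica per GPU in practice
    avg_grad = sum(grads) / len(grads)
    return w - lr * avg_grad

# Toy data generated by y = 3x, split into two equal shards (one per replica).
data = [(float(x), 3.0 * x) for x in range(1, 9)]
shards = [data[:4], data[4:]]

w = 0.0
for _ in range(200):
    w = replicated_step(w, shards)
```

With equal-sized shards, the averaged gradient equals the gradient over the full batch, so the replicas together behave like one larger-batch training run.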

TL;DR

Pros of multiple GPUs:

Able to train multiple networks at once (either copies of the same network or modified networks). Allows for running long experiments while running new ones
Possible speedups if the network can be split up (and is big enough), but TensorFlow is not great for this
More memory for huge batches (not sure if necessary)

Cons of multiple GPUs:

Adds a lot of complexity.

K80, NVLink

NVLink can connect CPU and GPU for higher transfer speed, but only with IBM POWER8+ CPUs.
NVLink can connect GPU to GPU as a replacement for SLI with other CPUs, but this is not very relevant to TensorFlow, even when trying to parallelize a single model.
A Quora answer says to get the 1080 because the K80 is basically two K40s, which have less memory bandwidth than the 1080. A Reddit thread agrees.

Misc. Parts

Cases: Rosewill 1.0 mm Thickness 4U Rackmount Server Chassis, Black Metal/Steel RSV-L4000[2]
Consider this case: Corsair Carbide Series Air 540 High Airflow ATX Cube Case [3]
DVDRW (Needed?): Asus 24x DVD-RW Serial-ATA Internal OEM Optical Drive DRW-24B1ST [4]
Keyboard and Mouse: AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack [5]
Optical drive: HP - DVD1265I DVD/CD Writer [6]

Other Builds/Guides

Double GPU Server Build

PC Partpicker build

This article says that it may be necessary to install both CPUs to get all of the PCIe lanes

Double GPU Build

PC Partpicker build

Motherboard

Needs enough PCIe slots to support both GPUs and other units
Motherboards: MSI - Z170A GAMING M7 ATX LGA1151 Motherboard [7], LGA 1151, 3 x PCIe 3.0 x16, 4 x PCIe 3.0 x1, 6 x SATA 6 Gb/s, also used in this build

CPU/Fan

At least one core (two threads) per GPU
Chips: Intel - Core i7-6700 3.4GHz Quad-Core Processor [8]
CPU Fans: Cooler Master - Hyper 212 EVO 82.9 CFM Sleeve Bearing CPU Cooler [9]
We chose this cooler because it is inexpensive given its strong reviews, and the stock CPU cooler has had mixed reviews

GPU

2x GTX 1080 Ti [10]
Integrated graphics on CPU: Intel HD Graphics 530

RAM

At least as much system RAM as total GPU memory (2 * 11 GB [GTX 1080 Ti size] = 22 GB, so 32 GB)
Does not have to be fast for deep learning: "CPU-RAM-to-GPU-RAM is the true bottleneck – this step makes use of direct memory access (DMA). As quoted above, the memory bandwidth for my RAM modules are 51.2GB/s, but the DMA bandwidth is only 12GB/s!"[11]
Crucial - 32GB (2 x 16GB) DDR4-2133 Memory [12]
If this is not enough, it should be possible to extend it by buying two more modules
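The RAM sizing rule above is simple arithmetic: total the GPU memory, then round up to the next kit size that is actually sold. A minimal sketch, assuming a hypothetical list of common kit sizes (the list and the function name are illustrative, not from the sources cited here):

```python
def ram_kit_size_gb(num_gpus, gpu_mem_gb, kit_sizes=(8, 16, 32, 64, 128)):
    """Return (total GPU memory, smallest listed kit size that covers it)."""
    needed = num_gpus * gpu_mem_gb
    for size in kit_sizes:
        if size >= needed:
            return needed, size
    raise ValueError("no listed kit size is large enough")

# Two GTX 1080 Ti cards with 11 GB each.
needed_gb, kit_gb = ram_kit_size_gb(2, 11)
print(needed_gb, kit_gb)  # 22 32
```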

PSU

Some say the PSU should be rated at 1.5x-2x the system's wattage, some say system wattage + 100 W
PSU: EVGA - SuperNOVA G2 1000W 80+ Gold Certified Fully-Modular ATX Power Supply [13]
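The two PSU rules of thumb can be compared with a quick calculation. This is a rough sketch only: the wattages below are the vendors' published TDP figures for the GTX 1080 Ti (250 W) and Core i7-6700 (65 W), used here as an assumption for "system wattage", and the estimate ignores the motherboard, drives, and fans.

```python
def psu_estimates(component_watts):
    """Apply the wattage+100W and 1.5x-2x rules to a dict of component wattages."""
    total = sum(component_watts.values())
    return {
        "system_watts": total,
        "plus_100w": total + 100,
        "x1_5": total * 1.5,
        "x2": total * 2,
    }

# Assumption: TDP values stand in for real power draw.
parts = {"gtx_1080_ti_a": 250, "gtx_1080_ti_b": 250, "core_i7_6700": 65}
est = psu_estimates(parts)
print(est)
```

Under these assumptions, the 1000 W supply listed above sits between the 1.5x and 2x figures and well above the wattage + 100 W figure.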

Storage

SSD should be fast enough, no need for M.2 [14]
SSD: Samsung - 850 EVO-Series 500GB 2.5" Solid State Drive [15]
HDD: Seagate - Barracuda 3TB 3.5" 7200RPM Internal Hard Drive [16]

Other things to consider

Water cooling? Tim Dettmers' deep learning hardware guide has a good section on cooling
Case is not rack mounted

Software tips

Setting up Ubuntu and Docker [17]

@@ Line 1: / Line 1: @@
-{{McNair Projects
+{{Project
+|Has project output=Content
+|Has sponsor=McNair Center
 |Has title=GPU Build
 |Has owner=Oliver Chang,Kyran Adams
@@ Line 5: / Line 7: @@
 }}
+==Final Decision==
+We decided to clone the NVIDIA [[DIGITS DevBox]]: https://developer.nvidia.com/devbox
+To start with we are trying to use our existing ASUS Z10 server board, rather than switching to the Asus X99-E WS workstation class motherboard, and rather than Four TITAN X GPUs, we've got a TITAN XP and a TITAN RTX.
+Note that the Asus X99-E WS is available from NewEgg for $500 now.
 ==Single vs. Multi GPU==
 *[https://www.nvidia.com/en-us/geforce/products/10series/geforce-gtx-1080-ti/ GTX 1080 Ti Specs]
 * Since we are using Tensorflow, it doesn't scale well to multiple GPUs for a single model
@@ Line 15: / Line 25: @@
 # If the network doesn't fit in the memory of one GPU (11 GB),
 * [https://devtalk.nvidia.com/default/topic/743814/cuda-setup-and-installation/advice-on-single-vs-multi-gpu-system/ Advice on single vs multi-GPU system]
-# Want to get two graphics cards, one for development, one (crappy card) for operating system [https://stackoverflow.com/questions/21911560/how-can-i-set-one-nvidia-graphics-card-for-display-and-other-for-computingin-li]
+# Want to get two graphics cards, one for development, one (crappy or onboard card) for operating system [https://stackoverflow.com/questions/21911560/how-can-i-set-one-nvidia-graphics-card-for-display-and-other-for-computingin-li]
 *[https://stackoverflow.com/questions/37732196/tensorflow-difference-between-multi-gpus-and-distributed-tensorflow Different uses of multiple GPUs]
 # Intra-model parallelism: If a model has long, independent computation paths, then you can split the model across multiple GPUs and have each compute a part of it. This requires careful understanding of the model and the computational dependencies.
@@ Line 29: / Line 39: @@
 *Adds a lot of complexity.
+=== K80, NVLink ===
+*NVLink can link between CPU and GPU for increase in speed, but only with the CPU IBM POWER8+.
+*NVLink can link between GPU and GPU as a replacement for SLI with other CPUs, but this is not super relevant to tensorflow, even if trying to parallelize across one model.
+*[https://www.quora.com/Which-GPU-is-better-for-Deep-Learning-GTX-1080-or-Tesla-K80 This source] says to get the 1080 because the K80 is basically two K40s, which have less memory bandwidth than the 1080. [https://www.reddit.com/r/deeplearning/comments/5mc7s6/performance_difference_between_nvidia_k80_and_gtx/ This source] agrees.
 ==Misc. Parts==
 *Cases: Rosewill 1.0 mm Thickness 4U Rackmount Server Chassis, Black Metal/Steel RSV-L4000[https://www.amazon.com/gp/product/B0056OUTBK/ref=oh_aui_detailpage_o04_s00?ie=UTF8&psc=1]
+*Consider this case: Corsair Carbide Series Air 540 High Airflow ATX Cube Case [https://www.amazon.com/dp/B00D6GINF4/ref=twister_B00JRYFVAO?_encoding=UTF8&psc=1]
 *DVDRW (Needed?): Asus 24x DVD-RW Serial-ATA Internal OEM Optical Drive DRW-24B1ST [http://www.amazon.com/Asus-Serial-ATA-Internal-Optical-DRW-24B1ST/dp/B0033Z2BAQ/ref=sr_1_2?s=pc&ie=UTF8&qid=1452399113&sr=1-2&keywords=dvdrw]
 *Keyboard and Mouse: AmazonBasics Wired Keyboard and Wired Mouse Bundle Pack [http://www.amazon.com/AmazonBasics-Wired-Keyboard-Mouse-Bundle/dp/B00B7GV802/ref=sr_1_2?s=pc&rps=1&ie=UTF8&qid=1452402108&sr=1-2&keywords=keyboard+and+mouse&refinements=p_72%3A1248879011%2Cp_85%3A2470955011]
+* Optical drive: HP - DVD1265I DVD/CD Writer [https://www.newegg.com/Product/Product.aspx?Item=N82E16827140098&ignorebbr=1&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
@@ Line 44: / Line 59: @@
 * [https://medium.com/@SocraticDatum/getting-started-with-gpu-driven-deep-learning-part-1-building-a-machine-d24a3ed1ab1e How to build a GPU deep learning machine]
 * [https://www.slideshare.net/PetteriTeikariPhD/deep-learning-workstation Deep Learning Computer Build] useful tips, long
+* [https://www.tooploox.com/blog/deep-learning-with-gpu Another box]
+* [http://graphific.github.io/posts/building-a-deep-learning-dream-machine/ Expensive deep learning box]
+==Double GPU Server Build==
+[https://pcpartpicker.com/user/kyranadams/saved/gDzFdC PC Partpicker build]
-Questions to ask:
+*[https://www.quora.com/Can-I-double-the-PCIe-lanes-in-a-dual-CPU-motherboard This article] says that it may be necessary to get both CPUs to get all of the PCI lanes
-* Approx. dataset/batch size
-* Network card?
-* DVD drive?
-* How much RAM/storage needed?
-==Single GPU Build==
 ==Double GPU Build==
-[https://pcpartpicker.com/list/ZQjKf8 PC Partpicker build]
+[https://pcpartpicker.com/user/kyranadams/saved/ykK7hM PC Partpicker build]
 ===Motherboard===
-*Should have enough PCIe slots
+*Needs enough PCIe slots to support both GPUs and other units
-*Motherboards: ASUS Z10PE-D16 [http://www.newegg.com/Product/Product.aspx?Item=N82E16813132257&Tpk=N82E16813132257], Dual LGA 2011 R3, DDR4 - Up to 32GB RDIMM, 16 slots
+*Motherboards: MSI - Z170A GAMING M7 ATX LGA1151 Motherboard [https://www.newegg.com/Product/Product.aspx?Item=9SIA85V4SC7911&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=], LGA 1151, 3x PCIe 3.0 x 16, 4 x PCIe 3.0 x 1, 6 x SATA 6GB/s, also used in [https://medium.com/@SocraticDatum/getting-started-with-gpu-driven-deep-learning-part-1-building-a-machine-d24a3ed1ab1e this build]
 ===CPU/Fan===
-*Not a huge deal, but used for data preparation
+*At least one core (two threads) per GPU
-*If using multiple GPUs, at least one core (two threads) per GPU
+*Chips: Intel - Core i7-6700 3.4GHz Quad-Core Processor [https://www.amazon.com/dp/B0136JONG8/?tag=pcpapi-20]
-*Chips: Intel Haswell Xeon e5-2620v3, 6 core @ 2.4ghz, 6x256k level 1 cache, 15mb level 2 cache, socket LGA 2011-v3 [https://www.amazon.com/gp/product/B00M1BUUMO/ref=oh_aui_detailpage_o04_s00?ie=UTF8&psc=1]
+*CPU Fans: Cooler Master - Hyper 212 EVO 82.9 CFM Sleeve Bearing CPU Cooler [https://www.newegg.com/Product/Product.aspx?Item=N82E16835103099&ignorebbr=1&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
-*CPU Fans: Intel Thermal Solution Cooling Fan for E5-2600 Processors BXSTS200C [https://www.amazon.com/gp/product/B007HJAM50/ref=oh_aui_detailpage_o03_s00?ie=UTF8&psc=1]
+*Buying this fan because it's very cheap for the reviews it got, and the stock cooler for the CPU has had mixed reviews
 ===GPU===
-* 2x GTX 1080 Ti
+* 2x GTX 1080 Ti [https://www.newegg.com/Product/Product.aspx?Item=N82E16814487338&ignorebbr=1&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
-* Aspeed AST2400 with 32MB VRAM (comes with motherboard)
+* Integrated graphics on CPU: Intel HD Graphics 530
 ===RAM===
-*At least twice as much RAM as GPUs (2 * 2 * 11 GB [GTX 1080 Ti size] = 32 GB)
+*At least as much RAM as GPUs (2 * 11 GB [GTX 1080 Ti size] = 22 GB, so 32GB)
-*RAM: Crucial DDR4 RDIMM [http://www.newegg.com/Product/Product.aspx?Item=9SIA0ZX39C3002], 2133Mhz , Registered (buffered) and ECC, comes in packs of 4 x 32GB
+*Does not have to be fast for deep learning: "CPU-RAM-to-GPU-RAM is the true bottleneck – this step makes use of direct memory access (DMA). As quoted above, the memory bandwidth for my RAM modules are 51.2GB/s, but the DMA bandwidth is only 12GB/s!"[http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/]
+* Crucial - 32GB (2 x 16GB) DDR4-2133 Memory [https://www.newegg.com/Product/Product.aspx?Item=9SIA8PV5HF1514&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=], SATA 6 GB/s interface
+* If not enough, should be able to extend this by buying two more cards
 ===PSU===
-*Some say 1.5x-2x wattage of GPU+CPU, some say GPU+CPU+100W
+*Some say PSU should be 1.5x-2x wattage of system, some say wattage+100W
-*PSUs: Corsair RM Series 850 Watt ATX/EPS 80PLUS Gold-Certified Power Supply - CP-9020056-NA RM850 [https://www.amazon.com/gp/product/B00EB7UIXM/ref=oh_aui_detailpage_o03_s00?ie=UTF8&psc=1]
+*PSU: EVGA - SuperNOVA G2 1000W 80+ Gold Certified Fully-Modular ATX Power Supply [https://www.newegg.com/Product/Product.aspx?Item=N82E16817438010&ignorebbr=1&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
 ===Storage===
-*M.2 Drives: Samsung 950 PRO -Series 512GB PCIe NVMe - M.2 Internal SSD 2-Inch MZ-V5P512BW [https://www.amazon.com/gp/product/B01639694M/ref=oh_aui_detailpage_o03_s01?ie=UTF8&psc=1]
+*SSD should be fast enough, no need for M.2 [http://timdettmers.com/2015/03/09/deep-learning-hardware-guide]
-*Solid State Drives: Intel Solid-State Drive 750 Series SSDPEDMW400G4R5 PCI-Express 3.0 MLC - 400GB [https://www.amazon.com/gp/product/B00UHJJQAY/ref=oh_aui_detailpage_o07_s00?ie=UTF8&psc=1] or 800GB [https://www.amazon.com/gp/product/B013QP8XUE/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1]
+*SSD: Samsung - 850 EVO-Series 500GB 2.5" Solid State Drive [https://www.newegg.com/Product/Product.aspx?Item=N82E16820147373&ignorebbr=1&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
-*Regular Hard drives: WD Red 3TB NAS Hard Disk Drive [https://www.amazon.com/gp/product/B008JJLW4M/ref=oh_aui_detailpage_o00_s00?ie=UTF8&psc=1] - 5400 RPM Class SATA 6 Gb/s 64MB Cache 3.5 Inch
+*HDD: Seagate - Barracuda 3TB 3.5" 7200RPM Internal Hard Drive [https://www.newegg.com/Product/Product.aspx?Item=9SIADG25GT7889&nm_mc=AFC-C8Junction&cm_mmc=AFC-C8Junction-PCPartPicker,%20LLC-_-na-_-na-_-na&cm_sp=&AID=10446076&PID=3938566&SID=]
+===Other things to consider===
+* Water cooling? [http://timdettmers.com/2015/03/09/deep-learning-hardware-guide/ this] has a good section on cooling
+* Case is not rack mounted
+==Software tips==
+* Setting up Ubuntu and Docker [https://medium.com/@SocraticDatum/getting-started-with-gpu-driven-deep-learning-part-2-environment-setup-fd1947aab29]
