Difference between revisions of "Wei Wu(Work log)"
==Notes from Ed== | ==Notes from Ed== | ||
− | + | Details about the installation, configuration, and basic use of software on the db server are on the [[Database Server Documentation]] page. | |
− | + | Please build and link to project pages that describe what you have done to date! | |
==Summer 2018== | ==Summer 2018== | ||
<onlyinclude> | <onlyinclude> | ||
− | [[Wei Wu]] [[Work Logs]] [[Wei Wu(Work log)|(log page)]] | + | [[Wei Wu]] [[Work Logs]] [[Wei Wu(Work log)|(log page)]] <br> |
− | 2018-06-11 Set up wiki page and RDP for work. Installed CUDA on dbserver. Waiting for Matlab and Gurobi to be | + | 2018-06-11 <br> |
− | installed on the dbserver (or I will do it myself later this week). Started looking at the paper | + | *Set up wiki page and RDP for work. Installed CUDA on dbserver. Waiting for Matlab and Gurobi to be |
+ | installed on the dbserver (or I will do it myself later this week). | ||
+ | *Started looking at the paper in progress [http://mcnair.bakerinstitute.org/wiki/Estimating_Unobserved_Complementarities_between_Entrepreneurs_and_Venture_Capitalists ''Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists''] and its related Matlab code. | ||
+ | *Began looking for ways to incorporate CUDA parallel computation with Matlab and Julia. | ||
− | 2018-06-12 Installed tightVNC on dbserver by following steps 1-3 from this [https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu- | + | 2018-06-12 <br> |
+ | *Installed tightVNC on dbserver by following steps 1-3 from this [https://www.digitalocean.com/community/tutorials/how-to-install-and-configure-vnc-on-ubuntu-14-04 tightVNC tutorial] | ||
− | Tested connection via localhost port | + | *Tested connection via localhost port. |
− | Matlab matching code with Ed, James, and Chenyu via Skype. | + | *Matlab matching code with Ed, James, and Chenyu via Skype. |
− | 2018-06-13 Continued searching for a method to set up vnc for dbserver without ssh. Started moving the selenium box, monitors, keyboards, etc, from Room 310. Matlab matching code with Ed, James, and Chenyu via Skype. | + | 2018-06-13 <br> |
+ | *Continued searching for a method to set up vnc for dbserver without ssh. | ||
+ | *Started moving the selenium box, monitors, keyboards, etc, from Room 310. | ||
+ | *Matlab matching code with Ed, James, and Chenyu via Skype. | ||
− | 2018-06-18 Trained on using PostgreSQL for DBServer. | + | 2018-06-18 <br> |
+ | Trained on using PostgreSQL for DBServer. | ||
− | 2018-06-19 Further training with SQL and SDC Platinum. Job assignment among team members. | + | 2018-06-19 <br> |
+ | Further training with SQL and SDC Platinum. Job assignment among team members. | ||
− | 2018-06-20 | + | 2018-06-20 <br> |
− | *Started reading a short tutorial on [https://ocw.mit.edu/courses/economics/14-385-nonlinear-econometric-analysis-fall-2007/lecture-notes/lec13_gmm.pdf | + | *Started reading a short tutorial on GMM and its implementation[https://www.kevinsheppard.com/images/5/55/Chapter6.pdf][https://ocw.mit.edu/courses/economics/14-385-nonlinear-econometric-analysis-fall-2007/lecture-notes/lec13_gmm.pdf]. Should have a good grasp before the end of the week. |
+ | *[http://www.gurobi.com/documentation/8.0/quickstart_windows.pdf Gurobi interface guide] | ||
+ | *It seems that Gurobi does not support GPGPU computation, as noted on [http://www.gurobi.com/pdfs/webinar-parallel-and-distributed-optimization-english.pdf page 36]; here is also [https://groups.google.com/forum/?fromgroups#!searchin/gurobi/gpu/gurobi/KTP6zDvodII/oPPQT4-mofMJ a slightly more elaborate communication] between the engineering director of Gurobi and the community regarding GPGPU computation support. Need to figure out how to do parallel computation in Matlab[https://www.mathworks.com/discovery/matlab-gpu.html][https://www.mathworks.com/help/distcomp/getting-started-with-parallel-computing-toolbox.html], and where we need it in the Startup-VC code. | ||
</onlyinclude> | </onlyinclude> | ||
+ | 2018-06-21 <br> | ||
+ | *Huge problem with the code in '''gmm_2stage_estimated.m'''. On line 80, we compute ''W'' by taking the inverse of the matrix ''Om''. We keep getting an ill-conditioned ''W'' whose entries are infinitely large. There might be a bug in the readjusted code. I will try to catch it by comparing with the original code. If I can't, I will try to set up another Skype call with Chenyu. | ||
+ | '''Update''': This might be related to the bug reported in the [http://mcnair.bakerinstitute.org/wiki/Estimating_Unobserved_Complementarities_between_Entrepreneurs_and_Venture_Capitalists_Matlab_Code#Bugs Matlab Code page]. I also don't think the fix was correct. I will look into that. | ||
+ | |||
+ | 2018-06-22 <br> | ||
+ | *Want to test Matlab and its parallel computing toolbox on DBServer. Cannot use the Matlab GUI remotely. This is possibly due to the environment variable setting for remote access. '''Update''': now we have Matlab GUI. Nvidia CUDA is configured correctly as well. Today is a good day for a Linux user. | ||
+ | *Probably it's the right time to further configure the VNC server on DBServer. [https://www.tightvnc.com/vncserver.1.php Documentation for TightVNC configuration]. '''Done'''. Documented in the [http://mcnair.bakerinstitute.org/wiki/Database_Server_Documentation Database Server Documentation] | ||
+ | |||
+ | 2018-06-25/26 <br> | ||
+ | Looking at quick tutorials for C/C++ and CUDA, in case I need to read CUDA code in the future. | ||
[[Category:Work Log]] | [[Category:Work Log]] | ||
+ | |||
+ | 2018-06-27 <br> | ||
+ | Sick. Working from home. | ||
+ | |||
+ | 2018-06-28 <br> | ||
+ | Emailed Chenyu about the bug in '''gmm_2stage_estimated.m'''. | ||
+ | |||
+ | 2018-06-29 <br> | ||
+ | Set up a meeting with Jeremy to talk about the Matlab code and paper. | ||
+ | |||
+ | 2018-07-02 <br> | ||
+ | *Meeting with Jeremy. | ||
+ | *Here is a [http://hpc.fs.uni-lj.si/sites/default/files/HPC_for_dummies.pdf gentle introduction to HPC] for myself (and whoever might benefit from it), even though this was published in 2009 and by AMD (hence it is less likely to have recent information on GPU computing). | ||
+ | *This recent paper [https://www.researchgate.net/publication/308967833_GPU_Computing_Applied_to_Linear_and_Mixed_Integer_Programming GPU Computing Applied to Linear and Mixed Integer Programming] should be helpful. It summarizes recent advancement in GPU computing in the OR community. | ||
+ | |||
+ | 2018-07-03 <br> | ||
+ | Finally learned from Chenyu that the "bug" reported on June 21 was not a bug at all. We are getting singular matrices because we are using too small R and monte_M. | ||
+ | |||
+ | 2018-07-05 <br> | ||
+ | *There are still problems with W being singular. I have changed R and monte_M to be as big as in the original code. Either I neglected something, or there is still a bug. Or perhaps it's just normal. See [[File:a copy of warning messages from Matlab command windows.pdf]]. When W is singular, the fitness function for the second-stage ga is minimized to negative infinity. | ||
+ | *I am trying to put Gurobi into a parfor in Matlab. So far, not good. I wanted to figure out how to do CPU-based parallel computing with Gurobi, but I cannot find a way to run Gurobi solvers inside a parfor. I believe Matlab's linprog can run there, but linprog is much slower than Gurobi. There will be some trade-off. I need to test this. | ||
+ | |||
+ | 2018-07-06 | ||
+ | I really need to understand the code better. Also, we can probably run Gurobi inside parfor, but I need to wrap it inside a function. | ||
+ | |||
+ | 2018-07-09 | ||
+ | I have run profiling on the Matlab code several times. It seems that moments.m takes up as much time as calling Gurobi to solve LPs. Probably we should optimize moments.m instead. | ||
+ | |||
+ | [[File:profiling.png]] | ||
+ | |||
+ | 2018-07-10 | ||
+ | Ran profiling again. With big enough R, the parallel code is much faster. Documented in the project [[Matlab, CUDA, and GPU Computing]]<br> | ||
+ | |||
+ | 2018-07-11 | ||
+ | '''Croatia 2-1 England!!!!!!!!!!!!!!!!!!!!!''' | ||
+ | |||
+ | 2018-07-12 | ||
+ | *Helped Minh install Tensorflow on DB Server. | ||
+ | *Learned to use NOTS. | ||
+ | |||
+ | 2018-07-13 | ||
+ | *Further parallelized the Matlab code (msmf_corr_coeff.m). Now on our 12-core server, one call to msmf_corr_coeff takes about 35 seconds for R=200, monte_M = 70, mktsize = 30. | ||
+ | *Will try to parallelize moments.m. Currently it takes 10 seconds per call. This is included in the 35-second runtime of msmf_corr_coeff.m. | ||
+ | *Reverted to using Matlab's linprog rather than Gurobi. In a parfor, Gurobi takes much longer than the native linprog to solve our LPs. I do not fully understand why this is happening. It might be that calling Gurobi inside a function increases the overhead, or it might be that Gurobi couldn't utilize the full power of our CPU since all 12 cores have been scheduled to work on 12 different LPs (some of Gurobi's LP algorithms are themselves parallel). Note that creating a model for Gurobi takes time. | ||
+ | [[FILE:msmf35seconds.png]] | ||
+ | <br> | ||
+ | 2018-07-16 | ||
+ | *Ran monte_data mode with R=200, monte_M = 70, mktsize = 30. The msmf was computed fairly fast. | ||
+ | [[FILE:msmf_monte_data.png]] | ||
+ | *Ran data mode with R=200, monte_M=70, mktsize = 30. | ||
+ | [[FILE:msmf_data.png]] | ||
+ | <br> | ||
+ | 2018-07-18 | ||
+ | *Ran monte mode with R=200, monte_M = 200, mktsize = 30. | ||
+ | *Helped Maxine with industrial classifier. | ||
+ | *Worked on documentation for NOTS and parallelization. | ||
+ | *Worked on running Matlab code on NOTS. | ||
+ | |||
+ | 2018-07-19 ~ 30 | ||
+ | Ran diagnostics requested by Chenyu | ||
+ | |||
+ | 2018-08 | ||
+ | Helped Marcus get familiar with the Matlab code for matching VCs to startups. |
Latest revision as of 17:08, 3 August 2018
Notes from Ed
Details about the installation, configuration, and basic use of software on the db server are on the Database Server Documentation page.
Please build and link to project pages that describe what you have done to date!
Summer 2018
2018-06-11
- Set up wiki page and RDP for work. Installed CUDA on dbserver. Waiting for Matlab and Gurobi to be installed on the dbserver (or I will do it myself later this week).
- Started looking at the paper in progress Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists and its related Matlab code.
- Began looking for ways to incorporate CUDA parallel computation with Matlab and Julia.
2018-06-12
- Installed tightVNC on dbserver by following steps 1-3 from this tightVNC tutorial
- Tested connection via localhost port.
- Matlab matching code with Ed, James, and Chenyu via Skype.
2018-06-13
- Continued searching for a method to set up vnc for dbserver without ssh.
- Started moving the selenium box, monitors, keyboards, etc, from Room 310.
- Matlab matching code with Ed, James, and Chenyu via Skype.
2018-06-18
Trained on using PostgreSQL for DBServer.
2018-06-19
Further training with SQL and SDC Platinum. Job assignment among team members.
2018-06-20
- Started reading a short tutorial on GMM and its implementation[1][2]. Should have a good grasp before the end of the week.
- Gurobi interface guide
- It seems that Gurobi does not support GPGPU computation, as noted on page 36; here is also a slightly more elaborate communication between the engineering director of Gurobi and the community regarding GPGPU computation support. Need to figure out how to do parallel computation in Matlab[3][4], and where we need it in the Startup-VC code.
2018-06-21
- Huge problem with the code in gmm_2stage_estimated.m. On line 80, we compute W by taking the inverse of the matrix Om. We keep getting an ill-conditioned W whose entries are infinitely large. There might be a bug in the readjusted code. I will try to catch it by comparing with the original code. If I can't, I will try to set up another Skype call with Chenyu.
Update: This might be related to the bug reported in the Matlab Code page. I also don't think the fix was correct. I will look into that.
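A minimal sketch of how the inversion step could be guarded, assuming Om is a square moment covariance matrix. The stand-in data, the matrix sizes, the tolerance, and the fallback to pinv are all illustrative assumptions and are not taken from gmm_2stage_estimated.m.

```matlab
% Illustrative stand-in for Om: a sample covariance built from fewer draws
% than moments, which is rank deficient by construction.
rng(0);                          % reproducible draws
G  = randn(5, 20);               % 5 draws of 20 moments (hypothetical sizes)
Om = cov(G);                     % 20x20 matrix with rank at most 4

rc = rcond(Om);                  % reciprocal condition estimate; near 0 means ill-conditioned
if rc < 1e-12                    % tolerance is an assumption, tune as needed
    warning('Om is ill-conditioned (rcond = %g); using pinv instead of inv.', rc);
    W = pinv(Om);                % Moore-Penrose pseudoinverse keeps entries finite
else
    W = Om \ eye(size(Om));      % usually preferred over inv(Om)
end
disp(all(isfinite(W(:))));       % sanity check on the weighting matrix
```

Checking rcond before inverting makes it easier to tell a genuine bug apart from a covariance matrix that is singular simply because too few draws were used.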
2018-06-22
- Want to test Matlab and its parallel computing toolbox on DBServer. Cannot use the Matlab GUI remotely. This is possibly due to the environment variable setting for remote access. Update: now we have Matlab GUI. Nvidia CUDA is configured correctly as well. Today is a good day for a Linux user.
- Probably it's the right time to further configure the VNC server on DBServer. Documentation for TightVNC configuration. Done. Documented in the Database Server Documentation
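One quick way to confirm from inside Matlab that the CUDA setup is usable is sketched below. It assumes the Parallel Computing Toolbox is installed; the matrix size is arbitrary and chosen only for illustration.

```matlab
% Check that Matlab can see a CUDA device (requires Parallel Computing Toolbox).
if gpuDeviceCount > 0
    d = gpuDevice;                   % selects and describes the default GPU
    fprintf('GPU: %s, compute capability %s\n', d.Name, d.ComputeCapability);
    A = gpuArray(rand(1000));        % copy data to the GPU
    B = A * A';                      % matrix product runs on the GPU
    C = gather(B);                   % copy the result back to host memory
    disp(size(C));
else
    disp('No CUDA-capable GPU visible to Matlab.');
end
```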
2018-06-25/26
Looking at quick tutorials for C/C++ and CUDA, in case I need to read CUDA code in the future.
2018-06-27
Sick. Working from home.
2018-06-28
Emailed Chenyu about the bug in gmm_2stage_estimated.m.
2018-06-29
Set up a meeting with Jeremy to talk about the Matlab code and paper.
2018-07-02
- Meeting with Jeremy.
- Here is a gentle introduction to HPC for myself (and whoever might benefit from it), even though this was published in 2009 and by AMD (hence it is less likely to have recent information on GPU computing).
- This recent paper GPU Computing Applied to Linear and Mixed Integer Programming should be helpful. It summarizes recent advancement in GPU computing in the OR community.
2018-07-03
Finally learned from Chenyu that the "bug" reported on June 21 was not a bug at all. We are getting singular matrices because we are using too small R and monte_M.
2018-07-05
- There are still problems with W being singular. I have changed R and monte_M to be as big as in the original code. Either I neglected something, or there is still a bug. Or perhaps it's just normal. See File:A copy of warning messages from Matlab command windows.pdf. When W is singular, the fitness function for the second-stage ga is minimized to negative infinity.
- I am trying to put Gurobi into a parfor in Matlab. So far, not good. I wanted to figure out how to do CPU-based parallel computing with Gurobi, but I cannot find a way to run Gurobi solvers inside a parfor. I believe Matlab's linprog can run there, but linprog is much slower than Gurobi. There will be some trade-off. I need to test this.
2018-07-06
I really need to understand the code better. Also, we can probably run Gurobi inside parfor, but I need to wrap it inside a function.
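The wrapping idea might look roughly like the sketch below, which assumes Gurobi's Matlab interface is on the path and a Matlab release that allows local functions in scripts (R2016b or later). The function name solve_lp_gurobi, the toy LP data, and the choice of one Gurobi thread per worker are hypothetical and not taken from the project code.

```matlab
% Script part: solve K independent LPs  min c'x  s.t.  A*x <= b, x >= 0.
K = 12;                                  % hypothetical number of LPs
c = [-1; -2];                            % toy objective
A = [1 1; 1 3];                          % toy constraint matrix
X = zeros(2, K);
parfor k = 1:K
    b = [4; 6] + k;                      % only the right-hand side varies here
    X(:, k) = solve_lp_gurobi(c, A, b);  % all Gurobi structs live inside the function
end

function x = solve_lp_gurobi(c, A, b)
    % Build the model struct expected by Gurobi's Matlab interface.
    model.obj        = c;
    model.A          = sparse(A);        % Gurobi requires a sparse constraint matrix
    model.rhs        = b;
    model.sense      = '<';
    model.modelsense = 'min';
    model.lb         = zeros(numel(c), 1);
    params.OutputFlag = 0;               % silence the solver log
    params.Threads    = 1;               % one thread per worker (assumption)
    result = gurobi(model, params);
    if ~strcmp(result.status, 'OPTIMAL')
        error('LP not solved to optimality: %s', result.status);
    end
    x = result.x;
end
```

Note that the model struct is rebuilt on every call, so model construction time is paid once per LP.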
2018-07-09
I have run profiling on the Matlab code several times. It seems that moments.m takes up as much time as calling Gurobi to solve LPs. We should probably optimize moments.m instead.
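A profiling run of this kind could be scripted roughly as follows. The function toy_workload is a hypothetical stand-in for the real estimation call, and the script again assumes a Matlab release that allows local functions in scripts.

```matlab
profile on                           % start collecting timing data
toy_workload();                      % hypothetical stand-in for the real entry point
profile off
stats = profile('info');             % struct with a FunctionTable of per-function timings

% List the five functions with the largest total time.
[~, idx] = sort([stats.FunctionTable.TotalTime], 'descend');
top = stats.FunctionTable(idx(1:min(5, numel(idx))));
for i = 1:numel(top)
    fprintf('%-40s %8.2f s\n', top(i).FunctionName, top(i).TotalTime);
end
profile viewer                       % open the interactive report

function toy_workload()
    % Small placeholder computation so the profiler has something to record.
    A = rand(300);
    for i = 1:20
        A = A * A' / norm(A, 'fro');
    end
end
```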
2018-07-10
Ran profiling again. With big enough R, the parallel code is much faster. Documented in the project Matlab, CUDA, and GPU Computing
2018-07-11 Croatia 2-1 England!!!!!!!!!!!!!!!!!!!!!
2018-07-12
- Helped Minh install Tensorflow on DB Server.
- Learned to use NOTS.
2018-07-13
- Further parallelized the Matlab code (msmf_corr_coeff.m). Now on our 12-core server, one call to msmf_corr_coeff takes about 35 seconds for R=200, monte_M = 70, mktsize = 30.
- Will try to parallelize moments.m. Currently it takes 10 seconds per call. This is included in the 35-second runtime of msmf_corr_coeff.m.
- Reverted to using Matlab's linprog rather than Gurobi. In a parfor, Gurobi takes much longer than the native linprog to solve our LPs. I do not fully understand why this is happening. It might be that calling Gurobi inside a function increases the overhead, or it might be that Gurobi couldn't utilize the full power of our CPU since all 12 cores have been scheduled to work on 12 different LPs (some of Gurobi's LP algorithms are themselves parallel). Note that creating a model for Gurobi takes time.
2018-07-16
- Ran monte_data mode with R=200, monte_M = 70, mktsize = 30. The msmf was computed fairly fast.
- Ran data mode with R=200, monte_M=70, mktsize = 30.
2018-07-18
- Ran monte mode with R=200, monte_M = 200, mktsize = 30.
- Helped Maxine with industrial classifier.
- Worked on documentation for NOTS and parallelization.
- Worked on running Matlab code on NOTS.
2018-07-19 ~ 30
Ran diagnostics requested by Chenyu.
2018-08
Helped Marcus get familiar with the Matlab code for matching VCs to startups.