Parallelize msmf corr coeff.m

Jump to navigation Jump to search

Parallelize msmf corr coeff.m
Project logo 02.png
Project Information
Has title Parallelize msmf corr coeff.m
Has owner Wei Wu
Has start date 2018-07-09
Has deadline date
Has keywords Matlab, parallel computing, CPU
Has project status Active
Is dependent on Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code
Dependent(s): NOTS Computing for Matching Entrepreneurs to VCs
Has sponsor McNair Center
Copyright © 2019 All Rights Reserved.

This page documents all changes that Wei made to the Matlab code for Matching Entrepreneurs to VC. It also explains the decisions Wei made, and analyzes where further improvement might be achieved in the future.

Changes made to the code

To speed up the valuation of the objective function used by ga, Wei parallelized a couple of lines in msmf_corr_coeff.m.

Around line 80 of msmf_corr_coeff

Original code:

for m = 1 : M
   cholA = chol(exchsig(N1(m)-1, N2(m)-1, th));
   epsim12 = zeros(N1(m), N2(m), R);
   epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1: psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]);
   epsim12(2:end, 2:end, :) = epsimR12;          
   f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - reshape(epsim12, [N1(m)*N2(m) R])                   

For R=200, monte_M = 70 (hence large M), and mktsize = 30, each call to msmf_corr_coeff takes ~140 seconds. Computing epsimR12 is taking more than 50% time of msmf_corr_coeff.m. Each computation of epsimR12 is taking no more than 2 seconds, but for a large M it will take a long time in total. To improve this, we use a parfor to paralellize this part.
EpsimR12 new.png
New code:

epsim_cell = cell(M,1);  
parfor m = 1 : M
   cholA = chol(exchsig(N1(m)-1, N2(m)-1, th));
   epsim12 = zeros(N1(m), N2(m), R);
   epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1 : psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]);
   epsim12(2:end, 2:end, :) = epsimR12;
   epsim_cell{m} = reshape(epsim12, [N1(m)*N2(m) R]);                           
for m = 1 : M
    f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - epsim_cell{m};

To avoid data-racing, in each iteration m, we stored epsim12 into epsim_cell, and compute f using another for loop. Using parfor on a 12-cores machine gives a four time speed up for computing msmf_corr_coeff.m. Note that the actual speedup depends how many cores you are using. For the same R, monte_M, and mktsize, it now takes ~ 35 seconds to finish one call to msmf_corr_coeff.

Msmf corr coeff speedup.png

Around line 120 of msmf_corr_coeff solving LP problems

For a large R, e.g. R = 200, solving 200 linear programming problems in parallel should be beneficial, especially when the each LP itself is easy to solve (takes less than 0.1 seconds each). Initially we managed to use gurobi inside a parfor:

   % Gurobi inisde parfor. For large R (>100). By Wei.
    model.rhs = [full(constrB); full(constrM)];
    model.A = [sparse(B);sparse(-B)];
    model.ub =  ones(L,1); =  zeros(L,1);
    model.sense = '<';
    for r = 1 : R
        % solve_LP is a function wrapper. It calls gurobi to solve a single LP. 
        asgr(:,r) = solve_LP(r,f,model);
   % The function wrapper to use Gurobi
    function X = solve_LP(r,f,model)
       model.obj = f(:,r);    
       parameters.outputflag = 0;
       result = gurobi(model, parameters);
       X = result.x;

However using Gurobi actually did not help with the code performance. This is possibly due to the cost to create models for each Gurobi object. Gurobi is indeed the best commercial solver in the market, but it should be used for solving large LPs (hundreds of thousands of constraints), where as our Lp has only a few thousand constraints. We reversed back to using Matlab's native linprog:

   % Solve LPs in parallel with Matlab's native linprog
    parfor r = 1 : R
        options = optimset('Display', 'off');
        [asgr(:, r), ~, eflag(r)] = linprog(f(:, r), [B;-B], [constrB; constrM], [], [], zeros(L, 1), ones(L,1),options);

Possible Further Improvements

At this point I believe I have done everything I could for code performance. However if you look at the profiler, you will find that the function moments also takes a good percentage of time. I tried to parallelise moments with parfor, but it did not give any improvements.