Latest revision as of 12:34, 21 September 2020

This page documents all changes that Wei made to the Matlab code for Matching Entrepreneurs to VC. It also explains the decisions Wei made, and analyzes where further improvement might be achieved in the future.

Changes made to the code

Project
Parallelize msmf corr coeff.m
Project Information
Has title	Parallelize msmf corr coeff.m
Has owner	Wei Wu
Has start date	2018-07-09
Has deadline date
Has keywords	Matlab, parallel computing, CPU
Has project status	Active
Is dependent on	Estimating Unobserved Complementarities between Entrepreneurs and Venture Capitalists Matlab Code
Dependent(s):	NOTS Computing for Matching Entrepreneurs to VCs
Has sponsor	McNair Center
	Copyright © 2019 edegan.com. All Rights Reserved.

To speed up the valuation of the objective function used by ga, Wei parallelized a couple of lines in msmf_corr_coeff.m.

Around line 80 of msmf_corr_coeff

Original code:

for m = 1 : M
   cholA = chol(exchsig(N1(m)-1, N2(m)-1, th));
   epsim12 = zeros(N1(m), N2(m), R);
   epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1: psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]);
   epsim12(2:end, 2:end, :) = epsimR12;          
   f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - reshape(epsim12, [N1(m)*N2(m) R])                   
end

For R=200, monte_M = 70 (hence large M), and mktsize = 30, each call to msmf_corr_coeff takes ~140 seconds. Computing epsimR12 is taking more than 50% time of msmf_corr_coeff.m. Each computation of epsimR12 is taking no more than 2 seconds, but for a large M it will take a long time in total. To improve this, we use a parfor to paralellize this part.

New code:

epsim_cell = cell(M,1);  
parfor m = 1 : M
   cholA = chol(exchsig(N1(m)-1, N2(m)-1, th));
   epsim12 = zeros(N1(m), N2(m), R);
   epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1 : psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]);
   epsim12(2:end, 2:end, :) = epsimR12;
   epsim_cell{m} = reshape(epsim12, [N1(m)*N2(m) R]);                           
end   
for m = 1 : M
    f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - epsim_cell{m};
end

To avoid data-racing, in each iteration m, we stored epsim12 into epsim_cell, and compute f using another for loop. Using parfor on a 12-cores machine gives a four time speed up for computing msmf_corr_coeff.m. Note that the actual speedup depends how many cores you are using. For the same R, monte_M, and mktsize, it now takes ~ 35 seconds to finish one call to msmf_corr_coeff.

Around line 120 of msmf_corr_coeff solving LP problems

For a large R, e.g. R = 200, solving 200 linear programming problems in parallel should be beneficial, especially when the each LP itself is easy to solve (takes less than 0.1 seconds each). Initially we managed to use gurobi inside a parfor:

   % Gurobi inisde parfor. For large R (>100). By Wei.
    model.rhs = [full(constrB); full(constrM)];
    model.A = [sparse(B);sparse(-B)];
    model.ub =  ones(L,1);
    model.lb =  zeros(L,1);
    model.sense = '<';
    for r = 1 : R
        % solve_LP is a function wrapper. It calls gurobi to solve a single LP. 
        asgr(:,r) = solve_LP(r,f,model);
    end

   % The function wrapper to use Gurobi
    function X = solve_LP(r,f,model)
       model.obj = f(:,r);    
       parameters.outputflag = 0;
       result = gurobi(model, parameters);
       X = result.x;
    end

However using Gurobi actually did not help with the code performance. This is possibly due to the cost to create models for each Gurobi object. Gurobi is indeed the best commercial solver in the market, but it should be used for solving large LPs (hundreds of thousands of constraints), where as our Lp has only a few thousand constraints. We reversed back to using Matlab's native linprog:

   % Solve LPs in parallel with Matlab's native linprog
    parfor r = 1 : R
        options = optimset('Display', 'off');
        [asgr(:, r), ~, eflag(r)] = linprog(f(:, r), [B;-B], [constrB; constrM], [], [], zeros(L, 1), ones(L,1),options);
    end

Possible Further Improvements

At this point I believe I have done everything I could for code performance. However if you look at the profiler, you will find that the function moments also takes a good percentage of time. I tried to parallelise moments with parfor, but it did not give any improvements.

Difference between revisions of "Parallelize msmf corr coeff.m"

Latest revision as of 12:34, 21 September 2020

Contents

Changes made to the code

Around line 80 of msmf_corr_coeff

Around line 120 of msmf_corr_coeff solving LP problems

Possible Further Improvements

Navigation menu

Personal tools

Namespaces

Variants

Views

More

Search

Navigation

Sites

Sections

Organizations

Help

Tools

@@ Line 1: / Line 1: @@
-{{McNair Projects
+{{Project
+|Has project output=
+|Has sponsor=McNair Center
 |Has title=Parallelize msmf corr coeff.m
 |Has owner=Wei Wu,
@@ Line 9: / Line 11: @@
 }}
-This page documents all changes that Wei made to the Matlab code for Matching Entrepreneurs to VC. It also explains the decisions Wei made, and where further improvement might be achieved in the future.
+This page documents all changes that Wei made to the Matlab code for Matching Entrepreneurs to VC. It also explains the decisions Wei made, and analyzes where further improvement might be achieved in the future.
 =Changes made to the code=
@@ Line 25: / Line 27: @@
   end
-For large R, monte_M (hence large M), and mktsize, computing epsimR12 is taking more than 50% time of msmf_corr_coeff.m.
+For R=200, monte_M = 70 (hence large M), and mktsize = 30, each call to msmf_corr_coeff takes ~140 seconds. Computing epsimR12 is taking more than 50% time of msmf_corr_coeff.m. Each computation of epsimR12 is taking no more than 2 seconds, but for a large M it will take a long time in total. To improve this, we use a parfor to paralellize this part.
-[[FILE:epsimR12.png]]
+<br>
+[[FILE:epsimR12_new.png]]
+<br>
+New code:
+ epsim_cell = cell(M,1);
+  parfor m = 1 : M
+    cholA = chol(exchsig(N1(m)-1, N2(m)-1, th));
+    epsim12 = zeros(N1(m), N2(m), R);
+    epsimR12 = reshape(mtimesx(repmat(cholA', [1 1 R]), reshape(epsimR(psa(m)+1 : psa(m)+(N1(m)-1)*(N2(m)-1), :, :), [(N1(m)-1)*(N2(m)-1) 1 R])), [N1(m)-1 N2(m)-1 R]);
+    epsim12(2:end, 2:end, :) = epsimR12;
+    epsim_cell{m} = reshape(epsim12, [N1(m)*N2(m) R]);
+ end
+ for m = 1 : M
+     f(psa(m)+1 : psa(m+1), :) = f(psa(m)+1 : psa(m+1), :) - epsim_cell{m};
+ end
+To avoid data-racing, in each iteration m, we stored epsim12 into epsim_cell, and compute f using another for loop. Using parfor on a 12-cores machine gives a four time speed up for computing msmf_corr_coeff.m. Note that the actual speedup depends how many cores you are using. For the same R, monte_M, and mktsize, it now takes ~ 35 seconds to finish one call to msmf_corr_coeff.
+[[FILE:msmf_corr_coeff_speedup.png]]
+==Around line 120 of msmf_corr_coeff solving LP problems==
+For a large R, e.g. R = 200, solving 200 linear programming problems in parallel should be beneficial, especially when the each LP itself is easy to solve (takes less than 0.1 seconds each). Initially we managed to use gurobi inside a parfor:
+    % Gurobi inisde parfor. For large R (>100). By Wei.
+     model.rhs = [full(constrB); full(constrM)];
+     model.A = [sparse(B);sparse(-B)];
+     model.ub =  ones(L,1);
+     model.lb =  zeros(L,1);
+     model.sense = '<';
+     for r = 1 : R
+         % solve_LP is a function wrapper. It calls gurobi to solve a single LP.
+         asgr(:,r) = solve_LP(r,f,model);
+     end
+    % The function wrapper to use Gurobi
+     function X = solve_LP(r,f,model)
+        model.obj = f(:,r);
+        parameters.outputflag = 0;
+        result = gurobi(model, parameters);
+        X = result.x;
+     end
+However using Gurobi actually did not help with the code performance. This is possibly due to the cost to create models for each Gurobi object. Gurobi is indeed the best commercial solver in the market, but it should be used for solving large LPs (hundreds of thousands of constraints), where as our Lp has only a few thousand constraints. We reversed back to using Matlab's native linprog:
+    % Solve LPs in parallel with Matlab's native linprog
+     parfor r = 1 : R
+         options = optimset('Display', 'off');
+         [asgr(:, r), ~, eflag(r)] = linprog(f(:, r), [B;-B], [constrB; constrM], [], [], zeros(L, 1), ones(L,1),options);
+     end
+=Possible Further Improvements=
+At this point I believe I have done everything I could for code performance. However if you look at the profiler, you will find that the function moments also takes a good percentage of time. I tried to parallelise moments with parfor, but it did not give any improvements.