Hi Anders,
Thank you for your swift response. I've made progress and managed to compile the C code and run the MATLAB code. Here's a detailed rundown of what I did:
### System & Software Configuration:
- **MATLAB Version**: 2021b
- **CUDA Version**: 11.7.0
- **NVIDIA Driver Version**: 515.43.04
- **GCC Version**: 10.3.0
### Compilation of MCmatlab.c:
```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/MCmatlab.c -lut
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -std=c11 -Wall' -outdir +MCmatlab/@model/private -output MCmatlab_singlethreaded ./+MCmatlab/src/MCmatlab.c -lut
```
### Compilation of finiteElementHeatPropagator.c:
```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/finiteElementHeatPropagator.c -lut
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -std=c11 -Wall' -outdir +MCmatlab/@model/private -output finiteElementHeatPropagator_singlethreaded ./+MCmatlab/src/finiteElementHeatPropagator.c -lut
```
### Compilation of getDownsampledParamVals.c:
```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/getDownsampledParamVals.c
```
### Preparation for CUDA Compilation:
```
cp ./+MCmatlab/src/MCmatlab.c ./+MCmatlab/src/MCmatlab_CUDA.cu
cp ./+MCmatlab/src/finiteElementHeatPropagator.c ./+MCmatlab/src/finiteElementHeatPropagator_CUDA.cu
```
### Compilation Using mexcuda:
Initially, using mexcuda directly in the command line did not work. So, I opened MATLAB via the command line and compiled with:
```
mexcuda COMPFLAGS='-use_fast_math -res-usage $COMPFLAGS' -outdir +MCmatlab/@model/private ./+MCmatlab/src/MCmatlab_CUDA.cu
mexcuda COMPFLAGS='-use_fast_math -res-usage $COMPFLAGS' -outdir +MCmatlab/@model/private ./+MCmatlab/src/finiteElementHeatPropagator_CUDA.cu
```
Note: I had to remove the -llibut flag and commented out the utIsInterruptPending function in the .cu files. Since it's used to catch ctrl+c during MEX execution, I believe it's not crucial.
### Performance Testing:
#### Without GPU:
```
function fluence = main_mc()
%% Geometry definition
MCmatlab.closeMCmatlabFigures();
model = MCmatlab.model;
model.G.nx = 200; % Number of bins in the x direction
model.G.ny = 200; % Number of bins in the y direction
model.G.nz = 30; % Number of bins in the z direction %10
model.G.Lx = 1; % [cm] x size of simulation cuboid
model.G.Ly = 1; % [cm] y size of simulation cuboid
model.G.Lz = 0.15; % [cm] z size of simulation cuboid %0.1
model.G.mediaPropertiesFunc = @mediaPropertiesFunc; % Media properties defined as a function at the end of this file
model.G.geomFunc = @geometryDefinition; % Function to use for defining the distribution of media in the cuboid. Defined at the end of this m file.
%model = plot(model,'G');
% X = model.G.x;
% Y = model.G.y;
% Z = model.G.z;
%% Top-hat focal plane, Gaussian angular intensity distribution
model.MC.nPhotonsRequested = 4*5e6; % # of photons to launch
% model.MC.nExamplePaths = 20;
model.MC.matchedInterfaces = true; % Assumes all refractive indices are the same
model.MC.boundaryType = 2; % 0: No escaping boundaries, 1: All cuboid boundaries are escaping, 2: Top cuboid boundary only is escaping, 3: Top and bottom boundaries are escaping, while the side boundaries are cyclic
w = 400%:1:700;
for i=1%:3%:length(w)
model.MC.wavelength = w(i);
model.MC.lightSource.xFocus = 0; % [cm] x position of focus
model.MC.lightSource.yFocus = 0; % [cm] y position of focus
model.MC.lightSource.zFocus = 0; % [cm] z position of focus
model.MC.lightSource.sourceType = 4; % 0: Pencil beam, 1: Isotropically emitting line or point source, 2: Infinite plane wave, 3: Laguerre-Gaussian LG01 beam, 4: Radial-factorizable beam (e.g., a Gaussian beam), 5: X/Y factorizable beam (e.g., a rectangular LED emitter)
model.MC.lightSource.focalPlaneIntensityDistribution.radialDistr = 0; % Radial focal plane intensity distribution - 0: Top-hat, 1: Gaussian, Array: Custom. Doesn't need to be normalized.
model.MC.lightSource.focalPlaneIntensityDistribution.radialWidth = 1.2; % [cm] Radial focal plane 1/e^2 radius if top-hat or Gaussian or half-width of the full distribution if custom
model.MC.lightSource.angularIntensityDistribution.radialDistr = 2; % Radial angular intensity distribution - 0: Top-hat, 1: Gaussian, 2: Cosine (Lambertian), Array: Custom. Doesn't need to be normalized.
model.MC.lightSource.angularIntensityDistribution.radialWidth = 0; %pi/6; % [rad] Radial angular 1/e^2 half-angle if top-hat or Gaussian or half-angle of the full distribution if custom. For a diffraction limited Gaussian beam, this should be set to model.MC.wavelength*1e-9/(pi*model.MC.lightSource.focalPlaneIntensityDistribution.radialWidth*1e-2))
model.MC.lightSource.theta = 0; % [rad] Polar angle of beam center axis = 0; %pi/2; % [rad] Azimuthal angle of light source center axis
% Collector
model.MC.useLightCollector = true;
model.MC.lightCollector.x = 0; %0; % [cm] x position of either the center of the objective lens focal plane or the fiber tip
model.MC.lightCollector.y = 0; %-0.005; % [cm] y position
model.MC.lightCollector.z = 0; % [cm] z position
model.MC.lightCollector.theta = 0; % [rad] Polar angle of direction the light collector is facing
model.MC.lightCollector.phi = 0; %pi/2; % [rad] Azimuthal angle of direction the light collector is facing
model.MC.lightCollector.f = 0.0001; % [cm] Focal length of the objective lens (if light collector is a fiber, set this to Inf).
model.MC.lightCollector.diam = sqrt(model.G.Lx^2 + model.G.Ly^2); % [cm] Diameter of the light collector aperture. For an ideal thin lens, this is 2*f*tan(asin(NA)).
model.MC.lightCollector.fieldSize = sqrt(model.G.Lx^2 + model.G.Ly^2); % [cm] Field Size of the imaging system (diameter of area in object plane that gets imaged). Only used for finite f.
model.MC.lightCollector.NA = 0.22; % [-] Fiber NA. Only used for infinite f.
model.MC.lightCollector.res = 200; % X and Y resolution of light collector in pixels, only used for finite f
model.MC.depositionCriteria.onlyCollected = true;
%CPU
model.MC.useGPU = false; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
% model = runMonteCarlo(model);
%GPU
%model.MC.useGPU = true; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
%model.MC.GPUdevice = 0; % (Default: 0, the first GPU) The index of the GPU device to use for the simulation
tic
model = runMonteCarlo(model);
toc
fluencee = model.MC.normalizedFluenceRate;
fluence {i,1} = fluencee(:,:,4);
end
end
%save spatially_varying_epidermis_16_voxs_CUDA.mat spatially_varying_epidermis_16_voxs_CUDA
%% Geometry function(s) (see readme for details)
function M = geometryDefinition(X,Y,Z,parameters)
air_thick = 0.0175;
dermis_thick = 0.0375;
rng(0,'twister');
a = 0;
b = 100;
r = (b-a).*rand(16,1) + a;
r = r/10000;
Out_Epth = reshape(r, [4,4]);
media(1,:) = 2:5;
media(2,:) = 6:9;
media(3,:) = 10:13;
media(4,:) = 14:17;
x_res = 0.25;
y_res = 0.25;
M = ones(size(X)); % Air
for m=1:4
for n=1:4
M((X>=(-0.5+(m-1)*x_res))&((-0.5+m*x_res)>X) & (Y>=-0.5+(n-1)*y_res)&((-0.5+n*y_res)>Y) & (Z>=air_thick)&((air_thick + Out_Epth(m,n))>Z)) = media(m,n);
M((X>=(-0.5+(m-1)*x_res))&((-0.5+m*x_res)>X) & (Y>=-0.5+(n-1)*y_res)&((-0.5+n*y_res)>Y) & (Z>=(air_thick + Out_Epth(m,n)))&((air_thick + Out_Epth(m,n)+dermis_thick)>Z)) = 18; %dermis
M(Z > (air_thick + Out_Epth(m,n)+dermis_thick)) = 19; %fat
end
end
end
%% Media Properties function (see readme for details)
function mediaProperties = mediaPropertiesFunc(parameters)
disp("Inside")
mediaProperties = MCmatlab.mediumProperties;
j=1;
mediaProperties(j).name = 'air';
mediaProperties(j).mua = 1e-8; % Absorption coefficient [cm^-1]
mediaProperties(j).mus = 1e-8; % Scattering coefficient [cm^-1]
mediaProperties(j).g = 1; % Henyey-Greenstein scattering anisotropy
mel = 0.01:0.005:0.1;
hem = 0;
water = 0.75;
s = 0.75;
fat = 0;
for j=2:17
mediaProperties(j).name = ['vox', num2str(j-1)];
mediaProperties(j).mua = @(wavelength) calc_mua(wavelength, s,hem,water,fat,mel(j-1));
mediaProperties(j).mus = @(wavelength) calc_mus(wavelength, 40,0,1,0.9);
mediaProperties(j).g = 0.9;
mediaProperties(j).VHC = 3391*1.109e-3; % [J cm^-3 K^-1]
mediaProperties(j).TC = 0.37e-2; % [W cm^-1 K^-1]
end
j=18;
mediaProperties(j).name = 'dermis';
mediaProperties(j).mua = @func_mua18;
function mua = func_mua18(wavelength)
B = 0.002; % Blood content
S = 0.67; % Blood oxygen saturation
W = 0.65; % Water content
M = 0; % Melanin content
F = 0; % Fat content
mua = calc_mua(wavelength,S,B,W,F,M); % Jacques "Optical properties of biological tissues: a review" eq. 12
end
mediaProperties(j).mus = @func_mus18;
function mus = func_mus18(wavelength)
aPrime = 42.4; % musPrime at 500 nm
fRay = 0.62; % Fraction of scattering due to Rayleigh scattering
bMie = 1; % Scattering power for Mie scattering
g = 0.9; % Scattering anisotropy
mus = calc_mus(wavelength,aPrime,fRay,bMie,g); % Jacques "Optical properties of biological tissues: a review" eq. 2
end
mediaProperties(j).g = 0.9;
mediaProperties(j).VHC = 3391*1.109e-3; % [J cm^-3 K^-1]
mediaProperties(j).TC = 0.37e-2; % [W cm^-1 K^-1]
j=19;
mediaProperties(j).name = 'fat';
mediaProperties(j).mua = @func_mua19;
function mua = func_mua19(wavelength)
B = 0; % Blood content
S = 0; % Blood oxygen saturation
W = 0.05; % Water content
M = 0; % Melanin content
F = 0.95; % Fat content
mua = calc_mua(wavelength,S,B,W,F,M); % Jacques "Optical properties of biological tissues: a review" eq. 12
end
mediaProperties(j).mus = @func_mus19;
function mus = func_mus19(wavelength)
aPrime = 40; % musPrime at 500 nm
fRay = 0.75; % Fraction of scattering due to Rayleigh scattering
bMie = 1; % Scattering power for Mie scattering
g = 0.9; % Scattering anisotropy
mus = calc_mus(wavelength,aPrime,fRay,bMie,g); % Jacques "Optical properties of biological tissues: a review" eq. 2
end
mediaProperties(j).g = 0.9;
end
function mus = calc_mus(wavelength,aPrime,fRay,bMie,g)
fMie = 1 - fRay;
musPrime= aPrime*(fRay*(wavelength/500)^(-4) + fMie*(wavelength/500)^(-bMie)); % Jacques "Optical properties of biological tissues: a review" eq. 2
mus = musPrime/(1-g);
end
function mua = calc_mua(wavelength,S,B,W,F,M)
persistent muadeoxy muafat muamel muaoxy muawater musp nmLIB
if isempty(muadeoxy)
load('spectralLIB.mat','muadeoxy','muafat','muamel','muaoxy','muawater','musp','nmLIB');
end
mua_deoxy = interp1(nmLIB,muadeoxy,wavelength);
mua_fat = interp1(nmLIB,muafat,wavelength);
mua_mel = interp1(nmLIB,muamel,wavelength);
mua_oxy = interp1(nmLIB,muaoxy,wavelength);
mua_water = interp1(nmLIB,muawater,wavelength);
% Jacques "Optical properties of biological tissues: a review" eq. 12:
mua = B*S*mua_oxy + B*(1-S)*mua_deoxy + W*mua_water + F*mua_fat + M*mua_mel; % Equation 12 without the bilirubin and beta-Carotene terms
end
```
Result: Took around 130 seconds.
#### With GPU
Lines 63 to 69 are swapped.
```
%CPU
%model.MC.useGPU = false; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
% model = runMonteCarlo(model);
%GPU
model.MC.useGPU = true; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
model.MC.GPUdevice = 0; % (Default: 0, the first GPU) The index of the GPU device to use for the simulation
```
Result: Took around 180 seconds.
From the above tests, it appears that the GPU accelerated version is slower. I'd appreciate your insights or suggestions on what might be causing this.