Untitled

mail@pastecode.io avatar
unknown
plain_text
a year ago
13 kB
9
Indexable
Never
Hi Anders,

Thank you for your swift response. I've made progress and managed to compile the C code and run the MATLAB code. Here's a detailed rundown of what I did:

### System & Software Configuration:

- **MATLAB Version**: 2021b
- **CUDA Version**: 11.7.0
- **NVIDIA Driver Version**: 515.43.04
- **GCC Version**: 10.3.0

###  Compilation of MCmatlab.c:

```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/MCmatlab.c -lut
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -std=c11 -Wall' -outdir +MCmatlab/@model/private -output MCmatlab_singlethreaded ./+MCmatlab/src/MCmatlab.c -lut
```

### Compilation of finiteElementHeatPropagator.c:

```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/finiteElementHeatPropagator.c -lut
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -std=c11 -Wall' -outdir +MCmatlab/@model/private -output finiteElementHeatPropagator_singlethreaded ./+MCmatlab/src/finiteElementHeatPropagator.c -lut
```

### Compilation of getDownsampledParamVals.c:

```
mex COPTIMFLAGS='$COPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' LDOPTIMFLAGS='$LDOPTIMFLAGS -Ofast -fopenmp -std=c11 -Wall' -outdir +MCmatlab/@model/private ./+MCmatlab/src/getDownsampledParamVals.c
```

### Preparation for CUDA Compilation:

```
cp ./+MCmatlab/src/MCmatlab.c ./+MCmatlab/src/MCmatlab_CUDA.cu
cp ./+MCmatlab/src/finiteElementHeatPropagator.c ./+MCmatlab/src/finiteElementHeatPropagator_CUDA.cu
```

### Compilation Using mexcuda:

Initially, using mexcuda directly in the command line did not work. So, I opened MATLAB via the command line and compiled with:

```
mexcuda COMPFLAGS='-use_fast_math -res-usage $COMPFLAGS' -outdir +MCmatlab/@model/private ./+MCmatlab/src/MCmatlab_CUDA.cu
mexcuda COMPFLAGS='-use_fast_math -res-usage $COMPFLAGS' -outdir +MCmatlab/@model/private ./+MCmatlab/src/finiteElementHeatPropagator_CUDA.cu
```

Note: I had to remove the -llibut flag and commented out the utIsInterruptPending function in the .cu files. Since it's used to catch ctrl+c during MEX execution, I believe it's not crucial.

### Performance Testing:

#### Without GPU:

```
function fluence = main_mc()
%% Geometry definition
MCmatlab.closeMCmatlabFigures();
model = MCmatlab.model;

model.G.nx                = 200; % Number of bins in the x direction
model.G.ny                = 200; % Number of bins in the y direction
model.G.nz                = 30; % Number of bins in the z direction  %10
model.G.Lx                = 1; % [cm] x size of simulation cuboid
model.G.Ly                = 1; % [cm] y size of simulation cuboid
model.G.Lz                = 0.15; % [cm] z size of simulation cuboid  %0.1

model.G.mediaPropertiesFunc = @mediaPropertiesFunc; % Media properties defined as a function at the end of this file
model.G.geomFunc            = @geometryDefinition; % Function to use for defining the distribution of media in the cuboid. Defined at the end of this m file.

%model = plot(model,'G');

% X = model.G.x;
% Y = model.G.y;
% Z = model.G.z;

%% Top-hat focal plane, Gaussian angular intensity distribution
model.MC.nPhotonsRequested        = 4*5e6; % # of photons to launch

% model.MC.nExamplePaths            = 20;

model.MC.matchedInterfaces        = true; % Assumes all refractive indices are the same
model.MC.boundaryType             = 2;    % 0: No escaping boundaries, 1: All cuboid boundaries are escaping, 2: Top cuboid boundary only is escaping, 3: Top and bottom boundaries are escaping, while the side boundaries are cyclic

w = 400%:1:700;
for i=1%:3%:length(w)
model.MC.wavelength               = w(i);
model.MC.lightSource.xFocus       = 0; % [cm] x position of focus
model.MC.lightSource.yFocus       = 0; % [cm] y position of focus
model.MC.lightSource.zFocus       = 0; % [cm] z position of focus

model.MC.lightSource.sourceType   = 4; % 0: Pencil beam, 1: Isotropically emitting line or point source, 2: Infinite plane wave, 3: Laguerre-Gaussian LG01 beam, 4: Radial-factorizable beam (e.g., a Gaussian beam), 5: X/Y factorizable beam (e.g., a rectangular LED emitter)

model.MC.lightSource.focalPlaneIntensityDistribution.radialDistr = 0; % Radial focal plane intensity distribution - 0: Top-hat, 1: Gaussian, Array: Custom. Doesn't need to be normalized.
model.MC.lightSource.focalPlaneIntensityDistribution.radialWidth = 1.2; % [cm] Radial focal plane 1/e^2 radius if top-hat or Gaussian or half-width of the full distribution if custom
model.MC.lightSource.angularIntensityDistribution.radialDistr = 2; % Radial angular intensity distribution - 0: Top-hat, 1: Gaussian, 2: Cosine (Lambertian), Array: Custom. Doesn't need to be normalized.
model.MC.lightSource.angularIntensityDistribution.radialWidth = 0; %pi/6; % [rad] Radial angular 1/e^2 half-angle if top-hat or Gaussian or half-angle of the full distribution if custom. For a diffraction limited Gaussian beam, this should be set to model.MC.wavelength*1e-9/(pi*model.MC.lightSource.focalPlaneIntensityDistribution.radialWidth*1e-2))
model.MC.lightSource.theta = 0; % [rad] Polar angle of beam center axis         = 0;  %pi/2; % [rad] Azimuthal angle of light source center axis

% Collector
model.MC.useLightCollector        = true;
model.MC.lightCollector.x         = 0;   %0; % [cm] x position of either the center of the objective lens focal plane or the fiber tip
model.MC.lightCollector.y         = 0;   %-0.005; % [cm] y position
model.MC.lightCollector.z         = 0; % [cm] z position

model.MC.lightCollector.theta     = 0; % [rad] Polar angle of direction the light collector is facing
model.MC.lightCollector.phi       = 0; %pi/2; % [rad] Azimuthal angle of direction the light collector is facing

model.MC.lightCollector.f         = 0.0001; % [cm] Focal length of the objective lens (if light collector is a fiber, set this to Inf).
model.MC.lightCollector.diam      = sqrt(model.G.Lx^2 + model.G.Ly^2);   % [cm] Diameter of the light collector aperture. For an ideal thin lens, this is 2*f*tan(asin(NA)).
model.MC.lightCollector.fieldSize = sqrt(model.G.Lx^2 + model.G.Ly^2);   % [cm] Field Size of the imaging system (diameter of area in object plane that gets imaged). Only used for finite f.

model.MC.lightCollector.NA        = 0.22; % [-] Fiber NA. Only used for infinite f.
model.MC.lightCollector.res       = 200;   % X and Y resolution of light collector in pixels, only used for finite f
model.MC.depositionCriteria.onlyCollected = true;


%CPU
model.MC.useGPU                   = false; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
% model = runMonteCarlo(model);

%GPU
%model.MC.useGPU                   = true;  % (Default: false) Use CUDA acceleration for NVIDIA GPUs
%model.MC.GPUdevice                = 0;     % (Default: 0, the first GPU) The index of the GPU device to use for the simulation
tic
model = runMonteCarlo(model);
toc

fluencee = model.MC.normalizedFluenceRate;
fluence {i,1} = fluencee(:,:,4);

end
end
%save spatially_varying_epidermis_16_voxs_CUDA.mat spatially_varying_epidermis_16_voxs_CUDA
%% Geometry function(s) (see readme for details)
function M = geometryDefinition(X,Y,Z,parameters)
air_thick = 0.0175;
dermis_thick = 0.0375;
  
rng(0,'twister');
a = 0;
b = 100;
r = (b-a).*rand(16,1) + a;
r = r/10000;
Out_Epth = reshape(r, [4,4]);
media(1,:) = 2:5;
media(2,:) = 6:9;
media(3,:) = 10:13;
media(4,:) = 14:17;
x_res = 0.25;
y_res = 0.25;   

M = ones(size(X)); % Air
 
for m=1:4
    for n=1:4
        M((X>=(-0.5+(m-1)*x_res))&((-0.5+m*x_res)>X)   &   (Y>=-0.5+(n-1)*y_res)&((-0.5+n*y_res)>Y)   &   (Z>=air_thick)&((air_thick + Out_Epth(m,n))>Z)) = media(m,n); 
        M((X>=(-0.5+(m-1)*x_res))&((-0.5+m*x_res)>X)   &   (Y>=-0.5+(n-1)*y_res)&((-0.5+n*y_res)>Y)   &   (Z>=(air_thick + Out_Epth(m,n)))&((air_thick + Out_Epth(m,n)+dermis_thick)>Z)) = 18; %dermis
        M(Z > (air_thick + Out_Epth(m,n)+dermis_thick)) = 19; %fat
    end
end
end


%% Media Properties function (see readme for details)
function mediaProperties = mediaPropertiesFunc(parameters)
  disp("Inside")
  mediaProperties = MCmatlab.mediumProperties; 
  

  j=1;
  mediaProperties(j).name  = 'air';
  mediaProperties(j).mua   = 1e-8; % Absorption coefficient [cm^-1]
  mediaProperties(j).mus   = 1e-8; % Scattering coefficient [cm^-1]
  mediaProperties(j).g     = 1;    % Henyey-Greenstein scattering anisotropy
  
  
  mel = 0.01:0.005:0.1;
  hem = 0;
  water = 0.75;
  s = 0.75;
  fat = 0;
  
  for j=2:17
      mediaProperties(j).name  = ['vox', num2str(j-1)];
      mediaProperties(j).mua = @(wavelength) calc_mua(wavelength, s,hem,water,fat,mel(j-1));
      mediaProperties(j).mus = @(wavelength) calc_mus(wavelength, 40,0,1,0.9);
      mediaProperties(j).g   = 0.9;
      mediaProperties(j).VHC = 3391*1.109e-3; % [J cm^-3 K^-1]
      mediaProperties(j).TC  = 0.37e-2; % [W cm^-1 K^-1]
  end  
  
  
  j=18;
  mediaProperties(j).name = 'dermis';
  mediaProperties(j).mua = @func_mua18;
  function mua = func_mua18(wavelength)
    B = 0.002; % Blood content
    S = 0.67; % Blood oxygen saturation
    W = 0.65; % Water content
    M = 0; % Melanin content
    F = 0; % Fat content
    mua = calc_mua(wavelength,S,B,W,F,M); % Jacques "Optical properties of biological tissues: a review" eq. 12
  end
  mediaProperties(j).mus = @func_mus18;
  function mus = func_mus18(wavelength)
    aPrime = 42.4; % musPrime at 500 nm
    fRay = 0.62; % Fraction of scattering due to Rayleigh scattering
    bMie = 1; % Scattering power for Mie scattering
    g = 0.9; % Scattering anisotropy
    mus = calc_mus(wavelength,aPrime,fRay,bMie,g); % Jacques "Optical properties of biological tissues: a review" eq. 2
  end
  mediaProperties(j).g   = 0.9;
  mediaProperties(j).VHC = 3391*1.109e-3; % [J cm^-3 K^-1]
  mediaProperties(j).TC  = 0.37e-2; % [W cm^-1 K^-1]
  
  
  j=19;
  mediaProperties(j).name  = 'fat';
  mediaProperties(j).mua = @func_mua19;
  function mua = func_mua19(wavelength)
    B = 0;     % Blood content
    S = 0;  % Blood oxygen saturation
    W = 0.05;  % Water content
    M = 0; % Melanin content
    F = 0.95;     % Fat content
    mua = calc_mua(wavelength,S,B,W,F,M); % Jacques "Optical properties of biological tissues: a review" eq. 12
  end
  mediaProperties(j).mus = @func_mus19;
  function mus = func_mus19(wavelength)
    aPrime = 40; % musPrime at 500 nm
    fRay = 0.75;    % Fraction of scattering due to Rayleigh scattering
    bMie = 1;    % Scattering power for Mie scattering
    g = 0.9;     % Scattering anisotropy
    mus = calc_mus(wavelength,aPrime,fRay,bMie,g); % Jacques "Optical properties of biological tissues: a review" eq. 2
  end
  mediaProperties(j).g   = 0.9; 
end

 
function mus = calc_mus(wavelength,aPrime,fRay,bMie,g)
  fMie = 1 - fRay;
  musPrime= aPrime*(fRay*(wavelength/500)^(-4) + fMie*(wavelength/500)^(-bMie)); % Jacques "Optical properties of biological tissues: a review" eq. 2
  mus = musPrime/(1-g);
end



function mua = calc_mua(wavelength,S,B,W,F,M)
  persistent muadeoxy muafat muamel muaoxy muawater musp nmLIB
  if isempty(muadeoxy)
    load('spectralLIB.mat','muadeoxy','muafat','muamel','muaoxy','muawater','musp','nmLIB'); 
  end
  mua_deoxy = interp1(nmLIB,muadeoxy,wavelength);
  mua_fat = interp1(nmLIB,muafat,wavelength);
  mua_mel = interp1(nmLIB,muamel,wavelength);
  mua_oxy = interp1(nmLIB,muaoxy,wavelength);
  mua_water = interp1(nmLIB,muawater,wavelength);

  % Jacques "Optical properties of biological tissues: a review" eq. 12:
  mua = B*S*mua_oxy + B*(1-S)*mua_deoxy + W*mua_water + F*mua_fat + M*mua_mel; % Equation 12 without the bilirubin and beta-Carotene terms
end
```

Result: Took around 130 seconds.

#### With GPU

Lines 63 to 69 are swapped.

```
%CPU
%model.MC.useGPU                   = false; % (Default: false) Use CUDA acceleration for NVIDIA GPUs
% model = runMonteCarlo(model);

%GPU
model.MC.useGPU                   = true;  % (Default: false) Use CUDA acceleration for NVIDIA GPUs
model.MC.GPUdevice                = 0;     % (Default: 0, the first GPU) The index of the GPU device to use for the simulation
```

Result: Took around 180 seconds.

From the above tests, it appears that the GPU accelerated version is slower. I'd appreciate your insights or suggestions on what might be causing this.