Issue with two-step simulation

Versions

FLUKA: 4-5.1
Flair: 3.4-5
Operating system: MacOS 26.1 (Tahoe)
Compiler: GNU Fortran (MacPorts gcc14 14.3.0_0+stdlib_flag) 14.3.0

Description

Dear FLUKA experts,

I am performing a two-step simulation of the muon dose rate at the end of HiRadMat’s tunnel.
In the first step, I register all the muons crossing a boundary; in particular, I am interested in their energy, position, direction and statistical weight. I have already performed this part of the simulation and merged all the text files coming from the different jobs/cycles into one big input .txt file that feeds the second step of the simulation.
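
For completeness, the merging itself is just a streaming concatenation of the per-job/per-cycle dump files; a minimal Python sketch (the run*_muons.txt pattern and the output name are placeholders, not the actual file names):

```python
# Concatenate the per-job/per-cycle text dumps into a single phase-space file.
# File names are placeholders for illustration.
import glob

with open("muons_merged.txt", "w") as out:
    for fname in sorted(glob.glob("run*_muons.txt")):
        with open(fname) as f:
            for line in f:            # stream line by line, nothing is held in memory
                if line.strip():      # skip empty lines
                    out.write(line)
```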

After some optimisation I managed to “shrink” the .txt file, which has more than 120M entries, down to ~15 GB. And then the problems started. I decided to compile and run the basic source_newgen.f file provided with the FLUKA installation, since it already has exactly the function I need (read_phase_space_file()). The simulation runs, but I have two major problems:

  1. I cannot run different jobs in parallel. I run out of memory very quickly and my computer simply reboots itself. I suspect this is caused by the function that reads from the external file: it probably reads the whole file and then copies each entry onto the particle stack. If 8 jobs in parallel try to do this, that means 15*8 = 120 (!) GB of RAM + swap, which my computer (16 GB of RAM + ~100 GB of free disk space) cannot handle. My suspicion is confirmed by the fact that when I start a small, single-cycle test run (50k primaries), nothing happens for a very long time (~10 min) before the run actually starts; this should be the period during which all the data is loaded into memory.
  2. Even when I manage to run a single cycle, it takes too long! The average time per primary is 120 ms, while the “normal” simulation (i.e. beam on target, dump…) only takes 10 ms/primary.
    I have pretty stringent thresholds on all particles (apart from the muons, obviously), and I double-checked whether I had somehow messed them up in the second step of the simulation, but they are the same.
    This issue defeats the purpose of splitting up the simulation.

I thought about running the simulation on the CERN cluster (clueet), but I didn’t want to risk bricking the server, and in any case that would not solve the second problem.

I will upload all the relevant files below. There is only one FLUKA .inp file, since I manage the two steps with preprocessor flags (MGDRAW_ON_1 for the first step and SOURCE_ON for the second one).
I have also attached a small snippet (1000 lines) of the .txt input file.

Many thanks in advance for your help.

Cheers,
Matteo

Input files

HiRadMat+muons_MGDRAW_2ndStep.inp (95.4 KB)

source.f (5.7 KB)

muons_snippet.txt (122.1 KB)

mgdraw.f (12.7 KB)

Dear @mcapitan,

Glad to see you on the FLUKA forum after meeting you briefly in the R1 cafeteria. Now, about your issue:

i) As you correctly found out, the routine source_newgen.f does read the entire file for each run, and since it is a very large file, this is why you run into memory issues. Your second point stems from the fact that, by default (sequential_logical_flag set to .false.), the read_phase_space_file function selects a random event from the very large file, hence the large computing time per primary, defeating your initial purpose.

ii) As a possible solution, though not the cleanest (I will still investigate), I can suggest three avenues:
a) you could split your file into multiple subsets and proceed as you do now, but with sequential_logical_flag set to .true., thereby simulating all of your scored muons and only those.

b) you could analyze your file to see if you actually need all of it, i.e. try to reduce its size without compromising the information within - what would be so special, for your dose studies, about a muon that occurs in only 1 out of 120M entries? You could plot distributions (energy, angle, etc.) of your file to see where to cut.

c) distributions - probably the quickest but the least clean solution: instead of loading each entry, you could extract distributions/histograms of your variables and then feed those into your source_newgen.f (see the sketch below). Note, however, that you certainly have correlations (e.g. high energy and small scattering angle) that you would want to preserve, and this approach removes any correlation (i.e. each variable would be sampled independently of the others).
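
To make option c) concrete, here is a minimal Python sketch along these lines. The column layout (energy in the first column, direction cosine along the beam axis in column 7, statistical weight in the last column) and the binning are only assumptions about your file format; a 2D energy-angle histogram is used so that at least that correlation survives:

```python
# Build a weighted 2D (energy, angle) histogram from the phase-space file and
# sample source variables from it. Column indices and binning are assumptions.
import numpy as np

# For the full 15 GB file this should be read in chunks; np.loadtxt is fine for a test snippet.
data = np.loadtxt("muons_snippet.txt")
energy = data[:, 0]        # GeV (assumed column)
cosz   = data[:, 6]        # direction cosine w.r.t. the beam axis (assumed column)
weight = data[:, -1]       # statistical weight (assumed column)

# Weighted 2D histogram: keeps the energy-angle correlation,
# but any correlation with position etc. is lost.
H, e_edges, c_edges = np.histogram2d(energy, cosz, bins=(200, 100), weights=weight)

# Sample (energy, cosz) pairs proportionally to the histogram content.
rng = np.random.default_rng()
p = H.ravel() / H.sum()
idx = rng.choice(H.size, size=100_000, p=p)
ie, ic = np.unravel_index(idx, H.shape)
e_sample = rng.uniform(e_edges[ie], e_edges[ie + 1])   # uniform within each bin
c_sample = rng.uniform(c_edges[ic], c_edges[ic + 1])
np.savetxt("sampled_energy_cosz.txt", np.column_stack([e_sample, c_sample]))
```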

These are my current suggestions - do let me know if you find any of them useful.

Best,
Daniel

Dear @dprelipc,

I am happy to hear from you after our brief encounter and many thanks for your reply!

I just realised that I uploaded an older version of the source routine that I tried to write by myself before knowing the existence of the source_newgen.f file.

By setting sequential_logical_flag to .true., the loading of the file gets faster and, marginally, so does its reading, but I am still at ~100 ms/primary. At this rate the simulation will finish in May 2026…

I can also try to remove some entries from the muon input file. An energy cut will not be very effective, since the spectrum is centred around ~100 GeV, and the whole point of dividing the simulation into two steps was to be able to easily change the thickness of the shielding and see how the muon dose rate evolves: if I start to remove shielding, “low” energy muons start to register at the end of the tunnel.
An angular cut would not have that problem, but my muon beam is already very focused.

One approach I can think of is to run N input files in parallel, all based on the same source routine.
I can split the big file into N smaller subfiles, compile N different executables (one per subfile) and run one cycle with each (see the sketch below).
Then I could simply merge all the outputs together.
It’s a bit time consuming, but it’s better than waiting for 6 months :slight_smile: .
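
For the splitting itself, something like the following Python sketch should be enough; it streams the big file once, so the full 15 GB never has to fit in memory (file names are placeholders, and the round-robin distribution keeps the subfiles statistically similar):

```python
# Split the big phase-space file into N roughly equal subfiles by streaming it once.
# File names are placeholders for illustration.
N = 8
outs = [open(f"muons_part{i:02d}.txt", "w") for i in range(N)]
with open("muons_merged.txt") as big:
    for n, line in enumerate(big):
        outs[n % N].write(line)     # round-robin: each subfile samples the same distribution
for f in outs:
    f.close()
```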

Also, if I run on the small 1000-line snippet I attached previously, I get ~70-80 ms/primary, which is about a 20% improvement in CPU time. So a smaller file should run a little bit faster.

a) Do you think that my strategy is viable?

b) I tried to profile my quick run and discovered that FLUKA spends almost 20% of the time simulating e±, which is weird since these cannot come from muon decays (a 100 GeV muon travels ~600 km on average before decaying). Moreover, I have a 200 GeV threshold on anything that isn’t a muon and EMF is also off, so anything electromagnetic should be dumped on the spot. Do you have any suggestions on how to reduce this contribution and gain another 20% in CPU time?

Let me know what you think.

Cheers,

Matteo

Dear @mcapitan,

Multiple points:

almost 20% of the time simulating e±

→ Looking at your geometry, you have surrounded your HiRadMat geometry with Earth and then with Air for 100 km (hence giving your muons space to decay). I suggest that you surround your geometry with BLCKHOLE closer to your tunnel (only a few tens of metres of Earth should suffice).

I have a 200 GeV threshold on anything that it isn’t a muon and EMF is also off

→ I hope this is just for the 1st step. For the second step, to estimate the dose, you should activate EMF and put lower thresholds.

So a smaller file should run a little bit faster.

Yes, indeed. Your issue stems from the very large file that you have generated, which has to be loaded for every run. I repeat my point from the previous message: do you really need all 120M entries in your file? In other words, what is the difference between the “N smaller subfiles” that you would use? Would they have the same statistical value? A quick check is sketched below.
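
One way to verify this is to compare the entry count, the total weight and the weighted energy spectrum of each candidate subfile: if those agree within statistical fluctuations, the subsets are equivalent. A minimal sketch, assuming the energy is in the first column and the weight in the last:

```python
# Compare basic statistics of the candidate subfiles.
# Column indices, file names and binning are assumptions.
import glob
import numpy as np

edges = np.linspace(0.0, 450.0, 46)   # GeV binning, arbitrary for illustration
for fname in sorted(glob.glob("muons_part*.txt")):
    data = np.loadtxt(fname)
    e, w = data[:, 0], data[:, -1]
    spec, _ = np.histogram(e, bins=edges, weights=w)
    print(fname, "entries:", len(e), "total weight:", w.sum())
    # subfiles whose spectra agree within fluctuations are statistically equivalent
```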

Best,
Daniel

Dear @dprelipc ,

  1. Since yesterday, I updated the geometry and enclosed the tunnels within a BLCKHOLE region, except for a small cone along the muon path so that multiple scattering events are not discarded. These events turned out to be important in previous studies. With this change I observed a small improvement in performance (about 10 ms per primary), because some tracks now stop inside the BLCKHOLE. However, electrons and positrons still account for roughly 20% of the total run time.
    To be completely sure this is not related to muon decays, I applied a very strong bias to the muon decay length using the LAM-BIAS card, setting WHAT(1) to 10000 km. Even with this setup, tracking e± still takes around 20% of the total CPU time.
  2. Perhaps I did not explain clearly the purpose of our simulation. We are interested in estimating the dose on the surface induced by the HiRadMat facility. The tunnel hosting HiRadMat has an upward slope, and between its end and the surface there are several hundred meters of iron and soil. For this reason, only very high energy muons (>200 GeV) can realistically reach the surface. Based on this and following the recommendation of RP, we decided to neglect the contribution to the dose from all other particle species in order to improve the simulation speed.
  3. I will test some cuts on energy and angle, especially for the configurations with more shielding, where the low energy component becomes less relevant (a rough filtering sketch is shown after this list).
    My plan is to split the large input file into several smaller files and run multiple jobs in parallel, each reading only its own chunk. Afterwards I will merge the outputs. This should allow effective parallelisation without requiring each job to load the full 15 GB file.
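
For the pre-cuts in point 3, a minimal sketch of the filtering, streaming the merged file and writing a reduced one (the column indices and the cut values are placeholders to be adapted):

```python
# Keep only entries passing loose energy/angle pre-cuts; stream, never load the whole file.
# Column layout and cut values are placeholders for illustration.
E_MIN = 50.0      # GeV, example value
COSZ_MIN = 0.99   # keep only nearly forward muons, example value

kept = total = 0
with open("muons_merged.txt") as src, open("muons_cut.txt", "w") as dst:
    for line in src:
        cols = line.split()
        if not cols:
            continue
        total += 1
        energy = float(cols[0])   # assumed: kinetic energy in the first column
        cosz = float(cols[6])     # assumed: direction cosine along the beam axis
        if energy >= E_MIN and cosz >= COSZ_MIN:
            dst.write(line)
            kept += 1
print(f"kept {kept} of {total} entries")
```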

Cheers,

Matteo