The source/ folder of this repo is a mirror of the team's HPC MiniWeather Project. Below is what every directory contains and how it maps to the visualization you see on the home page.
All four implementations compute the same kernel: a 3D 7-point Laplacian-style stencil over an nx × ny × nz grid. They differ in how the work is scheduled across hardware:
src/stencil_cpu_serial.cpp: single thread, plain triple loop. Reference correctness; everything else is benchmarked against this.
src/stencil_cpu_blocked.cpp: same math, but the loops are tiled into blocks sized to fit in L1/L2. Same FLOPs, far fewer cache misses.
src/stencil_cpu_parallel.cpp + src/halo.cpp: OpenMP threads over the inner loops; when MPI is enabled, multiple ranks each own a sub-block of the grid and exchange halo (ghost) cells every step.
src/stencil_gpu.cu + src/stencil_gpu.cpp: same kernel as a CUDA grid. Each cell is one thread; tiled into shared memory for coalesced loads.
include/: public headers (stencil.hpp, halo.hpp, config.hpp, …).
slurm/, submit.sbatch, submit_*.slurm: SLURM batch scripts for the Magic Castle cluster runs (serial, parallel, scaling sweep, GPU).
scripts/: Python helpers. visualize_weather.py renders the PNG/GIF artefacts you see in the "Cluster outputs" section; plot_results.py + collect_results.py build the speedup/scaling tables.
results/: the artefacts themselves (also mirrored into web/assets/ so this Pages site can display them).
docs/: the team's writeup: ARCHITECTURE.md, RESULTS.md, SYSTEM.md, the EuroHPC proposal, the pitch slides, and a reproduction guide.
tests/: sanity check at 16³ and a Python verifier that compares all four backends against each other to flag correctness regressions.
profiling/: instrumentation notes for the cluster runs.
CMakeLists.txt, run.sh: build / configure / run from a single entry point. ./run.sh build compiles with whatever toolchain modules the cluster has loaded; ./run.sh test launches a smoke run.

The interactive page runs a 2D analogue of the same operator. Same stencil shape (centre + face-neighbours), same diffusion math, same buoyancy term, just one dimension fewer so the result fits on a canvas. Resolution, diffusion and buoyancy are sliders so you can feel what the cluster code is computing per cell, per step, before it's distributed across thousands of threads.
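
To make the 2D analogue concrete, here is a minimal sketch in plain Python of one diffusion step with a 5-point stencil (centre + four face-neighbours), written twice: once as a plain double loop and once with the loops tiled into blocks. It is an illustration, not the repo's code. The names update_cell, step_serial, step_blocked, alpha and block are hypothetical, holding the boundary cells fixed is an assumption, and the buoyancy term is left out because its exact form is not given here.

```python
def update_cell(grid, j, i, alpha):
    """Explicit diffusion update for one interior cell (5-point stencil)."""
    lap = (grid[j][i - 1] + grid[j][i + 1]
           + grid[j - 1][i] + grid[j + 1][i]
           - 4.0 * grid[j][i])
    return grid[j][i] + alpha * lap


def step_serial(grid, alpha):
    """One step as a plain double loop; boundary cells are copied unchanged."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            out[j][i] = update_cell(grid, j, i, alpha)
    return out


def step_blocked(grid, alpha, block=4):
    """Same arithmetic, but the interior is visited tile by tile."""
    ny, nx = len(grid), len(grid[0])
    out = [row[:] for row in grid]
    for j0 in range(1, ny - 1, block):          # tile origin, rows
        for i0 in range(1, nx - 1, block):      # tile origin, columns
            for j in range(j0, min(j0 + block, ny - 1)):
                for i in range(i0, min(i0 + block, nx - 1)):
                    out[j][i] = update_cell(grid, j, i, alpha)
    return out


if __name__ == "__main__":
    # Hypothetical test case: an 8 x 8 grid with a single hot cell.
    grid = [[0.0] * 8 for _ in range(8)]
    grid[4][4] = 1.0
    a = step_serial(grid, alpha=0.1)
    b = step_blocked(grid, alpha=0.1, block=3)
    print("serial and blocked agree:", a == b)
```

Both functions read only from the input grid and write to a separate output, so the traversal order cannot change the result. That is the property that lets the blocked, threaded and GPU variants be checked against the serial reference. In pure Python the tiling brings no cache benefit; the sketch only shows the loop structure.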