We study the navigation of a self-propelled inertial particle in two-dimensional Rayleigh--Bénard convection at Prandtl number Pr = 0.71 and cell aspect ratio \Gamma = 4, over Rayleigh numbers Ra in the range 10^{7} to 10^{11}. A reinforcement-learning controller selects the propulsive acceleration, subject to an upper bound A_{\max}, to complete a fixed horizontal displacement. Navigation performance is assessed by the success rate, the completion time and the propulsion energy. The dependence of these measures on flow intensity is interpreted using instantaneous flow fields and proper orthogonal decomposition (POD). We find that the success rate is zero at small A_{\max} and increases once A_{\max} exceeds an onset value. At moderate Ra the rise is abrupt, whereas at higher Ra it is gradual and the onset shifts to larger A_{\max}. The completion time decreases with A_{\max} and increases with Ra, with diminishing benefit once A_{\max} is large. For fixed A_{\max}, the propulsion energy required to traverse the same distance decreases with Ra. The POD analysis reveals that these performance differences reflect changes in the organisation of the carrier flow. At moderate Ra, a dominant large-scale circulation (LSC) partitions the domain into basins separated by transport barriers, so barrier crossing requires a finite surplus of propulsive acceleration. At higher Ra, the LSC breaks into smaller vortices, energy spreads over many modes, barriers fragment and drift, and transient plume-assisted pathways aid motion. Compared with a naive constant-heading baseline, the learned policy consumes less propulsion energy, and this energy benefit grows with Ra. These results connect convective-flow organisation to navigability under bounded actuation and provide guidelines for selecting actuation limits in buoyancy-driven flows.
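
To illustrate what "energy spreads over many modes" means in a POD analysis, the following is a minimal sketch of snapshot POD computed with a singular value decomposition. It is not the authors' pipeline: the function name `pod_energy_fractions`, the synthetic one-dimensional fields, and all parameter values are assumptions chosen only to contrast a field dominated by one coherent structure with a field made of several comparable structures.

```python
import numpy as np


def pod_energy_fractions(snapshots):
    """Fraction of fluctuation energy captured by each POD mode.

    snapshots: array of shape (n_snapshots, n_points), one flattened
    field per row. (Hypothetical helper, for illustration only.)
    """
    # Remove the time mean so the modes describe fluctuations only.
    fluctuations = snapshots - snapshots.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the energy of each mode.
    singular_values = np.linalg.svd(fluctuations, compute_uv=False)
    energy = singular_values**2
    return energy / energy.sum()


# Synthetic stand-ins for flow snapshots (assumed, not simulation data).
rng = np.random.default_rng(0)
x = np.linspace(0.0, 2.0 * np.pi, 64)   # spatial grid
t = np.linspace(0.0, 10.0, 200)         # snapshot times

# Case 1: one dominant coherent structure plus weak noise,
# loosely analogous to a flow organised by a single large-scale circulation.
single = np.outer(np.sin(t), np.sin(x))
single = single + 0.01 * rng.standard_normal(single.shape)

# Case 2: eight structures of comparable amplitude,
# loosely analogous to a flow broken into many smaller vortices.
multi = sum(
    np.outer(np.sin((k + 1) * t + k), np.sin((k + 1) * x))
    for k in range(8)
)

frac_single = pod_energy_fractions(single)
frac_multi = pod_energy_fractions(multi)

print("leading-mode energy fraction, single structure:", frac_single[0])
print("leading-mode energy fraction, many structures: ", frac_multi[0])
```

In this toy setting the leading mode carries nearly all of the energy in the first case and only a modest share in the second, which is the kind of spectral flattening the abstract associates with higher Ra.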
