High dynamic range (HDR) image acquisition from a single image capture, also known as snapshot HDR imaging, is challenging because the bit depths of camera sensors are far from sufficient to cover the full dynamic range of the scene. Existing HDR techniques focus either on algorithmic reconstruction or hardware modification to extend the dynamic range. In this paper we propose a joint design for snapshot HDR imaging by devising a spatially varying modulation mask in the hardware and building a deep learning algorithm to reconstruct the HDR image. We leverage transfer learning to overcome the lack of sufficiently large HDR datasets available. We show how transferring from a different large‐scale task (image classification on ImageNet) leads to considerable improvements in HDR reconstruction. We achieve a reconfigurable HDR camera design that does not require custom sensors, and instead can be reconfigured between HDR and conventional mode with very simple calibration steps. We demonstrate that the proposed hardware–software so lution offers a flexible yet robust way to modulate per‐pixel exposures, and the network requires little knowledge of the hardware to faithfully reconstruct the HDR image. Comparison results show that our method outperforms the state of the art in terms of visual perception quality.