Compressive spectral imaging (CSI) captures high-dimensional hyperspectral images (HSIs) from a few multiplexed measurements, thereby reducing data acquisition cost and complexity. However, existing CSI methods often rely on end-to-end learning from training sets and may therefore generalize poorly to unseen scenes and phenomena. In this paper, we present a progressive self-supervised method tailored to coded aperture snapshot spectral imaging (CASSI). The proposed method reconstructs HSIs solely from the measurements, without requiring any ground-truth spectral data. To achieve this, we integrate positional encoding and spectral cluster-centroid features within a novel progressive training framework. Additionally, we employ an attention mechanism and a multi-scale architecture to improve the robustness and accuracy of HSI reconstruction. Extensive experiments on both synthetic and real datasets validate the effectiveness of our method. It significantly outperforms state-of-the-art self-supervised CASSI methods while using fewer parameters and less memory, and its reconstruction quality is competitive with that of state-of-the-art supervised methods. The related code and data are available at https://github.com/ccccddd1/ceinr.
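To make "reconstruction solely from the measurements" concrete, the following is a minimal NumPy sketch of a commonly used single-disperser CASSI discretization (mask modulation, per-band shift, summation over bands), together with a Fourier-style positional encoding and a measurement-consistency loss. It is an illustration under stated assumptions, not the implementation in the repository: the function names `cassi_forward`, `positional_encoding`, and `measurement_loss`, the dispersion step of 2 pixels, the number of frequencies, and the squared-error loss are all hypothetical choices that the abstract does not specify.

```python
import numpy as np


def cassi_forward(cube, mask, step=2):
    """Simulate one snapshot measurement (assumed single-disperser model).

    cube: (H, W, K) hyperspectral image; mask: (H, W) coded aperture.
    Each band is modulated by the mask, shifted by k * step columns,
    and all bands are summed on the detector.
    Returns an array of shape (H, W + step * (K - 1)).
    """
    H, W, K = cube.shape
    meas = np.zeros((H, W + step * (K - 1)), dtype=np.float64)
    for k in range(K):
        meas[:, k * step:k * step + W] += mask * cube[:, :, k]
    return meas


def positional_encoding(coords, num_freqs=4):
    """Map coordinates in [0, 1] to sin/cos features (assumed form).

    coords: (N, D). Returns (N, D * 2 * num_freqs).
    """
    freqs = (2.0 ** np.arange(num_freqs)) * np.pi      # (F,)
    angles = coords[..., None] * freqs                  # (N, D, F)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return enc.reshape(coords.shape[0], -1)


def measurement_loss(cube_hat, mask, meas, step=2):
    """Self-supervised loss: re-project the estimate and compare it with
    the observed measurement. No ground-truth cube is needed."""
    residual = cassi_forward(cube_hat, mask, step) - meas
    return float(np.mean(residual ** 2))


# Toy usage with random data (illustrative only).
rng = np.random.default_rng(0)
cube = rng.random((8, 8, 4))
mask = (rng.random((8, 8)) > 0.5).astype(np.float64)
y = cassi_forward(cube, mask)
print(y.shape)                          # (8, 14)
print(measurement_loss(cube, mask, y))  # 0.0 for the true cube
```

In a self-supervised setting of this kind, a network fed with encoded coordinates (and possibly other features) would output `cube_hat`, and only `measurement_loss` would drive training. Because many cubes reproduce the same snapshot, the loss alone is ill-posed, which is why architectural priors and additional features matter.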