Hyperspectral Imagery (HSI) has been used in many applications to non-destructively determine the material and/or chemical compositions of samples. There is growing interest in creating 3D hyperspectral reconstructions, which could provide both spatial and spectral information while also mitigating common HSI challenges such as non-Lambertian surfaces and translucent objects. However, traditional 3D reconstruction with HSI is difficult due to technological limitations of hyperspectral cameras. In recent years, Neural Radiance Fields (NeRFs) have seen widespread success in creating high-quality volumetric 3D representations of scenes captured by a variety of camera models. Leveraging recent advances in NeRFs, we propose computing a hyperspectral 3D reconstruction in which every point in space and view direction is characterized by wavelength-dependent radiance and transmittance spectra. To evaluate our approach, we collected a dataset containing nearly 2000 hyperspectral images across 8 scenes and 2 cameras. We perform comparisons against traditional RGB NeRF baselines and apply ablation testing with alternative spectra representations. Finally, we demonstrate the potential of hyperspectral NeRFs for hyperspectral super-resolution and imaging sensor simulation. We show that our hyperspectral NeRF approach enables fast, accurate volumetric 3D hyperspectral scene reconstruction and opens several new applications and areas for future study.


Here are some examples of raw images taken by the hyperspectral cameras.

Our HS-NeRF approach (Ours-Hyper), which uses all 128 channels, clearly outperforms NeRFs trained on RGB alone.

We can observe that the right-most column (trained using all 128 hyperspectral channels) produces better color accuracy and clarity than the middle three columns (trained using only pseudo-RGB images).
We believe this is because the additional channels improve the SNR: nearby wavelengths are correlated, so more data samples provide more information to the model.
The benefit of the additional data outweighs the effect of increased data complexity, while model complexity (and parameter count) stays roughly constant.
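The SNR intuition can be illustrated with a toy example (the numbers below are illustrative assumptions, not from our experiments): when nearby channels carry correlated signal but independent sensor noise, averaging k adjacent channels shrinks the noise by roughly a factor of sqrt(k).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative: a flat spectral signal observed through 128 channels with
# independent per-channel sensor noise. Nearby wavelengths carry correlated
# signal, so pooling k adjacent channels averages down the noise.
signal = 0.5
channels = signal + 0.1 * rng.standard_normal((10000, 128))

# Error when reading a single channel vs. averaging 8 neighboring channels.
single_channel_err = np.abs(channels[:, 64] - signal).mean()
averaged_err = np.abs(channels[:, 60:68].mean(axis=1) - signal).mean()

# Averaging 8 channels cuts the error by roughly sqrt(8) ~ 2.8x.
```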

Using our method on pseudo-RGB data (Ours-RGB and Ours-Cont), we see little performance difference from the nerfacto baseline, despite the more challenging representation.

Sample __ground truth__ image

(baseline)

(discrete 3ch radiance & transmittance)

(continuous 3ch radiance & transmittance)

(HS-NeRF, continuous 128ch)

Depth maps can reveal issues with the 3D structures of NeRFs that are not immediately apparent in the rendered images.

Qualitatively, ours is noticeably better than the RGB baselines and is among the best of the ablations.

(baseline)

(discrete 3ch radiance & transmittance)

(continuous 3ch radiance & transmittance)

(HS-NeRF, continuous 128ch)

$C_1\,\sigma_1\,P_0$ is notably strong for the Anacampseros scene, but finds a completely degenerate solution for both the Caladium and Pinecone scenes, likely due to low contrast between the foreground and the black background.

$C_1\,\sigma_0\,P_0$

$C_1\,\sigma_1\,P_0$

$C\,\sigma_0\,P_0$

(ours)

(ours)

$C\,\sigma\,P_0$

$C_2\,\sigma_2\,P_0$

$C\,\sigma\,P_{\lambda}$

Finally, we can also export the NeRFs to point clouds to verify that their 3D structure is reasonable. Shown below are screenshots of the point clouds for Anacampseros (left) and Caladium (right).

Qualitatively, our chosen architecture appears among the best, though slightly different architectures are not significantly worse.

Careful observation reveals that the networks using $C$ (including ours) tend to be the most accurate in terms of absolute intensity (see, for example, that the other networks tend to over-estimate the leaf brightness in the middle row).
Meanwhile, the network using $C_2$ often appears fuzzy, which suggests that mixing the spatial and wavelength information together is less efficient.
Finally, although the networks using $C_1$ generally appear slightly sharper than ours, they lack a continuous wavelength representation, so they do not interpolate wavelengths as naturally and, as mentioned, their absolute intensities tend to be slightly less accurate (e.g. the middle row's leaf brightness).

This video makes it easiest to compare performance across the different methods.

In this video, t = 0s depicts the shortest wavelength (image channel 0), and progressively longer wavelengths are shown until t = 5s, which depicts the longest wavelength (image channel 127). This camera view was not in the training set.
# GROUND TRUTH

$C_1\,\sigma_0\,P_0$

$C_1\,\sigma_1\,P_0$

$C\,\sigma_0\,P_0$

(ours)

(ours)

$C\,\sigma\,P_0$

$C_2\,\sigma_2\,P_0$

$C\,\sigma\,P_{\lambda}$

This video shows the camera moving, demonstrating that the NeRF's performance is consistently good across novel viewpoints.

This grid of videos shows different ablation variations as columns and different wavelengths as rows.

$C_1\,\sigma_1\,P_0$

$C\,\sigma_0\,P_0$

(ours)

(ours)

$C\,\sigma\,P_0$

$C_2\,\sigma_2\,P_0$

$C\,\sigma\,P_{\lambda}$

Ch 15 (477nm)

Ch 35 (576nm)

Ch 55 (675nm)

Ch 75 (772nm)

Ch 95 (869nm)

Ch 105 (918nm)

This video shows the camera moving while the wavelength sweeps simultaneously, demonstrating that the NeRF's performance is consistently good across novel viewpoints at all wavelengths.

In this video, t = 0s depicts the shortest wavelength (image channel 0), and progressively longer wavelengths are shown until t = 5s, which depicts the longest wavelength (image channel 127). This camera view was not in the training set.
$C_1\,\sigma_0\,P_0$

$C_1\,\sigma_1\,P_0$

$C\,\sigma_0\,P_0$

(ours)

(ours)

$C\,\sigma\,P_0$

$C_2\,\sigma_2\,P_0$

$C\,\sigma\,P_{\lambda}$

Our NeRF loses almost no accuracy in predicting the full hyperspectral image, even when trained on only 1/8th of the wavelengths.

Our NeRFs can accurately predict an unseen viewpoint consistently across all wavelengths.

Furthermore, even withholding 87.5% of wavelengths from the training set has marginal impact on accuracy.

Watch this rotating video to see that the wavelength-interpolation performance is consistently good across novel viewpoints.

Inspect the individual rendered images by wavelength and amount of interpolation.

This is the same as the video above, but in image form to allow for closer inspection.

Currently only every 11th wavelength is loaded.

Key:

Wavelength in train set

Wavelength NOT in train set

Hyperspectral NeRFs allow us to simulate arbitrary camera image sensors from a single reference image.

Different camera imaging sensors have different spectral responses, which produce unique image qualities. Similarly, filters are often used by photographers to achieve different artistic effects. From a hyperspectral image, we theoretically have enough information to reconstruct what the image would look like if it had been taken with a different camera sensor or filter.
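As a sketch of this idea (the response curves, wavelength range, and cube size below are illustrative assumptions, not measured sensor data), simulating a 3-channel sensor from a 128-channel hyperspectral cube reduces to a weighted sum over wavelengths:

```python
import numpy as np

# A toy hyperspectral cube: 4 x 4 pixels, 128 spectral channels (as in our cameras).
hsi = np.random.rand(4, 4, 128)

# Hypothetical spectral response curves for a 3-channel (RGB-like) sensor:
# each row weights how strongly each of the 128 wavelengths contributes
# to one output channel.
wavelengths = np.linspace(400, 1000, 128)  # nm; illustrative range

def gaussian(center, width):
    return np.exp(-0.5 * ((wavelengths - center) / width) ** 2)

response = np.stack([gaussian(620, 40),    # "red" channel
                     gaussian(540, 40),    # "green" channel
                     gaussian(460, 40)])   # "blue" channel; shape (3, 128)

# Simulated photo: integrate the radiance spectrum against each response curve.
rgb = hsi @ response.T  # shape (4, 4, 3)
```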


We demonstrate results using the following procedure:

- Construct a hyperspectral NeRF using HS-NeRF
- Take a single photo with the target camera
- Localize the target photo's camera pose in the NeRF scene
- Render a hyperspectral image for that camera pose
- Run a least-squares optimization to find spectral response functions which result in a simulated photo that best matches the target photo
- Render the simulated photo
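The least-squares step of the procedure above can be sketched as follows (with synthetic data standing in for the rendered hyperspectral image and the target photo; the real pipeline may add constraints such as non-negativity of the response):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: the rendered hyperspectral image at the localized camera pose,
# flattened to (num_pixels, 128), and a "target photo" flattened to
# (num_pixels, 3). Here the target is synthesized from a known response
# so the fit is verifiable.
hsi = rng.random((1000, 128))
true_response = rng.random((128, 3))   # the unknown sensor response
target = hsi @ true_response

# Least-squares fit of the spectral response functions (128 channels -> 3).
response, *_ = np.linalg.lstsq(hsi, target, rcond=None)

# Render the simulated photo with the fitted response.
simulated = hsi @ response
```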

For each scene, we present simulated photos for an iPhone photo with various digital filters applied.

A number of different __target photos (left of each pair)__ can be __simulated by HS-NeRF (right of each pair)__ with excellent agreement.

Note that the agreement is often so close and consistent that it can appear as though a filter were applied to both the target and the simulation together; in fact, the filters were applied only to the target photos, and the simulations were then generated according to the procedure above.

```
@article{Chen24arxiv_HS-NeRF,
  title={Hyperspectral Neural Radiance Fields},
  author={Gerry Chen and Sunil Kumar Narayanan and Thomas Gautier Ottou and Benjamin Missaoui and Harsh Muriki and C\'edric Pradalier and Yongsheng Chen},
  journal={arXiv preprint arXiv:2403.14839},
  year={2024}
}
```