Hyperspectral Neural Radiance Fields

¹Georgia Institute of Technology
²Georgia Tech Europe

HS-NeRF extends NeRF to hyperspectral imagery, addressing the 3D reconstruction challenges inherent to hyperspectral data. We demonstrate applications such as hyperspectral super-resolution and camera sensor simulation, with potential future applications in non-destructively estimating 3D distributions of material compositions for agriculture, mining, medicine, and more.

Abstract

Hyperspectral Imagery (HSI) has been used in many applications to non-destructively determine the material and/or chemical compositions of samples. There is growing interest in creating 3D hyperspectral reconstructions, which could provide both spatial and spectral information while also mitigating common HSI challenges such as non-Lambertian surfaces and translucent objects. However, traditional 3D reconstruction with HSI is difficult due to technological limitations of hyperspectral cameras. In recent years, Neural Radiance Fields (NeRFs) have seen widespread success in creating high-quality volumetric 3D representations of scenes captured by a variety of camera models. Leveraging recent advances in NeRFs, we propose computing a hyperspectral 3D reconstruction in which every point in space and view direction is characterized by wavelength-dependent radiance and transmittance spectra. To evaluate our approach, we collected a dataset containing nearly 2000 hyperspectral images across 8 scenes and 2 cameras. We perform comparisons against traditional RGB NeRF baselines and run ablations with alternative spectral representations. Finally, we demonstrate the potential of hyperspectral NeRFs for hyperspectral super-resolution and imaging sensor simulation. We show that our hyperspectral NeRF approach produces fast, accurate volumetric 3D hyperspectral scenes and enables several new applications and areas for future study.
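To make the idea concrete, the sentence above can be read as extending the usual NeRF volume-rendering integral with a wavelength argument. The formulation below is a minimal sketch in our own notation, assuming the radiance c and the density σ (which determines transmittance) simply receive an extra wavelength input λ; the paper's exact parameterization may differ.

C(\mathbf{r}, \lambda) = \int_{t_n}^{t_f} T(t,\lambda)\,\sigma(\mathbf{r}(t),\lambda)\,c(\mathbf{r}(t),\mathbf{d},\lambda)\,dt,
\quad\text{where}\quad
T(t,\lambda) = \exp\!\left(-\int_{t_n}^{t}\sigma(\mathbf{r}(s),\lambda)\,ds\right).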

Overview Architecture Diagram


Raw Dataset Images

Here are some examples of raw images taken by the hyperspectral cameras.

BaySpec Camera Datasets

Anacampseros



Caladium



Pinecone





Surface Optics Camera Datasets

Rosemary



Basil



Tools



Origami

RGB Baseline Comparisons

Our HS-NeRF approach (Ours-Hyper) using 128 channels clearly outperforms NeRFs using only RGB

We can observe that the right-most column (trained using all 128 hyperspectral channels) produces better color accuracy and clarity than the middle three columns (trained using only pseudo-RGB images). We believe this is because the additional channels improve the SNR: nearby wavelengths are correlated, so more data samples provide more information to the model. The additional data outweighs the effect of increased data complexity while keeping model complexity (and parameter count) roughly constant.
When we apply our method to pseudo-RGB data (Ours-RGB and Ours-Cont), we see little performance difference from the nerfacto baseline despite the more challenging representation.
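As a minimal sketch of how pseudo-RGB images can be derived from a hyperspectral cube, the example below weights the 128 channels with assumed Gaussian response curves; the curve centres, widths, and wavelength range are illustrative assumptions, not the conversion actually used for this dataset.

import numpy as np

def gaussian_response(wavelengths_nm, center_nm, width_nm):
    # Unnormalized Gaussian spectral response curve (illustrative only).
    return np.exp(-0.5 * ((wavelengths_nm - center_nm) / width_nm) ** 2)

def hyperspectral_to_pseudo_rgb(cube, wavelengths_nm):
    # cube: (H, W, C) radiance cube; wavelengths_nm: (C,) channel centres.
    responses = np.stack([
        gaussian_response(wavelengths_nm, 610.0, 40.0),  # pseudo-R
        gaussian_response(wavelengths_nm, 540.0, 40.0),  # pseudo-G
        gaussian_response(wavelengths_nm, 465.0, 40.0),  # pseudo-B
    ], axis=1)                                           # (C, 3)
    responses /= responses.sum(axis=0, keepdims=True)    # each column sums to 1
    return cube @ responses                              # (H, W, 3)

wavelengths = np.linspace(400.0, 1000.0, 128)  # assumed channel centres
cube = np.random.rand(32, 32, 128)             # stand-in for a real capture
print(hyperspectral_to_pseudo_rgb(cube, wavelengths).shape)  # (32, 32, 3)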


Columns (left to right): sample ground truth image; nerfacto (baseline); Ours-RGB (discrete 3-channel radiance & transmittance); Ours-Cont (continuous 3-channel radiance & transmittance); Ours-Hyper (HS-NeRF, continuous 128-channel).



Depth Maps

Depth maps can reveal issues with the 3D structures of NeRFs that are not immediately apparent in the rendered images.
Qualitatively, ours is noticeably better than the RGB baselines, and is among the best of the ablations.

RGB

Comparing the depth maps against RGB NeRF baselines, we find that our hyperspectral NeRF produces much better results.
Columns (left to right): nerfacto (baseline); Ours-RGB (discrete 3-channel radiance & transmittance); Ours-Cont (continuous 3-channel radiance & transmittance); Ours-Hyper (HS-NeRF, continuous 128-channel).


Ablations

Among the ablation depth maps, ours and C1σ0P0 are consistently high quality, with the fewest spurious discontinuities in foreground surfaces.
C1σ1P0 is notably strong for the Anacampseros scene, but finds a completely degenerate solution for both the Caladium and Pinecone scenes, likely due to low contrast between the foreground and background color (black).
Columns (left to right): C1σ0P0; C1σ1P0; Cσ0P0 (ours); CσP0; C2σ2P0; CσPλ.


Point Clouds

Finally, we can also export the NeRFs to point clouds to verify that their 3D structure is reasonable. Shown below are screenshots of the point clouds for Anacampseros (left) and Caladium (right).

Ablations

Qualitatively, our chosen architecture appears among the best, and slightly different architectures are not significantly worse.

Careful observation reveals that the networks using C (including ours) tend to be the most accurate in terms of absolute intensity (note, for example, that the other networks tend to over-estimate the leaf brightnesses in the middle row). Meanwhile, the network using C2 often appears fuzzy, which suggests that mixing the spatial and wavelength information together is less efficient. Finally, although the networks using C1 generally appear slightly sharper than ours, they lack a continuous wavelength representation, so they do not interpolate wavelengths as naturally and, as mentioned, their absolute intensities tend to be slightly less accurate (e.g. the middle row's leaf brightness).
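To illustrate the distinction between the discrete (C1) and continuous (C) output heads discussed above, here is a rough sketch; the feature sizes, layer widths, and wavelength encoding are our own assumptions, not the published architecture.

import torch
import torch.nn as nn

FEAT = 64  # hypothetical per-point feature size from the NeRF backbone

class DiscreteHead(nn.Module):
    # "C1"-style head: one output per hyperspectral channel, no wavelength input.
    def __init__(self, num_channels=128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(FEAT, 128), nn.ReLU(), nn.Linear(128, num_channels))

    def forward(self, feat):                  # feat: (N, FEAT)
        return self.mlp(feat)                 # (N, num_channels)

class ContinuousHead(nn.Module):
    # "C"-style head: the (encoded) wavelength is an input, one radiance value out.
    def __init__(self, wavelength_enc_dim=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(FEAT + wavelength_enc_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat, wavelength_enc):  # (N, FEAT), (N, enc_dim)
        return self.mlp(torch.cat([feat, wavelength_enc], dim=-1))  # (N, 1)

Because the continuous head is queried with a wavelength encoding, rendering a wavelength never seen during training is simply another forward pass, which is what makes the interpolation behaviour in the super-resolution section possible.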

Fixed camera viewpoint not in training set (wavelength sweep)

This video is the easiest way to compare the performance of the different methods.

In this video, t = 0 s shows the shortest wavelength (image channel 0), and the wavelength increases progressively until t = 5 s, which shows the longest wavelength (image channel 127). This camera view was not in the training set.

Videos (left to right): ground truth; C1σ0P0; C1σ1P0; Cσ0P0 (ours); CσP0; C2σ2P0; CσPλ.

Moving camera viewpoints (6 wavelengths)

This video shows the camera moving, demonstrating that the NeRF's performance is consistently good across novel viewpoints.




This grid of videos shows different ablation variations as columns and different wavelengths as rows.

Columns (left to right): C1σ0P0; C1σ1P0; Cσ0P0 (ours); CσP0; C2σ2P0; CσPλ. Rows (top to bottom): Ch 15 (477 nm); Ch 35 (576 nm); Ch 55 (675 nm); Ch 75 (772 nm); Ch 95 (869 nm); Ch 105 (918 nm).

Moving camera viewpoints (wavelength sweep)

This video shows the camera moving while the wavelength sweeps simultaneously, demonstrating that the NeRF's performance is consistently good across novel viewpoints over all wavelengths.

In this video, t = 0 s shows the shortest wavelength (image channel 0), and the wavelength increases progressively until t = 5 s, which shows the longest wavelength (image channel 127). The camera views shown were not in the training set.

Videos (left to right): C1σ0P0; C1σ1P0; Cσ0P0 (ours); CσP0; C2σ2P0; CσPλ.

Applications

Super-resolution (Wavelength Interpolation)

Our NeRF loses almost no accuracy in predicting the full hyperspectral image, even when trained on only 1/8th of the wavelengths.
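A minimal sketch of the wavelength subsampling behind this experiment is shown below, assuming 128 evenly spaced channel centres (the exact wavelengths are illustrative). The trained model's continuous wavelength input is then queried at the held-out wavelengths and scored against the captured channels.

import numpy as np

def split_wavelengths(wavelengths_nm, stride):
    # Training wavelengths are every `stride`-th channel; the rest are held out
    # to measure how well the NeRF interpolates unseen wavelengths.
    train = wavelengths_nm[::stride]
    held_out = np.setdiff1d(wavelengths_nm, train)
    return train, held_out

wavelengths = np.linspace(400.0, 1000.0, 128)  # assumed channel centres
for stride in (1, 2, 4, 8):
    train, held_out = split_wavelengths(wavelengths, stride)
    print(f"every {stride}: {len(train)} train wavelengths, {len(held_out)} held out")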


Image Grid: Sweep hyperspectral wavelengths

Inspect the individual rendered images by wavelength and amount of interpolation.
This is the same as the video above, but in image form to allow for closer inspection.



Key: in the grid, wavelengths in the training set are distinguished from wavelengths not in the training set.

Grid: ground truth; trained on every wavelength; trained on every 2nd wavelength; trained on every 4th wavelength; trained on every 8th wavelength.

Camera Image Sensor Simulation

Hyperspectral NeRFs allow us to simulate arbitrary camera image sensors from a single reference image.

Different camera imaging sensors have different spectral responses which produce unique image qualities. Similarly, filters are often used by photographers to achieve different artistic effects. From a hyperspectral image, we theoretically have enough information to reconstruct what the image would look like if it were taken with a different camera sensor or filter.

We demonstrate results using the following procedure:
  1. Construct a hyperspectral NeRF using HS-NeRF
  2. Take a single photo with the target camera
  3. Localize the target photo's camera pose in the NeRF scene
  4. Render a hyperspectral image for that camera pose
  5. Run a least-squares optimization to find spectral response functions that make the simulated photo best match the target photo (a sketch of this step follows the list)
  6. Render the simulated photo
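A minimal sketch of the least-squares step above, assuming the simulated photo is modelled as a per-pixel linear combination of the rendered hyperspectral channels; the paper's actual optimization may add constraints (e.g. non-negativity or smoothness of the response curves) that are not shown here.

import numpy as np

def fit_spectral_response(hyperspectral_render, target_photo):
    # hyperspectral_render: (H, W, C) HS-NeRF render at the target photo's pose.
    # target_photo:         (H, W, 3) photo from the camera being simulated.
    # Returns a (C, 3) response matrix R such that render @ R approximates the photo.
    H, W, C = hyperspectral_render.shape
    A = hyperspectral_render.reshape(-1, C)    # (H*W, C)
    b = target_photo.reshape(-1, 3)            # (H*W, 3)
    R, *_ = np.linalg.lstsq(A, b, rcond=None)  # least-squares response functions
    return R

def simulate_photo(hyperspectral_render, response):
    # Apply the fitted response functions to produce the simulated photo.
    return hyperspectral_render @ response     # (H, W, 3)

Once fitted, the same response matrix can in principle be reused to simulate that sensor from other viewpoints in the NeRF.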

For each scene, we present simulated versions of an iPhone photo with various digital filters applied.
A number of different target photos (left of each pair) can be simulated by HS-NeRF (right of each pair) with excellent agreement.
The agreement is often so close that it appears as though a filter were applied to both the target and the simulation together; in fact, the filters were applied only to the target photos, and the simulations were then generated according to the procedure above.


BibTeX

@article{Chen24arxiv_HS-NeRF,
  title={Hyperspectral Neural Radiance Fields},
  author={Gerry Chen and Sunil Kumar Narayanan and Thomas Gautier Ottou and Benjamin Missaoui and Harsh Muriki and Cédric Pradalier and Yongsheng Chen},
  journal={arXiv preprint arXiv:2403.14839},
  year={2024}
}