unmixing. In this context, Raman measurements x_i ∈ R^b are treated as mixtures x_i = F(M, α_i) of a set of n unknown endmember components M ∈ R^{n×b} based on their relative abundances α_i ∈ R^n in a given measurement x_i. Blind unmixing aims to decompose a set of Raman measurements X ∈ R^{m×b} (i.e. a given scan) into endmember components M and their relative abundances A ∈ R^{m×n}.

To achieve this, we trained an autoencoder model 𝒜 consisting of an encoder module ℰ and a decoder module 𝒟. The encoder was responsible for mapping input spectra x into latent-space representations z = ℰ(x), and the decoder was responsible for mapping these latent representations into reconstructions of the original input x̂ = 𝒟(z) = 𝒟(ℰ(x)) = 𝒜(x). The model was trained in an unsupervised manner by minimising the reconstruction error between the input x and the output x̂. During this process, the model was guided to learn the endmember components M and their relative abundances A through the introduction of related physical constraints. Below, we provide additional details about the developed architecture and training procedure. For more information about hyperspectral unmixing, the reader is pointed to previous works by Keshava and Mustard [55] and Li et al. [56]. For more information about autoencoders, the reader is pointed to the work of Goodfellow et al. [81].

The encoder ℰ comprised two separate blocks applied sequentially. The first was a multi-branch convolutional block comprising four parallel convolutional layers with kernel sizes of 5, 10, 15 and 20, designed to capture patterns at multiple spectral scales. Each convolutional layer contained 32 filters with ReLU activation, He initialisation [82] and 'same' padding. Batch normalisation [83] and dropout [84] with a rate of 0.2 were applied to each convolutional layer to improve training stability and generalisation. The outputs of the four convolutional layers were merged channel-wise through a fully connected layer to yield an output of dimension matching that of the input spectrum. The rationale behind this was to transform intensity values into representations that capture local spectral features (e.g. peak shape, width, local neighbourhood) and thus promote better generalisability. The second part of the encoder was a fully connected dimensionality-reduction block, applied to learn patterns between the learnt spectral features. This block comprised a series of fully connected layers of sizes 256, 128, 64 and 32 with He initialisation and ReLU activation. Batch normalisation and dropout (rate of 0.5) were also applied at each fully connected layer. The block was followed by a final fully connected layer (Xavier uniform initialisation [85]) that reduced the final 32 features to a latent space of size n. The number n was treated as a hyperparameter that encodes the number of endmembers to extract, with latent representations treated as abundance fractions. To improve interpretation, non-negativity was enforced in the latent space using a 'softly-rectified' hyperbolic tangent function f(x) = (1/γ) log(1 + e^{γ·tanh(x)}), with γ = 10, as we previously reported [53].
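To make the architecture concrete, the following is a minimal Keras sketch of the encoder as we read the description above. It is illustrative only: the channel-wise merge is one plausible reading of "merged through a fully connected layer to yield an output of dimension matching the input spectrum", the names build_encoder, soft_rectified_tanh, n_bands and n_endmembers are our own, and the decoder and the physical constraints mentioned above are not shown.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def soft_rectified_tanh(x, gamma=10.0):
    """'Softly-rectified' tanh: f(x) = (1/gamma) * log(1 + exp(gamma * tanh(x)))."""
    return tf.math.log(1.0 + tf.exp(gamma * tf.tanh(x))) / gamma


def build_encoder(n_bands, n_endmembers):
    inputs = layers.Input(shape=(n_bands, 1))

    # Multi-branch convolutional block: four parallel Conv1D layers with
    # kernel sizes 5, 10, 15 and 20 to capture multi-scale spectral patterns.
    branches = []
    for k in (5, 10, 15, 20):
        b = layers.Conv1D(32, kernel_size=k, padding='same', activation='relu',
                          kernel_initializer='he_normal')(inputs)
        b = layers.BatchNormalization()(b)
        b = layers.Dropout(0.2)(b)
        branches.append(b)

    # Merge the branches channel-wise, then project back to the input
    # dimensionality with a fully connected layer (one plausible reading).
    merged = layers.Concatenate(axis=-1)(branches)
    merged = layers.Flatten()(merged)
    features = layers.Dense(n_bands, activation='relu',
                            kernel_initializer='he_normal')(merged)

    # Fully connected dimensionality-reduction block (256 -> 128 -> 64 -> 32).
    x = features
    for units in (256, 128, 64, 32):
        x = layers.Dense(units, activation='relu',
                         kernel_initializer='he_normal')(x)
        x = layers.BatchNormalization()(x)
        x = layers.Dropout(0.5)(x)

    # Final projection to n latent abundances; non-negativity is enforced
    # by the softly-rectified tanh activation.
    z = layers.Dense(n_endmembers, kernel_initializer='glorot_uniform')(x)
    z = layers.Lambda(soft_rectified_tanh)(z)
    return Model(inputs, z, name='encoder')
```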
Using convolution along the spectral axis seems well motivated, since any position in a spectrum should carry information about its neighbouring wavenumbers. By the same logic, one would expect shared information across neighbouring pixels in the imaging plane: pixels on the boundary between the nucleus and its surroundings, for example, are likely to have different spectra from those fully within the nucleus. Combining convolutions across both physical and spectral space (a truly hyperspectral convolutional autoencoder) therefore seems natural, and would leverage the spatial information to further refine the endmember definitions. Did you try something like this?
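For concreteness, here is one way the suggested spatial-spectral encoder could look. This is not something reported in the paper, purely a sketch of the suggestion: the scan is treated as an (H, W, bands) cube and convolved jointly over all three axes with Conv3D, and the per-pixel softmax is just a stand-in abundance constraint (the softly-rectified tanh above would serve equally well).

```python
import tensorflow as tf
from tensorflow.keras import layers, Model


def build_spatial_spectral_encoder(height, width, n_bands, n_endmembers):
    # Treat the scan as a 3-D volume (rows, cols, bands) with one channel,
    # so kernels span neighbouring pixels and neighbouring wavenumbers jointly.
    inputs = layers.Input(shape=(height, width, n_bands, 1))

    # Joint spatial-spectral kernels: 3x3 over pixels, 7 over wavenumbers
    # (kernel sizes are illustrative, not tuned values).
    x = layers.Conv3D(16, kernel_size=(3, 3, 7), padding='same',
                      activation='relu')(inputs)
    x = layers.Conv3D(32, kernel_size=(3, 3, 7), padding='same',
                      activation='relu')(x)

    # Fold the spectral axis into the channel axis, then map each pixel
    # to n_endmembers abundance fractions with a 1x1 convolution.
    x = layers.Reshape((height, width, n_bands * 32))(x)
    abundances = layers.Conv2D(n_endmembers, kernel_size=1,
                               activation='softmax')(x)  # sum-to-one per pixel
    return Model(inputs, abundances, name='spatial_spectral_encoder')
```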