| Presentation 140-5 at PITTCON-2010 |
Kernel-Based One-Class Nearest Neighbor Approach for Identification of Chlorinated Solvents

Shehroz S. Khan, National University of Ireland, Galway

In this paper, we present a novel technique for qualitative analysis of molecular spectra, specifically Raman spectra, for scenarios in which counter-examples are inadequately represented in the analysis dataset. We demonstrate the technique on chlorinated solvents. Existing algorithms identify compounds with good accuracy when spectra from all relevant categories (e.g. chlorinated and non-chlorinated solvents) are available and statistically well distributed. A problem arises when one category is either completely unavailable or poorly sampled: under such conditions, conventional methods fail to provide satisfactory results. This type of situation can, however, be handled using one-class classification methods. In the machine learning and data mining community, kernel-based classification is emerging as a popular technique. In this work, we propose a Kernel-Based One-Class Nearest Neighbor (KB-OCNN) algorithm to identify chlorinated solvents in the absence of non-chlorinated solvents. In the standard One-Class Nearest Neighbor (OCNN) approach, Euclidean distance is typically used as the measure of dissimilarity between two spectra. In KB-OCNN, we use a kernel function as the distance metric instead. We test our method using standard kernels and also using customized kernels developed for spectral data. We observe that, for the chlorinated solvent data, our proposed method provides better identification rates than the conventional one-class nearest neighbor approach. Our approach also helps in choosing a suitable kernel and its parameters for qualitative analysis of the spectra.
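
The idea of replacing the Euclidean distance in OCNN with a kernel-derived distance can be illustrated with a minimal sketch. The abstract does not give the exact formulation, so everything below is an assumption made for illustration: the distance is taken to be the one induced by the kernel in feature space, d(x, y)^2 = k(x, x) - 2 k(x, y) + k(y, y); the kernel is a standard RBF kernel with a hypothetical parameter gamma; and the acceptance rule is the common nearest-neighbor ratio test with a threshold of 1. The function names (rbf_kernel, kernel_distance, kb_ocnn_predict) and the toy data are invented here and are not taken from the presentation.

```python
import numpy as np


def rbf_kernel(a, b, gamma=1.0):
    """Standard RBF kernel matrix between rows of a and rows of b (gamma is an assumed parameter)."""
    diff = a[:, None, :] - b[None, :, :]
    return np.exp(-gamma * np.sum(diff ** 2, axis=-1))


def kernel_distance(a, b, kernel):
    """Assumed kernel-induced distance: sqrt(k(x,x) - 2 k(x,y) + k(y,y))."""
    k_aa = np.diag(kernel(a, a))
    k_bb = np.diag(kernel(b, b))
    sq = k_aa[:, None] - 2.0 * kernel(a, b) + k_bb[None, :]
    # Clip tiny negative values caused by floating-point rounding.
    return np.sqrt(np.maximum(sq, 0.0))


def kb_ocnn_predict(train, test, kernel, threshold=1.0):
    """Accept a test sample if it is no farther from its nearest training
    sample than that training sample is from its own nearest neighbor
    (scaled by an assumed threshold)."""
    d_train = kernel_distance(train, train, kernel)
    np.fill_diagonal(d_train, np.inf)      # ignore self-distances
    nn_of_train = d_train.min(axis=1)      # d2 for every training sample

    d_test = kernel_distance(test, train, kernel)
    nn_idx = d_test.argmin(axis=1)         # nearest training sample per test sample
    d1 = d_test[np.arange(len(test)), nn_idx]
    d2 = nn_of_train[nn_idx]
    return d1 <= threshold * d2


# Toy 2-D vectors standing in for spectra of the target class (not real data).
train = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1]])
test = np.array([[0.05, 0.05], [3.0, 3.0]])
print(kb_ocnn_predict(train, test, rbf_kernel))
```

In this toy setting the first test vector lies inside the training cluster and is accepted, while the second lies far away and is rejected. Swapping rbf_kernel for another kernel function with the same call signature is one way such a sketch could be used to compare kernels and their parameters.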



