
# Foveated compressive imaging for low power vehicle fingerprinting and tracking in aerial imagery

This work is supported by the Defense Advanced Research Projects Agency and SPAWAR Systems Center Pacific under Contract No. N66001-11-C-4001.
We describe a foveated compressive sensing approach for image analysis applications that uses knowledge of the task to be performed to reduce the number of required sensor measurements and the sensor size, weight, and power (SWAP) compared to conventional Nyquist sampling and compressive sensing-based approaches. Our Compressive Optical Foveated Architecture (COFA) adapts the dictionary and the compressive measurements to structure and sparsity in the signal, task, and scene by reducing the mutual coherence of the measurements and dictionary and by increasing sparsity, using principles of actionable information and foveated compressive sensing. Actionable information is used to extract task-relevant regions of interest (ROIs) from a low-resolution scene analysis by eliminating the effects of nuisances for occlusion and anomalous motion detection. From the extracted ROIs, preferential measurements are taken using foveation as part of the compressive sensing adaptation process. The task-specific measurement matrix is optimized using a novel saliency-weighted coherence minimization with respect to the learned signal dictionary, which incorporates the relative usage of the atoms in the dictionary. We use a patch-based method to learn the signal priors: a tree-structured dictionary of image patches is learned using K-SVD, and any given image patch can be sparsely represented within the tree structure. We have implemented COFA in an end-to-end simulation of a vehicle fingerprinting task for aerial surveillance using foveated compressive measurements adapted to hierarchical ROIs consisting of background, roads, and vehicles. Our results show a 113× reduction in measurements over conventional sensing and a 28× reduction over compressive sensing using random measurements.
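To make the coherence criterion in the abstract concrete, the following is a minimal sketch. `mutual_coherence` computes the standard mutual coherence of the effective dictionary formed by a measurement matrix and a dictionary. `weighted_coherence` is a hypothetical usage-weighted variant in which each atom pair is weighted by the product of its atom weights. The abstract does not give the saliency-weighted objective in closed form, so the weighting, the function names, and the matrix sizes here are assumptions for illustration only.

```python
import numpy as np

def _normalized_effective_dictionary(phi, d):
    """Return phi @ d with every column scaled to unit Euclidean norm."""
    e = phi @ d
    return e / np.linalg.norm(e, axis=0, keepdims=True)

def mutual_coherence(phi, d):
    """Largest absolute inner product between distinct normalized columns of phi @ d."""
    e = _normalized_effective_dictionary(phi, d)
    g = np.abs(e.T @ e)
    np.fill_diagonal(g, 0.0)  # ignore each column's product with itself
    return float(g.max())

def weighted_coherence(phi, d, w):
    """Assumed weighted variant (not from the paper): average of squared
    off-diagonal Gram entries, each weighted by w[i] * w[j]."""
    e = _normalized_effective_dictionary(phi, d)
    g2 = (e.T @ e) ** 2
    ww = np.outer(w, w)
    off = ~np.eye(len(w), dtype=bool)  # mask selecting off-diagonal pairs
    return float((ww[off] * g2[off]).sum() / ww[off].sum())

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((16, 64))   # hypothetical 16 measurements of 64-pixel patches
    d = rng.standard_normal((64, 128))    # hypothetical dictionary with 128 atoms
    usage = rng.random(128)               # hypothetical relative atom usage
    print(mutual_coherence(phi, d), weighted_coherence(phi, d, usage))
```

With uniform weights the weighted variant reduces to the mean squared off-diagonal Gram entry, so atoms that are used more often contribute more only when the weights differ.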

Mathematics Subject Classification: Primary: 58F15, 58F17; Secondary: 53C35.


Figure 1.  (Top) Flowchart of overall approach for adaptation of foveated measurements and signal representations. (Bottom) Details of online scene-adaptive reconstruction for the vehicle fingerprinting task. Left: Dynamic input scene. Middle: Reconstructed low-resolution background scene that is used to detect ROIs using anomalous motion detection. Right: Reconstructed high-resolution ROIs using adapted dictionary overlaid with low-resolution background. This representation reduces the total number of measurements $M = M_{\rm Backg}+ M_{\rm ROI}$ needed for the task.

Figure 2.  The Manx Shearwater seabird [19] has multiple hierarchical levels of fovea (Right) for acquisition and tracking.

Figure 3.  Our foveated compressive sensing optical architecture generates a composite image frame consisting of low-resolution background contextual information and the high-resolution task-relevant regions of interest (ROIs). A fixed budget of $M$ measurements can be adaptively divided between the background and ROIs, allowing background resolution to be traded for higher resolution ROIs. By adapting both the measurement matrix and the dictionary to the ROIs, the number of measurements needed for a given level of task performance can be greatly reduced
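The budget trade-off described in the Figure 3 caption can be sketched as simple bookkeeping on $M = M_{\rm Backg} + M_{\rm ROI}$. The function name, the rounding rule, and the example numbers below are assumptions for illustration, not the paper's allocation policy.

```python
def split_budget(total_m, n_pixels, roi_pixels, roi_ratio):
    """Split a fixed budget of total_m measurements between ROIs and background.

    roi_ratio is the desired M_ROI / N_ROI for the ROI pixels. Whatever remains
    of the budget goes to the background.
    Returns (m_roi, m_backg, background_ratio).
    """
    if not 0 <= roi_pixels < n_pixels:
        raise ValueError("roi_pixels must be in [0, n_pixels)")
    m_roi = min(total_m, round(roi_pixels * roi_ratio))  # ROIs are served first
    m_backg = total_m - m_roi
    return m_roi, m_backg, m_backg / (n_pixels - roi_pixels)

if __name__ == "__main__":
    # Hypothetical scene: 160x120 frame, 4096 ROI pixels sampled at 25%.
    print(split_budget(total_m=3025, n_pixels=160 * 120, roi_pixels=4096, roi_ratio=0.25))
```

Raising `roi_ratio` or `roi_pixels` lowers the background sampling ratio, which is the resolution trade described in the caption.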

Figure 4.  Simulation results comparing conventional imaging, conventional CS imaging, and foveated CS imaging. The conventional CS imaging reconstructs the image from random DCT measurements via $\ell_1$-minimization. Imaging results were all obtained using 3025 measurements of the scene, but foveated compressive sensing achieved much higher effective resolution in the region of interest (ROI) than conventional imaging while also reconstructing the context around the ROI.
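As a rough illustration of the conventional CS baseline named in the Figure 4 caption, the sketch below recovers a sparse vector from random measurements by $\ell_1$-regularized least squares. It makes several assumptions that are not from the paper: Gaussian measurements instead of random DCT measurements, a signal that is sparse in the identity basis, and iterative soft thresholding (ISTA) as a stand-in solver. All sizes and parameters are hypothetical.

```python
import numpy as np

def soft_threshold(v, t):
    """Elementwise soft thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista(a, y, lam=0.01, n_iter=3000):
    """Minimize 0.5 * ||a @ x - y||^2 + lam * ||x||_1 by iterative soft thresholding."""
    step = 1.0 / np.linalg.norm(a, 2) ** 2  # 1 / Lipschitz constant of the gradient
    x = np.zeros(a.shape[1])
    for _ in range(n_iter):
        x = soft_threshold(x - step * (a.T @ (a @ x - y)), step * lam)
    return x

def demo(n=128, m=64, k=5, seed=0):
    """Recover a k-sparse length-n signal from m random measurements.

    Returns the relative reconstruction error.
    """
    rng = np.random.default_rng(seed)
    x_true = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x_true[support] = rng.choice([-1.0, 1.0], size=k) * (1.0 + rng.random(k))
    a = rng.standard_normal((m, n)) / np.sqrt(m)  # random Gaussian measurement matrix
    x_hat = ista(a, a @ x_true)
    return float(np.linalg.norm(x_hat - x_true) / np.linalg.norm(x_true))

if __name__ == "__main__":
    print("relative error:", demo())
```

The regularization weight `lam` introduces a small shrinkage bias in the recovered coefficients, so the relative error is small but not exactly zero.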

Algorithm 1.  Iterative reweighted subspace minimization algorithm that we use to find salient regions in images.

Figure 5.  Detection of moving vehicle ROIs in two frames using Actionable Saliency despite camera motion [10].

Figure 6.  Detection of moving vehicle ROIs from images reconstructed from different numbers of compressive sensing measurements.

Figure 7.  Top: Example images of cars from OIRDS [18] aerial views used for training our dictionary. Bottom left: Learned tree-structured dictionary with example atoms from each level of the tree. Bottom right: Distribution of coefficients over the training set

Figure 8.  Patch-based compressive optical foveated architecture (COFA) optical system

Figure 9.  Hierarchical layered regions of interest (ROIs) for the vehicle tracking and fingerprinting task. Layer 1 is the background, Layer 2 is the road, and Layer 3 contains the moving vehicles on the road

Figure 10.  Contours of the minimax noise sensitivity $M^*(\delta,\rho)$ in the $(\delta,\rho)$ plane. $\delta=M/N$ is the subsampling rate and $\rho=K/M$ is the sparsity. The dotted black curve graphs the phase boundary $M^*(\delta,\rho_{\rm MSE}(\delta))$. Above this curve, $M^*(\delta,\rho)=\infty$. The colored lines represent level sets of $M^*(\delta,\rho)$. (From [8])

Figure 11.  Reduction in measurements needed over conventional compressive sensing as function of ROI resolution and size for 2 and 3 ROI layers

Figure 12.  Reconstruction SNR for CSUAV scenes with 1$\times$, 2$\times$, and 4$\times$ downsampling
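The captions report reconstruction SNR without stating its definition. The sketch below assumes the common definition, signal energy over error energy in decibels, which may differ from the exact metric used in the paper. The function name is hypothetical.

```python
import math

def reconstruction_snr_db(original, reconstructed):
    """SNR in dB: 10 * log10(sum(x^2) / sum((x - x_hat)^2)) over flattened pixel values."""
    signal = sum(v * v for v in original)
    noise = sum((a - b) ** 2 for a, b in zip(original, reconstructed))
    if noise == 0:
        return math.inf  # perfect reconstruction
    return 10.0 * math.log10(signal / noise)

if __name__ == "__main__":
    print(reconstruction_snr_db([3.0, 4.0], [3.0, 3.5]))
```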

Figure 13.  Example reconstructed CSUAV frames using the wmc + tree method. Sufficient resolution is maintained with $25\%$ of ($160\times 120$) measurements or $1/64$ of the number of Nyquist samples to detect ROIs corresponding to moving vehicles in the scene
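The measurement counts in the Figure 13 caption can be checked with a short calculation. The $640\times 480$ full-resolution frame size is inferred from the stated ratios and is an assumption, not a figure quoted from the paper.

$$M = 0.25 \times (160 \times 120) = 4800, \qquad N_{\rm Nyquist} = 64 \times 4800 = 307200 = 640 \times 480.$$

Under this assumption, the $160\times 120$ grid corresponds to $4\times$ downsampling of the full frame in each dimension.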

Figure 14.  Left, Middle: Reconstruction SNR for vehicles displayed graphically and numerically. Right: Example reconstructed vehicles vs. number of measurements and measurement/dictionary types

Figure 15.  Fingerprinting task performance results for reconstructed vehicle windows from CSUAV motion imagery. Baseline performance on original input windows is 76.17$\%$

Figure 16.  COFA simulation framework for vehicle fingerprinting. For simplicity, Layer 2 (road ROIs) is not shown

Figure 17.  Left: 3-layer ROI hierarchy for COFA pipeline. Right: Multi-resolution composite reconstruction of a CSUAV video frame. Note the variable resolution in the patches corresponding to different ROI types. The Car ROIs have the highest resolution

Figure 18.  Reconstruction SNR and noise sensitivity for CSUAV Layer 1 (Background). The results are averaged over all $16\times 16$ patches in 50 frames of CSUAV-11 video. Non-random measurements and structured dictionary resulted in 4$\times$ fewer measurements for the same SNR compared to random measurements. Left: Reconstruction SNR (dB) vs. measurements percentage $(M_{\rm ROI}/N_{\rm ROI})$. Right: Reconstruction SNR (dB) vs. added measurement noise level ($\%$) with fixed $6.25\%$ of measurements of Layer 1

Figure 19.  Reconstruction SNR and noise sensitivity for CSUAV Layer 2 (Road). The results are averaged over all $16\times 16$ patches in 50 frames of CSUAV-11 video. Non-random measurements and structured dictionary resulted in $>8\times$ fewer measurements for the same SNR compared to random measurements. Left: Reconstruction SNR (dB) vs. measurements percentage $(M_{\rm ROI}/N_{\rm ROI})$. Right: Reconstruction SNR (dB) vs. added measurement noise level ($\%$) with fixed $6.25\%$ of measurements of Layer 2

Figure 20.  Reconstruction SNR and noise sensitivity for CSUAV Layer 3 (Cars). The results are averaged over all $16\times 16$ patches in 50 frames of CSUAV-11 video. Non-random measurements and structured dictionary resulted in $4\times$ fewer measurements for the same SNR compared to random measurements. Left: Reconstruction SNR (dB) vs. measurements percentage $(M_{\rm ROI}/N_{\rm ROI})$. Right: Reconstruction SNR (dB) vs. added measurement noise level ($\%$) with fixed $25\%$ measurements of Layer 3

Figure 21.  Vehicle fingerprinting performance and noise sensitivity results for 3-layer pipeline. Left: Correct identification vs. measurements percentage $(M_{\rm ROI}/N_{\rm ROI})$. Right: Correct identification vs. added measurement noise level ($\%$) with fixed $25\%$ measurements of Layer 3

Table 1.  Tested methods and reconstruction algorithms

| Method | Measurement | Dictionary |
| --- | --- | --- |
| rand + flat | random Gaussian orthonormal measurements | (flat) ksvd dictionary |
| rand + tree | random Gaussian orthonormal measurements | hierarchical (tree) dictionary |
| mc + flat | minimum coherence measurements | (flat) ksvd dictionary |
| mc + tree | minimum coherence measurements | hierarchical (tree) dictionary |
| wmc + tree | weighted minimum coherence measurements | hierarchical (tree) dictionary |

References

[1] M. Aharon, M. Elad and A. Bruckstein, K-SVD: An algorithm for designing overcomplete dictionaries for sparse representation, IEEE Transactions on Signal Processing, 54 (2006), 4311-4322.

[2] A. Ayvaci, M. Raptis and S. Soatto, Occlusion Detection and Motion Estimation with Convex Optimization, Neural Information Processing Systems, 2010.

[3] A. Bruckstein, D. Donoho and M. Elad, From sparse solutions of systems of equations to sparse modeling of signals and images, SIAM Review, 51 (2009), 34-81. doi: 10.1137/060657704.

[4] E. Candès and T. Tao, Decoding by linear programming, IEEE Trans. Inform. Theory, 51 (2005), 4203-4215. doi: 10.1109/TIT.2005.858979.

[5] I. Ciocoiu, Foveated compressed sensing, Proc. of Europe. Conf. on Circuit Theory and Design, (2011), 29-32. doi: 10.1109/ECCTD.2011.6043336.

[6] Columbus surrogate unmanned aerial vehicle (CSUAV) dataset, United States Air Force Research Lab (AFRL).

[7] J. P. Curzan, C. R. Baxter and M. A. Massie, Variable acuity imager with dynamically steerable, programmable superpixels, Infrared Technology and Applications, Proc. SPIE, 4820 (2003), p318. doi: 10.1117/12.451183.

[8] D. Donoho, A. Maleki and A. Montanari, Noise sensitivity phase transition in compressed sensing, IEEE Transactions on Information Theory, 57 (2011), 6920-6941. doi: 10.1109/TIT.2011.2165823.

[9] J. Duarte-Carvajalino and G. Sapiro, Learning to sense sparse signals: Simultaneous sensing matrix and sparsifying dictionary optimization, IEEE Transactions on Image Processing, 18 (2009), 1395-1408. doi: 10.1109/TIP.2009.2022459.

[10] G. Georgiadis, A. Ayvaci and S. Soatto, Actionable Saliency Detection, Proc. of CVPR, 2012.

[11] Z. Harmany, A. Oh, R. Marcia and R. Willett, Motion-adaptive compressive coded apertures, Proc. of SPIE, 8165 (2011), 1-5. doi: 10.1117/12.892726.

[12] D. Heeger and A. Jepson, Subspace methods for recovering rigid motion, Intl. J. of Comp. Vis., 7 (1992), 95-117.

[13] InView Shortwave Infrared (SWIR) Cameras, http://inviewcorp.com/products/shortwave-infrared-swir-cameras/.

[14] R. Jenatton, J. Mairal, G. Obozinski and F. Bach, Proximal Methods for Sparse Hierarchical Dictionary Learning, J. Machine Learning Research, 2011.

[15] R. Larcom and T. Coffman, Foveated image formation through compressive sensing, Proc. of Southwest Symp. Image Anal. Interp., (2010), 145-148. doi: 10.1109/SSIAI.2010.5483896.

[16] T. Mundhenk, K. Ni, K. Kim and Y. Owechko, Detection of unknown targets from aerial camera and extraction of simple object fingerprints for the purpose of target reacquisition, Proc. of SPIE, 8301 (2012), 1-14. doi: 10.1117/12.906491.

[17] S. Soatto, Steps Towards a Theory of Visual Information, Textbook Draft.

[18] A. Soni and J. Haupt, Efficient adaptive compressive sensing using sparse hierarchical learned dictionaries, Proc. of ASILOMAR, (2011), 1250-1254. doi: 10.1109/ACSSC.2011.6190216.

[19] P. D. Sturkie, Sturkie's Avian Physiology, 5th Edition, Academic Press, San Diego.

[20] N. Sundaram, T. Brox and K. Keutzer, Dense point trajectories by GPU-accelerated large displacement optical flow, in Computer Vision – ECCV 2010, Lecture Notes in Computer Science, vol. 6311, (2010), 438-451. doi: 10.1007/978-3-642-15549-9_32.

[21] F. Tanner, B. Colder, C. Pullen, D. Heagy, M. Eppolito, V. Carlan, C. Oertel and P. Sallee, Overhead Imagery Research Data Set: An Annotated Data Library and Tools to aid in the Development of Computer Vision Algorithms, Proc. of IEEE Applied Imagery Pattern Rec. Workshop, 2009. doi: 10.1109/AIPR.2009.5466304.

[22] L. Zelnik-Manor, K. Rosenblum and Y. Eldar, Sensing matrix optimization for block-sparse decoding, IEEE Transactions on Signal Processing, 59 (2011), 4300-4312. doi: 10.1109/TSP.2011.2159211.
