We present an energy-efficient approach to high-resolution image acquisition based on a two-tiered system. The first tier comprises low-cost, low-power, non-actuated, extremely resource-constrained stereo image sensor platforms; the second comprises more capable but more power-hungry high-resolution imaging platforms with actuation capability. The resource-constrained platforms compute the 3D location of an object and, from it, the appropriate pan/tilt/zoom settings for the high-resolution imaging platforms. The actuated platforms then acquire high-resolution images that can be used for various recognition purposes. We present our design methodology and system architecture, and evaluate the coverage, latency, and energy tradeoffs in our system. Experimental results show that the two-tiered network significantly reduces the energy consumption and latency of high-resolution image acquisition compared with a single-tiered network while preserving the coverage of the system.
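
As an illustration of the hand-off between the two tiers, the following minimal sketch shows one way a stereo node could turn a matched pixel pair into a 3D location and then into pan/tilt/zoom settings. It is not the system's actual implementation. It assumes a rectified pinhole stereo pair, pixel coordinates measured from the principal point, a camera frame with x to the right, y down, and z forward, and a high-resolution camera whose axes are aligned with the stereo frame. The function names `triangulate` and `pan_tilt_zoom`, and the parameters `object_size_m`, `base_fov_rad`, and `fill`, are hypothetical.

```python
import math


def triangulate(x_left, x_right, y, focal_px, baseline_m):
    """Recover a 3D point (metres) from a rectified stereo match.

    Assumption: pixel coordinates are relative to the principal point,
    and the result is expressed in the left camera's frame.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("disparity must be positive for a point in front of the rig")
    z = focal_px * baseline_m / disparity  # depth from similar triangles
    x = x_left * z / focal_px              # back-project the pixel column
    y_m = y * z / focal_px                 # back-project the pixel row
    return (x, y_m, z)


def pan_tilt_zoom(point, camera_pos, object_size_m, base_fov_rad, fill=0.8):
    """Compute pan and tilt (radians) and a zoom factor for an actuated camera.

    Assumption: the camera at camera_pos shares the stereo frame's axes,
    so only a translation separates the two frames. Positive tilt points
    down because y is taken to point down.
    """
    dx, dy, dz = (p - c for p, c in zip(point, camera_pos))
    pan = math.atan2(dx, dz)
    tilt = math.atan2(dy, math.hypot(dx, dz))
    distance = math.sqrt(dx * dx + dy * dy + dz * dz)
    # Angle subtended by the object at the camera.
    subtended = 2.0 * math.atan2(object_size_m / 2.0, distance)
    # Narrow the field of view until the object fills `fill` of it,
    # but never ask for a view wider than the unzoomed one.
    target_fov = min(subtended / fill, base_fov_rad)
    zoom = math.tan(base_fov_rad / 2.0) / math.tan(target_fov / 2.0)
    return pan, tilt, zoom


if __name__ == "__main__":
    location = triangulate(x_left=40, x_right=20, y=0, focal_px=400, baseline_m=0.1)
    print(location)
    print(pan_tilt_zoom(location, (0.0, 0.0, 0.0), 0.5, math.radians(60)))
```

In this sketch the only data passed from the low-power tier to the actuated tier is three numbers per target, which is the kind of small message a resource-constrained node can afford to send.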