Being embedded in the physical world, sensor networks present a wide range of bugs and misbehavior qualitatively different from those in most distributed systems. Unfortunately, due to resource constraints, programmers must investigate these bugs with only limited visibility into the application. This paper presents the design and evaluation of Sympathy, a tool for detecting and debugging failures in sensor networks. Sympathy has selected metrics that enable efficient failure detection, and includes an algorithm that root-causes failures and localizes their sources in order to reduce overall failure notifications and point the user to a small number of probable causes. We describe Sympathy and evaluate its performance through fault injection and by debugging an active application, ESS, in simulation and deployment. We show that for a broad class of data gathering applications, it is possible to detect and diagnose failures by collecting and analyzing a minimal set of metrics at a centralized sink. We have found that there is a tradeoff between notification latency and detection accuracy; that additional metrics traffic does not always improve notification latency; and that Sympathy’s process of failure localization reduces primary failure notifications by at least 50% in most cases.
contributors
- D Estrin
Author
- Eddie Kohler
Author
- Rahul Kapur
Author
- Lewis Girod
Author
- Nithya Ramanathan
Author
- Kevin Chang
Author