Byte-addressable non-volatile memories (NVM) have been envisioned as a new tier in computer systems, providing memory-like performance and storage-level capacity and persistence. Because of the relatively high latency and low bandwidth of NVM (compared with dynamic random-access memory (DRAM)), NVM is often paired with DRAM to build a heterogeneous main memory system (HMS). As a result, application data must be carefully placed on NVM and DRAM for the best performance. Moreover, in an NVM-based HMS, data on NVM survives system crashes because of the non-volatile nature of NVM. However, because of volatile caches and the processor's reordering of instructions, data must be logged in failure-atomic transactions and explicitly flushed from caches into NVM to ensure consistency and correctness across crashes, which can cause large runtime overhead.
This dissertation focuses on building lightweight runtime systems on NVM-based HMS to effectively manage data placement and crash consistency. The dissertation first studies the data placement of two types of high performance computing (HPC) applications on NVM-based HMS: message passing interface (MPI) programs and task-parallel programs. It presents the Unimem and Tahoe runtimes, which implement automatic and transparent data placement on NVM-based HMS for MPI-based applications and task-parallel applications, respectively.
Failure-atomic transactions are a critical mechanism for accessing and manipulating data on NVM with crash consistency. This dissertation then investigates performance problems in common transaction implementations on real NVM hardware and highlights the importance of considering NVM architecture characteristics for transaction performance. The dissertation presents ArchTM, an architecture-aware NVM transaction system.
Finally, this dissertation analyzes the cache line flushing (CLF) mechanism, a fundamental building block for programming NVM to ensure crash consistency. The dissertation designs and implements Ribbon, which optimizes CLF through decoupled concurrency control and proactive CLF to change cache line status. Ribbon also uses cache line coalescing as an application-specific solution for applications with low dirtiness in flushed cache lines.