Template:Database Compression Between RAM and CPU Cache
From What The Wiki?!
Database Compression Between RAM and CPU Cache
Hacking the Memory Hierarchy
Data-intensive query processing tasks like data mining, scientific data analysis, and decision support can leave a database system severely I/O bound, even when common RAID configurations are used. Traditionally, this problem has been tackled by adding more and more disks, connected through expensive interconnect networks. This brute-force approach results in systems of which the price is dominated by the cost of their disk subsystems and a lot of disk space is wasted as disks are only added to gain bandwidth.A more subtle and cost-effective solution can be found in data compression, which has the potential to alleviate the I/O bottleneck. However, traditional algorithms like Huffman coding, Arithmetic coding and Lempel-Ziv style dictionary methods are not suited for this goal due to high processing overheads.In order to be of practical value, even on common RAID configurations,decompression algorithms should be capable of producing roughly one byteper CPU cycle on modern hardware, or around three gigabytes per second.To achieve this, a case is made for novel,light-weight compression algorithms, which exploit both the structure of theunderlying database and the characteristics of modern CPUs.In general, performance is prefered over compression ratio, and algorithmsshould strive to extractmaximum instructions-per-clock-cycle (IPC) from modern CPUs.Furthermore, to rule out main memory bottlenecks, candidate algorithms shouldallow for incremental, into-cache decompression.This presentation introduces three novel compression schemes (PFOR,PFOR-DELTA, and PDICT) that are designed towards these goals.Experimental results show that these methodscan significantly alleviate I/O-, and sometimes evenmain memory bottlenecks, thereby effectively increasing the performance of todays hierarchical memorybased systems.Target audience: high performance computing, large databases, hardware architecture.Skills that are a pre: modern CPU internals, hierarchical memory design, C programming
