GBase 8a’s columnar engine uses DataCells (DCs) as its fundamental I/O unit. A full DC holds 65,536 rows of a single column, and the trailing DC that has not yet filled up remains uncompressed by design. This architecture has profound and largely positive effects on data loading, query performance, and compression. It is a deliberate trade‑off tailored for analytical workloads.
Impact on Data Loading
- Ultra‑fast bulk writes: Organising data into 65,536‑row DCs turns scattered inserts into massive sequential writes, drastically cutting disk seeks. Documented load speeds exceed 30 TB/hour.
- Append‑only tail: New data always lands in the uncompressed tail of the current DC without touching existing DCs, making insertion extremely lightweight.
- Trade‑off: Data that doesn’t fill a full DC stays uncompressed and misses out on bulk compression and optimal I/O until a full DC is accumulated.
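To make the full‑DC versus tail split concrete, here is a minimal sketch of the arithmetic. The function name `split_into_dcs` is hypothetical and only illustrates the 65,536‑row grouping described above; it is not GBase 8a code.

```python
DC_ROWS = 65_536  # rows per full DataCell, as stated in the article


def split_into_dcs(row_count: int) -> tuple[int, int]:
    """Return (full_dcs, tail_rows) for a column holding row_count rows.

    full_dcs  -- DCs that are complete and therefore eligible for compression
    tail_rows -- rows left over in the uncompressed tail
    """
    if row_count < 0:
        raise ValueError("row_count must be non-negative")
    return divmod(row_count, DC_ROWS)


full_dcs, tail_rows = split_into_dcs(1_000_000)
print(full_dcs, tail_rows)  # 15 16960
```

Under these assumptions, a one‑million‑row load yields 15 full DCs, and 16,960 rows wait in the tail until later loads fill it.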
Impact on Query Performance
- Column‑level I/O: Only the columns referenced in a query trigger I/O; untouched columns are never read.
- Smart index pruning: Every DC carries a lightweight index (min, max, null count). The optimizer compares these bounds with the query predicates and, when the predicate range cannot overlap the DC's min/max range, skips all 65,536 rows of that DC. This dramatically reduces the data volume that needs to be decompressed and processed.
- Vectorized processing: Reading 65,536 values of the same column into a contiguous memory block fits perfectly with modern CPU SIMD instructions, boosting aggregation and filter throughput.
- Trade‑off: A point query may have to read a whole column DC to fetch a single value, which causes some I/O amplification. For analytical queries that touch only a few columns, this is usually still cheaper than scanning full rows in a row‑store engine.
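The min/max pruning idea can be sketched in a few lines. This is an illustrative model only: `DCStats`, `build_stats`, `may_match_range`, and `dcs_to_scan` are hypothetical names, and the real engine's index format and optimizer logic are not shown in this article.

```python
from dataclasses import dataclass
from typing import Optional, Sequence

DC_ROWS = 65_536  # rows per full DataCell, as stated in the article


@dataclass(frozen=True)
class DCStats:
    """Hypothetical per-DC summary: min, max, and null count."""
    min_value: Optional[int]
    max_value: Optional[int]
    null_count: int


def build_stats(values: Sequence[Optional[int]]) -> DCStats:
    present = [v for v in values if v is not None]
    if not present:  # a DC holding only NULLs has no min/max
        return DCStats(None, None, len(values))
    return DCStats(min(present), max(present), len(values) - len(present))


def may_match_range(stats: DCStats, low: int, high: int) -> bool:
    """Return False when the DC can be skipped for `low <= col <= high`."""
    if stats.min_value is None:
        return False
    return stats.max_value >= low and stats.min_value <= high


def dcs_to_scan(column: Sequence[Optional[int]], low: int, high: int) -> list[int]:
    """Return the numbers of the DCs whose [min, max] overlaps [low, high]."""
    hits = []
    for dc_no, start in enumerate(range(0, len(column), DC_ROWS)):
        stats = build_stats(column[start:start + DC_ROWS])
        if may_match_range(stats, low, high):
            hits.append(dc_no)
    return hits


column = list(range(200_000))  # sorted data prunes especially well
print(dcs_to_scan(column, 70_000, 70_100))  # [1]
```

In this toy column only one of the four DCs has to be read. Pruning works best when values are clustered, as with timestamps loaded in order; on randomly distributed values most DCs overlap the predicate and few can be skipped.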
Impact on Compression
- Exceptional compression ratios: 65,536 homogeneous values of the same column compress extremely well — ratios of 1:20 or better are common. Compressed data remains on disk and can be processed directly, saving both I/O and memory.
- Flexible compression policies: Compression methods (identified by numeric codes such as 0, 3, and 5) can be set at the database, table, or column level, letting you balance speed and storage savings.
- Trade‑off: The uncompressed tail block forgoes storage savings and read benefits temporarily in exchange for the highest possible write speed. Once a DC is sealed and compressed, the gains are recovered.
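The sealed‑versus‑tail behaviour can be modelled roughly as follows. This sketch uses Python's `zlib` purely as a stand‑in compressor; GBase 8a's actual algorithms and the meaning of its numeric compression codes are not reproduced here, and `store_column` is a hypothetical helper.

```python
import zlib
from array import array

DC_ROWS = 65_536  # rows per full DataCell, as stated in the article


def store_column(values: list[int], level: int = 5) -> tuple[list[bytes], bytes]:
    """Return (sealed, tail).

    sealed -- one zlib-compressed byte string per full DC
    tail   -- raw, uncompressed bytes of the rows that do not fill a DC
    `level` is a zlib level (0-9), not a GBase 8a compression code.
    """
    full_end = len(values) - len(values) % DC_ROWS
    sealed = []
    for start in range(0, full_end, DC_ROWS):
        raw = array("q", values[start:start + DC_ROWS]).tobytes()
        sealed.append(zlib.compress(raw, level))
    tail = array("q", values[full_end:]).tobytes()
    return sealed, tail


# A low-cardinality column, the kind of data that compresses well.
values = [i % 100 for i in range(150_000)]
sealed, tail = store_column(values)

item_size = array("q").itemsize
raw_dc_bytes = DC_ROWS * item_size
print("sealed DCs:", len(sealed), "tail rows:", len(tail) // item_size)
for dc_no, blob in enumerate(sealed):
    print(f"DC {dc_no}: {raw_dc_bytes} -> {len(blob)} bytes")
```

The ratios printed by this toy depend entirely on the synthetic data and on zlib, so they say nothing about what GBase 8a achieves. The point is the shape of the layout: full DCs are stored compressed, while the tail stays raw until it fills.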
The Big Picture
GBase 8a’s DC design embodies a classic OLAP trade‑off: maximise bulk‑scan and aggregation performance by grouping data into well‑aligned, highly compressible chunks, while keeping the write path fast with a small, uncompressed tail. It gives up some efficiency on small loads and point lookups in exchange for massive overall analytical throughput, which is exactly what you want from a GBase database running data‑warehouse or BI workloads.