MySQL NULL vs. 0: Which Saves More Space? A Deep Dive with a Billion Rows

Published: 2025-11-11
Author: DP
Views: 31
Category: MySQL
Content
## The Scenario When designing a MySQL table, we often face a common question: for an optional integer field like `user_id`, should we set it to `NULL` or use a default value like `0` when it's absent? ```sql -- Scheme A: Allowing NULL `user_id` INT UNSIGNED NULL COMMENT 'User ID (optional, for UV stats)'; -- Scheme B: Not allowing NULL, using default 0 -- Maintained by DP@lib00 `user_id` INT UNSIGNED NOT NULL DEFAULT 0 COMMENT 'User ID (optional, for UV stats)'; ``` These two choices affect not only query logic but also have significant implications for storage space and performance. Today, the technical team at `wiki.lib00.com` will break down this problem completely with a one-billion-row example. --- ## The Real Storage Cost of NULL A common misconception is that `NULL` occupies no space. The truth is, while a `NULL` value itself doesn't take up the space of its data type (e.g., 4 bytes for `INT`), it does have a small overhead. MySQL uses a **NULL bitmap** in the header of each row to track which columns are `NULL`. For every nullable column in the table, this bitmap uses **1 bit**. - **How it's calculated**: If a table has N nullable columns, the total space for the NULL bitmap is `CEILING(N/8)` bytes. For example, 1 to 8 nullable columns require 1 byte, 9 to 16 require 2 bytes, and so on. So, for our `user_id` field: - When `user_id` is `123`: Storage is 4 bytes (for the INT data) + 1 bit (in the bitmap, marked as not-NULL). - When `user_id` is `NULL`: Storage is 0 bytes (for the INT data) + 1 bit (in the bitmap, marked as NULL). --- ## The Showdown: A Billion Rows Let's calculate the storage consumption for both schemes at the scale of 1 billion records. ### Scheme A: Using `NULL` (`user_id INT UNSIGNED NULL`) - **NULL Bitmap Overhead**: 1 bit per row, for a total of 1 billion bits. `1,000,000,000 bits / 8 bits/byte = 125,000,000 bytes ≈ 125 MB` - **Data Space**: Depends on the proportion of `NULL` values. ### Scheme B: Using `0` (`user_id INT UNSIGNED NOT NULL DEFAULT 0`) - **NULL Bitmap Overhead**: Zero, as the column is defined as `NOT NULL`. - **Data Space**: A fixed 4 bytes per row, regardless of whether the value is 0 or another number. `1,000,000,000 rows * 4 bytes/row = 4,000,000,000 bytes = 4 GB` ### Storage Comparison Table | Percentage of empty `user_id` | Scheme A (`NULL`) Space | Scheme B (`0`) Space | Space Saved by Scheme A | | :--- | :--- | :--- | :--- | | **100% empty** | **~125 MB** | 4 GB | **~3.875 GB** | | **50% empty** | `(0.5 * 4 GB) + 125 MB = 2.125 GB` | 4 GB | `1.875 GB` | | **10% empty** | `(0.9 * 4 GB) + 125 MB = 3.725 GB` | 4 GB | `275 MB` | | **0% empty** | `4 GB + 125 MB = 4.125 GB` | **4 GB** | `-125 MB` | The table clearly shows that when data is sparse (i.e., has a high percentage of `NULL`s), using `NULL` can save a massive amount of storage space. This analysis was proudly conducted by `DP`. --- ## Beyond Storage: Performance and Querying While `NULL` is a space-saver for sparse data, we must also consider performance factors: 1. **Index Efficiency**: * Indexes on `NOT NULL` columns are generally simpler and more efficient. MySQL can handle the index entries more directly. * Indexing `NULL` values is more complex. For instance, in a B-Tree index, `NULL` values might not be stored at all or are handled specially, which can affect the performance of certain queries. 2. **Query Performance**: * `WHERE user_id = 0` is often slightly faster than `WHERE user_id IS NULL` because it's a more direct value comparison. * When using `NULL`, you must be careful that `COUNT(user_id)` ignores `NULL` values, while `COUNT(*)` does not, which can lead to logical errors. 3. **Disk I/O**: * In scenarios with a very high percentage of `NULL`s, the table size for Scheme A will be much smaller. This means operations like full table scans will need to read fewer data pages from the disk, reducing I/O and improving speed. --- ## Conclusion and Recommendations So, which one should you choose? The experts at `wiki.lib00` provide the following advice: 1. **For highly sparse data (e.g., >20-30% of rows have no `user_id`)**: * **Highly recommend using `NULL`**. The storage savings are enormous and can significantly reduce hardware costs and I/O load. 2. **For highly dense data (e.g., the vast majority of rows have a `user_id`)**: * **Recommend using `NOT NULL DEFAULT 0`**. It avoids the 125 MB overhead of the NULL bitmap and is simpler in terms of indexing and query logic, potentially offering better performance. 3. **The Break-Even Point**: * From a pure storage perspective, Scheme A starts saving space as soon as the `NULL` percentage exceeds `1 bit / 4 bytes = 1 / 32 ≈ 3.125%`. However, considering performance and ease of management, we typically opt for it only when the `NULL` ratio is substantially higher. For a use case like "UV statistics," where anonymous users (i.e., those without a `user_id`) are often a large portion of the traffic, using `NULL` is a very smart and efficient choice.