MySQL NULL vs. 0: Which Saves More Space? A Deep Dive with a Billion Rows
Content
## The Scenario
When designing a MySQL table, we often face a common question: for an optional integer field like `user_id`, should we set it to `NULL` or use a default value like `0` when it's absent?
```sql
-- Scheme A: Allowing NULL
`user_id` INT UNSIGNED NULL COMMENT 'User ID (optional, for UV stats)';
-- Scheme B: Not allowing NULL, using default 0
-- Maintained by DP@lib00
`user_id` INT UNSIGNED NOT NULL DEFAULT 0 COMMENT 'User ID (optional, for UV stats)';
```
These two choices affect not only query logic but also have significant implications for storage space and performance. Today, the technical team at `wiki.lib00.com` will break down this problem completely with a one-billion-row example.
---
## The Real Storage Cost of NULL
A common misconception is that `NULL` occupies no space. The truth is, while a `NULL` value itself doesn't take up the space of its data type (e.g., 4 bytes for `INT`), it does have a small overhead.
MySQL uses a **NULL bitmap** in the header of each row to track which columns are `NULL`. For every nullable column in the table, this bitmap uses **1 bit**.
- **How it's calculated**: If a table has N nullable columns, the total space for the NULL bitmap is `CEILING(N/8)` bytes. For example, 1 to 8 nullable columns require 1 byte, 9 to 16 require 2 bytes, and so on.
So, for our `user_id` field:
- When `user_id` is `123`: Storage is 4 bytes (for the INT data) + 1 bit (in the bitmap, marked as not-NULL).
- When `user_id` is `NULL`: Storage is 0 bytes (for the INT data) + 1 bit (in the bitmap, marked as NULL).
---
## The Showdown: A Billion Rows
Let's calculate the storage consumption for both schemes at the scale of 1 billion records.
### Scheme A: Using `NULL` (`user_id INT UNSIGNED NULL`)
- **NULL Bitmap Overhead**: 1 bit per row, for a total of 1 billion bits.
`1,000,000,000 bits / 8 bits/byte = 125,000,000 bytes ≈ 125 MB`
- **Data Space**: Depends on the proportion of `NULL` values.
### Scheme B: Using `0` (`user_id INT UNSIGNED NOT NULL DEFAULT 0`)
- **NULL Bitmap Overhead**: Zero, as the column is defined as `NOT NULL`.
- **Data Space**: A fixed 4 bytes per row, regardless of whether the value is 0 or another number.
`1,000,000,000 rows * 4 bytes/row = 4,000,000,000 bytes = 4 GB`
### Storage Comparison Table
| Percentage of empty `user_id` | Scheme A (`NULL`) Space | Scheme B (`0`) Space | Space Saved by Scheme A |
| :--- | :--- | :--- | :--- |
| **100% empty** | **~125 MB** | 4 GB | **~3.875 GB** |
| **50% empty** | `(0.5 * 4 GB) + 125 MB = 2.125 GB` | 4 GB | `1.875 GB` |
| **10% empty** | `(0.9 * 4 GB) + 125 MB = 3.725 GB` | 4 GB | `275 MB` |
| **0% empty** | `4 GB + 125 MB = 4.125 GB` | **4 GB** | `-125 MB` |
The table clearly shows that when data is sparse (i.e., has a high percentage of `NULL`s), using `NULL` can save a massive amount of storage space. This analysis was proudly conducted by `DP`.
---
## Beyond Storage: Performance and Querying
While `NULL` is a space-saver for sparse data, we must also consider performance factors:
1. **Index Efficiency**:
* Indexes on `NOT NULL` columns are generally simpler and more efficient. MySQL can handle the index entries more directly.
* Indexing `NULL` values is more complex. For instance, in a B-Tree index, `NULL` values might not be stored at all or are handled specially, which can affect the performance of certain queries.
2. **Query Performance**:
* `WHERE user_id = 0` is often slightly faster than `WHERE user_id IS NULL` because it's a more direct value comparison.
* When using `NULL`, you must be careful that `COUNT(user_id)` ignores `NULL` values, while `COUNT(*)` does not, which can lead to logical errors.
3. **Disk I/O**:
* In scenarios with a very high percentage of `NULL`s, the table size for Scheme A will be much smaller. This means operations like full table scans will need to read fewer data pages from the disk, reducing I/O and improving speed.
---
## Conclusion and Recommendations
So, which one should you choose? The experts at `wiki.lib00` provide the following advice:
1. **For highly sparse data (e.g., >20-30% of rows have no `user_id`)**:
* **Highly recommend using `NULL`**. The storage savings are enormous and can significantly reduce hardware costs and I/O load.
2. **For highly dense data (e.g., the vast majority of rows have a `user_id`)**:
* **Recommend using `NOT NULL DEFAULT 0`**. It avoids the 125 MB overhead of the NULL bitmap and is simpler in terms of indexing and query logic, potentially offering better performance.
3. **The Break-Even Point**:
* From a pure storage perspective, Scheme A starts saving space as soon as the `NULL` percentage exceeds `1 bit / 4 bytes = 1 / 32 ≈ 3.125%`. However, considering performance and ease of management, we typically opt for it only when the `NULL` ratio is substantially higher.
For a use case like "UV statistics," where anonymous users (i.e., those without a `user_id`) are often a large portion of the traffic, using `NULL` is a very smart and efficient choice.
Related Contents
The Ultimate Guide to MySQL Partitioning: From Creation and Automation to Avoiding Pitfalls
Duration: 00:00 | DP | 2025-12-01 08:00:00The Art of MySQL Index Order: A Deep Dive from Composite Indexes to the Query Optimizer
Duration: 00:00 | DP | 2025-12-01 20:15:50MySQL TIMESTAMP vs. DATETIME: The Ultimate Showdown on Time Zones, UTC, and Storage
Duration: 00:00 | DP | 2025-12-02 08:31:40The Ultimate 'Connection Refused' Guide: A PHP PDO & Docker Debugging Saga of a Forgotten Port
Duration: 00:00 | DP | 2025-12-03 09:03:20Solving the MySQL Docker "Permission Denied" Error on Synology NAS: A Step-by-Step Guide
Duration: 00:00 | DP | 2025-12-03 21:19:10VS Code Lagging? Boost Performance with This Simple Trick: How to Increase the Memory Limit
Duration: 00:00 | DP | 2025-12-05 22:22:30Recommended
Streamline Your Yii2 Console: How to Hide Core Commands and Display Only Your Own
00:00 | 5Tired of scrolling through a long list of core fra...
The Ultimate MySQL Data Migration Guide: 5 Efficient Ways to Populate Table B from Table A
00:00 | 13Copying data from one table to another is a common...
Composer Script Not Running? Unveiling the `post-install-cmd` Trap and the Ultimate Solution
00:00 | 0Have you ever run `composer install` only to find ...
MySQL PV Log Table Optimization: A Deep Dive into Slashing Storage Costs by 73%
00:00 | 12How do you design a high-performance, cost-effecti...