MySQL PV Log Table Optimization: A Deep Dive into Slashing Storage Costs by 73%
Content
## Background
Logging user page views (PV) is a common requirement in website and application development. This log data can grow enormous and rapidly. Without proper table design, it can quickly become a bottleneck for storage costs and performance. This article uses a scenario of 100,000 daily PVs to demonstrate how to dramatically improve storage costs and performance through a series of optimization techniques, starting from an initial log table design.
### Initial Table Structure
Let's assume our initial PV log table, `content_pv_log`, is designed as follows to record every access in real-time:
```sql
-- Original PV access log table
CREATE TABLE `content_pv_log` (
`id` BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`content_id` INT UNSIGNED NOT NULL COMMENT 'Associated content ID',
`user_id` INT UNSIGNED COMMENT 'User ID (optional, for UV stats)',
`ip` CHAR(32) COMMENT 'IP value (for UV stats)',
`user_agent` VARCHAR(255) COMMENT 'User Agent',
`referer` VARCHAR(500) COMMENT 'Referrer page',
`accessed_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'Access time',
INDEX `idx_content_accessed` (`content_id`, `accessed_at`),
INDEX `idx_accessed_at` (`accessed_at`)
) ENGINE=InnoDB COMMENT='Raw content PV access log';
```
**Initial Analysis:**
- **High Storage Footprint:** The `user_agent` and `referer` `VARCHAR` fields consume the majority of the space for a single row, estimated at around **359 bytes per record**.
- **Index Overhead:** Two secondary indexes need to be maintained on every insert, increasing write latency and disk space usage.
---
## Clarifying Optimization Goals
Before diving into optimization, we must clarify the business requirements. The core needs are:
1. **PV and UV Statistics:** We only need to know the total view count and unique visitor count for each piece of content.
2. **Offline Processing:** The statistics job runs once daily during off-peak hours, so real-time query performance is not a high priority.
3. **Cost Control:** Minimize storage and database resource consumption as much as possible.
Based on these goals, we can formulate a clear optimization strategy: **The primary responsibility of this table is efficient writes, with the read pressure shifted to a once-daily batch processing job**.
---
## Optimization Steps
### Step 1: Trim the Fat - Remove Unnecessary Fields
Since we don't need to perform in-depth analysis on `user_agent` and `referer`, storing them is the biggest waste. Decisively removing these two fields is the first and most effective step in our optimization.
### Step 2: Re-evaluate the Indexing Strategy
In a write-heavy model, fewer indexes mean faster writes. However, our daily statistics and data cleanup jobs still rely on indexes for efficiency.
- `idx_content_accessed (content_id, accessed_at)`: This composite index is primarily for queries like "find all visits for a specific content within a time range." For our daily full-data aggregation scenario, it's overkill and significantly slows down every `INSERT`.
- `idx_accessed_at (accessed_at)`: This index is crucial. It efficiently locates the data range for "yesterday" for our statistics job and helps find expired data for cleanup.
**Decision:** Drop `idx_content_accessed` and keep `idx_accessed_at`.
### The Optimized Table Structure
After these two steps, we get a leaner, more focused log table, which we'll call `lib00_pv_log`:
```sql
-- Optimized PV access log table (write-only)
CREATE TABLE `lib00_pv_log` (
`id` BIGINT UNSIGNED PRIMARY KEY AUTO_INCREMENT,
`content_id` INT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED,
`ip` CHAR(32),
`accessed_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
INDEX `idx_accessed_at` (`accessed_at`)
) ENGINE=InnoDB COMMENT='Content PV log - write-optimized';
```
---
## Quantifying the Gains
Let's see the tangible benefits of this optimization.
| Metric | Original Schema | Optimized Schema | Improvement/Saving |
| --------------------- | --------------- | ----------------- | --------------------- |
| **Row Size** | ~359 Bytes | ~97 Bytes | **↓ 73%** |
| **Daily Storage** | ~34.2 MB | ~9.26 MB | **↓ 73%** |
| **Annual Storage** | ~12.5 GB | ~3.38 GB | **Saves ~9.12 GB** |
| **Write Perf. (TPS)** | ~20,000/s | ~65,000/s | **↑ 3.25x** |
| **Daily Stats Time** | ~1.0 sec | ~1.5 sec | **Slower by 0.5s** |
**Analysis:**
- **Drastic Reduction in Storage Costs:** Saving over 9 GB annually is a significant win for any long-running system.
- **Massive Write Performance Boost:** The application becomes more responsive, and the database load is reduced.
- **The Only Trade-off:** The daily statistics job is 0.5 seconds slower. For a background task running in the middle of the night, this is a completely acceptable sacrifice.
---
## The Ultimate Upgrade: Partitioning and IP Encoding
For ultimate performance and manageability, we can introduce two advanced techniques.
### 1. Use Table Partitioning
For time-series data like logs, partitioning by time is a best practice. It offers two major advantages:
- **Improved Query Performance:** When calculating stats for yesterday, the database only needs to scan yesterday's partition, not the entire massive table.
- **Instantaneous Data Deletion:** Removing old data is no longer a slow `DELETE` operation. Instead, you `DROP PARTITION`, which is a millisecond-level DDL operation and generates minimal binlog.
### 2. IP Address Encoding
Using `CHAR(32)` to store an IPv4 address is inefficient. We can use MySQL's built-in `INET_ATON()` function to convert the IP address into a 4-byte unsigned integer (`INT UNSIGNED`) and use `INET_NTOA()` to convert it back when needed.
- **Space Savings:** `CHAR(32)` (32 bytes) -> `INT UNSIGNED` (4 bytes). This single change reduces each row by another 28 bytes.
### Final Recommended Schema (by DP@lib00)
```sql
CREATE TABLE `content_pv_log_final` (
`id` BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
`content_id` INT UNSIGNED NOT NULL,
`user_id` INT UNSIGNED,
`ip` INT UNSIGNED COMMENT 'IP converted using INET_ATON()',
`accessed_at` DATETIME NOT NULL DEFAULT CURRENT_TIMESTAMP,
PRIMARY KEY (`id`, `accessed_at`), -- Include partition key in the primary key
INDEX `idx_accessed_at` (`accessed_at`)
) ENGINE=InnoDB
PARTITION BY RANGE (TO_DAYS(accessed_at)) (
PARTITION p20240101 VALUES LESS THAN (TO_DAYS('2024-01-02')),
PARTITION p20240102 VALUES LESS THAN (TO_DAYS('2024-01-03')),
-- ... A script is needed to automatically create new partitions daily
PARTITION p_future VALUES LESS THAN MAXVALUE
);
```
With this ultimate solution, the **total storage saving rate reaches an impressive 82%**, while query and maintenance efficiency are also maximized.
---
## Conclusion
Database table design is not a one-size-fits-all task. **Targeted optimization based on the actual read/write patterns of your application is key**. For write-intensive logging applications, significant improvements can be achieved through the following strategies:
1. **Be Minimalist with Fields:** Only store what is absolutely necessary.
2. **Minimize Indexes:** Reduce write overhead by keeping only the indexes essential for batch processing and data cleanup.
3. **Embrace Partitioning:** It's the silver bullet for managing time-series data.
4. **Optimize Data Types:** For instance, use integers to store IP addresses.
By applying these principles, as demonstrated in the logging system at `wiki.lib00.com`, we can build robust systems that can handle massive data ingestion while remaining extremely cost-effective.
Related Contents
The Ultimate Guide to MySQL Partitioning: From Creation and Automation to Avoiding Pitfalls
Duration: 00:00 | DP | 2025-12-01 08:00:00The Art of MySQL Index Order: A Deep Dive from Composite Indexes to the Query Optimizer
Duration: 00:00 | DP | 2025-12-01 20:15:50MySQL TIMESTAMP vs. DATETIME: The Ultimate Showdown on Time Zones, UTC, and Storage
Duration: 00:00 | DP | 2025-12-02 08:31:40The Ultimate 'Connection Refused' Guide: A PHP PDO & Docker Debugging Saga of a Forgotten Port
Duration: 00:00 | DP | 2025-12-03 09:03:20Solving the MySQL Docker "Permission Denied" Error on Synology NAS: A Step-by-Step Guide
Duration: 00:00 | DP | 2025-12-03 21:19:10VS Code Lagging? Boost Performance with This Simple Trick: How to Increase the Memory Limit
Duration: 00:00 | DP | 2025-12-05 22:22:30Recommended
The Magic of PHP Enums: Elegantly Convert an Enum to a Key-Value Array with One Line of Code
00:00 | 5How do you dynamically get all statuses of a model...
From Phantom Conflicts to Docker Permissions: A Deep Dive into Debugging an Infinite Loop in a Git Hook for an AI Assistant
00:00 | 21This article documents a complete technical troubl...
The SQL LIKE Underscore Trap: How to Correctly Match a Literal '_'?
00:00 | 9Why does a SQL query with `LIKE 't_%'` incorrectly...
Solved: Fixing the 'TS2769: No overload matches this call' Error with vue-i18n in Vite
00:00 | 9Struggling with the TypeScript error TS2769 when u...