Improving Performance with SQL Query Optimization
A practical SQL performance optimization guide covering slow query root cause analysis with EXPLAIN ANALYZE, index design, and query refactoring
Problem
Required Tools
- MySQL server: the target database server. Optimization strategies differ by version, so verifying the exact version is essential.
- EXPLAIN / EXPLAIN ANALYZE: inspects the query execution plan. Adding ANALYZE executes the query and measures actual execution time and row counts.
- Slow query log: automatically records queries that exceed a configured threshold (e.g., 1 second). This is the first step in identifying performance bottleneck queries.
- mysqldumpslow / pt-query-digest: analyzes the slow query log and ranks the most frequent and slowest queries.
Solution Steps
Enable slow query log and identify slow queries
First, enable the slow query log to accurately identify bottleneck queries in the production environment. Setting long_query_time to 1 second logs every query that takes longer than 1 second. Note that SET GLOBAL changes do not survive a server restart; add the settings to my.cnf (or use SET PERSIST on MySQL 8.0+) to make them permanent. Once logs accumulate, analyze them with mysqldumpslow or pt-query-digest and optimize the most impactful queries first.
-- MySQL: Enable slow query log
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
-- Verify current settings
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';
-- Analyze the log (from terminal)
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow.log
Analyze execution plan with EXPLAIN
Run EXPLAIN on the problematic query to inspect its execution plan (on MySQL 8.0.18+, EXPLAIN ANALYZE also executes the query and reports actual timings and row counts). Key items to watch in the EXPLAIN output:
- type: ALL means a full table scan; no index is being used
- rows: the higher the estimated number of scanned rows, the less efficient the plan
- Extra: Using filesort or Using temporary indicates an additional sort or temporary table
- key: NULL means no index is being used at all
-- The problematic original query
EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, o.total_amount,
c.customer_name, c.email,
p.product_name, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
AND c.customer_name LIKE '%John Smith%'
ORDER BY o.order_date DESC
LIMIT 50;
-- Example execution plan result (problem indicators)
-- type: ALL, rows: 5000000, Extra: Using where; Using filesort
-- -> Full table scan + filesort detected!
Add appropriate indexes
Design the necessary indexes based on the EXPLAIN results. Column order in a composite index is crucial: place columns used in equality (=) conditions first and columns used in range conditions last. A covering index lets the query be answered from the index alone, without reading the table rows, which can improve performance significantly.
-- 1) orders table: composite index for date range + sorting
CREATE INDEX idx_orders_date_customer
ON orders (order_date, customer_id);
-- 2) order_items table: JOIN key index
CREATE INDEX idx_order_items_order_product
ON order_items (order_id, product_id);
-- 3) customers table: for name search (only LIKE 'keyword%' can use index)
CREATE INDEX idx_customers_name
ON customers (customer_name);
-- Update statistics after creating indexes
ANALYZE TABLE orders, order_items, customers, products;
-- Verify index list
SHOW INDEX FROM orders;
Refactor queries (convert subqueries to JOINs)
Converting inefficient subqueries to JOINs significantly improves the execution plan. Also replace SELECT * with an explicit column list, and replace LIKE '%keyword%' patterns with a FULLTEXT index or a dedicated search engine such as Elasticsearch. Switching from OFFSET-based to cursor-based (keyset) pagination makes a huge difference on large datasets.
-- [Before] Subquery approach - very slow
SELECT * FROM orders
WHERE customer_id IN (
SELECT customer_id FROM customers
WHERE customer_name LIKE '%John Smith%'
)
AND order_date BETWEEN '2024-01-01' AND '2024-12-31';
-- [After] JOIN approach + select only needed columns + cursor pagination
SELECT o.order_id, o.order_date, o.total_amount,
c.customer_name, c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
AND c.customer_name LIKE 'John Smith%' -- Remove leading wildcard
AND o.order_id < ? -- Cursor-based pagination
ORDER BY o.order_id DESC
LIMIT 50;
Verify optimization results and set up monitoring
After adding indexes and refactoring queries, always measure the improvement. Check whether type changed from ALL to ref or range, whether the estimated rows count decreased, and whether filesort disappeared from Extra; use EXPLAIN ANALYZE to confirm that actual execution time dropped below the target (e.g., 100 ms). Continuous monitoring is necessary even after performance improves.
-- Check execution plan after optimization
EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, o.total_amount,
c.customer_name, c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
AND c.customer_name LIKE 'John Smith%'
ORDER BY o.order_date DESC
LIMIT 50;
-- Expected result: type=range, rows=1200, Extra=Using index condition
-- Execution time: 5200ms -> 45ms (approximately 115x improvement)
-- Continuous monitoring with Performance Schema
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
Core Code
Composite index design and optimized query. Use >= / < instead of BETWEEN for date ranges (a half-open range correctly includes DATETIME values on the last day), and remove leading wildcards from LIKE to maximize index utilization.
-- ========================================
-- Core: EXPLAIN ANALYZE + Indexes + Optimized Query
-- ========================================
-- Step 1: Create composite indexes (column order matters)
CREATE INDEX idx_orders_date_cust ON orders (order_date DESC, customer_id);
CREATE INDEX idx_oi_order_prod ON order_items (order_id, product_id, quantity, unit_price);
-- Step 2: Optimized query
EXPLAIN ANALYZE
SELECT o.order_id,
o.order_date,
o.total_amount,
c.customer_name,
COUNT(oi.order_item_id) AS item_count
FROM orders o
INNER JOIN customers c
ON o.customer_id = c.customer_id
INNER JOIN order_items oi
ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01'
AND o.order_date < '2025-01-01' -- Half-open range instead of BETWEEN (correctly includes DATETIME values on the last day)
AND c.customer_name LIKE 'John Smith%' -- Remove leading wildcard
GROUP BY o.order_id, o.order_date, o.total_amount, c.customer_name
ORDER BY o.order_date DESC
LIMIT 50;
Common Mistakes
Using SELECT * which retrieves all columns unnecessarily
Explicitly list only the required columns. SELECT * defeats covering indexes and increases network transfer. Specify the minimum columns needed, such as SELECT o.order_id, o.order_date, c.customer_name.
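The covering-index effect is easy to see even in a tiny stand-in database. A minimal sketch using Python's built-in sqlite3 (the table and index names here are illustrative, not from the schema above; MySQL reports the same idea as "Using index" in EXPLAIN's Extra column):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE orders (
    order_id INTEGER PRIMARY KEY,
    order_date TEXT,
    total_amount REAL,
    notes TEXT)""")
conn.execute("CREATE INDEX idx_orders_date ON orders (order_date, total_amount)")

# SELECT * must visit the table row to fetch `notes`, so the index alone is not enough
plan_star = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM orders WHERE order_date >= '2024-01-01'").fetchall()

# Selecting only indexed columns can be answered from the index alone (covering index)
plan_cols = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT order_date, total_amount FROM orders WHERE order_date >= '2024-01-01'").fetchall()

print(plan_star[0][3])  # e.g. SEARCH orders USING INDEX idx_orders_date (...)
print(plan_cols[0][3])  # e.g. SEARCH orders USING COVERING INDEX idx_orders_date (...)
```

The only difference between the two queries is the column list, yet one of them never has to touch the table at all.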
Creating too many indexes, degrading write performance
Having too many indexes means all indexes must be updated on every INSERT/UPDATE/DELETE, significantly degrading write performance. Analyze actual query patterns first and create only essential indexes. Generally, no more than 5 indexes per table is recommended.
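As a quick audit, you can count how many secondary indexes a table carries, since every one of them must be maintained on each write. A sketch with Python's sqlite3 (the table, columns, and the 5-index threshold are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, "
             "a INT, b INT, c INT, d INT, e INT, f INT)")
# Hypothetical accumulation of single-column indexes over time
for col in ("a", "b", "c", "d", "e", "f"):
    conn.execute(f"CREATE INDEX idx_orders_{col} ON orders ({col})")

# Every secondary index counted here must be updated on each INSERT/UPDATE/DELETE
index_count = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'index' AND tbl_name = 'orders'"
).fetchone()[0]
print(index_count)  # → 6
if index_count > 5:
    print("warning: review whether all of these indexes earn their write cost")
```

On MySQL the equivalent audit query is SHOW INDEX FROM orders (or information_schema.statistics).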
Using double-sided wildcards with LIKE '%keyword%' (causes full table scan)
A leading wildcard (%) prevents the use of B-Tree indexes. Change to LIKE 'keyword%', or adopt a FULLTEXT INDEX or a dedicated search engine like Elasticsearch.
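You can watch a leading wildcard defeat the index in a small sqlite3 experiment (SQLite is used here as a stand-in; the pragma is a SQLite-specific requirement for its LIKE optimization, but MySQL's B-Tree indexes behave analogously for prefix patterns):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, customer_name TEXT)")
conn.execute("CREATE INDEX idx_customers_name ON customers (customer_name)")
# SQLite only rewrites LIKE into an index range scan when LIKE is case-sensitive
conn.execute("PRAGMA case_sensitive_like = ON")

leading = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE customer_name LIKE '%John%'"
).fetchall()
prefix = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM customers WHERE customer_name LIKE 'John%'"
).fetchall()

print(leading[0][3])  # full scan: the leading % gives the B-Tree nothing to seek on
print(prefix[0][3])   # range search on idx_customers_name using the 'John' prefix
```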
Using OFFSET-based pagination causing extremely slow queries for later pages
OFFSET 1000000 internally reads and discards 1 million rows. Switch to cursor (Keyset) pagination using WHERE id < last_seen_id ORDER BY id DESC LIMIT 50.
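The keyset pattern can be sketched end to end with Python's built-in sqlite3 (fetch_page and the minimal table layout are illustrative; the same two queries work unchanged on MySQL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT)")
conn.executemany("INSERT INTO orders (order_date) VALUES (?)",
                 [("2024-06-01",)] * 1000)

def fetch_page(conn, last_seen_id=None, page_size=50):
    """Keyset pagination: seek straight to the cursor instead of skipping rows."""
    if last_seen_id is None:  # first page
        sql = "SELECT order_id FROM orders ORDER BY order_id DESC LIMIT ?"
        args = (page_size,)
    else:  # later pages seek below the last id the client saw
        sql = ("SELECT order_id FROM orders WHERE order_id < ? "
               "ORDER BY order_id DESC LIMIT ?")
        args = (last_seen_id, page_size)
    return [row[0] for row in conn.execute(sql, args)]

page1 = fetch_page(conn)
page2 = fetch_page(conn, last_seen_id=page1[-1])
print(page1[0], page1[-1])  # → 1000 951
print(page2[0], page2[-1])  # → 950 901
```

Because each page seeks directly on the primary key, page 20,000 costs the same as page 1; the trade-off is that clients can only step forward or backward from a cursor, not jump to an arbitrary page number.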