liminfo

Improving Performance with SQL Query Optimization

A practical SQL performance optimization guide covering slow query root cause analysis with EXPLAIN ANALYZE, index design, and query refactoring

SQL query optimization, slow query improvement, EXPLAIN analysis, index tuning, execution plan, MySQL performance, PostgreSQL optimization, slow query log, subquery to JOIN conversion, database tuning

Problem

While running an e-commerce site, the order list query takes over 5 seconds, causing severely slow page loads. The orders table has accumulated over 5 million rows, and the query involves JOINs with the customers table and products table. Response times spike dramatically when using date range filters combined with customer name searches. Since this is a screen the operations team uses daily, immediate performance improvement is required.

Required Tools

MySQL / PostgreSQL

The target database server. Optimization strategies differ by version, so verifying the exact version is essential.
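As a quick check, both engines report their version with a one-line query (the exact output string varies by distribution):

```sql
-- MySQL / MariaDB
SELECT VERSION();

-- PostgreSQL
SELECT version();        -- full version string
SHOW server_version;     -- version number only
```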

EXPLAIN / EXPLAIN ANALYZE

Displays the query execution plan. Adding ANALYZE actually executes the query and reports real execution times and row counts alongside the optimizer's estimates.

slow query log

Automatically records queries that exceed a configured threshold (e.g., 1 second). This is the first step in identifying performance bottleneck queries.

mysqldumpslow / pt-query-digest

Analyzes the slow query log and ranks the most frequent and slowest queries.

Solution Steps

1. Enable slow query log and identify slow queries

First, enable the slow query log to accurately identify bottleneck queries in the production environment. Setting long_query_time to 1 second will log all queries taking longer than 1 second. Once logs accumulate, analyze them with mysqldumpslow or pt-query-digest to optimize the most impactful queries first.

-- MySQL: Enable slow query log
-- (SET GLOBAL does not survive a restart; add these to my.cnf for a permanent change)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';

-- Verify current settings
SHOW VARIABLES LIKE 'slow_query%';
SHOW VARIABLES LIKE 'long_query_time';

-- Analyze the log (from terminal)
-- mysqldumpslow -s t -t 10 /var/log/mysql/slow.log

2. Analyze execution plan with EXPLAIN

Run EXPLAIN on the problematic query to inspect its execution plan (EXPLAIN ANALYZE, available in PostgreSQL and in MySQL 8.0.18+, also executes the query and reports actual timings). Key items to watch in MySQL's output:

- type: ALL means a full table scan with no index used
- rows: the higher the estimated number of scanned rows, the less efficient the query
- Extra: Using filesort or Using temporary indicates extra sorting or a temporary table
- key: NULL means no index is being used at all

-- The problematic original query
EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, o.total_amount,
       c.customer_name, c.email,
       p.product_name, oi.quantity, oi.unit_price
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN order_items oi ON o.order_id = oi.order_id
JOIN products p ON oi.product_id = p.product_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
  AND c.customer_name LIKE '%John Smith%'
ORDER BY o.order_date DESC
LIMIT 50;

-- Example execution plan result (problem indicators)
-- type: ALL, rows: 5000000, Extra: Using where; Using filesort
-- -> Full table scan + filesort detected!

3. Add appropriate indexes

Design the necessary indexes based on the EXPLAIN results. Column order in a composite index is crucial: put columns compared with equality (=) in the WHERE clause first, and columns compared with range conditions (BETWEEN, <, >) last. A covering index, one that contains every column the query touches, lets the query be answered from the index alone without reading the table, which can improve performance significantly.

-- 1) orders table: composite index for date range + sorting
CREATE INDEX idx_orders_date_customer
  ON orders (order_date, customer_id);

-- 2) order_items table: JOIN key index
CREATE INDEX idx_order_items_order_product
  ON order_items (order_id, product_id);

-- 3) customers table: for name search (only LIKE 'keyword%' can use index)
CREATE INDEX idx_customers_name
  ON customers (customer_name);

-- Update statistics after creating indexes
ANALYZE TABLE orders, order_items, customers, products;

-- Verify index list
SHOW INDEX FROM orders;

4. Refactor queries (convert subqueries to JOINs)

Converting inefficient subqueries to JOINs can significantly improve the execution plan. Also replace SELECT * with an explicit column list, replace LIKE '%keyword%' patterns with a FULLTEXT index or a dedicated search engine such as Elasticsearch, and switch from OFFSET-based pagination to cursor (keyset) pagination, which makes a huge difference on large datasets.

-- [Before] Subquery approach - very slow
SELECT * FROM orders
WHERE customer_id IN (
  SELECT customer_id FROM customers
  WHERE customer_name LIKE '%John Smith%'
)
AND order_date BETWEEN '2024-01-01' AND '2024-12-31';

-- [After] JOIN approach + select only needed columns + cursor pagination
SELECT o.order_id, o.order_date, o.total_amount,
       c.customer_name, c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
  AND c.customer_name LIKE 'John Smith%'   -- Remove leading wildcard
  AND o.order_id < ?                       -- Cursor-based pagination
ORDER BY o.order_id DESC
LIMIT 50;

5. Verify optimization results and set up monitoring

After adding indexes and refactoring queries, always measure the improvement with EXPLAIN ANALYZE. Check whether type changed from ALL to ref or range, whether the rows count decreased, and whether filesort disappeared from Extra. Also verify that actual execution time dropped below the target (e.g., 100ms). Continuous monitoring is necessary even after performance improvements.

-- Check execution plan after optimization
EXPLAIN ANALYZE
SELECT o.order_id, o.order_date, o.total_amount,
       c.customer_name, c.email
FROM orders o
INNER JOIN customers c ON o.customer_id = c.customer_id
WHERE o.order_date BETWEEN '2024-01-01' AND '2024-12-31'
  AND c.customer_name LIKE 'John Smith%'
ORDER BY o.order_date DESC
LIMIT 50;

-- Expected result: type=range, rows=1200, Extra=Using index condition
-- Execution time: 5200ms -> 45ms (approximately 115x improvement)

-- Continuous monitoring with Performance Schema
SELECT * FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;

Core Code

Composite index design and optimized query. Use >= / < half-open ranges instead of BETWEEN for dates (BETWEEN's inclusive upper bound silently drops rows after midnight on DATETIME columns), and remove leading wildcards from LIKE so B-Tree indexes can be used.

-- ========================================
-- Core: EXPLAIN ANALYZE + Indexes + Optimized Query
-- ========================================

-- Step 1: Create composite indexes (column order matters)
CREATE INDEX idx_orders_date_cust ON orders (order_date DESC, customer_id);
CREATE INDEX idx_oi_order_prod ON order_items (order_id, product_id, quantity, unit_price);

-- Step 2: Optimized query
EXPLAIN ANALYZE
SELECT o.order_id,
       o.order_date,
       o.total_amount,
       c.customer_name,
       COUNT(oi.order_item_id) AS item_count
FROM orders o
INNER JOIN customers c
  ON o.customer_id = c.customer_id
INNER JOIN order_items oi
  ON o.order_id = oi.order_id
WHERE o.order_date >= '2024-01-01'
  AND o.order_date < '2025-01-01'        -- Half-open range avoids DATETIME boundary gaps at year end
  AND c.customer_name LIKE 'John Smith%' -- No leading wildcard, so the index can be used
GROUP BY o.order_id, o.order_date, o.total_amount, c.customer_name
ORDER BY o.order_date DESC
LIMIT 50;

Common Mistakes

Using SELECT * which retrieves all columns unnecessarily

Explicitly list only the required columns. SELECT * defeats covering indexes and increases network transfer. Specify the minimum columns needed, such as SELECT o.order_id, o.order_date, c.customer_name.
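As a sketch against the example schema above, the idx_orders_date_customer index from step 3 can fully cover a narrow column list, while SELECT * always forces a lookup into the base table:

```sql
-- Must fetch every column from the table for each matching row
SELECT * FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';

-- Can be answered from idx_orders_date_customer (order_date, customer_id) alone;
-- EXPLAIN shows "Using index" in Extra when the index covers the query
SELECT order_date, customer_id FROM orders
WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01';
```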

Creating too many indexes, degrading write performance

Every INSERT/UPDATE/DELETE must maintain all indexes on the table, so an excess of indexes significantly degrades write performance. Analyze actual query patterns first and create only the essential indexes; as a rough rule of thumb, around five indexes per table is a common upper bound.
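On MySQL 5.7+ the bundled sys schema can help find candidates for removal; a minimal check (the schema name 'shop' is a placeholder for your own database):

```sql
-- Indexes that have not been used since the server last started
SELECT object_schema, object_name, index_name
FROM sys.schema_unused_indexes
WHERE object_schema = 'shop';

-- Redundant indexes, e.g. (a) made obsolete by an existing (a, b)
SELECT table_name, redundant_index_name, dominant_index_name
FROM sys.schema_redundant_indexes;
```

Treat the results as candidates only: an index may be unused simply because the server restarted recently or a periodic batch job has not run yet.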

Using double-sided wildcards with LIKE '%keyword%' (causes full table scan)

A leading wildcard (%) prevents the use of B-Tree indexes. Change to LIKE 'keyword%', or adopt a FULLTEXT INDEX or a dedicated search engine like Elasticsearch.
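A minimal MySQL sketch of the FULLTEXT alternative (InnoDB supports FULLTEXT indexes from MySQL 5.6; table and column names follow the example schema above):

```sql
-- One-time setup: fulltext index on the searched column
ALTER TABLE customers ADD FULLTEXT INDEX ft_customers_name (customer_name);

-- Word-based search that matches "John Smith" anywhere in the name,
-- without the full table scan a leading % wildcard would cause
SELECT customer_id, customer_name
FROM customers
WHERE MATCH(customer_name) AGAINST('John Smith' IN NATURAL LANGUAGE MODE);
```

Note that FULLTEXT matches whole words, not arbitrary substrings, so behavior differs from LIKE '%keyword%' for partial-word searches.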

Using OFFSET-based pagination causing extremely slow queries for later pages

OFFSET 1000000 internally reads and discards 1 million rows. Switch to cursor (Keyset) pagination using WHERE id < last_seen_id ORDER BY id DESC LIMIT 50.
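A before/after sketch using the orders table from the example, where last_seen_id is the order_id of the final row on the previous page, supplied by the application:

```sql
-- [Before] OFFSET: reads and discards 1,000,000 rows before returning 50
SELECT order_id, order_date, total_amount
FROM orders
ORDER BY order_id DESC
LIMIT 50 OFFSET 1000000;

-- [After] Keyset: seeks directly into the primary key; cost stays constant per page
SELECT order_id, order_date, total_amount
FROM orders
WHERE order_id < ?          -- last_seen_id from the previous page
ORDER BY order_id DESC
LIMIT 50;
```

The trade-off is that keyset pagination only supports "next page" style navigation over a stable sort key, not random jumps to an arbitrary page number.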
