A quick guide on how to discover performance issues with MySQL RDS database.
Troubleshooting MySQL RDS instance or given any database is time-consuming and requires an in-depth understanding of how MySQL engine works. This blog explains some of the important aspects of how to troubleshoot MySQL RDS performance issues in a way that one can quickly figure out the most common aswell as advanced issues just by using readymade queries.
There are a thousand ways a database can be troubleshooting especially when you have a hosted MySQL server, however, when it comes to managed MySQL servers you have limited access to the server infrastructure and can only use a certain set of commands/queries to fine-tune/troubleshooting issues. On Managed MySQL RDS instance, you rely on Cloudwatch metrics, database logs and limited Admin access to understand performance issues, therefore, you might not able to apply all the available online solution to Managed MySQL RDS instance.
Hence for that reason, this article helps you on how to troubleshooting performance issues in Managed MySQL RDS instances. To start off, you will go through the most common issues of MySQL RDS and then discover commands which you could use to do a advanced troubleshooting on a MySQL RDS instance.
How to fix the most common issues with MySQL RDS?
Fixing databases issues is complex and takes time, why? Because you have to look at a lot of factors before you find the real problem. The most common MySQL RDS performance issues can be detected by setting up Cloudwatch Monitoring alerts. Issues like High CPU Utilization, Hight Memory Usage and Insufficient Disk Space are straight forward and easily fixable on AWS with an upgrade or tweak in DB Parameter Group.
Most common issues and their solutions:
- High CPU Utilization – A solution this is will be upgrading the RDS instance type as High CPU utilization means that the RDS instances need more compute power than it has at the moment.
- High Memory Usage or Low Free Memory Swap Space – This type of issues means two things the Memory for the given RDS instances is no enough or it can also mean that there are such queries of the task which are consuming the majority of the Memory. Instances type upgradation can be an easiest/quickest solution to this issue but definitely not always the right solution. Continue this blog to know how to locate what is causing memory-related issues.
- Insufficient Disk Space – This issue in itself means that your RDS instance is on low disk space. However, it also can be an issue caused by a bad/select/update query which is dumping temporary data onto disk in such case you will see a sudden rise disk utilization. The only way to stop such an issue is to identify and kill the query. Further in this blog, you will find aways to identify such queries.
- A sudden spike in the number of Databases Connections – This issue is a tough one to drill down, however, most of the time this issue is also related RDS instance running low on resources where an Upgrade would possibly solve the problem. The spike in a number of Database Connections basically means the database is piling connection, why? because it doesn’t have enough capacity to respond to each query. This can also be caused if the CPU Credit Balance is 0 for the given T Family RDS instances, If that’s the case then prefer upgrading it right away this will fix this issue.
These issues are straight forward and can be detected with proper monitoring, however, it not always the case of resource crunch, sometimes, the issues relate to non-peforming queries which can only be discovered using specific schema-level commands.
Commands to troubleshoot MySQL RDS instance performance issues
You can figure out MySQL RDS performance issues mostly by working with MySQL INFORMATION_SCHEMA
.
Note: Admin access to the database is required to run the queries listed below.
The first step will be to switch to the INFORMATION_SCHEMA
database: USE INFORMATION_SCHEMA;
List all the processes running
Display the ongoing threads/queries and their time duration.
MySQL privilege required SHOW PROCESS on *.*
SHOW PROCESSLIST
ORSHOW FULL PROCESSLIST
Query to determine any longest running transaction
MySQL privilege required SHOW PROCESS on *.*
SELECT id,state,command,time,left(replace(info,'\n',''),120)
FROM information_schema.processlist
WHERE command <> 'Sleep'
AND info NOT LIKE '%PROCESSLIST%'
ORDER BY time DESC LIMIT 50;
Get a list of the long-running transaction with the timestamp on when they actually started
MySQL privilege required SHOW PROCESS on *.*
SELECT trx.trx_id,
trx.trx_started,
trx.trx_mysql_thread_id
FROM INFORMATION_SCHEMA.INNODB_TRX trx
JOIN INFORMATION_SCHEMA.PROCESSLIST ps ON trx.trx_mysql_thread_id = ps.id
WHERE trx.trx_started < CURRENT_TIMESTAMP - INTERVAL 1 SECOND
AND ps.user != 'system_user';
Once you have the trx_mysql_thread_id
from the above query, you can get the thread ID using the following command (Note: performance schema should be enabled)
SELECT * FROM performance_schema.threads
WHERE processlist_id = threadID;
Advanced troubleshooting commands
Check INNODB Engine status
This command will give you the History list length. In case when the transactions are running for a long time, the old versions of data have to be kept until all the queries using the same copy of data commit.
This metric will give you the unpurged old row versions.
SHOW ENGINE INNODB STATUS
List all the Blocking Transactions
Display all the queries blocking.
MySQL privilege required GRANT SELECT on *.* or GRANT ALL ON *.*
SELECT * FROM INNODB_LOCKS WHERE LOCK_TRX_ID IN (SELECT BLOCKING_TRX_ID FROM INNODB_LOCK_WAITS);
List of locks on a particular table:
Note: Replace the database and table name in the command.
MySQL privilege required GRANT SELECT on *.* or GRANT ALL
SELECT * FROM INNODB_LOCKS WHERE LOCK_TABLE = db_name.table_name;
List of transactions waiting for locks:
List transactions which blocked by an ongoing query.
MySQL privilege required GRANT SELECT on *.* or GRANT ALL
SELECT TRX_ID, TRX_REQUESTED_LOCK_ID, TRX_MYSQL_THREAD_ID, TRX_QUERY FROM INNODB_TRX WHERE TRX_STATE = 'LOCK WAIT';
Check lock dependencies:
List all the related dependencies for a query which has locked certain tables. Useful to check the affected tables and fields.
MySQL privilege required GRANT SELECT on *.* or GRANT ALL
select r.trx_id AS waiting_trx_id, r.trx_mysql_thread_id AS waiting_thread, r.trx_query AS waiting_query, rl.lock_id AS waiting_lock_id, rl.lock_mode AS waiting_lock_mode, rl.lock_type AS waiting_lock_type, rl.lock_table AS waiting_lock_table, rl.lock_index AS waiting_lock_index, b.trx_id AS blocking_trx_id, b.trx_mysql_thread_id AS blocking_thread, b.trx_query AS blocking_query, bl.lock_id AS blocking_lock_id, bl.lock_mode AS blocking_lock_mode, bl.lock_type AS blocking_lock_type, bl.lock_table AS blocking_lock_table, bl.lock_index AS blocking_lock_index from ((((information_schema.INNODB_LOCK_WAITS w join information_schema.INNODB_TRX b on((b.trx_id = w.blocking_trx_id))) join information_schema.INNODB_TRX r on((r.trx_id = w.requesting_trx_id))) join information_schema.INNODB_LOCKS bl on((bl.lock_id = w.blocking_lock_id))) join information_schema.INNODB_LOCKS rl on((rl.lock_id = w.requested_lock_id)));
Additionally, there many other ways and issues related to MySQL RDS performance which also needs to be looked upon. AWS has listed the other most common issues and their solutions which will help optimize MySQL RDS instance performance.
Also, as a part of best practices, it is suggested that proper monitoring of a database is a must and would help to identify and address issues quickly. I would also recommend enabling Enhanced Monitoring feature for the most critical databases this would give real-time metrics of the database to foresee any issues.
RDS Instance with Burstable Performance
If you’re using T Family instance types I would recommend understanding the Burstable Performance aspect as it plays a vital role in T Family sized MySQL RDS instances. A T2 family instance burstable performance can be calculated as for eg.
You use an RDS MySQL instance is using t2.medium instance class. Each T2 instance continuously earns (at a millisecond-level resolution) a set rate of CPU credits per hour, depending on the instance size.
A t2.medium instance can earn 24 CPU credits per hour. If a T2 instance needs to burst above the baseline performance level, it spends the accrued credits. The more credits a T2 instance has accrued, the more time it can burst beyond its baseline when more performance is needed. The maximum amount of CPU credits a t2.medium instance can accrue is 576.
The number of credits an instance earns per hour can be expressed as a percentage of CPU utilization and is known as the baseline performance, and sometimes just as the baseline. A t2.micro earns 24 credits per hour resulting in a baseline performance of 20% * 2 = 40%.
Note: The performance of T family instance should never be taken as a baseline to measure database performance because of burstable performance which does not give a clear picture of database performance instead is it advised to switch database to M Famil or C family instance for more stable performance and clear baseline for performance testing.
Reference:
https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/burstable-credits-baseline-concepts.html
https://aws.amazon.com/premiumsupport/knowledge-center/blocked-mysql-query/
Best practices :
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.MySQLStorage
How to create cloudwatch alarm:
https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/CHAP_BestPractices.html#CHAP_BestPractices.EnhancedMonitoring