I recently faced an issue where application code in a scheduled job was creating DB locks that, due to some unforeseen issues, were never being cleared. Although these scheduled jobs were doing a multitude of things, a simple example would be an API request that never actually completes due to a timeout: the process would fail more or less silently, and the code to release the lock would never be triggered.
You could argue that in the aforementioned example, the obvious improvement would be to ensure that adequate maximum timeout values are in place to handle such issues. In fact, this was something we did implement, but in order to cover all bases, this article will focus on setting a couple of DB config values, which I will introduce now.
lock_timeout
This value sets the maximum amount of time a statement will wait while trying to acquire a lock. For example, imagine you need to update the bookings table in your system. Perhaps a scheduled job which runs every day at midnight imports bookings from an external system. Your code may lock the table whilst this scheduled job runs so that everything can be kept in sync. Whilst this is happening, users may also be making bookings on your system.
If we set lock_timeout to 1 minute, the code handling those individual bookings would wait at most 1 minute to acquire its lock before the statement is cancelled. At that point, it's really up to you how your code handles this scenario: perhaps you would raise an exception, or maybe you would retry later; it really depends.
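As a quick sketch of how this might look in practice (the bookings table and the specific UPDATE are hypothetical examples, not from the original system):

```sql
-- Apply a 1 minute lock timeout to the current session only
SET lock_timeout = '1min';

-- If another session holds a conflicting lock on bookings for more
-- than a minute, this statement is cancelled with:
--   ERROR:  canceling statement due to lock timeout
UPDATE bookings SET status = 'confirmed' WHERE id = 42;
```

Your application code can then catch that error and decide whether to raise, retry, or give up.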
idle_in_transaction_session_timeout
As you probably realised, lock_timeout is only useful from the perspective of processes trying to access the bookings table. If we want to set a maximum amount of time that a session with an open transaction can sit idle, we have to set the idle_in_transaction_session_timeout value. It is worth noting that setting this value won't directly clear a database lock, but once the timeout expires, the session is terminated and any open transaction (if not completed) will roll back.
So back to our original example: if we set this second value to 1 minute, imagine our API request to import bookings experiences issues and just hangs, and we haven't set any request timeout. After 1 minute of the session sitting idle, the session would be terminated, the transaction would roll back, and the lock we created would automatically be released.
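To sketch that behaviour (again, the bookings table here is a hypothetical example):

```sql
-- Terminate any session that sits idle inside an open
-- transaction for more than a minute
SET idle_in_transaction_session_timeout = '1min';

BEGIN;
LOCK TABLE bookings IN ACCESS EXCLUSIVE MODE;
-- ... the API call hangs here and the session sits idle ...
-- After 1 minute Postgres terminates the session with:
--   FATAL:  terminating connection due to idle-in-transaction timeout
-- The transaction rolls back and the lock on bookings is released.
```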
Setting these values
Depending on how your Postgres DB is set up, your configuration settings will need to be managed accordingly. If you are using a managed service such as AWS RDS, you may find that it's not possible to set these values directly in the DB config, or in an IaC (infrastructure as code) tool like Terraform, for all connections. This was exactly the issue I faced, so I had to set them specifically for each connection in the application code itself. This wasn't too much of an issue because the main application used the same connection throughout, so I only needed to set these values once.
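In plain SQL, the per-connection version is simply a pair of SET statements issued as soon as the connection is opened. If your level of access allows it, you can instead apply the values to every new connection for a given role or database (app_user and bookings_db below are hypothetical names):

```sql
-- Per connection: run these immediately after connecting
SET lock_timeout = '1min';
SET idle_in_transaction_session_timeout = '1min';

-- Or, apply to every new connection made by a role
ALTER ROLE app_user SET lock_timeout = '1min';
ALTER ROLE app_user SET idle_in_transaction_session_timeout = '1min';

-- Or, apply to every new connection to a database
ALTER DATABASE bookings_db SET lock_timeout = '1min';
```

Note that ALTER ROLE and ALTER DATABASE only affect connections opened after the change, not sessions that are already connected.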
Testing
Once you have set the values, you may wish to check that they are set correctly. The best way to do this is to query the DB directly using the same connection details you applied the changes to. If you don't use the same connection, you might see a different value than expected, as other connections will likely fall back to the respective default values.
SHOW lock_timeout;
SHOW idle_in_transaction_session_timeout;
Final words
Although it's hard to say with any certainty what the actual requirements of your application are, I would question whether these database locks are valid in the first place. If, like me, you are working on a mature application that you didn't build, it's very likely that you won't have the full context as to why they exist. That being said, excessive locking is an anti-pattern, and having to implement something like this is probably just masking the real underlying issue.
Useful links