What happens if you have a SQL Agent job, but the logic is not aware (or cannot be aware) that you may have an Availability Group in place. There is nothing built-in to Agent to account for this, but there is a way can you configure an Agent job to check for the existence of an AG before proceeding.
Here is the scenario.
First, we have a 2-node cluster running a synchronous Availability Group with automatic failover and readable secondaries turned off. Next, we have a requirement to run a SQL Agent job every hour to collect data from a user database. Since the AG is configured for automatic failover, we need to make sure the job will continue to run successfully regardless of which node is active.
With readable secondaries turned off, this creates a problem for any Agent job that needs to read from those user databases. Those jobs will fail with an error saying the database not accessible for queries.
We can easily get around this issue by using a single query and an extra job step.
First, we need to add a TSQL job step that we will call “AG Check” to execute the following query.
IF sys.fn_hadr_is_primary_replica('WSS_Content') <> 1
THROW 50000, 'This is not the primary replica.', 1;
This query uses a DMF, sys.fn_hadr_is_primary_replica, to determine if a database is a part of the primary or secondary replica. The function will return a value of 1 if we are on the primary replica, so if it returns anything else then we want to throw an error. The error will allow us to change the behavior of the job.
Note: sys.fn_hadr_is_primary_replica was introduced in SQL Server 2014; however, I wrote a function that can do the same thing for SQL Server 2012.
Next, on the Advanced tab, we need to change the On Failure Action to “Quit the job reporting success”, then click OK.
We want to use “Quit the job reporting success” in stead of “Quit the job reporting failure”, so the overall result of the job will be successful. That way we will not cause additional email alerts to be sent.
Finally, on the Steps tab, we need to use the Move arrows to make sure AG Check is the first step on the list, and make sure it is listed as the start step.
The result is we that we now have a job with two steps. The first step checks the status of the replica. If it succeeds, then we can proceed to the next step to do actual work. If the first step fails, then we gracefully exit the job.
There are other ways you can setup an Agent job to account for an Availability group, but this method is quick and easy to setup for almost any environment.
Nice work Patrick! This tip has proved very useful in a production environment
If I remember correctly, we went with
DATABASEPROPERTYEX(DB_NAME(), 'Updateability') = 'READ_WRITE'
. It does not require VIEW SERVER STATE permission, and it will help sorting out cases, where a database is secondary in an AG, but primary in a DAG.Note however, that it is not the same as checking sys.databases.
Excellent fast fix for a job gone haywire after last nights prod patching and failover! And pointed out that I needed to create the jobs on the other instance as well.
Very cool and simple solution. Thanks a lot Patrick!
Thanks Patrick! Very nice article. The error produced causes our monitoring software to think the job failed even if our Job Step says to Quit Reporting Success. So I had to go another route.
I also like the ability to use code that doesn’t have to change i.e. Database Name, Listener Name, or even Job Name etc. So I used info provided by Wayne Sheffield to use a SQL Agent token.
https://blog.waynesheffield.com/wayne/archive/2018/04/primary-replica-jobs/
This gives me a nice clean AG_Test Step 1 that can be added to any SQL Agent Job.
IF (SELECT primary_replica FROM sys.dm_hadr_availability_group_states) is NULL
BEGIN
PRINT ‘This is a not an AG member server. –Not an AG Member Server and Job Execution should continue.’
END
ELSE IF (SELECT primary_replica FROM sys.dm_hadr_availability_group_states) = @@Servername
BEGIN
PRINT ‘This is a AG Primary Replica. –Do Nothing. This is an AG Primary Replica and Job Execution should continue.’
END
ELSE IF (Select primary_replica FROM sys.dm_hadr_availability_group_states) @@Servername
BEGIN
PRINT ‘This is a AG Secondary Replica. –Stop Job. This is an AG Secondary Replica and we do not want Job Execution to continue on this Node.’
DECLARE @job_id UNIQUEIDENTIFIER;
SET @job_id = CONVERT(UNIQUEIDENTIFIER, CONVERT(VARBINARY(34), $(ESCAPE_SQUOTE(JOBID)), 1));
EXECUTE msdb.dbo.sp_stop_job @job_id = @job_id;
END
The result is that if the Server executing the SQL Agent Job is not an AG member server or is an AG Primary, it does nothing. If it is an AG member server NOT the Primary, it cancels the Job.
The Job is stopped and the log shows as cancelled. No more error thrown to cause “false” failure alerts.
Thanks for providing a great explanation and helping me think through this. Hope this Job Step is useful for others in the community.
Bill