Are Static Waits Better Than Dynamic Waits in Automation? ☄️

Using static sleeps in test automation code is almost considered a sin.
But using dynamic waits everywhere as a rule of thumb can do more harm than good. I recently came across a scenario that challenges the idea that dynamic waits are always the better choice.

Our code was something like this.

1- Do an operation
2- Wait for 20s (the expected time for step #1 to complete)
3- Hit an API from all the nodes (3 in our case) and check whether the operation is reported as successful by any node
4- Fail if none of the API calls returned a positive response

Pseudocode, Python-ish:

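from time import sleep

do_some_work()
sleep(20)  # static wait: step #1 is expected to take about 20s
test_pass = False
for node in nodes:
    resp = hit_api(node)
    if 'Success' in resp.text:
        test_pass = True
        break  # one successful node is enough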

Note: only one node may return a success response, and which node it is can be random.

There was a suggestion to replace the static 20-second sleep with a dynamic one (4 retries, 5 seconds apart).

Now our code looked something like this.

1- Do an operation
2- Run a loop of 4 retries, with a 5s sleep in each iteration, performing step #3
3- Hit an API from all the nodes (3 in our case) and check whether the operation is reported as successful by any node
4- Fail if none of the API calls returned a positive response

from time import sleep

do_some_work()
test_pass = False
for node in nodes:
    for retry in range(4):  # 4 retries per node, 5s apart
        resp = hit_api(node)
        if 'Success' in resp.text:
            test_pass = True
            break
        else:
            sleep(5)
    if test_pass:  # break outer loop
        break

Sure, at first glance this looks more refined and optimal, right?

But there are two drawbacks!
First, when you know an operation takes a defined amount of time (20 seconds in our case), there is really no point in polling during those first 20 seconds. It simply adds computing overhead.

A combined approach is often better

When task completion times are uncertain, go for retries.
For example, if you know a task takes 60–80s, use a static sleep of about 50s and then 6–8 retries with a 5s sleep between them.
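The combined approach can be sketched roughly as below. This is only an illustration, not the code from our suite: `wait_then_poll`, `check`, and the injectable `sleep` parameter are hypothetical names introduced here, and `check` stands in for whatever verification call you would actually make.

```python
import time
from typing import Callable


def wait_then_poll(
    check: Callable[[], bool],
    initial_wait: float,
    retries: int,
    interval: float,
    sleep: Callable[[float], None] = time.sleep,  # injectable so tests need not really sleep
) -> bool:
    """Sleep through the known minimum duration, then poll `check`."""
    # Static part: the task is not expected to finish before this point.
    sleep(initial_wait)
    # Dynamic part: poll only inside the uncertain window.
    for attempt in range(retries):
        if check():
            return True
        if attempt < retries - 1:
            sleep(interval)  # no sleep after the final attempt
    return False
```

For the 60–80s example, something like `wait_then_poll(task_is_done, initial_wait=50, retries=7, interval=5)` would check at roughly 50s, 55s, and so on up to 80s, where `task_is_done` is again a placeholder for your own check.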

Second, coming back to our example, the other drawback of dynamic waits shows up in the failure path.

When the test passes, there is really no difference. The failure scenario is another story. With the old code, the time to fail the test would have been x (time to do all other operations) + 20 (static wait) + y * 3 (time taken by the verification API * 3 nodes).

If we assume x = 1 and y = 1 second, total execution time = 24s.

With the updated code, it becomes x + 3 * 4 * (5 + y)

1 + 3 * 4 * (5 + 1) = 73s.

(i.e. 3 nodes * 4 retries * (5 seconds of sleep + y seconds for each verification call)). Because this is a failure flow, every iteration is exhausted, and the retries now happen at the node level. This is easy to miss if you are not careful.
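To make the failure-path arithmetic easy to replay with other values, here is a small sketch. The function names are made up for illustration, and the model is an assumption: it counts every failed attempt as one verification call (y seconds) plus one sleep, which is what the nested-loop code above does when nothing ever succeeds.

```python
def static_wait_failure_time(x, y, nodes, static_wait=20):
    """Failure time with one static sleep, then one call per node."""
    return x + static_wait + nodes * y


def per_node_retry_failure_time(x, y, nodes, retries=4, interval=5):
    """Failure time when every node exhausts all of its retries.

    Each failed attempt costs one call (y) plus one sleep (interval).
    """
    return x + nodes * retries * (y + interval)


print(static_wait_failure_time(x=1, y=1, nodes=3))     # 24
print(per_node_retry_failure_time(x=1, y=1, nodes=3))  # 73
```

Under this model the static version fails in 24s and the per-node retry version in 73s for x = 1, y = 1 and 3 nodes; changing `nodes` or `retries` shows how quickly the gap widens.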

So why did this happen?

This happened because, when you poll multiple nodes / clients for a verification response, you cannot simply retry against the first node only (technically it's possible, but it would make for spaghetti code). So the retry is carried over to every single node, and that made the difference.
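For comparison, one possible way to keep the retry at the "round" level, where each retry asks every node once before sleeping, is sketched below. This is not what our suite did. `poll_all_nodes` and its parameters are hypothetical, and `hit_api` is assumed to return an object with a `.text` attribute as in the pseudocode above. Whether this stays readable once real setup and error handling are added is a separate question.

```python
import time


def poll_all_nodes(nodes, hit_api, retries=4, interval=5, sleep=time.sleep):
    """Return True as soon as any node reports success in any round."""
    for attempt in range(retries):
        # One round: ask every node once.
        for node in nodes:
            if 'Success' in hit_api(node).text:
                return True
        # Sleep once per failed round, not once per node.
        if attempt < retries - 1:
            sleep(interval)
    return False
```

With x = 1, y = 1 and 3 nodes, a full failure here would take about 1 + 4 * 3 * 1 + 3 * 5 = 28s under the same assumptions, since the sleep is paid per round instead of per node.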

P.S.

This difference can become huge as a few factors vary: the failure time grows in proportion to the number of nodes multiplied by the number of retries, so adding nodes or iterations widens the gap quickly. Watch out for this, and keep in mind that systems can behave differently across environments; wire up the automation accordingly.
