I have been trying to use the Find Image and Click on Image functions to locate and interact with elements in a remote desktop interface while developing a POC bot for a client.

I have gotten nothing but inaccuracy from these functions, regardless of the target or settings.

Case 1:
Take this page -

Looking for this item -

At 99% accuracy the bot found this location on the page (the red mark) -

Case 2:

Take this snippet of a larger window of the virtual desktop application -

The red marks correspond to the attempts by the bot to click on this button at 99% and 99.5% accuracy (lower or higher accuracy values were either wildly inaccurate or not found, respectively) -

The green mark corresponds to the attempt by the bot to click this button at 99.5% accuracy (lower or higher accuracy values were either wildly inaccurate or not found, respectively) -

Clearly the function is not working as required and I would like to know if there is anything I can do to improve my results, as the success of this bot requires these functions.


I’ve got a few questions:

  1. Have you tried setting a short delay before using these image functions? Try a delay of, say, 2 or 3 seconds so the picture loads completely before the search starts.
  2. How is your Citrix environment organized? How does it work? Does it run in the browser as an iframe?
  3. Could we access at least the first website, or is it not public?

We’re investigating your case with our development team.

Our developers say that it may be related to the display scaling settings in Windows. Check them and make sure your fonts, icons, and so on are not scaled in the system.

If this still doesn’t help, you can try the following workaround: implement the scale coefficient yourself and apply it to the coordinates returned by the image search. Find the coefficient that works best and use it in your bot; you can store it in a config file or something similar. This is something you can try right away.
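As a rough illustration of this workaround, here is a minimal Python sketch. It assumes the bot can post-process the coordinates in its own code; the file name `bot_config.json`, the `scale` key, and the helper names are hypothetical and not part of the platform. Whether the coefficient ends up above or below 1 depends on which side is scaled, so treat the numbers as placeholders.

```python
import json
from pathlib import Path

DEFAULT_SCALE = 1.0  # no correction when the config has no value


def load_scale(config_path):
    """Read the scale coefficient from a JSON config file (hypothetical layout)."""
    path = Path(config_path)
    if not path.exists():
        return DEFAULT_SCALE
    with path.open(encoding="utf-8") as fh:
        return float(json.load(fh).get("scale", DEFAULT_SCALE))


def apply_scale(x, y, scale):
    """Scale the coordinates reported by the image search before clicking."""
    return round(x * scale), round(y * scale)


if __name__ == "__main__":
    scale = load_scale("bot_config.json")  # e.g. {"scale": 1.25}
    # (400, 200) stands in for whatever the image search reported.
    print(apply_scale(400, 200, scale))
```

Keeping the coefficient in a config file means it can be changed per machine without touching the bot logic.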

Regarding those small pictures (“Clients”, “Window”), try to capture a bigger area around the words. Also make sure you have specified where to click in the platform (if you click on the sample image, you will see a red dot there, and you can change its position).

As for the 3 questions above, I still need them answered.

  1. A short delay before the image functions doesn’t seem to have an effect; the result is the same either way.

  2. As far as I can tell from examining the HTML of the page, Citrix does not run in an iframe; it runs through their proprietary Citrix Receiver platform.

  3. Unfortunately the website isn’t public, and I do not know whether I am allowed to provide access. I will find out.

By using the scale coefficient in the bot, do you mean having the bot adjust the display scale itself through the GUI at the start of operation? Or is there a command-line option you know of that the bot can run before the rest of the operation?

I meant choosing the coefficient manually: try different values and keep the one that works best.
Have you checked your system settings? Is any scaling applied to the icons and fonts in the system?
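To narrow down the manual search, one option is to compare a position the bot reported with the position where the target really is (measured by hand, for example from a screenshot) and pick the candidate coefficient that brings the two closest. The sketch below is only an illustration in Python; the function name and the sample coordinates are made up, and the candidate list simply covers the common Windows scale settings (100% to 200%) and their inverses.

```python
import math

# Common Windows display scale factors and their inverses (assumed candidates).
CANDIDATES = [0.5, 1 / 1.75, 1 / 1.5, 0.8, 1.0, 1.25, 1.5, 1.75, 2.0]


def best_scale(reported, expected, candidates=CANDIDATES):
    """Return the candidate that moves the reported point closest to the expected one."""

    def error(scale):
        scaled = (reported[0] * scale, reported[1] * scale)
        return math.dist(scaled, expected)

    return min(candidates, key=error)


if __name__ == "__main__":
    # Hypothetical measurement: the bot reported (400, 200),
    # but the button is actually at (500, 250).
    print(best_scale((400, 200), (500, 250)))
```

Checking the result against a second target elsewhere on the screen helps confirm that the mismatch is a pure scale factor and not an offset.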

Yes, I tested adjusting the scale factor in Windows, and that allowed the bot to find the proper locations. I was just wondering what you meant by the implementation suggestion.

Could you please tell me if you still have any questions regarding the issue?