Automating Screen Operations with Python (Includes Various Code Snippets for Beginners)


This article introduces methods to automate screen operations with Python.
The example used here is "Registering an XML sitemap with Google Search Console," but these methods can be applied to other scenarios as well, such as "performing the same operation repeatedly in legacy software built by your predecessors."
(Note: These methods are based on recent versions of Windows, so they may not work on other operating systems or older versions of Windows. Please be aware of this...!)

What You Will Learn from This Article

  • How to automate screen operations with Python
    • Basic commands
    • Handy functions
  • Automation Example: Automating the process of registering an XML sitemap with Google Search Console
    • Step-by-step procedure
    • Code for automation
    • Execution results (video)

What Tools Are Used for Automation?

Various operations can be automated with Python.
In this article, the following libraries were used.
While these libraries alone can achieve quite a bit, there are many other useful libraries out there.

  • PyAutoGUI
    • Automates GUI operations such as mouse movement and clicks, and keyboard inputs.
  • pyperclip
    • Allows reading and writing to the clipboard.
  • subprocess
    • Can start external processes and manipulate their input/output.
    • Especially useful for executing command prompt operations within Python.
    • It's a standard Python library, so no separate installation is required.
  • win32gui, win32con, win32api
    • Allows access to Windows GUI elements and system resources.
  • time
    • Provides functionalities such as obtaining the current time, pausing execution (sleep), performance measurement, converting time expressions, and performing time arithmetic.
    • It's a standard Python library, so no separate installation is required.

Here are the installation commands.
Note that the win32* modules (win32gui, win32con, win32api) are installed together as part of pywin32!

    pip install pyperclip
    pip install pyautogui
    pip install pywin32

What Operations Can Be Automated?

Here are some simple specific examples of the automation code used in this article.
These are just a few examples, but there are many other things you can automate!

Pressing Keys

With PyAutoGUI, you can automate the pressing of one or more keys.
There are many other keys you can press, so feel free to explore further!

    import pyautogui as pa

    pa.hotkey('alt', 'd')  # Focus the URL input field in Google Chrome
    pa.hotkey('f5')        # Refresh the page in the browser

Typing Text

You can use PyAutoGUI and pyperclip to type text.
To avoid issues with switching between Japanese/English input modes, it's better to use the "copy to clipboard and paste" method.

    import pyautogui as pa
    import pyperclip

    pyperclip.copy("hoge, hoge")  # Copy text to clipboard
    pa.hotkey('ctrl', 'v')        # Paste the text
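One side effect of the copy-and-paste approach is that it overwrites whatever text was on the clipboard. Below is a minimal sketch of a helper that puts the previous text back afterwards. The name `paste_text` and its `clipboard`, `keyboard`, and `settle` parameters are hypothetical and not part of either library. When they are omitted, the helper falls back to pyperclip and PyAutoGUI. The short delay before restoring is an assumption that may need tuning, because the target application reads the clipboard a moment after the key press.

```python
import time


def paste_text(text, clipboard=None, keyboard=None, settle=0.05):
    """Paste text via the clipboard, then restore the previous clipboard text.

    clipboard needs copy()/paste(); keyboard needs hotkey().
    When omitted, pyperclip and pyautogui are imported and used.
    """
    if clipboard is None:
        import pyperclip as clipboard  # lazy import so fakes can be used in tests
    if keyboard is None:
        import pyautogui as keyboard

    previous = clipboard.paste()   # remember the current clipboard text
    clipboard.copy(text)           # put the new text on the clipboard
    keyboard.hotkey('ctrl', 'v')   # paste into the focused field
    time.sleep(settle)             # give the application time to read the clipboard
    clipboard.copy(previous)       # restore what was there before
```

Only text is preserved this way; if the clipboard held an image or a file, it is not restored.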

Locating an Image on the Screen and Clicking

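PyAutoGUI can search for an image on the screen.
If found, locateOnScreen returns the region where the image matched (left, top, width, height); passing that region to pa.click clicks its center.
The confidence argument requires OpenCV (pip install opencv-python), and depending on the PyAutoGUI version, a failed search either returns None or raises ImageNotFoundException.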

    import pyautogui as pa

    img = './hoge.png'  # Path to the image to search for
    position_im = pa.locateOnScreen(img, confidence=0.8)

In this example, I defined a wait_disp function to wait for an image to appear.
Here's how it's used:

    import time
    import pyautogui as pa

    def wait_disp(img, wait_time=10):
        # Waits for an image to appear on the screen.
        # Returns the image's location if found, otherwise False.
        dt = 0.5
        for i in range(int(wait_time/dt)):
            try:
                position_im = pa.locateOnScreen(img, confidence=0.8)
            except pa.ImageNotFoundException:
                # Newer PyAutoGUI versions raise instead of returning None
                position_im = None
            if position_im:
                return position_im
            time.sleep(dt)
        return False

    position_im = wait_disp('./hoge.png')  # Search for hoge.png on the screen
    if position_im:
        pa.click(position_im, button='left', clicks=1)  # Left-click the center of hoge.png
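The same "poll until something appears" pattern comes up for windows and dialogs too, not only images. Here is a minimal sketch of a generic version. The name `wait_until` and its parameters are hypothetical. Unlike a loop that counts iterations, it uses a deadline based on `time.monotonic`, so the time spent inside each check (a screen search can be slow) counts toward the timeout.

```python
import time


def wait_until(check, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or timeout seconds pass.

    Returns the truthy value from check(), or None on timeout.
    check() is always called at least once.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

With PyAutoGUI, this could be called as `wait_until(lambda: locate('./hoge.png'))`, where `locate` is a small wrapper (also hypothetical) around `pa.locateOnScreen` that returns None when nothing is found.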

Launching Applications

You can use subprocess to launch specified applications.
For example, you can start Google Chrome with the following code.

    import subprocess

    chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
    subprocess.Popen(chrome_path)
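A hard-coded path fails if the application is installed somewhere else. As a rough sketch, the helper below tries several candidate paths and starts the first one that exists. The names `CHROME_CANDIDATES`, `find_executable`, and `launch` are hypothetical, and the listed paths are only common install locations assumed for illustration, so adjust them to your machine.

```python
import subprocess
from pathlib import Path

# Common install locations; these are assumptions and may differ on your machine.
CHROME_CANDIDATES = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
]


def find_executable(candidates):
    """Return the first path in candidates that is an existing file, else None."""
    for path in candidates:
        if Path(path).is_file():
            return str(path)
    return None


def launch(candidates, *args):
    """Start the first executable found, passing args on its command line."""
    exe = find_executable(candidates)
    if exe is None:
        raise FileNotFoundError(f"None of these executables exist: {list(candidates)}")
    return subprocess.Popen([exe, *args])
```

Calling `launch(CHROME_CANDIDATES)` starts Chrome. Chrome also opens a URL given as a command-line argument, so `launch(CHROME_CANDIDATES, "https://example.com")` may be an alternative to typing the address into the URL field.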

Adjusting Application Display Position/Screen, Maximizing, etc.

Using win32gui, win32con, and win32api, you can adjust the display state of applications.
If the window is too small, the buttons you want to click may not be visible, so you sometimes need to set the window size explicitly.
I'll skip the detailed explanation here!

    import time
    import win32gui
    import win32con
    import win32api

    def get_chrome_window():
        # Retrieves the handle of the first visible window with "chrome" in its title
        # (normally the most recently opened Chrome window). Returns None if there is none.
        chrome_windows = []
        def enum_handler(hwnd, lParam):
            if win32gui.IsWindowVisible(hwnd) and 'chrome' in win32gui.GetWindowText(hwnd).lower():
                chrome_windows.append(hwnd)
        win32gui.EnumWindows(enum_handler, None)
        return chrome_windows[0] if chrome_windows else None

    def move_to_main_screen(hwnd, wait_time=10):
        # Moves the window to the main (primary) monitor.
        dt = 0.5
        n = wait_time/dt
        count = 0
        while count <= n:
            monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(hwnd, win32con.MONITOR_DEFAULTTONEAREST))
            main_monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(0, win32con.MONITOR_DEFAULTTOPRIMARY))
            if monitor_info['Device'] == main_monitor_info['Device']:
                break
            else:
                # Switch the monitor displaying hwnd
                win32gui.MoveWindow(hwnd, 0, 0, 500, 500, True)  # Place a 500x500 window at the top-left corner
                time.sleep(dt)
                count += 1

    # Adjusting the display window
    chrome_window = None
    while not chrome_window:
        chrome_window = get_chrome_window()
        time.sleep(0.5)
    move_to_main_screen(chrome_window)  # Open in the main screen
    time.sleep(0.5)
    win32gui.ShowWindow(chrome_window, win32con.SW_MAXIMIZE)  # Maximize the window

What Operations Are Automated This Time?

This time, the operation being automated is the registration of an XML sitemap with Google Search Console.
Specifically, the operation involves the following steps:

  1. Launch Google Chrome
  2. Adjust the display window
    • Open in the main screen
    • Maximize the window
  3. Enter the Google Search Console URL in the URL field of Google Chrome and press the Enter key
  4. Confirm that the Google Search Console page has opened
    • Find sitemap1.png on the screen
  5. Enter the file name (sitemap.xml) in the specified field
    • After clicking sitemap1.png, press the Tab key to move the focus to the input field
    • Copy the file name to the clipboard and paste it into the input field
  6. Click the submit button (sitemap2.png)
  7. Confirm that the submission process is complete and refresh the page
    • When sitemap3.png (the button to close the submission completion popup) appears, click it
    • Wait for a moment and then press the F5 key
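
The steps above can also be written as data: a list of named actions run in order, stopping as soon as one fails. This is only a sketch of that idea, not the structure the article's code uses. The name `run_steps` is hypothetical, and the lambdas are stand-ins for real actions such as launching Chrome or clicking an image.

```python
def run_steps(steps, log=print):
    """Run (description, action) pairs in order.

    Stops at the first action that returns False and returns that step's
    1-based number; returns None when every step succeeds.
    """
    total = len(steps)
    for number, (description, action) in enumerate(steps, start=1):
        log(f"[{number}/{total}] {description}")
        if action() is False:
            log(f"Stopped: step {number} did not succeed")
            return number
    return None


# Stand-in actions for illustration only; in a real script each one would
# wrap calls such as subprocess.Popen, an image wait, or pa.click.
demo_steps = [
    ("Launch Google Chrome", lambda: True),
    ("Confirm the Search Console page opened", lambda: False),  # simulate "image not found"
    ("Click the submit button", lambda: True),
]
failed_at = run_steps(demo_steps)
```

Logging each step name makes it easier to see where a run stopped when an image never appears on the screen.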

Here are the images used as triggers for the automated operations.

What Does the Code Look Like?

Here's the code!
At the very least, replace the URL at the beginning with your own!

    import pyautogui as pa
    import pyperclip
    import time
    import subprocess
    import win32gui
    import win32con
    import win32api

    URL = "https://search.google.com/search-console/sitemaps?resource_id=https%3A%2F%2Fhitbug0.github.io%2F"

    def wait_disp(img, wait_time=10):
        # Waits for an image to appear on the screen.
        # Returns the image's location if found, otherwise False.
        dt = 0.5
        for i in range(int(wait_time/dt)):
            try:
                position_im = pa.locateOnScreen(img, confidence=0.8)
            except pa.ImageNotFoundException:
                # Newer PyAutoGUI versions raise instead of returning None
                position_im = None
            if position_im:
                return position_im
            time.sleep(dt)
        return False

    def get_chrome_window():
        # Retrieves the handle of the first visible window with "chrome" in its title
        # (normally the most recently opened Chrome window). Returns None if there is none.
        chrome_windows = []
        def enum_handler(hwnd, lParam):
            if win32gui.IsWindowVisible(hwnd) and 'chrome' in win32gui.GetWindowText(hwnd).lower():
                chrome_windows.append(hwnd)
        win32gui.EnumWindows(enum_handler, None)
        return chrome_windows[0] if chrome_windows else None

    def move_to_main_screen(hwnd, wait_time=10):
        # Moves the window to the main (primary) monitor.
        dt = 0.5
        n = wait_time/dt
        count = 0
        while count <= n:
            monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(hwnd, win32con.MONITOR_DEFAULTTONEAREST))
            main_monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(0, win32con.MONITOR_DEFAULTTOPRIMARY))
            if monitor_info['Device'] == main_monitor_info['Device']:
                break
            else:
                # Switch the monitor displaying hwnd
                win32gui.MoveWindow(hwnd, 0, 0, 500, 500, True)  # Place a 500x500 window at the top-left corner
                time.sleep(dt)
                count += 1

    def main():
        # Execute command to launch Google Chrome
        chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
        subprocess.Popen(chrome_path)

        # Adjust the display window
        time.sleep(0.5)  # Wait time for Chrome to launch
        chrome_window = None
        while not chrome_window:
            chrome_window = get_chrome_window()
            time.sleep(0.5)
        move_to_main_screen(chrome_window)  # Open in the main screen
        time.sleep(0.5)
        win32gui.ShowWindow(chrome_window, win32con.SW_MAXIMIZE)  # Maximize the window
        time.sleep(0.5)

        # Enter URL
        pa.hotkey('alt', 'd')
        time.sleep(2)
        pyperclip.copy(URL)
        pa.hotkey('ctrl', 'v')
        time.sleep(0.05)
        pa.hotkey('enter')

        # Find and click sitemap1.png on the screen
        time.sleep(0.05)
        position_im = wait_disp('./contents/img/sitemap1.png')
        if not position_im:
            raise RuntimeError('sitemap1.png was not found on the screen')
        pa.click(position_im, button='left', clicks=1)

        # Enter file name and submit
        pa.hotkey('tab')
        time.sleep(0.05)
        pyperclip.copy("sitemap.xml")
        pa.hotkey('ctrl', 'v')
        time.sleep(0.05)
        position_im = wait_disp('./contents/img/sitemap2.png')
        if not position_im:
            raise RuntimeError('sitemap2.png was not found on the screen')
        pa.click(position_im, button='left', clicks=1)

        # Refresh page after submission is complete
        position_im = wait_disp('./contents/img/sitemap3.png', wait_time=40)
        if not position_im:
            raise RuntimeError('sitemap3.png was not found on the screen')
        time.sleep(0.05)
        pa.click(position_im, button='left', clicks=1)
        time.sleep(1)
        pa.hotkey('f5')

    if __name__ == "__main__":
        main()

What Does the Execution Result Look Like?

I'll upload a video soon!

Summary

Using the example of "registering an XML sitemap with Google Search Console," I introduced a method to automate screen operations with Python.
This method can be useful in scenarios like "performing the same operation repeatedly in legacy software built by your predecessors."
I have personally used this method at work several times, so if it looks useful to you, please give it a try!