Automating Screen Operations with Python (Includes Various Code Snippets for Beginners)
This article introduces methods to automate screen operations with Python.
The example used here is "Registering an XML file with Google Search Console," but these methods can be applied to other scenarios as well, such as "performing the same operation repeatedly with legacy software written by your predecessors."
(Note: These methods are based on recent versions of Windows, so they may not work on other operating systems or older versions of Windows. Please be aware of this...!)
What You Will Learn from This Article
- How to automate screen operations with Python
  - Basic commands
  - Handy functions
- Automation Example: Automating the process of registering an XML sitemap with Google Search Console
  - Step-by-step procedure
  - Code for automation
  - Execution results (video)
What Tools Are Used for Automation?
Various operations can be automated with Python.
In this article, the following libraries were used.
While these libraries alone can achieve quite a bit, there are many other useful libraries out there.
- PyAutoGUI
  - Automates GUI operations such as mouse movement and clicks, and keyboard inputs.
- pyperclip
  - Allows reading and writing to the clipboard.
- subprocess
  - Can start external processes and manipulate their input/output.
  - Especially useful for executing command prompt operations within Python.
  - It's a standard Python library, so no separate installation is required.
- win32gui, win32con, win32api
  - Allows access to Windows GUI elements and system resources.
- time
  - Provides functionalities such as obtaining the current time, pausing execution (sleep), performance measurement, converting time expressions, and performing time arithmetic.
  - It's a standard Python library, so no separate installation is required.
Here are the installation commands.
Note that the win32*** libraries are bundled and installed with pywin32!
pip install pyperclip
pip install pyautogui
pip install pywin32
What Operations Can Be Automated?
Here are some simple specific examples of the automation code used in this article.
These are just a few examples, but there are many other things you can automate!
Pressing Keys
With PyAutoGUI, you can automate the pressing of one or more keys.
There are many other keys you can press, so feel free to explore further!
import pyautogui as pa
pa.hotkey('alt','d') # Focus the URL input field in Google Chrome
pa.hotkey('f5') # Refresh the page in the browser
Typing Text
You can use PyAutoGUI and pyperclip to type text.
To avoid issues with switching between Japanese/English input modes, it's better to use the "copy to clipboard and paste" method.
import pyautogui as pa
import pyperclip
pyperclip.copy("hoge, hoge") # Copy text to clipboard
pa.hotkey('ctrl','v') # Paste the text
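One side effect of the clipboard method is that it overwrites whatever you had copied before. Here is a minimal sketch of a helper that puts the previous clipboard text back after pasting. The names paste_text, clipboard, hotkey, and settle are my own for illustration, not part of either library. Also note that pyperclip only handles text, so non-text clipboard content (such as an image) would not be restored.

```python
import time

def paste_text(text, clipboard=None, hotkey=None, settle=0.2):
    """Paste `text` with Ctrl+V, then restore the previous clipboard text.

    `clipboard` needs copy()/paste() like the pyperclip module, and `hotkey`
    is called like pyautogui.hotkey. Both default to the real libraries.
    """
    if clipboard is None:
        import pyperclip  # imported lazily so a dry run works without it
        clipboard = pyperclip
    if hotkey is None:
        import pyautogui
        hotkey = pyautogui.hotkey

    previous = clipboard.paste()  # text that was on the clipboard before
    clipboard.copy(text)
    try:
        hotkey('ctrl', 'v')
        time.sleep(settle)  # give the target application time to read the clipboard
    finally:
        clipboard.copy(previous)  # restore even if the hotkey call fails
```

Calling paste_text("hoge, hoge") with no other arguments uses pyperclip and PyAutoGUI directly. Passing your own clipboard and hotkey objects lets you try the logic without touching the real screen.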
Locating an Image on the Screen and Clicking
PyAutoGUI can search for images on the screen.
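If the image is found, it returns the matched region as a box (left, top, width, height); passing that box to pa.click() clicks its center.
Depending on the PyAutoGUI version, a failed search either returns None or raises ImageNotFoundException.
Note that the confidence argument requires OpenCV (pip install opencv-python).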
import pyautogui as pa
position_im = pa.locateOnScreen(img, confidence=0.8)
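To make the return value concrete: locateOnScreen reports a box of (left, top, width, height), and the point that gets clicked is the center of that box. The sketch below reproduces that arithmetic with a stand-in Box type and a hypothetical box_center helper (both are mine, for illustration only), so it runs without PyAutoGUI. If you only need the center, PyAutoGUI also provides locateCenterOnScreen.

```python
from collections import namedtuple

# Stand-in with the same field order as the box that locateOnScreen reports.
Box = namedtuple("Box", "left top width height")

def box_center(box):
    """Return the (x, y) center of a (left, top, width, height) box."""
    left, top, width, height = box
    return (left + width // 2, top + height // 2)

print(box_center(Box(left=100, top=200, width=50, height=20)))  # (125, 210)
```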
In this example, I defined a wait_disp function to wait for an image to appear.
Here's how it's used:
import time
import pyautogui as pa

def wait_disp(img, wait_time=10):
    # Waits for an image to appear on the screen. Returns the image's location if found, otherwise None.
    dt = 0.5
    for _ in range(int(wait_time/dt)):
        try:
            position_im = pa.locateOnScreen(img, confidence=0.8)
        except pa.ImageNotFoundException:  # Recent PyAutoGUI versions raise this when there is no match
            position_im = None
        if position_im:
            return position_im
        time.sleep(dt)
    return None

position_im = wait_disp('./hoge.png')  # Search for hoge.png on the screen
if position_im:
    pa.click(position_im, button='left', clicks=1)  # Left-click the center of hoge.png
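The polling loop in wait_disp counts iterations, so the real waiting time also includes however long each screen search takes. As a rough alternative, here is a sketch of a generic helper that polls until a wall-clock deadline instead. The name wait_until and its clock and sleep parameters are hypothetical (they exist mainly so the helper can be tested without actually waiting).

```python
import time

def wait_until(check, timeout=10.0, interval=0.5,
               clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns a truthy value or `timeout` seconds pass.

    Returns the truthy value, or None if the deadline is reached first.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            return None
        sleep(interval)
```

Any zero-argument function can be passed as check, for example one that searches the screen for an image and returns its location or None.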
Launching Applications
You can use subprocess to launch specified applications.
For example, you can start Google Chrome with the following code.
import subprocess
chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"  # Change this if Chrome is installed elsewhere
subprocess.Popen(chrome_path)
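The hard-coded path fails if Chrome is installed somewhere else. Below is a minimal sketch that tries a few candidate locations and builds the argument list for subprocess.Popen. The candidate paths are my assumptions about typical Windows installs, and first_existing and chrome_command are hypothetical helper names. The sketch also relies on Chrome opening a URL that is passed as a command-line argument.

```python
import os

# Candidate install locations; these are assumptions about a typical Windows setup.
CANDIDATES = [
    r"C:\Program Files\Google\Chrome\Application\chrome.exe",
    r"C:\Program Files (x86)\Google\Chrome\Application\chrome.exe",
    os.path.expandvars(r"%LOCALAPPDATA%\Google\Chrome\Application\chrome.exe"),
]

def first_existing(paths):
    """Return the first path that points to an existing file, or None."""
    for path in paths:
        if os.path.isfile(path):
            return path
    return None

def chrome_command(url, paths=CANDIDATES):
    """Build the argument list for subprocess.Popen, or raise if Chrome is missing."""
    exe = first_existing(paths)
    if exe is None:
        raise FileNotFoundError("chrome.exe was not found in any candidate location")
    return [exe, url]  # Chrome opens a URL passed as a command-line argument
```

With this in place, subprocess.Popen(chrome_command("https://example.com")) would launch Chrome directly on that page.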
Adjusting Application Display Position/Screen, Maximizing, etc.
Using win32gui, win32con, and win32api, you can adjust the display state of applications.
Sometimes you need to set the window size explicitly, because the buttons you want to click may not be visible in a small window.
I'll skip the detailed explanation here!
import time
import win32gui
import win32con
import win32api

def get_chrome_window():
    # Retrieves the handle of the first visible window whose title contains "chrome" (usually the most recently opened one).
    chrome_windows = []
    def enum_handler(hwnd, lParam):
        if win32gui.IsWindowVisible(hwnd) and 'chrome' in win32gui.GetWindowText(hwnd).lower():
            chrome_windows.append(hwnd)
    win32gui.EnumWindows(enum_handler, None)
    return chrome_windows[0] if chrome_windows else None

def move_to_main_screen(hwnd, wait_time=10):
    # Moves the window to the main (primary) monitor.
    dt = 0.5
    n = wait_time/dt
    count = 0
    while count <= n:
        monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(hwnd, win32con.MONITOR_DEFAULTTONEAREST))
        main_monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(0, win32con.MONITOR_DEFAULTTOPRIMARY))
        if monitor_info['Device'] == main_monitor_info['Device']:
            break
        else:
            # Switch the monitor displaying hwnd
            win32gui.MoveWindow(hwnd, 0, 0, 500, 500, True)  # Move the window to the top-left corner (0, 0) at a size of 500x500
            time.sleep(dt)
            count += 1

# Adjusting the display window
chrome_window = None
while not chrome_window:
    chrome_window = get_chrome_window()
    time.sleep(0.5)
move_to_main_screen(chrome_window)  # Open in the main screen
time.sleep(0.5)
win32gui.ShowWindow(chrome_window, win32con.SW_MAXIMIZE)  # Maximize the window
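A caveat about the title check above: testing for the substring 'chrome' also matches unrelated windows, such as a text editor that has a file named chrome_notes.txt open. As a sketch of a stricter filter, the function below matches on a title suffix instead. The suffix " - Google Chrome" is an assumption about how Chrome titles its windows with an English UI, and pick_chrome_window is a hypothetical helper. It works on plain (hwnd, title) pairs, so the logic can be tried without pywin32.

```python
def pick_chrome_window(windows, suffix=" - Google Chrome"):
    """Return the handle of the first window whose title ends with `suffix`.

    `windows` is an iterable of (hwnd, title) pairs, for example collected
    with win32gui.EnumWindows and win32gui.GetWindowText.
    """
    for hwnd, title in windows:
        if title.endswith(suffix):
            return hwnd
    return None

sample = [
    (101, "chrome_notes.txt - Notepad"),      # would pass a plain 'chrome' substring test
    (202, "Search Console - Google Chrome"),
]
print(pick_chrome_window(sample))  # 202
```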
What Operations Are Automated This Time?
This time, the operation being automated is the registration of an XML sitemap with Google Search Console.
Specifically, the operation involves the following steps:
- Launch Google Chrome
- Adjust the display window
  - Open in the main screen
  - Maximize the window
- Enter the Google Search Console URL in the URL field of Google Chrome and press the Enter key
- Confirm that the Google Search Console page has opened
  - Find sitemap1.png on the screen
- Enter the file name (sitemap.xml) in the specified field
  - After clicking sitemap1.png to focus on the input field, press the Tab key
  - Copy the file name to the clipboard and paste it into the input field
- Click the submit button (sitemap2.png)
- Confirm that the submission process is complete and refresh the page
  - When sitemap3.png (the button to close the submission completion popup) appears, click it
  - Wait for a moment and then press the F5 key
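The steps above can also be written down as data, which makes it easy to do a dry run before letting the script control the mouse and keyboard. This is only a sketch of one possible design: the action names, the STEPS table, and the run function are all hypothetical and are not used by the actual code in this article.

```python
# Each step is an (action, argument) pair mirroring the procedure above.
STEPS = [
    ("launch", "chrome"),
    ("window", "move to main screen and maximize"),
    ("open_url", "Google Search Console"),
    ("click_image", "sitemap1.png"),
    ("key", "tab"),
    ("paste", "sitemap.xml"),
    ("click_image", "sitemap2.png"),
    ("click_image", "sitemap3.png"),
    ("key", "f5"),
]

def run(steps, handlers):
    """Run each step with the handler registered for its action name."""
    for action, arg in steps:
        handlers[action](arg)

# Dry run: every handler only prints, so nothing on the screen is touched.
dry_run = {name: (lambda arg, name=name: print(f"[dry-run] {name} -> {arg}"))
           for name, _ in STEPS}
run(STEPS, dry_run)
```

For a real run, the dictionary would map each action name to a function that performs it, for example one that presses a key with PyAutoGUI.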
Here are the images used as triggers for the automated operations.
What Does the Code Look Like?
Here's the code!
At the very least, replace the URL at the beginning with your own!
import pyautogui as pa
import pyperclip
import time
import subprocess
import win32gui
import win32con
import win32api

URL = "https://search.google.com/search-console/sitemaps?resource_id=https%3A%2F%2Fhitbug0.github.io%2F"

def wait_disp(img, wait_time=10):
    # Waits for an image to appear on the screen. Returns the image's location if found, otherwise None.
    dt = 0.5
    for _ in range(int(wait_time/dt)):
        try:
            position_im = pa.locateOnScreen(img, confidence=0.8)
        except pa.ImageNotFoundException:  # Recent PyAutoGUI versions raise this when there is no match
            position_im = None
        if position_im:
            return position_im
        time.sleep(dt)
    return None

def click_image(img, wait_time=10):
    # Waits for an image to appear and left-clicks its center. Stops the script if the image never appears.
    position_im = wait_disp(img, wait_time)
    if not position_im:
        raise RuntimeError(f"{img} did not appear within {wait_time} seconds")
    time.sleep(0.05)
    pa.click(position_im, button='left', clicks=1)

def get_chrome_window():
    # Retrieves the handle of the first visible window whose title contains "chrome" (usually the most recently opened one).
    chrome_windows = []
    def enum_handler(hwnd, lParam):
        if win32gui.IsWindowVisible(hwnd) and 'chrome' in win32gui.GetWindowText(hwnd).lower():
            chrome_windows.append(hwnd)
    win32gui.EnumWindows(enum_handler, None)
    return chrome_windows[0] if chrome_windows else None

def move_to_main_screen(hwnd, wait_time=10):
    # Moves the window to the main (primary) monitor.
    dt = 0.5
    n = wait_time/dt
    count = 0
    while count <= n:
        monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(hwnd, win32con.MONITOR_DEFAULTTONEAREST))
        main_monitor_info = win32api.GetMonitorInfo(win32api.MonitorFromWindow(0, win32con.MONITOR_DEFAULTTOPRIMARY))
        if monitor_info['Device'] == main_monitor_info['Device']:
            break
        else:
            # Switch the monitor displaying hwnd
            win32gui.MoveWindow(hwnd, 0, 0, 500, 500, True)  # Move the window to the top-left corner (0, 0) at a size of 500x500
            time.sleep(dt)
            count += 1

def main():
    # Execute command to launch Google Chrome
    chrome_path = "C:\\Program Files\\Google\\Chrome\\Application\\chrome.exe"
    subprocess.Popen(chrome_path)

    # Adjust the display window
    time.sleep(0.5)  # Wait time for Chrome to launch
    chrome_window = None
    while not chrome_window:
        chrome_window = get_chrome_window()
        time.sleep(0.5)
    move_to_main_screen(chrome_window)  # Open in the main screen
    time.sleep(0.5)
    win32gui.ShowWindow(chrome_window, win32con.SW_MAXIMIZE)  # Maximize the window
    time.sleep(0.5)

    # Enter URL
    pa.hotkey('alt','d')
    time.sleep(2)
    pyperclip.copy(URL)
    pa.hotkey('ctrl','v')
    time.sleep(0.05)
    pa.hotkey('enter')

    # Find and click sitemap1.png on the screen
    click_image('./contents/img/sitemap1.png')

    # Enter file name and submit
    pa.hotkey('tab')
    time.sleep(0.05)
    pyperclip.copy("sitemap.xml")
    pa.hotkey('ctrl','v')
    click_image('./contents/img/sitemap2.png')

    # Refresh page after submission is complete
    click_image('./contents/img/sitemap3.png', wait_time=40)
    time.sleep(1)
    pa.hotkey('f5')

if __name__ == "__main__":
    main()
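When you replace the URL with your own, the site address inside resource_id has to be percent-encoded. Here is a small sketch that builds the URL with the standard library. The function name sitemaps_url is hypothetical, and the URL pattern is simply copied from the one used in this article, so treat it as an assumption rather than a documented Search Console format.

```python
from urllib.parse import quote

def sitemaps_url(site):
    """Build the Search Console sitemaps URL for a site address such as https://example.com/."""
    base = "https://search.google.com/search-console/sitemaps"
    # safe='' makes quote() encode ':' and '/' as well
    return f"{base}?resource_id={quote(site, safe='')}"

print(sitemaps_url("https://hitbug0.github.io/"))
# https://search.google.com/search-console/sitemaps?resource_id=https%3A%2F%2Fhitbug0.github.io%2F
```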
What Does the Execution Result Look Like?
I'll upload a video soon!
Summary
Using the example of "registering an XML file with Google Search Console," I introduced a method to automate screen operations with Python.
This method can be useful in scenarios like "performing the same operation repeatedly with legacy software written by your predecessors."
I have personally used this method for work several times, so if you find it useful, please give it a try!