by | January 7, 2020 | Development

Cleaning up the file system by hand on a regular basis is tedious. Automate it!

Deleting files and folders manually is not as exciting a task as one may think. It makes sense to automate it.

Here comes Python to make our lives easier. Python is an excellent programming language for scripting. We are going to take advantage of Python to finish our task without any obstacle. First, you should know why Python is a good choice.

  • Python is an all-time favorite language for automating tasks
  • Less code compared to other programming languages
  • Python runs on all major operating systems. You can run the same code on Windows, Linux, and macOS.
  • Python has a module called os that helps us to interact with the operating system. We are going to use this module to complete our automation of deleting the files.

We can automate any annoying or repetitive system task using Python. Writing a script for a specific system task is a piece of cake if you know Python. Let’s look at the following use cases.

Note: the following scripts were tested on Python 3.6.

Removing files/folders older than X days

Often you don’t need old logs, and you regularly need to clean them to make storage available. It could be anything and not just logs.

The os module has a function called stat that returns details of a file or folder, including the last access time (st_atime), the last modification time (st_mtime), and st_ctime, which is the last metadata change time on Unix and the creation time on Windows. All three attributes hold the time in seconds since the epoch (January 1, 1970, 00:00 UTC on most systems).
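As a minimal sketch of reading these timestamps, the snippet below stats a throwaway temporary file and prints each value in readable form. The helper name describe_timestamps and its dictionary keys are illustrative only and are not part of the os module.

```python
import os
import tempfile
import time


def describe_timestamps(path):
    """Return the three os.stat timestamps of `path` as seconds since the epoch."""
    info = os.stat(path)
    return {
        "accessed": info.st_atime,
        "modified": info.st_mtime,
        # metadata change time on Unix, creation time on Windows
        "changed_or_created": info.st_ctime,
    }


# demo on a temporary file that is removed automatically
with tempfile.NamedTemporaryFile() as tmp:
    for name, seconds in describe_timestamps(tmp.name).items():
        print(f"{name}: {time.ctime(seconds)}")
```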

We will use the os.walk(path) function to traverse the subfolders of a folder.

Follow the steps below to write code that deletes files/folders based on the number of days.
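To get a feel for what os.walk yields, here is a small sketch that builds a temporary folder tree and lists every file in it. The helper name list_tree and the sample file names are made up for the illustration.

```python
import os
import tempfile


def list_tree(path):
    """Collect every file path under `path`, including those in subfolders."""
    found = []
    # os.walk yields one (folder path, subfolder names, file names) tuple per folder
    for root_folder, folders, files in os.walk(path):
        for name in files:
            found.append(os.path.join(root_folder, name))
    return sorted(found)


# demo on a temporary tree that is removed automatically
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "logs", "old"))
    samples = (
        "a.txt",
        os.path.join("logs", "b.log"),
        os.path.join("logs", "old", "c.log"),
    )
    for relative in samples:
        open(os.path.join(base, relative), "w").close()

    for file_path in list_tree(base):
        print(os.path.relpath(file_path, base))
```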

  • Import the modules time, os, shutil
  • Set the path and days to the variables
  • Convert the number of days into seconds and subtract the result from the current time returned by time.time() to get the cutoff time
  • Check whether the path exists or not using the os.path.exists(path) function
  • If the path exists, then get the list of files and folders present in the path, including subfolders. Use the function os.walk(path); it returns a generator that yields, for each folder, its path, its subfolders, and its files
  • Get the path of the file or folder by joining both the current path and file/folder name using the function os.path.join()
  • Get the ctime from the result of os.stat(path) using the attribute st_ctime
  • Compare the ctime with the cutoff time we have calculated previously
  • If the ctime is at or before the cutoff, the item is older than the desired number of days, so check whether it is a file or folder. If it is a file, use os.remove(path); otherwise use shutil.rmtree()
  • If the path doesn’t exist, print not found message
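The age comparison in the steps above can be checked in isolation. The following is only a sketch; the names SECONDS_PER_DAY, cutoff_timestamp, and is_older_than are hypothetical helpers, and the optional now parameter exists purely to make the arithmetic easy to verify.

```python
import time

SECONDS_PER_DAY = 24 * 60 * 60


def cutoff_timestamp(days, now=None):
    """Return the epoch time that lies `days` days before `now`."""
    if now is None:
        now = time.time()
    return now - days * SECONDS_PER_DAY


def is_older_than(timestamp, days, now=None):
    """True if `timestamp` is at or before the cutoff for `days` days."""
    return timestamp <= cutoff_timestamp(days, now)


# a timestamp from 45 days ago is older than 30 days, one from 10 days ago is not
now = time.time()
print(is_older_than(now - 45 * SECONDS_PER_DAY, 30, now))
print(is_older_than(now - 10 * SECONDS_PER_DAY, 30, now))
```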

Let’s see the code in detail.

# importing the required modules
import os
import shutil
import time

# main function
def main():

	# initializing the count
	deleted_folders_count = 0
	deleted_files_count = 0

	# specify the path
	path = "/PATH_TO_DELETE"

	# specify the days
	days = 30

	# converting days to seconds
	# time.time() returns current time in seconds
	seconds = time.time() - (days * 24 * 60 * 60)

	# checking whether the file is present in path or not
	if os.path.exists(path):
		
		# iterating over each and every folder and file in the path
		for root_folder, folders, files in os.walk(path):

			# comparing the days
			if seconds >= get_file_or_folder_age(root_folder):

				# removing the folder
				remove_folder(root_folder)
				deleted_folders_count += 1 # incrementing count

				# breaking after removing the root_folder
				break

			else:

				# checking folder from the root_folder
				for folder in folders:

					# folder path
					folder_path = os.path.join(root_folder, folder)

					# comparing with the days
					if seconds >= get_file_or_folder_age(folder_path):

						# invoking the remove_folder function
						remove_folder(folder_path)
						deleted_folders_count += 1 # incrementing count


				# checking the current directory files
				for file in files:

					# file path
					file_path = os.path.join(root_folder, file)

					# comparing the days
					if seconds >= get_file_or_folder_age(file_path):

						# invoking the remove_file function
						remove_file(file_path)
						deleted_files_count += 1 # incrementing count

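		else:

			# this else belongs to the for loop and runs when the loop
			# finishes without a break; os.walk yields nothing for a file,
			# so a path that points to a single file is handled here
			if os.path.isfile(path) and seconds >= get_file_or_folder_age(path):

				# removing the file
				remove_file(path)
				deleted_files_count += 1 # incrementing count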

	else:

		# file/folder is not found
		print(f'"{path}" is not found')

	print(f"Total folders deleted: {deleted_folders_count}")
	print(f"Total files deleted: {deleted_files_count}")


def remove_folder(path):

	# removing the folder
	# shutil.rmtree returns None and raises OSError on failure
	try:
		shutil.rmtree(path)

		# success message
		print(f"{path} is removed successfully")

	except OSError as error:

		# failure message
		print(f"Unable to delete the {path}: {error}")



def remove_file(path):

	# removing the file
	# os.remove returns None and raises OSError on failure
	try:
		os.remove(path)

		# success message
		print(f"{path} is removed successfully")

	except OSError as error:

		# failure message
		print(f"Unable to delete the {path}: {error}")


def get_file_or_folder_age(path):

	# getting ctime of the file/folder
	# time will be in seconds
	ctime = os.stat(path).st_ctime

	# returning the time
	return ctime


if __name__ == '__main__':
	main()

You need to adjust the following two variables in the above code based on the requirement.

days = 30 
path = "/PATH_TO_DELETE"
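Before pointing a deletion script at a real folder, it can help to list what would be removed. The sketch below is a read-only dry run and deletes nothing. Note that it deliberately compares st_mtime (last modification) instead of st_ctime, on the assumption that modification time is the age you care about; find_old_files is a hypothetical helper name.

```python
import os
import time


def find_old_files(path, days, now=None):
    """Return files under `path` last modified more than `days` days ago."""
    if now is None:
        now = time.time()
    cutoff = now - days * 24 * 60 * 60

    old_files = []
    for root_folder, folders, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root_folder, name)
            try:
                if os.stat(file_path).st_mtime <= cutoff:
                    old_files.append(file_path)
            except OSError:
                # the file vanished or cannot be read; skip it
                continue
    return sorted(old_files)
```

Calling find_old_files("/PATH_TO_DELETE", days=30) and printing the result shows the candidates without touching them.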

Removing files larger than X MB

Let’s search for the files that are larger than a particular size and delete them. It is similar to the above script. In the previous script, we took age as the parameter; now we will take size as the parameter for the deletion. The script below expects the limit in MB, so use 1024 for a 1 GB limit.

# importing the os module
import os

# function that returns size of a file
def get_file_size(path):

	# getting file size in bytes
	size = os.path.getsize(path)

	# returning the size of the file
	return size


# function to delete a file
def remove_file(path):

	# deleting the file
	# os.remove returns None and raises OSError on failure
	try:
		os.remove(path)

		# success
		print(f"{path} is deleted successfully")

	except OSError as error:

		# error
		print(f"Unable to delete the {path}: {error}")


def main():
	# specify the path
	path = "ENTER_PATH_HERE"

	# put max size of file in MBs
	size = 500

	# checking whether the path exists or not
	if os.path.exists(path):

		# converting size to bytes
		size = size * 1024 * 1024

		# traversing through the subfolders
		for root_folder, folders, files in os.walk(path):

			# iterating over the files list
			for file in files:
				
				# getting file path
				file_path = os.path.join(root_folder, file)

				# checking the file size
				if get_file_size(file_path) >= size:
					# invoking the remove_file function
					remove_file(file_path)
			
		else:

			# checking only if the path is file
			if os.path.isfile(path):
				# path is not a dir
				# checking the file directly
				if get_file_size(path) >= size:
					# invoking the remove_file function
					remove_file(path)


	else:

		# path doesn't exist
		print(f"{path} doesn't exist")

if __name__ == '__main__':
	main()

Adjust the following two variables. The size is given in MB.

path = "ENTER_PATH_HERE" 
size = 500
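To pick a sensible size limit, it may be useful to see which files exceed it first. This is a read-only sketch that deletes nothing; find_large_files and BYTES_PER_MB are illustrative names, and the limit is taken in MB to match the size variable.

```python
import os

BYTES_PER_MB = 1024 * 1024


def find_large_files(path, max_size_mb):
    """Return (file path, size in bytes) pairs at or above the limit, largest first."""
    limit = max_size_mb * BYTES_PER_MB

    large = []
    for root_folder, folders, files in os.walk(path):
        for name in files:
            file_path = os.path.join(root_folder, name)
            try:
                size = os.path.getsize(file_path)
            except OSError:
                # the file vanished or cannot be read; skip it
                continue
            if size >= limit:
                large.append((file_path, size))

    return sorted(large, key=lambda item: item[1], reverse=True)
```

For example, find_large_files("ENTER_PATH_HERE", 500) returns the files of 500 MB or more under that path.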

Removing files with a specific extension

There might be a scenario where you want to delete files by their extension, for example .log files. We can find the extension of a file using the os.path.splitext(path) function. It returns a tuple containing the path without the extension and the extension itself, including the leading dot.
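A few quick calls show how os.path.splitext behaves, including the edge cases of double extensions and dotfiles. The has_extension helper is hypothetical, and its case-insensitive comparison is an assumption that may or may not suit your files.

```python
import os


def has_extension(path, extension):
    """Compare the extension of `path` with `extension`, ignoring case."""
    return os.path.splitext(path)[1].lower() == extension.lower()


print(os.path.splitext("/var/log/app.log"))  # ('/var/log/app', '.log')
print(os.path.splitext("backup.tar.gz"))     # ('backup.tar', '.gz')
print(os.path.splitext(".bashrc"))           # ('.bashrc', '')

print(has_extension("server.LOG", ".log"))   # True
```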

# importing os module
import os

# main function
def main():
    
    # specify the path
    path = "PATH_TO_LOOK_FOR"
    
    # specify the extension
    extension = ".log"
    
    # checking whether the path exist or not
    if os.path.exists(path):
        
        # check whether the path is directory or not
        if os.path.isdir(path):
        
            # iterating through the subfolders
            for root_folder, folders, files in os.walk(path):
                
                # checking of the files
                for file in files:

                    # file path
                    file_path = os.path.join(root_folder, file)

                    # extracting the extension from the filename
                    file_extension = os.path.splitext(file_path)[1]

                    # checking the file_extension
                    if extension == file_extension:
                        
                        # deleting the file
                        # os.remove returns None and raises OSError on failure
                        try:
                            os.remove(file_path)

                            # success message
                            print(f"{file_path} deleted successfully")

                        except OSError as error:

                            # failure message
                            print(f"Unable to delete the {file_path}: {error}")
        
        else:
            
            # path is not a directory
            print(f"{path} is not a directory")
    
    else:
        
        # path doesn't exist
        print(f"{path} doesn't exist")

if __name__ == '__main__':
    # invoking main function
    main()

Don’t forget to update the path and extension variables in the above code to meet your requirements.

I would suggest testing the scripts in a non-production environment first. Once you are satisfied with the results, you can schedule them through cron (if using Linux) to run periodically for maintenance work. Python is a great fit for this kind of task.
