Working with files is an essential skill for Python developers. The glob module (short for "global") is part of Python’s standard library and finds all the files whose paths match a specified pattern in a directory tree. It lets you search for and retrieve files in your file system without navigating through directories manually. In this tutorial, we will explore how to use the glob module in Python.
Getting Started with Glob in Python
Glob is a standard Python module, so you don’t need to install anything to start using it. To get started, import it at the top of your Python script with the statement import glob.
Glob patterns are special patterns used to match filenames in a directory tree. They are similar to regular expressions but are much simpler to use. The following are some of the most commonly used globbing patterns:
* – matches any string of characters, including the empty string, within a single file or directory name.
? – matches any single character.
[set] – matches any single character in the specified set of characters. You can also specify ranges using the dash (-) character, for example [0-9].
[!set] – matches any single character that is not in the specified set of characters.
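One way to try these patterns without touching the file system is the standard-library fnmatch module, which applies the same wildcard rules to plain strings. The sketch below is only an illustration: the filenames and the matching() helper are made up for this example.

```python
from fnmatch import fnmatchcase

# Hypothetical filenames, used only to illustrate the patterns.
names = ["data1.csv", "data2.csv", "data10.csv", "dataA.csv", "notes.txt"]


def matching(pattern):
    """Return the names that match a glob-style pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


star = matching("*.csv")                # any run of characters before .csv
question = matching("data?.csv")        # exactly one character after "data"
digit = matching("data[0-9].csv")       # one character from the range 0-9
not_digit = matching("data[!0-9].csv")  # one character that is not a digit

print(star)
print(question)
print(digit)
print(not_digit)
```

Note that data10.csv matches *.csv but not data?.csv, because ? stands for exactly one character.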
Basic Glob usage
Now that you know about globbing patterns, let’s look at how to use them with the glob module. Its main function is glob.glob(), which takes a glob pattern as its argument and returns a list of paths that match the pattern. The module also provides glob.iglob(), which returns an iterator instead of a list, and glob.escape(), which escapes special characters in a path.
import glob

files = glob.glob('/path/to/files/*.txt')
print(files)
In this example, the glob() function returns a list of all the files in the /path/to/files/ directory that have a .txt extension.
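To experiment without depending on a real folder, the following sketch builds a temporary directory and runs the same kind of pattern against it. The file names are assumptions chosen for illustration. It also calls glob.iglob(), a function in the same module that yields matches one at a time instead of building a list.

```python
import glob
import os
import tempfile

# Build a throwaway directory so the example does not depend on real paths.
with tempfile.TemporaryDirectory() as folder:
    for name in ["a.txt", "b.txt", "c.csv", ".hidden.txt"]:
        with open(os.path.join(folder, name), "w") as f:
            f.write("example\n")

    pattern = os.path.join(folder, "*.txt")

    # glob.glob() builds the whole list of matching paths at once.
    txt_files = sorted(os.path.basename(p) for p in glob.glob(pattern))

    # glob.iglob() yields the same matches lazily, one at a time.
    lazy_count = sum(1 for _ in glob.iglob(pattern))

print(txt_files)
print(lazy_count)
```

By default, * does not match names that begin with a dot, so .hidden.txt is left out of both results.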
Sometimes, you may want to search for files not only in a single directory but also in its subdirectories. The glob module makes it easy to search for files recursively using the ** globbing pattern together with the recursive=True argument. With recursive=True, the ** pattern matches any number of directories, including none, so it can be used to search for files in a directory and all its subdirectories.
import glob

files = glob.glob('/path/to/files/**/*.txt', recursive=True)
print(files)
In this example, the glob() function searches for all the files in the /path/to/files/ directory and all its subdirectories that have a .txt extension.
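The effect of the recursive argument is easy to miss, so here is a small sketch that compares the two modes on a temporary directory tree. The folder layout and file names are hypothetical.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical layout: one file at the top level, two in nested folders.
    os.makedirs(os.path.join(root, "sub", "deeper"))
    relative_paths = [
        "top.txt",
        os.path.join("sub", "mid.txt"),
        os.path.join("sub", "deeper", "low.txt"),
    ]
    for rel in relative_paths:
        with open(os.path.join(root, rel), "w") as f:
            f.write("example\n")

    pattern = os.path.join(root, "**", "*.txt")

    # With recursive=True, ** spans zero or more directory levels.
    recursive_hits = glob.glob(pattern, recursive=True)

    # Without it, ** behaves like *, matching exactly one directory level.
    flat_hits = glob.glob(pattern)

    recursive_names = sorted(os.path.basename(p) for p in recursive_hits)
    flat_names = sorted(os.path.basename(p) for p in flat_hits)

print(recursive_names)
print(flat_names)
```

Forgetting recursive=True does not raise an error; it silently returns fewer files, which is why checking the result on a small tree like this can be worthwhile.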
Access multiple CSV files with glob and pandas
Now imagine there is a folder with several CSV files. Let’s access them using glob and combine them into a single Pandas DataFrame.
import glob

import pandas as pd

# Set the path pattern for the folder containing CSV files
path = '/path/to/csv/files/*.csv'

# Use glob to get a list of all CSV files in the folder
files = glob.glob(path)

# Read each file into its own DataFrame
frames = [pd.read_csv(file) for file in files]

# Combine them into a single DataFrame (an empty one if nothing matched)
df = pd.concat(frames, ignore_index=True) if frames else pd.DataFrame()

# Print the combined DataFrame
print(df)
In this example, we first use the glob() function to get a list of all CSV files in the folder. We then read each file into its own DataFrame and combine them with a single call to the concat() function, which is more efficient than concatenating inside the loop. Passing ignore_index=True gives the combined rows a fresh index. If no files match, we fall back to an empty DataFrame, because concat() raises an error when given an empty list. Finally, we print the combined DataFrame to verify that all the data has been loaded.
Note that if the CSV files have different column names or datatypes, you may need to specify additional arguments when reading them into Pandas using pd.read_csv(). For example, arguments such as header, names, or dtype can help ensure that all the data is parsed consistently.
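As a rough sketch of that idea, the example below writes two small CSV files to a temporary folder, reads them with an explicit dtype mapping, and records which file each row came from. The file names, the column names, and the source_file column are all assumptions made for illustration, and the code requires pandas to be installed.

```python
import glob
import os
import tempfile

import pandas as pd

with tempfile.TemporaryDirectory() as folder:
    # Two small hypothetical CSV files with the same columns.
    with open(os.path.join(folder, "jan.csv"), "w") as f:
        f.write("id,amount\n1,10.5\n2,20.0\n")
    with open(os.path.join(folder, "feb.csv"), "w") as f:
        f.write("id,amount\n3,7.25\n")

    frames = []
    for path in sorted(glob.glob(os.path.join(folder, "*.csv"))):
        # dtype pins the column types so every file is parsed the same way.
        frame = pd.read_csv(path, dtype={"id": "int64", "amount": "float64"})
        # Keep track of where each row came from (hypothetical extra column).
        frame["source_file"] = os.path.basename(path)
        frames.append(frame)

    combined = pd.concat(frames, ignore_index=True)

print(combined)
```

Sorting the paths before reading makes the row order of the combined DataFrame reproducible from one run to the next.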
Access multiple text files with glob
Reading text files in a folder using Glob in Python is very similar to reading CSV files, as shown in the previous example. Here’s an example of how you can use Glob to access all text files in a folder and read their contents:
import glob

# Set path to the folder containing text files
path = '/path/to/text/files/*.txt'

# Use glob to get a list of all text files in the folder
files = glob.glob(path)

# Loop through the files and read their contents
for file in files:
    with open(file, 'r') as f:
        contents = f.read()
    print(contents)
In this example, we first use the glob() function to get a list of all text files in the folder. We then loop through the list of files and use the open() function to read the contents of each file. The with statement is used to automatically close the file when we’re done reading from it.
Note that the open() function is used with the mode 'r' to open the file in read-only text mode. You can also use the readlines() method instead of read() to read the file contents into a list of lines. Additionally, the default encoding used by open() depends on your platform and is not always UTF-8, so it is safer to specify the encoding of the file explicitly by passing the encoding parameter to the open() function.
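The sketch below combines both suggestions: it opens each matched file with an explicit encoding and reads it with readlines(). The sample file and its contents are made up for illustration, and UTF-8 is assumed because the example writes the file itself.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as folder:
    # Create a hypothetical UTF-8 text file containing non-ASCII characters.
    sample = os.path.join(folder, "sample.txt")
    with open(sample, "w", encoding="utf-8") as f:
        f.write("caf\u00e9\nna\u00efve\n")

    lines_by_file = {}
    for path in glob.glob(os.path.join(folder, "*.txt")):
        # Passing encoding explicitly avoids relying on the platform default.
        with open(path, "r", encoding="utf-8") as f:
            lines_by_file[os.path.basename(path)] = f.readlines()

# Each entry maps a file name to its list of lines (newlines included).
print({name: len(lines) for name, lines in lines_by_file.items()})
```

readlines() keeps the trailing newline on each line, so you may want to call strip() or rstrip() before further processing.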
The glob() function does not guarantee any particular order for the file paths it returns; the order depends on the operating system and file system. You can sort the list of file paths returned by glob() using the sorted() function if you need to process them in a specific order.
Accessing and sorting files using glob
You can sort the files returned by glob() in various ways depending on your requirements. Here are some examples:
1. Sort files by name: You can sort the files alphabetically by name using the sorted() function. This step is needed because glob() itself returns paths in no guaranteed order. For example:
import glob

# Get a list of all text files in the folder sorted by name
files = sorted(glob.glob('/path/to/text/files/*.txt'))
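2. Sort files by creation time: You can sort the files based on the value returned by the os.path.getctime() function. Note that this is the creation time on Windows, but on Unix systems it is the time of the last metadata change. For example:
import glob
import os

# Get a list of all text files in the folder sorted by ctime
# (creation time on Windows, last metadata change on Unix)
files = sorted(glob.glob('/path/to/text/files/*.txt'), key=os.path.getctime)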
3. Sort files by modification time: You can sort the files based on their modification time using the os.path.getmtime() function. For example:
import glob
import os

# Get a list of all text files in the folder sorted by modification time
files = sorted(glob.glob('/path/to/text/files/*.txt'), key=os.path.getmtime)
4. Sort files by size: You can sort the files based on their size using the os.path.getsize() function. For example:
import glob
import os

# Get a list of all text files in the folder sorted by size
files = sorted(glob.glob('/path/to/text/files/*.txt'), key=os.path.getsize)
Note that in all the above examples, we first use glob() to get a list of all text files in the folder, and then we use the sorted() function to sort the list based on a specific sorting key. The key parameter is set to a function, passed without parentheses, that sorted() calls on each path to get the value used for sorting. For example, os.path.getmtime() returns the last modification time of a file as a number of seconds, so the oldest files come first.
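To see how different keys change the result, the following sketch creates three temporary files with known sizes and modification times and sorts them in several ways. The file names, sizes, and timestamps are arbitrary values chosen for illustration; os.utime() is used only to make the modification times predictable.

```python
import glob
import os
import tempfile

BASE_TIME = 1_000_000_000  # arbitrary reference timestamp, in seconds

with tempfile.TemporaryDirectory() as folder:
    # Hypothetical files: name -> (size in bytes, offset added to BASE_TIME).
    specs = {"small.txt": (1, 300), "large.txt": (50, 100), "medium.txt": (10, 200)}
    for name, (size, offset) in specs.items():
        path = os.path.join(folder, name)
        with open(path, "wb") as f:
            f.write(b"x" * size)
        # Set the access and modification times explicitly.
        os.utime(path, (BASE_TIME + offset, BASE_TIME + offset))

    files = glob.glob(os.path.join(folder, "*.txt"))

    # Smallest file first.
    by_size = [os.path.basename(p) for p in sorted(files, key=os.path.getsize)]
    # Oldest modification time first.
    by_mtime = [os.path.basename(p) for p in sorted(files, key=os.path.getmtime)]
    # reverse=True flips the order: largest file first.
    largest_first = [
        os.path.basename(p)
        for p in sorted(files, key=os.path.getsize, reverse=True)
    ]

print(by_size)
print(by_mtime)
print(largest_first)
```

The same approach works with any function that takes a path and returns a comparable value, so you can also pass your own function or a lambda as the key.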
The glob module in Python is a powerful tool that can save you a lot of time when searching for files in a directory tree. With its easy-to-use globbing patterns and recursive searching capabilities, it’s a great tool for any Python developer to have in their arsenal.