Create a new project

We will start by creating a function that creates a folder for each new website. The function will go in a file called general.py, which will contain all the shared functions used in our program.

Inside each project folder, we will keep two files:

  • queue.txt – the list of links that need to be crawled.
  • crawled.txt – the list of links that have already been crawled.

Together, these two files ensure that we do not crawl the same web page more than once. To create a directory, we will use the Python function os.makedirs(), which also creates any missing parent directories. Here is the code you need to write in the general.py file:

import os  # imports the module for working with the operating system and file system

def create_project_dir(directory):  # creates a folder for each new website
    if not os.path.exists(directory):  # creates the folder only if it doesn't already exist
        print('Creating project ' + directory)
        os.makedirs(directory)
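The following sketch calls create_project_dir and shows one possible way to create the two files described above. The helpers write_file and create_data_files, and the seed URL https://example.com/, are hypothetical illustrations that are not part of the tutorial's code. The project folder is placed in a temporary directory, so running the example leaves nothing behind.

```python
import os
import tempfile


def create_project_dir(directory):  # same helper as in general.py
    """Create the project folder only if it does not already exist."""
    if not os.path.exists(directory):
        print('Creating project ' + directory)
        os.makedirs(directory)


def write_file(path, data):  # hypothetical helper, for illustration only
    """Create (or overwrite) a text file containing data."""
    with open(path, 'w') as f:
        f.write(data)


def create_data_files(project_name, base_url):  # hypothetical helper
    """Create queue.txt (seeded with base_url) and an empty crawled.txt,
    skipping any file that already exists."""
    queue = os.path.join(project_name, 'queue.txt')
    crawled = os.path.join(project_name, 'crawled.txt')
    if not os.path.isfile(queue):
        write_file(queue, base_url + '\n')
    if not os.path.isfile(crawled):
        write_file(crawled, '')


if __name__ == '__main__':
    # Work inside a temporary directory so the example cleans up after itself.
    with tempfile.TemporaryDirectory() as tmp:
        project = os.path.join(tmp, 'example_site')
        create_project_dir(project)  # prints "Creating project ..."
        create_project_dir(project)  # prints nothing: the folder already exists
        create_data_files(project, 'https://example.com/')
        print(sorted(os.listdir(project)))  # ['crawled.txt', 'queue.txt']
```

On Python 3.2 and later, os.makedirs(directory, exist_ok=True) is a compact alternative to the separate os.path.exists() check: it does not raise an error if the folder already exists, and it avoids the small gap between checking for the folder and creating it.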
Geek University 2022