Give the crawler information
We will now write the code that gives the spider some basic information, such as the project name and the base URL. The project name and the base URL are the only values the user provides. Add the following code to the spider.py file, inside the Spider class:
    def __init__(self, project_name, base_url, domain_name):
        # Set the values on the class itself, so that all spiders share the same information
        Spider.project_name = project_name
        Spider.base_url = base_url
        Spider.domain_name = domain_name
        # Define the paths for the queue file and the crawled file
        Spider.queue_file = Spider.project_name + '/queue.txt'
        Spider.crawled_file = Spider.project_name + '/crawled.txt'
        # Create the project directory and the data files
        self.boot()
        # Start the page crawling and print the message to the user
        self.crawl_page('First spider', Spider.base_url)
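To see why the constructor assigns to Spider.project_name rather than self.project_name, here is a minimal, self-contained sketch. It is not the finished spider: the boot and crawl_page bodies are placeholders that only print a message, and the project name, URL, and domain are made-up example values.

```python
class Spider:
    # Class variables: one copy, shared by every Spider instance
    project_name = ''
    base_url = ''
    domain_name = ''
    queue_file = ''
    crawled_file = ''

    def __init__(self, project_name, base_url, domain_name):
        Spider.project_name = project_name
        Spider.base_url = base_url
        Spider.domain_name = domain_name
        Spider.queue_file = Spider.project_name + '/queue.txt'
        Spider.crawled_file = Spider.project_name + '/crawled.txt'
        self.boot()
        self.crawl_page('First spider', Spider.base_url)

    def boot(self):
        # Placeholder (assumption): the real method creates the
        # project directory and the data files
        print('Booting project ' + Spider.project_name)

    def crawl_page(self, thread_name, page_url):
        # Placeholder (assumption): the real method crawls the page
        print(thread_name + ' now crawling ' + page_url)


# Example values, chosen only for illustration
first = Spider('example_project', 'https://example.com/', 'example.com')

# The values live on the class, and instances read through to them
print(Spider.queue_file)
print(first.crawled_file)
```

Because the values are stored on the class, any spider created later reads the same project name and file paths without being handed them again.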