Add links to queue
After gathering the links from a webpage, we need to add them to the queue so that they can be crawled as well. In this section we write two static methods for the Spider class: add_links_to_queue(), which adds new links to the queue, and update_files(), which saves the queue and the crawled set to their files.
Add the following code to spider.py, inside the Spider class:
```python
    # get_domain_name() and set_to_file() must be available in spider.py
    # (imported or defined at module level) for these methods to run.

    @staticmethod
    def add_links_to_queue(links):
        # Take a set of links and add the new ones to the waiting list (the queue).
        for url in links:
            # Skip links that are already in the queue or have already been crawled.
            if (url in Spider.queue) or (url in Spider.crawled):
                continue
            # Skip links whose domain name differs from the target site's domain name.
            # This keeps the crawler on the targeted website instead of following
            # external links found on its pages.
            if Spider.domain_name != get_domain_name(url):
                continue
            Spider.queue.add(url)  # add the link to the waiting list

    @staticmethod
    def update_files():
        # Write the current queue and crawled sets to their files.
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)
```
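To try the filtering logic on its own, outside the full crawler, here is a minimal self-contained sketch. It assumes hypothetical, simplified stand-ins for the project's helpers: get_domain_name() returns the URL's host via urllib.parse, and set_to_file() writes one URL per line. The domain example.com and the temporary file paths are also illustrative assumptions; the project's real helpers may normalise domains and write files differently.

```python
import os
import tempfile
from urllib.parse import urlparse


def get_domain_name(url):
    # Hypothetical stand-in: returns the host part of a URL, e.g. "example.com".
    # The project's real helper may normalise domains differently.
    return urlparse(url).netloc


def set_to_file(links, file_name):
    # Hypothetical stand-in: writes one URL per line, sorted for stable output.
    with open(file_name, "w") as f:
        for link in sorted(links):
            f.write(link + "\n")


class Spider:
    # Shared class-level state, mirroring the attributes the methods rely on.
    domain_name = "example.com"  # assumed target domain for this demo
    queue = set()
    crawled = set()
    queue_file = ""
    crawled_file = ""

    @staticmethod
    def add_links_to_queue(links):
        for url in links:
            # Skip links that are already waiting or already crawled.
            if url in Spider.queue or url in Spider.crawled:
                continue
            # Skip external links whose domain differs from the target site.
            if Spider.domain_name != get_domain_name(url):
                continue
            Spider.queue.add(url)

    @staticmethod
    def update_files():
        # Write the current queue and crawled sets to their files.
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    Spider.queue_file = os.path.join(workdir, "queue.txt")
    Spider.crawled_file = os.path.join(workdir, "crawled.txt")
    Spider.crawled.add("https://example.com/")
    Spider.add_links_to_queue({
        "https://example.com/",       # already crawled: skipped
        "https://example.com/about",  # same domain, not seen yet: queued
        "https://other.org/page",     # external domain: skipped
    })
    Spider.update_files()
    print(sorted(Spider.queue))  # ['https://example.com/about']
```

Running it queues only https://example.com/about: the root URL is skipped because it has already been crawled, and the other.org link is skipped as external. Because queue and crawled are Python sets, the membership checks are fast on average and a URL can never appear in the queue twice.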