Add links to queue
After we gather the links from a webpage, we need to add them to the queue so they can be crawled as well. In this section we will write a function that adds new links to the queue, skipping links that are already queued or crawled and links that point to other websites.
Add the following code to the spider.py file, inside the Spider class:
    @staticmethod
    def add_links_to_queue(links):  # takes a set of links and adds the new ones to the waiting list
        for url in links:  # loops through the set
            if (url in Spider.queue) or (url in Spider.crawled):  # skips links that are already in the waiting or the crawled list
                continue
            if Spider.domain_name != get_domain_name(url):  # skips links whose domain name differs from the target website's
                # This ensures that the crawler crawls only pages on the targeted website, and not the external links present on it.
                continue
            Spider.queue.add(url)  # adds the link to the waiting list

    @staticmethod
    def update_files():  # saves the waiting and crawled sets to their files
        set_to_file(Spider.queue, Spider.queue_file)
        set_to_file(Spider.crawled, Spider.crawled_file)
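
To see the filtering in action without the rest of the crawler, we can try a minimal, self-contained sketch. The get_domain_name function here is only a hypothetical stand-in (it keeps the last two labels of the host name) and may differ from the one used elsewhere in this project, and the example.com and other.org URLs are made up for illustration. The file-writing part (update_files) is left out, since it depends on set_to_file.

```python
from urllib.parse import urlparse


def get_domain_name(url):
    # Hypothetical stand-in: keep the last two labels of the host name,
    # e.g. "blog.example.com" -> "example.com".
    # This simple rule does not handle suffixes such as "co.uk".
    host = urlparse(url).netloc
    parts = host.split('.')
    return '.'.join(parts[-2:]) if len(parts) >= 2 else host


class Spider:
    # Class-level state shared by all spiders (assumed values for the demo)
    domain_name = 'example.com'
    queue = set()
    crawled = set()

    @staticmethod
    def add_links_to_queue(links):
        for url in links:
            # Skip links we already know about
            if url in Spider.queue or url in Spider.crawled:
                continue
            # Skip links that point outside the target website
            if Spider.domain_name != get_domain_name(url):
                continue
            Spider.queue.add(url)


Spider.crawled.add('https://example.com/')
Spider.add_links_to_queue({
    'https://example.com/',           # already crawled, skipped
    'https://example.com/about',      # new internal link, queued
    'https://blog.example.com/post',  # same domain name, queued
    'https://other.org/page',         # external link, skipped
})
print(sorted(Spider.queue))
```

Under these assumptions, only the two new links on example.com end up in the queue. Because the queue and crawled lists are sets, the membership checks stay fast even when the crawler has collected many links.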