Channel: Web Crawler not following the links - Stack Overflow

Web Crawler not following the links


I want to crawl a news site using Scrapy. My code retrieves the matching news from the current page, but it does not follow the next-page links. The site's next-page link looks like this:

[screenshot: HTML of the next-page link]

Here is my code:

    import scrapy

    class fakenews(scrapy.Spider):
        name = "bb8"
        allowed_domains = ["snopes.com"]
        start_urls = ["https://www.snopes.com/fact-check/category/science/"]
        custom_settings = {'FEED_URI': "fakenews_%(time)s.csv", 'FEED_FORMAT': 'csv'}

        def parse(self, response):
            name1 = input(" Please enter input :     ")
            name1 = name1.lower()
            links = response.xpath("//div[@class='media-list']/article/a/@href").extract()
            headers = response.xpath('//div[@class="media-body"]/h5/text()').extract()
            headers1 = [c.strip().lower() for c in headers]
            raw_data = zip(headers1, links)
            for header, link in raw_data:
                p = header
                l = link
                if name1 in p:
                    scrap_info3 = {'page': response.url, 'title': header, 'link': l}
                    yield scrap_info3
                    next_page = response.css("//a[@class='btn-next btn']/@href").get()
                    if next_page is not None:
                        next_page = response.urljoin(next_page)
                        yield scrapy.Request(next_page, callback=self.parse)

It returns results from the current page, but it also shows an error:

[screenshot: error message]

For the input, I entered: NASA

