I want to crawl a news site using Scrapy. My code retrieves the relevant news items from the current page, but it does not follow the next-page links. On the site, the next-page link is the `<a>` element with class `btn-next btn`, which is the element my code targets.
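To sanity-check what that link resolves to independently of Scrapy, here is a small stdlib-only sketch. It uses `html.parser` as a stand-in for Scrapy's selectors (this is not what Scrapy uses internally), looks for an `<a>` whose `class` attribute is exactly `btn-next btn`, and resolves its `href` against the current page URL. The sample markup and the relative `page/2/` href are hypothetical; the real page's HTML may differ.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class NextLinkFinder(HTMLParser):
    """Records the href of the first <a> whose class is exactly 'btn-next btn'."""

    def __init__(self):
        super().__init__()
        self.next_href = None

    def handle_starttag(self, tag, attrs):
        if tag != "a" or self.next_href is not None:
            return
        attr_map = dict(attrs)
        # Exact string comparison, mirroring the XPath test @class='btn-next btn'
        if attr_map.get("class") == "btn-next btn":
            self.next_href = attr_map.get("href")


def find_next_page(html, page_url):
    """Return the absolute next-page URL, or None if no matching link exists."""
    finder = NextLinkFinder()
    finder.feed(html)
    if finder.next_href is None:
        return None
    # A relative href must be resolved against the current page URL,
    # roughly what response.urljoin() does inside a Scrapy callback.
    return urljoin(page_url, finder.next_href)


# Hypothetical markup for illustration only.
sample = '<nav><a class="btn-next btn" href="page/2/">Next</a></nav>'
print(find_next_page(sample, "https://www.snopes.com/fact-check/category/science/"))
```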
Here is the code I am using:
```python
import scrapy

class fakenews(scrapy.Spider):
    name = "bb8"
    allowed_domains = ["snopes.com"]
    start_urls = ["https://www.snopes.com/fact-check/category/science/"]
    custom_settings = {'FEED_URI': "fakenews_%(time)s.csv", 'FEED_FORMAT': 'csv'}

    def parse(self, response):
        name1 = input(" Please enter input : ")
        name1 = name1.lower()
        links = response.xpath("//div[@class='media-list']/article/a/@href").extract()
        headers = response.xpath('//div[@class="media-body"]/h5/text()').extract()
        headers1 = [c.strip().lower() for c in headers]
        raw_data = zip(headers1, links)
        for header, link in raw_data:
            p = header
            l = link
            if name1 in p:
                scrap_info3 = {'page': response.url, 'title': header, 'link': l}
                yield scrap_info3

        next_page = response.css("//a[@class='btn-next btn']/@href").get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
```
It returns results from the current page, but it also shows an error.
When prompted for input, I entered: NASA