
Answer by vezunchik for Web Crawler not following the links

The main error is that you pass an XPath expression to the css() method when selecting next_page:

next_page = response.css("//a[@class='btn-next btn']/@href").get()

The next problem is that you yield the next-page request inside the for loop. This produces many duplicate requests for the same URL.
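The effect of where the yield sits can be shown without Scrapy at all. The sketch below uses two hypothetical stand-in generators (not the spider from the question) that emit plain tuples instead of real items and requests, just to count how many next-page requests each variant produces.

```python
def parse_yield_inside_loop(items, next_page):
    # The next-page request is yielded once per item.
    for item in items:
        yield ("item", item)
        if next_page:
            yield ("request", next_page)


def parse_yield_after_loop(items, next_page):
    # The next-page request is yielded once per page.
    for item in items:
        yield ("item", item)
    if next_page:
        yield ("request", next_page)


items = ["a", "b", "c"]  # made-up items for the illustration

requests_inside = [x for x in parse_yield_inside_loop(items, "/page/2")
                   if x[0] == "request"]
requests_after = [x for x in parse_yield_after_loop(items, "/page/2")
                  if x[0] == "request"]

print(len(requests_inside), len(requests_after))
```

Scrapy's default duplicate filter would normally drop the repeated requests, but they are still created and checked for every item, so it is cleaner to yield the request once after the loop.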

So I suggest these changes:

def parse(self, response):
    name1 = input(" Please enter input :     ")
    name1 = name1.lower()
    links = response.xpath("//div[@class='media-list']/article/a/@href").extract()
    headers = response.xpath('//div[@class="media-body"]/h5/text()').extract()
    headers1 = [c.strip().lower() for c in headers]
    # my changes start here:
    raw_data = zip(headers1, links)
    # use fewer variables in the loop (just cosmetic, but the code is more readable)
    for header, link in raw_data:
        if name1 in header:
            yield {'page': response.url, 'title': header, 'link': link}
    # use a proper CSS selector here
    next_page = response.css("a.btn-next::attr(href)").get()
    # move this whole block out of the for loop
    if next_page:
        yield response.follow(next_page)
