[Python3.6.*] Installing and Using Scrapy

Xiaowei Liu's Blog

2017-04-10

Python, Scrapy, Spider

Don’t use the python-scrapy package provided by Ubuntu, they are typically too old and slow to catch up with latest Scrapy.

The environment used to install Scrapy in this post:
OS: Ubuntu 16.04 Desktop, 64-bit
Python version: 3.6.1

Installation

1. Installing directly as described in the Scrapy installation guide (pip install Scrapy) and then running quotesSpider.py produced the following error:

Could not find a version that satisfies the requirement Twisted>=13.1.0 (from scrapy) (from versions: )
No matching distribution found for Twisted>=13.1.0 (from scrapy)

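A search online showed the cause: libbz2-dev was not installed before Python3.6.* was compiled and installed, so the resulting interpreter was built without the bz2 module.
A detailed explanation and solution can be found in the discussion listed under References. In short: install libbz2-dev (sudo apt-get install libbz2-dev), then recompile and reinstall Python3.6.*.
2. After completing step 1, running quotesSpider.py produced another error: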

ImportError: No module named _sqlite3

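A search online showed that this is again a problem with the Python3.6.* build: libsqlite3-dev has to be installed first (sudo apt-get install libsqlite3-dev), and then Python3.6.* must be recompiled and reinstalled.

It is worth installing the following packages at the same time: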

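$ sudo apt-get install libreadline6-dev    # otherwise the arrow keys do not work in the interactive interpreter
$ sudo apt-get install openssl    # otherwise pip cannot use HTTPS sources
$ sudo apt-get install libssl-dev    # otherwise pip cannot use HTTPS sources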

Once all of these packages are installed, recompile and reinstall Python3.6.* a single time (each build takes quite a while).

$ ./configure --enable-optimizations
$ make
$ sudo make install
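
Since each rebuild is slow, it may help to confirm afterwards that the modules in question were actually compiled in. The following is only a minimal sketch: the module list (bz2, sqlite3, readline, ssl) is my assumption about which standard-library modules correspond to the -dev packages above, and the helper name missing_modules is made up for illustration.

```python
import importlib

# Assumed mapping to the packages above:
#   libbz2-dev -> bz2, libsqlite3-dev -> sqlite3,
#   libreadline6-dev -> readline, libssl-dev -> ssl
OPTIONAL_MODULES = ["bz2", "sqlite3", "readline", "ssl"]


def missing_modules(names):
    """Return the names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Raised when the module was skipped while building Python.
            missing.append(name)
    return missing


if __name__ == "__main__":
    missing = missing_modules(OPTIONAL_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All optional modules are available.")
```

Run it with the freshly built interpreter (for example python3.6). If it reports a missing module, the corresponding -dev package was probably not installed before the build.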

Example code (`quotesSpider.py`)

With the two problems above fixed, quotesSpider.py runs correctly. The full source code is below.

#!/usr/bin/env python3
# coding: utf-8
# File: quotesSpider.py
# Author: lxw
# Date: 4/15/17 9:22 PM

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/tag/humor/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").extract_first(),
                "author": quote.css("span small.author::text").extract_first(),
            }
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
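
The pagination step in parse depends on response.urljoin turning the href of the "next" link into an absolute URL. As a rough illustration, for a page without a <base> tag this should behave like urllib.parse.urljoin from the standard library. The href values below are assumed examples, not values copied from the site, and resolve_next is a hypothetical helper name.

```python
from urllib.parse import urljoin


def resolve_next(page_url, href):
    """Resolve a (possibly relative) href against the URL of the current page."""
    return urljoin(page_url, href)


page_url = "http://quotes.toscrape.com/tag/humor/"

# Assumed href values, for illustration only.
absolute_path_href = "/tag/humor/page/2/"   # starts at the site root
relative_href = "page/2/"                   # relative to the current page

print(resolve_next(page_url, absolute_path_href))
print(resolve_next(page_url, relative_href))
```

Both calls resolve to the same absolute URL, which is why the spider can pass the joined value straight to scrapy.Request.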

Run:

(py3.6.1scrapy1.3.3) lxw@lxw(00:05:04)scrapyDemo$ scrapy runspider quotesSpider.py -o quotes.json
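
The quotes.json file written by -o should be a JSON array of the dicts yielded by parse, each with "text" and "author" keys. A minimal sketch of a quick sanity check on that output follows. The sample data is invented for illustration, and quotes_per_author is a hypothetical helper name.

```python
import json
from collections import Counter


def quotes_per_author(json_text):
    """Count scraped items per author in a JSON array of quote dicts."""
    items = json.loads(json_text)
    return Counter(item["author"] for item in items)


# Invented sample in the same shape as the spider's output.
sample = """
[
  {"text": "first quote", "author": "Author A"},
  {"text": "second quote", "author": "Author B"},
  {"text": "third quote", "author": "Author A"}
]
"""

counts = quotes_per_author(sample)
for author, count in counts.most_common():
    print(author, count)
```

To check a real run, read quotes.json with open("quotes.json", encoding="utf-8") and pass its contents to the function in place of the sample string.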

References

How Can I install Twisted + Scrapy on Python3.6 and CentOs
Installation guide
