[Python3.6.*] Installing and Using Scrapy

Don’t use the python-scrapy package provided by Ubuntu; it is typically too old and slow to catch up with the latest Scrapy.

The environment used in this article for installing Scrapy is as follows:
OS: Ubuntu 16.04 Desktop 64bits
Python Version: 3.6.1

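Installation

1. After installing Scrapy directly as described in the Scrapy installation guide (pip install Scrapy) and running quotesSpider.py, the following error message appeared: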

Could not find a version that satisfies the requirement Twisted>=13.1.0 (from scrapy) (from versions: )
No matching distribution found for Twisted>=13.1.0 (from scrapy)

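A search online showed that the cause was that libbz2-dev had not been installed before Python 3.6.* was compiled and installed from source.
The detailed explanation and solution are in the discussion listed under References. In short: install libbz2-dev (sudo apt-get install libbz2-dev), then recompile and reinstall Python 3.6.*.

2. After completing step 1, running quotesSpider.py produced another error: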

ImportError: No module named _sqlite3

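A search online showed that this is again a problem with the Python 3.6.* build: libsqlite3-dev has to be installed first (sudo apt-get install libsqlite3-dev), and then Python 3.6.* has to be recompiled and reinstalled.

It is recommended to install the following packages at the same time: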

$ sudo apt-get install libreadline6-dev    # without it, the arrow keys do not work
$ sudo apt-get install openssl    # without it, pip cannot use HTTPS sources
$ sudo apt-get install libssl-dev    # without it, pip cannot use HTTPS sources

After installing all of the packages above, recompile and reinstall Python 3.6.* once for all of them (each build takes quite a long time).

$ ./configure --enable-optimizations
$ make
$ sudo make install
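
Since each rebuild is slow, it may be worth checking that the freshly built interpreter actually picked up the optional modules before moving on. The following is a minimal sketch, not part of the original setup: the helper name missing_modules and the module-to-package table are illustrative, with the table based on the packages mentioned above (the ssl entry assumes that the "pip cannot use HTTPS" symptom comes from a missing ssl module).

```python
import importlib

# Standard-library modules that are skipped at build time when the
# matching -dev package is absent (mapping assumed from the steps above).
OPTIONAL_MODULES = {
    "bz2": "libbz2-dev",
    "sqlite3": "libsqlite3-dev",
    "ssl": "libssl-dev",
    "readline": "libreadline6-dev",
}


def missing_modules(modules=OPTIONAL_MODULES):
    """Return {module: apt package} for every module that fails to import."""
    missing = {}
    for name, package in modules.items():
        try:
            importlib.import_module(name)
        except ImportError:
            missing[name] = package
    return missing


if __name__ == "__main__":
    # Run this with the newly installed python3.6 binary.
    for name, package in missing_modules().items():
        print(f"{name} is missing: install {package}, then rebuild Python")
```

If the script prints nothing, all four modules imported successfully in the interpreter that ran it.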

Sample code (quotesSpider.py)

Once the two problems above are solved, quotesSpider.py runs correctly. The full source code is as follows.

#!/usr/bin/env python3
# coding: utf-8
# File: quotesSpider.py
# Author: lxw
# Date: 4/15/17 9:22 PM

import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["http://quotes.toscrape.com/tag/humor/"]

    def parse(self, response):
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").extract_first(),
                "author": quote.css("span small.author::text").extract_first(),
            }
        next_page = response.css('li.next a::attr("href")').extract_first()
        if next_page:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
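
The pagination step in parse() relies on response.urljoin to turn the href of the "next" link into an absolute URL. As a rough illustration of that resolution, the sketch below uses the standard library's urllib.parse.urljoin, which behaves similarly for simple cases. The helper name next_page_url and the example href values are assumptions for illustration, not values taken from a real crawl.

```python
from urllib.parse import urljoin


def next_page_url(page_url, href):
    """Resolve a (possibly relative) href against the URL of the current page."""
    return urljoin(page_url, href)


if __name__ == "__main__":
    base = "http://quotes.toscrape.com/tag/humor/"
    # A root-relative href (assumed shape of the site's "next" link).
    print(next_page_url(base, "/tag/humor/page/2/"))
    # An href that is already absolute is returned unchanged.
    print(next_page_url(base, "http://quotes.toscrape.com/page/3/"))
```

Because the resolved URL is absolute, it can be passed straight to scrapy.Request, and the same parse callback handles every following page until no "next" link is found.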

Running

(py3.6.1scrapy1.3.3) lxw@lxw(00:05:04)scrapyDemo$ scrapy runspider quotesSpider.py -o quotes.json
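
The -o quotes.json option writes the scraped items to a JSON file, which should contain a list of objects with the "text" and "author" keys yielded by the spider. The following is a small sketch of how that file could be inspected afterwards. The function names load_items and count_by_author are hypothetical helpers, and the file layout is assumed from the spider code rather than copied from an actual run.

```python
import json
from collections import Counter


def load_items(path="quotes.json"):
    """Load the list of scraped items written by `-o quotes.json`."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def count_by_author(items):
    """Count how many scraped quotes belong to each author."""
    return Counter(item["author"] for item in items)


def summarize(path="quotes.json"):
    """Return one line per author, most-quoted first."""
    counts = count_by_author(load_items(path))
    return [f"{n:3d}  {author}" for author, n in counts.most_common()]
```

After a crawl, print("\n".join(summarize())) would list the authors found. Note that in this Scrapy version -o appends to an existing file, so running the spider twice without deleting quotes.json first is likely to leave a file that json.load cannot parse.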

References

How Can I install Twisted + Scrapy on Python3.6 and CentOs
Installation guide