Python Crawlers: Connecting to MySQL

ISherry, published 2019-07-31 10:02 / 1255 reads

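Abstract: Preparation (log in to the local database server, install pymysql, create the table), then connect to the database with pymysql. Operating MySQL from Python is fairly simple, and with a little database background you can get started right away. Don't forget to commit at the end, otherwise the data is never written to the database. The full example scrapes the hottest news titles on Baidu and stores them in the database (I was too lazy to write comments).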

Preparation

Log in to the local database server (the MySQL service must already be running)

    mysql -u root -p

Install pymysql

    pip install pymysql

Create the table

CREATE DATABASE crawls;
-- SHOW DATABASES;
USE crawls;

CREATE TABLE IF NOT EXISTS baiduNews(
    id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,
    ranking VARCHAR(30),
    title VARCHAR(60),
    datetime TIMESTAMP,
    hot VARCHAR(30));
-- SHOW TABLES;
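The column widths above put a limit on what the crawler can store: with MySQL's strict SQL mode enabled, a title longer than 60 characters is rejected with an error, and without it the value is cut off with a warning. A minimal sketch of preparing a row on the Python side, assuming each scraped item is a dict with "index", "title" and "hot" keys (the helper name to_row and the MAX_LEN table are made up for illustration):

```python
from datetime import datetime

# Column widths copied from the CREATE TABLE statement.
MAX_LEN = {"ranking": 30, "title": 60, "hot": 30}


def to_row(item, now=None):
    """Turn a scraped item into a (ranking, title, datetime, hot) tuple.

    `item` is assumed to be a dict with "index", "title" and "hot" keys;
    that shape is an assumption for this sketch, not a fixed contract.
    """
    now = now or datetime.now()
    return (
        str(item["index"])[:MAX_LEN["ranking"]],  # fits VARCHAR(30)
        str(item["title"])[:MAX_LEN["title"]],    # fits VARCHAR(60)
        now.strftime("%Y-%m-%d %H:%M:%S"),        # format accepted by TIMESTAMP
        str(item["hot"])[:MAX_LEN["hot"]],        # fits VARCHAR(30)
    )
```

Silently truncating is only one option. If losing part of a title is not acceptable, widening the column is probably the better fix.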

Connecting to the database with pymysql

import pymysql

db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456",
                     db="crawls", charset="utf8")
cursor = db.cursor()
cursor.execute(sql_query)  # sql_query: the SQL statement to run, as a string
db.commit()                # persist the changes

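Operating MySQL from Python is fairly simple, and with a little database background you can get started right away. Whatever you do, don't forget to call commit at the end: pymysql does not autocommit by default, so without it the data only sits in an uncommitted transaction and never reaches the database.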
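The commit step can be sketched as a small helper that either commits the whole batch or rolls it back. This is only an illustration: save_rows and INSERT_SQL are hypothetical names, the table is the baiduNews table created earlier, and conn is assumed to be an open connection such as the one returned by pymysql.connect().

```python
INSERT_SQL = (
    "INSERT INTO baiduNews(ranking, title, datetime, hot) "
    "VALUES (%s, %s, %s, %s)"
)


def save_rows(conn, rows):
    """Insert rows in one transaction; commit on success, roll back on error.

    `conn` is assumed to be an open DB-API connection (for example from
    pymysql.connect()); `rows` is a list of 4-tuples matching INSERT_SQL.
    """
    cursor = conn.cursor()
    try:
        # The driver fills in the %s placeholders and escapes each value.
        cursor.executemany(INSERT_SQL, rows)
        conn.commit()    # without this, nothing is persisted
    except Exception:
        conn.rollback()  # discard the partial batch
        raise
    finally:
        cursor.close()
    return len(rows)
```

With a real connection this would be called roughly as save_rows(db, rows), where db is the connection from the snippet above. Passing the values separately from the SQL text also avoids the quoting problems that come with building the statement by string formatting.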

Full example

Scrape the hottest news titles on Baidu and store them in the database. I was too lazy to write comments -_- (make sure the local MySQL server is running)

"""
Get the hottest news title on baidu page,
then save these data into mysql
"""
import datetime

import pymysql
from pyquery import PyQuery as pq
import requests
from requests.exceptions import ConnectionError

URL = "https://www.baidu.com/s?wd=%E7%83%AD%E7%82%B9"
headers = {
    "User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/70.0.3538.77 Safari/537.36",
    "Upgrade-Insecure-Requests": "1"
}

def get_html(url):
    try:
        response = requests.get(url, headers=headers)
        if response.status_code == 200:
            return response.text
        return None
    except ConnectionError as e:
        print(e.args)
        return None

def parse_html(html):
    doc = pq(html)
    trs = doc(".FYB_RD table.c-table tr").items()
    for tr in trs:
        index = tr("td:nth-child(1) span.c-index").text()
        title = tr("td:nth-child(1) span a").text()
        # Strip stray double quotes around the hotness value.
        hot = tr("td:nth-child(2)").text().strip('"')
        yield {
            "index":index,
            "title":title,
            "hot":hot
        }

def save_to_mysql(items):
    try:
        db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456",
                             db="crawls", charset="utf8")
        cursor = db.cursor()
        cursor.execute("use crawls;")
        cursor.execute("CREATE TABLE IF NOT EXISTS baiduNews("
                       "id INT PRIMARY KEY NOT NULL AUTO_INCREMENT,"
                       "ranking VARCHAR(30),"
                       "title VARCHAR(60),"
                       "datetime TIMESTAMP,"
                       "hot VARCHAR(30));")
        try:
            for item in items:
                print(item)
                now = datetime.datetime.now()
                now = now.strftime("%Y-%m-%d %H:%M:%S")
                sql_query = ("INSERT INTO baiduNews(ranking, title, datetime, hot) "
                             "VALUES (%s, %s, %s, %s)")
                # Pass the values separately so the driver quotes and escapes them.
                cursor.execute(sql_query, (item["index"], item["title"], now, item["hot"]))
                print("Save into mysql")
            db.commit()
        except pymysql.MySQLError as e:
            db.rollback()
            print(e.args)
            return
    except pymysql.MySQLError as e:
        print(e.args)
        return

def check_mysql():
    try:
        db = pymysql.connect(host="localhost", port=3306, user="root", passwd="123456",
                             db="crawls", charset="utf8")
        cursor = db.cursor()
        cursor.execute("use crawls;")
        sql_query = "SELECT * FROM baiduNews"
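        # execute() returns the number of rows; fetchall() returns the rows themselves.
        row_count = cursor.execute(sql_query)
        print(row_count)
        for row in cursor.fetchall():
            print(row)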
    except pymysql.MySQLError as e:
        print(e.args)

def main():
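    html = get_html(URL)
    if html is None:
        # get_html returns None on a non-200 response or a connection error.
        print("Failed to fetch the page")
        return
    items = parse_html(html)
    save_to_mysql(items)
    # check_mysql()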

if __name__ == "__main__":
    main()


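Copyright belongs to the author. Do not repost without permission. If this article violates any rules, you can contact the administrator to have it removed.

When reposting, please cite this article's address: http://www.ezyhdfw.cn/yun/43127.html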
