Python web scraping in practice with the requests module: scraping street-snap photos from Toutiao

Posted by 番茄西紅柿 on 2021-11-29 10:50

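Summary: This post uses Python to scrape the street-snap (街拍) photos on Toutiao. Along the way it notes how requests reports errors: calling raise_for_status() on a Response object raises an exception if the download failed, so the call should be wrapped in try and except to handle the error without crashing the program.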

Introduction

This post uses Python to scrape the street-snap photos on Toutiao (今日頭條). Without further ado,

let's get started!

Development tools

Python version: 3.6.4

Related modules:

requests;

re;

plus a few modules from the Python standard library.

Environment setup

Install Python, add it to your PATH, and install the required modules with pip.

Browser request details

Code for fetching the article links:

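```python
import json
import re

import requests

# Send a desktop-browser User-Agent so the request looks like it came from a browser
headers = {
    'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/55.0.2883.87 Safari/537.36'
}


def get_first_data(offset):
    """Request one page of search results (count=20) starting at `offset`."""
    params = {
        'offset': offset,
        'format': 'json',
        'keyword': '街拍',  # search keyword: "street snap"
        'autoload': 'true',
        'count': '20',
        'cur_tab': '1',
        'from': 'search_tab',
    }
    try:
        # Connection errors and timeouts are raised by get() itself, so it sits inside try
        response = requests.get(url='https://www.toutiao.com/search_content/',
                                headers=headers, params=params, timeout=10)
        # Raises requests.HTTPError for 4xx/5xx status codes
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        print("Failed to fetch the search results")
        return None


def handle_first_data(html):
    """Yield the article URL of every item in the JSON search results."""
    if not html:  # get_first_data() returns None on failure
        return
    data = json.loads(html)
    if data and "data" in data:
        for item in data.get("data"):
            yield item.get("article_url")
```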
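To see exactly which URL get_first_data() requests, you can let requests encode the query string without sending anything over the network. This is a small sketch: the params dict is copied from the code above, while the helper name build_search_url is hypothetical. It relies on requests.Request(...).prepare(), which builds the final URL locally.

```python
import requests

# Query parameters copied from get_first_data(); '街拍' is the search keyword
PARAMS = {
    'offset': 0,
    'format': 'json',
    'keyword': '街拍',
    'autoload': 'true',
    'count': '20',
    'cur_tab': '1',
    'from': 'search_tab',
}


def build_search_url(offset):
    """Hypothetical helper: return the URL requests would GET for this offset."""
    params = dict(PARAMS, offset=offset)
    # prepare() encodes params into the query string; no request is sent
    prepared = requests.Request(
        'GET', 'https://www.toutiao.com/search_content/', params=params
    ).prepare()
    return prepared.url


print(build_search_url(20))
```

The Chinese keyword is percent-encoded as UTF-8 (keyword=%E8%A1%97%E6%8B%8D), and each later page of results only differs in its offset value.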

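A note on how requests reports errors: calling raise_for_status() on a Response object raises an exception (requests.HTTPError) if the download failed with an error status code. Wrap the code in try and except so the error is handled instead of crashing the program. Failures such as a refused connection or a timeout are raised by requests.get() itself, so that call belongs inside the try block as well.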
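The pattern above can be folded into one reusable helper. This is a minimal sketch, not the author's code: fetch_text is a hypothetical name, and the demo at the bottom builds a requests.Response by hand with a 404 status so it runs offline instead of contacting a real server.

```python
import requests


def fetch_text(url, headers=None, timeout=10):
    """Hypothetical helper: return the page body, or None if the request fails."""
    try:
        response = requests.get(url, headers=headers, timeout=timeout)
        response.raise_for_status()  # HTTPError for 4xx/5xx status codes
        return response.text
    except requests.RequestException as exc:
        # HTTPError, ConnectionError, Timeout and invalid-URL errors all derive from RequestException
        print("Request failed:", exc)
        return None


# Offline demo: a hand-built Response with a 404 status
resp = requests.Response()
resp.status_code = 404
resp.reason = 'Not Found'
resp.url = 'https://example.com/missing'
try:
    resp.raise_for_status()
except requests.HTTPError as exc:
    print(exc)  # 404 Client Error: Not Found for url: https://example.com/missing
```

Catching requests.RequestException rather than a bare Exception keeps genuine programming errors (a typo, a TypeError) visible while still absorbing network failures.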

The requests documentation (Chinese translation) is available at: http://cn.python-requests.org/zh_CN/latest/

Code for extracting the image links:

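```python
def get_second_data(url):
    """Download an article page; return its HTML, or None on failure."""
    if not url:
        return None
    try:
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        return response.text
    except requests.RequestException:
        print("Error while opening the article link")
        return None


def handle_second_data(html):
    """Extract the image URLs from the gallery data embedded in an article page."""
    if not html:
        return None
    # The page embeds the gallery as: gallery: JSON.parse("...escaped JSON..."),
    pattern = re.compile(r'gallery: JSON\.parse\((.*?)\),', re.S)
    result = re.search(pattern, html)
    if not result:
        print("No gallery data found")
        return None
    # group(1) is a JSON string literal that itself contains JSON, hence two json.loads()
    data = json.loads(json.loads(result.group(1)))
    if data and "sub_images" in data:
        # One URL per image (appending the whole list on every iteration was a bug)
        return [item.get('url') for item in data.get("sub_images")]
    return None
```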
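The double json.loads() in handle_second_data() exists because the gallery is passed to JSON.parse(...) as a quoted string: the first call turns the quoted literal into a plain str, and the second turns that str into a dict. The page fragment below is made up for illustration; only the gallery: JSON.parse(...) wrapper and the sub_images/url keys come from the code above.

```python
import json
import re

# Made-up page fragment: a JSON document encoded as a string literal
html = r'''
var BASE_DATA = {
    gallery: JSON.parse("{\"count\":2,\"sub_images\":[{\"url\":\"http://p1.example.com/a1\"},{\"url\":\"http://p1.example.com/b2\"}]}"),
    siblingList: []
};
'''

pattern = re.compile(r'gallery: JSON\.parse\((.*?)\),', re.S)
match = pattern.search(html)
literal = match.group(1)      # the quoted literal, outer double quotes included
inner = json.loads(literal)   # first pass: string literal -> str
data = json.loads(inner)      # second pass: str -> dict
urls = [item.get('url') for item in data.get('sub_images', [])]
print(urls)  # ['http://p1.example.com/a1', 'http://p1.example.com/b2']
```

The non-greedy (.*?) stops at the first "),", which works here because the literal itself contains no closing parenthesis followed by a comma.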

Code for downloading the images:

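```python
import os  # needed to create the images/ folder below


def download_image(image_urls):
    """Download every image URL in the list into the images/ folder."""
    os.makedirs("images", exist_ok=True)
    for url in image_urls:
        try:
            response = requests.get(url, headers=headers, timeout=10)
            response.raise_for_status()
        except requests.RequestException:
            # Skip this image instead of writing missing or stale data
            print("Download failed, skipping: " + url)
            continue
        # Name the file after the end of the URL; '/' is replaced so it stays one file name
        name = url[-10:].replace("/", "_")
        with open(os.path.join("images", name + ".jpg"), "wb") as f:
            f.write(response.content)
        print(name + " downloaded successfully! " + url)


def main():
    html = get_first_data(0)
    for url in handle_first_data(html):
        page = get_second_data(url)
        result = handle_second_data(page)
        if result:
            download_image(result)


if __name__ == '__main__':
    main()
```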
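Naming files from the last ten characters of the URL (url[-10:]) can make two different images collide, and the raw slice can even contain '/', which open() treats as a directory separator. One possible alternative, sketched here rather than taken from the original article, is to name each file by an MD5 hash of its URL. The helpers image_path and save_image are hypothetical names.

```python
import hashlib
import os


def image_path(url, folder='images'):
    """Hypothetical helper: stable, filesystem-safe file name for an image URL."""
    digest = hashlib.md5(url.encode('utf-8')).hexdigest()
    return os.path.join(folder, digest + '.jpg')


def save_image(url, content, folder='images'):
    """Hypothetical helper: write the bytes once; return the path, or None if it already exists."""
    os.makedirs(folder, exist_ok=True)
    path = image_path(url, folder)
    if os.path.exists(path):
        return None  # this URL was already downloaded
    with open(path, 'wb') as f:
        f.write(content)
    return path
```

In download_image() this would replace the open()/write() pair, for example save_image(url, response.content). The same URL always maps to the same name, so re-running the script skips images it already saved.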

Finally, the images download successfully.


Copyright belongs to the author. Do not repost without permission; if this article violates any rules, you can contact the administrator to have it removed.

To repost, please credit the original URL: http://www.ezyhdfw.cn/yun/125564.html
