Below is a simple example that scrapes video information from the first 5 pages of Youku video search results:
import requests
from bs4 import BeautifulSoup

keyword = "中国好声音"  # search keyword
# Send a browser-like User-Agent so the request is less likely to be rejected.
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.36"
}

# Scrape the first 5 pages of search results.
for page in range(1, 6):
    url = f"https://so.youku.com/search_video/q_{keyword}_orderby_2_page_{page}"
    response = requests.get(url, headers=headers, timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, 'html.parser')
    # Stop if the result container is missing (e.g. the page layout changed).
    result_list = soup.find('div', class_='sk-result-list')
    if result_list is None:
        break
    for video in result_list.find_all('div', class_='item'):
        title_link = video.find('a', class_='title')
        duration = video.select_one('.info-row > span:nth-child(1)')
        if title_link is None or duration is None:
            continue  # skip items that do not match the expected markup
        title = title_link.text.strip()
        link_url = "https:" + title_link['href']
        time_info = duration.text.strip()
        print(f"Title: {title}\nLink: {link_url}\nDuration: {time_info}\n")

The output looks like this:

Title: 《中国好声音》第一期 [20210813]
Link: https://v.youku.com/v_show/id_XNTEzODg5MTc2MA==.html
Duration: 02:17:27

Title: 《中国好声音》第二期 [20210820]
Link: https://v.youku.com/v_show/id_XNTEzOTUwMjQ1Mg==.html
Duration: 01:41:31

...
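
Two small details of the example above can be handled with the standard library alone: percent-encoding the keyword before it goes into the URL path, and turning a duration string such as 02:17:27 into seconds so that results can be sorted or filtered. The following is only a minimal sketch. It assumes the search URL pattern shown above is still valid, and `build_search_url` and `duration_to_seconds` are hypothetical helper names introduced here for illustration, not part of any library.

```python
from urllib.parse import quote


def build_search_url(keyword: str, page: int) -> str:
    """Return the search URL for one result page (hypothetical helper).

    Assumes the URL pattern from the example above. safe="" makes quote()
    escape "/" as well, so the keyword cannot add extra path segments.
    """
    encoded = quote(keyword, safe="")
    return f"https://so.youku.com/search_video/q_{encoded}_orderby_2_page_{page}"


def duration_to_seconds(text: str) -> int:
    """Convert "HH:MM:SS" or "MM:SS" into seconds (hypothetical helper)."""
    seconds = 0
    for part in text.strip().split(":"):
        # Each further field is worth 1/60 of the previous one.
        seconds = seconds * 60 + int(part)
    return seconds


if __name__ == "__main__":
    print(build_search_url("中国好声音", 1))
    print(duration_to_seconds("02:17:27"))  # 2*3600 + 17*60 + 27 = 8247
```

In the scraping loop, `build_search_url(keyword, page)` could replace the f-string that builds `url`, and `duration_to_seconds(time_info)` could be called on each scraped duration. A duration in any other format raises `ValueError`, so real pages may need extra handling.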