Stock Analysis and Buy Recommendations with Python (Shenwan Sector Classification)

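Sector analysis across all A-share stocks first requires up-to-date historical price data. My earlier article, "Python - Quickly Processing A-Share Historical Price Data Downloaded from Tongdaxin (Complete Code)", explains how to download historical price data for every A-share stock and comes with a video of the procedure. The analysis also requires a sector classification that the industry accepts. Common choices are the classification published by the CSRC (which Tongdaxin software ships with), the classifications that portal sites maintain themselves, and the widely used Shenwan classification. This article shows how to scrape the Shenwan classification of all A-share stocks from Sina Finance. The procedure is demonstrated in the video "Scraping Shenwan Stock Classification Data with Python", so it is not repeated here; this article gives only a brief explanation and the complete, runnable code.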

Shenwan Level-1 Classification of A-Shares

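The Shenwan classification appears under Sina's stock categories, as in the figure below:

[Figure: the Shenwan classification under Sina's stock categories]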

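The Shenwan classification page is shown in the figure below. It contains identifier strings such as sw1_730000, sw2_460800, and sw3_461103:

[Figure: the Shenwan classification page and its identifier strings]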
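These identifiers look like a level prefix (sw1, sw2, sw3) followed by a six-digit industry code. The following is a minimal sketch, assuming that pattern holds for every node; the helper name `parse_sw_node` is hypothetical and not part of the original code.

```python
import re

# Assumed pattern: "sw" + level digit (1-3) + "_" + six-digit industry code.
_SW_NODE = re.compile(r'^sw([123])_(\d{6})$')

def parse_sw_node(node):
    """Return (level, code) for a node id such as 'sw1_730000', or None if it does not match."""
    m = _SW_NODE.match(node)
    if m is None:
        return None
    return int(m.group(1)), m.group(2)

for node in ['sw1_730000', 'sw2_460800', 'sw3_461103']:
    print(node, parse_sw_node(node))
```

Keeping the code as a string rather than an integer avoids losing any leading zeros.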

The stock list page is shown in the figure below:

[Figure: the stock list page]

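The fields that can be scraped for each stock, with their meanings, are:

"symbol":"sz002281", (market-prefixed code)

"code":"002281", (stock code)

"name":"\u5149\u8fc5\u79d1\u6280", (stock name, as \uXXXX Unicode escapes)

"trade":"22.740", (latest price)

"pricechange":-0.29, (change from the previous close)

"changepercent":-1.259, (percentage change from the previous close)

"buy":"22.740", (best bid price)

"sell":"22.750", (best ask price)

"settlement":"23.030", (previous close)

"open":"23.050", (today's open)

"high":"23.220", (day's high)

"low":"22.670", (day's low)

"volume":6874488, (trading volume)

"amount":157353968, (trading value)

"ticktime":"15:00:03", (time of the quote)

"pb":2.905, (price-to-book ratio)

"mktcap":1590455.879532, (total market capitalization)

"nmc":1507701.365214, (free-float market capitalization)

"turnoverratio":1.03685, (turnover rate)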
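As a minimal sketch of reading one such record, assuming the response really is valid JSON with quoted keys as shown above, the standard `json` module decodes the `\uXXXX` escapes in `name` on its own. The record below is a shortened copy of the fields listed above.

```python
import json

# A shortened record copied from the field list above. The raw string keeps the
# \uXXXX escapes exactly as they appear in the response body.
raw = r'{"symbol":"sz002281","code":"002281","name":"\u5149\u8fc5\u79d1\u6280","trade":"22.740","changepercent":-1.259,"mktcap":1590455.879532,"nmc":1507701.365214,"turnoverratio":1.03685}'

record = json.loads(raw)          # json decodes the \u escapes automatically
print(record["name"])             # the stock name as readable Chinese text
print(float(record["trade"]))     # prices arrive as strings, so convert explicitly
print(record["changepercent"])    # already numeric
```

Note that `code` is a string, which preserves the leading zeros of codes such as 002281.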

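The stock list page provides a lot of information per stock, including some fields that are not displayed when the page is viewed in a browser. The details above are only a reference for deeper study; if they do not interest you, go straight to the code, run it, and look at the results.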

Complete Code

import requests
from bs4 import BeautifulSoup
import re
from operator import itemgetter
import time
import random
import pandas as pd

def remove_col(arr, ith):
    itg = itemgetter(*filter((ith).__ne__, range(len(arr[0]))))
    return list(map(list, map(itg, arr))) 

url = 'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodes'

heads = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.90 Safari/537.36"}

# Send a browser-like User-Agent with the request
resText = requests.get(url, headers=heads)

soup = BeautifulSoup(resText.content, features='lxml')  
s = soup.text

print('\nShenwan level-1 categories:')
shw1 = s[s.find('swhy'):s.find('sw1_hy')]
shw1_cut = shw1[shw1.find('[['):shw1.find(']]')]
shw1_cut = re.sub(r'\[','',shw1_cut)
shw1_cut = re.sub(r'"','',shw1_cut)
shw1_list = shw1_cut.split(']')

shw1_list_split = []
for i in range(0,len(shw1_list)):
    item_split = shw1_list[i].split(',')
    if i == 0:        
        temp_str = item_split[0].encode('utf-8').decode('unicode_escape')
        item_split[0] = temp_str
    else:
        temp_str = item_split[1].encode('utf-8').decode('unicode_escape')
        item_split[1] = temp_str
        item_split = item_split[1:4]    
    shw1_list_split.append(item_split)   

result_shw1 = remove_col(shw1_list_split, 1)
print()
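print('Number of Shenwan level-1 categories:',len(result_shw1))
print(result_shw1)
print()

## Shenwan level-1 categories and the stocks in each category
print('Shenwan level-1 categories and their stocks')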
shw1_category_and_stocks = []
shw1_categorystock = []
for i in range(0,len(result_shw1)): 
    s2 = ''
    page_i = 1
    while True:
        # Example: https://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?page=1&num=500&sort=symbol&asc=1&node=sw1_270000&symbol=&_s_r_a=init
        url2 = 'http://vip.stock.finance.sina.com.cn/quotes_service/api/json_v2.php/Market_Center.getHQNodeData?page='+str(page_i)+'&num=200&sort=symbol&asc=1&node=' + result_shw1[i][1][0:11] + '&symbol=&_s_r_a=init'   
        print(i,result_shw1[i][0],result_shw1[i][1][0:11])
        
        resText2 = requests.get(url2, headers=heads)
        soup2 = BeautifulSoup(resText2.content, features='lxml')          
        if len(soup2.text) > 10:
            current_s = soup2.text            
            s2 = s2 + current_s # '\n,'+
            page_i = page_i + 1
        else:
            break

    print('------------------------------------------------------')
        
    resStr2 = re.sub(r'\[','',s2)
    resStr2 = re.sub(r'\]','',resStr2) 
    resStr2 = re.sub(r'{','',resStr2) 

    resStr2_list = resStr2.split('}')
    resStr2_list.pop() # drop the trailing empty element produced by split
    

    shw_one_stocks = []
    for record in resStr2_list:
        singlestock_info = record.split(',')
        # The first record of each page splits into exactly 20 fields. Every other
        # record still starts with the ',' that separated it from the previous one,
        # which adds an empty leading element and shifts the first fields by one.
        k = 0 if len(singlestock_info) == 20 else 1
        rst = [ss.split(':') for ss in singlestock_info]
        # Market code, stock code and stock name, with their quotes kept
        shw_one_stocks.append([rst[k][1], rst[k+1][1],
                               rst[k+2][1].encode('utf-8').decode('unicode_escape')])
        shw1_categorystock.append([result_shw1[i][0],
                                   result_shw1[i][1],
                                   rst[k][1][1:-1], rst[k+1][1][1:-1],  # [1:-1] strips the quotes
                                   rst[k+2][1][1:-1].encode('utf-8').decode('unicode_escape'),
                                   rst[-15][1],                  # "changepercent"
                                   round(float(rst[-3][1]),2),   # total market capitalization
                                   round(float(rst[-2][1]),2),   # free-float market capitalization
                                   rst[-1][1]                    # turnover rate
                                   ])

    tmp_removequotes = [result_shw1[i][0], result_shw1[i][1]]
    shw1_category_and_stocks.append([tmp_removequotes,shw_one_stocks])
    time.sleep(random.randint(1,6)) # pause a random 1-6 seconds so that rapid requests do not get us blocked by the site



# Show the first 5 entries of each list
for i in range(0,5): # len(shw1_category_and_stocks)
    print(shw1_category_and_stocks[i][0])
    print(shw1_category_and_stocks[i][1])
    print()

print()
for i in range(0,5): # len(shw1_categorystock)   
    print(shw1_categorystock[i])

print()
print('Number of Shenwan level-1 categories:',len(result_shw1))
print('Number of rows (one per stock across all level-1 categories):',len(shw1_categorystock))

# Write the Shenwan level-1 classification data to CSV files
shw1_category = [x[0][0] for x in shw1_category_and_stocks] 
shw1_code = [x[0][1] for x in shw1_category_and_stocks] 
dict1 = {'shw1_code': shw1_code,'shw1_category': shw1_category} 
df1 = pd.DataFrame(dict1) 
df1.to_csv('shenwan1_category.csv',index = False) # Shenwan level-1 category file

shw1_category_code =  [x[1] for x in shw1_categorystock] 
shw1_category_name =  [x[0] for x in shw1_categorystock] 
shw1_category_mktcode =  [x[2] for x in shw1_categorystock] 
shw1_stock_code =  [x[3] for x in shw1_categorystock] 
shw1_stock_name =  [x[4] for x in shw1_categorystock] 
shw1_stock_changepercent =  [x[5] for x in shw1_categorystock] 
stock_mktcap = [x[6] for x in shw1_categorystock] 
stock_nmc = [x[7] for x in shw1_categorystock] 
stock_hsl = [x[8] for x in shw1_categorystock] 
dict2 = {'shw1_code': shw1_category_code,'category_name': shw1_category_name,'category_mktcode':shw1_category_mktcode,\
         'stock_code':shw1_stock_code,'stock_name':shw1_stock_name,'stock_changepercent':shw1_stock_changepercent,\
         'stock_mktcap':stock_mktcap,'stock_nmc':stock_nmc,'stock_hsl':stock_hsl}  # 
df2 = pd.DataFrame(dict2) 
df2.to_csv('shenwan1_category_stocks.csv',index = False) # Shenwan level-1 categories with their stocks
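The string splitting in the complete code depends on every record having the same number of fields. As an alternative sketch, assuming each page returned by getHQNodeData is a valid JSON array and that an empty page marks the end (as the length check in the loop suggests), the pages could be collected with `json.loads` instead. The names `collect_node_records` and `fetch_page` are hypothetical; `fetch_page` is passed in as a parameter so the loop can be tried without network access.

```python
import json

def collect_node_records(fetch_page, max_pages=100):
    """Collect records page by page until an empty page is returned.

    fetch_page(page) must return the response body as text for that page number.
    """
    records = []
    for page in range(1, max_pages + 1):
        text = fetch_page(page).strip()
        rows = json.loads(text) if text else None
        if not rows:            # "[]", "null" or an empty body ends the loop
            break
        records.extend(rows)
    return records

# Stand-in for the real HTTP call: two pages of placeholder data, then an empty page.
fake_pages = {
    1: '[{"symbol":"sz000001","code":"000001"},{"symbol":"sz000002","code":"000002"}]',
    2: '[{"symbol":"sh600000","code":"600000"}]',
}
records = collect_node_records(lambda page: fake_pages.get(page, '[]'))
print([r["symbol"] for r in records])
```

In real use, `fetch_page` would wrap the `requests.get` call from the loop in the complete code and return the response text. The `max_pages` cap is only a safeguard against an endpoint that never returns an empty page.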

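Partial output from running the complete code:

  • Progress output during scraping, listing the 31 Shenwan level-1 categories in order

[Figure: console output listing the level-1 categories as they are scraped]

  • Shenwan level-1 categories and the stocks in each

[Figure: console output of level-1 categories with their stocks]

  • The Shenwan level-1 category file, shenwan1_category.csv, as in the figure below:

[Figure: contents of shenwan1_category.csv]

  • The file of Shenwan level-1 categories with their stocks, shenwan1_category_stocks.csv, from which you can take whichever attributes you need, as in the figure below:

[Figure: contents of shenwan1_category_stocks.csv]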

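The column headers in that file and their meanings (the names are my own):

shw1_code (Shenwan level-1 category code),

category_name (category name),

category_mktcode (market prefix plus stock code),

stock_code (stock code),

stock_name (stock name),

stock_changepercent (percentage change in price),

stock_mktcap (total market capitalization),

stock_nmc (free-float market capitalization),

stock_hsl (turnover rate).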
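As a rough sketch of how these columns might be used, the snippet below totals market capitalization per category with the standard `csv` module. The rows are made-up placeholders in the column layout described above, not real market data; for the real file, the `io.StringIO(...)` argument would be replaced by an opened `shenwan1_category_stocks.csv`.

```python
import csv
import io
from collections import defaultdict

# Made-up rows in the column layout described above (not real market data).
sample_csv = """shw1_code,category_name,category_mktcode,stock_code,stock_name,stock_changepercent,stock_mktcap,stock_nmc,stock_hsl
sw1_000001,CategoryA,sz000001,000001,StockA,1.5,100.0,80.0,0.5
sw1_000001,CategoryA,sz000002,000002,StockB,-0.5,300.0,200.0,1.2
sw1_000002,CategoryB,sh600000,600000,StockC,0.0,50.0,50.0,0.3
"""

mktcap_by_category = defaultdict(float)
count_by_category = defaultdict(int)
for row in csv.DictReader(io.StringIO(sample_csv)):
    # csv yields every field as a string, so stock_code keeps its leading zeros.
    mktcap_by_category[row["category_name"]] += float(row["stock_mktcap"])
    count_by_category[row["category_name"]] += 1

for name in mktcap_by_category:
    print(name, count_by_category[name], mktcap_by_category[name])
```

If the file is loaded with pandas instead, reading stock_code as a string is worth considering, since a numeric dtype would drop the leading zeros.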

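The complete code for scraping the Shenwan level-2 classification from Sina Finance is in the article "Shenwan Level-1 and Level-2 Industry Classification of A-Shares (with Python Code for Scraping Sina Finance)".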

End of article.

(A follow-up, "Sector Analysis 2/2 - How to Detect Sector Rotation from Daily Changes in Sector Trading Value", will be published later.)