Requirements:

Crawl the detail page information of every anchor.

Home page analysis

Analysis shows that the data is loaded through an AJAX request.
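As a rough sketch of what such a paginated AJAX request looks like, the snippet below only builds the request URL and sends nothing. The endpoint `https://example.com/api/anchors` is a placeholder, not the real list URL; the `cateType` and `currentPage` parameter names are the ones used by the crawler code later in this post. For simple values like these, `requests.get(url, params=params)` should produce the same query string.

```python
from urllib.parse import urlencode

# Placeholder endpoint: an assumption for illustration, not the real list URL.
LIST_URL = 'https://example.com/api/anchors'


def build_list_url(page):
    """Build the URL of one list page by encoding the query parameters."""
    params = {'cateType': 602, 'currentPage': page}
    return LIST_URL + '?' + urlencode(params)


page_url = build_list_url(3)
print(page_url)
```

Changing only `currentPage` walks through the list pages, which is what the loop in the crawler does.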

Request header analysis

Details page analysis

Comparing the detail page URL with the detail page data URL

Testing shows that we only need to change the value of `userid` to obtain different data.
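The idea can be sketched with the standard library alone: take one known data URL and swap in a different `userid`. The URL and the ids below are placeholders chosen for illustration, not values from the real site, and `with_userid` is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit, urlunsplit


def with_userid(data_url, userid):
    """Return data_url with its userid query parameter set to userid."""
    parts = urlsplit(data_url)
    query = parse_qs(parts.query, keep_blank_values=True)
    query['userid'] = [str(userid)]
    # doseq=True turns the one-item lists from parse_qs back into plain values
    return urlunsplit(parts._replace(query=urlencode(query, doseq=True)))


# Placeholder URL and ids: assumptions for illustration only.
template = 'https://example.com/api/detail?userid=1&type=h5'
for uid in (1001, 1002):
    print(with_userid(template, uid))
```

All other query parameters are left untouched, so one captured data URL is enough to reach every anchor whose id is known.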

With the analysis done, we can write the code.

The complete code is as follows:

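import json
import re

import jsonpath
import pymongo
import requests


class VtaoSpider:
    # The referer value is blank in the original post.
    headers = {
        'referer': '',
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/72.0.3626.119 Safari/537.36',
    }
    db = None

    def open(self):
        """Connect to the database."""
        # Assumption: MongoDB on localhost with the default port; the
        # original connection line is missing.
        client = pymongo.MongoClient('localhost', 27017)
        self.db = client['trip']

    def get_first_page(self):
        """Get the responses for all list pages."""
        url_lst = []
        for i in range(1, 26):  # 25 pages of data
            # Query parameters for page i
            params = {
                'cateType': 602,
                'currentPage': i,
                '_ksTS': '1554971959356_87',
                '_output_charset': 'UTF-8',
                '_input_charset': 'UTF-8',
            }
            start_url = ''  # list page AJAX URL (blank in the original post)
            first_data = requests.get(url=start_url, headers=self.headers, params=params)
            # print(first_data.text)
            url_lst.append(first_data)
        return url_lst

    def get_detail_url(self):
        """Collect the detail page URLs from the list page responses."""
        response_list = self.get_first_page()
        all_detail_url = []
        for response in response_list:
            dd = response.text
            d_dict = json.loads(dd)
            detail_url = jsonpath.jsonpath(d_dict, '$..homeUrl')
            # detail_url is a list (jsonpath returns False when nothing matches)
            if detail_url:
                all_detail_url.extend(detail_url)
        # print(all_detail_url)
        return all_detail_url

    def get_detail_data(self):
        """Fetch the detail data of every anchor and save it to MongoDB."""
        url_list = self.get_detail_url()
        # print(url_list)
        for url in url_list:
            try:
                # The step that derives the data URL from `url` (by swapping in
                # the anchor's userid) is missing from the original post.
                detail_data_url = ''
                # print(detail_data_url)
                # Get response data
                data = requests.get(url=detail_data_url, headers=self.headers).text
                # Assumption: fields are read with jsonpath, as homeUrl is above.
                # Only 'rank' survives in the original post.
                rank = jsonpath.jsonpath(json.loads(data), '$..rank')
                res_data = {
                    'rank': rank,
                }
                # Store in database (insert() was removed in PyMongo 4)
                if self.db['vtaobao'].insert_one(res_data).acknowledged:
                    print('save to mongo is successful!')
            except Exception as e:
                print(e)


if __name__ == '__main__':
    spider = VtaoSpider()
    # Database startup only needs to be performed once
    spider.open()
    spider.get_detail_data()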
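
The `$..homeUrl` expression passed to `jsonpath.jsonpath` uses the recursive descent operator `..`, which matches the `homeUrl` key at any nesting depth. To make that behaviour concrete, here is a rough standard-library equivalent. The `find_all` helper and the sample response are made up for illustration; the real response structure is not shown in this post.

```python
import json


def find_all(node, key):
    """Collect every value stored under key, at any depth of a JSON tree."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_all(v, key))
    elif isinstance(node, list):
        for item in node:
            found.extend(find_all(item, key))
    return found


# Made-up response body: an assumption, not the real API output.
sample = json.loads(
    '{"data": {"list": [{"homeUrl": "//example.com/a"},'
    ' {"info": {"homeUrl": "//example.com/b"}}]}}'
)
home_urls = find_all(sample, 'homeUrl')
print(home_urls)
```

Unlike `jsonpath.jsonpath`, this helper returns an empty list rather than `False` when nothing matches, so the result can be extended or iterated without a separate check.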

In total, 450 records were collected, that is, information on 450 anchors.

This code does not use multiprocessing or multithreading, so the crawl takes longer than it should. Interested readers can refactor it to use multiple processes or threads and share the result so that we can all learn from it. Thank you!
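
A multithreaded refactor could start from a sketch like the one below, which uses `concurrent.futures.ThreadPoolExecutor` from the standard library. It is only an outline under stated assumptions: `fetch_detail` is a stand-in that does no network access, the URLs are placeholders, and `max_workers=8` is an arbitrary choice.

```python
from concurrent.futures import ThreadPoolExecutor


def fetch_detail(url):
    """Stand-in for the real download; returns a fake record for the URL."""
    # In the spider this would be requests.get(url, headers=...).text
    return {'url': url, 'length': len(url)}


def crawl_concurrently(urls, max_workers=8):
    """Fetch all URLs on a thread pool, keeping results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch_detail, urls))


# Placeholder URLs: assumptions for illustration only.
urls = ['https://example.com/detail?userid=%d' % i for i in range(5)]
results = crawl_concurrently(urls)
print(len(results))
```

Threads suit this job because the crawler spends most of its time waiting on network responses. A real version would also want a delay or a small worker count so that the target site is not flooded with requests.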

