A program can automatically read the information displayed on other websites, much like a crawler. Suppose we have a system that needs to extract the song search rankings from the Baidu website. An analysis system then analyzes the collected data and provides reference data for the business.
To meet this requirement, we need to simulate a browser visiting the page, fetch the page's data and parse it, and finally write the parsed result, that is, the organized data, to the database. The approach is:
1. Send an HttpRequest.
2. Receive the HttpResponse and get the HTML source of the specific page.
3. Extract the part of the source that contains the data.
4. Build an HtmlDocument from the HTML source and loop over it to read the data.
5. Write the data to the database.

The code is as follows:

        // Get the HTML source of the page at the given URL
        private string GetWebContent(string Url)
        {
            string strResult = "";
            try
            {
                // Create the HttpWebRequest
                HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
                // Set the request timeout (30 seconds)
                request.Timeout = 30000;
                request.Headers.Set("Pragma", "no-cache");
                // Dispose the response, stream, and reader once the page has been read
                using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
                using (Stream streamReceive = response.GetResponseStream())
                {
                    // The target page is encoded as GB2312
                    Encoding encoding = Encoding.GetEncoding("GB2312");
                    using (StreamReader streamReader = new StreamReader(streamReceive, encoding))
                    {
                        strResult = streamReader.ReadToEnd();
                    }
                }
            }
            catch (Exception ex)
            {
                // Report the failure and fall through to return an empty string
                MessageBox.Show("Error: " + ex.Message);
            }
            return strResult;
        }
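To use HttpWebRequest and HttpWebResponse, add the System.Net namespace reference. The Stream, StreamReader, and Encoding types used above also need System.IO and System.Text:
using System.Net;
using System.IO;
using System.Text;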

The program is implemented as follows:
        private void button1_Click(object sender, EventArgs e)
        {
            // URL of the page to scrape
            string Url = "";

            // Get the HTML source of the specified URL
            string strWebContent = GetWebContent(Url);
            richTextBox1.Text = strWebContent;

            // Extract the part of the source that contains the data:
            // the first <table> after the marker text inside <body>.
            // The marker must match the text on the page exactly.
            int iBodyStart = strWebContent.IndexOf("<body", 0);
            int iStart = strWebContent.IndexOf(" song TOP500", iBodyStart);
            int iTableStart = strWebContent.IndexOf("<table", iStart);
            int iTableEnd = strWebContent.IndexOf("</table>", iTableStart);
            // + 8 includes the closing "</table>" tag itself
            string strWeb = strWebContent.Substring(iTableStart, iTableEnd - iTableStart + 8);

            // Generate an HtmlDocument from the extracted table
            WebBrowser webb = new WebBrowser();
            // Document is null until the control has navigated to a page
            webb.Navigate("about:blank");
            HtmlDocument htmldoc = webb.Document.OpenNew(true);
            htmldoc.Write(strWeb);
            HtmlElementCollection htmlTR = htmldoc.GetElementsByTagName("TR");
            foreach (HtmlElement tr in htmlTR)
            {
                // Each row holds three (rank, song) cell pairs.
                // First pair: cells 0 and 1
                string strID = tr.GetElementsByTagName("TD")[0].InnerText;
                string strName = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "MusicName");
                string strSinger = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "Singer");
                strID = strID.Replace(".", "");
                // Insert into the DataTable
                AddLine(strID, strName, strSinger, "0");

                // Second pair: cells 2 and 3
                string strID1 = tr.GetElementsByTagName("TD")[2].InnerText;
                string strName1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "MusicName");
                string strSinger1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "Singer");
                strID1 = strID1.Replace(".", "");
                // Insert into the DataTable
                AddLine(strID1, strName1, strSinger1, "0");

                // Third pair: cells 4 and 5
                string strID2 = tr.GetElementsByTagName("TD")[4].InnerText;
                string strName2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "MusicName");
                string strSinger2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "Singer");
                strID2 = strID2.Replace(".", "");
                // Insert into the DataTable
                AddLine(strID2, strName2, strSinger2, "0");
            }

            // Insert into the database

            // Show the collected rows in the grid
            dataGridView1.DataSource = dt.DefaultView;
        }
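
The table extraction step above relies on every IndexOf call finding its target. If the page layout changes, IndexOf returns -1 and the following call throws. As a minimal sketch of a more defensive variant, the same index arithmetic can be wrapped in a helper that reports failure instead. The names TableExtractor and TryExtractTableAfter are hypothetical and not part of the original program.

```csharp
using System;

static class TableExtractor
{
    // Hypothetical helper: finds the first <table>...</table> block that
    // follows `marker`. Returns false (and an empty string) when the marker
    // or either table tag cannot be found.
    public static bool TryExtractTableAfter(string html, string marker, out string table)
    {
        const string closeTag = "</table>";
        table = "";

        int markerPos = html.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0) return false;

        int tableStart = html.IndexOf("<table", markerPos, StringComparison.OrdinalIgnoreCase);
        if (tableStart < 0) return false;

        int tableEnd = html.IndexOf(closeTag, tableStart, StringComparison.OrdinalIgnoreCase);
        if (tableEnd < 0) return false;

        // Include the closing tag itself (the "+ 8" in the code above).
        table = html.Substring(tableStart, tableEnd - tableStart + closeTag.Length);
        return true;
    }
}
```

A caller could test the return value and skip parsing, or show a message, when the expected table is not present in the downloaded source.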

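The handler above calls SplitName and AddLine and binds a DataTable named dt, none of which are shown in the listing. The following is only a sketch of what they might look like, not the original implementation. The column names, the explicit table parameter, and above all the assumed cell format "Song title - Singer" are assumptions made for illustration; the real page may format its cells differently.

```csharp
using System;
using System.Data;

static class ChartRows
{
    // Assumed layout: one row per song, every column stored as text.
    public static DataTable CreateTable()
    {
        var table = new DataTable("Songs");
        table.Columns.Add("ID", typeof(string));
        table.Columns.Add("MusicName", typeof(string));
        table.Columns.Add("Singer", typeof(string));
        table.Columns.Add("Flag", typeof(string));
        return table;
    }

    // Assumes the cell text looks like "Song title - Singer".
    // `part` selects which half to return: "MusicName" or "Singer".
    public static string SplitName(string cellText, string part)
    {
        string text = (cellText ?? "").Trim();
        int sep = text.LastIndexOf(" - ", StringComparison.Ordinal);

        if (part == "MusicName")
        {
            // Without a separator, treat the whole cell as the song name.
            return sep < 0 ? text : text.Substring(0, sep).Trim();
        }
        if (part == "Singer")
        {
            return sep < 0 ? "" : text.Substring(sep + 3).Trim();
        }
        throw new ArgumentException("part must be \"MusicName\" or \"Singer\"", nameof(part));
    }

    // Appends one song to the table; the values follow the column order above.
    public static void AddLine(DataTable table, string id, string name, string singer, string flag)
    {
        table.Rows.Add(id, name, singer, flag);
    }
}
```

In the form itself, dt would be a field initialised once with something like CreateTable(), and AddLine would write to that field rather than take the table as a parameter.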
