A program can automatically read the information displayed on other websites, much like a crawler. Suppose we have a system that extracts the Baidu song search rankings from the Baidu website. An analysis system then analyzes the collected data and provides reference data for the business.
To meet this requirement, we need to simulate a browser visiting the page, fetch the page's data, analyze it, and finally write the result of the analysis, that is, the organized data, into the database. The approach is:
1. Send an HttpRequest.
2. Receive the HttpResponse and obtain the HTML source of the target page.
3. Extract the part of the source that contains the data.
4. Build an HtmlDocument from that HTML source and loop over it to pull out the data.
5. Write the data to the database.

The code is as follows:

    // Get the HTML source of the page at the given URL
    private string GetWebContent(string Url)
    {
        string strResult = "";
        try
        {
            // Create the HttpWebRequest
            HttpWebRequest request = (HttpWebRequest)WebRequest.Create(Url);
            // Set the connection timeout (in milliseconds)
            request.Timeout = 30000;
            request.Headers.Set("Pragma", "no-cache");
            // The page is decoded as GB2312
            Encoding encoding = Encoding.GetEncoding("GB2312");
            // Dispose the response, stream, and reader once the content is read
            using (HttpWebResponse response = (HttpWebResponse)request.GetResponse())
            using (Stream streamReceive = response.GetResponseStream())
            using (StreamReader streamReader = new StreamReader(streamReceive, encoding))
            {
                strResult = streamReader.ReadToEnd();
            }
        }
        catch
        {
            // Any failure (timeout, network error, invalid URL) ends up here
            MessageBox.Show("Error");
        }
        return strResult;
    }
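To use HttpWebRequest and HttpWebResponse, add a reference to the System.Net namespace. Stream and StreamReader additionally need System.IO, and Encoding needs System.Text:
    using System.Net;
    using System.IO;
    using System.Text;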
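
The GetWebContent method above always decodes the page as GB2312. As a minimal sketch, assuming the server reports a charset (HttpWebResponse exposes it through its CharacterSet property), the encoding could be resolved from that name and fall back to a default only when the name is missing or unknown. EncodingHelper and ResolveEncoding are hypothetical names and not part of the original code.

```csharp
using System;
using System.Text;

public static class EncodingHelper
{
    // Hypothetical helper: maps a charset name (for example the value of
    // HttpWebResponse.CharacterSet) to an Encoding, and returns `fallback`
    // when the name is empty or not recognized by the runtime.
    public static Encoding ResolveEncoding(string charset, Encoding fallback)
    {
        if (string.IsNullOrWhiteSpace(charset))
        {
            return fallback;
        }

        try
        {
            // Some servers wrap the charset name in quotes; strip them.
            return Encoding.GetEncoding(charset.Trim().Trim('"'));
        }
        catch (ArgumentException)
        {
            // Unknown or unsupported charset name.
            return fallback;
        }
    }
}
```

Inside GetWebContent this might be called as EncodingHelper.ResolveEncoding(response.CharacterSet, Encoding.GetEncoding("GB2312")). On .NET Core and later, GB2312 is only available after registering CodePagesEncodingProvider.Instance with Encoding.RegisterProvider; on .NET Framework it is available by default.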

The program is implemented as follows:
    private void button1_Click(object sender, EventArgs e)
    {
        // URL to scrape
        string Url = "";

        // Get the HTML source of the specified URL
        string strWebContent = GetWebContent(Url);

        richTextBox1.Text = strWebContent;

        // Extract the part of the source that contains the data.
        // The marker string must match the heading text exactly as it appears in the page source.
        int iBodyStart = strWebContent.IndexOf("<body", 0);
        int iStart = strWebContent.IndexOf(" song TOP500", iBodyStart);
        int iTableStart = strWebContent.IndexOf("<table", iStart);
        int iTableEnd = strWebContent.IndexOf("</table>", iTableStart);
        // +8 keeps the closing "</table>" tag (8 characters)
        string strWeb = strWebContent.Substring(iTableStart, iTableEnd - iTableStart + 8);

        // Build an HtmlDocument from the extracted table
        WebBrowser webb = new WebBrowser();
        // Navigate to a blank page first so that Document is available
        webb.Navigate("about:blank");
        HtmlDocument htmldoc = webb.Document.OpenNew(true);
        htmldoc.Write(strWeb);
        HtmlElementCollection htmlTR = htmldoc.GetElementsByTagName("TR");
        foreach (HtmlElement tr in htmlTR)
        {
            // Each row holds three pairs of cells: a rank cell followed by a "name and singer" cell.
            // SplitName and AddLine are helper methods of this form that are not shown here.
            string strID = tr.GetElementsByTagName("TD")[0].InnerText;
            string strName = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "MusicName");
            string strSinger = SplitName(tr.GetElementsByTagName("TD")[1].InnerText, "Singer");
            strID = strID.Replace(".", "");
            // Insert into the DataTable
            AddLine(strID, strName, strSinger, "0");

            string strID1 = tr.GetElementsByTagName("TD")[2].InnerText;
            string strName1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "MusicName");
            string strSinger1 = SplitName(tr.GetElementsByTagName("TD")[3].InnerText, "Singer");
            strID1 = strID1.Replace(".", "");
            // Insert into the DataTable
            AddLine(strID1, strName1, strSinger1, "0");

            string strID2 = tr.GetElementsByTagName("TD")[4].InnerText;
            string strName2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "MusicName");
            string strSinger2 = SplitName(tr.GetElementsByTagName("TD")[5].InnerText, "Singer");
            strID2 = strID2.Replace(".", "");
            // Insert into the DataTable
            AddLine(strID2, strName2, strSinger2, "0");
        }

        // Insert into the database

        // dt is the DataTable filled by AddLine
        dataGridView1.DataSource = dt.DefaultView;
    }
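
The IndexOf calls in the handler above return -1 when the text is not found, and passing -1 on as a start index throws an ArgumentOutOfRangeException, so a changed page layout or an empty response crashes the handler. The following is a minimal sketch of a guarded version of the table extraction step. TableSlicer and ExtractTableAfter are hypothetical names and not part of the original code.

```csharp
using System;

public static class TableSlicer
{
    // Hypothetical helper: returns the first <table>...</table> block that
    // follows `marker` in `html`, or null when the marker, the opening tag,
    // or the closing tag cannot be found.
    public static string ExtractTableAfter(string html, string marker)
    {
        if (string.IsNullOrEmpty(html) || string.IsNullOrEmpty(marker))
        {
            return null;
        }

        int markerPos = html.IndexOf(marker, StringComparison.Ordinal);
        if (markerPos < 0)
        {
            return null;
        }

        // HTML tag names are case-insensitive, so <TABLE> must match too.
        int tableStart = html.IndexOf("<table", markerPos, StringComparison.OrdinalIgnoreCase);
        if (tableStart < 0)
        {
            return null;
        }

        const string closeTag = "</table>";
        int tableEnd = html.IndexOf(closeTag, tableStart, StringComparison.OrdinalIgnoreCase);
        if (tableEnd < 0)
        {
            return null;
        }

        // Include the closing tag in the returned fragment.
        return html.Substring(tableStart, tableEnd - tableStart + closeTag.Length);
    }
}
```

The handler could then call TableSlicer.ExtractTableAfter(strWebContent, " song TOP500") and stop early when the result is null, instead of computing the four indexes by hand. Like the original code, this sketch takes the first closing tag it finds, so it does not handle nested tables.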
