Practical experience with automated operations and maintenance for small teams
Program description DD 2021-06-25 02:39:05

Note: This article assumes readers have some familiarity with Ansible and Jenkins.

Happy families are all alike; every unhappy family is unhappy in its own way.

The automated operations architectures of the industry giants come with all kinds of impressive features, like the one pictured here, and they feel out of reach. We all know what the end state looks like. The question is how to evolve toward that goal step by step from where your team stands today.

My team has three and a half developers. We maintain dozens of cloud machines running more than a dozen applications, 90% of which are legacy systems. Applications are mostly compiled and packaged on the programmers' own computers. For branch management, everyone develops on the same dev branch and merges into the master branch once testing passes. Changing an application's production configuration means logging in to the specific machine, to say nothing of a configuration center or configuration versioning.

By the way, we did not even have basic machine-level monitoring.

My usual job is 50% business development and 50% operations. Faced with so many problems, I kept asking myself how to achieve automated operations at low cost. This article summarizes my experience and practice in this area. I hope it helps readers.

Enough talk: monitoring and alerting first

Work has priorities, and I think monitoring and alerting should be done first, even if it slows down business development. Only when you know the current situation can you plan the next step.

There are many monitoring systems on the market: Zabbix, Open-Falcon, Prometheus. In the end I chose Prometheus, because:

  1. It uses a pull model
  2. It is configured with plain text files, which is good for configuration versioning
  3. It has a huge number of plugins; whatever you want to monitor, there is usually a ready-made one
  4. I would have to learn any of the three from scratch anyway, so why not learn the one recommended in the Google SRE book?

As mentioned earlier, we have few people and many machines, so installing Prometheus also has to be automated and versioned. I implemented this with Ansible + Git. The end result is as follows:

A brief introduction to the components:

  1. Prometheus Server is responsible for collecting and storing monitoring data
  2. Prometheus Alertmanager is responsible for sending out the alerts triggered by the alerting rules, and it can integrate many notification channels
  3. node-exporter reads metrics from the machine and exposes them through an HTTP service; Prometheus collects monitoring metrics from this service. Prometheus also has many other official exporters.
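
As a rough illustration of how these three pieces fit together, here is a minimal sketch of a prometheus.yml. It is not the author's actual configuration: the host addresses, the rules path, and the job name are placeholders assumed for the example, while 9100 and 9093 are the default ports of node-exporter and Alertmanager.

```yaml
# prometheus.yml (illustrative sketch; addresses and paths are assumed values)
global:
  scrape_interval: 15s        # how often Prometheus pulls metrics from each target
  evaluation_interval: 15s    # how often alerting rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ['localhost:9093']   # Alertmanager on its default port

rule_files:
  - 'rules/*.yml'             # alerting rules kept as text files under version control

scrape_configs:
  - job_name: 'node'
    static_configs:
      - targets:              # one node-exporter per machine, default port 9100
          - '10.0.0.11:9100'
          - '10.0.0.12:9100'
```

Because the whole setup is a text file, it can live in the same Git repository as the Ansible scripts that deploy it.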

One advantage of using Ansible as the deployment tool is that there are plenty of out-of-the-box roles. To install Prometheus, I used a ready-made one: prometheus-ansible

With monitoring data in hand, we can visualize it. Grafana integrates very well with Prometheus, so we deployed Grafana as well:

Viewed in Grafana, the data collected by node-exporter looks like this:

But we cannot stare at a screen 24 hours a day to check whether the CPU is overloaded. This is where alerting comes in. Prometheus integrates many notification channels by default. Unfortunately, DingTalk is not one of them. That is not a problem: some kind people have open-sourced a component that integrates DingTalk with Prometheus alerts: prometheus-webhook-dingtalk. So we added that as well:

 Integrated alarms
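
The "is the CPU overloaded?" check can be written as an alerting rule, and the DingTalk bridge plugs into Alertmanager as an ordinary webhook receiver. The sketch below shows both files under stated assumptions rather than the author's real settings: the alert name, the 85% threshold, the 10-minute window, and the webhook1 target name are hypothetical, and the URL assumes prometheus-webhook-dingtalk runs on the same host on its default port 8060.

```yaml
# rules/node.yml: alerting rule evaluated by Prometheus (names and thresholds are illustrative)
groups:
  - name: node-basic
    rules:
      - alert: HighCpuUsage
        # percentage of non-idle CPU time per machine, averaged over 5 minutes
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 85
        for: 10m              # must stay above the threshold this long before firing
        labels:
          severity: warning
        annotations:
          summary: "CPU usage above 85% on {{ $labels.instance }}"
---
# alertmanager.yml: send every alert to the DingTalk bridge (URL is an assumption)
route:
  receiver: dingtalk
receivers:
  - name: dingtalk
    webhook_configs:
      - url: 'http://localhost:8060/dingtalk/webhook1/send'
```

The two documents are separate files in practice; they are shown together only to make the flow from rule to notification visible.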

With the above done, our basic monitoring scaffolding is in place. It prepares the ground for higher-level monitoring later, such as Redis monitoring and JVM monitoring.

Configuration versioning should start early

While building the monitoring system, we already pulled the configuration out and put it in a separate code repository. For all future deployments, we keep the configuration and the deployment logic separate.

For how to manage configuration with Ansible, see this article: How to Manage Multistage Environments with Ansible. This is how we organize environment variables:

├── environments/         # Parent directory for our environment-specific directories
│   │
│   ├── dev/              # Contains all files specific to the dev environment
│   │   ├── group_vars/   # dev specific group_vars files
│   │   │   ├── all
│   │   │   ├── db
│   │   │   └── web
│   │   └── hosts         # Contains only the hosts in the dev environment
│   │
│   ├── prod/             # Contains all files specific to the prod environment
│   │   ├── group_vars/   # prod specific group_vars files
│   │   │   ├── all
│   │   │   ├── db
│   │   │   └── web
│   │   └── hosts         # Contains only the hosts in the prod environment
│   │
│   └── stage/            # Contains all files specific to the stage environment
│       ├── group_vars/   # stage specific group_vars files
│       │   ├── all
│       │   ├── db
│       │   └── web
│       └── hosts         # Contains only the hosts in the stage environment

At this stage, all of our configuration is stored as text. Switching to Consul as a configuration center later will also be easy, because Ansible 2.0 and above integrates Consul natively: consul_module

Tip: Ansible configuration variables are hierarchical, which gives us a lot of flexibility in configuration management.
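
A small sketch of what that hierarchy means in the directory layout above: values in group_vars/all apply to every host in the environment, and a more specific group file overrides only what differs. The variable names and values here are hypothetical, chosen just to show the override.

```yaml
# environments/prod/group_vars/all: defaults for every host in prod (variable names are hypothetical)
app_log_level: info
jvm_heap_size: 512m
---
# environments/prod/group_vars/web: hosts in the "web" group override only what differs
jvm_heap_size: 2g
```

With these two files, running ansible-playbook -i environments/prod playbook.yaml would give hosts in the web group jvm_heap_size=2g and every other host 512m, while app_log_level stays info everywhere. Pointing -i at environments/stage instead picks up that environment's values with no change to the playbook.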

Moving to Jenkins: hand packaging over to Jenkins

We want to hand the packaging of all projects over to Jenkins. In practice, of course, we started by packaging some projects on Jenkins and moved the rest over gradually.

First of all, we need a Jenkins. There is a ready-made Ansible script for setting it up: ansible-role-jenkins. Note that most articles I have seen online tell you to install Jenkins plugins by hand, whereas the ansible-role-jenkins we use installs plugins automatically. You only need to add the configuration variable jenkins_plugins. The official example is as follows:

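- hosts: all
  vars:
    # Plugins that the role installs automatically
    jenkins_plugins:
      - blueocean
      - ghprb
      - greenballs
      - workflow-aggregator
    jenkins_plugin_timeout: 120

  pre_tasks:
    # Install Java before the Jenkins role runs
    - include_tasks: java-8.yml

  roles:
    - ansible-role-jenkins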

Once Jenkins is set up, the next step is integrating it with GitLab. We already had GitLab, so there was no need to set it up again. I will not go into the integration details; there are many articles about it online.

The final Jenkins setup looks like this:

As for how the Jenkins master connects to the Jenkins agents: network environments differ and many approaches are described online, so choose the one that suits you.

Now we need to tell Jenkins how to compile and package our business code. There are two ways:

  1. Configure it through the web interface
  2. Use a Jenkinsfile: a text file similar to a Dockerfile. For details, see: Using a Jenkinsfile

I chose the second without hesitation: first, it is good for versioning; second, it is flexible.

A Jenkinsfile looks like this:

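pipeline {
    agent any
    stages {
        stage('Build') {
            steps {
                // Compile and package with the Gradle wrapper
                sh './gradlew clean build'
                // Archive the jars; Gradle writes them to build/libs by default, adjust if your layout differs
                archiveArtifacts artifacts: '**/build/libs/*.jar', fingerprint: true
            }
        }
    }
}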

So where does the Jenkinsfile live? Alongside the business code, so that each project manages its own Jenkinsfile, like this:

At this point, we can create a pipeline job in Jenkins:

About branch management: we are a small team, so I recommend that all projects develop and release from the master branch.

Let Jenkins run Ansible for us

We used to run Ansible on the programmers' computers. Now we hand that job over to Jenkins. The specific steps:

  1. Install the Ansible plugin in Jenkins
  2. Run the playbook from the Jenkinsfile:

    withCredentials([sshUserPrivateKey(keyFileVariable: "deploy_private", credentialsId: "deploy"), file(credentialsId: 'vault_password', variable: 'vault_password')]) {
        ansiblePlaybook vaultCredentialsId: 'vault_password', inventory: "environments/prod", playbook: "playbook.yaml",
            extraVars: [
                ansible_ssh_private_key_file: [value: "${deploy_private}", hidden: true],
                build_number: [value: "${params.build_number}", hidden: false]
            ]
    }

A few things need explaining here:

  1. ansiblePlaybook is the pipeline syntax provided by the Jenkins Ansible plugin. It is similar to running ansible-playbook by hand.
  2. withCredentials is the syntax of the Credentials Binding plugin. It is used to reference sensitive information, such as the SSH key and the Ansible Vault password needed to run Ansible.
  3. We encrypt sensitive configuration variables with Ansible Vault.

Where should the Ansible scripts live?

We already know that each project is responsible for its own automated build, so the Jenkinsfile lives in each project. What about deployment? By the same reasoning, we think each project should be responsible for that too, so every project we need to deploy has an ansible directory that holds its Ansible scripts. Like this:

But how is it used? In the packaging phase, we zip up the ansible directory. At actual deployment time, we unzip it and run the playbook inside.
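
For illustration, a playbook inside a project's ansible directory might look like the sketch below. This is a hypothetical example, not the author's script: it assumes the application is a single jar run as a systemd service, and app_name, app_dir, the files/ path, and the unit name are all invented for the example. The build_number variable is the extra variable passed in from the Jenkinsfile.

```yaml
# ansible/playbook.yaml (illustrative only; app_name, paths and the systemd unit are assumptions)
- hosts: web
  become: true
  vars:
    app_name: demo-app
    app_dir: "/opt/{{ app_name }}"
  tasks:
    - name: Create the application directory
      file:
        path: "{{ app_dir }}"
        state: directory
        mode: "0755"

    - name: Copy the jar that Jenkins built for this build number
      copy:
        src: "files/{{ app_name }}-{{ build_number }}.jar"
        dest: "{{ app_dir }}/{{ app_name }}.jar"
        mode: "0644"
      notify: Restart application

  handlers:
    # Runs only when the copy task reports that the jar changed
    - name: Restart application
      systemd:
        name: "{{ app_name }}"
        state: restarted
```

Run by hand, this would be something like ansible-playbook -i environments/prod playbook.yaml -e build_number=42, which mirrors what the ansiblePlaybook step does inside Jenkins.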

Quickly generate Ansible scripts and a Jenkinsfile for all projects

Above, we moved one project onto Jenkins and Ansible, but many more projects need the same treatment. Since this is manual grunt work that we will be doing often, I decided to use cookiecutter to generate the Jenkinsfile and Ansible scripts automatically. Creating a project looks like this:


To sum up, our small team implemented automated operations in roughly this order:

  1. Set up basic monitoring
  2. Set up GitLab
  3. Set up Jenkins and integrate it with GitLab
  4. Use Jenkins for automated compilation and packaging
  5. Use Jenkins to run Ansible

The above is only scaffolding. Building on this scaffolding, we can evolve toward the architectures of the big companies. For example:

  • CMDB: we use ansible-cmdb to generate an overview of all machines automatically from the inventory
  • Release management: each stage of a release can be customized in Jenkins. Blue-green and other release strategies can be implemented by modifying the Ansible scripts and the inventory.
  • Automatic scaling: this can be implemented by configuring Prometheus alerting rules that call the corresponding webhook
  • ChatOps: ChatOps in practice

That is my practice of automated operations so far. It is still a work in progress, and I look forward to exchanging ideas with you.

Author of this article: Zhai Zhijun
Copyright belongs to the author. When reprinting, please credit the author, the original article, the translator, and other source information.