Application and Practice of Apache Flink at Autohome

Flink Chinese community 2021-10-14 07:56:59
Abstract: This article is compiled from the talk "Application and Practice of Apache Flink at Autohome", shared at Flink Forward Asia 2020 by Di Xingxing, head of Autohome's real-time computing platform. The main contents include:


  1. Background and current status
  2. The AutoStream platform
  3. Real-time ecosystem built on Flink
  4. Follow-up plans



I. Background and Current Status


1. The first stage


Before 2019, most of Autohome's real-time workloads ran on Storm. As an early mainstream real-time computing engine, Storm attracted a large number of users with its simple Spout/Bolt programming model and the stability of the cluster itself. We built our Storm platform in 2016.



As demand for real-time computing kept growing and data volumes increased, Storm exposed several pain points in development and maintenance cost:


  • High development cost

    We have always used a Lambda architecture, correcting real-time data with T+1 offline data; the offline data is taken as the source of truth, so the real-time computation logic must be exactly consistent with the offline logic. The requirement document for real-time development is essentially the offline SQL, and the core job of real-time developers is to translate that offline SQL into Storm code. Although we encapsulated some common Bolts to simplify development, accurately translating hundreds of lines of offline SQL into code remained challenging, and every run required a tedious cycle of packaging, uploading, and restarting, making debugging very costly.

  • Low computing efficiency

    Storm has poor state support, so intermediate state is usually maintained in KV stores such as Redis or HBase, and we depended heavily on Redis. For example, in the common UV-counting scenario, the easiest approach is to use Redis's sadd command to check whether a uid already exists. But this approach incurs heavy network IO, and if traffic doubles because of an unannounced promotion or campaign, it can easily fill up Redis memory and catch the operations team off guard. Redis throughput also caps the throughput of the whole job.

  • Difficult to maintain and manage

    Because jobs were developed as Storm code, metadata and lineage were hard to analyze; readability was poor and the computation logic opaque, so the cost of handing a job over to another team was very high.

  • Not friendly to the data warehouse team

    The data warehouse team directly faces business requirements. They are familiar with Hive-style SQL development but usually not skilled at developing Storm jobs, so some requirements that were originally real-time had to fall back to T+1 delivery.

At this stage we supported only the most basic real-time computing needs. Because the development threshold was high, many real-time services had to be developed by the platform team ourselves, and we were badly torn between platform work and data development.

2. The second stage




We began researching the Flink engine in 2018; its relatively complete SQL support and native state support attracted us. After study and research, we started designing and developing a Flink SQL platform at the beginning of 2019 and launched the AutoStream 1.0 platform in mid-2019. When it went live, the platform was adopted by the data warehouse team, the monitoring team, and the operations team. Its quick adoption mainly came down to the following points:

  • Low development and maintenance cost: most of Autohome's real-time tasks can be implemented with Flink SQL + UDFs. The platform provides common sources and sinks as well as UDFs frequently used in business development, and users can also write their own UDFs. Development is done in a "SQL + configuration" style that covers most needs. For custom tasks, we provide an easy-to-use SDK that helps users quickly develop custom Flink jobs. Platform users are not only professional data developers; ordinary developers, testers, and operations staff can complete day-to-day real-time data development on the platform after some basic learning. Data assets become manageable: SQL statements are structured, so by parsing a job's SQL together with its source and sink DDL, we can easily derive the job's upstream and downstream, naturally preserving lineage.

  • High performance: Flink can do computation entirely on state (memory and disk). Compared with the previous approach of relying on external storage, performance improved greatly. During the load test for the 818 campaign, the rewritten jobs easily supported real-time computation at dozens of times the original traffic, and horizontal scaling performed very well.

  • Comprehensive monitoring and alerting: users host tasks on the platform, the platform is responsible for keeping tasks alive, and users can focus on the tasks' business logic. For SQL tasks, SQL is highly readable and easy to maintain; for custom tasks, building on our SDK lets users concentrate on business logic. In both cases we embed a large number of metrics and connect them to the alerting platform, so users can quickly discover, analyze, locate, and fix problems, improving stability.

  • Empowering the business: the platform supports the layered data warehouse model and provides good SQL support, so warehouse engineers can apply their offline warehouse-building experience to real-time warehouse construction using SQL. Since the platform went live, the data warehouse team has gradually started to take on real-time computing requirements itself.



Pain points:

  • Usability needed improvement. For example, users could not manage their own UDFs; they could only use the platform's built-in UDFs, or send a pre-built jar package to the platform administrator, who handled the upload manually.

  • With the rapid growth of jobs on the platform, on-call cost became very high. First, we frequently faced basic questions from new users:

    • how to use the platform;
    • problems encountered during development, for example why packaging fails;
    • how to use the Flink UI;
    • what the monitoring graphs mean and how to configure alerts.

    There were also questions that are not easy to answer quickly:

    • jar package conflicts;
    • why Kafka consumption is delayed;
    • why a task reported an error.

    Latency problems were especially tricky. For common causes such as data skew, GC, and backpressure, we could point users directly to the Flink UI and monitoring charts, but sometimes we still had to run jmap, jstack, and similar tools on the server by hand, and occasionally generate flame graphs to help users locate performance problems.

    At the beginning we had no operations team to help, so our developers handled these problems directly. Although we added a large amount of documentation along the way, the overall on-call cost remained high.

  • When Kafka or YARN failed, there was no quick recovery solution, which left us stretched thin for some mission-critical businesses. As everyone knows, no environment or component stays failure-free forever; when a major failure occurs, you need a way to restore the business quickly.

  • Resources were not reasonably controlled, and waste was serious. As more users developed tasks on the platform, the number of jobs kept growing. Some users could not control cluster resource usage well and often requested far more resources than needed, leading to low-efficiency or even idle jobs and wasted resources.

In the AutoStream 1.0 stage, SQL-based development greatly lowered the threshold for real-time development. Each business team could implement its own real-time services after a short ramp-up, which freed our platform team from a flood of business requirements and let us concentrate on the platform itself.

3. The current stage




To address the issues above, we made the following targeted upgrades:


  1. Introduced a Jar Service: users can upload UDF jar packages themselves and reference them directly in SQL, achieving self-service UDF management. Custom jobs can also be configured to use jars from the Jar Service. When multiple jobs share the same jar, users only need to configure the jar path from the Jar Service in the job, avoiding the tedium of re-uploading the jar on every release;

  2. Self-service diagnosis: we developed features such as dynamically adjusting log levels and self-service flame-graph viewing, making it convenient for users to locate problems themselves and reducing our day-to-day on-call cost;

  3. Job health checks: each Flink job is analyzed and scored along multiple dimensions, and each low-scoring item comes with corresponding suggestions;

  4. Fast disaster recovery at the Flink job level: we built two YARN environments, each with its own HDFS. Checkpoint data is replicated bidirectionally between the two HDFS clusters via snapshots, and the platform added a cluster-switch function. When one YARN cluster is unavailable, users can self-serve on the platform, selecting a checkpoint on the standby cluster to recover from;

  5. Kafka multi-cluster support: with our self-developed Kafka SDK, clients can switch Kafka clusters quickly;

  6. Integration with the budget system: the resources occupied by each job map directly to a budget team, which to some extent prevents resources from being occupied by other teams. The budget administrator of each team can view detailed budget usage and know which businesses the team's budget supports.

Users have now become familiar with the platform, and with self-service health checks and diagnosis online, the daily on-call load on the platform team is gradually decreasing. Platform construction has entered a virtuous cycle.

4. Application scenarios




The data Autohome uses for real-time computing falls into three main categories:


  1. Client logs, known internally as clickstream logs, including startup logs, duration logs, PV logs, click logs, and various event logs reported by the client. These are mainly user-behavior logs and are the foundation of the wide traffic tables in our real-time data warehouse, the UAS system, and real-time user profiles; they also support online services such as intelligent search and recommendation. The basic traffic data additionally supports per-business-line traffic analysis and real-time effect statistics, informing daily operational decisions.

  2. Server-side logs, including nginx logs, logs produced by various back-end applications, and logs from middleware. These are mainly used for health and performance monitoring of back-end services.

  3. Real-time change records of business databases, of three main types: MySQL binlog, SQL Server CDC, and TiDB TiCDC data. Based on these real-time change records, we abstract and standardize various content data and have built basic services such as the content center and resource pools. There are also real-time business statistics with simple logic whose results feed real-time dashboards, Compass, and other data displays.

All three categories are written into Kafka clusters in real time and computed in Flink clusters for different scenarios. The result data is written to engines such as Redis, MySQL, Elasticsearch, HBase, Kafka, and Kylin to support upper-layer applications.

Here are some application scenarios:

[Figure: application scenarios]


5. The cluster size


At present the Flink cluster has 400+ servers, deployed on YARN (80%) and Kubernetes, runs 800+ jobs, processes about one trillion records per day, and handles a peak of 20 million records per second.



II. The AutoStream Platform


1. The platform architecture


[Figure: AutoStream platform architecture]


The figure above shows the current overall architecture of the AutoStream platform, which mainly consists of the following:

  • AutoStream core system

    This is the core service of the platform. It integrates the metadata service, the Flink client service, the Jar management service, and the interactive result-query service, and exposes the platform's features to users through the front-end pages.

    It covers SQL and jar job management, library/table metadata management, UDF management, operation records and historical versions, health checks, self-service diagnosis, and alert management. It also offers integration points for external systems, which can manage library/table information and SQL job information and start or stop jobs through its APIs. Task life-cycle management and scheduling built on Akka provides efficient, simple, low-latency job operations, improving both efficiency and usability.

  • Metadata service (catalog-like unified metastore)

    This is essentially a back-end implementation of the Flink Catalog. Besides basic library and table management, it supports permission control at library/table granularity and, matching our organizational structure, authorization at the user-group level.

    At the bottom we provide a Plugin Catalog mechanism that can both integrate Flink's existing Catalog implementations and embed our own custom catalogs. Through the plugin mechanism we can easily reuse HiveCatalog, JdbcCatalog, and others, keeping library/table life cycles consistent.

    The metadata service also parses the DML statements users submit, identifying the tables a job depends on for job analysis and submission, and recording lineage along the way.
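As a hedged illustration of what reusing an existing catalog buys (this is standard Flink SQL, not the platform's own API; all connection values are placeholders), registering Flink's built-in JdbcCatalog looks roughly like:

```sql
-- Register Flink's built-in JDBC catalog; connection values are placeholders
CREATE CATALOG my_jdbc WITH (
    'type' = 'jdbc',
    'default-database' = 'rt_dw',
    'username' = 'flink',
    'password' = 'secret',
    'base-url' = 'jdbc:postgresql://db-host:5432'
);

USE CATALOG my_jdbc;
-- Tables in rt_dw are now usable without per-job DDL
SHOW TABLES;
```

With a catalog in place, jobs can reference existing tables directly instead of repeating CREATE TABLE statements in every job.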

  • Jar Service

    The SDKs for the platform's various services are managed centrally in the Jar Service. Users' custom job jars and UDF jars are also submitted to and managed by the Jar Service, then referenced in jobs through configuration or DDL.

  • Flink client service (customized Flink job client)

    This service converts platform jobs into Flink jobs and submits them to YARN or Kubernetes. At this layer we abstract over YARN and Kubernetes, unifying the behavior of the two scheduling frameworks and exposing unified interfaces and standardized parameters. Weakening the differences between YARN and Kubernetes lays a good foundation for switching Flink jobs seamlessly between the two frameworks.

    Each job has different dependencies. Besides managing the basic dependencies, we also need to support personalized ones, such as different SQL SDK versions and user-uploaded jars and UDFs, so the submission stages of different jobs must be isolated from one another.

    We use the Jar Service plus process isolation: through integration with the Jar Service, the client picks the appropriate jars according to the job's type and configuration and submits the job in a separate process, achieving physical isolation.

  • Result cache service

    A simple cache service used for online debugging during SQL job development. When we parse the user's SQL, the result set of each SELECT statement is stored in the cache service; the user can then pick a SQL statement's sequence number (each complete SELECT statement maps to one sequence number) and view the corresponding result data in real time, which helps with development and problem analysis.

  • Built-in connectors (source & sink)

    The rightmost part consists of the various source and sink implementations. Some reuse connectors provided by Flink; others we developed ourselves.

    For each connector we added the necessary metrics and configured a dedicated monitoring dashboard, making it easy for users to understand how a job is running and providing data for troubleshooting.

2. SQL-based development process


With the platform features above, users can quickly implement a SQL job:


  1. Create a SQL task;

  2. Write DDL statements to declare the source and sink;

  3. Write the DML that implements the main business logic;

  4. Check the results online; if the data meets expectations, add an INSERT INTO statement to write the results to the specified sink.



By default the platform saves a record of every SQL change, and users can view historical versions online. We also record all operations performed on a job, which helps users trace the change history and locate problems during the maintenance phase.

Here is a demo that counts the day's PV and UV:

[Figure: SQL for the PV/UV demo]
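Since the original screenshot is not reproduced here, a minimal sketch of such a PV/UV job in Flink SQL follows; the table names, fields, and connector option values are all hypothetical, not the platform's actual job:

```sql
-- Hypothetical source table: clickstream events from Kafka
CREATE TABLE click_log (
    uid        STRING,
    event_time TIMESTAMP(3)
) WITH (
    'connector' = 'kafka',
    'topic' = 'click_log',
    'properties.bootstrap.servers' = 'kafka-host:9092',
    'properties.group.id' = 'pvuv_demo',
    'scan.startup.mode' = 'latest-offset',
    'format' = 'json'
);

-- Hypothetical sink table: daily PV/UV upserted into MySQL
CREATE TABLE pvuv_sink (
    dt STRING,
    pv BIGINT,
    uv BIGINT,
    PRIMARY KEY (dt) NOT ENFORCED
) WITH (
    'connector' = 'jdbc',
    'url' = 'jdbc:mysql://db-host:3306/rt_dw',
    'table-name' = 'pvuv'
);

-- PV = total events, UV = distinct users, grouped by day
INSERT INTO pvuv_sink
SELECT DATE_FORMAT(event_time, 'yyyy-MM-dd') AS dt,
       COUNT(1)            AS pv,
       COUNT(DISTINCT uid) AS uv
FROM click_log
GROUP BY DATE_FORMAT(event_time, 'yyyy-MM-dd');
```

Because the aggregation continuously updates, the JDBC sink needs the primary key so results are upserted rather than appended.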


3. Catalog-based metadata management




The main aspects of metadata management:


  • Permission control: besides basic library and table management, it supports table-granularity permission control and, matching our organizational structure, user-group-level authorization;

  • Plugin Catalog mechanism: multiple other Catalog implementations can be combined, reusing existing catalogs;

  • Unified library/table life cycle: users can choose to unify a table's life cycle on the platform with that of the underlying storage, avoiding separate maintenance on both sides and duplicate table creation;

  • Full compatibility between old and new versions: in the AutoStream 1.0 era we had not yet introduced the Metastore service, and the 1.0 DDL SQL parsing module was a home-grown component, so when building the Metastore service we had to stay compatible with historical jobs and historical library/table information.

    • For library/table information, the new Metastore converts both the new-version and old-version formats into a unified storage format at the bottom layer, ensuring compatibility of library/table information.
    • For jobs, we use an abstract interface with two implementation paths, V1Service and V2Service, ensuring compatibility of old and new jobs at the user level.

The following diagram shows how the various modules interact with the Metastore:

[Figure: interaction between modules and the Metastore]


4. UDXF management


We introduced the Jar Service to manage all kinds of jars, including users' custom jobs, the platform's internal SDK components, and UDXFs. On top of the Jar Service we easily implemented self-service UDXF management. In the on-Kubernetes scenario we provide a unified image; on startup the Pod downloads the corresponding jars from the Jar Service into the container to support the job.

If a user-submitted SQL statement contains a function DDL, the Job Client Service parses the DDL and downloads the corresponding jar locally.

To avoid dependency conflicts with other jobs, we start a child process for each job submission. The UDXF jar is added to the classpath, and we made some changes to Flink so that the jar is uploaded to HDFS along with the job; meanwhile the AutoSQL SDK registers the UDF for the current job according to the function name and class name.
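For illustration, the function DDL inside a job might look like the following; the function and class names are hypothetical, and on the platform the jar itself is resolved from the Jar Service rather than referenced inline:

```sql
-- Register a scalar UDF whose class ships in a Jar Service jar
-- (function and class names are hypothetical)
CREATE FUNCTION json_extract AS 'com.autohome.udf.JsonExtract';

-- The UDF is then usable like any built-in function
SELECT json_extract(payload, '$.user_id') AS user_id
FROM source_table;
```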



5. Monitoring, alerting, and log collection


Thanks to Flink's well-designed metric mechanism, we can conveniently add metrics. For connectors we embedded rich metrics and configured default monitoring dashboards, where users can view charts for CPU, memory, JVM, network transfer, checkpoints, and the various connectors. The platform is also connected to the company's cloud monitoring system and automatically generates default alert policies covering liveness, consumption lag, and other key indicators. Users can modify the default alert policies in the cloud monitoring system and add new alert items for personalized monitoring and alerting.

Logs are written to an Elasticsearch cluster via the Filebeat component, and Kibana is opened up for users to query them.



The overall monitoring, alerting, and log collection architecture is as follows:

[Figure: monitoring, alerting, and log collection architecture]


6. Health check mechanism


With the rapid growth of the number of jobs, unreasonable resource usage became common, such as the resource waste mentioned earlier. Users mostly work on new requirements and new business; they rarely go back to evaluate whether a job's resource allocation is reasonable and optimize its usage. So the platform planned a cost-evaluation model, which became the current health check mechanism: every day the platform scores each job along multiple dimensions, and users can view a job's score and its score curve over the last 30 days at any time.

Owners of low-scoring jobs are prompted when they log in to the platform, and we periodically send email reminders to optimize and rectify them. After optimizing a job, the user can actively trigger re-scoring to see the effect.



We introduced a multi-dimensional, weight-based scoring strategy. Indicators across multiple dimensions, including CPU and memory usage, idle slots, GC behavior, Kafka consumption lag, and records processed per second per core, are analyzed and evaluated in combination with the job's computation topology, and finally combined into a composite score.

Each low-scoring item shows the reason for the low score and a reference range, along with guidance to help users optimize.

We also added a new metric that reflects TaskManager CPU utilization as a number from 0% to 100%, so users can intuitively judge whether CPU is being wasted.



The general scoring flow is as follows. First we collect the job's basic information and metrics; then we apply the rules we have set, obtaining base scores and base suggestions; finally we aggregate the scores and suggestions into a comprehensive evaluation, producing the overall score and the final report, which users can view on the platform. For low-scoring jobs, we send an alert to the job's owner.

[Figure: job scoring process]


7. Self-service diagnosis


As mentioned in the pain points, users could only turn to the platform team to locate online problems, which created a lot of on-call work for us and a poor experience for them. So we launched the following features:


  • Dynamically changing log levels: borrowing Storm's approach to modifying log levels, we implemented a similar feature in Flink by extending the REST API and RPC interfaces. It supports changing a specified Logger to a given log level with an expiration time; when the time expires, the Logger's level is restored to INFO;

  • Self-service viewing of thread stacks and heap memory: the Flink UI already supports viewing thread stacks online (jstack), and we reuse that interface directly; we additionally added an interface for viewing heap memory (jmap) so users can inspect it online;

  • Online generation and viewing of flame graphs: flame graphs are a powerful tool for locating performance problems. Using Alibaba's arthas component, we added the ability to view flame graphs online in Flink, so that when users hit performance problems they can quickly assess the bottleneck.



8. Fast disaster recovery based on checkpoint replication




When real-time computing serves important business scenarios, a failure of a single YARN cluster that cannot be recovered quickly may severely affect the business.

Against this background, we built a multi-YARN-cluster architecture: two independent YARN clusters, each with its own independent HDFS environment, with checkpoint data periodically replicated between the two HDFS clusters in both directions. The checkpoint replication delay is currently stable within 20 minutes.

At the platform level, we expose the cluster-switch function directly to users. Users can view the checkpoint replication status online, choose a suitable checkpoint (or choose not to recover from a checkpoint at all), switch clusters, and restart the job, achieving a relatively smooth migration of jobs between clusters.

III. Real-Time Ecosystem Built on Flink


The core scenario of the AutoStream platform is supporting real-time computing developers, making real-time development simple, efficient, monitorable, and easy to operate. As the platform matured, we began to explore how to reuse the AutoStream platform and apply Flink in more scenarios. Reusing AutoStream has several advantages:

  • Flink is an excellent distributed computing framework with high computing performance, good fault tolerance, a mature state-management mechanism, and a thriving community, so its functionality and stability are assured;

  • AutoStream has a complete monitoring and alerting mechanism; jobs running on the platform need no separate monitoring integration, and Flink's friendly metric support makes it easy to add new metrics;

  • Substantial accumulated expertise and operating experience: through more than two years of platform construction, we have implemented fairly complete Flink job life-cycle management on AutoStream and built basic components such as the Jar Service, so by wrapping simple upper-layer interfaces, other systems can be connected and gain real-time computing capability;

  • Support for both YARN and Kubernetes deployment.



Based on the above, when building other systems we prefer to reuse the AutoStream platform and integrate through API calls, delegating the entire Flink job life cycle to AutoStream so that each system can focus on its own business logic.

The AutoDTS (ingestion and distribution tasks) and AutoKafka (Kafka cluster replication) systems in our team are now built on AutoStream. Taking AutoDTS as an example, the integration works as follows:


  1. Make the tasks Flink-based: all of AutoDTS's ingestion and distribution tasks exist as Flink jobs;

  2. Integrate with the AutoStream platform, calling its APIs to create, modify, start, and stop Flink jobs. A job here can be either a jar job or a SQL job;

  3. AutoDTS builds its own front-end pages and form data for its business scenarios. When a form is submitted, the form data is stored in MySQL; at the same time, the job information, jar package path, and other details are assembled into the format defined by AutoStream's API, and an API call automatically creates a Flink task on the AutoStream platform, whose task ID is saved;

  4. To start an AutoDTS ingestion task, AutoStream's API is called directly to start the job.

1. AutoDTS: data ingestion and distribution platform


The AutoDTS system mainly provides two functions:


  1. Data ingestion: writing database change data (change logs) into Kafka in real time;

  2. Data distribution: writing the data ingested into Kafka to other storage engines in real time.

■ 1.1 AutoDTS data ingestion


The following is the architecture diagram of data ingestion:

[Figure: data ingestion architecture]


We maintain a Flink-based data ingestion SDK and define a unified JSON data format. In other words, once MySQL binlog, SQL Server, and TiDB change data is ingested into Kafka, the data format is consistent, so downstream businesses develop against the unified format without caring about the type of the original business database.

When data is ingested into a Kafka topic, the topic is automatically registered as a stream table on the AutoStream platform for users' convenience.

Building data ingestion on Flink has an additional benefit: Flink's exactly-once semantics make it possible to achieve exactly-once data ingestion at low cost, which is a necessary condition for businesses with high data-accuracy requirements.

Currently we are working on ingesting the full data of business tables into Kafka topics. Based on Kafka's compact mode, a topic can hold both the stock data and the incremental data. This is very friendly for data distribution scenarios: today, synchronizing a table to another storage engine in real time requires first ingesting a full snapshot via the scheduling system and then starting the real-time distribution task for the change data. With compact topics, the full-ingestion step can be omitted. Flink 1.12 added support for compact topics by introducing the upsert-kafka connector [1].

[1] https://cwiki.apache.org/confluence/display/Flink/FLIP-149%3A+Introduce+the+upsert-kafka+Connector
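The upsert-kafka connector reads such a compact topic as a changelog keyed by primary key. A hedged sketch (topic, field names, and addresses are hypothetical):

```sql
-- A compact topic holding the full state of a business table,
-- read as a changelog via upsert-kafka (all values hypothetical)
CREATE TABLE user_full (
    user_id   BIGINT,
    user_name STRING,
    PRIMARY KEY (user_id) NOT ENFORCED
) WITH (
    'connector' = 'upsert-kafka',
    'topic' = 'dts.user_full',
    'properties.bootstrap.servers' = 'kafka-host:9092',
    'key.format' = 'json',
    'value.format' = 'json'
);
```

A downstream job reading this table sees the stock data first and then the incremental changes, with records of the same key compacted to the latest value.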


Here is a sample record:

[Figure: sample record]


The stream tables registered on the platform are schemaless by default; users can extract field values with JSON-related UDFs.



Here is an example of using a stream table:

[Figure: stream table usage example]
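As the screenshot is not reproduced here, a hedged sketch of querying such a schemaless stream table follows; the table, column, and UDF names are all hypothetical:

```sql
-- The ingested topic is registered as a stream table with a single raw
-- JSON column; fields are extracted with a JSON UDF (names hypothetical)
SELECT get_json_object(`value`, '$.db')           AS db_name,
       get_json_object(`value`, '$.table')        AS table_name,
       get_json_object(`value`, '$.data.user_id') AS user_id
FROM stream_db.user_binlog;
```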


■ 1.2 AutoDTS data distribution




As described above, data ingested into Kafka can be used as a stream table, and a data distribution task essentially writes the stream table's data to another storage engine. Since the AutoStream platform already supports many table sinks (connectors), we only need to fill in the type, address, and other details of the downstream storage to assemble the SQL for a distribution task.

By directly reusing connectors, we avoid repeated development work as much as possible.

The following is a SQL example corresponding to a distribution task:

 picture
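The original SQL is only available as a screenshot. A hedged sketch of a distribution task assembled by the platform might look like this (all table names and connector options are hypothetical):

```sql
-- Hypothetical sink table, assembled from the downstream storage type and
-- address filled in by the user.
CREATE TABLE mysql_user_sink (
  id BIGINT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'jdbc',
  'url' = 'jdbc:mysql://mysql-host:3306/db',
  'table-name' = 'user'
);

-- The distribution task itself: continuously write the flow table to the sink.
INSERT INTO mysql_user_sink
SELECT id, name FROM flow_user;
```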


2. Kafka multi-cluster architecture


In Kafka applications, some scenarios call for a Kafka multi-cluster architecture. Here are some common ones:


  • Data redundancy and disaster recovery: replicate data in real time to a standby cluster; when one Kafka cluster becomes unavailable, applications can switch to the standby cluster and recover the business quickly;

  • Cluster migration: when a data-center contract expires, or when moving to the cloud, the cluster needs to be migrated. In this case the cluster's data must be copied to the new data center as a whole, so that the business can migrate relatively smoothly;

  • Read/write separation: Kafka usage is usually read-heavy and write-light; to guarantee the stability of data writes, a read/write-separated Kafka cluster can be built.

We have now built a Kafka multi-cluster architecture. The Flink-related work consists of two main parts:


  1. The Kafka inter-cluster data replication program runs in the Flink cluster;

  2. A modified Flink Kafka connector that supports fast switching between Kafka clusters.

■ 2.1 The overall architecture


 picture


Let's first look at data replication between Kafka clusters, which is the foundation of the multi-cluster architecture. We use MirrorMaker2 for data replication, and we adapted MirrorMaker2 into an ordinary Flink job running in the Flink cluster.

We introduced the Route Service and the Kafka SDK to let clients switch between Kafka clusters quickly.

Clients need to depend on our own Kafka SDK and no longer specify the bootstrap.servers parameter in the configuration; instead they declare the cluster to access by setting the cluster.code parameter. Based on cluster.code, the SDK calls the Route Service to obtain the cluster's real address, then creates the Producer/Consumer and starts producing/consuming data.

The SDK listens for changes to routing rules. When a cluster switch is needed, only the routing rule in the Route Service backend has to be changed; when the SDK detects that the routed cluster has changed, it restarts the Producer/Consumer instances and connects to the new cluster.
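The routing flow described above can be sketched in Java as follows; `RouteClient` and its method names are illustrative stand-ins, not the actual SDK API:

```java
import java.util.Map;
import java.util.function.Function;

// Illustrative sketch only, not the actual SDK API: the client declares a
// logical cluster.code, and the Route Service resolves it to the cluster's
// real address.
class RouteClient {
    // cluster.code -> bootstrap.servers, backed by the Route Service
    private final Function<String, String> routeService;

    RouteClient(Function<String, String> routeService) {
        this.routeService = routeService;
    }

    // Properties that would be handed to a kafka-clients Producer/Consumer;
    // note that bootstrap.servers is resolved by routing, never set by the user.
    Map<String, String> clientProps(String clusterCode) {
        return Map.of("bootstrap.servers", routeService.apply(clusterCode));
    }
}
```

When a routing rule changes, the SDK would resolve the address again, close the old Producer/Consumer, and recreate it with the newly resolved properties.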

If a consumer switches clusters, then because the offsets of a topic differ between Cluster1 and Cluster2, the Offset Mapping Service is used to obtain the current consumer group's offsets in Cluster2, and consumption resumes from those offsets, achieving a relatively smooth cluster switch.

■ 2.2 Data replication between Kafka clusters


We use MirrorMaker2, introduced in Kafka 2.4, for data replication between clusters. Its main features:


  • Automatically recognizes new topics and partitions;

  • Automatically syncs topic configuration: topic configs are synced to the target cluster;

  • Automatically syncs ACLs;

  • Provides an offset conversion tool: given the source cluster, target cluster, and group information, it returns the group's corresponding offsets in the target cluster;

  • Supports extensible black/white-list policies: flexible customization, taking effect dynamically.

clusters = primary, backup
primary.bootstrap.servers = vip1:9091
backup.bootstrap.servers = vip2:9092
primary->backup.enabled = true
backup->primary.enabled = true

This configuration sets up bidirectional data replication between the primary and backup clusters. Data in topic1 of the primary cluster is replicated to the topic primary.topic1 in the backup cluster; the naming rule for topics in the target cluster is sourceCluster.sourceTopicName, and the naming policy can be customized by implementing the ReplicationPolicy interface.
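The naming rule can be sketched as plain functions; `TopicNamingPolicy` below is a simplified stand-in for Kafka's actual ReplicationPolicy interface, not its real signature:

```java
// Sketch of MirrorMaker2's topic-naming rule. TopicNamingPolicy is a
// simplified stand-in for Kafka's real ReplicationPolicy interface.
interface TopicNamingPolicy {
    String formatRemoteTopic(String sourceCluster, String topic);
}

// Default behavior: the target topic is named "<sourceCluster>.<sourceTopicName>".
class DefaultPolicy implements TopicNamingPolicy {
    public String formatRemoteTopic(String sourceCluster, String topic) {
        return sourceCluster + "." + topic;
    }
}

// Hypothetical custom policy: keep the original topic name, e.g. when the
// target cluster should mirror names one-to-one during a migration.
class IdentityPolicy implements TopicNamingPolicy {
    public String formatRemoteTopic(String sourceCluster, String topic) {
        return topic;
    }
}
```

With the default policy, topic1 from the primary cluster becomes primary.topic1 in the backup cluster, matching the configuration above.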

 picture


■ 2.3 Topics used by MirrorMaker2


  • Topics in the source cluster

    heartbeats: stores heartbeat data;

    mm2-offset-syncs.targetCluster.internal: stores the mapping between source-cluster offsets (upstreamOffset) and target-cluster offsets (downstreamOffset).

  • Topics in the target cluster

    mm2-configs.sourceCluster.internal: used by the Connect framework to store configuration;

    mm2-offsets.sourceCluster.internal: used by the Connect framework to store the offsets that WorkerSourceTask is processing; in the MM2 scenario this records which offset of each source-cluster topic partition has been synchronized so far, which is similar to Flink's checkpoint concept;

    mm2-status.sourceCluster.internal: used by the Connect framework to store connector status.

The three topics above use the KafkaBasedLog utility class from the Connect runtime module, which can read and write a compact-mode topic; here MirrorMaker2 uses the topic as a KV store.

sourceCluster.checkpoints.internal: records the offsets of sourceCluster consumer groups in the current cluster. MM2 periodically reads the committed offsets of consumer groups from the source Kafka cluster and writes them to the sourceCluster.checkpoints.internal topic in the target cluster.

 picture


■ 2.4 Deployment of MirrorMaker2


Here is the process of running a MirrorMaker2 job. When a data replication job is created on the AutoKafka platform, it calls the AutoStream platform interface to create a corresponding MM2-type job. When the job is started, the AutoStream platform interface is called again to submit the MM2 job to run in the Flink cluster.

 picture


■ 2.5 Route Service


The Route Service handles routing requests from clients, matches the appropriate routing rule based on client information, and returns the final routing result, i.e. the cluster information, to the client.

Routing rules can be flexibly configured based on cluster name, topic, group, ClientID, and client-defined parameters.

The following example routes the consumer whose Flink job ID is 1234 to the cluster_a1 cluster.

 picture


■ 2.6 Kafka SDK


The native kafka-clients cannot communicate with the Route Service, so clients need to depend on the Kafka SDK we provide (an SDK developed in-house by Autohome) to talk to the Route Service and achieve dynamic routing.

The Kafka SDK implements the Producer and Consumer interfaces and is essentially a proxy for kafka-clients, so a business can introduce the Kafka SDK with few changes.

Once the business depends on the Kafka SDK, the SDK communicates with the Route Service and listens for route changes. When it finds that the routed cluster has changed, it closes the current Producer/Consumer, creates a new one, and connects to the new cluster.

In addition, the Kafka SDK reports Producer and Consumer metrics uniformly to the Prometheus monitoring system; through the platform's preconfigured dashboards, the production and consumption status of a business can be seen clearly.

The SDK also collects information such as the application name, IP and port, and process ID. This information can be found on the AutoKafka platform, making it easier for us and users to locate problems together.

 picture


■ 2.7 Offset Mapping Service


When a consumer's route changes and it switches clusters, the situation is a bit more complicated. MirrorMaker2 first consumes data from the source cluster and then writes it to the target cluster; the same record is written to the same partition of the target topic, but its offset differs from that in the source cluster.

To handle this offset inconsistency, MirrorMaker2 consumes the __consumer_offsets data of the source cluster, attaches the corresponding offsets in the target cluster, and writes them to the sourceCluster.checkpoints.internal topic of the target cluster.

Meanwhile, the source cluster's mm2-offset-syncs.targetCluster.internal topic records the offset mapping between the source and target clusters. Combining these two topics, we built the Offset Mapping Service to perform the offset translation for the target cluster.

So when a consumer needs to switch clusters, it calls the Offset Mapping Service to obtain its offsets in the target cluster, then actively seeks to those positions and starts consuming, achieving a relatively smooth cluster switch.
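The translation logic can be sketched as follows, assuming the (upstreamOffset, downstreamOffset) pairs from the sync topics have been loaded into a sorted map per topic partition; `OffsetMapper` is an illustrative helper, not the actual service API:

```java
import java.util.Map;
import java.util.TreeMap;

// Illustrative helper, not the actual service API: translate a consumer
// group's committed offset on the source cluster into a safe starting offset
// on the target cluster, using the (upstreamOffset, downstreamOffset) pairs
// MirrorMaker2 records for one topic partition.
class OffsetMapper {
    // upstreamOffset -> downstreamOffset sync points, sorted by upstream offset
    private final TreeMap<Long, Long> syncs = new TreeMap<>();

    void addSync(long upstreamOffset, long downstreamOffset) {
        syncs.put(upstreamOffset, downstreamOffset);
    }

    long toTargetOffset(long committedUpstreamOffset) {
        Map.Entry<Long, Long> e = syncs.floorEntry(committedUpstreamOffset);
        // No sync point at or below the committed offset yet: start from earliest.
        if (e == null) return 0L;
        // Rewind to the latest known sync point at or below the committed
        // offset: at most the records since that point are re-read, none skipped.
        return e.getValue();
    }
}
```

The conservative rewind here trades a small amount of re-consumption for never skipping records, which fits the smooth-switch goal described above.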

 picture


■ 2.8 Integrating Flink with the Kafka multi-cluster architecture


Since the Kafka SDK is compatible with the usage of kafka-clients, users only need to change the dependency and then set parameters such as cluster.code and Flink.id.

When the Producer/Consumer switches clusters, a new Producer/Consumer instance is created, but its Kafka metrics are not re-registered, so metric data cannot be reported normally. We added an unregister method to the AbstractMetricGroup class, and when a switch event is observed, we re-register the Kafka metrics.

With this, Flink's support for the Kafka multi-cluster architecture is complete.

 picture


Four 、 Follow-up planning


 picture


  1. At present, most of the data statistics scenarios we support are based on traffic data or user behavior data, and these scenarios do not require exactly-once semantics. With the community's gradually improving support for Change Log, and given that our data ingestion system supports exactly-once semantics and we are building full ingestion of business tables into Kafka, we can later implement exactly-once data statistics and support statistics requirements for transactions, leads, and finance.

  2. Some companies have proposed the concept of an integrated lakehouse. Data lake technology can indeed solve some pain points of the original data warehouse architecture, such as data not supporting update operations and the inability to run quasi-real-time queries. We are currently making some attempts at integrating Flink with Iceberg and Hudi, and will later find scenarios within the company and put them into production.


