A note up front:

This document is a translation of Terence Parr's book "The Definitive ANTLR 4 Reference", offered as a tribute to the author. Reprints are welcome; please cite the original address and respect the work that went into it. The translation contains some mistakes, and corrections are welcome.

Welcome to the world of ANTLR!

ANTLR v4 is a powerful parser generator that you can use to read, process, execute, or translate structured text or binary files. It is widely used in academia and industry to build all sorts of languages, tools, and frameworks. Twitter search uses ANTLR for query parsing, handling more than 2 billion queries a day. The languages for Hive and Pig, the data warehouse and analysis systems for Hadoop, both use ANTLR. Lex Machina uses ANTLR to extract information from legal texts. Oracle uses ANTLR in the SQL Developer IDE and its migration tools. The NetBeans IDE parses C++ with ANTLR. The HQL language in the Hibernate object-relational mapping framework is built with ANTLR.

Aside from these big-name, high-profile projects, you can build all sorts of useful tools, such as configuration file readers, legacy code converters, wiki markup renderers, and JSON parsers. I have built little tools for object-relational database mappings, for describing 3D visualizations, and for injecting profiling code into Java source code, and I have even done a simple DNA pattern matching example.

From a formal language description called a grammar, ANTLR generates a parser for that language that can automatically build parse trees (data structures that represent how a grammar matches the input).

ANTLR also automatically generates tree walkers, which you can use to visit the nodes of those trees and execute application-specific code.

This book is both a reference for ANTLR v4 and a guide to using it to solve language recognition problems. You will learn how to do the following:

• Identify grammatical patterns in language samples and reference manuals, and use them to build your own grammars.

• Build grammars step by step, from simple languages (such as JSON) all the way up to complex programming languages (such as R). You will also solve some tricky recognition problems from Python and XML.

• Implement language applications based on those grammars by walking the automatically generated parse trees.

• Customize recognition error handling and error reporting for specific application domains.

• Take absolute control over parsing by embedding Java actions in a grammar.

Unlike a theory-heavy textbook, this book is driven by examples. That keeps the discussion concrete and gives you a set of starter kits for building your own language applications.

Who This Book Is For

This book is aimed at any programmer interested in learning how to build data readers, language interpreters, and translators.

The book is specifically about building these things with ANTLR, of course, but you will also learn a lot about lexers and parsers in general.

Beginners and experts alike will need this book to use ANTLR v4 effectively. Before tackling the advanced topics in Part III, work through the earlier chapters to get some experience with ANTLR. Readers should also know some Java.

What's So Cool About ANTLR v4?

The v4 release of ANTLR has some important new capabilities that flatten the learning curve and make developing grammars and language applications much easier. The most important one is that ANTLR v4 accepts every grammar you give it (the one exception being indirect left recursion). ANTLR reports no grammar conflict or ambiguity warnings when it translates your grammar into executable, human-readable parsing code.

If you give an ANTLR-generated parser valid input, the parser will recognize it correctly, no matter how complicated the grammar is. Of course, it is up to you to make sure that the grammar accurately describes the language in question.

ANTLR parsers use a new parsing technology that Sam Harwell and I developed, called Adaptive LL(*), or ALL(*).

ALL(*) is an extension of v3's LL(*). It analyzes the grammar dynamically at runtime, rather than statically before the generated parser runs. Because ALL(*) parsers have access to the actual input sequences, they can always figure out how to recognize those sequences by weaving through the grammar appropriately.

Static analysis, by contrast, has to consider all possible (infinitely long) input sequences.

In practice, ALL(*) means you do not have to contort your grammars to fit the underlying parsing strategy, as you must with most other parser generator tools, including ANTLR v3. If you have ever struggled with an ambiguity warning in ANTLR v3 or a reduce/reduce conflict in yacc, ANTLR v4 is exactly what you want.

Side note: yacc (Yet Another Compiler Compiler) is a Unix/Linux tool for generating compilers (a compiler code generator). What yacc generates is mainly a parser written in C, which has to be used together with the lexer generator Lex; the C programs produced by the two tools are then compiled together. yacc originally ran only on Unix systems, but it has since been widely ported to Windows and other platforms.

The next new feature is that ANTLR v4 dramatically simplifies the grammar rules used to match syntactic structures such as programming language arithmetic expressions. Expressions have always been a hassle to specify with ANTLR grammars (and to recognize by hand with recursive-descent parsers). The most natural grammar for recognizing expressions is invalid for traditional top-down parser generators such as ANTLR v3.

Now, with v4, you can match expressions using rules like this:

expr : expr '*' expr   // match subexpressions joined with '*' operator
     | expr '+' expr   // match subexpressions joined with '+' operator
     | INT             // matches simple integer atom
     ;

Self-referential rules like expr are recursive and, in particular, left recursive, because at least one alternative begins with a reference to the rule itself.

ANTLR v4 automatically rewrites left-recursive rules such as expr into non-left-recursive equivalents.

The only constraint is that the left recursion must be direct: the rule must refer to itself immediately. A rule cannot begin an alternative with a reference to another rule that eventually leads back to the original rule.
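To make the idea of a non-left-recursive equivalent concrete, here is a minimal hand-written Java sketch. It is not the code ANTLR generates; it only illustrates one common way to remove direct left recursion, where the leading expr becomes a loop after the INT atom and a minimum-precedence argument makes '*' bind tighter than '+'. The class name LeftRecursionSketch, the eval helper, and the precedence numbers are assumptions made for this illustration, and the sketch evaluates the expression directly instead of building a parse tree.

```java
// Hand-written sketch (not ANTLR output) of a non-left-recursive equivalent of
//   expr : expr '*' expr | expr '+' expr | INT ;
class LeftRecursionSketch {
    private final String input;
    private int pos = 0;

    LeftRecursionSketch(String input) {
        this.input = input.replace(" ", ""); // ignore spaces for simplicity
    }

    // expr[minPrec] : INT ( op expr[prec(op) + 1] )*   while prec(op) >= minPrec
    int expr(int minPrec) {
        int left = integer();
        while (pos < input.length()) {
            char op = input.charAt(pos);
            int prec = precedence(op);
            if (prec < minPrec) break;   // operator binds too loosely here, or is not an operator
            pos++;                       // consume the operator
            int right = expr(prec + 1);  // prec + 1 makes the operator left associative
            left = (op == '*') ? left * right : left + right;
        }
        return left;
    }

    // INT : one or more digits
    private int integer() {
        int start = pos;
        while (pos < input.length() && Character.isDigit(input.charAt(pos))) pos++;
        if (start == pos) {
            throw new IllegalArgumentException("expected INT at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Assumed precedence levels for this sketch: '*' above '+'.
    private static int precedence(char op) {
        switch (op) {
            case '*': return 2;
            case '+': return 1;
            default:  return -1;
        }
    }

    static int eval(String text) {
        LeftRecursionSketch parser = new LeftRecursionSketch(text);
        int value = parser.expr(0);
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("unexpected input at position " + parser.pos);
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(eval("1+2*3")); // prints 7
        System.out.println(eval("2*3+4")); // prints 10
    }
}
```

The recursive call always happens after a token has been consumed, so the method can never call itself forever on the same input position, which is exactly the problem that left recursion causes in a top-down parser.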

In addition to these two grammar-related improvements, ANTLR v4 makes it much easier to build language applications.

ANTLR-generated parsers automatically build a convenient representation of the input called a parse tree. An application can walk the tree and trigger code snippets when it encounters the constructs it is interested in. Previously, v3 users had to augment the grammar with tree construction operations.

In addition to building trees automatically, ANTLR v4 also automatically generates parse-tree walkers in the form of listener and visitor pattern implementations. Listeners are similar to the XML document handler objects that respond to SAX events triggered by an XML parser.
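The listener mechanism can be pictured with a small, self-contained Java sketch. Everything in it (ListenerSketch, Node, Listener, walk, trace) is a hypothetical name invented for illustration; ANTLR's real listener interfaces are generated per grammar and are not shown here. The point is only the division of labor: the walker owns the traversal, and the listener reacts to enter and exit events, much like SAX start and end callbacks.

```java
// Hypothetical miniature of the listener mechanism; Node, Listener, and walk
// are illustrative names, not ANTLR's generated classes or runtime API.
class ListenerSketch {
    // A tiny parse-tree node: a rule or token name plus ordered children.
    static final class Node {
        final String name;
        final Node[] children;

        Node(String name, Node... children) {
            this.name = name;
            this.children = children;
        }
    }

    // Callbacks fired by the walker, similar in spirit to SAX start/end events.
    interface Listener {
        void enter(Node node);
        void exit(Node node);
    }

    // Depth-first walk: fire enter, visit the children in order, then fire exit.
    static void walk(Node node, Listener listener) {
        listener.enter(node);
        for (Node child : node.children) {
            walk(child, listener);
        }
        listener.exit(node);
    }

    // Example application code: records the sequence of events as a string.
    static String trace(Node tree) {
        StringBuilder out = new StringBuilder();
        walk(tree, new Listener() {
            @Override public void enter(Node node) {
                out.append('<').append(node.name).append('>');
            }
            @Override public void exit(Node node) {
                out.append("</").append(node.name).append('>');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        Node tree = new Node("expr", new Node("INT"), new Node("+"), new Node("INT"));
        System.out.println(trace(tree)); // prints <expr><INT></INT><+></+><INT></INT></expr>
    }
}
```

Because the application logic lives entirely in the Listener implementation, the same tree and walker can serve several applications by swapping in a different listener.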

ANTLR v4 is much easier to learn, both because of these new features and because of what it does not carry over from v3.

• The biggest change is that v4 de-emphasizes embedding actions (code) in the grammar and favors listeners and visitors instead. This makes it easy to decouple grammars from application code. Without embedded actions, you can also reuse the same grammar in different applications without even recompiling the generated parser. ANTLR still allows embedded actions, but using them is considered an advanced technique in v4. Actions give you the highest level of control, at the cost of losing grammar reuse.

• Because ANTLR automatically generates parse trees and tree walkers, you do not need to build tree grammars in v4. You can use familiar design patterns such as the visitor instead. This means that once you have learned ANTLR grammar syntax, you can move back into the comfortable and familiar territory of the Java programming language to implement the actual language application.

• ANTLR v3's LL(*) parsing strategy is weaker than v4's ALL(*), so v3 sometimes relies on backtracking to parse input phrases correctly. Backtracking makes it hard to debug a grammar by stepping through the generated parser, because the parser may parse the same input multiple times (recursively). Backtracking also makes it harder for the parser to give a good error message when it encounters invalid input.
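As a rough picture of what implementing the application in plain Java with a visitor looks like, here is a minimal sketch on a hand-built expression tree. All of the types (VisitorSketch, Node, IntNode, AddNode, Visitor, Evaluator) are hypothetical and written for this illustration; they are not ANTLR's generated visitor classes. Unlike a listener, a visitor decides for itself when to descend into children and can return a value from each visit.

```java
// Hypothetical sketch of the visitor pattern on a hand-built expression tree;
// none of these types come from ANTLR.
class VisitorSketch {
    // One visit method per node type, each returning a result of type T.
    interface Visitor<T> {
        T visitInt(IntNode node);
        T visitAdd(AddNode node);
    }

    interface Node {
        <T> T accept(Visitor<T> visitor);
    }

    static final class IntNode implements Node {
        final int value;

        IntNode(int value) { this.value = value; }

        @Override public <T> T accept(Visitor<T> visitor) { return visitor.visitInt(this); }
    }

    static final class AddNode implements Node {
        final Node left;
        final Node right;

        AddNode(Node left, Node right) {
            this.left = left;
            this.right = right;
        }

        @Override public <T> T accept(Visitor<T> visitor) { return visitor.visitAdd(this); }
    }

    // Application logic lives here, outside any grammar: the visitor chooses
    // when to visit the children and combines the values they return.
    static final class Evaluator implements Visitor<Integer> {
        @Override public Integer visitInt(IntNode node) {
            return node.value;
        }

        @Override public Integer visitAdd(AddNode node) {
            return node.left.accept(this) + node.right.accept(this);
        }
    }

    static int evaluate(Node tree) {
        return tree.accept(new Evaluator());
    }

    public static void main(String[] args) {
        Node tree = new AddNode(new IntNode(1), new AddNode(new IntNode(2), new IntNode(3)));
        System.out.println(evaluate(tree)); // prints 6
    }
}
```

A second application, such as a pretty-printer, would be another Visitor implementation over the same tree, with no change to the node classes.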

ANTLR v4 is the result of a minor detour I took in graduate school, one that has now lasted twenty-five years. I guess I will have to change my motto slightly.

Why program by hand in five days what you can spend twenty-five years of your life automating?

After all these years, I finally have exactly what I wanted in a parser generator. By now, though, the problem I originally set out to solve has nearly faded from memory.

What's in This Book?

The book is organized into four parts.

• Part I introduces ANTLR, provides some background on languages, and takes you on a tour of ANTLR's capabilities. You will get a first taste of grammar syntax and see what you can do with it.

• Part II is about designing grammars and building language applications by combining those grammars with tree walkers.

• Part III starts by showing you how to customize the error handling of generated parsers. Next, you will learn how to embed actions in a grammar, because sometimes that is simpler or more efficient than building a tree and walking it. Related to actions, you will also learn how to use semantic predicates to alter the behavior of the parser in order to handle some challenging recognition problems. The final chapter tackles some challenging language recognition problems, such as recognizing XML and context-sensitive newlines in Python.

• Part IV is the reference section. It lays out the rules for using the ANTLR grammar meta-language and its runtime library.

The source code for all the examples in this book is available online. If you are reading this translated version, or simply want the complete code bundle, you can get it from the book's website.

To keep the focus on the key elements under discussion, most of the code snippets shown in the book are partial. Note that every file has a copyright notice at the top, and that includes the sample input files, such as t.properties in the listeners subdirectory. Please remove the copyright notice from such files before using them as input. Readers of the electronic version can also copy and paste from the book, which does not display the copyright notice, as you can see below:

listeners/t.properties

user="parrt"
machine="maniac"

Learning More About ANTLR Online

On the http://www.antlr.org website, you will find the ANTLR download, the ANTLRWorks2 graphical user interface (GUI) development environment, documentation, prebuilt grammars, examples, articles, and a file-sharing area. The tech support mailing list is friendly to beginners.

Terence Parr

University of San Francisco, November 2012
