Using pyright to implement Chinese-language programming and Chinese-English translation for MicroPython / Python

Cao mang Lao Wu 2021-10-14 06:39:09

pyright is an open-source tool from Microsoft that provides Python language services such as type checking, auto-completion and documentation hints. It is written in TypeScript, and Pylance, Microsoft's own Python extension for VS Code, is built on top of pyright.

The author localized the Python interpreter into Chinese, producing the Cao mang Chinese programming language, and then planned to do the same for its little sibling, MicroPython. However, mpy is a stripped-down interpreter for small-memory devices such as microcontrollers, and its UTF-8 support is inherently limited. As a result, the author's attempt to localize mpy with the same approach that had worked for Python was unsuccessful.

Recently, I used TypeScript (which can be viewed as a language service tool for TS/JS) to localize TS/JS into Chinese, and the Fast Web development language has begun to take shape. In the process, I found that besides providing ordinary language services, such a tool has another important use: it makes it easy to implement Chinese localization and Chinese-English translation of a programming language.

With this in mind, the author began looking into type-checking and language service tools for Python, and found that Microsoft had just such an open-source tool, and a rising star at that.

However, unlike TypeScript, pyright provides no API documentation or d.ts declaration files, its code has few comments, few projects on GitHub or elsewhere use it, and its biggest application, Pylance, is closed source.

Fortunately, the author's earlier experience developing the Fast Web language came in handy, and after reading the source code the author overcame these difficulties.

Below the author shares the results of this exploration, in the hope of helping the development and rise of Chinese programming languages.

This article mainly explains how to translate Chinese code into English code. Translating English into Chinese works the same way and is not repeated here.

Localization steps

A Chinese code file is really just a string. What we need to do is process this string: translate the Chinese parts that need translating (reserved words, plus class names, function names, parameter names, etc. from libraries) into English, and generate an English code file. Everything after that, processing and execution, is the familiar routine.
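To make the idea concrete, here is a minimal, self-contained TypeScript sketch of "translate only what needs translating". It is not pyright's tokenizer: the regex tokenizer and the tiny `zhToEn` dictionary are hypothetical, and only illustrate why translation should operate on tokens rather than raw text (Chinese text inside a string literal must stay untouched).

```typescript
// Hypothetical mini-dictionary: Chinese spelling -> English spelling.
const zhToEn = new Map<string, string>([
  ['如果', 'if'],
  ['否则', 'else'],
  ['打印', 'print'],
  ['真', 'True'],
]);

type Tok = { kind: 'name' | 'string' | 'other'; text: string };

// Tiny tokenizer: identifiers (Unicode letters/digits/underscore), simple
// quoted string literals, and any other single character.
function tokenize(src: string): Tok[] {
  const re = /([\p{L}_][\p{L}\p{N}_]*)|("(?:[^"\\]|\\.)*"|'(?:[^'\\]|\\.)*')|([\s\S])/gu;
  const toks: Tok[] = [];
  for (const m of src.matchAll(re)) {
    if (m[1] !== undefined) toks.push({ kind: 'name', text: m[1] });
    else if (m[2] !== undefined) toks.push({ kind: 'string', text: m[2] });
    else toks.push({ kind: 'other', text: m[3] });
  }
  return toks;
}

// Replace only identifier tokens found in the dictionary; keep everything else verbatim.
function translate(src: string, dict: Map<string, string>): string {
  return tokenize(src)
    .map((t) => (t.kind === 'name' ? (dict.get(t.text) ?? t.text) : t.text))
    .join('');
}
```

For example, `translate('如果 真: 打印("如果")', zhToEn)` yields `if True: print("如果")`, leaving the string literal alone. A flat dictionary like this suffices for reserved words, but library and parameter names depend on context, which is why the approach in this article relies on pyright's parser and documentation lookup.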

Generating the AST

pyright contains a tokenizer (which classifies each lexeme in the string above, producing a stream of information-rich tokens) and a parser (which builds an abstract syntax tree, or AST, from the token stream). With minor modifications, both can recognize and parse Chinese reserved words.

// tokenizerTypes.ts
export const enum KeywordType {
    And,
    // ... (other keywords unchanged)
    With,
    Yield,
    不是, // Chinese "is not"
    不在, // Chinese "not in"
}

// tokenizer.ts
const _keywords: Map<string, KeywordType> = new Map([
    ['and', KeywordType.And],
    ['且', KeywordType.And], // Chinese spelling of "and"
    // ...
    ['True', KeywordType.True],
    ['真', KeywordType.True], // Chinese spelling of "True"
    ['不是', KeywordType.不是],
    ['不在', KeywordType.不在],
]);

// parser.ts
// comparison: expr (comp_op expr)*
// comp_op: '<'|'>'|'=='|'>='|'<='|'<>'|'!='|'in'|'not' 'in'|'is'|'is' 'not'
private _parseComparison(): ExpressionNode {
    // ...
    while (true) {
        // ...
        if (/* ... */) {
            // ...
        } else if (this._consumeTokenIfKeyword(KeywordType.In)) {
            comparisonOperator = OperatorType.In;
        } else if (this._consumeTokenIfKeyword(KeywordType.不是)) {
            // One Chinese token stands for the two-word operator "is not"
            comparisonOperator = OperatorType.IsNot;
        } else if (this._consumeTokenIfKeyword(KeywordType.不在)) {
            // One Chinese token stands for the two-word operator "not in"
            comparisonOperator = OperatorType.NotIn;
        } else if (/* ... */) {
            // ...
        }
        // ...
    }
    return leftExpr;
}
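The essence of this change is that the keyword table accepts two spellings for the same keyword type, and that a single Chinese token may stand for a two-word English operator. The sketch below shows that idea outside pyright; the enum, the Chinese spellings (且, 在, 是, 真, 不是, 不在) and the helper `keywordToEnglish` are hypothetical and far smaller than pyright's real `KeywordType`.

```typescript
// Hypothetical, heavily reduced keyword enum.
enum KeywordType { And, In, Is, IsNot, NotIn, True }

// Both spellings map to the same keyword type.
const keywords = new Map<string, KeywordType>([
  ['and', KeywordType.And], ['且', KeywordType.And],
  ['in', KeywordType.In], ['在', KeywordType.In],
  ['is', KeywordType.Is], ['是', KeywordType.Is],
  ['True', KeywordType.True], ['真', KeywordType.True],
  // Single Chinese tokens that correspond to two-word English operators.
  ['不是', KeywordType.IsNot], ['不在', KeywordType.NotIn],
]);

// English text to emit for each keyword type when generating the output file.
const englishSpelling: Record<KeywordType, string> = {
  [KeywordType.And]: 'and',
  [KeywordType.In]: 'in',
  [KeywordType.Is]: 'is',
  [KeywordType.IsNot]: 'is not',
  [KeywordType.NotIn]: 'not in',
  [KeywordType.True]: 'True',
};

// Returns the English spelling of a keyword, or undefined for non-keywords.
function keywordToEnglish(word: string): string | undefined {
  const kw = keywords.get(word);
  return kw === undefined ? undefined : englishSpelling[kw];
}
```

Keeping the enum as the single source of truth means the parser only ever sees keyword types, never spellings, so adding a Chinese spelling does not touch the grammar except for new multi-word operators such as `不是`.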

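Then we can use the parser to process Chinese code and obtain an AST.

// Module paths are relative to pyright's packages/pyright-internal/src
import { DiagnosticSink } from './common/diagnosticSink';
import { ParseOptions, Parser } from './parser/parser';

// code: contents of the Chinese source file, as a string
const parser = new Parser();
const result = parser.parseSourceFile(code, new ParseOptions(), new DiagnosticSink());
// result.parseTree is the root (ModuleNode) of the AST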

Traversing the AST and translating

pyright provides ParseTreeWalker, a class for traversing the AST. By subclassing it and overriding the handlers for different node types, we can modify node information and thereby perform the translation.

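import { CancellationToken } from 'vscode-languageserver';
import { ParseTreeWalker } from './analyzer/parseTreeWalker';
import { Program } from './analyzer/program';
import { convertOffsetToPosition } from './common/positionUtils';
import { TextRange } from './common/textRange';
import { TextRangeCollection } from './common/textRangeCollection';
import { NameNode } from './parser/parseNodes';

class TreeWalker extends ParseTreeWalker {
    constructor(
        private _srcFile: string, // path of the file being translated
        private _program: Program,
        private _lines: TextRangeCollection<TextRange>, // e.g. result.tokenizerOutput.lines
    ) {
        super();
    }

    visitName(node: NameNode): boolean {
        // NameNode covers identifiers such as function and class names; it is the main target of translation
        // Convert the node's character offset into a line/column position
        const pos = convertOffsetToPosition(node.start, this._lines);
        // Look up documentation information
        const hoverResult = this._program.getHoverForPosition(this._srcFile, pos, 'plaintext', CancellationToken.None);
        const sigHelp = this._program.getSignatureHelpForPosition(this._srcFile, pos, 'plaintext', CancellationToken.None);
        if (hoverResult) {
            // Replace Chinese function names, class names, etc.
        }
        if (sigHelp) {
            // Replace the function's Chinese parameter names
        }
        return true; // keep walking into child nodes
    }
}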
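`node.start` is a character offset into the whole file, whereas hover and signature-help queries take an LSP-style `{ line, character }` position, so the offset has to be converted before querying. pyright ships its own helper for this (`convertOffsetToPosition` in `common/positionUtils.ts`); the standalone sketch below, with the hypothetical names `lineStarts`, `offsetToPosition` and `LinePosition`, only shows what that conversion involves. It assumes `\n` or `\r\n` line endings.

```typescript
interface LinePosition { line: number; character: number }

// Offsets at which each line begins; line 0 always starts at offset 0.
function lineStarts(text: string): number[] {
  const starts = [0];
  for (let i = 0; i < text.length; i++) {
    if (text[i] === '\n') starts.push(i + 1);
  }
  return starts;
}

// Binary search for the last line start <= offset.
function offsetToPosition(starts: number[], offset: number): LinePosition {
  let lo = 0;
  let hi = starts.length - 1;
  while (lo < hi) {
    const mid = (lo + hi + 1) >> 1;
    if (starts[mid] <= offset) lo = mid;
    else hi = mid - 1;
  }
  return { line: lo, character: offset - starts[lo] };
}
```

Because JavaScript string indices count UTF-16 code units, the resulting `character` values match the LSP default encoding, so Chinese identifiers need no special handling here.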

In the code above, the program object is crucial. It holds information about all relevant files (the standard library, project files, various .pyi stubs, etc.), and through it we can obtain the documentation for the identifier (excluding reserved words) at a specific position, which tells us what to translate it into. Creating a program object requires two arguments, as shown in the following example.

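import { ImportResolver } from './analyzer/importResolver';
import { Program } from './analyzer/program';
import { ConfigOptions } from './common/configOptions';
import { FullAccessHost } from './common/fullAccessHost';
import { createFromRealFileSystem } from './common/realFileSystem';

// dir: project root directory; sourceFile: path of the Chinese source file
// 'off' is the type-checking mode, so no type-checking diagnostics are reported
const configOptions = new ConfigOptions(dir, 'off');
// See configOptions.ts for the full set of configuration options
configOptions.pythonPath = '';
configOptions.typeshedPath = '';
configOptions.stubPath = '';
configOptions.verboseOutput = true;
// configOptions.useLibraryCodeForTypes = true;
const fs = createFromRealFileSystem();
const importResolver = new ImportResolver(fs, configOptions, new FullAccessHost(fs));
const program = new Program(importResolver, configOptions);
program.setTrackedFiles([sourceFile]);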

For the replacement to work correctly, the documentation must follow certain conventions, and suggestions are welcome. For now, the author simply stipulates the following:

  • For library names, class names, function names, etc., write a comment on a separate line: @english xxxx
  • For parameter names, write a comment on a separate line: @@ [Chinese parameter name] @english xxxx
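Extracting the English names from documentation that follows these conventions takes only a couple of regular expressions. This sketch assumes the markers are spelled literally `@english <name>` and `@@ <Chinese parameter name> @english <name>` (the brackets above read as placeholders, not literal characters) and that the hover text arrives as plain text; `parseDocNames` and `DocNames` are hypothetical names, not part of pyright.

```typescript
interface DocNames {
  english?: string;            // English name of the documented symbol
  params: Map<string, string>; // Chinese parameter name -> English parameter name
}

// Scans documentation text line by line for the two assumed marker forms.
function parseDocNames(doc: string): DocNames {
  const result: DocNames = { params: new Map() };
  for (const raw of doc.split(/\r?\n/)) {
    const line = raw.trim();
    // Parameter marker: "@@ <Chinese name> @english <English name>"
    const param = /^@@\s*(\S+)\s*@english\s+(\S+)$/.exec(line);
    if (param) {
      result.params.set(param[1], param[2]);
      continue;
    }
    // Symbol marker: "@english <English name>"
    const sym = /^@english\s+(\S+)$/.exec(line);
    if (sym) result.english = sym[1];
  }
  return result;
}
```

Lines that match neither marker (the ordinary Chinese description) are simply ignored, so the same docstring can serve both human readers and the translator.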

Generating the English code file

The last step is to reassemble the modified token stream into a code string; reserved words are also translated at this stage. This step is not difficult, but it is fiddly. If you are interested, see my source code on Gitee: Golden Python.
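The author's actual implementation is in the Gitee repository. As a generic illustration of the reassembly step only, the sketch below splices replacement text into the original source by character offset, so whitespace, comments and untranslated tokens are carried over verbatim. `Edit` and `applyEdits` are hypothetical names; in practice each edit would come from a token's start and length plus its English spelling.

```typescript
// Replace the half-open range [start, start + length) of the original text with newText.
interface Edit { start: number; length: number; newText: string }

function applyEdits(text: string, edits: Edit[]): string {
  // Process edits left to right on a sorted copy; ranges must not overlap.
  const sorted = [...edits].sort((a, b) => a.start - b.start);
  let out = '';
  let cursor = 0;
  for (const e of sorted) {
    if (e.start < cursor) throw new Error(`overlapping edit at offset ${e.start}`);
    out += text.slice(cursor, e.start) + e.newText;
    cursor = e.start + e.length;
  }
  return out + text.slice(cursor);
}
```

Always slicing from the original text, rather than mutating it one edit at a time, keeps every offset valid even though replacements such as `不在` to `not in` change the length of the line.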

Please include a link to the original article when reprinting. Thanks.