源码阅读-Cypher编译链路

Posted by keys961 on April 18, 2019

1. Cypher编译接口

位于Compiler.scala下:

1
2
3
4
5
6
7
@throws[org.neo4j.cypher.CypherException]
def compile(preParsedQuery: PreParsedQuery,
              tracer: CompilationPhaseTracer,
              preParsingNotifications: Set[Notification],
              transactionalContext: TransactionalContext,
              params: MapValue
             ): ExecutableQuery

通常而言,使用实现该trait的CypherCurrentCompiler类进行编译。

编译阶段阶段分为下面几个部分:

1
2
3
4
5
6
7
8
9
10
11
// In CompilatioinPhaseTracer.java
enum CompilationPhase
{
        PARSING,
        DEPRECATION_WARNINGS,
        SEMANTIC_CHECK,
        AST_REWRITE, // 前4个在front-end执行
        LOGICAL_PLANNING, // 在logical-plans执行
        CODE_GENERATION, // 这部分只会在2.3版本中使用,最新版本里没有使用它,在neo4j-cypher模块的helper.scala中使用
        PIPE_BUILDING // 构建pipe-line
}

实际上,除了PARSING之外,优化贯穿全部,后面的顺序不一定完全满足上面定义的顺序。

2. 调用链

入口位于ExecutionEngine.scalaexecute()方法。

2.1. Pre-parse

入口是PreParser.scalapreParseQuery()方法,返回PreParsedQuery

preParser,preParseQuery() : PreParsedQuery // ExecutionEngine.scala
	...
    actuallyPreParse() : PreParsedQuery // PreParser.scala
        CypherPreParser.apply(queryText) : PreParsedQuery
            ... // If cache miss
            Base[PreParsedQuery].parseOrThrow() : PreParsedStatement

这里将传过来的Cypher进行预处理,但没有真正开始编译。

2.2. Compile

优先会从queryCache中获取ExecututableQuery,若没有,则调用masterCompiler.compile()进行编译并缓存:

getOrCompile() : ExecutableQuery // ExecutionEngine.scala
    ...
    compile() : ExecutableQuery // MasterCompiler.scala
        ...
        innerCompiler() : ExecutableQuery // MasterCompiler.scala
            ...
            compile() : ExecutableQuery // CypherCurrentCompiler.scala

2.2.1. Parse to Logical Planning

然后入口是Cypher35Planner.scalaparseAndPlan(),完成Parse到Logical Planning的步骤:

parseAndPlan() : LogicalPlanResult
    ...
    getOrParse() : BaseState
    ...
        getOrParse() : BaseState // CachingPlanner.scala
            ...
            parse() : BaseState // Cypher35Planner.scala
                planner.parseQuery() : BaseState // CypherPlanner.scala

a) Parse部分

走入planner.parseQuery(),最终会走到静态方法CompilationPhases.parsing(),方法内容如下:

def parsing(sequencer: String => RewriterStepSequencer,
              literalExtraction: LiteralExtraction = IfNoParameter,
              deprecations: Deprecations = Deprecations.V1
             ): Transformer[BaseContext, BaseState, BaseState] =
    Parsing.adds(BaseContains[Statement]) andThen 
      SyntaxDeprecationWarnings(deprecations) andThen // 进入 DEPRECATION_WARNINGS 
      PreparatoryRewriting(deprecations) andThen // 为AST重写准备
      SemanticAnalysis(warn = true).adds(BaseContains[SemanticState]) andThen // 进入 SEMANTIC_CHECK
      AstRewriting(sequencer, literalExtraction) // 进入AST_REWRITE

每个步骤都由Pipeline进行连接(定义在Transformer.scala中),执行完毕后,回到Cypher35Planner.scalaparse()方法,为Logical Planning准备。

b) Logical Planning部分

回去后,得到了syntacticQuery对象,然后创建planContext等一些上下文,然后调用planner.normalizeQuery()方法,将查询标准化,进行预准备,对查询进行进一步的优化:

// In CypherPlanner.scala
val prepareForCaching: Transformer[PlannerContext, BaseState, BaseState] =
    RewriteProcedureCalls andThen // 重写Procedure Call 
    ProcedureDeprecationWarnings andThen // 处理各种Warning
    ProcedureWarnings
prepareForCaching.transform(state, context)

之后进行一系列检查后,进入createPlan() : CacheableLogicalPlan方法,创建Logical Plan,并会进行缓存。创建的步骤依旧在planner.planPreparedQuery()中进行

// In CypherPlanner.scala
def planPreparedQuery(state: BaseState, context: Context): LogicalPlanState = {
    val pipeLine = if (context.debugOptions.contains("tostring"))
      planPipeLine andThen DebugPrinter
    else
      planPipeLine // 生成Plan pipeline
    pipeLine.transform(state, context)
}

// Plan pipeline的生成
val planPipeLine: Transformer[Context, BaseState, LogicalPlanState] =
    ProcedureCallOrSchemaCommandPlanBuilder andThen
    If((s: LogicalPlanState) => s.maybeLogicalPlan.isEmpty)(standardPipeline) // 可能会生成standard pipeline
                                                            
// Standard pipeline的生成
val standardPipeline: Transformer[Context, BaseState, LogicalPlanState] =
    CompilationPhases.lateAstRewriting andThen // 延迟重写AST
    irConstruction andThen //IR(中间表示)生成
    costBasedPlanning // 基于cost进行planning
                                                            
// 后2个的步骤如下:
val irConstruction: Transformer[PlannerContext, BaseState, LogicalPlanState] =
    ResolveTokens andThen
      CreatePlannerQuery.adds(CompilationContains[UnionQuery]) andThen
      OptionalMatchRemover
val costBasedPlanning: Transformer[PlannerContext, LogicalPlanState, LogicalPlanState] = 
	QueryPlanner().adds(CompilationContains[LogicalPlan]) andThen
      PlanRewriter(sequencer) andThen
      replacePropertyLookupsWithVariables andThen
      If((s: LogicalPlanState) => s.unionQuery.readOnly)(CheckForUnresolvedTokens)

完成后,得到LogicalPlanState。系统先从缓存中获取,若没有,则创建新的。

2.2.3. To Executable

得到LogicalPlanState后,创建runtimeContext,然后在该上下文中得到ExecutableQueryPlan,即调用:runtime.compileToExecutable(planState, runtimeContext),然后得到ExecutableQuery对象。

获得该对象后,即可执行execute()进行查询等操作。

最后实际的操作肯定通过GraphDatabaseFacade.java,然后经过内核进行一系列操作。