Simple constant propagation AST-based analysis #852

jeshecdom · 2024-09-19T16:06:33Z

Issue

Closes #716.

The solution can detect not only division by zero, but any kind of problem that depends on variable tracing, such as null dereferences and number overflows. I still need to add tests for those other cases, though.

Checklist

I have updated CHANGELOG.md
I have added tests to demonstrate the contribution is correctly implemented: this usually includes both positive and negative tests, showing the happy path(s) and featuring intentionally broken cases
I have run all the tests locally and no test failure was reported
I have run the linter, formatter and spellchecker
I did not do unrelated and/or undiscussed refactorings

jeshecdom · 2024-09-20T01:41:30Z

The main idea of the approach is to keep a map from variable names to either a value or undefined:

Map<string, Value | undefined>

This map is stored in the statement context, and it is used to track the value that variables have so far in the program.

We say that a variable is "undetermined" if either:

It is not a key in the map, or
It is a key and it maps to undefined.

The approach keeps track of the value of each variable so far. For example, in this program snippet:

let a = 5;   // A
let b = a;   // B
a = 10;      // C

After line A executes, the map will be:
a --> 5
After line B:
a --> 5, b --> 5
After line C:
a --> 10, b --> 5
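As a minimal sketch of the trace above (not the code in this PR), the bindings map can be modelled directly with a TypeScript Map. The Value alias and the isUndetermined helper below are assumptions made for illustration; the compiler's real Value type is richer.

```typescript
// Hypothetical stand-in for the compiler's Value type.
type Value = number | boolean;
type Bindings = Map<string, Value | undefined>;

// A variable is undetermined if it is not a key, or if it maps to undefined.
// Map.get returns undefined in both situations.
function isUndetermined(bindings: Bindings, varName: string): boolean {
    return bindings.get(varName) === undefined;
}

const bindings: Bindings = new Map();
bindings.set("a", 5);                 // A: let a = 5;
bindings.set("b", bindings.get("a")); // B: let b = a;
bindings.set("a", 10);                // C: a = 10;
```

Note that b keeps the value 5 after line C, because the map stores values, not references to other variables.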

During this trace analysis, a variable can become undetermined mainly for the following reasons:

The variable gets assigned an expression that cannot be evaluated at compile time.
The variable gets changed by a mutating function.
The variable gets assigned different values at different branches of the program.

I'll explain each case now.

Case 1

Consider this function:

fun test(v: Int) {   // A
   let a = 10;       // B
   a = v - v;        // C
   a = v;            // D
}

After line A, the bindings map is empty. Note it would be equivalent if instead we attach the binding:
v --> undefined
I decided not to add bindings while processing function declarations, because that way I did not have to add code in that part of the codebase. Whether or not the arguments of a function declaration are added, the procedure respects the following invariant while tracing a single branch in the code:

The keys in the bindings map only grow or remain the same.

This invariant makes the process of merging different branches in the code easier (as will be explained in Case 3).

After line B, the map will be:
a --> 10
After line C:
a --> 0
Note that contrary to expectation, at line C, variable a actually has a value, because the analyzer uses partial evaluation, and v - v = 0 independently of the value of v.
After line D:
a --> undefined
because v is undetermined.
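The following sketch replays lines B, C and D with a toy evaluator. The Expr type and the "x - x is 0" shortcut are assumptions chosen to illustrate the idea; the actual partial evaluator in the compiler may reach v - v = 0 by different means.

```typescript
type Value = number;
type Bindings = Map<string, Value | undefined>;

// Hypothetical miniature expression language.
type Expr =
    | { kind: "num"; value: number }
    | { kind: "var"; name: string }
    | { kind: "sub"; left: Expr; right: Expr };

// Returns a value if the expression can be determined, undefined otherwise.
function evalExpr(e: Expr, env: Bindings): Value | undefined {
    switch (e.kind) {
        case "num":
            return e.value;
        case "var":
            return env.get(e.name);
        case "sub": {
            // Assumed algebraic shortcut: x - x is 0 even if x is undetermined.
            if (e.left.kind === "var" && e.right.kind === "var" && e.left.name === e.right.name) {
                return 0;
            }
            const l = evalExpr(e.left, env);
            const r = evalExpr(e.right, env);
            return l === undefined || r === undefined ? undefined : l - r;
        }
    }
}

const env: Bindings = new Map(); // after line A the map is empty
const v: Expr = { kind: "var", name: "v" };

env.set("a", evalExpr({ kind: "num", value: 10 }, env));         // B: let a = 10;
const afterB = env.get("a");
env.set("a", evalExpr({ kind: "sub", left: v, right: v }, env)); // C: a = v - v;
const afterC = env.get("a");
env.set("a", evalExpr(v, env));                                  // D: a = v;
const afterD = env.get("a");
```

The key a stays in the map after line D and only its value becomes undefined, which is consistent with the invariant that keys never disappear within a branch.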

Case 2

Consider this program:

extends mutates fun changeMe(self: Int) {
   self = 5;
}

fun test(v: Int) {   // A
   let a = 10;       // B
   a.changeMe()      // C
}

After line A, the map is empty.
After line B:
a --> 10
After line C:
a --> undefined
The reason is that the analyzer treats mutating functions as black boxes. Therefore, after a.changeMe() executes, the analyzer concludes that a could have an arbitrary value.
This decision of treating mutating functions as black boxes is enough to emulate the behavior of FunC, because it seems that FunC stops the analysis whenever a variable gets assigned the result of a function call.
Note however that if we remove the attribute mutates from the changeMe function, then, the analyzer will conclude that after line C, the bindings map is actually:
a --> 10
Because changeMe does NOT mutate a.
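A rough sketch of this black-box treatment is shown below. The applyMethodCall helper and its isMutating flag are hypothetical names for illustration, and the sketch only covers the case where the receiver is a plain variable.

```typescript
type Value = number;
type Bindings = Map<string, Value | undefined>;

// Hypothetical helper: update the bindings after `receiver.someMethod()`.
function applyMethodCall(env: Bindings, receiver: string, isMutating: boolean): void {
    if (isMutating) {
        // Black box: the receiver could hold an arbitrary value afterwards.
        env.set(receiver, undefined);
    }
    // A non-mutating call leaves the receiver's binding untouched.
}

// fun test(v: Int) { let a = 10; a.changeMe() } with a mutating changeMe
const withMutates: Bindings = new Map<string, Value | undefined>([["a", 10]]);
applyMethodCall(withMutates, "a", true);

// Same program, but with the mutates attribute removed from changeMe
const withoutMutates: Bindings = new Map<string, Value | undefined>([["a", 10]]);
applyMethodCall(withoutMutates, "a", false);
```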

Case 3

If control flow branches and then joins, then the binding maps of each branch will merge at the joint point using the following rule:

Suppose var is a variable that existed in the bindings map before 
the control flow branched. 
If var has the same value v in all the binding maps of each branch, then 
var --> v 
is in the binding maps at the joint point. Otherwise, 
var --> undefined
is in the binding maps at the joint point.

An example makes this clearer. Consider this function:

fun test (v: Int) {
   let a = 10;       
   let b = 6;        // A
   if (v >= 5) {
      a = 7;         
      b = 20;        // B
   } else { 
      a = 8;         
      b = 20;        // C
   }
                     // D
}

Control flow at A branches into B and C, which then join at D.
At A, the map is a --> 10, b --> 6, while the maps at B and C are a --> 7, b --> 20 and a --> 8, b --> 20, respectively.

We would compute the map at D as follows. For each variable x in the map at A,
check if x has the same value in the maps at B and C. If it does, add it to the map at D with the common value. If x does not have the same value in B and C, then add x --> undefined at D.

Hence, at D the map will be a --> undefined, b --> 20, because a has different values at B and C, but b has the same value 20.
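The merge rule can be sketched as follows. This is an illustrative reimplementation under simplified assumptions (primitive values compared with ===), not the mergeBranches function from resolveStatements.ts; the name mergeBindings is hypothetical.

```typescript
// Hypothetical stand-in for the compiler's Value type.
type Value = number | boolean;
type Bindings = Map<string, Value | undefined>;

// Merge the maps at the end of each branch, keeping only variables
// that existed before the control flow branched.
function mergeBindings(before: Bindings, branches: Bindings[]): Bindings {
    const merged: Bindings = new Map();
    before.forEach((_unused, varName) => {
        const first = branches[0]?.get(varName);
        const allAgree = branches.every((b) => b.get(varName) === first);
        merged.set(varName, allAgree ? first : undefined);
    });
    return merged;
}

const atA: Bindings = new Map<string, Value | undefined>([["a", 10], ["b", 6]]);
const atB: Bindings = new Map<string, Value | undefined>([["a", 7], ["b", 20]]);
const atC: Bindings = new Map<string, Value | undefined>([["a", 8], ["b", 20]]);

const atD = mergeBindings(atA, [atB, atC]); // a --> undefined, b --> 20
```

Iterating over the keys of the pre-branch map, rather than over the branch maps, relies on the invariant that keys are never removed while tracing a branch.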

Sometimes the analyzer is able to determine that a particular branch will always be taken. In those cases, instead of merging the binding maps at the joint point, the analyzer simply takes the map of the executed branch, by using the following rule:

Suppose var is a variable that existed in the bindings map before 
the control flow branched. 
Suppose that branch A will always be taken.
If var has value v in branch A, then 
var --> v 
is in the binding maps at the joint point.

For example, in the above program, condition v >= 5 cannot be evaluated at compile time. So, the analyzer will merge branches at D. But suppose instead that the program was:

fun test (v: Int) {
   let a = 10;       
   let b = 6;        // A
   if (a >= 5) {
      a = 7;         
      b = 20;        // B
   } else { 
      a = 8;         
      b = 20;        // C
   }
                     // D
}

Then, the bindings map at D will be a --> 7, b --> 20 because branch B will always be taken: the condition a >= 5 can be evaluated at compile time, and it is true since a --> 10 at A.
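Both rules for conditionals can be combined into one join step, sketched below with a separate toy example. The joinIfElse helper and its cond parameter (the compile-time value of the condition, or undefined when it cannot be evaluated) are assumptions for illustration; in the PR this logic is split between mergeBranches and copyBindings.

```typescript
// Hypothetical stand-in for the compiler's Value type.
type Value = number | boolean;
type Bindings = Map<string, Value | undefined>;

// Hypothetical helper: compute the map at the point where if/else joins.
function joinIfElse(
    before: Bindings,
    cond: boolean | undefined,
    thenEnd: Bindings,
    elseEnd: Bindings,
): Bindings {
    const joined: Bindings = new Map();
    before.forEach((_unused, varName) => {
        const t = thenEnd.get(varName);
        const e = elseEnd.get(varName);
        if (cond === true) {
            joined.set(varName, t); // then-branch is always taken
        } else if (cond === false) {
            joined.set(varName, e); // else-branch is always taken
        } else {
            joined.set(varName, t === e ? t : undefined); // merge
        }
    });
    return joined;
}

const pre = new Map<string, Value | undefined>([["x", 1], ["y", 2]]);
const thenMap = new Map<string, Value | undefined>([["x", 3], ["y", 4], ["tmp", true]]);
const elseMap = new Map<string, Value | undefined>([["x", 5], ["y", 4]]);

const known = joinIfElse(pre, true, thenMap, elseMap);        // x --> 3, y --> 4
const unknown = joinIfElse(pre, undefined, thenMap, elseMap); // x --> undefined, y --> 4
```

In both cases tmp, which is declared inside a branch, is absent from the result.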

One last important note. Observe that the rules always start by stating: Suppose var is a variable that existed in the bindings map before the control flow branched. This means that variables declared inside the branches will not survive at the joint points. For example, in this program:

fun test (v: Int) {
   let a = 10;       
   let b = 6;        // A
   if (v >= 5) {
      let z = true;
      a = 7;         
      b = 20;        // B
   } else { 
      let x = false;
      a = 8;         
      b = 20;        // C
   }
                     // D
}

variables x and z will not be in the bindings map at D, because x and z were not in the bindings map at A before the control flow branched.

Handling loops

So far, I haven't talked about how loops are handled by the analyzer. Consider the following function:

fun test (v: Int) {
   let a = 10;       
   let b = 6;        // A
   while (v >= 5) {
      v -= 1;
      a = v;         
      b = 20;        // B
   }
                     // C
}

There are two possible branches at A: the loop executes (which follows branch B) or it does not (which jumps directly to C). Both branches join at C.

If we follow branch B, we must carry out the analysis under the assumption that the loop has already executed an arbitrary number of times. This implies that we cannot assume that the values of a and b are 10 and 6 at line v -= 1, because they are assigned inside the loop and could have changed in earlier iterations. In other words, we must start the analysis of branch B with the bindings a --> undefined, b --> undefined, and then carry out the analysis of each line inside the loop body.

So, in the above example, the map at B will be: v --> undefined, a --> undefined, b --> 20, because a gets assigned v, which is undefined.
The case when the loop does not execute produces the map: a --> 10, b --> 6.
Therefore, merging these two maps at C will produce: a --> undefined, b --> undefined because both a and b have different values on each branch.

If instead, we have the program:

fun test (v: Int) {
   let a = 10;       
   let b = 6;        // A
   while (v >= 5) {
      v -= 1;
      a = v - v + 10;         
      b = 20;        // B
   }
                     // C
}

Then, the map at B will be: v --> undefined, a --> 10, b --> 20. And the map when the loop does not execute: a --> 10, b --> 6. Therefore, merging these maps at C will produce: a --> 10, b --> undefined, because a has the same value in both branches, but b does not.

The above examples suggest a general procedure to handle the branch inside the loop:

Suppose that var is a variable in the binding map before entering the loop. 
Suppose that var is assigned inside the loop.
Then,
var --> undefined 
should be in the bindings map at the first line inside the loop
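The sketch below replays the second loop example (a = v - v + 10) under this procedure. The helper names invalidateAssigned and mergeWithSkippedLoop are hypothetical and are not the PR's makeAssignedVariablesUndetermined; the list of assigned variables is supplied by hand, and the effect of each body statement is written out manually rather than evaluated.

```typescript
type Value = number;
type Bindings = Map<string, Value | undefined>;

// Mark as undefined every pre-loop variable that is assigned in the loop body.
function invalidateAssigned(before: Bindings, assigned: string[]): Bindings {
    const start: Bindings = new Map(before);
    for (const varName of assigned) {
        if (start.has(varName)) start.set(varName, undefined);
    }
    return start;
}

// Merge the "loop ran" branch with the "loop was skipped" branch.
// The skipped branch leaves the pre-loop map unchanged.
function mergeWithSkippedLoop(before: Bindings, bodyEnd: Bindings): Bindings {
    const merged: Bindings = new Map();
    before.forEach((value, varName) => {
        merged.set(varName, bodyEnd.get(varName) === value ? value : undefined);
    });
    return merged;
}

const beforeLoop: Bindings = new Map<string, Value | undefined>([["a", 10], ["b", 6]]);
const loopEnv = invalidateAssigned(beforeLoop, ["v", "a", "b"]);
const atLoopStart = new Map(loopEnv); // snapshot: a --> undefined, b --> undefined

loopEnv.set("v", undefined); // v -= 1;         v is undetermined
loopEnv.set("a", 10);        // a = v - v + 10; assumed to fold to 10
loopEnv.set("b", 20);        // b = 20;

const afterLoop = mergeWithSkippedLoop(beforeLoop, loopEnv); // a --> 10, b --> undefined
```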

As was the case with conditionals, sometimes the analyzer is able to determine if a loop will execute or not. In that case, it will take the binding map of the corresponding branch. For example, consider this function:

fun test (v: Int) {
   let a = 10;       
   let b = 6;         // A
   while (a >= 5) {
      a -= 1;
      b = 20;         // B
   }
                      // C
}

In this case, the condition a >= 5 evaluates to true at compile time, so the loop body executes at least once. Hence, the analyzer takes the map of branch B to be the map at C without doing any merging. In the example, the map at C will be: a --> undefined, b --> 20, which is the same as the map at B.

Some pointers to the code

Probably I need to add the above explanation in the source code itself (maybe after the declaration of the bindings map in StatementContext in resolveStatements.ts).
Function mergeBranches in resolveStatements.ts implements the bindings map merging algorithm.
Function copyBindings in resolveStatements.ts implements the logic when the analyzer determines that a branch will always be taken, so it simply copies the bindings in the taken branch.
Function makeAssignedVariablesUndetermined in resolveStatements.ts prepares the bindings for the analysis of a loop body, i.e., it marks as undefined all variables declared before the loop that get assigned inside the loop.
I also recommend reading the explanation before function setVariableBinding in resolveStatements.ts, because it explains how structs and contracts are handled by this approach.

jeshecdom added 11 commits August 31, 2024 02:04

Variable values tracing. Currently supports different branches in conditionals.

2ae5c49

Added support for variable tracing in loops, try-catch and init() method. It also supports variable tracing in structs.

1731548

Error messages now show the place in the code. Before, error messages did not show it. This was due to partial evaluator making use of dummySrcInfo.

1bd97f8

Fixes failing test cases during yarn gen:

6f0888e

- forgot to change initialSctx in foreach statement to foreachSctx

Further fixes to the handling of structs and contracts.

bcba0b1

Added negative tests.

Added tracking of variable mutation through a mutating function.

d7ef552

Merge branch 'main' into issue716

211b768

Run prettier.

ec78d68

Fixes after merge with main.

d68a48e

Added positive test cases.

76f0115

Added documentation

919c6ce

jeshecdom requested a review from a team as a code owner September 19, 2024 16:06

anton-trunov self-assigned this Sep 19, 2024

anton-trunov added this to the v1.6.0 milestone Sep 19, 2024

Forgot to add test cases for init() function.

6a3a11e

anton-trunov changed the title ~~Issue 716~~ Simple constant propagation AST-based analysis Sep 24, 2024
