I started working on static analysis many years ago, more than I care to count. It was a time when static analysis and abstract interpretation of computer programs were in their infancy. I remember so many academic efforts to define the most precise static analysis for… Prolog or ML code. It was an exciting time, and we all felt like an underground community working on an esoteric theory, proudly detached from the real world.
Things have changed since then. Fast and furiously. Everybody talks about static analysis nowadays. Any serious programmer has some form of static analyzer in her tool belt. Static analysis is actually quite fashionable today. Just look at the overwhelming number of static analysis tools out there: there are free and commercial tools, each claiming to be the best.
What remains of the pioneer spirit of decades ago? In my opinion, the most important legacy is the awareness that static analysis is serious stuff. The risk, with so many static analyzers on the market, is that programmers start thinking that static analysis is easy, since everybody does it. I have two stories on the subject.
Thou shalt never dereference null
Think of the hateful NullPointerException. I spent around three years devising a static analysis able to find all such exceptions at compile time without raising too many false alarms. Yet, I often received ghastly comments saying that, yes, that problem was already solved by an Eclipse check. Solved? Does anybody remember that all non-trivial questions about the behavior of computer programs are undecidable (Rice's theorem)? The identification of NullPointerExceptions is an undecidable problem; it will never be solved. We can only approximate a solution, and the approximation looks better and better every day, but the problem itself will stay unsolved.
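To make the difficulty concrete, here is a minimal Java sketch. The class `NullDemo`, its map and its method names are invented for illustration and do not come from any real analyzer or codebase. Whether `capitalLength` throws depends on the argument chosen by a caller, possibly far away in the program, so a purely local check either misses the bug or, if it flags every dereference of a possibly-null value, also warns about calls that are safe in context.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: all names here are illustrative assumptions.
public class NullDemo {
    private static final Map<String, String> CAPITALS = new HashMap<>();

    static {
        CAPITALS.put("Italy", "Rome");
    }

    // Map.get returns null for a missing key, so the result may be null.
    static String capitalOf(String country) {
        return CAPITALS.get(country);
    }

    // Dereferences the result without a check:
    // fine for known countries, a NullPointerException for unknown ones.
    static int capitalLength(String country) {
        return capitalOf(country).length();
    }

    // Same computation with an explicit guard: never throws.
    static int safeCapitalLength(String country) {
        String capital = capitalOf(country);
        return capital == null ? 0 : capital.length();
    }

    public static void main(String[] args) {
        System.out.println(capitalLength("Italy"));        // known key: no exception
        System.out.println(safeCapitalLength("Atlantis")); // unknown key, guarded
        try {
            capitalLength("Atlantis");                     // unknown key, unguarded
        } catch (NullPointerException e) {
            System.out.println("NullPointerException");
        }
    }
}
```

An analysis that wants to accept `capitalLength("Italy")` and reject `capitalLength("Atlantis")` has to reason about the contents of the map and about the values flowing in from every call site, which is exactly where the approximation comes in.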
Thou shalt avoid injections
Currently, the most fashionable static analyses are related to security. SQL injections, for instance, are so infamous that no static analyzer could be sold without a clear statement that, yes, it finds potential SQL injections in code, at compile time, in a few seconds. Injections are very complicated. Yet, most static analyzers just check whether the SQL query is a constant. If it is not, they warn against a potential SQL injection. This is like protecting your house by jailing everybody who enters it. You will definitely catch all thieves, but you will have to meet your visiting friends in jail as well. Not a great deal, after all. The reality is that finding injections at compile time is complicated, and their identification must pass through funny technicalities like relational domains, side-effect analysis and data propagation.
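The gap between "the query is not a constant" and "the query is injectable" can be seen in a small Java sketch. Everything in it is a hypothetical illustration: the class `QueryDemo`, the `posts` table and its columns are assumptions, and the queries are only built as strings, never sent to a database. Both methods produce a non-constant query, so a constant-only check would warn about both, but only the second lets untrusted data reach the SQL text.

```java
// Hypothetical example: table, columns and method names are illustrative assumptions.
public class QueryDemo {

    // Non-constant query, but every fragment is a constant chosen by the program:
    // no user-controlled text can reach the SQL string.
    static String safeQuery(boolean newestFirst) {
        String order = newestFirst ? "DESC" : "ASC";
        return "SELECT id, title FROM posts ORDER BY created " + order;
    }

    // Non-constant query built by concatenating untrusted input:
    // the input can close the string literal and change the query.
    static String unsafeQuery(String userSuppliedTitle) {
        return "SELECT id, title FROM posts WHERE title = '" + userSuppliedTitle + "'";
    }

    public static void main(String[] args) {
        System.out.println(safeQuery(true));
        // A classic payload: the WHERE clause becomes always true.
        System.out.println(unsafeQuery("x' OR '1'='1"));
    }
}
```

Telling the two apart requires tracking where each string fragment comes from, across assignments and method calls, rather than just checking whether the final expression is a literal.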
Static analysis is serious stuff; do not trust easy solutions. Static analysis is hard because programming languages have complex semantics, rely on complicated frameworks and allow you to do so many things in so many different ways that only semantic analysis can lead to good results.
In this blog, I will report examples of real problems related to the static analysis of real programs. Sometimes, such problems will come with solutions as well. The reality, however, is that we still don’t know how to tackle many static analysis problems.
Static analysis is not a victory march. Let us be serious about static analysis.