Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Z3: A high-performance theorem prover from Microsoft Research (codeplex.com)
129 points by tchajed on Sept 15, 2013 | hide | past | favorite | 48 comments


In terms of applicability, SMT (satisfiability module theory) solvers basically allow you to do things which are slightly more complex than plain SAT solving.

If you're looking to learn more, Leonardo has a nice introduction to SMT and its applications over here: http://dl.acm.org/citation.cfm?id=1995394. To make a long story short, various problems in program verification and static analysis have seen success with SMT.

/some academic navel-gazing follows.

Yices (http://yices.csl.sri.com/index.shtml) and Z3 are the two main SMT solvers used by academics today. Yices is the "original" solver and was developed by Leonardo de Moura and Bruno Dutertre at SRI in the mid-2000s. At some point towards the end of the last decade, Leonardo moved to Microsoft Research and for reasons which are unclear started working on Z3 independently. I am told that Z3 is a slightly faster solver today.


Pardon the relative ignorance... but I thought these types of problems (SAT and SMT) were NP-Complete in the general case? Does Z3 do something special (eg apply heuristics)? It looks very impressive!


http://www.csl.sri.com/users/rushby/slides/jaist07.pdf

See "SMT Solvers: Disruptive Innovation in Theorem Proving", Rushby, 2007.

> SAT solving is the quintessential NP-complete problem. But now amazingly fast in practice (most of the time)

> SMT ... generalizes SAT solving by adding the ability to handle arithmetic and other decidable theories


Although SAT is indeed NP-complete, modern solvers are ridiculously fast for the kinds of SAT queries that appear in scenarios like hardware model checking and verification. It's not uncommon for queries with millions of variables and clauses to be solved in a matter of seconds. There are some algorithmic tricks combined with clever data structures and search space exploration heuristics that make this possible.

btw, if you're curious about the workings of a modern SAT solver, the first few sections of this paper: http://www.georg.weissenbacher.name/papers/mod12.pdf should make interesting and hopefully accessible reading.

Unfortunately, I don't know enough about the internals of Z3 to know what specifically makes it the fastest SMT solver. But I also don't think it's too far wrong to say that SMT solvers like Z3 are fast because the underlying SAT solver they use is very fast.


SAT is NP-complete, but there are certain properties that allow theorem provers to show that a particular instance is overconstrained (and therefore unsatisfiable), or underconstrained (and many solutions exist). It's at a particular point, so-called phase transitions, where one finds the very hard instances. As an example, when characterizing 3-SAT instances by grouping by their ratio of clauses to variables, the very hard instances are grouped around a value of ~4.3.


Is STP considered a qualitatively different kind of tool than Yices and Z3?


Though the source is available, it cannot be used for commercial purposes.

https://z3.codeplex.com/sourcecontrol/latest#LICENSE.txt

You may not redistribute the source on a website intending to teach the use of Z3 for commercial purposes.

You also agree that MS can sell any modifications or derivative software that you produce.


Is it even legal to prohibit teaching something due to copyright infringment?


They're not prohibiting teaching. They're setting the terms under which you can copy their bits.


"...for commercial purposes"


Yeah, I got that first time, thanks.

It is not trivial to separate commercial and non-commercial purposes. Creative commons licenses still don't have a clear criteria. So as it goes with copyright, this just has a chilling effect on the education of some compsci fundamentals. I just dont get why MS would do that, is it a competitive advantage?


There is no bright line determination. The details depends on how well the teaching falls within fair use. For example, it isn't fair use for a teacher to make copies of the complete works of Borges when the lesson only covers 'The Library of Babel', but making a copy of 'The Library of Babel' will likely fall into fair use.

Even then, if the teacher makes the text available to the students via a public web site, then its availability to non-students puts it outside of fair use.

However, the Z3 license does not prohibit teaching in general. In fact, it says:

"You may use, copy, reproduce, and distribute this Software for any non-commercial purpose, subject to the restrictions in this MSR-LA. Some purposes which can be non-commercial are teaching, academic research, public demonstrations and personal experimentation. You may also distribute this Software with books or other teaching materials, or publish the Software on websites, that are intended to teach the use of the Software for academic or other non-commercial purposes."

How does this have "a chilling effect on the education of some compsci fundamentals"?

If the software is going to be used for commercial purposes, then Microsoft wants some of the action. If you're going to teach how to use the software for commercial purposes, then Microsoft still wants some of the action.


Commercial = I make money from using this; Non-commercial = NOT commercial

(In case the sarcasm was a bit too thick, although it was at no less a level than your reply, I think it's plainly obvious what the licence is dictating and hope you will go into a little more detail of your viewpoint that it isn't)


What if you don't make money from it now, but use what you learned to profit ten minutes later? How about 10 days later? Or 10 years later?

Non-commercial use for something that is directly involved in the production of something else is fairly easy to determine; it depends on whether that something else is sold for money or not.

But for analytical software like this prover, you could indirectly use it to improve something that is sold for money, but the process of improvement is not necessarily a commercial transaction. It could be performed by students at a university, for example. I have in mind formal verification of program invariants, with potential identification of breaks, as an example.


Is there a direct link between you using this software under such a licence and then gaining monetary advantage (if not money directly itself)?

Using your process of improvement for me it is quite clear there would be.

[Perhaps the solver could be used to figure the applicability of the licence]


So if you make money by teaching, is that commercial?

Non-commercial is a terrible term, because it's simply too vague. You can simply say "well, MS will use a bit of discretion, and not sue someone for a marginal infringement". But it does feel a little dodgy.


"Commercial = I make money from using this; Non-commercial = NOT commercial "

False. There are cases where commercial was held to mean "any use by a for profit company, regardless if the use is research, earns money directly or indirectly", etc.


Admittedly with a European viewpoint, either common law based (UK) or administrative law based (France, Belgium), there have been cases where commercial was held to mean has the company benefited in a tangible way from its use of such licenced software.

Although if you speak to a lawyer I'm sure their training would immediately start highlighting this as a concern I don't believe it should be in the vast majority of cases. Wasn't there an article on HN a few days ago basically summarising this type of concern under "so sue me" (minus expletives)


I wonder what would happen here in Scotland - we don't have punitive damages so the most you can be awarded in court is the loss you can demonstrate to have suffered.

So if I use a software package commercially and don't pay for it perhaps the maximum I could be sued for in the local courts would be the license fee I didn't pay. If there is no "commercial" license then could it really be said that a loss has been suffered?

[NB I know contracts and licenses usually have clauses stating the jurisdiction where disputes will be resolved]


I suspect they would say that the commercial value of their solver, for which no publicly available license exists, is some very high dollar amount which they will offer to you as a courtesy.


It's worth noting that there are excellent Python bindings to Z3. http://rise4fun.com/z3py/tutorial


There's also a tremendous Haskell binding, SBV (http://hackage.haskell.org/package/sbv) though it's specialized to bitvector theory.


...and for Scala: https://github.com/epfl-lara/ScalaZ3 (/ shameless plug).


What are the advantages/differences between this and Google's or-tools (https://code.google.com/p/or-tools/), which is available under Apache license?


Without examining or-tools too much, I think it's solving a totally different problem. The Z3 guideis a good way to learn more (http://rise4fun.com/z3/tutorial)


Z3 is about "satisfiability modulo theories", which basically means a SAT solver with smart extensions for things like floating point arithmetic that would otherwise die under a naive encoding. It's used for program verification, and in my group, program synthesis. OR tools like Google's choke on such problems that require reasoning about logic, recursive functions, etc.

For an example of a program verifier using Z3, Pex is amazing: pex4fun.com . It does 'whitebox' fuzzing (program analysis + SMT) to find inputs that break your assert statements and open-ended unit tests.

For an example of a program synthesizer, I spent the past few days writing a regular expression / sed script generator that infers the code from input/output pairs: http://lmeyerov.blogspot.com/2013/09/sneak-peek-for-my-stran... . Underneath, it calls into a SAT solver (it goes through Rosette[1], which plugs in its own solver or Z3).

[1] http://splashcon.org/2013/program/onward-research-papers/909...


If people are using this for interesting projects, I'd love some details.


Last week I wrote a fun tool using it (actually, any SMT-LIB 2 compatible solver, of which Z3 seems to be the only open-source-ish one that works out of the box) to search for optimal branch-free load-free C implementations of small integer lookup tables (i.e. something the compiler should do but doesn't). It's really a minimal use of an SMT solver, but it saved me from having to think of a more clever algorithm than brute force.


In case you or someone else is still looking, CVC4 [1] is a competitive open source alternative to Z3 or Yices.

[1] http://cvc4.cs.nyu.edu/


Thanks! Last I checked (maybe a year back), something-or-other about CVC4 didn't work for me (I wholly forget what it was). I will try it out again.

I don't know if you are associated with CVC4 at all, but unfortunately the hashes of the amd64 Debian stable packages (cvc4, libcvc4-1, and libcvc4parser1) don't seem to match? It is probably just a package error however it makes me hesitant to install the packages…


I am not associated with CVC4 directly (or Z3 for that matter). Something that is true for all these tools is that they evolve very quickly. As a result, you're better off building from source. In both cases, that means running `make` and nothing else.


Did you publish the source anywhere, and if not, are you willing and able to? I'd love to see it.


Sure, I see no reason to keep it private: http://hub.darcs.net/squirrel/luts

(Note: it's a very hacked-together Prolog script. I wrote it in about 15 minutes in a fit of inspiration and free time. If it's useful to anyone I'll clean it up and add the features I enumerated in the README.)

How it works:

The input to the algorithm is the look-up table you want to optimize. I've mostly tested with a half-dozen or so pairs of small (1 or 2 digit) integers, as that is my use case. (Anything else is probably better served by an actual look-up table or binary branch tree.) Input values left out of the this table are considered "don't care" (i.e. there's no "default" output).

The Prolog portion just generates, via iterative-deepening, bit-arithmetic expressions. These expressions have as terminals the input variable (in theory, there could be multiple input variables; haven't tried this) and arbitrary (i.e. symbolic) named constants.

The expressions are, in turn, fed as-is to Z3 as a function over bit-vectors (note the constants are still arbitrary). The look-up table is fed to Z3 as a set of assertions about this function. Z3 is asked to prove (un)satisfiability; if the problem is satisfiable, (this is the important part!) it returns the values of the constants which make it satisfiable. These values are substituted in the original expression, which is then returned by the Prolog script as a solution.


I'd also love to see this; or, if it would be easier for you, I'd love to see a longer comment where you explain it in more detail.


See my reply to throwaway812 (which I'm updating with an explanation).


You could start by quickly browsing through the huge list of papers that cite Z3 [1].

Microsoft uses Z3 extensively for bug finding for all sort of products, notably through the SAGE tool. There is a great write up on its impact. [2]

From that paper:

"Finding all these bugs has saved millions of dollars to Microsoft, as well as to the world in time and energy, by avoiding expensive security patches to more than 1 billion PCs. The software running on your PC has been affected by SAGE. Since 2008, SAGE has been running 24/7 on an average of 100-plus machines/cores automatically fuzzing hundreds of applications in Microsoft security testing labs. This is more than 300 machine- years and the largest computational usage ever for any SMT (Satisfiability Modulo Theories) solver, with more than 1 billion constraints processed to date. SAGE is so effective at finding bugs that, for the first time, we faced “bug triage” issues with dynamic test generation."

[1] http://scholar.google.com/scholar?cites=4828743947843773221&...

[2] http://dl.acm.org/citation.cfm?id=2094081


I'm definitely familiar with Z3 and with solvers used offensively and defensively in software security. I'm just curious as to how widespread it's become with generalist software developers as a problem solving tool.

These are great links; thanks for posting them.


Do you have any links for using Z3 in security proofs?


Not about Z3 in particular, but a very interesting application of solvers:

This weblog talks about procedural content generation (PCG) for games: http://www.gamesbyangelina.org/

It's used in some roguelike games to make sure the procedurally generated levels/rooms are playable/winnable.

It's also used in puzzle games where constraints are set for a certain puzzle level to exhibit certain characteristics in the puzzle solution, but the puzzle itself is randomly generated according to these constraints.

There's a couple of articles about constraint solvers on that site, but I remember this one was most striking: http://www.gamesbyangelina.org/2013/06/the-saturday-paper-go...


Another reply (can't edit my original reply now): what I have not done, though will if I ever have the chance again, is to prove correctness of bit-arithmetic libraries (e.g. for use with symbolic execution).

So long as you can represent your function in the SMT-LIB language (easy for this class of problem), you can prove properties of your code by asserting their respective negations in Z3 and asking it to prove satisfiability. If one of these properties doesn't hold, Z3 will tell you, and it will give you an example.


http://bugchecker.net is a site that does automated testing (dynamic analysis). We use STP.


The small project i'm developing for my Phd, "Symbolic Exploit Assistant" [1] (SEA) is using it.

[1] https://github.com/neuromancer/sea


There's a stack of software verification projects, most probably using Z3, here - http://www.rise4fun.com/


Weirdly enough, I've studying this kind of thing lately.

So it seems from the various papers Z3 is oriented around infinite precision real arithmetic. It's that kind of overkill for most practical problems?

How does it compare with iSat or similar approaches in speed? One of the problems with SMT, Satisfiability Modulo Theories, is that it involves a somewhat loose connection between a SAT problem and an underlying theory prover, which can slow down the whole process if the underlying theory prover is not very fast.

http://en.wikipedia.org/wiki/Satisfiability_Modulo_Theories


Z3 supports many more theories than just real arithmetic. I've primarily dabbled with difference logic and bit vectors, both of which I believe benefit strongly from SMT's SAT underpinnings.


Did anyone make a Javascript version of this with emscripten? I am currently using Z3 on the server side but would really like a client side solution to save computational resources.


The license is a non-commercial one, which makes it impractical for many use cases. Could be why it hasn't been ported yet (I might have done so myself, had the license been otherwise).


If you want to use it for commercial purposes, there is probably a way to get a license from them for that.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: