
Closing the Gap Between Short and Long XORs for Model Counting

Abstract

Many recent algorithms for approximate model counting are based on a reduction to combinatorial searches over random subsets of the space defined by parity or XOR constraints. Long parity constraints (involving many variables) provide strong theoretical guarantees but are computationally difficult. Short parity constraints are easier to solve but have weaker statistical properties. It is currently not known how long these parity constraints need to be. We close the gap by providing matching necessary and sufficient conditions on the required asymptotic length of the parity constraints. Further, we provide a new family of lower bounds and the first non-trivial upper bounds on the model count that are valid for arbitrarily short XORs. We empirically demonstrate the effectiveness of these bounds on model counting benchmarks and in a Satisfiability Modulo Theory (SMT) application motivated by the analysis of contingency tables in statistics.

Introduction

Model counting is the problem of computing the number of distinct solutions of a given Boolean formula. It is a classical problem that has received considerable attention from a theoretical point of view (Valiant 1979b; Stockmeyer 1985) , as well as from a practical perspective (Sang et al. 2004; Gogate and Dechter 2007) . Numerous probabilistic inference and decision making tasks, in fact, can be directly translated to (weighted) model counting problems (Richardson and Domingos 2006; Gogate and Domingos 2011) . As a generalization of satisfiability testing, the problem is clearly intractable in the worst case. Nevertheless, there has been considerable success in both exact and approximate model counting algorithms, motivated by a number of applications (Sang, Beame, and Kautz 2005) .

Many of these recent approximate counting algorithms are based on randomized hashing: they reduce the counting problem to a small number of combinatorial searches on a randomly projected version of the original formula, obtained by augmenting it with randomly generated parity or XOR constraints. This approach allows one to leverage decades of research and engineering in combinatorial reasoning technology, such as fast satisfiability (SAT) and SMT solvers (Biere et al. 2009).

While modern solvers have witnessed tremendous progress over the past 25 years, model counting techniques based on hashing tend to produce instances that are difficult to solve. In order to achieve strong (probabilistic) accuracy guarantees, existing techniques require each randomly generated parity constraint to be relatively long, involving roughly half of the variables in the original problem. Such constraints, while easily solved in isolation using Gaussian elimination, are notoriously difficult to handle when conjoined with the original formula (Gomes et al. 2007; Ermon et al. 2014; Ivrii et al. 2015; Achlioptas and Jiang 2015). Shorter parity constraints, i.e., those involving relatively few variables, are friendlier to SAT solvers, but their statistical properties are not as well understood. Ermon et al. (2014) showed that long parity constraints are not strictly necessary, and that one can obtain the same accuracy guarantees using shorter XORs, which are computationally much friendlier. They provided a closed-form expression for an XOR length that suffices, given parameters such as the number of problem variables, the number of constraints being added, and the size of the solution space under consideration. It is, however, currently not known how tight their sufficiency condition is, how it scales with various parameters, or whether it is in fact a necessary condition.

We resolve these open questions by providing an analysis of the optimal asymptotic constraint length required for obtaining high-confidence approximations to the model count. Surprisingly, for formulas with n variables, we find that when Θ(n) constraints are added, a constraint length of Θ(log n) is both necessary and sufficient. This is a significant improvement over standard long XORs, which have length Θ(n). Constraints of logarithmic length can, for instance, be encoded efficiently with a polynomial number of clauses. We also study how the sufficient constraint length evolves from O(log n) to O(n^γ log^2 n), for γ ∈ (0, 1), to n/2 across the various regimes of the number of parity constraints.

As a byproduct of our analysis, we obtain a new family of probabilistic upper and lower bounds that are valid regardless of the constraint length used. These upper and lower bounds on the model count come within a constant factor of each other as the constraint density approaches the aforementioned optimal value, and they degrade gracefully as we reduce the constraint length and the corresponding computational budget. While lower bounds for arbitrary XOR lengths were previously known (Gomes et al. 2007; Ermon et al. 2013a), this is the first non-trivial upper bound in this setting. Remarkably, even though we rely on random projections and therefore only look at subsets of the entire space (a local view, akin to traditional sampling), we are able to say something about the global nature of the space, namely a probabilistic upper bound on the number of solutions. We evaluate these new bounds on standard model counting benchmarks and on a new counting application arising from the analysis of contingency tables in statistics. These data sets are common in many scientific domains, from sociological studies to ecology (Sheldon and Dietterich 2011). We provide a new approach based on SMT solvers and a bit-vector arithmetic encoding. Our approach scales very well and produces accurate results on a wide range of benchmarks. It can also handle additional constraints on the tables, which are very common in scientific data analysis, where prior domain knowledge translates into constraints on the tables (e.g., a certain entry must be zero because the corresponding event is known to be impossible). We demonstrate the capability to handle such structural zeroes (Chen 2007) in real experimental data.

Preliminaries: Counting By Hashing

Let x_1, ..., x_n be n Boolean variables. Let S ⊆ {0, 1}^n be a large, high-dimensional set.¹ We are interested in computing |S|, the number of elements of S, when S is defined succinctly through conditions or constraints that the elements of S satisfy and membership in S can be tested using an NP oracle. For example, when S is the set of solutions of a Boolean formula over n binary variables, the problem of computing |S| is known as model counting, the canonical #P-complete problem (Valiant 1979b).

In the past few years, there has been growing interest in approximate probabilistic algorithms for model counting. It has been shown (Gomes, Sabharwal, and Selman 2006; Ermon et al. 2013b; Chakraborty, Meel, and Vardi 2013a; Achlioptas and Jiang 2015; Belle, Van den Broeck, and Passerini 2015) that one can reliably estimate |S|, both in theory and in practice, by repeating the following simple process: randomly partition S into 2^m cells, select one of these lower-dimensional cells, and check whether S has at least one element in this cell (this can be accomplished with a query to an NP oracle, e.g., by invoking a SAT solver). Somewhat surprisingly, repeating this procedure a small number of times provides a constant factor approximation to |S| with high probability, even though counting problems (in #P) are believed to be significantly harder than decision problems (in NP).

The correctness of the approach relies crucially on how the space is randomly partitioned into cells. All existing approaches partition the space into cells using parity or XOR constraints. A parity constraint defined on a subset of variables checks whether an odd or even number of the variables take the value 1. Specifically, m parity (or XOR) constraints are generated, and S is partitioned into 2^m equivalence classes based on which parity constraints are satisfied.

The way in which these constraints are generated affects the quality of the approximation of |S| (the model count) obtained. Most methods randomly generate m parity constraints by adding each variable to each constraint with probability f ≤ 1/2. This construction can also be interpreted as defining a hash function, mapping the space {0, 1}^n into 2^m hash bins (cells). Formally,

Definition 1. Let A ∈ {0, 1}^{m×n} be a random matrix whose entries are i.i.d. Bernoulli random variables with parameter f ≤ 1/2, i.e., Pr[A_ij = 1] = f. Let b ∈ {0, 1}^m be chosen uniformly at random, independently from A. Then

H^f_{m×n} = {h_{A,b} : {0, 1}^n → {0, 1}^m},

where h_{A,b}(x) = Ax + b mod 2 and h_{A,b} ∈_R H^f_{m×n} is chosen randomly according to this process, is a family of f-sparse hash functions.
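For concreteness, the following Python sketch (ours, not from the paper) samples a hash function from H^f_{m×n} exactly as in Definition 1; the function and variable names are illustrative.

```python
import numpy as np

def sample_sparse_hash(n, m, f, rng=None):
    """Sample h_{A,b}(x) = Ax + b mod 2 with i.i.d. Bernoulli(f) entries in A."""
    rng = rng or np.random.default_rng()
    A = (rng.random((m, n)) < f).astype(np.uint8)    # Pr[A_ij = 1] = f
    b = rng.integers(0, 2, size=m, dtype=np.uint8)   # b uniform over {0,1}^m
    return lambda x: (A @ np.asarray(x, dtype=np.uint8) + b) % 2

# A configuration x falls in the selected cell iff all m parity constraints hold,
# i.e., h(x) = 0.
h = sample_sparse_hash(n=20, m=5, f=0.1)
x = np.random.randint(0, 2, size=20)
in_cell = not h(x).any()
```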

The idea for estimating |S| is to define progressively smaller cells (by increasing m, the number of parity constraints used to define h) until the cells become so small that no element of S can be found inside a randomly chosen cell. The intuition is that the larger |S| is, the smaller the cells will have to be, and we can use this information to estimate |S|.

Based on this intuition, we give a hashing-based counting procedure (Algorithm 1, SPARSE-COUNT), which relies on an NP oracle O_S to check whether S has an element in the cell. It is adapted from the SPARSE-WISH algorithm of Ermon et al. (2014). The algorithm takes as input n + 1 families of f_i-sparse hash functions {H^{f_i}_{i×n}}_{i=0}^{n}, used to partition the space into cells. In practice, line 7 is implemented using a SAT solver as the NP oracle. In a model counting application, this is accomplished by adding to the original formula i parity constraints generated as in Definition 1 and checking the satisfiability of the augmented formula.

Typically, {H^{1/2}_{i×n}} is used, corresponding to XORs where each variable is added with probability 1/2 (and hence with average length n/2). We call these long parity constraints. In this case, it can be shown that SPARSE-COUNT outputs a factor-16 approximation of |S| with probability at least 1 − ∆ (Ermon et al. 2014). Unfortunately, checking satisfiability (i.e., whether S(h^i_{A,b}) ≥ 1, line 7) has been observed to be very difficult when many long parity constraints are added to a formula (Gomes et al. 2007; Ermon et al. 2014; Ivrii et al. 2015; Achlioptas and Jiang 2015). Note, for instance, that while a parity constraint of length one simply freezes a variable right away, a parity constraint of length k can be propagated only after k − 1 of its variables have been set. From a theoretical perspective, parity constraints are known to be fundamentally difficult for the resolution proof system underlying SAT solvers (cf. the exponential scaling of Tseitin tautologies (Tseitin 1968)).

Algorithm 1 SPARSE-COUNT(O_S, ∆, α, {H^{f_i}_{i×n}}_{i=0}^{n})
1:  T ← (ln(1/∆)/α) · ln n
2:  i = 0
3:  while i ≤ n do
4:      for t = 1, ..., T do
5:          h^i_{A,b} ← hash function sampled from H^{f_i}_{i×n}
6:          Let S(h^i_{A,b}) = |{x ∈ S | h^i_{A,b}(x) = 0}|
7:          w^t_i ← I[S(h^i_{A,b}) ≥ 1], invoking O_S
8:      end for
9:      if Median(w^1_i, ..., w^T_i) < 1 then
10:         break
11:     end if
12:     i = i + 1
13: end while
14: Return 2^{i−1}
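The Python sketch below (ours) mirrors Algorithm 1, with the NP-oracle query of line 7 abstracted into a user-supplied callback has_solution_in_cell(i, f_i); in a model counting setting this callback would add i random XORs of density f_i to the formula and invoke a SAT solver. The callback name and the rounding of T are our own choices.

```python
import math
import statistics

def sparse_count(has_solution_in_cell, n, densities, delta, alpha):
    """SPARSE-COUNT sketch: returns an estimate of |S| as a power of two.
    densities[i] plays the role of f_i; has_solution_in_cell(i, f) must draw a
    fresh h from H^f_{i x n} and report whether some x in S has h(x) = 0."""
    T = max(1, round(math.log(1 / delta) / alpha * math.log(n)))  # line 1
    i = 0
    while i <= n:                                                 # line 3
        w = [int(has_solution_in_cell(i, densities[i])) for _ in range(T)]
        if statistics.median(w) < 1:                              # line 9
            break
        i += 1
    return 2 ** (i - 1)                                           # line 14
```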

A natural question, therefore, is whether short parity constraints can be used in SPARSE-COUNT and still provide reliable bounds on |S|.

Intuitively, for the method to work we want the hash functions {H fi i×n } to have a small collision probability. In other words, we want to ensure that when we partition the space into cells, configurations from S are divided into cells evenly. This gives a direct relationship between the original number of solutions |S| and the (random) number of solutions in one (randomly chosen) cell, S(h). More precisely, we say that the hash family shatters S if the following holds:

Definition 2. For ε > 0, a family of hash functions H^f_{i×n} ε-shatters a set S if Pr[S(h) ≥ 1] ≥ 1/2 + ε when h ∈_R H^f_{i×n}, where S(h) = |{x ∈ S | h(x) = 0}|.

The crucial property we need to obtain reliable estimates is that the hash functions (equivalently, parity constraints) are able to shatter sets S with arbitrary "shape". This property is both sufficient and necessary for SPARSE-COUNT to provide accurate model counts with high probability:

Theorem 1. (Informal statement) A necessary and sufficient condition for SPARSE-COUNT to provide a constant factor approximation to |S| is that each family H^{f_i}_{i×n} ε-shatters all sets S' of size |S'| = 2^{i+c} for some c ≥ 2.

A formal statement, along with all proofs, is provided in a companion technical report (Zhao et al. 2015 ).

Long parity constraints, i.e., 1/2-sparse hash functions, are capable of shattering sets of arbitrary shape. When h ∈_R H^{1/2}_{i×n}, it can be shown that configurations x ∈ {0, 1}^n are placed into hash bins (cells) pairwise independently, and this guarantees shattering of sufficiently large sets of arbitrary shape. Recently, Ermon et al. (2014) showed that sparser hash functions can be used for approximate counting as well. In particular, f*-sparse hash functions, for a sufficiently large f* < 1/2, were shown to have good enough shattering capabilities. It is currently not known whether f* is the optimal constraint density.
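The shattering probability of Definition 2 can be probed empirically for any explicitly enumerated set. The Monte Carlo sketch below (ours, not from the paper) estimates Pr[S(h) ≥ 1] for a hypercube-shaped set of the kind used in the lower-bound argument of the next section, making it easy to compare long (f = 0.5) and short (small f) constraints.

```python
import numpy as np

def shatter_probability(S, m, f, trials=2000, rng=None):
    """Monte Carlo estimate of Pr[S(h) >= 1] for h drawn from H^f_{m x n};
    S is a (|S|, n) 0/1 array listing the set's elements explicitly."""
    rng = rng or np.random.default_rng(0)
    S = np.asarray(S, dtype=np.uint8)
    n, hits = S.shape[1], 0
    for _ in range(trials):
        A = (rng.random((m, n)) < f).astype(np.uint8)
        b = rng.integers(0, 2, size=m, dtype=np.uint8)
        survives = ((S @ A.T + b) % 2 == 0).all(axis=1)   # h(x) = 0 for x in S?
        hits += bool(survives.any())
    return hits / trials

# A 2^(m+c)-element "hypercube" set that fixes all but its first m+c coordinates.
n, m, c = 30, 8, 2
cube = np.array([[(idx >> j) & 1 for j in range(n)] for idx in range(2 ** (m + c))],
                dtype=np.uint8)
print(shatter_probability(cube, m, f=0.5), shatter_probability(cube, m, f=0.05))
```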

Asymptotically Optimal Constraint Density

We analyze the asymptotic behavior of the minimum constraint density f needed for SPARSE-COUNT to produce correct bounds with high confidence. As noted earlier, the bottleneck lies in ensuring that f is large enough for a randomly chosen hash bin to receive at least one element of the set S under consideration, i.e., for the hash family to shatter S.

Definition 3. Let n, m ∈ N, n ≥ m. For any fixed ε > 0, the minimum constraint density f = f(m, n) is defined as the pointwise smallest function such that for any constant c ≥ 2, H^f_{m×n} ε-shatters all sets S ⊆ {0, 1}^n of size 2^{m+c}.

We will show (Theorem 2) that for any ε > 0, f(m, n) = Ω(log m / m), and that this is asymptotically tight when ε is small enough and m = Θ(n), which in practice is often the computationally hardest regime of m. Further, for the regime of m = Θ(n^β) with β < 1, we show that f(m, n) = O(log^2 m / m). Combined with the observation that f(m, n) = Θ(1) when m = Θ(1), our results thus reveal how the minimum constraint density evolves from a constant to Θ(log m / m) as m increases from a constant to being linearly related to n.

The minimum average constraint length, n · f(m, n), correspondingly decreases from n/2 to O(n^{1−β} log^2 n) to Θ(log n), showing that in the computationally hardest regime of m = Θ(n), the parity constraints can in fact be represented using only 2^{Θ(log n)}, i.e., a polynomial, number of SAT clauses.

Theorem 2. Let n, m ∈ N, n ≥ m, and κ > 1. The minimum constraint density f(m, n) behaves as follows:

1. Let ε > 0. There exists M_κ such that for all m ≥ M_κ: f(m, n) > log m / (κ m).

2. Let ε ∈ (0, 3/10), α ∈ (0, 1), and m = αn. There exists N such that for all n ≥ N: f(m, n) ≤ (3.6 − (5/4) log_2 α) · log m / m.

3. Let ε ∈ (0, 3/10), α, β ∈ (0, 1), and m = αn^β. There exists N_κ such that for all n ≥ N_κ: f(m, n) ≤ κ · ((1 − β)/(2β)) · log^2 m / m.
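To get a feel for the magnitudes involved, the snippet below (ours) evaluates the bounds of Theorem 2 for a concrete n and m = αn; the constants mirror the statement above and should be read as indicative only.

```python
import math

def density_necessary(m, kappa=1.01):
    # Theorem 2, item 1: f(m, n) > log(m) / (kappa * m) for all large enough m
    return math.log(m) / (kappa * m)

def density_sufficient_linear(m, alpha):
    # Theorem 2, item 2 (m = alpha*n): f(m, n) <= (3.6 - (5/4) log2(alpha)) * log(m) / m
    return (3.6 - 1.25 * math.log2(alpha)) * math.log(m) / m

n, alpha = 10_000, 0.5
m = int(alpha * n)
f_lo, f_hi = density_necessary(m), density_sufficient_linear(m, alpha)
print(f_lo, f_hi, n * f_hi)   # average XOR length n*f_hi is O(log n), far below n/2
```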

The lower bound in Theorem 2 follows from analyzing the shattering probability of an (m + c)-dimensional hypercube S_c = {x | x_j = 0 ∀ j > m + c}. Intuitively, random parity constraints of density smaller than log m / m do not even touch the m + c relevant (i.e., non-fixed) dimensions of S_c with a high enough probability, and thus cannot shatter S_c (because all elements of S_c would be mapped to the same hash bin).

For the upper bounds, we exploit the fact that f(m, n) is at most the f* function introduced by Ermon et al. (2014), and provide an upper bound on the latter. Intuitively, f* was defined as the minimum function such that the variance of S(h) is relatively small. The variance was upper bounded by considering the worst-case "shape" of S: points packed together unrealistically tightly, all fitting within Hamming distance w* of a point. For the case of m = αn, we observe that w* must grow as Θ(n), and divide the expression bounding the variance into two parts: terms corresponding to points that are relatively close (within distance λn for a particular λ) are shown to contribute a vanishingly small amount to the variance, while terms corresponding to points that are farther apart are shown to behave as if they contribute to S(h) in a pairwise independent fashion. The log m / m bound is somewhat natural and also arises in the analysis of the rank of sparse random matrices and random sparse linear systems (Kolchin 1999). For example, this threshold governs the asymptotic probability that a matrix A generated as in Definition 1 has full rank (Cooper 2000). The connection arises because, in our setting, the rank of the matrix A affects the quality of hashing: an all-zero matrix A (of rank 0), for example, would map all points to the same hash bucket.
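As a small illustration of this rank connection (ours, not from the paper), the following code estimates the probability that a matrix A drawn as in Definition 1 has full row rank over GF(2), for densities around log(m)/m.

```python
import numpy as np

def gf2_row_rank(A):
    """Row rank of a 0/1 matrix over GF(2), by Gaussian elimination."""
    A = A.copy()
    rank, col, rows, cols = 0, 0, A.shape[0], A.shape[1]
    while rank < rows and col < cols:
        pivots = np.nonzero(A[rank:, col])[0]
        if pivots.size:
            A[[rank, rank + pivots[0]]] = A[[rank + pivots[0], rank]]  # swap pivot up
            below = np.nonzero(A[rank + 1:, col])[0] + rank + 1
            A[below] ^= A[rank]                                        # eliminate below
            rank += 1
        col += 1
    return rank

def full_rank_probability(n, m, f, trials=300, rng=None):
    rng = rng or np.random.default_rng(0)
    return np.mean([gf2_row_rank((rng.random((m, n)) < f).astype(np.uint8)) == m
                    for _ in range(trials)])

n, m = 200, 100
for f in (0.01, np.log(m) / m, 0.1, 0.5):
    print(round(f, 3), full_rank_probability(n, m, f))
```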

Improved Bounds On The Model Count

In the previous sections, we established the optimal (smallest) constraint density that provides a constant factor approximation to the model count |S|. However, depending on the size and structure of S, even adding constraints of density f* < 0.5 can lead to instances that cannot be solved by modern SAT solvers (see Table 1 below).

Table 1: Comparison of run time and bound quality on the ANOR2011 dataset. All true counts and bounds are in log scale.

In this section we show that for f < f* we can still obtain probabilistic upper and lower bounds. These bounds embody a trade-off: a small f yields easily solved NP-oracle queries, while an f close to f* yields a guaranteed constant factor approximation.

To facilitate the discussion, we define S(h) = |{x ∈ S | h(x) = 0}| = |S ∩ h^{−1}(0)| to be the random variable indicating how many elements of S survive h, when h is randomly chosen from H^f_{m×n}.

Tighter Lower Bound On |S|

Our lower bound is based on Markov's inequality and the fact that the mean of S(h), µ_S = |S| 2^{−m}, has a simple linear relationship to |S|. Previous lower bounds (Gomes et al. 2007; Ermon et al. 2013a) are based on the observation that if at least half of T repetitions of applying h to S resulted in some element of S surviving, there are likely to be at least 2^{m−2} solutions in S; otherwise, µ_S would be too small (specifically, ≤ 1/4) and it would be unlikely to see solutions survive so often. Unlike previous methods, we not only check whether the estimated Pr[S(h) ≥ 1] is at least 1/2, but also use an empirical estimate of Pr[S(h) ≥ 1] itself. This results in a tighter lower bound, with a probabilistic correctness guarantee derived using Chernoff's bound.

Lemma 1. Let S ⊆ {0, 1}^n, f ∈ [0, 1/2], and for each m ∈ {1, 2, ..., n}, let hash function h_m be drawn randomly from H^f_{m×n}. Then

Pr[S(h_m) ≥ 1] ≤ E[S(h_m)] = |S| · 2^{−m}.    (1)

Our theoretical lower bound is L = 2^m Pr[S(h) ≥ 1], which satisfies L ≤ |S| by the previous Lemma. In practice we cannot compute Pr[S(h) ≥ 1] with infinite precision, so our practical lower bound is L̂ = 2^m Pr_est[S(h) ≥ 1], where the estimator Pr_est is defined below. We would like a statement of the form Pr[|S| ≥ L̂] ≥ 1 − δ. It turns out that we can guarantee, with high probability, that |S| is larger than the estimated lower bound shrunk by a (1 + κ) factor. Let Y_k = I[|h_k^{−1}(0) ∩ S| ≥ 1] and Y = Σ_{k=1}^{T} Y_k, and let ν = E[Y] = T · E[Y_1] = T · Pr[S(h) ≥ 1]. We define our estimator to be Pr_est[S(h) ≥ 1] = Y/T. Using Chernoff's bound, we have the following result.

Theorem 3. Let κ > 0. If Pr_est[S(h) ≥ 1] ≥ c, then

Pr[ |S| ≥ 2^m c / (1 + κ) ] ≥ 1 − exp( −κ^2 c T / ((1 + κ)(2 + κ)) ).
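A small Python sketch (ours, with illustrative names) of how Theorem 3 turns T oracle answers into a lower bound on |S| together with the confidence with which it holds:

```python
import math

def lower_bound_with_confidence(Y, m, kappa=1.0):
    """Y: list of 0/1 indicators I[S(h_k) >= 1] from T independent draws of h
    with m constraints each. Returns (lower bound on |S|, confidence)."""
    T = len(Y)
    c = sum(Y) / T                                  # Pr_est[S(h) >= 1]
    bound = (2 ** m) * c / (1 + kappa)              # estimated count, shrunk by 1+kappa
    confidence = 1 - math.exp(-(kappa ** 2) * c * T / ((1 + kappa) * (2 + kappa)))
    return bound, confidence

# e.g. solutions survived in 80 of 100 trials with m = 12 constraints:
print(lower_bound_with_confidence([1] * 80 + [0] * 20, m=12))
```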

New Upper Bound For |S|

The upper bound on |S| that we derive next is based on the contrapositive of the observation of Ermon et al. (2014) that the larger |S| is, the smaller an f suffices to shatter it. For n, m, f, and the quantity ε(n, m, q, f) from (Ermon et al. 2014), define:

v(q) = (q / 2^m) · (1 + ε(n, m, q, f) · (q − 1)) − (q / 2^m)^2    (2)

This quantity is an upper bound on the variance Var[S(h)] of the number of surviving solutions, as a function of the size q of the set, the number of constraints m, and the statistical quality of the hash functions, which is controlled by the constraint density f. The following lemma characterizes the behavior of this upper bound on the variance:

Lemma 2. q^2 / v(q) is an increasing function of q.

Using Lemma 2, we are ready to obtain an upper bound on the size of the set S in terms of the probability Pr[S(h) ≥ 1] that at least one configuration from S survives after adding the randomly generated constraints.

Lemma 3. Let S ⊆ {0, 1}^n and h ∈_R H^f_{m×n}. Then

|S| ≤ min { z | 1 / (1 + 2^{2m} v(z)/z^2) > Pr[S(h) ≥ 1] }.

The probability Pr[S(h) ≥ 1] is unknown, but it can be estimated from samples. In particular, we can draw independent samples of the hash functions and obtain accurate estimates using Chernoff-style bounds. We obtain the following theorem:

Theorem 4. Let S ⊆ {0, 1}^n and ∆ ∈ (0, 1). Suppose we draw T = 24 ln(1/∆) hash functions h_1, ..., h_T from H^f_{m×n}. If Median(I[S(h_1) ≥ 1], ..., I[S(h_T) ≥ 1]) = 0, then

|S| ≤ min { z | 1 / (1 + 2^{2m} v(z)/z^2) ≥ 3/4 }

with probability at least 1 − ∆.

This theorem provides us with a way of computing upper bounds on the cardinality of S that hold with high probability. The bound is known to be tight (matching the lower bound derived in the previous section) when the family of hash functions used is fully pairwise independent, i.e., when f = 0.5.
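The upper bound of Theorem 4 can be computed numerically once ε(n, m, q, f) is available; we treat ε as a caller-supplied function eps_fn, an assumption, since its closed form from Ermon et al. (2014) is not reproduced here. The sketch below (ours, with illustrative names) uses the monotonicity guaranteed by Lemma 2 to binary-search for the smallest qualifying z.

```python
def variance_bound(q, m, eps):
    """v(q) from Eq. (2); eps = epsilon(n, m, q, f) must be supplied by the caller."""
    mu = q / 2 ** m
    return mu * (1 + eps * (q - 1)) - mu ** 2

def upper_bound(m, n, eps_fn, target=0.75, z_max=None):
    """Smallest z with 1 / (1 + 2^{2m} v(z)/z^2) >= target, i.e., the bound of
    Theorem 4 once the median test passes. Monotone in z by Lemma 2, so a
    binary search suffices. z_max caps the search to avoid float overflow."""
    lo, hi = 1, z_max if z_max is not None else 2 ** min(n, 60)

    def qualifies(z):
        v = variance_bound(z, m, eps_fn(z))
        return 1.0 / (1.0 + (2 ** (2 * m)) * v / z ** 2) >= target

    if not qualifies(hi):
        return hi            # the bound is vacuous for this f and m
    while lo < hi:
        mid = (lo + hi) // 2
        if qualifies(mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```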

Experimental Evaluation

Model Counting Benchmarks

We evaluate the quality of our new bounds on a standard model counting benchmark (ANOR2011) from Kroc, Sabharwal, and Selman (2011). Both the lower and the upper bounds presented in the previous section are parametric: they depend on m, the number of constraints, and on f, the constraint density. Increasing f is always beneficial for the bounds, but can substantially increase the runtime. The dependence on m is more complex, and we explore it empirically. To evaluate our new bounds, we consider a range of values for f ∈ [0.01, 0.5], and use a heuristic approach that first identifies a promising value for m using a small number of samples T, and then collects more samples for that m to reliably estimate Pr[S(h) ≥ 1] and improve the bounds on |S|.

We primarily compare our results to ApproxMC (Chakraborty, Meel, and Vardi 2013b), which can compute a constant factor approximation to user-specified precision with arbitrary confidence (at the cost of more computation time). ApproxMC is similar to SPARSE-COUNT and uses long parity constraints. For both methods, Cryptominisat version 2.9.4 (Soos, Nohl, and Castelluccia 2009) is used as the NP oracle O_S (with Gaussian elimination enabled), and the confidence parameter is set to 0.95, so that the reported bounds hold with 95% probability.

Our results on the Langford12 instance (576 variables, 13584 clauses, roughly 10^5 ≈ exp(11.5) solutions) are shown in Figure 1. The pattern is representative of all other instances we tested. The tradeoff between the quality of the bounds and the runtime, which is governed by f, clearly emerges. Instances with small f values can be solved orders of magnitude faster than with full-length XORs (f ≈ 0.5), but provide looser bounds. Interestingly, the lower bounds are not very sensitive to f, and we empirically obtain good bounds even for very small values of f. We also evaluate ApproxMC (horizontal and vertical lines) with parameters ε = 0.75 and δ = 0.05, obtaining an 8-approximation with probability at least 0.95; its runtime is 47042 seconds. It can be seen that ApproxMC and our bounds offer comparable model counts for dense constraints (f ≈ 0.5). However, our method allows one to trade off computation time against the quality of the bounds. We obtain non-trivial upper bounds using as little as 0.1% of the computational resources required with long parity constraints, a flexibility not offered by any other method. Table 1 summarizes our results on other instances from the benchmark and compares them with ApproxMC run with a 12-hour timeout. We see that for the instances on which ApproxMC is successful, our method obtains approximate model counts of comparable quality and is generally faster. While ApproxMC requires searching for tens or even hundreds of solutions during each iteration, our method needs only one solution per iteration. Further, we see that long parity constraints can lead to very difficult instances that cannot be solved, reinforcing the benefit of provable upper and lower bounds using sparse constraints (small f). Our method is designed to produce non-trivial bounds even when the computational budget is severely limited, with bounds degrading gracefully with runtime.

Figure 1: Log upper and lower bounds vs. computation time. Point labels indicate the value of f used.

SMT Models For Contingency Table Analysis

Table 2: Lower (LB) and upper (UB) bounds on log_2 |S|. The trivial upper bound (Trv. UB) is the number of binary variables. f* denotes the best previously known minimum f (Ermon et al. 2014) required for provable upper bounds.

Conclusions

We introduced a novel analysis of the randomized hashing schemes used by numerous recent approximate model counters and probabilistic inference algorithms. We close a theoretical gap, providing a tight asymptotic estimate for the minimal constraint density required. Our analysis also shows, for the first time, that even very short parity constraints can be used to generate non-trivial upper bounds on model counts. Thanks to this finding, we proposed a new scheme for computing anytime upper and lower bounds on the model count. Asymptotically, these bounds are guaranteed to become tight (up to a constant factor) as the constraint density grows. Empirically, given very limited computational resources, we are able to obtain new upper bounds on a variety of benchmarks, including a novel application for the analysis of statistical contingency tables.

A promising direction for future research is the analysis of related ensembles of random parity constraints, such as low-density parity check codes (Achlioptas and Jiang 2015) .

¹ We restrict ourselves to the binary case for ease of exposition. Our work can be naturally extended to categorical variables.