How to decide, or not

Burton Rosenberg
1 April 2003

Overview

This note discusses randomized complexity classes. I take a slightly different approach in that my classification algorithms allow for three outputs: accept, reject, and undecided. Typically, classification is binary, 0 or 1, mapping one output value to accept and the other to reject. Only in the intersection of certain classes does the third state play a crucial role. However, the three-state approach, it seems to me, better highlights the intuitive content of these classes.

We are concerned with deciding set membership. That is, for a set X we seek a mechanical procedure, an algorithm A, such that A(x) will decide whether x is in X or not. This might seem a strong restriction. Algorithms are more often thought of as functions, computing an output. But these functions can be computed using set-deciding algorithms with a bit of additional algorithmic glue.

For instance, consider the two problems: decide whether graph G contains a Hamiltonian cycle; and produce a Hamiltonian cycle in G if one exists. An algorithm which produces a Hamiltonian cycle will decide whether such a cycle exists. Conversely, we can iteratively ask decision questions of the form: does there exist a Hamiltonian cycle in G which uses the edges {e_1, ..., e_k}? Each time the answer is affirmative, another edge is added and the query repeated. If the answer is negative, the last edge is removed and another tried.
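
As a sketch of this reduction, in Python: suppose we are handed a decision procedure has_cycle_using(forced), a hypothetical interface assumed only for illustration, answering whether G contains a Hamiltonian cycle passing through every edge in the set forced. The cycle is then assembled one edge at a time,

    def build_hamiltonian_cycle(edges, has_cycle_using):
        # edges: the list of edges of G; has_cycle_using(forced) is the
        # assumed decision procedure described above
        if not has_cycle_using(set()):
            return None                    # G has no Hamiltonian cycle at all
        chosen = set()
        for e in edges:
            # affirmative: commit to edge e; negative: drop it, try the next
            if has_cycle_using(chosen | {e}):
                chosen.add(e)
        return chosen                      # the edge set of one Hamiltonian cycle

Each edge of G is queried once, so the producer costs only one decision query per edge, plus one.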

This situation is general. Any function can be computed with a small number of set membership queries. So without further explanation or embarrassment, we restrict our discussion to set membership queries.

To discuss sets sufficiently broadly, we will assume a universe of strings and let the subset be some among these strings. Using various encodings, pretty much anything falls under this interpretation. For instance, a graph can be represented by parenthesis-enclosed, comma-delimited pairs of integers, where each vertex has an integer name. The complete graph on four vertices, with an edge between every pair of vertices, is represented by the string,

   (1,2)(2,3)(3,1)(1,4)(2,4)(3,4)
So the set of graphs with Hamiltonian cycles is a set of strings, of proper syntax, using characters drawn from the parentheses, the comma, and the digits 0 through 9.
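
A minimal sketch of this encoding, in Python, with a hypothetical function name,

    def encode_graph(edges):
        # each edge is a pair of integer vertex names; the encoding is the
        # parenthesis-enclosed, comma-delimited format shown above
        return "".join("(%d,%d)" % (u, v) for (u, v) in edges)

    # the complete graph on four vertices
    k4 = [(1,2), (2,3), (3,1), (1,4), (2,4), (3,4)]
    assert encode_graph(k4) == "(1,2)(2,3)(3,1)(1,4)(2,4)(3,4)"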

Our mechanical procedure is given a string, performs computation, and eventually declares that the string is in the subset, that it is not in the subset, or that the procedure has not decided either way.

In the broadest sense, the possibility of indecision is an inescapable condition of mathematics. Classes of formal and precise mathematical questions cannot be answered universally by some fixed mechanical procedure. Such problems are called undecidable. That is very interesting, but not of primary interest in this note. We instead look at procedures, hereafter referred to as algorithms, which are not only mechanical but efficient. Given an input, the decision occurs quickly. The speed is measured in the number of elementary steps as a function of the question's size. We are allowed to consider larger questions for longer time. However, the time must be bounded by a polynomial in the input size. These are the P-time problems.

Originally, P-time problems were considered the natural class for effectively computable set membership. Later it seemed that this definition was too restrictive. Effectively computable problems, it seemed, should include those set membership questions which can be decided, if not absolutely, then with a small probability of error: that is, of incorrectly classifying a string in the set as being not in the set, or a string not in the set as being in the set. Everyone makes mistakes, and the world goes on, so why not computers? As long as the quantity of mistakes is small, if this relaxation of perfection allows more things to be quickly accomplished, perhaps it is a better model for what is effectively computable.

We need to be a bit more specific about the meaning of error. We consider a probabilistic algorithm which, in the hope of running quickly, makes guesses at various moments in the computation. The guesses are modeled as coin flips. If the coin flips are lucky, the string is correctly classified. Unlucky combinations of coin flips will incorrectly classify the string. Overall, for any string, there is a certain probability of tossing coins in an unlucky manner, and one minus this probability is the probability of tossing coins in a lucky manner. The probability of error is the probability of an unlucky sequence of coin flips.

For emphasis: it is not that some strings will always be misclassified, rather that any string will probably be correctly classified. When a coin-flipping algorithm determines membership, with bounded probability of error, in a number of steps polynomial in the size of the input, the problem is in the class BPP, Bounded Probability of error Polynomial time. This is currently the leading candidate for a model of effective computability.

Formalities

The problem is encoded in some string of symbols. The set of symbols is S, and the notation S^* denotes the set of all finite length strings of symbols from S; such a string is also called a word. A subset L ⊆ S^* is called a language. Each word x has a length, |x|, the number of symbols in the word. A polynomial time algorithm A takes x as an input and accepts, rejects or is undecided, in a number of steps bounded by O(n^k), where n = |x|, for some fixed k.

We further let the algorithm flip coins. Or rather, we provide the algorithm with a tape of ready-made coin flips which it references whenever it wishes to flip a coin. The tape need only be of size O(n^k), the length of the longest computation. We ensure that the contents of the tape fairly represent any sequence of coin flips. That is, the tape is drawn randomly and without bias from among every possible coin-flip sequence.

Formally, the algorithm A inputs the pair x, the problem description, and y, the tape of coin flips, and accepts, rejects or returns undecided in polynomial time. We use two concepts in making a connection between the output of A and the fact of whether x is or is not in the language: effectiveness and correctness. Effectiveness refers to how likely it is that x will be classified. Correctness refers to the guarantees that x, if classified, is classified correctly.
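
A minimal sketch of this setup, in Python; the names and the exponent k are assumptions for illustration, and the algorithm A itself is left abstract,

    import random

    ACCEPT, REJECT, UNDECIDED = "accept", "reject", "undecided"

    def classify(A, x, k):
        # draw the coin-flip tape y uniformly at random, then run the
        # algorithm; A(x, y) is assumed to return one of ACCEPT, REJECT
        # or UNDECIDED within O(|x|^k) steps
        y = [random.randint(0, 1) for _ in range(len(x) ** k)]
        return A(x, y)

The probabilities below always refer to this uniform choice of the tape y.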

Languages P, NP and co-NP

A language is in P if there exists an algorithm which is one hundred percent effective and one hundred percent correct. That is,

    effectiveness: x in L implies for all y, A(x,y) accepts
    correctness: if A(x,y) accepts then x in L
In light of this, the algorithm need not be shy about rejecting. If it does not accept x, then x must not be in L and the algorithm can report reject. It is not necessary for the algorithm to report undecided, and even if it does, undecided necessarily means reject.

From the viewpoint of selecting y at random, we have a probabilistic definition,

    effectiveness: x in L implies Prob( A(x,y) accepts ) = 1
    correctness: if A(x,y) accepts then x in L.
where the probability refers to the choice of y. From this we see that A doesn't require the input y at all: since any y will do, we can fix it, say, as the string of all heads, or omit it entirely from the algorithm and the notation.

The second main class to be identified is NP. It requires only a very weak effectiveness, but strong, if one-sided, correctness,

    effectiveness: x in L implies exists a y such that A(x,y) accepts
    correctness: if A(x,y) accepts then x in L
Said in probability,
    effectiveness: x in L implies Prob( A(x,y) accepts ) > 0
    correctness: if A(x,y) accepts then x in L
What is curious about NP is that although it might be hard to classify a string, the classification x in L is absolute. The y for which A(x,y) accepts proves x in L, and y is called a witness. The class NP is often referred to as the class with polynomial time checkable proofs of membership. Furthermore every x in L has a witness for membership in L.

If A accepts then x is in L. However, if A does not accept there is nothing we can say about the membership of x. It is possible that A rejects only when x is not in L, but this is not required by the definition. Strictly speaking, any result besides accept should be interpreted as undecided.
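
For the Hamiltonian cycle language, for instance, the witness y can be taken to encode an ordering of the vertices, and A is a simple polynomial time check. A sketch, in Python, with the graph given as a set of edges and names chosen for illustration,

    def verify_hamiltonian(edges, y):
        # edges: a set of vertex pairs; y: a proposed ordering of vertices
        vertices = {v for e in edges for v in e}
        if sorted(y) != sorted(vertices):
            return "undecided"         # y is not a permutation of the vertices
        for i in range(len(y)):
            u, v = y[i], y[(i + 1) % len(y)]
            if (u, v) not in edges and (v, u) not in edges:
                return "undecided"     # a required edge is absent
        return "accept"                # y is a witness: x is in L

In keeping with the definition, anything short of a valid witness yields undecided, never reject.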

The class NP, according to the definition, never produces a proof of non-membership. The class which does this is called co-NP,

    effectiveness: x not in L implies Prob( A(x,y) rejects ) > 0
    correctness: if A(x,y) rejects then x is not in L
A co-NP algorithm is an NP algorithm for the complement of the language L. In this case, y is a witness proving the non-membership of x in L. Any output of A other than reject should be considered undecided.

The intersection of NP and co-NP is the class of problems with weak effectiveness both for inclusion and exclusion in the language L, but if it concludes, it concludes correctly,

   effectiveness: x in L implies Prob( A(x,y) accepts ) > 0,
                  and x not in L implies Prob( A(x,y) rejects ) > 0
   correctness: if A(x,y) accepts then x is in L, and if
                A(x,y) rejects, then x is not in L.
The event that A is undecided may be likely or unlikely. If it is not undecided, it correctly classifies x and provides a witness y to prove the classification correct.

Consider an oracle machine Or which, given x, returns its witness y immediately. Then the combination machine A(x, Or(x)) is a P time algorithm. One might try to simulate the oracle by having a non-oracle machine try all possible y, looking for a witness; however, the combination machine would then take time exponential in the length of x, since the number of different coin-flip sequences on a tape of polynomial length is exponential. A P time machine is an NP machine which has no need for an oracle. In polynomial time it can generate its own witness, or the computation is its own witness. Hence all P languages are NP languages,

   P ≤ NP
It is not known whether all NP languages are also P languages. That is, whether oracles are necessary facts of life or whether every oracle can be replaced with a P time calculator of a witness.
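
A sketch, in Python, of the exhaustive simulation just mentioned; itertools.product enumerates all 2^m coin-flip tapes of length m, which is what makes the running time exponential,

    from itertools import product

    def exhaustive_search(A, x, m):
        # try every coin-flip tape of length m, looking for a witness
        for y in product([0, 1], repeat=m):
            if A(x, list(y)) == "accept":
                return "accept"        # found a witness: x is in L
        return "undecided"             # no witness among all 2^m tapes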

The so called P equals NP question is the major outstanding question in the complexity of computation. It has practical consequences. Many important optimization problems are in NP. While P problems are considered effectively computable, NP problems are not. If the classes were proved unequal, then this would show that there really is something more difficult about the NP optimization problems, and their intractability is something we are just going to have to live with. If P equals NP, then the optimization problems in NP will have efficient solutions, with great benefit, at least to those seeking solutions to NP problems.

Languages RP, co-RP and ZPP

If the effectiveness of NP is boosted somewhat, then by repeated random choices of y we can likely find a witness which correctly classifies x. This is the class RP, Random Polynomial time,

    effectiveness: x in L implies Prob( A(x,y) accepts ) > epsilon,
                   some fixed positive epsilon.
    correctness: if A(x,y) accepts then x is in L
Again, y is a witness to membership. If x is not in L, nothing is said except that A will not accept. It might be that A rejects, it might be that A is undecided. With some probability A will be undecided even if x is in L.

The probability that A will be undecided for an x in L can be made arbitrarily small by iterating A with various choices of y, since k iterations will reduce this indecision to no more than,

  (1-epsilon)^k ≤ (1/e)^{k epsilon}
Setting k to a polynomial in n, not only will the probability of indecision approach zero with increasing problem size, but the composite of A repeated k times will still be polynomial time.

This gives a randomized (or approximate) polynomial time algorithm for membership in L: if after a suitable number of tries A does not accept, reject. It is possible, with a small probability, that A will reject x which are in L. The error is one-sided, A will never accept an x which is not in L.
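
A sketch of the amplified algorithm, in Python; the tape length m and the trial count k are parameters, with k polynomial in n,

    import random

    def amplified_rp(A, x, k, m):
        # run the RP algorithm k times on fresh random tapes; any single
        # accept is trusted, since RP never wrongly accepts
        for _ in range(k):
            y = [random.randint(0, 1) for _ in range(m)]
            if A(x, y) == "accept":
                return "accept"        # certain: x is in L
        # reached with probability at most (1-epsilon)^k when x is in L
        return "reject"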

The class co-RP has bounded effectiveness for rejection of membership, and is always correct if it claims non-membership, providing a witness y of non-membership,

    effectiveness: x not in L implies Prob( A(x,y) rejects ) > epsilon,
                   some fixed positive epsilon.
    correctness: if A(x,y) rejects then x is not in L

The intersection of RP and co-RP is called ZPP. A ZPP machine might be undecided, but if it does decide, it does so correctly. Furthermore, the chances of deciding are good. Even if epsilon is small, the probability of deciding can be driven towards one by repeated trials with different y until a witness is found --- either a witness of membership or of non-membership. The class ZPP is called Zero Probability of error Polynomial time.
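
A sketch of this retry loop, in Python; the answer, whenever it comes, is always correct, and only the running time is left to chance, with an expected number of trials at most 1/epsilon,

    import random

    def zpp_decide(A, x, m):
        # rerun the three-output algorithm with fresh coins until it decides
        while True:
            y = [random.randint(0, 1) for _ in range(m)]
            answer = A(x, y)
            if answer != "undecided":
                return answer          # "accept" or "reject", both certain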

Since NP requires just one witness, whereas RP requires a plenitude, relatively speaking, every RP language qualifies as NP. So we have the inclusions,

     P ≤ ZPP ≤ RP ≤ NP
     P ≤ ZPP ≤ co-RP ≤ co-NP
It is not known whether any of the relations are strict.

Language BPP

The languages now thought of as best capturing the notion of effectively computable are those whose membership can be quickly decided with a small probability of true error, that is, of mistaking membership for non-membership or vice versa. This class, BPP, has the following characterization,

      effectiveness: Prob( A(x,y) accepts or rejects ) = 1
      correctness: if x in L, Prob( A(x,y) accepts ) > epsilon_a,
                   and if x not in L, Prob( A(x,y) accepts ) < epsilon_r,
                   where 0 ≤ epsilon_r < epsilon_a
There are several other definitions. Let A be a polynomial time algorithm which always either accepts or rejects. The following are equivalent,

  1. exist 0 < epsilon_e < epsilon_c such that
    if x in L, Prob( A(x,y) accepts ) > epsilon_c,
    if x not in L, Prob( A(x,y) accepts) < epsilon_e.

  2. exist 0 < epsilon_e < epsilon_c such that
    if x in L, Prob( A(x,y) rejects) < epsilon_e,
    if x not in L, Prob( A(x,y) rejects ) > epsilon_c.

  3. exists 0 < epsilon such that
    if x in L, Prob( A(x,y) accepts ) > 1/2 + epsilon,
    if x not in L, Prob( A(x,y) rejects ) > 1/2 + epsilon.

  4. for any 1/2 < epsilon < 1,
    if x in L, Prob( A(x,y) accepts ) > epsilon,
    if x not in L, Prob( A(x,y) rejects ) > epsilon.
For the first two, since A either accepts or rejects,
    x in L implies Prob( A(x,y) accepts ) > epsilon_c
    => Prob( A(x,y) rejects ) ≤ 1-epsilon_c = epsilon_e'
    x not in L implies Prob( A(x,y) accepts ) < epsilon_e
    => Prob( A(x,y) rejects ) ≥ 1-epsilon_e = epsilon_c'
    0 < 1-epsilon_c = epsilon_e' < 1-epsilon_e = epsilon_c'
Now perturb epsilon_c' a little smaller and epsilon_e' a little larger to make the inequalities strict; this shows the first definition implies the second, with the primed constants, and the converse is symmetric.
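
The passage from definition 3 to definition 4 is again by repetition, this time taking the majority answer of k independent runs; by a standard tail bound the majority errs with probability at most e^{-2 k epsilon^2}, so a polynomial k pushes the correctness probability past any fixed threshold below one. A sketch, in Python,

    import random

    def majority_vote(A, x, k, m):
        # run the two-output BPP algorithm k times on fresh tapes and
        # report the majority answer
        accepts = 0
        for _ in range(k):
            y = [random.randint(0, 1) for _ in range(m)]
            if A(x, y) == "accept":
                accepts += 1
        return "accept" if 2 * accepts > k else "reject"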