Punch22 is a problem posted several months ago. This article analyzes the problem and proposes a solution.

From the problem's page, the goal is *to come up with a 7-bit code for the symbols A–Z that similarly allows one overwrite*.

In this article, binary codes (e.g. `1010101`

) and their correspond hexadecimal values (e.g. `0x55`

) and used to represent values.

Since this bit code -or alphabet- is supposed to work with punch cards and the binary code is used to represent the holes (7 bits, 7 holes where a `0`

is *not punched* and `1`

is *punched*), the encoding should allow each letter to be overwritten by -at least- one code that represent all the other letters. This means that, for example, that if `0000001`

(`0x01`

) is used for a particular letter, it can be converted to `0000011`

(`0x03`

) but not to `0000010`

(`0x02`

).

There are 26 letters in the alphabet needed for this problem (A-Z) and 7 bits available (2 to the power of 7, or 128 possible codes), so we have *almost* 5 codes per letter in average.

Even when each letter will have several values associated with it, the first value is very important since it's the value used for the first write operation and this is the value that should be easily transformed to each one of the other letters, if needed, in the second write operation.

A good way to analyze this problem is using a directed graph. Each possible value can be represented as a node and each possible transformation can be represented as an edge from an original value to the new value. For example:

The transformations are transitive relations so we can simplify the graph. If a value X can be transformed to a value Y and a this value Y can be transformed to a value Z, then X can be also be transformed to Z. This means that if we can traverse from X to Z, X can be converted to Z, even if they are not directly connected by an edge. In practice, the edges connect from a given value to all the values that have exactly the same value plus one more bit 1 (or hole). This is an example:

A complete graph can be found here: pdf, png, original dot file.

There are several notorious values among all the options. For example, `0000000`

(`0x00`

) is special because it can be converted to any other value. Similarly, `1111111`

(`0x7f`

) can be converted from any other value.

The first approach was to generate the options in order from the top of the graph to the bottom, starting with `0000000`

(`0x00`

), `0000001`

(`0x01`

), `0000010`

(`0x02`

), `0000100`

(`0x04`

) and so on. This is different from the “integer value order” where the first three values are similar to the previous list (`0000000 0x00`

, `0000001 0x01`

, `0000010 0x02`

) but the fourth one is different: `0000011 0x03`

. Since we are looking for the values from the top of the graph -that means, the values that can be converted to as most other values as possible-, the algorithm should look for all the values with one bit (or one hole) before choosing 2-bit values. Here is the amount of options available for every group:

- 0 bit: 1 option
- 1 bit: 7 options
- 2 bit: 21 options

This can be generalized using combinatorics as:

where *n* is the amount of bits or holes in the punch card.

It’s important to note that the sum of combinations for 0, 1 and 2 bits give a total of 29 combinations. Since we need 26 initial values, one for each letter, we will use 1 0-bit option, 7 1-bit options and 18 2-bit options leaving 3 2-bit options available for secondary (or second write) values.

After assigning one value for each letter, the original algorithm tried to find the letter with least possible conversions and then assigning a new value to it.

This approach didn’t work as expected since the algorithm didn’t look for good values at the same time but selected the letter before the value was selected.

Other wrong approaches included a minimal 5-bit alphabet for the first values. This didn’t work well either since the algorithm assigned 3-bit values as the first value to some letters minimizing the values available that can be converted from these first values.

The current version of the software doesn’t didn’t get a correct solution to this problem. Instead, it followed these two simple steps:

- Select the 26 best options to be converted to other options (
`0000000 0x00`

is the best example since it can be converted to all the other values) and assign one of them to each letter. It’s important to note that the pair initial value / letter is not important at all. The values for letter A can be swapped with all the values of letter B and the solution remains correct or incorrect. - Get all the letters without all the conversions completed, all the unassigned values and calculate how many conversions are possible when this value is assigned to this letter. Then this list of possible assignments is sorted by possible conversion (more is better) and the best is picked and assigned. When there are several options/letters with the same amount of possible conversions, the assignment is chosen randomly among the best pairs.

When there are no more options available for assignment or these options can not be converted from other letters when assigned, the program generates a summary with the amount of possible conversions, missing conversions, values assigned per letter, etc. The current version of the software ignore the results if there are 2 or more conversions missing, running the algorithm until only 1 or no conversions are missing.

Several instances of the current version of the software were running in parallel for several minutes. The best result found by the algorithm was a solution with only one conversion missing and three options unassigned. One of these options had only two bits on (punched) and other of these options was convertible from this first option. Few simple manual changes were made to get the final solution.

The code used to find the solution is here: http://github.com/datalabscl/punch22

This is the solution proposed:

`A: 0x00 0000000, 0x7f 1111111`

`B: 0x10 0010000, 0x7b 1111011, 0x1e 0011110, 0x54 1010100, 0x07 0000111`

`C: 0x01 0000001, 0x7e 1111110, 0x31 0110001, 0x45 1000101, 0x0d 0001101, 0x43 1000011`

`D: 0x40 1000000, 0x3f 0111111, 0x5c 1011100, 0x62 1100010, 0x51 1010001`

`E: 0x20 0100000, 0x7d 1111101, 0x66 1100110, 0x2a 0101010, 0x13 0010011`

`F: 0x04 0000100, 0x6f 1101111, 0x35 0110101, 0x58 1011000`

`G: 0x02 0000010, 0x5f 1011111, 0x2b 0101011, 0x70 1110000`

`H: 0x08 0001000, 0x77 1110111, 0x1b 0011011, 0x2c 0101100, 0x4a 1001010`

`I: 0x21 0100001, 0x5e 1011110, 0x33 0110011, 0x2d 0101101, 0x61 1100001`

`J: 0x41 1000001, 0x7c 1111100, 0x67 1100111, 0x0b 0001011`

`K: 0x0a 0001010, 0x75 1110101, 0x2e 0101110, 0x4b 1001011, 0x1a 0011010`

`L: 0x50 1010000, 0x2f 0101111, 0x59 1011001, 0x74 1110100, 0x52 1010010`

`M: 0x0c 0001100, 0x73 1110011, 0x1d 0011101, 0x4e 1001110, 0x38 0111000`

`N: 0x44 1000100, 0x3b 0111011, 0x56 1010110, 0x65 1100101, 0x4c 1001100`

`O: 0x14 0010100, 0x6b 1101011, 0x55 1010101, 0x3c 0111100, 0x16 0010110`

`P: 0x03 0000011, 0x6e 1101110, 0x71 1110001, 0x19 0011001, 0x15 0010101`

`Q: 0x06 0000110, 0x79 1111001, 0x47 1000111, 0x36 0110110, 0x0e 0001110`

`R: 0x42 1000010, 0x3d 0111101, 0x5a 1011010, 0x63 1100011, 0x46 1000110`

`S: 0x22 0100010, 0x5d 1011101, 0x6a 1101010, 0x27 0100111, 0x32 0110010`

`T: 0x05 0000101, 0x7a 1111010, 0x4d 1001101, 0x17 0010111, 0x25 0100101`

`U: 0x48 1001000, 0x37 0110111, 0x5b 1011011, 0x6c 1101100`

`V: 0x18 0011000, 0x6d 1101101, 0x3e 0111110, 0x53 1010011`

`W: 0x09 0001001, 0x76 1110110, 0x39 0111001, 0x0f 0001111, 0x49 1001001`

`X: 0x28 0101000, 0x57 1010111, 0x3a 0111010, 0x29 0101001, 0x68 1101000, 0x1c 0011100`

`Y: 0x30 0110000, 0x4f 1001111, 0x78 1111000, 0x34 0110100, 0x23 0100011, 0x12 0010010, 0x26 0100110`

`Z: 0x60 1100000, 0x1f 0011111, 0x72 1110010, 0x69 1101001, 0x64 1100100`