Breaking Vigenere

by: burt rosenberg
at: university of miami
date: 1 sep 2021
NAME
    letter-utils.py
    
SYNOPSIS
    letter-utils.py [-h] [-s LETTER_SHIFT] [-c COLUMN_WIDTH] [-p] [-v] reference_text

DESCRIPTION
    
    Compute letter frequencies of stdin and calculate their correlation with the
    letter frequencies of the given reference text. The text is assumed to use the
    26 ASCII letters; generally, all non-letters are discarded and all letters
    are converted to lower case.
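
    The counting step can be sketched as follows. This is a minimal sketch, not
    the course's letter-utils.py; the function name letter_counts is hypothetical.

```python
import string


def letter_counts(text):
    """Return 26 counts: index 0 for 'a' through index 25 for 'z'.

    Letters are folded to lower case and every non-letter is discarded,
    following the DESCRIPTION above.
    """
    counts = [0] * 26
    for ch in text.lower():
        if ch in string.ascii_lowercase:  # keep only the 26 ASCII letters
            counts[ord(ch) - ord("a")] += 1
    return counts
```

    Calling letter_counts(sys.stdin.read()) gives the stdin counts; applying the
    same function to the contents of reference_text gives the reference counts.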
    
    The stdin frequency count is subject to an indexing rotation before it is
    correlated against the frequency count of the reference text. For instance, if
    the specified rotation is 1, the number of a's in the reference text is matched
    to the number of b's in the stdin text, and so on. If no rotation is given, the
    calculation is done for all 26 possible rotations.
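
    The rotated correlation can be sketched as an inner product of the two
    26-entry count vectors, with the stdin vector rotated by the shift. This
    sketch assumes the correlation is the plain, unnormalized inner product;
    letter-utils.py may normalize the counts first. The names correlate and
    all_correlations are hypothetical.

```python
def correlate(ref_counts, text_counts, shift):
    """Inner product of the reference counts with the stdin counts rotated by shift.

    With shift == 1, ref_counts[0] (the a's) is paired with text_counts[1]
    (the b's), matching the rotation convention in the DESCRIPTION.
    """
    return sum(ref_counts[i] * text_counts[(i + shift) % 26] for i in range(26))


def all_correlations(ref_counts, text_counts):
    """Correlation for every shift 0..25, as when no -s option is given."""
    return [correlate(ref_counts, text_counts, s) for s in range(26)]
```

    If the stdin text is a Caesar shift of text resembling the reference, the
    largest value in all_correlations should appear at that shift.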

    Optionally outputs a histogram of the letter frequencies of stdin. 
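
    A text histogram of the counts might look like the following sketch. It
    assumes that the -c COLUMN_WIDTH option sets the length of the longest bar,
    with the other bars scaled in proportion; the actual program may interpret
    the column width differently. The function name histogram is hypothetical.

```python
def histogram(counts, column_width=50):
    """Return one text line per letter: the letter, its count, and a bar of '*'.

    Assumption: column_width is the length of the longest bar, and every
    other bar is scaled proportionally to it.
    """
    peak = max(counts) or 1  # avoid dividing by zero on empty input
    lines = []
    for i, n in enumerate(counts):
        bar = "*" * round(n * column_width / peak)
        lines.append(f"{chr(ord('a') + i)} {n:6d} {bar}")
    return lines
```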
    
POSITIONAL ARGUMENTS
    reference_text     file of reference text for the reference distribution

OPTIONAL ARGUMENTS
    -h, --help         show this help message and exit
    -s LETTER_SHIFT    shift the distribution
                       default: calculate for all shifts
    -c COLUMN_WIDTH    column width for histogram
    -p                 print histogram
    -v                 verbose
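
    The SYNOPSIS and argument list above map directly onto Python's argparse
    module. A minimal sketch: the destination names and the -c default of 50 are
    assumptions, not taken from the course template.

```python
import argparse


def make_parser():
    """Build a parser whose usage line matches the SYNOPSIS."""
    parser = argparse.ArgumentParser(
        prog="letter-utils.py",
        description="Correlate stdin letter frequencies with a reference text.")
    parser.add_argument("reference_text",
                        help="file of reference text for the reference distribution")
    # None means "no -s given": calculate for all shifts.
    parser.add_argument("-s", dest="letter_shift", type=int, default=None,
                        help="shift the distribution (default: calculate for all shifts)")
    # Assumed default; the course template may use a different value.
    parser.add_argument("-c", dest="column_width", type=int, default=50,
                        help="column width for histogram")
    parser.add_argument("-p", dest="print_histogram", action="store_true",
                        help="print histogram")
    parser.add_argument("-v", dest="verbose", action="store_true", help="verbose")
    return parser
```

    Parsing ["-s", "3", "-p", "ref.txt"] gives letter_shift 3, print_histogram
    True, and reference_text "ref.txt"; omitting -s leaves letter_shift as None,
    which the program can treat as "calculate for all shifts".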

HISTORY
    Introduced in the 221 offering (Fall 2021–2022).

BUGS

Preparation

Copy the template, the test, and the Makefile from the [repo]/class/proj2 directory to your folder, keeping the same folder and file names. Then run svn add and svn commit -m "initial commit".

Short coding assignment

Modify the letter-utils.py program so that it runs the three tests in the Makefile without error. This would be a good point to make a Subversion commit.

Goals: This sub-assignment introduces letter frequency analysis and the use of linear algebra as a method of comparing frequency statistics.
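
One standard linear-algebra comparison, shown here as an illustration rather than as the method the course prescribes, treats each frequency distribution as a vector in 26 dimensions and measures the cosine of the angle between the two vectors. Dividing by the vector lengths makes texts of different sizes comparable. The function name cosine_similarity is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two frequency vectors of equal length.

    Proportional distributions give 1.0; for non-negative counts the value
    falls toward 0.0 as the distributions diverge. Scaling either vector,
    e.g. by using a longer text, does not change the result.
    """
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty distribution matches nothing
    return dot / (norm_u * norm_v)
```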

Frequency distribution exploration

There are three encryption keys in the Makefile. For each key, run letter-utils.py with no -s option to get the full output of 26 correlations. See the awk command in the Makefile for how to extract just the column of 26 numbers from the output.

Transfer these into three columns of a spreadsheet, and compute the mean and standard deviation of each column. Make a bar chart for each column. Remark on how you might go about cracking Vigenere for one-, two-, and three-letter keywords.
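
To cross-check the spreadsheet numbers, here is a minimal sketch using Python's statistics module, assuming column is a list holding the 26 correlations extracted by the awk command. It also reports how many standard deviations the largest correlation lies above the mean, which is one way to judge whether a single rotation stands out, as one would expect for a one-letter key (a Caesar shift). The function name summarize is hypothetical.

```python
import statistics


def summarize(column):
    """Return (mean, sample std dev, index of the largest entry, its z-score).

    statistics.stdev is the sample standard deviation, which matches the
    STDEV function of common spreadsheets; use statistics.pstdev for the
    population version.
    """
    mean = statistics.mean(column)
    sd = statistics.stdev(column)
    peak = max(column)
    z = (peak - mean) / sd if sd else 0.0
    return mean, sd, column.index(peak), z
```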

Commit your spreadsheet.

Goals: This sub-assignment explores how the Vigenere cipher attempts to thwart attacks based on letter frequency statistics.
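
How Vigenere blurs letter statistics shows up directly in the encryption rule: the i-th letter is shifted by the i-th keyword letter, cycling through the keyword, so equal plaintext letters at different positions can encrypt to different ciphertext letters. This is a minimal sketch assuming that non-letters are dropped and the text is lower-cased first, as letter-utils.py does; the course's own encryptor may keep punctuation or advance the key differently. The function name vigenere is hypothetical.

```python
def vigenere(text, key, decrypt=False):
    """Encrypt (or decrypt) text with a Vigenere keyword over the letters a-z.

    Non-letters are dropped and letters lower-cased before shifting. The
    i-th remaining letter is shifted by the i-th key letter, cycling
    through the key ('a' = shift 0, 'b' = shift 1, ...).
    """
    letters = [c for c in text.lower() if "a" <= c <= "z"]
    shifts = [ord(k) - ord("a") for k in key.lower()]
    sign = -1 if decrypt else 1
    out = []
    for i, c in enumerate(letters):
        s = sign * shifts[i % len(shifts)]
        out.append(chr((ord(c) - ord("a") + s) % 26 + ord("a")))
    return "".join(out)
```

With the textbook example, vigenere("attack at dawn", "lemon") gives "lxfopvefrnhr"; the two t's in "attack" encrypt to x and f.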

The Challenge Problem

The file challenge.txt is a Vigenere-encrypted text. Find the keyword.

Since the keyword is longer than three characters, what techniques does the previous exercise suggest for the decryption?
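
One technique in the spirit of the previous exercise: if the keyword length n is known, every n-th ciphertext letter was shifted by the same keyword letter, so each of the n columns is a Caesar cipher that the single-shift correlation can break. The sketch assumes n is supplied (for example, by trying several lengths and keeping the one whose columns give sharp correlation peaks; the index of coincidence is another standard tool for this, not covered by the assignment) and that ref_counts holds the 26 letter counts of the reference text. The helper names counts_of, best_shift, and guess_key are hypothetical.

```python
def counts_of(letters):
    """26 counts for an iterable of lower-case letters a-z."""
    counts = [0] * 26
    for c in letters:
        counts[ord(c) - ord("a")] += 1
    return counts


def best_shift(ref_counts, column_counts):
    """Shift s maximizing sum_i ref[i] * column[(i + s) % 26] (letter-utils' rotation)."""
    def corr(s):
        return sum(ref_counts[i] * column_counts[(i + s) % 26] for i in range(26))
    return max(range(26), key=corr)


def guess_key(ciphertext, ref_counts, key_length):
    """Guess a keyword of the given (assumed) length, one Caesar column at a time.

    Column j holds every key_length-th letter starting at position j; all of
    those letters were shifted by the same key letter, so the best rotation
    of that column's counts against the reference gives that key letter.
    """
    letters = [c for c in ciphertext.lower() if "a" <= c <= "z"]
    key = []
    for j in range(key_length):
        column = counts_of(letters[j::key_length])
        key.append(chr(best_shift(ref_counts, column) + ord("a")))
    return "".join(key)
```

On real text each column is only a sample, so a short ciphertext or a long keyword can make some columns guess wrong; checking that the decryption reads as English is a useful final test.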

Submit your code as well as your solution.

Goals: To put these observations to use on a test cryptogram.

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

author: burton rosenberg
created: 31 aug 2021
update: 31 aug 2021