The Construction of Variable Length Codes with Good Synchronisation Properties

  • Matthew Higgs

    Student thesis: Doctoral Thesis

    Abstract

    This thesis addresses the problems of robust optimal data compression in circumstances where synchronisation may be lost and error propagation can occur.

    Some background information on binary variable length codes is given, particularly relating to the problems of compression and synchronisation. A review of the literature on the codes known as Huffman Equivalent codes (HE-codes) and T-codes, reveals many papers that deal with the synchronisation of these codes. HE-codes have short synchronising codewords forced by their construction, which will occur frequently. T-codes have good synchronisation capabilities, due to the copy and append nature of the construction algorithm.

    In chapter 2 a method developed by Takishima, Wada and Murakami is described and a theoretical result for the synchronisation delay after an inversion error has occurred is given.

    In chapter 3 the construction and properties of T-codes are described.

    In chapter 4, the construction of HE-codes is presented and some properties of these codes are given. Three algorithms are devised and discussed that give the existence or non-existence of certain HE-codes. These codes are HE-codes with shortest synchronising codeword 10m ~ 1 l where 0-nodes are extended wherever a choice exists, HE-codes with shortest synchronising codeword 10m-1 l where 1-nodes are extended wherever a choice exists, and HE-codes with two shortest synchronising codewords 10 m-1 l and 10 m . If any of these codes does exist, the relevant algorithm also gives the number of synchronising codewords guaranteed by the code's construction.

    In chapter 5 a number of new algorithms are given for the construction of a code known as a Hybrid code. This Hybrid code is a specialised T-code. It combines the synchronisation properties of both HE-codes and T-codes to produce a code with synchronisation properties at least as good as either, where the form and length of the synchronising codewords in the code are known. In particular, theoretical results are given on codes constructed by two algorithms which show why these are good choices as T-codes. This is also reinforced by simulation results which show that T-codes constructed by these algorithms outperform other T-codes in terms of their synchronisation capabilities.

    Chapter 6 introduces a new variable length code construction algorithm.The codes constructed are called Ordered Termination codes or OT-codes. OT-codes have the major advantage that they exist for all given source probabilities and are optimal in terms of their average codeword length. Theoretical results are given for the existence of certain synchronising codewords in any particular OT-code, along with a theorem that shows that the OT-codes contain at least as many synchronising codewords with paths that end lO"1 " 1 ! as an HEcode with shortest synchronising codeword lO"1 " 1 ! where 0-nodes are extended wherever a choice exists. Also an algorithm is presented that gives the guaranteed number of synchronising codewords with paths that end 10^1 where /3e{m-l,...,a-l} and a is the first level at which a node ending 0 must be terminated. Finally in this chapter, discussions are given on other factors that affect the synchronisation of the code. The results given in this chapter, along with the discussions, suggest that the synchronisation properties of the code should be "good".

    In chapter 7 theoretical results are given comparing OT-codes with certain codes in the literature. Theoretical and simulation results are also given comparing OT-codes, Huffman codes, HE-codes and T-codes (Hybrid codes). Large text samples are used to carry out these simulations. The results indicate that OT-codes are either significantly better than, or comparable with, the other
    codes.

    In chapter 8, conclusions and proposals for future work are presented. The conclusions emphasise that the performance of OT-codes is complemented by the advantage that they always exist, and by the relative simplicity of their construction. The future work proposed is mainly concerned with analysing
    how the OT-codes behave when insertion and deletion errors are introduced, and also when a combination of both insertion/deletion errors and inversion errors are present. Further work on guaranteed synchronising codewords and on other factors affecting synchronisation performance is also proposed.
    Date of Award26 Feb 2007
    Original languageEnglish
    SupervisorStephanie Perkins (Supervisor) & Derek Smith (Supervisor)

    Cite this

    '