Thesis
The Effect of Fasta Sequence Splitting with The Addition of Binary Position Mapping to LZ78 Algorithm for Single Genome Compression
As genome sequencers become more widely available, the amount of genome data that must be stored
increases exponentially. A strategy to manage this growth is needed, and one solution is genome
compression. This study proposes a data-splitting step applied to the input of the LZ78 algorithm
to increase redundancy, which could in turn improve the compression performance of the
algorithm. The A/T characters and the G/C characters are split into two separate substrings, and
the additional ambiguity characters are also assigned to these substrings. A third substring is a
binary mapping that records where each character of the first and second substrings is located in
the original sequence. The proposed algorithm
successfully reduced the compression time and the peak decompression memory. However, its
compression ratio did not differ significantly from that of the original LZ78 algorithm. The proposed
algorithm's compression ratio also could not compete with currently available FASTA
compressors, but the splitting approach applied here to LZ78 might carry over to other
algorithms in the LZ family.
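
The splitting step described above can be illustrated with a minimal sketch. It is not the thesis implementation: the function names are hypothetical, the LZ78 encoder is the textbook version, and routing every non-G/C symbol (including ambiguity codes such as N) to the first stream is a placeholder assumption, since the abstract does not say how ambiguity characters are distributed.

```python
def split_sequence(seq):
    """Split a sequence into a first (A/T) stream, a second (G/C) stream,
    and a binary position map (0 = first stream, 1 = second stream)."""
    first, second, bitmap = [], [], []
    for ch in seq:
        if ch in "GC":
            second.append(ch)
            bitmap.append(1)
        else:
            # Assumption: A, T, and any other symbol go to the first stream.
            first.append(ch)
            bitmap.append(0)
    return "".join(first), "".join(second), bitmap


def merge_streams(first, second, bitmap):
    """Rebuild the original sequence by following the position map."""
    it_first, it_second = iter(first), iter(second)
    return "".join(next(it_second) if bit else next(it_first) for bit in bitmap)


def lz78_encode(text):
    """Textbook LZ78: emit (dictionary index, next character) pairs."""
    dictionary = {}
    output = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate
        else:
            output.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        # Flush a trailing phrase that is already in the dictionary.
        output.append((dictionary[phrase], ""))
    return output


first, second, bitmap = split_sequence("ACGTTGCA")
restored = merge_streams(first, second, bitmap)
encoded_first = lz78_encode(first)
```

Each stream uses a smaller alphabet than the original sequence, which is the source of the extra redundancy the study aims for. The position map is what keeps the split lossless.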