Algorithms on Strings, Trees, and Sequences: Computer Science and Computational Biology (English) Hardcover – May 28, 1997
'The readers of this book will be serious programmers, but of course anybody working in bio-computing will find the book of immense practical, scientific and commercial importance … you should get the book, whether you want to do some string processing, fundamental computing research, or want to impress a biotech firm.' Harold Thimbleby, The Times Higher Education Supplement
'… could well be used as the basis for a graduate-level course, particularly as it contains over 400 exercises to reinforce presented material and to develop further topics. It is recommended most highly.' P. Gibbons, Zentralblatt für Mathematik
Part I. Exact String Matching: The Fundamental String Problem: 1. Exact matching: fundamental preprocessing and first algorithms; 2. Exact matching: classical comparison-based methods; 3. Exact matching: a deeper look at classical methods; 4. Semi-numerical string matching;
Part II. Suffix Trees and their Uses: 5. Introduction to suffix trees; 6. Linear time construction of suffix trees; 7. First applications of suffix trees; 8. Constant time lowest common ancestor retrieval; 9. More applications of suffix trees;
Part III. Inexact Matching, Sequence Alignment and Dynamic Programming: 10. The importance of (sub)sequence comparison in molecular biology; 11. Core string edits, alignments and dynamic programming; 12. Refining core string edits and alignments; 13. Extending the core problems; 14. Multiple string comparison: the Holy Grail; 15. Sequence databases and their uses: the motherlode;
Part IV. Currents, Cousins and Cameos: 16. Maps, mapping, sequencing and superstrings; 17. Strings and evolutionary trees; 18. Three short topics; 19. Models of genome-level mutations.
All of the major exact string matching algorithms are covered, including Knuth-Morris-Pratt, Boyer-Moore and Aho-Corasick, along with the focus of the book: suffix trees, which solve the much harder problem of finding all repeated substrings of a given string in linear time. In addition to exact string matching, there are extensive discussions of inexact matching. Even the discussions of widely known topics like dynamic programming for edit distance are insightful; for instance, we find how to easily cut space requirements from quadratic to linear. There is also a short chapter on semi-numerical matching methods, which are also of use in information retrieval applications. Inexact matching is extended to the threshold all-against-all problem, which finds all substrings of a string that match up to a given edit distance threshold. The theoretical development concludes with the much more difficult problem of aligning multiple sequences with ultrametric trees, with applications to phylogenetic alignment for evolutionary trees (an approach that has also been applied to the evolution of natural languages).
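To make the quadratic-to-linear space remark concrete, here is a minimal Python sketch of my own, not code from the book. It assumes unit costs for insertion, deletion and substitution, and the function name `edit_distance` is hypothetical. The idea is that each row of the dynamic programming table depends only on the row before it, so two rows are enough to compute the distance.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost edit distance using O(min(len(a), len(b))) extra space.

    Illustrative sketch only: it returns the distance, not the alignment.
    """
    # Keep the shorter string along the row so the stored row is small.
    if len(b) > len(a):
        a, b = b, a

    # prev[j] = distance between the empty prefix of a and b[:j]
    prev = list(range(len(b) + 1))

    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca from a
                curr[j - 1] + 1,     # insert cb into a
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr  # the older row is no longer needed

    return prev[-1]
```

For example, `edit_distance("kitten", "sitting")` evaluates to 3. Note that this only yields the distance; recovering an optimal alignment in linear space takes an additional divide-and-conquer step (Hirschberg's method), which is not shown here.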
Note that there is no discussion of statistical string matching. For that, Durbin, Eddy, Krogh and Mitchison's "Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids" is a good choice, or for those more interested in language than biology, Manning and Schuetze's "Foundations of Statistical Natural Language Processing". There is also no information on more structured string matching models such as context-free grammars, which are commonly used to analyze RNA folding or natural language syntax. Luckily, Durbin et al. and Manning and Schuetze also provide excellent coverage of these higher-order models in their books.
This book is not about efficient implementation. If you need to build these algorithms, you'll also need to know how to write efficient code and tune it for your needs. This is an algorithms book, pure and simple.
As a computer scientist, I found the discussions of computational biology to be more enlightening than in other textbooks on similar topics such as Durbin et al., because Gusfield does not assume the reader has any background in cellular biology. Instead, he provides his own clear and gentle introductions illustrated with algorithms, applications, open problems and extensive references. Like most Cambridge University Press books, this one is beautifully typeset and edited.