Mastering Regular Expressions_ Powerful Techniques for Perl and Other Tools (2nd ed.) [Friedl 2002-07-15].pdf

Mastering Regular Expressions
Powerful Techniques for Perl and Other Tools
Jeffrey E. F. Friedl
Table of Contents
Preface ..................................................................................................................... xv
1: Introduction to Regular Expressions ...................................................... 1
Solving Real Problems ........................................................................................ 2
Regular Expressions as a Language ................................................................... 4
The Filename Analogy ................................................................................. 4
The Language Analogy ................................................................................ 5
The Regular-Expression Frame of Mind ............................................................ 6
If You Have Some Regular-Expression Experience ................................... 6
Searching Text Files: Egrep ......................................................................... 6
Egrep Metacharacters .......................................................................................... 8
Start and End of the Line ............................................................................. 8
Character Classes .......................................................................................... 9
Matching Any Character with Dot ............................................................. 11
Alternation .................................................................................................. 13
Ignoring Differences in Capitalization ...................................................... 14
Word Boundaries ........................................................................................ 15
In a Nutshell ............................................................................................... 16
Optional Items ............................................................................................ 17
Other Quantifiers: Repetition .................................................................... 18
Parentheses and Backreferences ............................................................... 20
The Great Escape ....................................................................................... 22
Expanding the Foundation ............................................................................... 23
Linguistic Diversification ............................................................................ 23
The Goal of a Regular Expression ............................................................ 23
A Few More Examples ............................................................................... 23
Regular Expression Nomenclature ............................................................ 27
Improving on the Status Quo .................................................................... 30
Summary ..................................................................................................... 32
Personal Glimpses ............................................................................................ 33
2: Extended Introductory Examples .......................................................... 35
About the Examples .......................................................................................... 36
A Short Introduction to Perl ...................................................................... 37
Matching Text with Regular Expressions ......................................................... 38
Toward a More Real-World Example ........................................................ 40
Side Effects of a Successful Match ............................................................ 40
Intertwined Regular Expressions ............................................................... 43
Intermission ................................................................................................ 49
Modifying Text with Regular Expressions ....................................................... 50
Example: Form Letter ................................................................................. 50
Example: Prettifying a Stock Price ............................................................ 51
Automated Editing ...................................................................................... 53
A Small Mail Utility ..................................................................................... 53
Adding Commas to a Number with Lookaround ..................................... 59
Text-to-HTML Conversion ........................................................................... 67
That Doubled-Word Thing ......................................................................... 77
3: Overview of Regular Expression Features and Flavors ................ 83
A Casual Stroll Across the Regex Landscape ................................................... 85
The Origins of Regular Expressions .......................................................... 85
At a Glance ................................................................................................. 91
Care and Handling of Regular Expressions ..................................................... 93
Integrated Handling ................................................................................... 94
Procedural and Object-Oriented Handling ............................................... 95
A Search-and-Replace Example ................................................................. 97
Search and Replace in Other Languages .................................................. 99
Care and Handling: Summary ................................................................. 101
Strings, Character Encodings, and Modes ...................................................... 101
Strings as Regular Expressions ................................................................ 101
Character-Encoding Issues ....................................................................... 105
Regex Modes and Match Modes .............................................................. 109
Common Metacharacters and Features .......................................................... 112
Character Representations ....................................................................... 114
Character Classes and Class-Like Constructs .......................................... 117
Anchors and Other Zero-Width Assertions .......................................... 127
Comments and Mode Modifiers .............................................................. 133
Grouping, Capturing, Conditionals, and Control ................................... 135
Guide to the Advanced Chapters ................................................................... 141
4: The Mechanics of Expression Processing .......................................... 143
Start Your Engines! .......................................................................................... 143
Two Kinds of Engines .............................................................................. 144
New Standards .......................................................................................... 144
Regex Engine Types ................................................................................. 145
From the Department of Redundancy Department ................................ 146
Testing the Engine Type .......................................................................... 146
Match Basics .................................................................................................... 147
About the Examples ................................................................................. 147
Rule 1: The Match That Begins Earliest Wins ......................................... 148
Engine Pieces and Parts ........................................................................... 149
Rule 2: The Standard Quantifiers Are Greedy ........................................ 151
Regex-Directed Versus Text-Directed ............................................................ 153
NFA Engine: Regex-Directed .................................................................... 153
DFA Engine: Text-Directed ....................................................................... 155
First Thoughts: NFA and DFA in Comparison .......................................... 156
Backtracking .................................................................................................... 157
A Really Crummy Analogy ....................................................................... 158
Two Important Points on Backtracking .................................................. 159
Saved States .............................................................................................. 159
Backtracking and Greediness .................................................................. 162
More About Greediness and Backtracking .................................................... 163
Problems of Greediness ........................................................................... 164
Multi-Character Quotes ......................................................................... 165
Using Lazy Quantifiers ............................................................................. 166
Greediness and Laziness Always Favor a Match .................................... 167
The Essence of Greediness, Laziness, and Backtracking ....................... 168
Possessive Quantifiers and Atomic Grouping ........................................ 169
Possessive Quantifiers, ?+, *+, ++, and {m,n}+ ......................................... 172
The Backtracking of Lookaround ............................................................ 173
Is Alternation Greedy? .............................................................................. 174
Taking Advantage of Ordered Alternation .............................................. 175
NFA, DFA, and POSIX ....................................................................................... 177
The Longest-Leftmost ............................................................................ 177
POSIX and the Longest-Leftmost Rule ..................................................... 178
Speed and Efficiency ................................................................................ 179
Summary: NFA and DFA in Comparison .................................................. 180
Summary .......................................................................................................... 183
5: Practical Regex Techniques .................................................................... 185
Regex Balancing Act ....................................................................................... 186
A Few Short Examples .................................................................................... 186
Continuing with Continuation Lines ....................................................... 186
Matching an IP Address ........................................................................... 187
Working with Filenames .......................................................................... 190
Matching Balanced Sets of Parentheses .................................................. 193
Watching Out for Unwanted Matches ..................................................... 194
Matching Delimited Text .......................................................................... 196
Knowing Your Data and Making Assumptions ...................................... 198
Stripping Leading and Trailing Whitespace ............................................ 199
HTML-Related Examples .................................................................................. 200
Matching an HTML Tag ............................................................................. 200
Matching an HTML Link ............................................................................ 201
Examining an HTTP URL .......................................................................... 203
Validating a Hostname ............................................................................. 203
Plucking Out a URL in the Real World .................................................... 205
Extended Examples ........................................................................................ 208
Keeping in Sync with Your Data ............................................................. 208
Parsing CSV Files ...................................................................................... 212
6: Crafting an Efficient Expression ........................................................... 221
A Sobering Example ....................................................................................... 222
A Simple Change: Placing Your Best Foot Forward ............................. 223
Efficiency Versus Correctness .................................................................. 223
Advancing Further: Localizing the Greediness ..................................... 225
Reality Check ............................................................................................ 226
A Global View of Backtracking ...................................................................... 228
More Work for a POSIX NFA ..................................................................... 229
Work Required During a Non-Match ...................................................... 230
Being More Specific ................................................................................. 231
Alternation Can Be Expensive ................................................................. 231
Benchmarking ................................................................................................. 232