O'Reilly - Mastering Regular Expressions 2nd Edition.pdf

Mastering Regular Expressions
Powerful Techniques for Perl and Other Tools
Jeffrey E. F. Friedl
Table of Contents
Preface ..................................................................................................................... xv
1: Introduction to Regular Expressions ...................................................... 1
Solving Real Problems ........................................................................................ 2
Regular Expressions as a Language ................................................................... 4
The Filename Analogy ................................................................................. 4
The Language Analogy ................................................................................ 5
The Regular-Expression Frame of Mind ............................................................ 6
If You Have Some Regular-Expression Experience ................................... 6
Searching Text Files: Egrep ......................................................................... 6
Egrep Metacharacters .......................................................................................... 8
Start and End of the Line ............................................................................. 8
Character Classes .......................................................................................... 9
Matching Any Character with Dot ............................................................. 11
Alternation .................................................................................................. 13
Ignoring Differences in Capitalization ...................................................... 14
Word Boundaries ........................................................................................ 15
In a Nutshell ............................................................................................... 16
Optional Items ............................................................................................ 17
Other Quantifiers: Repetition .................................................................... 18
Parentheses and Backreferences ............................................................... 20
The Great Escape ....................................................................................... 22
Expanding the Foundation ............................................................................... 23
Linguistic Diversification ............................................................................ 23
The Goal of a Regular Expression ............................................................ 23
A Few More Examples ............................................................................... 23
Regular Expression Nomenclature ............................................................ 27
Improving on the Status Quo .................................................................... 30
Summary ..................................................................................................... 32
Personal Glimpses ............................................................................................ 33
2: Extended Introductory Examples .......................................................... 35
About the Examples .......................................................................................... 36
A Short Introduction to Perl ...................................................................... 37
Matching Text with Regular Expressions ......................................................... 38
Toward a More Real-World Example ........................................................ 40
Side Effects of a Successful Match ............................................................ 40
Intertwined Regular Expressions ............................................................... 43
Intermission ................................................................................................ 49
Modifying Text with Regular Expressions ....................................................... 50
Example: Form Letter ................................................................................. 50
Example: Prettifying a Stock Price ............................................................ 51
Automated Editing ...................................................................................... 53
A Small Mail Utility ..................................................................................... 53
Adding Commas to a Number with Lookaround ..................................... 59
Text-to-HTML Conversion ........................................................................... 67
That Doubled-Word Thing ......................................................................... 77
3: Overview of Regular Expression Features and Flavors ................ 83
A Casual Stroll Across the Regex Landscape ................................................... 85
The Origins of Regular Expressions .......................................................... 85
At a Glance ................................................................................................. 91
Care and Handling of Regular Expressions ..................................................... 93
Integrated Handling ................................................................................... 94
Procedural and Object-Oriented Handling ............................................... 95
A Search-and-Replace Example ................................................................. 97
Search and Replace in Other Languages .................................................. 99
Care and Handling: Summary ................................................................. 101
Strings, Character Encodings, and Modes ...................................................... 101
Strings as Regular Expressions ................................................................ 101
Character-Encoding Issues ....................................................................... 105
Regex Modes and Match Modes .............................................................. 109
Common Metacharacters and Features .......................................................... 112
Character Representations ....................................................................... 114
Character Classes and Class-Like Constructs .......................................... 117
Anchors and Other Zero-Width Assertions .......................................... 127
Comments and Mode Modifiers .............................................................. 133
Grouping, Capturing, Conditionals, and Control ................................... 135
Guide to the Advanced Chapters ................................................................... 141
4: The Mechanics of Expression Processing .......................................... 143
Start Your Engines! .......................................................................................... 143
Two Kinds of Engines .............................................................................. 144
New Standards .......................................................................................... 144
Regex Engine Types ................................................................................. 145
From the Department of Redundancy Department ................................ 146
Testing the Engine Type .......................................................................... 146
Match Basics .................................................................................................... 147
About the Examples ................................................................................. 147
Rule 1: The Match That Begins Earliest Wins ......................................... 148
Engine Pieces and Parts ........................................................................... 149
Rule 2: The Standard Quantifiers Are Greedy ........................................ 151
Regex-Directed Versus Text-Directed ............................................................ 153
NFA Engine: Regex-Directed .................................................................... 153
DFA Engine: Text-Directed ....................................................................... 155
First Thoughts: NFA and DFA in Comparison .......................................... 156
Backtracking .................................................................................................... 157
A Really Crummy Analogy ....................................................................... 158
Two Important Points on Backtracking .................................................. 159
Saved States .............................................................................................. 159
Backtracking and Greediness .................................................................. 162
More About Greediness and Backtracking .................................................... 163
Problems of Greediness ........................................................................... 164
Multi-Character Quotes ......................................................................... 165
Using Lazy Quantifiers ............................................................................. 166
Greediness and Laziness Always Favor a Match .................................... 167
The Essence of Greediness, Laziness, and Backtracking ....................... 168
Possessive Quantifiers and Atomic Grouping ........................................ 169
Possessive Quantifiers, ?+, *+, ++, and {m,n}+ ......................................... 172
The Backtracking of Lookaround ............................................................ 173
Is Alternation Greedy? .............................................................................. 174
Taking Advantage of Ordered Alternation .............................................. 175
NFA, DFA, and POSIX ....................................................................................... 177
The Longest-Leftmost ............................................................................ 177
POSIX and the Longest-Leftmost Rule ..................................................... 178
Speed and Efficiency ................................................................................ 179
Summary: NFA and DFA in Comparison .................................................. 180
Summary .......................................................................................................... 183
5: Practical Regex Techniques .................................................................... 185
Regex Balancing Act ....................................................................................... 186
A Few Short Examples .................................................................................... 186
Continuing with Continuation Lines ....................................................... 186
Matching an IP Address ........................................................................... 187
Working with Filenames .......................................................................... 190
Matching Balanced Sets of Parentheses .................................................. 193
Watching Out for Unwanted Matches ..................................................... 194
Matching Delimited Text .......................................................................... 196
Knowing Your Data and Making Assumptions ...................................... 198
Stripping Leading and Trailing Whitespace ............................................ 199
HTML-Related Examples .................................................................................. 200
Matching an HTML Tag ............................................................................. 200
Matching an HTML Link ............................................................................ 201
Examining an HTTP URL .......................................................................... 203
Validating a Hostname ............................................................................. 203
Plucking Out a URL in the Real World .................................................... 205
Extended Examples ........................................................................................ 208
Keeping in Sync with Your Data ............................................................. 208
Parsing CSV Files ...................................................................................... 212
6: Crafting an Efficient Expression ........................................................... 221
A Sobering Example ....................................................................................... 222
A Simple Change: Placing Your Best Foot Forward ............................. 223
Efficiency Versus Correctness .................................................................. 223
Advancing Further: Localizing the Greediness ..................................... 225
Reality Check ............................................................................................ 226
A Global View of Backtracking ...................................................................... 228
More Work for a POSIX NFA ..................................................................... 229
Work Required During a Non-Match ...................................................... 230
Being More Specific ................................................................................. 231
Alternation Can Be Expensive ................................................................. 231
Benchmarking ................................................................................................. 232