Sharing

Thursday, May 2, 2013

Learning Perl, 6th Edition: Notes (Part 3)


Chapter 7 In the World of Regular Expressions


http://perldoc.perl.org/perlre.html

Using Simple Patterns


Unicode Properties


http://perldoc.perl.org/perluniprops.html


if (/\p{Space}/) { # 26 different possible characters
    print "The string has some whitespace.\n";
}

if (/\p{Digit}/) { # 411 different possible characters
    print "The string has a digit.\n";
}

if (/\P{Space}/) { # Not space (many many characters!)
    print "The string has one or more non-whitespace characters.\n";
}
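
To see what the Unicode properties buy you over an ASCII character class, here is a small sketch. The two test characters (U+0663 and U+2003) and all variable names are my own choices for illustration, not from the book.

```perl
use strict;
use warnings;

# U+0663 is ARABIC-INDIC DIGIT THREE; U+2003 is EM SPACE.
# Both are written as escapes so the source file stays plain ASCII.
my $arabic_three = "\x{0663}";
my $em_space     = "\x{2003}";

my $is_unicode_digit = $arabic_three =~ /\p{Digit}/ ? 1 : 0;  # Unicode-aware
my $is_ascii_digit   = $arabic_three =~ /[0-9]/     ? 1 : 0;  # ASCII digits only
my $is_space         = $em_space     =~ /\p{Space}/ ? 1 : 0;
my $is_not_space     = $em_space     =~ /\P{Space}/ ? 1 : 0;  # capital P negates

print "Digit: $is_unicode_digit, [0-9]: $is_ascii_digit\n";
print "Space: $is_space, not Space: $is_not_space\n";
```

The capital \P form is simply the negation of the lowercase \p form, so the last test fails for a string that contains only whitespace.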


Back Reference

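You denote a back reference as a backslash followed by a number, like \1, \2, and so on. It refers to text captured by an earlier pair of parentheses in the same pattern.
After a match has completed, the same captures are available as the match variables $1, $2, and so on.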

$_ = "abba";
if (/(.)\1/) { # matches 'bb'
    print "It matched same character next to itself!\n";
}

$_ = "Hello there, neighbor";
if (/(\S+) (\S+), (\S+)/) {
    print "words were $1 $2 $3\n";
}



Perl 5.10 introduced a new way to denote back references. Instead of using the backslash and a number, you can use \g{N}, where N is the number of the back reference that you want to use.
$_ = "aa11bb";
if (/(.)\g{1}11/) {
    print "It matched!\n";
}

You can also specify a relative back reference. You can rewrite the last example to use -1 as the number to do the same thing:

$_ = "aa11bb";
if (/(.)\g{-1}11/) {
    print "It matched!\n";
}
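
A relative back reference pays off when the pattern grows. The sketch below is my own illustration (the "xaa11bb" string and the variable names are assumptions): adding a new group in front renumbers the absolute references, while \g{-1} still means "the group just before this point".

```perl
use strict;
use warnings;

my $text = "aa11bb";

# With a single group, absolute and relative references agree.
my $absolute = $text =~ /(.)\g{1}11/  ? 1 : 0;
my $relative = $text =~ /(.)\g{-1}11/ ? 1 : 0;

# Put a new group in front: \g{1} now points at (x), not at (.).
my $stale_absolute = "xaa11bb" =~ /(x)(.)\g{1}11/  ? 1 : 0;
my $still_relative = "xaa11bb" =~ /(x)(.)\g{-1}11/ ? 1 : 0;

print "absolute=$absolute relative=$relative\n";
print "after adding a group: absolute=$stale_absolute relative=$still_relative\n";
```

The braces also avoid ambiguity: writing \111 would not mean "group 1 followed by 11", which is why the examples use \g{1}11.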

Chapter 8 Matching with Regular Expressions


Match Modifiers

Case-Insensitive Matching with /i


Matching Any Character with /s

If you might have newlines in your strings, and you want the dot to be able to match them, the /s modifier will do the job.
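
A minimal sketch of the difference, using a multi-line sample string and variable names of my own choosing:

```perl
use strict;
use warnings;

my $text = "I saw Barney\ndown at the bowling alley\nwith Fred\nlast night.\n";

# Without /s the dot refuses to cross a newline, so .* cannot reach "Fred".
my $without_s = $text =~ /Barney.*Fred/  ? 1 : 0;

# With /s the dot matches any character, newline included.
my $with_s    = $text =~ /Barney.*Fred/s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s\n";
```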

Adding Whitespace with /x

The /x modifier allows you to add arbitrary whitespace to a pattern, in order to make it easier to read.

/-?[0-9]+\.?[0-9]*/               # what is this doing?
/ -? [0-9]+ \.? [0-9]* /x         # a little better
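
Since /x also permits comments inside the pattern, the same number pattern can be spread over several lines. This is only a sketch: the qr// wrapper, the anchors, and the test strings are my additions.

```perl
use strict;
use warnings;

# The same pattern as above, one piece per line, with comments.
my $number = qr/
    -?          # an optional minus sign
    [0-9]+      # one or more digits before the decimal point
    \.?         # an optional decimal point
    [0-9]*      # optional digits after the decimal point
/x;

# Keep only the candidates that are entirely a number.
my @candidates = ("-12.5", "42", "abc");
my @found      = grep { /\A$number\z/ } @candidates;

print "Numbers: @found\n";
```

Under /x a literal space in the pattern has to be written as \s or as an escaped space, because plain whitespace is ignored.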

The Match Variables

Noncapturing Parentheses

To group part of a pattern without creating a match variable for it, use (?:PATTERN):
if (/(?:bronto)?saurus (steak|burger)/) {
    print "Fred wants a $1\n";
}
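
To make the effect on the numbering visible, this sketch runs the capturing and the noncapturing version against the same string. The sample string and variable names are mine.

```perl
use strict;
use warnings;

my $meal = "brontosaurus burger";

# In list context a match returns its captures, so the first element is $1.
my ($first_capturing)    = $meal =~ /(bronto)?saurus (steak|burger)/;
my ($first_noncapturing) = $meal =~ /(?:bronto)?saurus (steak|burger)/;

print "capturing:    \$1 is $first_capturing\n";
print "noncapturing: \$1 is $first_noncapturing\n";
```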

Named Captures

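To label a match variable, you use (?<LABEL>PATTERN), where LABEL is a name you choose. The captured text is then available in the %+ hash.
use 5.010;    # enables say; named captures need Perl 5.10 or later
my $names = 'Fred or Barney';
if ( $names =~ m/(?<name1>\w+) (?:and|or) (?<name2>\w+)/ ) {
    say "I saw $+{name1} and $+{name2}";
}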

my $names = 'Fred Flintstone and Wilma Flintstone';
if ( $names =~ m/(?<last_name>\w+) and \w+ \g{last_name}/ ) {
    say "I saw $+{last_name}";
}
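
One practical point is that named captures do not depend on position. In this sketch (labels and variable names are my own), the middle group is left as an ordinary capture, and the named values are copied out of %+ right after the match.

```perl
use strict;
use warnings;
use 5.010;

my $names = 'Fred or Barney';
my (%seen, $conjunction);

if ($names =~ /(?<first>\w+) (and|or) (?<second>\w+)/) {
    %seen        = %+;   # copy the named captures before another match runs
    $conjunction = $2;   # the unnamed group is still numbered by position
}

say "first=$seen{first} second=$seen{second} conjunction=$conjunction";
```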

The Automatic Match Variables

$& : entire matched section
$` : holds whatever the regular expression engine had to skip over before it found the match
$' : has the remainder of the string that the pattern never got to

if ("Hello there, neighbor" =~ /\s(\w+),/) {
    print "That was ($`)($&)($').\n";       # show (Hello)( there,)( neighbor)
}

In Perl 5.10 and later, you can add the /p modifier to the match and use ${^PREMATCH}, ${^MATCH}, and ${^POSTMATCH} instead of $`, $&, and $'.
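
Here is the earlier example rewritten with the /p modifier, as a minimal sketch. The three lexical variables are my own names.

```perl
use strict;
use warnings;
use 5.010;

my ($pre, $match, $post);

# /p asks Perl to keep the pieces of this particular match.
if ("Hello there, neighbor" =~ /\s(\w+),/p) {
    ($pre, $match, $post) = (${^PREMATCH}, ${^MATCH}, ${^POSTMATCH});
    print "That was ($pre)($match)($post).\n";
}
```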


General Quantifiers

Use a comma-separated pair of numbers inside curly braces ({}) to specify the minimum and maximum number of repetitions.
The pattern /a{5,15}/ will match from five to fifteen repetitions of the letter a.
If you omit the second number, there is no upper limit: /(fred){3,}/ will match if there are three or more instances of fred in a row.
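
A quick check of both quantifiers against strings just below and just at the minimum. The test strings and variable names are my own.

```perl
use strict;
use warnings;

# {5,15}: at least five, at most fifteen repetitions of a.
my $four_a = "aaaa"  =~ /a{5,15}/ ? 1 : 0;
my $five_a = "aaaaa" =~ /a{5,15}/ ? 1 : 0;

# {3,}: three or more repetitions of the whole group, no upper limit.
my $two_fred   = "fredfred"     =~ /(fred){3,}/ ? 1 : 0;
my $three_fred = "fredfredfred" =~ /(fred){3,}/ ? 1 : 0;

print "aaaa: $four_a, aaaaa: $five_a\n";
print "fred x2: $two_fred, fred x3: $three_fred\n";
```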


Precedence

Regular expression feature             Example
Parentheses (grouping or capturing)    (...), (?:...), (?<LABEL>...)
Quantifiers                            a*, a+, a?, a{n,m}
Anchors and sequence                   abc, ^, $, \A, \b, \z, \Z
Alternation                            a|b|c
Atoms                                  a, [abc], \d, \1, \g{2}
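
Two consequences of this ordering are easy to get wrong, so here is a sketch with test strings of my own. Anchors bind tighter than alternation, and a quantifier applies only to the single item before it unless parentheses say otherwise.

```perl
use strict;
use warnings;

# /^fred|barney$/ means "fred at the start" OR "barney at the end".
my $loose = "frederick" =~ /^fred|barney$/     ? 1 : 0;
# Grouping makes both anchors apply to each alternative.
my $tight = "frederick" =~ /^(?:fred|barney)$/ ? 1 : 0;

# In /fred+/ the plus applies only to the letter d.
my $many_d      = "fredddd"  =~ /\Afred+\z/     ? 1 : 0;
my $ungrouped   = "fredfred" =~ /\Afred+\z/     ? 1 : 0;
# Parentheses make the plus apply to the whole word.
my $grouped     = "fredfred" =~ /\A(?:fred)+\z/ ? 1 : 0;

print "loose=$loose tight=$tight\n";
print "many_d=$many_d ungrouped=$ungrouped grouped=$grouped\n";
```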


Chapter 9 Processing Text with Regular Expressions


Substitutions with s///

Global Replacements with /g

Different Delimiters


These are acceptable.
s#^https://#http://#;
s{fred}{barney};
s[fred](barney);
s<fred>#barney#;
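
The reason to switch delimiters is readability when the pattern itself contains slashes. A small sketch, with a made-up URL and variable names of my own:

```perl
use strict;
use warnings;

my $url = "https://www.example.com/index.html";   # hypothetical URL

# With the default delimiter every slash must be escaped.
(my $with_slashes = $url) =~ s/^https:\/\//http:\/\//;

# With another delimiter the slashes can stay as they are.
(my $with_hashes = $url) =~ s#^https://#http://#;
(my $with_braces = $url) =~ s{^https://}{http://};

print "$with_slashes\n$with_hashes\n$with_braces\n";
```

All three substitutions do the same thing; only the punctuation differs.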

Case Shifting


\U: forces what follows to all uppercase
\L: forces what follows to all lowercase
\u: uppercases the next character
\l: lowercases the next character
\u\L: all lowercase, but capitalizes the first letter
\l\U: all uppercase, but lowercases the first letter
\E: turns off case shifting
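
The escapes are easiest to understand inside a substitution. This is a sketch with sample sentences and variable names of my own.

```perl
use strict;
use warnings;

my $text = "I saw Barney with Fred.";

# \U and \L shift everything that follows in the replacement.
(my $upper = $text) =~ s/(fred|barney)/\U$1/gi;
(my $lower = $text) =~ s/(fred|barney)/\L$1/gi;

# \E ends the shifting, so only $2 is uppercased here.
(my $ended = $text) =~ s/(\w+) with (\w+)/\U$2\E with $1/i;

# \u\L: lowercase everything, then capitalize the first letter.
(my $capitalized = "i saw BARNEY with FRED.") =~ s/(fred|barney)/\u\L$1/gi;

print "$upper\n$lower\n$ended\n$capitalized\n";
```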

The split Operator
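
The notes have nothing under this heading, so here is a minimal sketch of split with a colon separator. The input strings and variable names are my own; the point to remember is that leading empty fields are kept while trailing ones are dropped.

```perl
use strict;
use warnings;

# Break a string into fields at each colon.
my @fields   = split /:/, "abc:def:g:h";

# Two separators in a row produce an empty field in the middle.
my @with_gap = split /:/, "abc:def::g:h";

# Leading empty fields are kept; trailing empty fields are discarded.
my @trimmed  = split /:/, ":::a:b:c:::";

print scalar(@fields), " ", scalar(@with_gap), " ", scalar(@trimmed), "\n";
```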

The join Function
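
Again there are no notes under this heading, so this is only a short sketch with values of my own. join glues a list together with a plain string (not a pattern), and it pairs naturally with split.

```perl
use strict;
use warnings;

# Glue a list together with a separator string.
my $joined = join ":", 4, 6, 8, 10, 12;

# Split it apart again and rejoin with a different separator.
my @values = split /:/, $joined;
my $dashed = join "-", @values;

print "$joined\n$dashed\n";
```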

m// in List Context

my $text = "Fred dropped a 5 ton granite block on Mr. Slate";
my @words = ($text =~ /([a-z]+)/ig);
print "Result: @words\n";
# Result: Fred dropped a ton granite block on Mr Slate
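
When the pattern has more than one pair of parentheses, each match contributes several values to the list, which makes it easy to fill a hash. A sketch, with a sample string and names of my own:

```perl
use strict;
use warnings;

my $data = "Barney Rubble Fred Flintstone Wilma Flintstone";

# Each match returns two captures, so the list alternates key, value.
my %last_name = ($data =~ /(\w+)\s+(\w+)/g);

for my $first (sort keys %last_name) {
    print "$first => $last_name{$first}\n";
}
```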

Nongreedy Quantifiers


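+? : one or more, but as few as possible
*? : zero or more, but as few as possible
?? : zero or one, preferring zero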

$_ = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";
s#<BOLD>(.*?)</BOLD>#$1#g;
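
To see why the nongreedy form matters here, this sketch runs the greedy and the nongreedy substitution on the same string. The variable names are mine.

```perl
use strict;
use warnings;

my $text = "I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>";

# Greedy .* runs from the first <BOLD> to the very last </BOLD>.
(my $greedy = $text) =~ s#<BOLD>(.*)</BOLD>#$1#g;

# Nongreedy .*? stops at the nearest </BOLD>, so each pair is handled separately.
(my $nongreedy = $text) =~ s#<BOLD>(.*?)</BOLD>#$1#g;

print "greedy:    $greedy\n";
print "nongreedy: $nongreedy\n";
```

The greedy version leaves the inner tags behind because it treats everything between the outermost tags as one match.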

