How to Read Values From Every Nth Line Perl

Perl - Regular Expressions


A regular expression is a string of characters that defines the pattern or patterns you are looking for. The syntax of regular expressions in Perl is very similar to what you will find within other regular expression-supporting programs, such as sed, grep, and awk.

The basic method for applying a regular expression is to use the pattern binding operators =~ and !~. The first operator, =~, tests whether a string matches and, when used with s/// or tr///, applies the change to that string. The second operator, !~, is its negation.

There are three regular expression operators within Perl.

  • Match Regular Expression - m//
  • Substitute Regular Expression - s///
  • Transliterate Regular Expression - tr///

The forward slashes in each case act as delimiters for the regular expression (regex) that you are specifying. If you are comfortable with any other delimiter, then you can use it in place of the forward slash.

The Match Operator

The match operator, m//, is used to match a string or statement to a regular expression. For example, to match the character sequence "foo" against the scalar $bar, you might use a statement like this −

#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ /foo/) {
   print "First time is matching\n";
} else {
   print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ /foo/) {
   print "Second time is matching\n";
} else {
   print "Second time is not matching\n";
}

When the above program is executed, it produces the following result −

First time is matching
Second time is matching

The m// actually works in the same fashion as the q// operator series. You can use any combination of naturally matching characters to act as delimiters for the expression. For example, m{}, m(), and m<> are all valid. So the above example can be rewritten as follows −

#!/usr/bin/perl

$bar = "This is foo and again foo";
if ($bar =~ m[foo]) {
   print "First time is matching\n";
} else {
   print "First time is not matching\n";
}

$bar = "foo";
if ($bar =~ m{foo}) {
   print "Second time is matching\n";
} else {
   print "Second time is not matching\n";
}

You can omit m from m// if the delimiters are forward slashes, but for all other delimiters you must use the m prefix.

Note that the entire match expression, that is the expression on the left of =~ or !~ and the match operator, returns true (in a scalar context) if the expression matches. Therefore the statement −

$true = ($foo =~ m/foo/);

will set $true to 1 if $foo matches the regex, or to a false value (the empty string) if the match fails. In a list context, the match returns the contents of any grouped expressions. For example, when extracting the hours, minutes, and seconds from a time string, we can use −

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);
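
The difference between the two contexts can be seen in a small sketch. The sample strings and variable names below are made up for illustration and are not part of the original example.

```perl
use strict;
use warnings;

my $time = "12:05:30";   # sample input, chosen for illustration

# Scalar context: the match yields a true or false value.
my $matched = ($time =~ m/(\d+):(\d+):(\d+)/) ? "yes" : "no";

# List context: the match yields the captured groups.
my @parts = ($time =~ m/(\d+):(\d+):(\d+)/);

# A failed match in list context yields an empty list.
my @none = ("no time here" =~ m/(\d+):(\d+):(\d+)/);

print "Matched: $matched\n";                        # Matched: yes
print "Parts: @parts\n";                            # Parts: 12 05 30
print "Groups on failure: ", scalar(@none), "\n";   # Groups on failure: 0
```

Because a failed list-context match returns an empty list, it is usually worth testing the result before using the captured values.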

Match Operator Modifiers

The match operator supports its own set of modifiers. The /g modifier allows for global matching. The /i modifier will make the match case insensitive. Here is the complete list of modifiers −

Sr.No.  Modifier  Description
1       i         Makes the match case insensitive.
2       m         Specifies that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
3       o         Evaluates the expression only once.
4       s         Allows use of . to match a newline character.
5       x         Allows you to use white space in the expression for clarity.
6       g         Globally finds all matches.
7       cg        Allows the search to continue even after a global match fails.
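
As a rough illustration of the i, m, s, x, and g modifiers, the following sketch applies them to a three-line sample string. The text and variable names are assumptions made for this example only.

```perl
use strict;
use warnings;

my $text = "Perl is fun\nperl is terse\nPERL is old";

# /i ignores case; /g in list context returns every match.
my @all = ($text =~ /perl/gi);

# /m lets ^ match at the start of each line, not just the string.
my @line_starts = ($text =~ /^(\w+)/gm);

# Without /s, . stops at a newline; with /s it crosses lines.
my ($one_line)  = ($text =~ /(fun.*)/);
my ($multiline) = ($text =~ /(fun.*)/s);

# /x allows whitespace and comments inside the pattern.
my ($word) = ($text =~ /
    is \s+     # the literal word is, then whitespace
    (\w+)      # capture the following word
/x);

print scalar(@all), "\n";   # 3
print "@line_starts\n";     # Perl perl PERL
print "$one_line\n";        # fun
print "$word\n";            # fun
```

With /s the capture in $multiline runs from "fun" to the end of the whole string, which is why it is not printed on a single line here.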

Matching Only Once

There is also a simpler version of the match operator - the ?PATTERN? operator. This is basically identical to the m// operator except that it only matches once within the string you are searching between each call to reset. In Perl 5.22 and later the bare ?PATTERN? form is no longer accepted, so it has to be written as m?PATTERN?.

For example, you can use this to get the first and last elements within a list −

#!/usr/bin/perl

@list = qw/food foosball subeo footnote terfoot canic footbrdige/;

foreach (@list) {
   $first = $1 if m?(foo.*)?;   # matches only once, so $first keeps the first hit
   $last = $1 if /(foo.*)/;     # matches every time, so $last ends on the final hit
}
print "First: $first, Last: $last\n";

When the above program is executed, it produces the following result −

First: food, Last: footbrdige

Regular Expression Variables

Regular expression variables include $+, which contains whatever the last grouping match matched; $&, which contains the entire matched string; $`, which contains everything before the matched string; and $', which contains everything after the matched string. The following code demonstrates the result −

#!/usr/bin/perl

$string = "The food is in the salad bar";
$string =~ m/foo/;
print "Before: $`\n";
print "Matched: $&\n";
print "After: $'\n";

When the above program is executed, it produces the following result −

Before: The
Matched: foo
After: d is in the salad bar
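
The variable for the last matched group, $+, is mainly useful when a pattern has several alternatives and you do not know which group matched. A minimal sketch follows, using hypothetical sample lines that are not from the original text.

```perl
use strict;
use warnings;

my @lines = ("Version: 5.36", "Revision: 42");   # hypothetical sample lines

my @found;
for my $line (@lines) {
    # Only one of the two groups can match; $+ holds whichever did.
    if ($line =~ /Version: (.*)|Revision: (.*)/) {
        push @found, $+;
    }
}

print "@found\n";   # 5.36 42
```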

The Substitution Operator

The substitution operator, s///, is really just an extension of the match operator that allows you to replace the text matched with some new text. The basic form of the operator is −

s/PATTERN/REPLACEMENT/;

The PATTERN is the regular expression for the text that we are looking for. The REPLACEMENT is a specification for the text or regular expression that we want to use to replace the found text with. For example, we can replace the first occurrence of cat with dog using the following regular expression −

#!/usr/bin/perl

$string = "The cat sat on the mat";
$string =~ s/cat/dog/;

print "$string\n";

When the above program is executed, it produces the following result −

The dog sat on the mat

Substitution Operator Modifiers

Here is the list of all the modifiers used with the substitution operator.

Sr.No.  Modifier  Description
1       i         Makes the match case insensitive.
2       m         Specifies that if the string has newline or carriage return characters, the ^ and $ operators will now match against a newline boundary, instead of a string boundary.
3       o         Evaluates the expression only once.
4       s         Allows use of . to match a newline character.
5       x         Allows you to use white space in the expression for clarity.
6       g         Replaces all occurrences of the found expression with the replacement text.
7       e         Evaluates the replacement as if it were a Perl statement, and uses its return value as the replacement text.
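
The g and e modifiers are the ones that most change how a substitution behaves. The sketch below contrasts them on sample strings invented for this illustration.

```perl
use strict;
use warnings;

my $string = "The cat sat on the mat with another cat";

# Without /g only the first occurrence is replaced.
(my $once = $string) =~ s/cat/dog/;

# With /g every occurrence is replaced; s/// returns the count.
my $all   = $string;
my $count = ($all =~ s/cat/dog/g);

# With /e the replacement is evaluated as Perl code.
my $prices = "apple 3, pear 5";
(my $doubled = $prices) =~ s/(\d+)/$1 * 2/ge;

print "$once\n";      # The dog sat on the mat with another cat
print "$all\n";       # The dog sat on the mat with another dog
print "$count\n";     # 2
print "$doubled\n";   # apple 6, pear 10
```

Copying the string first, as in (my $once = $string), keeps the original intact, since s/// modifies its target in place.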

The Translation Operator

Translation is similar, but not identical, to the principles of substitution, but unlike substitution, translation (or transliteration) does not use regular expressions for its search on replacement values. The translation operators are −

tr/SEARCHLIST/REPLACEMENTLIST/cds
y/SEARCHLIST/REPLACEMENTLIST/cds

The translation replaces all occurrences of the characters in SEARCHLIST with the corresponding characters in REPLACEMENTLIST. For example, using the "The cat sat on the mat." string we have been using in this chapter −

#!/usr/bin/perl

$string = 'The cat sat on the mat.';
$string =~ tr/a/o/;

print "$string\n";

When the above program is executed, it produces the following result −

The cot sot on the mot.

Standard Perl ranges can also be used, allowing you to specify ranges of characters either by letter or numerical value. To change the case of the string, you might use the following syntax in place of the uc function.

$string =~ tr/a-z/A-Z/;        

Translation Operator Modifiers

Following is the list of modifiers related to translation.

Sr.No.  Modifier  Description
1       c         Complements SEARCHLIST.
2       d         Deletes found but unreplaced characters.
3       s         Squashes duplicate replaced characters.

The /d modifier deletes the characters matching SEARCHLIST that do not have a corresponding entry in REPLACEMENTLIST. For example −

#!/usr/bin/perl

$string = 'the cat sat on the mat.';
$string =~ tr/a-z/b/d;

print "$string\n";

When the above program is executed, it produces the following result (note the leading space left over from the deleted first word) −

 b b   b.

The last modifier, /s, removes the duplicate sequences of characters that were replaced, so −

#!/usr/bin/perl

$string = 'food';
$string =~ tr/a-z/a-z/s;

print "$string\n";

When the above program is executed, it produces the following result −

fod
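
The c modifier is not shown in the examples above. As a hedged sketch, the following uses it together with d to strip everything except digits, and also relies on the fact that tr returns the number of characters it matched. The sample string is an assumption for illustration.

```perl
use strict;
use warnings;

my $string = "Phone: 555-1234 (home)";   # made-up sample input

# tr returns the number of characters it matched; with an empty
# replacement list and no /d the string itself is left unchanged.
my $digits = ($string =~ tr/0-9//);

# /c complements the search list and /d deletes what matched,
# so this removes everything that is not a digit.
(my $only_digits = $string) =~ tr/0-9//cd;

print "$digits\n";        # 7
print "$only_digits\n";   # 5551234
```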

More Complex Regular Expressions

You don't just have to match on fixed strings. In fact, you can match on just about anything you could dream of by using more complex regular expressions. Here's a quick cheat sheet −

The following table lists the regular expression syntax that is available in Perl.

Sr.No.  Pattern        Description
1       ^              Matches beginning of line.
2       $              Matches end of line.
3       .              Matches any single character except newline. Using the s option allows it to match newline as well.
4       [...]          Matches any single character in brackets.
5       [^...]         Matches any single character not in brackets.
6       *              Matches 0 or more occurrences of preceding expression.
7       +              Matches 1 or more occurrences of preceding expression.
8       ?              Matches 0 or 1 occurrence of preceding expression.
9       {n}            Matches exactly n number of occurrences of preceding expression.
10      {n,}           Matches n or more occurrences of preceding expression.
11      {n,m}          Matches at least n and at most m occurrences of preceding expression.
12      a|b            Matches either a or b.
13      \w             Matches word characters.
14      \W             Matches nonword characters.
15      \s             Matches whitespace. Equivalent to [ \t\n\r\f].
16      \S             Matches nonwhitespace.
17      \d             Matches digits. Equivalent to [0-9].
18      \D             Matches nondigits.
19      \A             Matches beginning of string.
20      \Z             Matches end of string. If a newline exists, it matches just before newline.
21      \z             Matches end of string.
22      \G             Matches point where last match finished.
23      \b             Matches word boundaries when outside brackets. Matches backspace (0x08) when inside brackets.
24      \B             Matches nonword boundaries.
25      \n, \t, etc.   Matches newlines, carriage returns, tabs, etc.
26      \1...\9        Matches nth grouped subexpression.
27      \10            Matches nth grouped subexpression if it matched already. Otherwise refers to the octal representation of a character code.
28      [aeiou]        Matches a single character in the given set.
29      [^aeiou]       Matches a single character outside the given set.

The ^ metacharacter matches the beginning of the string and the $ metasymbol matches the end of the string. Here are some brief examples.

# nothing in the string (start and end are adjacent)
/^$/

# three digits, each followed by a whitespace
# character (eg "3 4 5 ")
/(\d\s){3}/

# matches a string in which every
# odd-numbered letter is a (eg "abacadaf")
/(a.)+/

# string starts with one or more digits
/^\d+/

# string that ends with one or more digits
/\d+$/
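
To try these patterns out, each one can be paired with a sample string and tested in a loop. This is only a sketch: the sample inputs and labels are invented for the purpose.

```perl
use strict;
use warnings;

# Each entry: pattern, sample input (made up for this sketch), label.
my @checks = (
    [ qr/^$/,        "",          "empty string" ],
    [ qr/(\d\s){3}/, "3 4 5 ",    "three digit+space pairs" ],
    [ qr/(a.)+/,     "abacadaf",  "a in odd positions" ],
    [ qr/^\d+/,      "42 apples", "starts with digits" ],
    [ qr/\d+$/,      "room 101",  "ends with digits" ],
);

my @results;
for my $check (@checks) {
    my ($re, $input, $label) = @$check;
    my $outcome = ($input =~ $re) ? "match" : "no match";
    push @results, $outcome;
    printf "%-24s %s\n", $label, $outcome;
}
```

Every sample here is chosen so that its pattern matches; changing an input (for example "room 101" to "room 101b") is a quick way to see an anchor fail.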

Let's take a look at another example.

#!/usr/bin/perl

$string = "Cats go Catatonic\nWhen given Catnip";
($start) = ($string =~ /\A(.*?) /);
@lines = $string =~ /^(.*?) /gm;
print "First word: $start\n","Line starts: @lines\n";

When the above program is executed, it produces the following result −

First word: Cats
Line starts: Cats When

Matching Boundaries

The \b matches at any word boundary, as defined by the difference between the \w class and the \W class. Because \w includes the characters for a word, and \W the opposite, this normally means the termination of a word. The \B assertion matches any position that is not a word boundary. For example −

/\bcat\b/ # Matches 'the cat sat' but not 'cats on the mat'
/\Bcat\B/ # Matches 'verification' but not 'the cat on the mat'
/\bcat\B/ # Matches 'catatonic' but not 'polecat'
/\Bcat\b/ # Matches 'polecat' but not 'catatonic'
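
The four combinations can be checked against a small word list. The sketch below is illustrative only; the word list and the %hits hash are names chosen for this example.

```perl
use strict;
use warnings;

my @words = qw(cat catatonic polecat verification);

# Each entry: a printable label and the compiled pattern.
my @patterns = (
    [ '\bcat\b', qr/\bcat\b/ ],
    [ '\Bcat\B', qr/\Bcat\B/ ],
    [ '\bcat\B', qr/\bcat\B/ ],
    [ '\Bcat\b', qr/\Bcat\b/ ],
);

my %hits;
for my $entry (@patterns) {
    my ($label, $re) = @$entry;
    $hits{$label} = join ", ", grep { /$re/ } @words;
    print "$label matches: $hits{$label}\n";
}
# \bcat\b matches: cat
# \Bcat\B matches: verification
# \bcat\B matches: catatonic
# \Bcat\b matches: polecat
```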

Selecting Alternatives

The | character is just like the standard or bitwise OR within Perl. It specifies alternate matches within a regular expression or group. For example, to match "cat" or "dog" in an expression, you might use this −

if ($string =~ /cat|dog/)

You can group individual elements of an expression together in order to support complex matches. Searching for two people's names could be achieved with two separate tests, like this −

if (($string =~ /Martin Brown/) ||  ($string =~ /Sharon Brown/))

This could be written as follows

if ($string =~ /(Martin|Sharon) Brown/)
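
A short sketch shows why the grouping matters: without parentheses the | splits the entire pattern, including any anchors. The sample strings and variable names are assumptions for illustration.

```perl
use strict;
use warnings;

my $string = "Sharon Brown wrote this";   # sample text for illustration

# The group limits the alternation to the first name, and
# $1 reports which alternative matched.
my $who = ($string =~ /(Martin|Sharon) Brown/) ? $1 : "nobody";

# Without a group, | splits the whole pattern, anchors included:
# this reads as "starts with cat" OR "ends with dog".
my $loose   = ("hotdog" =~ /^cat|dog$/)   ? "match" : "no match";

# With a group, both anchors apply to either alternative.
my $grouped = ("hotdog" =~ /^(cat|dog)$/) ? "match" : "no match";

print "$who\n";       # Sharon
print "$loose\n";     # match
print "$grouped\n";   # no match
```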

Grouping Matching

From a regular-expression point of view, there is no difference between the following two expressions except, perhaps, that the former is slightly clearer.

$string =~ /(\S+)\s+(\S+)/;

and

$string =~ /\S+\s+\S+/;

However, the benefit of grouping is that it allows us to extract a sequence from a regular expression. Groupings are returned as a list in the order in which they appear in the original. For example, in the following fragment we have pulled out the hours, minutes, and seconds from a string.

my ($hours, $minutes, $seconds) = ($time =~ m/(\d+):(\d+):(\d+)/);        

As well as this direct method, matched groups are also available within the special $x variables, where x is the number of the group within the regular expression. We could therefore rewrite the preceding example as follows −

#!/usr/bin/perl

$time = "12:05:30";

$time =~ m/(\d+):(\d+):(\d+)/;
my ($hours, $minutes, $seconds) = ($1, $2, $3);

print "Hours : $hours, Minutes: $minutes, Second: $seconds\n";

When the above program is executed, it produces the following result −

Hours : 12, Minutes: 05, Second: 30

When groups are used in substitution expressions, the $x syntax can be used in the replacement text. Thus, we could reformat a date string using this −

#!/usr/bin/perl

$date = '03/26/1999';
$date =~ s#(\d+)/(\d+)/(\d+)#$3/$1/$2#;

print "$date\n";

When the above program is executed, it produces the following result −

1999/03/26

The \G Assertion

The \G assertion allows you to continue searching from the point where the last match occurred. For example, in the following code, we have used \G so that we can search to the correct position and then extract some information, without having to create a more complex, single regular expression −

#!/usr/bin/perl

$string = "The time is: 12:31:02 on 4/12/00";

$string =~ /:\s+/g;
($time) = ($string =~ /\G(\d+:\d+:\d+)/);
$string =~ /.+\s+/g;
($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "Time: $time, Date: $date\n";

When the above program is executed, it produces the following result −

Time: 12:31:02, Date: 4/12/00

The \G assertion is really just the metasymbol equivalent of the pos function, so between regular expression calls you can continue to use pos, and even modify the value of pos (and therefore \G) by using pos as an lvalue subroutine.
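
A minimal sketch of reading and assigning pos follows. It reuses the sample string from the example above; the variable names and the use of index to find the date are assumptions made for illustration.

```perl
use strict;
use warnings;

my $string = "The time is: 12:31:02 on 4/12/00";

# A scalar-context /g match records where it stopped.
my $ok = ($string =~ /is:\s+/g);
my $after_label = pos($string);   # offset just past "is: "

# pos can be assigned to, which moves the point where \G anchors.
pos($string) = index($string, "4/");
my ($date) = ($string =~ m{\G(\d+/\d+/\d+)});

print "$after_label\n";   # 13
print "$date\n";          # 4/12/00
```

Assigning to pos avoids the intermediate "skip ahead" match, at the cost of having to compute the offset some other way.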

Regular-expression Examples

Literal Characters

Sr.No.  Example  Description
1       Perl     Matches "Perl".

Character Classes

Sr.No.  Example       Description
1       [Pp]ython     Matches "Python" or "python"
2       rub[ye]       Matches "ruby" or "rube"
3       [aeiou]       Matches any one lowercase vowel
4       [0-9]         Matches any digit; same as [0123456789]
5       [a-z]         Matches any lowercase ASCII letter
6       [A-Z]         Matches any uppercase ASCII letter
7       [a-zA-Z0-9]   Matches any of the above
8       [^aeiou]      Matches anything other than a lowercase vowel
9       [^0-9]        Matches anything other than a digit

Special Character Classes

Sr.No.  Example  Description
1       .        Matches any character except newline
2       \d       Matches a digit: [0-9]
3       \D       Matches a nondigit: [^0-9]
4       \s       Matches a whitespace character: [ \t\r\n\f]
5       \S       Matches nonwhitespace: [^ \t\r\n\f]
6       \w       Matches a single word character: [A-Za-z0-9_]
7       \W       Matches a nonword character: [^A-Za-z0-9_]

Repetition Cases

Sr.No.  Example   Description
1       ruby?     Matches "rub" or "ruby": the y is optional
2       ruby*     Matches "rub" plus 0 or more ys
3       ruby+     Matches "rub" plus 1 or more ys
4       \d{3}     Matches exactly 3 digits
5       \d{3,}    Matches 3 or more digits
6       \d{3,5}   Matches 3, 4, or 5 digits

Nongreedy Repetition

This matches the smallest number of repetitions −

Sr.No.  Example  Description
1       <.*>     Greedy repetition: matches "<python>perl>"
2       <.*?>    Nongreedy: matches "<python>" in "<python>perl>"
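
The difference is easy to confirm by capturing what each pattern matched. This is a small sketch; the capturing parentheses and variable names are added for illustration.

```perl
use strict;
use warnings;

my $text = "<python>perl>";

# Greedy: .* grabs as much as possible, then backs off to the last >.
my ($greedy) = ($text =~ /(<.*>)/);

# Nongreedy: .*? grabs as little as possible, stopping at the first >.
my ($lazy) = ($text =~ /(<.*?>)/);

print "$greedy\n";   # <python>perl>
print "$lazy\n";     # <python>
```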

Grouping with Parentheses

Sr.No.  Example             Description
1       \D\d+               No group: + repeats \d
2       (\D\d)+             Grouped: + repeats \D\d pair
3       ([Pp]ython(, )?)+   Matches "Python", "Python, python, python", etc.

Backreferences

This matches a previously matched group again −

Sr.No.  Example              Description
1       ([Pp])ython&\1ails   Matches python&pails or Python&Pails
2       (['"])[^\1]*\1       Single or double-quoted string. \1 matches whatever the 1st group matched. \2 matches whatever the 2nd group matched, etc.
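
The following sketch exercises a backreference on a few made-up strings. Note one deliberate change from the table: inside a bracketed character class Perl does not treat \1 as a backreference, so this sketch uses a nongreedy .*? between the quotes instead of [^\1]*.

```perl
use strict;
use warnings;

# Two properly quoted samples and one with mismatched quotes.
my @samples = ('"double"', "'single'", q{"mixed'});

# \1 must be the same quote character that the first group captured.
my @quoted = grep { /^(['"]).*?\1$/ } @samples;

# A backreference can also find a repeated word.
my ($dup) = ("this is the the end" =~ /\b(\w+) \1\b/);

print scalar(@quoted), "\n";   # 2
print "$dup\n";                # the
```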

Alternatives

Sr.No.  Example         Description
1       python|perl     Matches "python" or "perl"
2       rub(y|le)       Matches "ruby" or "ruble"
3       Python(!+|\?)   "Python" followed by one or more ! or one ?

Anchors

These specify match positions.

Sr.No.  Example       Description
1       ^Python       Matches "Python" at the start of a string or internal line
2       Python$       Matches "Python" at the end of a string or line
3       \APython      Matches "Python" at the start of a string
4       Python\Z      Matches "Python" at the end of a string
5       \bPython\b    Matches "Python" at a word boundary
6       \brub\B       \B is nonword boundary: matches "rub" in "rube" and "ruby" but not alone
7       Python(?=!)   Matches "Python", if followed by an exclamation point
8       Python(?!!)   Matches "Python", if not followed by an exclamation point
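
The two lookahead forms in the last rows can be compared side by side. The input list below is invented for this sketch.

```perl
use strict;
use warnings;

my @inputs = ("Python!", "Python?", "Python");

# Positive lookahead: the next character must be !.
my @followed = grep { /Python(?=!)/ } @inputs;

# Negative lookahead: the next character must not be !.
my @not_followed = grep { /Python(?!!)/ } @inputs;

# The lookahead checks the next character without consuming it,
# so the ! is not part of the matched text.
my ($text) = ("Python!" =~ /(Python(?=!))/);

print "@followed\n";       # Python!
print "@not_followed\n";   # Python? Python
print "$text\n";           # Python
```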

Special Syntax with Parentheses

Sr.No.  Example        Description
1       R(?#comment)   Matches "R". All the rest is a comment
2       R(?i)uby       Case-insensitive while matching "uby"
3       R(?i:uby)      Same as above
4       rub(?:y|le)    Group only without creating \1 backreference
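
As a rough sketch of the last two rows, the code below compares a capturing group with a non-capturing one and checks the scoped case-insensitive form. The sample words are assumptions for illustration.

```perl
use strict;
use warnings;

# A capturing group returns its contents in list context.
my @with_group = ("ruble" =~ /rub(y|le)/);

# A non-capturing group captures nothing, so a successful
# list-context match just returns a single true value.
my @without_group = ("ruble" =~ /rub(?:y|le)/);

# (?i:...) makes only the enclosed part case-insensitive;
# the leading R must still be uppercase.
my $upper = ("RUBY" =~ /R(?i:uby)/) ? "match" : "no match";
my $lower = ("rUBY" =~ /R(?i:uby)/) ? "match" : "no match";

print "@with_group\n";      # le
print "@without_group\n";   # 1
print "$upper\n";           # match
print "$lower\n";           # no match
```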

vrolandbacticeived.blogspot.com

Source: https://www.tutorialspoint.com/perl/perl_regular_expressions.htm