Regular Expressions 

"Regex" 

Regex: here we will answer three questions: why, how, and when.

Why is Regex needed?

Or, put differently: what problem did developers face that led them to invent such a feature?

To find out, let's pose a problem and think together about how to solve it, first without Regex and then with Regex. Then we can compare the effort each approach takes once the problem grows to a large scale. Let's get started.

Suppose we have an article that contains a mistake we need to fix. We can simply go to each occurrence of the mistake, select it, delete it, and type the right text.

Let's say we have 3 lines in which the words are separated by "." and we need to replace each dot with whitespace.

Now imagine the same mistake in a big book, and the effort it would take to fix every occurrence by hand.

BUT 

The answer comes with Regex. As you can see in the image on the right, using just one regex I can select all of these dots and easily replace them with any string I want.

As we see, the solution is very easy with regex.
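As a minimal sketch of the same fix in a script, assuming Python and its standard re module (the sample text and the variable names are made up for illustration), one substitution call replaces every dot:

```python
import re

# Hypothetical sample: three lines whose words are separated by dots.
text = "one.two.three\nfour.five.six\nseven.eight.nine"

# The dot is escaped so it matches a literal "." and not "any character".
fixed = re.sub(r"\.", " ", text)

print(fixed)
```

The same call works unchanged whether the text is three lines or a whole book, which is the point of the comparison above.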

How to use it?

First, we'll explain the regex rules, then we will take some examples of how we can use them in our projects.

There are some special characters in Regex, as follows:

{  }   [   ]   (   )   .   *   ?   +   ^   $   |   \

These characters have a special meaning in regex, and we will learn the meaning of each of them. If we need to use one of them as a literal character, we put a backslash in front of it. But be careful: putting a backslash in front of an ordinary character can give it a special meaning instead, e.g. \n is the newline character.

But \. gives you a period as a literal character.
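Here is a small sketch of the difference between an escaped and an unescaped dot, assuming Python's re module (the sample strings are invented for the demonstration):

```python
import re

sample = "a.b axb"

# Unescaped "." matches any character except a newline.
any_char = re.findall(r"a.b", sample)       # ['a.b', 'axb']

# Escaped "\." matches only a literal period.
literal_dot = re.findall(r"a\.b", sample)   # ['a.b']

# A backslash before an ordinary letter can create a special meaning:
# "\n" here matches the newline character.
newline_count = len(re.findall(r"\n", "line1\nline2\nline3"))  # 2

print(any_char, literal_dot, newline_count)
```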

Some abbreviations and their meanings:

The format of these is a backslash (\) followed by a capital or small letter:

Capital letter: match everything except characters of this kind.

Small letter: match characters of this kind.

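\d: matches any digit [0-9]

\D: matches any character that is not a digit [^0-9]

\s and \S: whitespace and non-whitespace characters

\w and \W: word characters (letters, digits, and underscore) [a-zA-Z0-9_] and non-word characters [^a-zA-Z0-9_]

\b and \B: a word boundary and a position that is not a word boundary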
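To see these abbreviations in action, here is a short sketch assuming Python's re module. The sample string is made up, and note that in Python 3 \d and \w also match non-ASCII digits and letters by default, so the bracket forms above are only an approximation there:

```python
import re

sample = "Room 42, floor_3!"

digits = re.findall(r"\d", sample)      # ['4', '2', '3']
words = re.findall(r"\w+", sample)      # ['Room', '42', 'floor_3']
non_word = re.findall(r"\W+", sample)   # [' ', ', ', '!']
spaces = re.findall(r"\s", sample)      # [' ', ' ']

# \b only matches where a word starts or ends, so "floors" and
# "floor_3" are skipped (the underscore counts as a word character).
whole_word = re.findall(r"\bfloor\b", "floor floors floor_3")  # ['floor']

print(digits, words, non_word, spaces, whole_word)
```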

Some special characters and their meanings:

These are used for repetition: "?", "*", "+", and "{min,max}".

"?" has two uses:

1- Match zero or one occurrence [e.g. [+-]? matches + or - or nothing].

2- Lazy matching: placed after a quantifier, it makes the quantifier match as few characters as possible, so the match stops at the first occurrence of what follows [e.g. ".*?r"].
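Both uses of "?" can be tried quickly with a sketch like this one, assuming Python's re module (the inputs are invented examples):

```python
import re

# Use 1: the sign is optional, so signed and unsigned numbers all match.
signed = re.findall(r"[+-]?\d+", "+5 -12 7")   # ['+5', '-12', '7']

text = "car bar"

# Greedy: ".*" grabs as much as it can and ends at the LAST "r".
greedy = re.search(r".*r", text).group()       # 'car bar'

# Use 2: lazy ".*?" grabs as little as it can and ends at the FIRST "r".
lazy = re.search(r".*?r", text).group()        # 'car'

print(signed, greedy, lazy)
```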



"." match any character except new line (\n) it is equal to this character class [\s\S] or [\d\D] and so on 


"+" match one or more character [at least one ] 


"Expression {min, max}" min determine the min number of repetition, max determine the max number of repetition.


"Expression{num}"   num is the exact number of repetitions. 


".+" match  at least one character (exclude  End of Line )


".*" match any character ( exclude the EOL )
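The repetition rules above can be checked with a small sketch, assuming Python's re module. The sample strings are made up, and \b is only used here to keep each run of letters separate:

```python
import re

sample = "a aa aaa aaaa"

one_or_more = re.findall(r"a+", sample)            # ['a', 'aa', 'aaa', 'aaaa']
exactly_two = re.findall(r"\ba{2}\b", sample)      # ['aa']
two_to_three = re.findall(r"\ba{2,3}\b", sample)   # ['aa', 'aaa']

# "." stops at the newline, so ".+" only covers the first line.
first_line = re.match(r".+", "first line\nsecond line").group()  # 'first line'

# ".*" accepts an empty string, ".+" needs at least one character.
star_on_empty = re.fullmatch(r".*", "") is not None   # True
plus_on_empty = re.fullmatch(r".+", "") is not None   # False

print(one_or_more, exactly_two, two_to_three, first_line)
```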


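The character ^ has two special meanings:

"^" inside [ ] brackets excludes the listed characters from the class [e.g. [^abc] matches any character except a, b, or c].

"^something$": here ^ marks the beginning and $ marks the end of the text to match, so the pattern has to cover the whole string (or the whole line, with the multiline flag).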
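A short sketch of both meanings of ^, assuming Python's re module (the test strings are invented for illustration):

```python
import re

# Inside brackets, ^ negates the class: everything except a, b, c.
not_abc = re.findall(r"[^abc]", "abcdef")          # ['d', 'e', 'f']

# Outside brackets, ^ and $ pin the pattern to the start and the end.
only_digits = re.search(r"^\d+$", "12345")         # a match object
mixed = re.search(r"^\d+$", "id 12345")            # None

# Without the anchors the digits are found anywhere in the string.
unanchored = re.search(r"\d+", "id 12345").group() # '12345'

print(not_abc, only_digits is not None, mixed is None, unanchored)
```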


"|" matches one of several alternatives (an "or"). The difference between this and [ ] is that | works at the expression level, not the character level.

If you also need to match the EOL, you can combine it with () grouping: (.|\n)*
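Here is a sketch of alternation and of the (.|\n)* trick, assuming Python's re module. The strings are made up. As a side note, Python also offers the re.DOTALL flag, which makes "." match newlines and is a common alternative to this grouping:

```python
import re

# Alternation works on whole expressions, not single characters.
pets = re.findall(r"cat|dog", "cat dog bird")          # ['cat', 'dog']

text = "start\nmiddle\nend"

# "." cannot cross the line breaks, so this finds nothing.
dot_only = re.search(r"start.*end", text)              # None

# (.|\n)* accepts any character OR a newline, so it spans the lines.
with_newlines = re.search(r"start(.|\n)*end", text)    # a match object

# Alternative: let "." match newlines through the DOTALL flag.
with_flag = re.search(r"start.*end", text, flags=re.DOTALL)

print(pets, dot_only, with_newlines.group() == text, with_flag.group() == text)
```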


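"\" escapes special characters so that they are matched literally [e.g. \* matches a literal *; / needs escaping only where it is used as the pattern delimiter].

"()" capturing group: groups a string or pattern together and remembers what it matched.

"[]" character class: matches any one of the characters listed inside.

[xx-XX]: specifies a range of characters from xx to XX (e.g. [0-9] -> any digit from 0 to 9).

\1 refers back to the text matched by the first group (e.g. (\w)(\w)a\1\2 matches "hhahh" and "dhadh" in "hhahh haa dhadh").

(?: ): non-capturing group: used when you need grouping but do not need to reference the matched string as a group.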
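The backreference example above, plus the difference between capturing and non-capturing groups, can be sketched like this, assuming Python's re module (the second sample string is invented):

```python
import re

sample = "hhahh haa dhadh"

# \1 and \2 must repeat exactly what groups 1 and 2 captured.
repeated = [m.group() for m in re.finditer(r"(\w)(\w)a\1\2", sample)]
# ['hhahh', 'dhadh']

codes = "abab1 ab2"

# With a capturing group, findall reports every group per match.
capturing = re.findall(r"(ab)+(\d)", codes)        # [('ab', '1'), ('ab', '2')]

# With (?: ) the "ab" part still groups but is not reported.
non_capturing = re.findall(r"(?:ab)+(\d)", codes)  # ['1', '2']

print(repeated, capturing, non_capturing)
```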

Flags

These flags help us when we write scripts in any scripting language, where they are passed as one of the input parameters.

g (Global): selects all occurrences, not only the first match.

m (Multiline): makes ^ and $ match at the start and end of each line, so each line is handled separately.

i (case Insensitive): matches characters regardless of upper or lower case.
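As a sketch of how these flags look in a script, assuming Python's re module: Python has no g flag, so the choice of function plays that role (re.search stops at the first match, re.findall returns all of them), while i and m map to re.IGNORECASE and re.MULTILINE. The sample text is made up:

```python
import re

text = "Cat\ncat\nCAT"

# No "g" flag in Python: search() gives the first match, findall() gives all.
first_only = re.search(r"cat", text).group()                   # 'cat'
case_sensitive = re.findall(r"cat", text)                      # ['cat']

# "i" flag: upper and lower case are treated the same.
ignore_case = re.findall(r"cat", text, flags=re.IGNORECASE)    # ['Cat', 'cat', 'CAT']

# Without "m", ^ and $ refer to the whole string, so nothing matches.
whole_string = re.findall(r"^cat$", text, flags=re.IGNORECASE) # []

# With "m", ^ and $ apply to every line.
per_line = re.findall(r"^cat$", text, flags=re.IGNORECASE | re.MULTILINE)

print(first_only, case_sensitive, ignore_case, whole_string, per_line)
```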


Online site that you can use to practice with  

Practice makes perfect.

Now let's try some complex examples:

We need to select all multiline comments in C.

How can we do this?

Simply use this regex: (/\*)(.|\n)*?(\*/)
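To try this pattern from a script, here is a minimal sketch assuming Python's re module. The C snippet is a made-up sample, and finditer with group(0) is used so that the whole comment is returned rather than the three groups:

```python
import re

# Hypothetical C source with one multiline and one single-line comment.
c_source = "int a = 1; /* first\ncomment */\nint b = 2; /* second */\n"

pattern = re.compile(r"(/\*)(.|\n)*?(\*/)")

# group(0) is the full text of each match.
comments = [m.group(0) for m in pattern.finditer(c_source)]

# Without the lazy "?", the match would run to the LAST "*/".
greedy = re.search(r"(/\*)(.|\n)*(\*/)", c_source).group(0)

print(comments)
print(greedy.count("/*"))
```

The lazy "?" is what keeps each comment separate: the greedy version swallows everything between the first /* and the last */, including the code in between.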

This example covers this pattern and how we can specify the beginning and the end of the regex.

Here we search for a word that ends with the character 'c', which means one character.

A word that starts with the char 'a', then any char, then ends with the char 'c'.

Here we search for any word that starts with any character except 'n', followed by any number of characters, then the char 'g'.

I want you to figure out this pattern and why this is the result from the search.

Lookaround: Lookahead and Lookbehind

Lookaround gives you an easy way to search for a string based on what comes after it or before it, so here is a brief introduction.

There are two kinds of lookahead (positive and negative) and, likewise, two kinds of lookbehind (positive and negative).

Positive lookahead

Format: (?=Something)

Something here may be a regex or any normal string that we need to match.

A positive lookahead says: match at this position only if Something follows it. Something itself is not included in the match.

The following example is from the Regex101 website:


/foo(?=bar)/

foobar foobaz

Negative lookahead

Format: (?!Something)

Something here may be a regex or any normal string that we need to match.

A negative lookahead says: match at this position only if Something does not follow it.

The following example is from the Regex101 website:


/foo(?!bar)/

foobar foobaz
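Both lookahead examples can be reproduced with a short sketch, assuming Python's re module. The spans show that only "foo" is selected in each case and the text inside the lookahead is never part of the match:

```python
import re

text = "foobar foobaz"

# Positive lookahead: "foo" only when "bar" comes right after it.
positive = [m.span() for m in re.finditer(r"foo(?=bar)", text)]   # [(0, 3)]

# Negative lookahead: "foo" only when "bar" does NOT come right after it.
negative = [m.span() for m in re.finditer(r"foo(?!bar)", text)]   # [(7, 10)]

print(positive, negative)
```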

Positive lookbehind

Format: (?<=Something)

Something here may be a regex or any normal string that we need to match.

A positive lookbehind says: match at this position only if Something comes immediately before it. Something itself is not included in the match.

The following example is from the Regex101 website:


/(?<=foo)bar/

foobar fuubar


Negative lookbehind

Format: (?<!Something)

Something here may be a regex or any normal string that we need to match.

A negative lookbehind says: match at this position only if Something does not come immediately before it.

The following example is from the Regex101 website:


/(?<!not )foo/

not foo but foo
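The two lookbehind examples can be sketched the same way, assuming Python's re module. One thing to keep in mind is that Python's re only accepts lookbehind patterns of a fixed width, which both of these are:

```python
import re

# Positive lookbehind: "bar" only when "foo" comes right before it.
positive = [m.span() for m in re.finditer(r"(?<=foo)bar", "foobar fuubar")]
# [(3, 6)]

# Negative lookbehind: "foo" only when "not " does NOT come right before it.
negative = [m.span() for m in re.finditer(r"(?<!not )foo", "not foo but foo")]
# [(12, 15)]

print(positive, negative)
```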


If you have reached this point, you are ready to take the next step with regex, so we will use this lesson as a base to continue with Python.

We will use regex there to make our scripts more generic and efficient, so you can click on the blue link to go to the remaining section on the relation between regex and Python.