Regex
with Python
Before you start this tutorial on using regex with Python, I hope you already have a basic idea of what regex is.
If you need some help, you can first check this tutorial and then come back to resume your learning.
Now we can start. First we will go through all the methods that matter to us when dealing with regex, then we will work through examples.
We will begin with the module flags.
re.I or re.IGNORECASE :
Performs case-insensitive matching, so 'you' also matches 'You' and 'YOU'.
re.M or re.MULTILINE :
When this flag is set, '^' matches at the beginning of the string and at the beginning of each line, that is, immediately after each \n.
'$' likewise matches at the end of the string and at the end of each line, that is, immediately before each \n.
re.NOFLAG :
Use this flag when you do not need to specify any flag. It is available from Python 3.11 onward.
re.S or re.DOTALL :
When this flag is set, '.' in your pattern matches any character, including a newline. Without it, '.' matches any character except a newline.
re.X or re.VERBOSE :
This flag lets you add comments to your regex without affecting it. Write the comment after a '#' character; everything from '#' to the end of the line is ignored. Whitespace inside the pattern is ignored as well, so a literal space must be escaped or placed in a character class.
regex and format string
We can concatenate smaller pieces of regex to build a more complex one.
This is an important feature that you may need some day, not only with regex but with any string that you want Python to treat literally, exactly as written.
Now we can look at some examples of the previous flags and what each one means. We will use a single method, split(), for now and explain all the methods later.
split() splits your string into a list of strings at every place the regex matches.
Before we start, let me refresh your memory about how to build a regex in Python.
We want Python to pass the regex to the regex methods exactly as written, without interpreting its characters, because some of them (such as the backslash) have a special meaning in a normal Python string. We can use this format to build the regex:
rf"some string {variable name}"
r : means raw string. It tells Python not to interpret any special character and to keep every character as it is.
f : means format string. It lets you insert another variable into your string by writing the variable name between {}.
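To make the rf"..." format concrete, here is a minimal sketch. The variable names word and digits and the sample text are assumptions chosen only for this illustration.

```python
import re

# Hypothetical building blocks, chosen only for this illustration.
word = "you"
digits = r"\d+"

# r keeps the backslashes literal; f substitutes the variables inside {}.
pattern = rf"{word}\s{digits}"
matches = re.findall(pattern, "you 42, you 7")

print(pattern)  # you\s\d+
print(matches)  # ['you 42', 'you 7']
```

One thing to keep in mind: because {} belongs to the f-string, literal regex braces such as the quantifier in \d{2} must be doubled, as in rf"\d{{2}}".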
re.I or re.IGNORECASE
When we split using re.IGNORECASE, the match ignores case, so the pattern 'you' also matches the word 'You'.
Syntax:
match = re.split(r'''^you''', String, flags=re.DOTALL | re.I)
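A small runnable sketch of the same idea follows. The content of String here is an assumed sample, not the article's own text.

```python
import re

# Assumed sample text; the article's own String is not shown here.
String = "You are here.\nyou are there."

with_flag = re.split(r"^you", String, flags=re.I)
without_flag = re.split(r"^you", String)

print(with_flag)     # ['', ' are here.\nyou are there.']
print(without_flag)  # ['You are here.\nyou are there.']
```

Without re.I the pattern 'you' does not match the capitalised 'You' at the start, so nothing is split and the whole string comes back as a single item.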
re.X or re.VERBOSE
Here we inject a comment into the pattern without any effect on the result; the result stays the same.
Syntax:
match = re.split(r'''^you  # here we search for the word 'you' in this article
# any character after '#' up to the end of the line has no effect on the regex
''', String, flags=re.DOTALL | re.I | re.X)
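To check that the comment really changes nothing, we can compare a plain pattern with a verbose one. This is only a sketch, and String is again an assumed sample text.

```python
import re

# Assumed sample text for illustration.
String = "You are here.\nyou are there."

verbose_pattern = r"""
    ^you   # the word 'you' at the start of the string
"""

plain = re.split(r"^you", String, flags=re.I)
verbose = re.split(verbose_pattern, String, flags=re.I | re.X)

print(plain)             # ['', ' are here.\nyou are there.']
print(plain == verbose)  # True
```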
re.S or re.DOTALL
The difference between using the DOTALL flag and omitting it is how '.' in the regex treats the newline character. I hope the main idea behind this flag is clear, because misusing it can have a major impact on your matching result. The call below leaves re.DOTALL out; because this pattern contains no '.', the result does not depend on it.
Syntax:
match = re.split(r'''^you  # here we search for the word 'you' in this article
# any character after '#' up to the end of the line has no effect on the regex
''', String, flags=re.I | re.X | re.M)
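Since DOTALL only matters when the pattern contains '.', here is a sketch with a pattern that does. The pattern \.. (a literal full stop followed by any character) and the sample String are assumptions made for this illustration.

```python
import re

# Assumed sample text for illustration.
String = "You are here.\nyou are there."

# A literal '.' followed by any single character.
without_dotall = re.split(r"\..", String)
with_dotall = re.split(r"\..", String, flags=re.DOTALL)

print(without_dotall)  # ['You are here.\nyou are there.']
print(with_dotall)     # ['You are here', 'you are there.']
```

Without the flag, the '.' in the pattern refuses to match the newline after "here.", so nothing is split. With re.DOTALL the newline counts as "any character" and the string is split there.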
re.M or re.MULTILINE
The MULTILINE flag lets '^' match at the start of every line, so each line is searched separately. Without this flag, only a match at the very start of the string is found, regardless of how many times the word occurs at the start of later lines.
Syntax:
match = re.split(r'''^you  # here we search for the word 'you' in this article
# any character after '#' up to the end of the line has no effect on the regex
''', String, flags=re.I | re.X | re.M)
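The effect of re.M is easiest to see side by side. In this sketch String is an assumed two-line sample text.

```python
import re

# Assumed sample text with 'you' at the start of both lines.
String = "You are here.\nyou are there."

single_line = re.split(r"^you", String, flags=re.I)
multi_line = re.split(r"^you", String, flags=re.I | re.M)

print(single_line)  # ['', ' are here.\nyou are there.']
print(multi_line)   # ['', ' are here.\n', ' are there.']
```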
Now that we know each flag and its functionality, we can go to the next step: the Python regex methods.
We can now go deeper into the regex methods and their syntax.
We will list all the methods that we will use, then we will see how to use them in a Python script.
[ compile - match - search - findall - finditer - split - sub ].
compile() and match() :
The purpose of the compile method is to build a pattern object from your pattern, which you can then use with any string through the '.' access operator.
syntax :
object = re.compile(pattern)
then we can use this object through
result = object.match(StringToSearch)
match() returns a match object if the pattern matches at the very beginning of the string; otherwise it returns None, even if the pattern occurs one or more times later in the string. The alternative that scans the whole string is search().
The match object has several attributes and methods, but we will not work with match() much more here; if you want to know more, check the resources section.
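A short sketch of compile() and match() together. The name greeting and the sample sentences are hypothetical and used only for illustration.

```python
import re

# Hypothetical compiled pattern; the flag makes it case-insensitive.
greeting = re.compile(r"you", flags=re.I)

at_start = greeting.match("You are here")
not_at_start = greeting.match("Are you here")

print(at_start.group())  # You
print(at_start.span())   # (0, 3)
print(not_at_start)      # None
```

Because match() can return None, it is safer to test the result before calling group() or span() on it.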
search() :
We can use this method when we need to scan through the text for a specific pattern.
It returns a match object for the first match, or None if nothing matches.
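Here is a minimal sketch of search(), using an assumed sample sentence. Unlike match(), it finds the pattern even when it is not at the start.

```python
import re

# Assumed sample text for illustration.
text = "Are you here, or are you there?"

found = re.search(r"you", text)
missing = re.search(r"nobody", text)

print(found.group())  # you
print(found.span())   # (4, 7)
print(missing)        # None
```

Only the first occurrence is reported, even though 'you' appears twice in the text.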
findall() :
We can use this method when we need to scan through the text for a specific pattern
and get back a list of all the matched strings.
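A small sketch of findall() on an assumed sample sentence. The second call shows what happens when the pattern contains groups.

```python
import re

# Assumed sample text for illustration.
text = "Are you here, or are You there?"

words = re.findall(r"you", text, flags=re.I)
pairs = re.findall(r"(\w+) (here|there)", text)

print(words)  # ['you', 'You']
print(pairs)  # [('you', 'here'), ('You', 'there')]
```

When the pattern has more than one group, findall() returns a list of tuples, one item per group, instead of a list of whole matches.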
finditer() :
This works like findall(), with an improvement: it returns an iterator, and each item it yields is a match object like the one returned by match().
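The sketch below iterates over the result of finditer() on an assumed sample sentence and reads the text and position of each match.

```python
import re

# Assumed sample text for illustration.
text = "Are you here, or are You there?"

# Each item is a match object, so group() and span() are available.
hits = [(m.group(), m.span()) for m in re.finditer(r"you", text, flags=re.I)]

for word, span in hits:
    print(word, span)
# you (4, 7)
# You (21, 24)
```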
split() :
This is similar to split() in the string class, but more flexible: we can split on a pattern rather than on one specific character or word.
syntax :
re.split(pattern, string, maxsplit=0, flags=0)
or if we use compile method
object = re.compile(pattern, flags)
match = object.split(string)
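As a sketch of that flexibility, the example below splits on any run of commas, semicolons, or whitespace. The sample text and the name splitter are assumptions for illustration.

```python
import re

# Assumed sample text with mixed separators.
text = "one, two;three  four"

all_parts = re.split(r"[,;\s]+", text)
first_two = re.split(r"[,;\s]+", text, maxsplit=2)

# The same split through a compiled pattern object.
splitter = re.compile(r"[,;\s]+")
compiled_parts = splitter.split(text)

print(all_parts)  # ['one', 'two', 'three', 'four']
print(first_two)  # ['one', 'two', 'three  four']
```

With maxsplit=2 only the first two separators are used, and the rest of the string is returned unchanged as the last item.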
sub() :
sub() works like the replace-all feature found in the search of any editor. The count argument limits how many matches are replaced.
syntax :
result = object.sub(repl="Love", string=String, count=2)
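A runnable sketch of the call above follows. The pattern 'like', the name like_pattern, and the content of String are assumed for illustration.

```python
import re

# Assumed sample text and pattern for illustration.
String = "I like tea, you like tea, we like tea."
like_pattern = re.compile(r"like")

# count=2 replaces only the first two matches.
result = like_pattern.sub("Love", String, count=2)
replace_all = like_pattern.sub("Love", String)

print(result)       # I Love tea, you Love tea, we like tea.
print(replace_all)  # I Love tea, you Love tea, we Love tea.
```

Leaving count out (or passing 0) replaces every match, which is the replace-all behaviour.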