Chapter 8, Lesson 2 Text

Lesson Two: String Functions

 

Over time, programmers have found themselves doing the same operations on string data over and over again. Python contains a useful set of string functions to help you do these tasks without re-inventing the same code within your program.

You have already worked with one powerful string function - str.format() - which lets you produce nicely formatted output messages using a format string and other input data. In this lesson, we will explore some of the other functions that belong to the string object.

Calling String Functions - 3 Ways

There are at least three ways you can call Python's string functions! You have already seen one approach when we described how to call the string's format() function as follows:

str.format(<format string>,<other parameters>)        # general format
result = str.format("Your change is ${:.2f}",5.25)    # specific example

With this approach, you write "str." at the beginning, followed by the function name. The first parameter inside the parentheses is always a string value on which you want to do some work. With the format() function, that is the format string. Afterwards, you can add any remaining parameters that the function requires.

If your target string is stored in a string variable, then you can instead call the function using the variable name instead of "str" at the beginning. In the example below, we store the format string in the myChange variable, then call format() directly on that variable.

myChange = "Your change is ${:.2f}"     # store format string in variable first
result = myChange.format(5.25)          # call format() using string variable

With this approach, you do not need to pass in the main string value as the first parameter. The function will automatically use the value stored in the variable as the main string value.

One last approach lets you call the string function directly on a hard-coded string value, without storing it in a variable first.

result = "Your change is ${:.2f}".format(5.25)    # call format() using hard-coded string

In summary, you can call any string function using "str." at the beginning, and pass in a string value or variable as the first parameter. Or, you can call any string function using a string variable name or a hard-coded string at the beginning, and just pass in any additional parameters that the function may require.

In our lessons, we will generally use the first approach with "str." placed before the function name. This makes it obvious we are calling a string function. However, all approaches work equally well and you may see them in code written by others.

To convince yourself that each of the three ways to call string functions works as described, study the code below. We have demonstrated each approach using a string function called count(). This function will search the main string and count how many times a smaller string is found within the larger string. So, str.count("Hello","l") will return 2, because the lowercase "L" letter appears twice in that word.

Run the code yourself to see the results. We are counting the number of times the string "s" appears in bigWord, and count() should find it 3 times.

Try It Now

bigWord = "supercalifragilisticexpialidocious"
# call function through "str." and pass in the string value or variable as the first parameter
print("Method 1:",str.count(bigWord,"s"))
# call function through variable name and pass in any remaining parameters
print("Method 2:",bigWord.count("s"))
# call function directly on a hard-coded string value and pass in any remaining parameters
print("Method 3:","supercalifragilisticexpialidocious".count("s"))
 

 

Searching Strings

Programs will often want to closely examine string data to see if the overall value contains shorter or smaller pieces (also called substrings). For example, you might ask the user to enter a name and they respond with a value that includes a prefix like "Mr.", "Mrs.", "Miss" or a suffix like "Jr.", "II". How would you figure out that an input like "Mr. Davey Jones II" contains the substring "Mr." on the front or "II" on the end?

Python contains several string functions that can be used to search a larger string for a smaller substring.

Function                                Description
str.count(<string>,<substring>)         Returns an integer number of times that substring is found in the larger string.
str.startswith(<string>,<substring>)    Returns True if the substring is found at the beginning of the string, or False otherwise.
str.endswith(<string>,<substring>)      Returns True if the substring is found at the end of the string, or False otherwise.
str.find(<string>,<substring>)          Returns the numeric index of the first position of the substring found in the larger string. If substring is not found, -1 is returned.
str.index(<string>,<substring>)         Returns the numeric index of the first position of the substring found in the larger string. If substring is not found, a run-time exception will occur.
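
The question about "Mr. Davey Jones II" can be answered with startswith() and endswith(). Here is a minimal sketch, assuming the name is already stored in a variable. The variable names fullName, hasPrefix and hasSuffix are made up for this illustration.

```python
# Hypothetical example: check a name for a known prefix and suffix
fullName = "Mr. Davey Jones II"

hasPrefix = str.startswith(fullName, "Mr.")   # True, the name begins with "Mr."
hasSuffix = str.endswith(fullName, "II")      # True, the name ends with "II"

print("Starts with Mr.:", hasPrefix)
print("Ends with II:", hasSuffix)
```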

 

Use the example code below to experiment with each of these functions. Can you predict the output in each case?

Try It Now

rhyme = "Bippety Boppety Boo"
print("count(\"B\") =",str.count(rhyme,"B"))
print("find(\"pp\") =",str.find(rhyme,"pp"))
print("find(\"bb\") =",str.find(rhyme,"bb"))
print("startswith(\"Bip\") =",str.startswith(rhyme,"Bip"))
print("endswith(\"Boo\") =",str.endswith(rhyme,"Boo"))
print("index(\"bb\") =",str.index(rhyme,"bb")) # "bb" not present, so will throw exception
 

 

Notice what happens on the last line when calling str.index(). Because the target substring is not found, an exception will be thrown, which completely halts the program. If you are not confident the target substring will be found, use str.find() instead. str.find() will return -1 when the substring is not found, and your program can continue running.

count("B")        = 3
find("pp")        = 2
find("bb")        = -1
startswith("Bip") = True
endswith("Boo")   = True
Traceback (most recent call last):
  File "code1.py", line 8, in <module>
    print("index(\"bb\")       =",str.index(rhyme,"bb"))        # "bb" not present, so will throw exception
ValueError: substring not found
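
If you are not sure the substring is present, one possible pattern is to call find() and test the result against -1 before using it. The sketch below is only an illustration; the position and message variable names are assumptions, not part of the lesson's examples.

```python
rhyme = "Bippety Boppety Boo"

position = str.find(rhyme, "bb")    # find() returns -1 instead of raising an exception
if position == -1:
    message = "\"bb\" was not found"
else:
    message = "\"bb\" was found at index " + str(position)

print(message)   # the program keeps running either way
```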

 

Python supports one additional way to quickly check for the presence of a substring inside a larger string. Instead of calling a string function, you can use the "in" keyword. Use "in" when you just want a quick True or False answer and are not worried about the actual position of the substring within the larger string.

The expression "<substring> in <string>" will return True if the substring is found anywhere inside the string, or False otherwise. Similarly, the expression "<substring> not in <string>" will do the exact opposite, returning True if the substring is not found.

Run the code below to see the "in" and "not in" logical expressions at work!

Try It Now

rhyme = "Bippety Boppety Boo"
print("\"B\" in rhyme =","B" in rhyme)
print("\"B\" not in rhyme =","B" not in rhyme)
if "B" in rhyme:
    print("Yes, for sure, \"B\" is found in \"" + rhyme + "\"")

 

Remember that the characters \" inside a string form an escape sequence that produces a single double-quote (") at that location. You should now be familiar with escape sequences, and we'll be using them frequently to produce nicely formatted output.

Modifying Strings

Sometimes, you will find that a string is not in exactly the right format, and you will want to make some modifications to the string value. As you know, string values are immutable, which means the original string can't be changed. However, Python contains a number of functions that will create a new string based on the original string contents.

Function                                                  Description
str.capitalize(<string>)                                  Returns a new string with the first letter capitalized and the remaining characters in lower case.
str.lower(<string>)                                       Returns a new string with all lowercase letters.
str.upper(<string>)                                       Returns a new string with all uppercase letters.
str.replace(<string>,<old substring>, <new substring>)    Returns a new string where all occurrences of the old substring are replaced by the new substring.
str.split(<string>,<separator>)                           Returns a list of individual words or items in the string. See below for details!
str.strip(<string>)                                       Returns a new string with all white-space characters (spaces, tabs, etc) removed from the front and end of the string.
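
Because strings are immutable, each of these functions leaves the original value alone and hands back a new string. Here is a small sketch of that behavior, using made-up variable names (original and shouted) purely for illustration.

```python
original = "hello world"
shouted = str.upper(original)    # builds a new string; original is untouched

print(original)   # hello world
print(shouted)    # HELLO WORLD
```

To keep the modified text, store the returned string in a variable, as shouted does here.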

 

The capitalize(), lower(), upper() and strip() functions are very straightforward. Use the sample code below to practice converting an input string into different cases and removing white-space from both ends of a string.

Try It Now

original = " i'M aLl MiXeD uP "
print(str.capitalize(original))
print(str.lower(original))
print(str.upper(original))
print(str.strip(original))

 

The replace() function is a great way to swap out all occurrences of one character or substring with another. Try running the code below to see how it works. Can you modify the code to produce the string "Yebbe Debbe Do!"?

Try It Now

original = "Yabba Dabba Do!"
print(str.replace(original,"bb","zz")) # replace all "bb" with "zz"
print(str.replace(original,"!","?")) # replace all "!" with "?"

 

The split() function is a useful tool for data processing. Given some long input string, split() will return a list of the individual words or data fields within the string. By default, white-space characters like spaces are treated as separators, so using split() on a regular sentence will return a list of the words in the sentence. Consider the following code, which splits a quote by Julius Caesar:

quote = "Experience is the teacher of all things. - Julius Caesar"        
print(str.split(quote))
        

split() will return the following list:

['Experience', 'is', 'the', 'teacher', 'of', 'all', 'things.', '-', 'Julius', 'Caesar']

split() has a second, optional parameter that lets you change the separator. You can split() on any character or substring. The example below uses the "-" character to split the quote instead of whitespace.

quote = "Experience is the teacher of all things. - Julius Caesar"        
print(str.split(quote,"-"))
        

Because there is only one "-" character in the quote, split() will return a list with just two substrings.

['Experience is the teacher of all things. ', ' Julius Caesar']

Can you add a third split() statement to the example below? Try printing out the results when you split on the character "e".

Try It Now

quote = "Experience is the teacher of all things. - Julius Caesar"
print(str.split(quote))
print(str.split(quote,"-"))

 

Notice that the separator character (in this case, 'e') is not present in the output list at all. The separator character itself is discarded, and you just get the text between each separator.

['Exp', 'ri', 'nc', ' is th', ' t', 'ach', 'r of all things. - Julius Ca', 'sar']

 

String Slices

In Python, it is easy to pull out a smaller piece of a string. This smaller substring is also called a slice. Selecting a slice is similar to selecting a character with numeric index values and square brackets. But instead of using a single numeric value, you use a range like [0:2]! The starting index is placed to the left of the colon and the ending index to the right. The actual slice will not include the character at the ending index, so that value should be one greater than the last character you want to grab.

If you leave out the first number (before the colon), the slice will start at the beginning of the string. If you leave out the last number (after the colon), the slice will continue until the end of the string. Try running the example below and observe the results. Can you change the first slice to get just the first "hop" substring?

Try It Now

original = "Chop Chop!"
print(original[1:6]) # slice from position 1 up through 5, not including the character at index 6
print(original[:4]) # slice from position 0 up through 3, not including the character at index 4
print(original[5:]) # slice from position 5 through the end of the string
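
Slices pair nicely with the search functions from earlier in this lesson. The sketch below is one possible way to combine them, assuming the input contains at least one space. The fullName, firstSpace, title and rest variable names are hypothetical.

```python
fullName = "Mr. Davey Jones II"
firstSpace = str.find(fullName, " ")   # index of the first space, which is 3 here

title = fullName[:firstSpace]          # everything before the first space
rest = fullName[firstSpace + 1:]       # everything after the first space

print(title)   # Mr.
print(rest)    # Davey Jones II
```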
 

 

Further Reading

Of course, Python has more advanced and powerful string functions that might be useful in your own programs. To read more about them, click on the link below to visit the official Python documentation.

https://docs.python.org/3/library/stdtypes.html#string-methods

 

 

Work with Me: Comma-Separated-Values (CSV)

 

Imagine that you want to store someone's address in a line of text. An address has several parts like the street, city, state and zip code. You might normally write an address like this:

123 Main Street, Anywhere, GA 30004

However, a computer program will likely want to understand each piece as a separate data value. For example, you might want to search many addresses for a certain city, state or zip code. It is easiest for programs to break an address into separate data fields in order to search for particular items. We might convert the address above into a list of 4 values (one for street, city, state and zip) as shown below.

['123 Main Street', 'Anywhere', 'GA', '30004']

How would a program convert text like "123 Main Street, Anywhere, GA 30004" into a list of data fields? Some parts are separated by commas ("Anywhere, GA"), some parts are separated by spaces ("GA 30004"), and in some cases a space doesn't really mark a new field ("123 Main Street").

You might want to use the str.split() function, but that won't work if the separator characters marking each new field are different. Programmers therefore often use a line of text in Comma-Separated-Value (CSV) format to represent multiple data fields in a single string. Instead of using natural spacing as if you were writing by hand, use commas between every field. The example below writes our address string in CSV format. The street, city, state and zip are all separated by commas.

123 Main Street, Anywhere, GA, 30004

Now, we can use the str.split() function with a comma separator to build our list of individual data fields!

address = "123 Main Street, Anywhere, GA, 30004"
fields = str.split(address,",")                   # split using commas
print(fields)

As a result, our fields list now contains 4 separate values:

['123 Main Street', ' Anywhere', ' GA', ' 30004']

Notice that str.split() will carefully break apart the input string based on the commas, but it does not automatically strip out any extra white-space you have added between words. So, splitting " Anywhere, GA" on a comma will produce " Anywhere" and " GA" strings, each with an extra space in front. Fortunately, you can use the str.strip() function on each value to get rid of that extra whitespace!
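
To see what str.strip() does to a single value, here is a minimal sketch. The rawField and cleanField names are made up for this illustration, and the square brackets are printed only to make the leading space visible.

```python
rawField = " Anywhere"                 # a field as str.split() might return it
cleanField = str.strip(rawField)       # remove whitespace from both ends

print("[" + rawField + "]")     # [ Anywhere]
print("[" + cleanField + "]")   # [Anywhere]
```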

In this exercise, you are going to write code that turns an input address line of text into a list of data values. The user will type in a CSV-formatted address and your program will print the list of fields. You will want to remove white-space from around each data field. Complete the steps below to finish the program.

  1. Get a CSV-formatted line from the user and store in the address variable (this is already done for you).
  2. Create a list called fields by splitting the address string on the comma "," character.
  3. Set up a "for" loop with an index variable named "i" that will iterate from 0 up through the last element in the fields list. Remember, the len() function will tell you how many elements are in a list. Check the "Hint" if you need help!
  4. Inside the "for" loop, you will:
    1. Read the fields element at position "i" and use str.strip() to remove any whitespace from the ends. Store the resulting value in a variable called field.
    2. Store the field variable back into the fields list at position "i".
  5. After the loop is done, print the final fields list to the screen (this is already done for you).
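
The index-based loop pattern in steps 3 and 4 can be sketched on an unrelated list. In this illustration, the hypothetical colors list stands in for fields and str.upper() stands in for str.strip(), so you still get to write the real solution yourself.

```python
colors = ["red", "green", "blue"]    # hypothetical list, standing in for fields

for i in range(len(colors)):         # i takes the values 0, 1, 2
    color = colors[i]                # read the element at position i
    color = str.upper(color)         # build a modified copy of the string
    colors[i] = color                # store the new value back at position i

print(colors)   # ['RED', 'GREEN', 'BLUE']
```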

 

Try It Now

address = input("Please enter an address in CSV format: ")
# split address using comma
# for loop iterating over each field index
# read the list element at index "i", strip whitespace from it, and store in field variable
# store the field value back in the list at index "i"
# Print the final, clean list of fields
print(fields)
 
  


 

When you are done, try running the program a few times with different inputs to verify the results. Here are a couple of example runs.

Please enter an address in CSV format: 456 Peachtree Blvd, Atlanta, GA 30041
['456 Peachtree Blvd', 'Atlanta', 'GA 30041']
Please enter an address in CSV format:    412 Oak Lane  ,  Houston   ,  TX 77004 
['412 Oak Lane', 'Houston', 'TX 77004']

Your program should not actually care how many CSV fields are present in the input, so you can try entering addresses that have more or fewer commas and observe the results. You should also see that any whitespace you add before or after a comma is removed from the fields in the final list.


Last modified: Sunday, 18 August 2019, 9:46 PM