## Lesson One: Character Data

As you know, in Python a string data type can hold a character, word, phrase or longer stretches of text. You have learned how to gather string input from the user and print formatted string messages to the screen. In this chapter, we'll take a closer look at how strings are formed and explore some of the Python functions that allow you to search, extract and transform string content. Let's begin by closely examining the characters that make up a string value.

### Strings as Lists of Characters

A Python string is made of a series of individual characters, stored back-to-back in a certain order. You can visualize a string just like a Python list, where each element in the list is simply one character like "G". Just like list elements, string characters can be identified by an integer index that starts at 0 and goes up by 1 for each additional character in the string. For example, the string "Giraffe" has 7 characters, so the matching index values are 0 through 6.
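To make the list comparison concrete, the short sketch below turns "Giraffe" into a real Python list with the built-in list() function. It then checks that the string and the list return the same character at each index. The variable names `animal` and `letters` are just for illustration.

```python
animal = "Giraffe"

# list() splits a string into a list with one character per element
letters = list(animal)
print(letters)        # prints ['G', 'i', 'r', 'a', 'f', 'f', 'e']

# The string and the list agree at every index from 0 through 6
for index in range(len(animal)):
    print(index, animal[index], letters[index])
```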

Remember, you can read individual list elements using the list variable name, square brackets and the numeric index of the element. The same type of access can be used to get individual letters of a string! In the example below, we initialize a string to hold the value "Giraffe". We then use the variable name, square brackets and index values to read some of the characters from the string and print them to the screen. Can you predict which three characters will be displayed?

Try It Now

```python
myString = "Giraffe"
letter0 = myString[0]
letter3 = myString[3]
letter6 = myString[6]
print(letter0)
print(letter3)
print(letter6)
```

Of course, you need to be careful not to use an index value that is greater than or equal to the string length. Try editing the code above to use the numeric index 7 inside the square brackets. When you run that code, Python will raise an IndexError exception at run time.

```
Traceback (most recent call last):
  File "code1.py", line 2, in <module>
    letter0 = myString[7]
IndexError: string index out of range
```
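The sketch below shows where the valid range of indexes ends. It first reads index 6, the last valid position in "Giraffe". It then deliberately reads index 7 inside a try/except block, so you can see that Python raises IndexError instead of returning a character. The try/except handling is included only for illustration; this lesson does not require it.

```python
myString = "Giraffe"

# Index 6 is the last valid position, because "Giraffe" has 7 characters
print(myString[6])          # prints e

# Reading index 7 raises IndexError; catching it lets the program
# report the problem instead of crashing
try:
    print(myString[7])
except IndexError as error:
    print("IndexError:", error)   # prints IndexError: string index out of range
```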

### Strings are Immutable

In Python, once you create a string value, it is immutable, which means the object can't be changed. So, while you can update individual list elements by index, don't try to do the same thing to Python strings! The code below will raise a TypeError at run time when you try to set the character at a specific index to some other character.

```python
myString = "Giraffe"
myString[0] = "X"       # try to create "Xiraffe" - will not work!
```

This does NOT mean that you can never change a string variable, of course. You can always assign a completely different string to a variable. In that case, you are replacing one string value with another, which is perfectly fine.

```python
myString = "Giraffe"
myString = "Xiraffe"    # replace one string value with another - will always work
```
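The sketch below puts both cases side by side. The failed index assignment is wrapped in try/except so the program keeps running and prints the name of the exception. Afterwards, the variable is simply rebound to a new string. The try/except is only there for the demonstration.

```python
myString = "Giraffe"

# Assigning to an index fails because str objects are immutable
try:
    myString[0] = "X"
except TypeError as error:
    print(type(error).__name__, "-", error)

# Rebinding the variable to a completely new string value is fine
myString = "Xiraffe"
print(myString)       # prints Xiraffe
```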

### String Length with len()

You have already used the len() function to tell you how many elements are in a list or tuple. This function works the same way on strings, and will tell you how many characters are in the string. Try experimenting with some different values for myString in the example below; do you get the expected length printed each time?

Try It Now

```python
myString = "Rhino"
length = len(myString)
print(str.format("{} has {} characters", myString, length))
```
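If you want some values to try, the sketch below shows a few. Every character counts toward the length, including spaces and punctuation, and an empty string has a length of 0.

```python
# Every character counts toward the length, including spaces and punctuation
print(len("Rhino"))          # prints 5
print(len("Black Rhino"))    # prints 11
print(len("Rhino!"))         # prints 6

# An empty string has no characters at all
print(len(""))               # prints 0
```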

### Escape Sequences

It's easy to create strings with simple alphanumeric characters like "a"-"z" or "0"-"9": simply type the characters you want between double quotes. However, what happens if the string you want to build actually contains a double quote? Consider the phrase below from Edgar Allan Poe's poem "The Raven":

Quoth the Raven, "Nevermore."

Let's try to print this to the screen as a Python string. We might first try to write code as follows:

```python
print("Quoth the Raven, "Nevermore."")    # SyntaxError - will not work
```

However, this won't work! Python uses double quotes to mark the beginning and ending of strings. So, Python would recognize the characters between the first two double-quotes as a string, followed by some unrecognized and invalid characters, and finally find another (empty) string at the end.

"Quoth the Raven, "Nevermore.""

To create a string that includes special characters like double quotes, you need to use an escape sequence inside the double-quoted string. An escape sequence is a combination of characters that together represent a single character that might otherwise be hard to type into a Python string.

Most escape sequences start with a backslash (\). To place a double-quote inside a string, you would write a backslash followed by the double quote (\"). Python will see this sequence as a single double-quote character. Let's fix our example so the quote will be printed correctly to the screen. Try it and verify the results.

Try It Now

```python
print("Quoth the Raven, \"Nevermore.\"") # use double-quote escape sequences
```
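You type two characters for each `\"`, but Python stores only one. The sketch below checks this by measuring the length of the escaped string and reading the character at the position of the first escaped quote. The variable name `quote` is just for illustration.

```python
quote = "Quoth the Raven, \"Nevermore.\""
print(quote)          # prints Quoth the Raven, "Nevermore."

# Each \" escape sequence is stored as ONE character
print(len(quote))     # prints 29
print(quote[17])      # prints "
```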

Python defines escape sequences for a number of characters that can't easily be typed into a string in your code. Tabs, the "Enter" key, and similar special characters all have escape sequences. For example, the sequence "\n" is a string of length 1 that holds the "new line" character, which moves console output down to the next line. The print() function automatically adds a "\n" to the end of your output message (unless you tell it otherwise), which is why each of your print() messages normally appears on its own line!
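The sketch below checks both claims. First, "\n" really is a single character. Second, print() stops adding its own "\n" when you pass a different `end` value. Here the first message ends with an empty string, so the second message continues on the same line.

```python
newline = "\n"
print(len(newline))   # prints 1, even though we typed two characters

# By default print() ends with "\n"; the end parameter changes that
print("Giraffe", end="")
print(" and Rhino")   # continues on the same line: Giraffe and Rhino
```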

The table below describes some common Python escape sequences.

| Escape Sequence | Description |
|---|---|
| `\n` | New line |
| `\r` | Carriage return |
| `\t` | Horizontal tab |
| `\'` | Single quote (`'`) |
| `\"` | Double quote (`"`) |
| `\\` | Backslash (`\`) |
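As a quick check on the table, the sketch below loops over each escape sequence. For each one it prints the escaped form (using the built-in repr()), the length, and the character's numeric code (using ord()). Each sequence should report a length of 1.

```python
# Each escape sequence in the table produces exactly one character.
# repr() shows the escaped form; ord() shows the character's numeric code.
escapes = ["\n", "\r", "\t", "\'", "\"", "\\"]
for ch in escapes:
    print(repr(ch), len(ch), ord(ch))
```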

Notice that the backslash character itself (\) has an escape sequence! Why is this needed? Python normally assumes that a backslash inside a string is the beginning of an escape sequence, so to add an ordinary backslash you need to use its escape sequence (\\). The example below shows the right and wrong ways to create the string "\I like backslashes\".

```python
print("\I like backslashes\")    # Will not work - \" escapes the closing quote, so the string never ends (SyntaxError)
print("\\I like backslashes\\")  # Correct approach with escaped backslashes
```
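To confirm that each `\\` pair becomes a single backslash, the sketch below stores the correct string in a variable and then inspects its length and first character. The variable name `text` is just for illustration.

```python
text = "\\I like backslashes\\"
print(text)           # prints \I like backslashes\

# Each \\ pair is stored as a single backslash character
print(len(text))      # prints 20
print(text[0])        # prints \
```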

Take some time to experiment with escape sequences in the example code below. You can run the code first to see sample output from each sequence, then change the strings on your own.

Try It Now

```python
print("New\nline")
print("Carriage\rreturn")
print("Tab:\t#")
print("I can\'t believe it!")
print("I said, \"DUCK\"")
print("My Documents\\My Projects\\Hello World\\")
```

When you run the code above, the new line character (\n) and the carriage return (\r) may seem to have the same effect. This depends on the console, however. Many standard terminals treat a \r on its own as "move back to the start of the current line" without advancing, so the text after it overwrites the text before it. Historically, different operating systems have used either a new line (\n) alone or a carriage return followed by a new line (\r\n) to mark the end of lines in a text file. If you are writing data to a file that will be used on a specific operating system, make sure to match the End-of-Line (EOL) marker that operating system expects.
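One way to control the EOL marker in Python is the `newline` parameter of the built-in open() function. The sketch below assumes you want Windows-style "\r\n" endings. With `newline="\r\n"`, every "\n" you write is translated into "\r\n". The file name `eol_demo.txt` is hypothetical, and the file is placed in the system's temporary folder. By default (`newline=None`), Python instead translates "\n" into the current operating system's separator, `os.linesep`.

```python
import os
import tempfile

lines = ["line one\n", "line two\n"]

# Assumption for this sketch: the file should use "\r\n" line endings.
# With newline="\r\n", Python translates each "\n" we write into "\r\n".
path = os.path.join(tempfile.gettempdir(), "eol_demo.txt")  # hypothetical file name
with open(path, "w", newline="\r\n") as f:
    f.writelines(lines)

# Read the raw bytes back to see exactly which EOL markers were written
with open(path, "rb") as f:
    data = f.read()
print(data)           # prints b'line one\r\nline two\r\n'

os.remove(path)       # clean up the demo file
```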

### Work with Me: First and Last Characters

In this exercise, you are going to write a simple program that asks the user for their name. You'll then find the length of that name and display the first and last characters with print() statements. Use tab escape characters in both your input question and the two resulting lines of output so the data on each line is nicely lined up.

Here is an example program run. Notice that the user's input and the first and last characters all start in the same column. You can get that effect by adding one or more tab characters (\t) in the input and output strings.

```
What is your name?      Chris
The first character is: C
The last character is:  s
```

Complete the steps below to finish your program. For your input() and print() statements, match the messages displayed in the example above. Add one or more tab characters (\t) as needed to make the name and first/last characters line up vertically.

1. Use input() to get the user's name and store it in a variable called name.
2. Get the length of the name string and store it in a variable called length.
3. Get the first character from name using an index value and square brackets [] and print it to the screen.
4. Get the last character from name using the length and square brackets [] and print it to the screen. Remember the last valid index into a string is one less than the length of the string!

Try It Now

```python
# get an input string from the user
# get the length of the string
# print the first and last characters
```


Hint: Remember that print("string", value) automatically adds a space between "string" and value. However, print("string" + value) does not add any spaces! To join two values with the plus sign (+), both values must be strings (in Python, a single character is simply a string of length 1). To get your data to line up, use the plus sign (+) to avoid extra spaces in the print() output, or add an extra space yourself after the tabs in the input() message.
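To see the difference the hint describes, the sketch below prints the same message twice, once with a comma and once with the plus sign. It also shows the exact joined string with repr(). The value "C" is a hypothetical stand-in for the first character of a name. The last two lines show that numbers must be converted with str() before they can be joined with +.

```python
first = "C"   # hypothetical value standing in for the first character of a name

# Commas: print() puts a space between the two values
print("The first character is:\t", first)

# Plus sign: the strings are joined with nothing in between
print("The first character is:\t" + first)

# Building the joined string separately shows exactly what print() receives
message = "The first character is:\t" + first
print(repr(message))  # prints 'The first character is:\tC'

# Numbers must be converted with str() before joining with +
length = 5
print("The length is: " + str(length))
```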

Here is another example program run.

```
What is your name?      Alberto
The first character is: A
The last character is:  o
```