3.7. Sequence Methods and Working with Strings and Lists

As discussed in the last chapter, all values in Python are objects that come bundled with a number of associated methods. In this section, we will point out some useful methods for working with sequences.

3.7.1. String Methods

We previously saw that each turtle instance has its own attributes and a number of methods that can be applied to the instance. For example, we wrote tess.right(90) when we wanted the turtle object tess to perform the right method to turn to the right 90 degrees. The “dot notation” is the way we connect the name of an object to the name of a method it can perform.

Strings are also objects. Each string instance has its own attributes and methods. The most important attribute of the string is the collection of characters. There are a wide variety of methods. Consider the following program.

In [1]: ss = "Hello, World"

In [2]: ss.upper()
Out[2]: 'HELLO, WORLD'

In [3]: ss.lower()
Out[3]: 'hello, world'

In this example, upper is a method that can be invoked on any string object to create a new string in which all the characters are in uppercase. lower works in a similar fashion, changing all characters in the string to lowercase. (The original string ss remains unchanged; each call returns a new string.)
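
As a quick check of that claim, the following minimal sketch stores the result of upper under a new name and then prints both strings. The name shouted is only illustrative and does not appear elsewhere in this section.

```python
ss = "Hello, World"

# upper() builds and returns a new string; it does not modify ss.
shouted = ss.upper()

print(shouted)  # HELLO, WORLD
print(ss)       # Hello, World (unchanged)
```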

In addition to upper and lower, the following table provides a summary of some other useful string methods. There are a few examples that follow so that you can try them out.

Method      Parameters  Description
upper       none        Returns a string in all uppercase
lower       none        Returns a string in all lowercase
capitalize  none        Returns a string with first character capitalized, the rest lower
strip       none        Returns a string with the leading and trailing whitespace removed
lstrip      none        Returns a string with the leading whitespace removed
rstrip      none        Returns a string with the trailing whitespace removed
count       item        Returns the number of occurrences of item
replace     old, new    Returns a string with all occurrences of the old substring replaced by new
center      width       Returns a string centered in a field of width spaces
ljust       width       Returns a string left justified in a field of width spaces
rjust       width       Returns a string right justified in a field of width spaces
find        item        Returns the leftmost index where the substring item is found, or -1 if it is not found
rfind       item        Returns the rightmost index where the substring item is found, or -1 if it is not found
index       item        Like find except causes a runtime error if item is not found
rindex      item        Like rfind except causes a runtime error if item is not found

You should experiment with these methods so that you understand what they do. Note once again that the methods that return strings do not change the original. You can also consult the Python documentation for strings. Also recall that you can explore all of the methods for an object such as a string using the dir and help functions (any string will do!).

In [4]: dir("a")
Out[4]: 
['__add__',
 '__class__',
 '__contains__',
 '__delattr__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__getnewargs__',
 '__gt__',
 '__hash__',
 '__init__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__new__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__rmod__',
 '__rmul__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__subclasshook__',
 'capitalize',
 'casefold',
 'center',
 'count',
 'encode',
 'endswith',
 'expandtabs',
 'find',
 'format',
 'format_map',
 'index',
 'isalnum',
 'isalpha',
 'isdecimal',
 'isdigit',
 'isidentifier',
 'islower',
 'isnumeric',
 'isprintable',
 'isspace',
 'istitle',
 'isupper',
 'join',
 'ljust',
 'lower',
 'lstrip',
 'maketrans',
 'partition',
 'replace',
 'rfind',
 'rindex',
 'rjust',
 'rpartition',
 'rsplit',
 'rstrip',
 'split',
 'splitlines',
 'startswith',
 'strip',
 'swapcase',
 'title',
 'translate',
 'upper',
 'zfill']
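
Much of this listing consists of special names that begin and end with double underscores. As a rough sketch, a short loop can keep only the ordinary method names. The name public_names is our own, and the exact contents of the result may vary between Python versions.

```python
# Collect the string method names that do not start with an underscore.
public_names = []
for name in dir("a"):
    if not name.startswith("_"):
        public_names.append(name)

print(public_names)
print(len(public_names))
```

Calling help("a".upper) on any one of these names then displays its documentation.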

Here are some additional examples of useful methods associated with strings. First we can use the various forms of strip to remove whitespace from a string. rstrip and lstrip stand for right strip and left strip respectively. replace is used to replace one sequence of characters with another.

In [5]: ss = "    Hello, World    "

In [6]: "***" + ss.strip()  + "***"
Out[6]: '***Hello, World***'

In [7]: "***" + ss.lstrip() + "***"
Out[7]: '***Hello, World    ***'

In [8]: "***" + ss.rstrip() + "***"
Out[8]: '***    Hello, World***'

In [9]: ss.replace("o", "***")
Out[9]: '    Hell***, W***rld    '

Here are some other methods for transforming a string:

In [10]: food = "banana bread"

In [11]: food.capitalize()
Out[11]: 'Banana bread'

In [12]: "*" + food.center(25) + "*"
Out[12]: '*       banana bread      *'

In [13]: "*" + food.ljust(25)  + "*"     # stars added to show bounds
Out[13]: '*banana bread             *'

In [14]: "*" + food.rjust(25)  + "*"
Out[14]: '*             banana bread*'

We can use startswith to ask questions about the beginning of the string.

In [15]: food.startswith("b")
Out[15]: True

In [16]: food.startswith("Banana")
Out[16]: False

In [17]: food.startswith("bana")
Out[17]: True

Finally, here are some methods for finding and counting sub-sequences.

In [18]: food.count("a")
Out[18]: 4

In [19]: food.find("e")
Out[19]: 9

In [20]: food.find("na")
Out[20]: 2

In [21]: food.find("b")
Out[21]: 0

In [22]: food.rfind("e")
Out[22]: 9

In [23]: food.rfind("na")
Out[23]: 4

In [24]: food.rfind("b")
Out[24]: 7

In [25]: food.index("e")
Out[25]: 9
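
The examples above all search for substrings that are present. The sketch below shows how find and index differ when the substring is missing. The search string "z" and the name position are illustrative choices, and the try/except statement is just one possible way to handle the error.

```python
food = "banana bread"

# find reports a missing substring by returning -1.
print(food.find("z"))  # -1

# index raises ValueError instead, so the failure must be handled.
try:
    position = food.index("z")
except ValueError:
    position = None  # our own placeholder meaning "not found"

print(position)  # None
```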

Check your understanding

    rec-5-39: What is printed by the following statements?

    s = "python rocks"
    print(s.count("o") + s.count("p"))
    
  • (A) 0
    There are definitely o and p characters.
  • (B) 2
    There are 2 o characters but what about p?
  • (C) 3
    Yes, add the number of o characters and the number of p characters.

    rec-5-40: What is printed by the following statements?

    s = "python rocks"
    print(s[1] * s.index("n"))
    
  • (A) yyyyy
    Yes, s[1] is y and the index of n is 5, so 5 y characters. It is important to realize that the index method has precedence over the repetition operator. Repetition is done last.
  • (B) 55555
    Close. 5 is not repeated, it is the number of times to repeat.
  • (C) n
    This expression uses the index of n
  • (D) Error, you cannot combine all those things together.
    This is fine, the repetition operator used the result of indexing and the index method.

    rec-5-41: What is printed by the following statements?

    bool = "hello".startswith("He")
    print(bool)
    
  • (A) True
    Remember that "H" and "h" are different characters.
  • (B) False
    "H" and "h" are different characters.

3.7.2. List Methods

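The dot operator can also be used to access built-in methods of list objects. This example shows the count and index methods, followed by the built-in functions sorted and reversed, which are not methods: they take the list as an argument and return a new object, leaving the original list unchanged.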

In [26]: mylist = [5, 27, 3, 12]

In [27]: mylist.count(12)
Out[27]: 1

In [28]: mylist.index(3)
Out[28]: 2

In [29]: list2 = sorted(mylist)

In [30]: list2
Out[30]: [3, 5, 12, 27]

In [31]: mylist is list2
Out[31]: False

In [32]: l3 = reversed(mylist)

In [33]: l3
Out[33]: <list_reverseiterator at 0x11143b358>

In [34]: l4 = list(l3)

In [35]: l4
Out[35]: [12, 3, 27, 5]

In [36]: l4 is mylist
Out[36]: False

Note

Many of Python’s list methods mutate the list in place. While mutating data in place in memory can be efficient, it also makes code hard to read. To understand code that mutates a list (or even a variable), we are forced to track the state of each object throughout the program. Programs that focus on mutation are not only harder to understand, but also harder to distribute over many machines. It is for this second reason that distributed systems such as Hadoop, MapReduce, and Spark use stateless, immutable constructions.
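
To make the distinction concrete, here is a small sketch contrasting the in-place sort method with the sorted function used above. The names alias, ordered, and result are illustrative.

```python
mylist = [5, 27, 3, 12]
alias = mylist            # a second name for the same list object

ordered = sorted(mylist)  # builds a new list; mylist is untouched
print(mylist)             # [5, 27, 3, 12]

result = mylist.sort()    # reorders mylist in place and returns None
print(result)             # None
print(mylist)             # [3, 5, 12, 27]
print(alias)              # [3, 5, 12, 27] (the alias sees the change too)
```

Because sort returns None, writing mylist = mylist.sort() would lose the list, which is a common source of bugs.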

3.7.3. Strings and Lists

Two of the most useful methods on strings involve lists of strings. The split method breaks a string into a list of words. By default, any number of whitespace characters is considered a word boundary.

In [37]: song = "The rain in Spain..."

In [38]: wds = song.split()

In [39]: wds
Out[39]: ['The', 'rain', 'in', 'Spain...']

An optional argument called a delimiter can be used to specify which string to use as the word boundary. The delimiter is matched as a whole string, not as a set of individual characters. The following example uses the string ai as the delimiter:

In [40]: wds = song.split('ai')

In [41]: wds
Out[41]: ['The r', 'n in Sp', 'n...']

Notice that the delimiter doesn’t appear in the result.
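
One subtlety deserves a quick sketch: splitting on the default whitespace is not the same as passing a single space as the delimiter. The string spaced below is a made-up example with two spaces between its words.

```python
spaced = "The  rain"   # two spaces between the words

# Default: a run of whitespace counts as one boundary.
print(spaced.split())     # ['The', 'rain']

# Explicit delimiter: every single space is a boundary,
# so the two adjacent spaces yield an empty string between them.
print(spaced.split(" "))  # ['The', '', 'rain']
```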

The inverse of the split method is join. You choose a desired separator string (often called the glue) and join the list with the glue between each of the elements.

In [42]: wds = ["red", "blue", "green"]

In [43]: glue = ';'

In [44]: s = glue.join(wds)

In [45]: s
Out[45]: 'red;blue;green'

In [46]: wds
Out[46]: ['red', 'blue', 'green']

In [47]: "***".join(wds)
Out[47]: 'red***blue***green'

In [48]: "".join(wds)
Out[48]: 'redbluegreen'

The list that you glue together (wds in this example) is not modified. Also, you can use empty glue or multi-character strings as glue.
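
Because split and join undo each other, they are often used together. The sketch below, with illustrative names and data, first collapses runs of whitespace to single spaces and then shows a round trip through a comma delimiter.

```python
messy = "  The   rain in    Spain  "

# split() discards all the whitespace; join puts back single spaces.
tidy = " ".join(messy.split())
print(tidy)  # The rain in Spain

# Splitting on a delimiter and joining with it restores the original.
csv_line = "red,blue,green"
parts = csv_line.split(",")
print(parts)                        # ['red', 'blue', 'green']
print(",".join(parts) == csv_line)  # True
```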

Check your understanding

    rec-5-42: What is printed by the following statements?

    myname = "Edgar Allan Poe"
    namelist = myname.split()
    init = ""
    for aname in namelist:
        init = init + aname[0]
    print(init)
    
  • (A) Poe
  • Three characters but not the right ones. namelist is the list of names.
  • (B) EdgarAllanPoe
  • Too many characters in this case. There should be a single letter from each name.
  • (C) EAP
  • Yes, split creates a list of the three names. The for loop iterates through the names and creates a string from the first characters.
  • (D) William Shakespeare
  • That does not make any sense.

3.7.4. list Type Conversion Function

Python has a built-in type conversion function called list that tries to turn whatever you give it into a list. For example, try the following:

In [49]: xs = list("Crunchy Frog")

In [50]: xs
Out[50]: ['C', 'r', 'u', 'n', 'c', 'h', 'y', ' ', 'F', 'r', 'o', 'g']

The string "Crunchy Frog" is turned into a list by taking each character in the string and placing it in a list. In general, any sequence can be turned into a list using this function, as can any other object that can be looped over, such as the iterator returned by reversed. The result will be a list containing the elements of the original, in order. Passing an argument that cannot be looped over, such as an integer, causes a runtime error.
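
Here is a brief sketch of list applied to a few other built-in values. The last case shows what happens with a value, here an integer, that cannot be looped over. The name outcome is illustrative, and the try/except statement is only one way to observe the error.

```python
print(list((1, 2, 3)))          # [1, 2, 3]
print(list(range(4)))           # [0, 1, 2, 3]
print(list(reversed([5, 27])))  # [27, 5]

# An integer has no elements to step through, so list() refuses it.
outcome = "ok"
try:
    list(42)
except TypeError:
    outcome = "TypeError"

print(outcome)  # TypeError
```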

Note

Readers familiar with object oriented programming should note that these type conversion functions are actually constructors for the associated classes. The type function that we introduced earlier is actually the constructor for a meta-class, which is a class that constructs other classes.

It is also important to point out that the list conversion function will place each element of the original sequence in the new list. When working with strings, this is very different from the result of the split method. Whereas split will break a string into a list of “words”, list will always break it into a list of characters.
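
A side-by-side sketch of the two on the same string; the names phrase, words, and chars are illustrative.

```python
phrase = "Crunchy Frog"

words = phrase.split()  # breaks at whitespace
chars = list(phrase)    # one element per character, spaces included

print(words)       # ['Crunchy', 'Frog']
print(len(words))  # 2
print(len(chars))  # 12
```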

We give more information about working with strings and lists in the section on Common Comprehension Patterns.
