4.2. Working with Strings and Lists
Data wrangling involves transforming data from some raw form to another more
useful form. Often this raw form is text saved in a text file, which
corresponds to the Python string data structure. In this section, we will
illustrate the process of converting and transforming textual data using list
comprehensions and the split, join, and format methods.
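As a small preview of how these four tools fit together, here is a minimal sketch. The sample string raw and the variable names are made up for illustration; they do not come from any particular data file.

```python
# Hypothetical raw line, as it might appear in a text file
raw = "3,14,15,92"

# split breaks the string into a list of substrings at each comma
fields = raw.split(",")

# a list comprehension converts each substring to an int
numbers = [int(field) for field in fields]

# join glues the substrings back together with a new separator
joined = " + ".join(fields)

# format fills the placeholders in a template string
summary = "{} = {}".format(joined, sum(numbers))
print(summary)  # 3 + 14 + 15 + 92 = 124
```

Note that split returns strings, so the conversion to int has to be done explicitly before the values can be added.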
4.2.1. Character classification
It is often helpful to examine a character and test whether it is upper- or
lowercase, or whether it is a letter or a digit. The string module
provides several constants that are useful for these purposes. One of these,
string.digits, is equivalent to '0123456789'. It can be used to check whether
a single character is a digit using the in operator.
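The digit check can be sketched as follows. The helper name is_digit and the sample text are hypothetical, chosen only for this illustration. The length test is there because in performs a substring test on strings, so an empty string or a run such as "12" would otherwise also be reported as being in string.digits.

```python
import string

def is_digit(ch):
    """Hypothetical helper: True if ch is exactly one ASCII digit."""
    # `in` on strings is a substring test, so guard against "" and "12"
    return len(ch) == 1 and ch in string.digits

print(is_digit("7"))           # True
print(is_digit("x"))           # False
print("12" in string.digits)   # True, because "12" is a substring
print(is_digit("12"))          # False, thanks to the length check

# The same test inside a list comprehension keeps only the digits
sample = "Room 101"  # made-up sample text
digits_only = [ch for ch in sample if ch in string.digits]
print(digits_only)             # ['1', '0', '1']
```

Iterating over a string yields one character at a time, so the plain in test is safe inside the comprehension without the extra length check.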
The constant string.ascii_lowercase contains the lowercase ASCII letters, a
through z. Similarly, string.ascii_uppercase contains the uppercase ASCII
letters. Neither value depends on the locale. string.punctuation comprises
the ASCII characters considered to be punctuation. Try the following and see
what you get.
In [1]: import string
In [2]: string.ascii_lowercase
Out[2]: 'abcdefghijklmnopqrstuvwxyz'
In [3]: string.ascii_uppercase