4.2. Working with Strings and Lists
Data wrangling involves transforming data from a raw form into another, more
useful form. Often this raw form is text saved in a text file, which
corresponds to the Python string data structure. In this section, we will
illustrate the process of converting and transforming textual data using list
comprehensions and the split, join, and format methods.
4.2.1. Character classification
It is often helpful to examine a character and test whether it is upper- or
lowercase, or whether it is a letter or a digit. The string module provides
several constants that are useful for these purposes. One of these,
string.digits, is equivalent to “0123456789”. It can be used to check whether a
character is a digit using the in operator.
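As a minimal sketch of the digit check described above, the snippet below wraps the in test in a small helper. The name is_digit is hypothetical and used only for illustration. It is not part of the string module.

```python
import string

def is_digit(ch):
    """Return True if ch is exactly one character found in string.digits.

    is_digit is a hypothetical helper written for this example.
    """
    # The in operator on strings tests for substrings, so the length
    # check keeps inputs such as "" or "12" from passing by accident.
    return len(ch) == 1 and ch in string.digits

print(is_digit("7"))   # True
print(is_digit("x"))   # False
print(is_digit("12"))  # False
```

The length check matters because both "" in string.digits and "12" in string.digits evaluate to True, even though neither is a single digit.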
The string string.ascii_lowercase contains the lowercase ASCII letters, a
through z. Similarly, string.ascii_uppercase contains the uppercase ASCII
letters. string.punctuation comprises the ASCII characters considered to be
punctuation. Try the following and see what you get.
In [1]: import string
In [2]: string.ascii_lowercase
Out[2]: 'abcdefghijklmnopqrstuvwxyz'
In [3]: string.ascii_uppercase