5.2. Sets
The set is a useful associative Python data structure for collecting unique
values and checking membership. If you find yourself building a data structure
that will only be used to check whether items are in a group, the set is the
correct data structure for the task. Sets are based on mathematical sets and
come with methods for unions, intersections, etc. Consequently, they are also
useful for comparing the membership of two groups.
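As a minimal sketch of the membership checks and group comparisons described above, the example below builds two small sets and compares them. The names morning_shift and evening_shift and their contents are hypothetical and chosen only for illustration.

```python
# Hypothetical example data: who works which shift.
morning_shift = {"ana", "ben", "chen"}
evening_shift = {"ben", "chen", "dita"}

# Membership check with the `in` operator.
ana_works_morning = "ana" in morning_shift

# Union: everyone who works at least one shift.
either_shift = morning_shift | evening_shift

# Intersection: people who work both shifts.
both_shifts = morning_shift & evening_shift

# Difference: people who work only the morning shift.
morning_only = morning_shift - evening_shift

# Sets are unordered, so sort before printing for a stable display.
print(ana_works_morning)
print(sorted(either_shift))
print(sorted(both_shifts))
print(sorted(morning_only))
```

The operators |, & and - have method equivalents (union, intersection and difference) that also accept any iterable, not only another set.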
5.2.1. Using a set to count unique values
Sets can be useful for determining the number of unique values in a collection. Suppose that we are interested in determining how many unique words there are in the Zen of Python:
In [1]: zen_no_punc = '''
...: The Zen of Python by Tim Peters
...: Beautiful is better than ugly
...: Explicit is better than implicit
...: Simple is better than complex
...: Complex is better than complicated
...: Flat is better than nested
...: Sparse is better than dense
...: Readability counts
...: Special cases arent special enough to break the rules
...: Although practicality beats purity
...: Errors should never pass silently
...: Unless explicitly silenced
...: In the face of ambiguity refuse the temptation to guess
...: There should be one and preferably only one obvious way to do it
...: Although that way may not be obvious at first unless youre Dutch
...: Now is better than never
...: Although never is often better than right now
...: If the implementation is hard to explain its a bad idea
...: If the implementation is easy to explain it may be a good idea
...: Namespaces are one honking great idea lets do more of those'''
...:
The easiest way to construct a set is to call the set conversion function on a
list of values.
In [2]: words_zen = zen_no_punc.lower().split()
In [3]: unique_words_zen = set(words_zen)
In [4]: unique_words_zen
Out[4]:
{'a',
'although',
'ambiguity',
'and',
'are',
'arent',
'at',
'bad',
'be',
'beats',
'beautiful',
'better',
'break',
'by',
'cases',
'complex',
'complicated',
'counts',
'dense',
'do',
'dutch',
'easy',
'enough',
'errors',
'explain',
'explicit',
'explicitly',
'face',
'first',
'flat',
'good',
'great',
'guess',
'hard',
'honking',
'idea',
'if',
'implementation',
'implicit',
'in',
'is',
'it',
'its',
'lets',
'may',
'more',
'namespaces',
'nested',
'never',
'not',
'now',
'obvious',
'of',
'often',
'one',
'only',
'pass',
'peters',
'practicality',
'preferably',
'purity',
'python',
'readability',
'refuse',
'right',
'rules',
'should',
'silenced',
'silently',
'simple',
'sparse',
'special',
'temptation',
'than',
'that',
'the',
'there',
'those',
'tim',
'to',
'ugly',
'unless',
'way',
'youre',
'zen'}
In [5]: len(unique_words_zen)