In [1]:
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd

plt.style.use('fivethirtyeight')
sns.set_context("notebook")

Reading in DataFrames from Files

Pandas has a number of very useful file reading tools. You can see them enumerated by typing "pd.re" and pressing tab. We'll be using read_csv today.

In [2]:
elections = pd.read_csv("elections.csv")
elections # if we end a cell with an expression or variable name, the result will print
Out[2]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
5 Bush Republican 53.4 1988 win
6 Dukakis Democratic 45.6 1988 loss
7 Clinton Democratic 43.0 1992 win
8 Bush Republican 37.4 1992 loss
9 Perot Independent 18.9 1992 loss
10 Clinton Democratic 49.2 1996 win
11 Dole Republican 40.7 1996 loss
12 Perot Independent 8.4 1996 loss
13 Gore Democratic 48.4 2000 loss
14 Bush Republican 47.9 2000 win
15 Kerry Democratic 48.3 2004 loss
16 Bush Republican 50.7 2004 win
17 Obama Democratic 52.9 2008 win
18 McCain Republican 45.7 2008 loss
19 Obama Democratic 51.1 2012 win
20 Romney Republican 47.2 2012 loss
21 Clinton Democratic 48.2 2016 loss
22 Trump Republican 46.1 2016 win

We can use the head command to show only a few rows of a dataframe.

In [3]:
elections.head(7)
Out[3]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
5 Bush Republican 53.4 1988 win
6 Dukakis Democratic 45.6 1988 loss

There is also a tail command.

In [4]:
elections.tail(7)
Out[4]:
Candidate Party % Year Result
16 Bush Republican 50.7 2004 win
17 Obama Democratic 52.9 2008 win
18 McCain Republican 45.7 2008 loss
19 Obama Democratic 51.1 2012 win
20 Romney Republican 47.2 2012 loss
21 Clinton Democratic 48.2 2016 loss
22 Trump Republican 46.1 2016 win

The read_csv command lets us specify a column to use as an index. For example, we could have used Year as the index.

In [5]:
elections_year_index = pd.read_csv("elections.csv", index_col = "Year")
elections_year_index.head(5)
Out[5]:
Candidate Party % Result
Year
1980 Reagan Republican 50.7 win
1980 Carter Democratic 41.0 loss
1980 Anderson Independent 6.6 loss
1984 Reagan Republican 58.8 win
1984 Mondale Democratic 37.6 loss

Alternatively, we could have used the set_index command.

In [6]:
elections_party_index = elections.set_index("Party")
elections_party_index.head(5)
Out[6]:
Candidate % Year Result
Party
Republican Reagan 50.7 1980 win
Democratic Carter 41.0 1980 loss
Independent Anderson 6.6 1980 loss
Republican Reagan 58.8 1984 win
Democratic Mondale 37.6 1984 loss

The set_index command (like most other DataFrame methods) does not modify the dataframe it is called on; it returns a new one. That is, the original "elections" is untouched. Note: There is a flag called "inplace" which, when set to True, does modify the calling dataframe.

In [7]:
elections.head() #the index remains unchanged
Out[7]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
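
The same point can be checked on a small hand-built table. This is only a sketch: the frame `df` below is made up for illustration and stands in for the elections file.

```python
import pandas as pd

# Hypothetical three-row table standing in for elections.csv.
df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Anderson"],
    "Party": ["Republican", "Democratic", "Independent"],
    "Year": [1980, 1980, 1980],
})

by_party = df.set_index("Party")  # returns a new DataFrame

print(list(df.columns))     # "Party" is still an ordinary column of df
print(by_party.index.name)  # the new frame is indexed by Party
```

To keep the re-indexed table, assign the result to a name, as done with elections_party_index above.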

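Index labels need not be unique: the Year and Party indexes above both contain repeated values. By contrast, column names should be unique. For example, if we try to read in a file whose column names are not unique, Pandas will automatically rename any duplicates.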

In [8]:
dups = pd.read_csv("duplicate_columns.csv")
dups
Out[8]:
name name.1 name.1.1 flavor
0 john smith x vanilla
1 zhang shan x chocolate
2 fulan alfulani strawberry NaN
3 hong gildong x banana
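
A minimal sketch of the renaming, assuming a made-up CSV string in place of duplicate_columns.csv. The exact suffixes pandas appends can differ between versions, so the example only checks that the resulting column names are unique.

```python
import io
import pandas as pd

# Hypothetical CSV text with a repeated header name.
csv_text = "name,name,flavor\njohn,smith,vanilla\nzhang,shan,chocolate\n"

dups_demo = pd.read_csv(io.StringIO(csv_text))

print(list(dups_demo.columns))      # the second "name" gets a suffix
print(dups_demo.columns.is_unique)  # True once duplicates are renamed
```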

The [] Operator

The DataFrame class has an indexing operator [] that lets you do a variety of different things. If you provide a string to the [] operator, you get back a Series corresponding to the requested label.

In [9]:
elections["Candidate"].head(6)
Out[9]:
0      Reagan
1      Carter
2    Anderson
3      Reagan
4     Mondale
5        Bush
Name: Candidate, dtype: object

The [] operator also accepts a list of strings. In this case, you get back a DataFrame corresponding to the requested strings.

In [10]:
elections[["Candidate", "Party"]].head(6)
Out[10]:
Candidate Party
0 Reagan Republican
1 Carter Democratic
2 Anderson Independent
3 Reagan Republican
4 Mondale Democratic
5 Bush Republican

A list of one label also returns a DataFrame. This can be handy if you want your results as a DataFrame, not a Series.

In [11]:
elections[["Candidate"]].head(6)
Out[11]:
Candidate
0 Reagan
1 Carter
2 Anderson
3 Reagan
4 Mondale
5 Bush

Note that we can also use the to_frame method to turn a Series into a DataFrame.

In [12]:
elections["Candidate"].to_frame()
Out[12]:
Candidate
0 Reagan
1 Carter
2 Anderson
3 Reagan
4 Mondale
5 Bush
6 Dukakis
7 Clinton
8 Bush
9 Perot
10 Clinton
11 Dole
12 Perot
13 Gore
14 Bush
15 Kerry
16 Bush
17 Obama
18 McCain
19 Obama
20 Romney
21 Clinton
22 Trump

The [] operator also accepts numerical slices as arguments. In this case, we are indexing by row, not column!

In [13]:
elections[0:3]
Out[13]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss

If you provide a single argument to the [] operator, it tries to use it as a name. This is true even if the argument passed to [] is an integer.

In [14]:
#elections[0] #this does not work, try uncommenting this to see it fail in action, woo

The following cells allow you to test your understanding.

In [15]:
weird = pd.DataFrame({1:["topdog","botdog"], "1":["topcat","botcat"]})
weird
Out[15]:
1 1
0 topdog topcat
1 botdog botcat
In [16]:
weird[1] #try to predict the output
Out[16]:
0    topdog
1    botdog
Name: 1, dtype: object
In [17]:
weird[["1"]] #try to predict the output
Out[17]:
1
0 topcat
1 botcat
In [18]:
weird[1:] #try to predict the output
Out[18]:
1 1
1 botdog botcat
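
The three behaviors of [] seen so far can be summarized in one short sketch. The table `df` is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Anderson"],
    "Party": ["Republican", "Democratic", "Independent"],
})

one_label = df["Candidate"]     # a single label -> Series (one column)
label_list = df[["Candidate"]]  # a list of labels -> DataFrame
row_slice = df[0:2]             # a slice -> rows, end-exclusive

print(type(one_label).__name__)
print(type(label_list).__name__)
print(len(row_slice))
```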

Boolean Array Selection

The [] operator also supports an array of booleans as an input. In this case, the array must be exactly as long as the number of rows. The result is a filtered version of the data frame, where only rows corresponding to True appear.

In [19]:
elections[[False, False, False, False, False, 
          False, False, True, False, False,
          True, False, False, False, True,
          False, False, False, False, False,
          False, False, True]]
Out[19]:
Candidate Party % Year Result
7 Clinton Democratic 43.0 1992 win
10 Clinton Democratic 49.2 1996 win
14 Bush Republican 47.9 2000 win
22 Trump Republican 46.1 2016 win

One very common task in Data Science is filtering. Boolean Array Selection is one way to achieve this in Pandas. We start by observing that comparison operators like the equality operator can be applied to a Pandas Series to generate a boolean array. For example, we can compare the 'Result' column to the string 'win':

In [20]:
elections.head(5)
Out[20]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
In [21]:
iswin = elections['Result'] == 'win'
iswin.head(5)
Out[21]:
0     True
1    False
2    False
3     True
4    False
Name: Result, dtype: bool
In [22]:
elections[iswin]
Out[22]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
3 Reagan Republican 58.8 1984 win
5 Bush Republican 53.4 1988 win
7 Clinton Democratic 43.0 1992 win
10 Clinton Democratic 49.2 1996 win
14 Bush Republican 47.9 2000 win
16 Bush Republican 50.7 2004 win
17 Obama Democratic 52.9 2008 win
19 Obama Democratic 51.1 2012 win
22 Trump Republican 46.1 2016 win

The output of the logical operator applied to the Series is another Series with the same name and index, but of datatype boolean. The entry with index i represents the result of the application of that operator to the entry of the original Series with index i.

In [23]:
elections[elections['Party'] == 'Independent']
Out[23]:
Candidate Party % Year Result
2 Anderson Independent 6.6 1980 loss
9 Perot Independent 18.9 1992 loss
12 Perot Independent 8.4 1996 loss
In [24]:
elections['Result'].head(5)
Out[24]:
0     win
1    loss
2    loss
3     win
4    loss
Name: Result, dtype: object

These boolean Series can be used as an argument to the [] operator, or equivalently to loc as in the cell below. For example, the following code creates a DataFrame of all election winners since 1980.

In [25]:
elections.loc[iswin]
Out[25]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
3 Reagan Republican 58.8 1984 win
5 Bush Republican 53.4 1988 win
7 Clinton Democratic 43.0 1992 win
10 Clinton Democratic 49.2 1996 win
14 Bush Republican 47.9 2000 win
16 Bush Republican 50.7 2004 win
17 Obama Democratic 52.9 2008 win
19 Obama Democratic 51.1 2012 win
22 Trump Republican 46.1 2016 win

Above, we've assigned the result of the logical operator to a new variable called iswin. This is uncommon. Usually, the series is created and used on the same line. Such code is a little tricky to read at first, but you'll get used to it quickly.

In [26]:
elections[elections['Result'] == 'win']
Out[26]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
3 Reagan Republican 58.8 1984 win
5 Bush Republican 53.4 1988 win
7 Clinton Democratic 43.0 1992 win
10 Clinton Democratic 49.2 1996 win
14 Bush Republican 47.9 2000 win
16 Bush Republican 50.7 2004 win
17 Obama Democratic 52.9 2008 win
19 Obama Democratic 51.1 2012 win
22 Trump Republican 46.1 2016 win

We can filter on multiple criteria by creating multiple boolean Series and combining them using the & operator.

In [27]:
elections[(elections['Result'] == 'win')
          & (elections['%'] < 50)]

# Series overload & (via __and__); the Python keyword "and" cannot be overloaded, so it does not work element-wise here.
Out[27]:
Candidate Party % Year Result
7 Clinton Democratic 43.0 1992 win
10 Clinton Democratic 49.2 1996 win
14 Bush Republican 47.9 2000 win
22 Trump Republican 46.1 2016 win
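
Boolean Series can also be combined with | (or) and ~ (not). The sketch below uses a made-up four-row table. The parentheses in the cell above matter because & binds more tightly than comparison operators like == and <. Naming the masks first, as done here, avoids the issue.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Clinton", "Bush"],
    "%": [50.7, 41.0, 43.0, 37.4],
    "Result": ["win", "loss", "win", "loss"],
})

won = df["Result"] == "win"
under_50 = df["%"] < 50

both = df[won & under_50]    # rows where both conditions hold
either = df[won | under_50]  # rows where at least one condition holds
losers = df[~won]            # rows where the first condition is False

print(list(both["Candidate"]))
print(list(either["Candidate"]))
print(list(losers["Candidate"]))
```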

loc and iloc

In [28]:
elections.head(5)
Out[28]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
In [29]:
elections.loc[[0, 1, 2, 3, 4], ['Candidate','Party', 'Year']]
Out[29]:
Candidate Party Year
0 Reagan Republican 1980
1 Carter Democratic 1980
2 Anderson Independent 1980
3 Reagan Republican 1984
4 Mondale Democratic 1984

Loc also supports slicing (for all types, including numeric and string labels!). Note that the slicing for loc is inclusive, even for numeric slices.

In [30]:
elections.loc[0:4, 'Candidate':'Year']
Out[30]:
Candidate Party % Year
0 Reagan Republican 50.7 1980
1 Carter Democratic 41.0 1980
2 Anderson Independent 6.6 1980
3 Reagan Republican 58.8 1984
4 Mondale Democratic 37.6 1984
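
To see the inclusive endpoint side by side with the [] operator's row slicing, here is a small sketch on a made-up single-column table with the default integer index.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Anderson", "Reagan", "Mondale"],
})

with_loc = df.loc[0:3]   # labels 0 through 3, endpoint included
with_brackets = df[0:3]  # positions 0, 1, 2, endpoint excluded

print(len(with_loc))
print(len(with_brackets))
```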

If we provide only a single label for the column argument, we get back a Series.

In [31]:
elections.loc[0:4, 'Candidate']
Out[31]:
0      Reagan
1      Carter
2    Anderson
3      Reagan
4     Mondale
Name: Candidate, dtype: object

If we want a data frame instead and don't want to use to_frame, we can provide a list containing the column name.

In [32]:
elections.loc[0:4, ['Candidate']]
Out[32]:
Candidate
0 Reagan
1 Carter
2 Anderson
3 Reagan
4 Mondale

If we give only one row but many column labels, we'll get back a Series corresponding to a row of the table. This new Series has a neat index, where each entry is the name of the column that the data came from.

In [33]:
elections.loc[0, 'Candidate':'Year']
Out[33]:
Candidate        Reagan
Party        Republican
%                  50.7
Year               1980
Name: 0, dtype: object
In [34]:
elections.loc[[0], 'Candidate':'Year']
Out[34]:
Candidate Party % Year
0 Reagan Republican 50.7 1980

If we omit the column argument altogether, the default behavior is to retrieve all columns.

In [35]:
elections.loc[[2, 4, 5]]
Out[35]:
Candidate Party % Year Result
2 Anderson Independent 6.6 1980 loss
4 Mondale Democratic 37.6 1984 loss
5 Bush Republican 53.4 1988 win

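loc also supports boolean array inputs instead of labels. In the pandas version used for this notebook, if the arrays are too short, loc assumes the missing values are False. Newer pandas releases instead raise an error when a boolean array's length does not match the axis, so it is safest to supply one boolean per row or column.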

In [36]:
elections.loc[[True, False, False, True], [True, False, False, True]]
Out[36]:
Candidate Year
0 Reagan 1980
3 Reagan 1984
In [37]:
elections.loc[[0, 3], ['Candidate', 'Year']]
Out[37]:
Candidate Year
0 Reagan 1980
3 Reagan 1984
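
A mask with one entry per row avoids relying on how short boolean arrays are handled. One way to build such a mask, sketched on a made-up five-row table (the names `row_mask` and `picked` are illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Anderson", "Reagan", "Mondale"],
    "Party": ["Republican", "Democratic", "Independent",
              "Republican", "Democratic"],
    "Year": [1980, 1980, 1980, 1984, 1984],
})

# Start with all False, then switch on the positions we want.
row_mask = np.zeros(len(df), dtype=bool)
row_mask[[0, 3]] = True

picked = df.loc[row_mask, ["Candidate", "Year"]]
print(picked)
```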

We can use boolean array arguments for one axis of the data, and labels for the other.

In [38]:
elections.loc[[True, False, False, True], 'Candidate':'%']
Out[38]:
Candidate Party %
0 Reagan Republican 50.7
3 Reagan Republican 58.8

Boolean Series are also boolean arrays, so we can use the Boolean Array Selection from earlier using loc as well.

In [39]:
elections.loc[(elections['Result'] == 'win') & (elections['%'] < 50), 
              'Candidate':'%']
Out[39]:
Candidate Party %
7 Clinton Democratic 43.0
10 Clinton Democratic 49.2
14 Bush Republican 47.9
22 Trump Republican 46.1

Let's do a quick example using data with string-labeled rows instead of integer labeled rows, just to make sure we're really understanding loc.

In [40]:
mottos = pd.read_csv("mottos.csv", index_col = "State")
mottos.head(5)
Out[40]:
Motto Translation Language Date Adopted
State
Alabama Audemus jura nostra defendere We dare defend our rights! Latin 1923
Alaska North to the future English 1967
Arizona Ditat Deus God enriches Latin 1863
Arkansas Regnat populus The people rule Latin 1907
California Eureka (Εὕρηκα) I have found it Greek 1849

As you'd expect, the rows to extract can be specified using slice notation, even if the rows have string labels instead of integer labels.

In [41]:
mottos.loc['California':'Florida', ['Motto', 'Language']]
Out[41]:
Motto Language
State
California Eureka (Εὕρηκα) Greek
Colorado Nil sine numine Latin
Connecticut Qui transtulit sustinet Latin
Delaware Liberty and Independence English
Florida In God We Trust English

Sometimes students are so used to thinking of rows as numbered that they try the following, which will not work.

In [42]:
mottos_extreme = pd.read_csv("mottos_extreme.csv", index_col='State')
mottos_extreme.loc['California']
Out[42]:
Motto Translation Language Date Adopted
State
California Eureka (Εὕρηκα) I have found it Greek 1849
California We are the real California English 2006
In [43]:
mottos_extreme.loc['California':'Delaware']
#did i mess up my experiment or is the answer?
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-43-b0a310fb8439> in <module>()
----> 1 mottos_extreme.loc['California':'Delaware']
      2 #did i mess up my experiment or is the answer?

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in __getitem__(self, key)
   1476 
   1477             maybe_callable = com._apply_if_callable(key, self.obj)
-> 1478             return self._getitem_axis(maybe_callable, axis=axis)
   1479 
   1480     def _is_scalar_access(self, key):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _getitem_axis(self, key, axis)
   1864         if isinstance(key, slice):
   1865             self._validate_key(key, axis)
-> 1866             return self._get_slice_axis(key, axis=axis)
   1867         elif com.is_bool_indexer(key):
   1868             return self._getbool_axis(key, axis=axis)

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexing.py in _get_slice_axis(self, slice_obj, axis)
   1509         labels = obj._get_axis(axis)
   1510         indexer = labels.slice_indexer(slice_obj.start, slice_obj.stop,
-> 1511                                        slice_obj.step, kind=self.name)
   1512 
   1513         if isinstance(indexer, slice):

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   4105         """
   4106         start_slice, end_slice = self.slice_locs(start, end, step=step,
-> 4107                                                  kind=kind)
   4108 
   4109         # return a slice

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   4306         start_slice = None
   4307         if start is not None:
-> 4308             start_slice = self.get_slice_bound(start, 'left', kind)
   4309         if start_slice is None:
   4310             start_slice = 0

~/anaconda3/lib/python3.6/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   4253             if isinstance(slc, np.ndarray):
   4254                 raise KeyError("Cannot get %s slice bound for non-unique "
-> 4255                                "label: %r" % (side, original_label))
   4256 
   4257         if isinstance(slc, slice):

KeyError: "Cannot get left slice bound for non-unique label: 'California'"
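
The error above can be reproduced on a tiny table. This is a sketch under the assumption that the duplicated label is not adjacent to its twin, as when an extra row is appended at the end of the file. The frame below is made up and is not the contents of mottos_extreme.csv. Sorting the index puts the duplicates next to each other, after which the label slice is well defined.

```python
import pandas as pd

# Hypothetical index with a duplicated, out-of-order label.
df = pd.DataFrame(
    {"Language": ["Greek", "Latin", "English", "English"]},
    index=["California", "Colorado", "Delaware", "California"],
)

try:
    df.loc["California":"Delaware"]
    slice_failed = False
except KeyError as err:
    slice_failed = True
    print("KeyError:", err)

# With a sorted index, both California rows fall inside the slice.
sorted_df = df.sort_index()
sliced = sorted_df.loc["California":"Delaware"]
print(sliced)
```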

iloc

loc's cousin iloc is very similar, but is used to access based on numerical position instead of label. For example, to access the first 3 rows and first 3 columns of a table, we can use iloc[0:3, 0:3]. iloc slicing is exclusive of the end position, just like standard Python slicing.

In [44]:
elections.head(5)
Out[44]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
In [45]:
elections.iloc[0:3, 0:3]
Out[45]:
Candidate Party %
0 Reagan Republican 50.7
1 Carter Democratic 41.0
2 Anderson Independent 6.6
In [46]:
mottos.iloc[0:3, 0:3]
Out[46]:
Motto Translation Language
State
Alabama Audemus jura nostra defendere We dare defend our rights! Latin
Alaska North to the future English
Arizona Ditat Deus God enriches Latin

We will use both loc and iloc in the course. Loc is generally preferred for a number of reasons, for example:

  1. It is harder to make mistakes since you have to literally write out what you want to get.
  2. Code is easier to read, because the reader doesn't have to know e.g. what column #31 represents.
  3. It is robust against permutations of the data, e.g. the social security administration switches the order of two columns.

However, iloc is sometimes more convenient. We'll provide examples of when iloc is the superior choice.
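
The third reason in the list above can be illustrated with a short sketch. The two-row table is made up; `reordered` holds the same data with the first two columns swapped.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter"],
    "Party": ["Republican", "Democratic"],
    "%": [50.7, 41.0],
})
reordered = df[["Party", "Candidate", "%"]]  # same data, columns permuted

# Label-based access gives the same answer before and after the swap.
by_label_before = df.loc[0, "Candidate"]
by_label_after = reordered.loc[0, "Candidate"]

# Position-based access silently picks up a different column.
by_pos_before = df.iloc[0, 0]
by_pos_after = reordered.iloc[0, 0]

print(by_label_before, by_label_after)
print(by_pos_before, by_pos_after)
```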

Handy Properties and Utility Functions for Series and DataFrames

The head and describe methods and the shape and size attributes can be used to quickly get a good sense of the data we're working with. For example:

In [47]:
mottos.head(5)
Out[47]:
Motto Translation Language Date Adopted
State
Alabama Audemus jura nostra defendere We dare defend our rights! Latin 1923
Alaska North to the future English 1967
Arizona Ditat Deus God enriches Latin 1863
Arkansas Regnat populus The people rule Latin 1907
California Eureka (Εὕρηκα) I have found it Greek 1849
In [48]:
mottos.size
Out[48]:
200

The fact that the size is 200 means our data file is relatively small, with only 200 total entries.

In [49]:
mottos.shape
Out[49]:
(50, 4)

Since we're looking at data for states, and we see the number 50, it looks like we've most likely got a complete dataset that omits Washington D.C. and U.S. territories like Guam and Puerto Rico.
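
The two numbers are related: size is the number of rows times the number of columns, and the index is not counted as a column. A quick check on a made-up three-row slice of the mottos table:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "Motto": ["Audemus jura nostra defendere",
                  "North to the future",
                  "Ditat Deus"],
        "Language": ["Latin", "English", "Latin"],
    },
    index=pd.Index(["Alabama", "Alaska", "Arizona"], name="State"),
)

n_rows, n_cols = df.shape
print(df.shape)
print(df.size == n_rows * n_cols)
```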

In [50]:
mottos.describe()
Out[50]:
Motto Translation Language Date Adopted
count 50 49 50 50
unique 50 30 8 47
top Nil sine numine Latin 1847
freq 1 20 23 2

Above, we see a quick summary of all the data. For example, the most common language for mottos is Latin, which covers 23 different states. Does anything else seem surprising?

We can get a direct reference to the index using .index.

In [51]:
mottos.index
Out[51]:
Index(['Alabama', 'Alaska', 'Arizona', 'Arkansas', 'California', 'Colorado',
       'Connecticut', 'Delaware', 'Florida', 'Georgia', 'Hawaii', 'Idaho',
       'Illinois', 'Indiana', 'Iowa', 'Kansas', 'Kentucky', 'Louisiana',
       'Maine', 'Maryland', 'Massachusetts', 'Michigan', 'Minnesota',
       'Mississippi', 'Missouri', 'Montana', 'Nebraska', 'Nevada',
       'New Hampshire', 'New Jersey', 'New Mexico', 'New York',
       'North Carolina', 'North Dakota', 'Ohio', 'Oklahoma', 'Oregon',
       'Pennsylvania', 'Rhode Island', 'South Carolina', 'South Dakota',
       'Tennessee', 'Texas', 'Utah', 'Vermont', 'Virginia', 'Washington',
       'West Virginia', 'Wisconsin', 'Wyoming'],
      dtype='object', name='State')

We can also access individual properties of the index, for example, mottos.index.name.

In [52]:
mottos.index.name
Out[52]:
'State'

This reflects the fact that in our data frame, the index IS the state!

In [53]:
mottos.head(2)
Out[53]:
Motto Translation Language Date Adopted
State
Alabama Audemus jura nostra defendere We dare defend our rights! Latin 1923
Alaska North to the future English 1967

It turns out the columns also have an Index. We can access this index by using .columns.

In [54]:
mottos.columns
Out[54]:
Index(['Motto', 'Translation', 'Language', 'Date Adopted'], dtype='object')

There are also a ton of useful utility methods we can use with Data Frames and Series. For example, we can create a copy of a data frame sorted by a specific column using sort_values.

In [55]:
elections.sort_values('%')
Out[55]:
Candidate Party % Year Result
2 Anderson Independent 6.6 1980 loss
12 Perot Independent 8.4 1996 loss
9 Perot Independent 18.9 1992 loss
8 Bush Republican 37.4 1992 loss
4 Mondale Democratic 37.6 1984 loss
11 Dole Republican 40.7 1996 loss
1 Carter Democratic 41.0 1980 loss
7 Clinton Democratic 43.0 1992 win
6 Dukakis Democratic 45.6 1988 loss
18 McCain Republican 45.7 2008 loss
22 Trump Republican 46.1 2016 win
20 Romney Republican 47.2 2012 loss
14 Bush Republican 47.9 2000 win
21 Clinton Democratic 48.2 2016 loss
15 Kerry Democratic 48.3 2004 loss
13 Gore Democratic 48.4 2000 loss
10 Clinton Democratic 49.2 1996 win
16 Bush Republican 50.7 2004 win
0 Reagan Republican 50.7 1980 win
19 Obama Democratic 51.1 2012 win
17 Obama Democratic 52.9 2008 win
5 Bush Republican 53.4 1988 win
3 Reagan Republican 58.8 1984 win

As mentioned before, most Data Frame methods return a copy and do not modify the original data structure, unless you set inplace to True.

In [56]:
elections.head(5)
Out[56]:
Candidate Party % Year Result
0 Reagan Republican 50.7 1980 win
1 Carter Democratic 41.0 1980 loss
2 Anderson Independent 6.6 1980 loss
3 Reagan Republican 58.8 1984 win
4 Mondale Democratic 37.6 1984 loss
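
Both behaviors can be seen in a short sketch on a made-up table. The `scratch` copy is used only so that the in-place sort does not disturb `df`.

```python
import pandas as pd

df = pd.DataFrame({
    "Candidate": ["Reagan", "Carter", "Anderson"],
    "%": [50.7, 41.0, 6.6],
})

sorted_df = df.sort_values("%")  # new, sorted DataFrame; df keeps its order

scratch = df.copy()
scratch.sort_values("%", inplace=True)  # sorts scratch itself, returns None

print(list(df.index))         # original row order
print(list(sorted_df.index))  # row labels in sorted order
print(list(scratch.index))    # scratch was reordered in place
```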

If we want to sort in reverse order, we can set ascending=False.

In [57]:
elections.sort_values('%', ascending=False)
Out[57]:
Candidate Party % Year Result
3 Reagan Republican 58.8 1984 win
5 Bush Republican 53.4 1988 win
17 Obama Democratic 52.9 2008 win
19 Obama Democratic 51.1 2012 win
0 Reagan Republican 50.7 1980 win
16 Bush Republican 50.7 2004 win
10 Clinton Democratic 49.2 1996 win
13 Gore Democratic 48.4 2000 loss
15 Kerry Democratic 48.3 2004 loss
21 Clinton Democratic 48.2 2016 loss
14 Bush Republican 47.9 2000 win
20 Romney Republican 47.2 2012 loss
22 Trump Republican 46.1 2016 win
18 McCain Republican 45.7 2008 loss
6 Dukakis Democratic 45.6 1988 loss
7 Clinton Democratic 43.0 1992 win
1 Carter Democratic 41.0 1980 loss
11 Dole Republican 40.7 1996 loss
4 Mondale Democratic 37.6 1984 loss
8 Bush Republican 37.4 1992 loss
9 Perot Independent 18.9 1992 loss
12 Perot Independent 8.4 1996 loss
2 Anderson Independent 6.6 1980 loss

We can also use sort_values on Series objects.

In [58]:
mottos['Language'].sort_values().head(10)
Out[58]:
State
Washington       Chinook Jargon
Wyoming                 English
New Jersey              English
New Hampshire           English
Nevada                  English
Nebraska                English
Wisconsin               English
Pennsylvania            English
Rhode Island            English
South Dakota            English
Name: Language, dtype: object

For Series, the value_counts method is often quite handy.

In [59]:
elections['Party'].value_counts()
Out[59]:
Republican     10
Democratic     10
Independent     3
Name: Party, dtype: int64
In [60]:
mottos['Language'].value_counts()
Out[60]:
Latin             23
English           21
Chinook Jargon     1
Greek              1
Hawaiian           1
Spanish            1
French             1
Italian            1
Name: Language, dtype: int64

Also commonly used is the unique method, which returns all unique values as a numpy array.

In [61]:
mottos['Language'].unique()
Out[61]:
array(['Latin', 'English', 'Greek', 'Hawaiian', 'Italian', 'French',
       'Spanish', 'Chinook Jargon'], dtype=object)
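
A minimal sketch contrasting the two methods on a made-up Series: value_counts returns a Series of counts sorted from most to least frequent, while unique returns the distinct values in order of first appearance.

```python
import pandas as pd

languages = pd.Series(
    ["Latin", "English", "Latin", "Latin", "Greek"], name="Language"
)

counts = languages.value_counts()  # Series: value -> number of occurrences
distinct = languages.unique()      # numpy array of distinct values

print(counts)
print(distinct)
```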