Advanced Pandas syntax, aggregation, and joining.
%matplotlib inline
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import pandas as pd
from IPython.display import display
plt.style.use('fivethirtyeight')
sns.set_context("notebook")
Let's start by loading the California baby names again.
import urllib.request
import os.path
import zipfile
data_url = "https://www.ssa.gov/oact/babynames/state/namesbystate.zip"
local_filename = "babynamesbystate.zip"
if not os.path.exists(local_filename): # if the data exists, don't download again
    with urllib.request.urlopen(data_url) as resp, open(local_filename, 'wb') as f:
        f.write(resp.read())
zf = zipfile.ZipFile(local_filename, 'r')
ca_name = 'CA.TXT'
field_names = ['State', 'Sex', 'Year', 'Name', 'Count']
with zf.open(ca_name) as fh:
babynames = pd.read_csv(fh, header=None, names=field_names)
babynames.sample(5)
Goal 1: Find the most popular baby name in California in 2018
babynames[babynames["Year"] == 2018].sort_values(by = "Count", ascending = False).head(5)
babynames[babynames["Year"] == 2018].sort_values(by = "Count").tail(5)
babynames["Name"].head(20)
starts_with_j = [x.startswith('J') for x in babynames["Name"]]
#starts_with_j = [ x[0] == 'J' for x in babynames["Name"]] # Alternative. Why?
starts_with_j[:20]
starts_with_j = [x.startswith('J') for x in babynames["Name"]]
babynames[starts_with_j].sample(5)
j_names = babynames[[x.startswith('J') for x in babynames["Name"]]]
First, what is this special str attribute doing in there?
s = babynames["Name"].str
n = "Jane" # a Simple string, so we can compare...
s # Let's look at s, and then use <TAB> on `s.` to get a sense of what's there...
babynames["Name"].str.startswith('J').head(10)
starts_with_j = babynames["Name"].str.startswith('J')
starts_with_j.head(10)
babynames[babynames["Name"].str.startswith('J')].sample(5)
Quick detour - which option performs better?
%timeit [x[0] == 'J' for x in babynames["Name"]]
%timeit [x.startswith('J') for x in babynames["Name"]]
%timeit babynames["Name"].str.startswith('J')
babynames[babynames["Name"].str.contains('ad')].sample(5)
babynames["Name"].str.split('a').to_frame().head(5)
Try to write a line of code that creates a list (or Series or array) of all names that end with “ert”.
babynames[babynames["Name"].str.endswith('ert')]["Name"].unique()
Suppose we want to sort all baby names in California by their length.
As before, there are ways to do this using only lecture 4 content. For example, the monstrosity below was concocted during the Sp19 version of this class.
babynames.iloc[[i for i, m in sorted(enumerate(babynames['Name']), key=lambda x: -len(x[1]))]].head(5)
Create a new series of only the lengths. Then add that series to the dataframe as a column. Then sort by that column. Then drop that column.
#create a new series of only the lengths
babyname_lengths = babynames["Name"].str.len()
#add that series to the dataframe as a column
babynames["name_lengths"] = babyname_lengths
babynames.head(5)
#sort by the temporary column
babynames = babynames.sort_values(by = "name_lengths", ascending=False)
babynames.head(25)
#drop the temporary column
babynames = babynames.drop("name_lengths", axis = 'columns')
babynames.head(5)
We can also use the Series map method if we want to apply an arbitrarily defined function. Suppose we want to sort by the number of occurrences of "dr" plus the number of occurrences of "ea".
def dr_ea_count(string):
    return string.count('dr') + string.count('ea')
#create the temporary column
babynames["dr_ea_count"] = babynames["Name"].map(dr_ea_count)
#sort by the temporary column
babynames = babynames.sort_values(by = "dr_ea_count", ascending=False)
babynames.head()
#drop that column
babynames = babynames.drop("dr_ea_count", 1)
babynames.head(5)
#let's start over by first scrambling the order of babynames
babynames = babynames.sample(frac=1)
babynames.head(5)
Another approach is to take advantage of the fact that .loc can accept an index. That is, df.loc[idx] returns df with its rows in the same order as the given index. The first step is to create a sequence of the lengths of the names.
name_lengths = babynames["Name"].str.len()
name_lengths.head(5)
The next step is to sort the new series we just created.
name_lengths_sorted_by_length = name_lengths.sort_values()
name_lengths_sorted_by_length.head(5)
Next, we pass the index of the sorted series to the loc method of the original dataframe.
index_sorted_by_length = name_lengths_sorted_by_length.index
index_sorted_by_length
babynames.loc[index_sorted_by_length].tail(5)
Note we can also do this all in one line:
babynames.loc[babynames["Name"].str.len().sort_values().index].head(5)
Goal 2: Find the name whose popularity has changed the most. First we need to define change in popularity.
For the purposes of lecture, let’s stay simple and use the AMMD (absolute max/min difference): max(count) - min(count).
To make sure we understand this quantity, let's consider the name Jennifer.
jennifer_counts = babynames.query("Name == 'Jennifer'")["Count"]
jennifer_counts.head(5)
The AMMD for Jennifer is 6,059, as seen below:
max(jennifer_counts) - min(jennifer_counts)
def ammd(series):
    return max(series) - min(series)
ammd(jennifer_counts)
ammd(babynames.query("Name == 'Jessica'")["Count"])
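Since Series have built-in max and min methods, the same quantity can also be computed without Python's built-ins, staying vectorized inside pandas. A sketch, using counts chosen so the answer matches Jennifer's AMMD:

```python
import pandas as pd

def ammd_vec(series: pd.Series):
    # Series.max()/min() run in compiled code, avoiding a Python-level loop
    return series.max() - series.min()

counts = pd.Series([100, 6159, 2500])  # toy stand-in for jennifer_counts
print(ammd_vec(counts))
```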
As a first approach, we can try to use a for loop.
#build a dictionary where entry i is the ammd for the given name
#e.g. ammd_of_babyname_counts["Jennifer"] should be 6059
ammd_of_babyname_counts = {}
for name in babynames["Name"].unique()[0:10]:
    counts_of_current_name = babynames[babynames["Name"] == name]["Count"]
    ammd_of_babyname_counts[name] = ammd(counts_of_current_name)
#convert to series
ammd_of_babyname_counts = pd.Series(ammd_of_babyname_counts)
ammd_of_babyname_counts
Answer below. Note that we only used the first 100 names because otherwise the code takes ages to run.
#build a dictionary where entry i is the ammd for the given name
#e.g. ammd_of_babyname_counts["Jennifer"] should be 6059
ammd_of_babyname_counts = {}
for name in sorted(babynames["Name"].unique())[0:100]:
    counts_of_current_name = babynames[babynames["Name"] == name]["Count"]
    ammd_of_babyname_counts[name] = ammd(counts_of_current_name)
#convert to series
ammd_of_babyname_counts = pd.Series(ammd_of_babyname_counts)
ammd_of_babyname_counts.head(5)
Instead, we can use the very powerful groupby.agg operation, which allows us to simply and efficiently compute what we want.
babynames.groupby("Name").agg(max).head(5)
babynames.head()
%%time
babynames.groupby("Name").agg(ammd).head(5)
Note that the result includes both a Year and Count column. The Count column is what we want, namely the AMMD for the name in that row. To check your understanding, try to figure out what the Year column represents.
To understand how groupby works, consider the visual diagram below. The groupby function clusters rows from the original dataframe into groups (which I call subframes). The agg function then condenses each subframe into a single representative row using the provided function f.
Let's actually build this DataFrame and play with it interactively to understand groupby a little better:
d = pd.DataFrame(dict(x=[3,1,4,1,5,9,2,5,6], y=[12,7,3,2,7,3,8,6,7]), index=list('ABCABCACB') )
d
dgb = d.groupby(d.index)
dgb
dgb.groups
dgb.groups['A']
dgb.get_group('A')
dgb.agg(ammd)
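One more thing worth knowing: agg can also take a list of functions (or function names), producing one output column per (input column, function) pair. A sketch on the same toy frame:

```python
import pandas as pd

d = pd.DataFrame(dict(x=[3, 1, 4, 1, 5, 9, 2, 5, 6],
                      y=[12, 7, 3, 2, 7, 3, 8, 6, 7]),
                 index=list('ABCABCACB'))

# One output column per (input column, function) pair,
# yielding MultiIndex columns like ('x', 'min')
summary = d.groupby(d.index).agg(["min", "max"])
print(summary)
```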
Groupby puzzle #1: To test your understanding, try to interpret the result of the code below.
babynames.head(5)
babynames.groupby("Year").agg(ammd).plot();
for i, (k, v) in enumerate(babynames.groupby("Year")):
    if i >= 3: break
    print(f"Year: {k}")
    display(v.head())
For reference, the first 5 values from the plot above are:
babynames.groupby("Year").agg(ammd).head(5)
elections = pd.read_csv("elections.csv")
elections.sample(5)
We have to be careful when using aggregation functions. For example, the code below might be misinterpreted to say that Woodrow Wilson ran for election in 2016. Why is this happening?
elections.groupby("Party").agg(max).head(10)
elections.groupby("Party").agg(max).head(10)
Inspired by above, try to predict the results of the groupby operation shown. The answer is below the image.
The top ?? will be "hi", the second ?? will be "tx", and the third ?? will be "sd".
ds = pd.DataFrame(dict(x=[3, 1, 4, 1, 5, 9, 2, 5, 6],
                       y=['ak', 'tx', 'fl', 'hi', 'mi', 'ak', 'ca', 'sd', 'nc']),
                  index=list('ABCABCACB'))
ds
ds.groupby(ds.index).agg(max)
Next we'll write code that properly returns the best result by each party. That is, each row should show the Year, Candidate, Popular Vote, Result, and % for the election in which that party saw its best results (rather than mixing values from different rows as in the example above). Here's what the first rows of the correct output should look like:
elections_sorted_by_percent = elections.sort_values("%", ascending=False)
elections_sorted_by_percent.head(5)
#def get_first(s):
# return s.iloc[0]
elections_sorted_by_percent = elections.sort_values("%", ascending=False)
elections_sorted_by_percent.groupby("Party").agg(lambda x : x.iloc[0]).head(10)
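Because the frame is already sorted by "%" descending, the lambda above simply takes each group's first row. The groupby(...).first() method expresses the same idea more directly; a sketch on toy data, since elections.csv isn't bundled here. (One caveat: .first() skips NaN values per column, so on data with missing values it can mix rows from different elections, unlike the lambda.)

```python
import pandas as pd

# Hypothetical miniature of the elections table
toy = pd.DataFrame({
    "Party": ["Rep", "Dem", "Rep", "Dem"],
    "Candidate": ["A", "B", "C", "D"],
    "%": [60.0, 55.0, 40.0, 52.0],
})

# After sorting by "%" descending, each group's first row is its best result
best = (toy.sort_values("%", ascending=False)
           .groupby("Party")
           .first())
print(best)
```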
You'll soon discover that with Pandas' rich tool set, there's typically more than one way to get to the same answer. Each approach has different tradeoffs in terms of readability, performance, memory consumption, complexity and more. It will take some experience for you to develop a sense of which approach is better for each problem, but you should in general try to think if you can at least envision a different solution to a given problem, especially if you find your current solution to be particularly convoluted or hard to read.
Here are a couple of other ways of obtaining the same result (in each case we only show the top part with head()). The first approach uses groupby but finds the location of the maximum value via the idxmax() method (look up its documentation!). We then index and sort by Party to match the requested formatting:
elections.head()
elections.groupby('Party')['%'].idxmax()
best_per_party = elections.loc[elections.groupby('Party')['%'].idxmax()] # This is the computational part
best_per_party.set_index('Party').sort_index().head() # This indexes by Party to match the formatting above
And this one doesn't even use groupby! This approach instead uses the drop_duplicates method to keep only the last occurrence of each party after sorting by "%", i.e. the row with the best performance. Again, the 2nd line is purely formatting:
best_per_party2 = elections.sort_values('%').drop_duplicates(['Party'], keep='last')
best_per_party2.set_index('Party').sort_index().head() # Formatting
Challenge: see if you can find yet another approach that still gives the same answer.
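One more possibility (peek only after trying yourself!) is to sort and then keep each group's first row with groupby(...).head(1). A sketch on toy data:

```python
import pandas as pd

# Hypothetical miniature of the elections table
toy = pd.DataFrame({
    "Party": ["Rep", "Dem", "Rep", "Dem"],
    "%": [60.0, 55.0, 40.0, 52.0],
})

# head(1) keeps each group's first row in sorted order,
# preserving the original columns (no aggregation involved)
best3 = (toy.sort_values("%", ascending=False)
            .groupby("Party")
            .head(1)
            .set_index("Party")
            .sort_index())
print(best3)
```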
elections.groupby("Party")
#size returns a Series giving the size of each group
elections.groupby("Party").size().head(15)
# filter gives a copy of the original DataFrame where row r is included
# if its group obeys the given condition
#
# Note: Filtering is done per GROUP, not per ROW.
elections.groupby("Year").filter(lambda sf: sf["%"].max() < 45)
for i, (n, g) in enumerate(elections.groupby("Party")):
    print(n)
    display(g.head(2))
    if i > 3: break
As an alternative to groupby.agg(sum), we can also simply do groupby.sum().
elections.groupby("Year").agg(sum).head()
elections.groupby("Year").sum().head()
The same applies for many other common operations.
elections.groupby("Year").agg(max).head()
elections.groupby("Year").max().head()
#elections.groupby("Year").mean().head()
#elections.groupby("Year").median().head()
elections.groupby("Year").max().head()
It is possible to group a DataFrame by multiple features. For example, if we group by Year and Sex we get back a DataFrame with the total number of babies of each sex born in each year.
babynames.groupby(["Year", "Sex"]).agg(sum).head(6)
babynames.groupby(["Sex", "Year"]).agg(sum).head(6)
babynames.head(10)
babynames.groupby(["Count", "Year", "Sex"]).agg(sum).head(6)
The DataFrame resulting from the aggregation operation is now multi-indexed. That is, it has more than one dimension to its index. We will explore this in next week's exercises.
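As a small preview, rows of a multi-indexed DataFrame are addressed with tuples in .loc. A sketch on toy data shaped like the grouped babynames table:

```python
import pandas as pd

# Toy stand-in for babynames, with one row per (Year, Sex) pair
toy = pd.DataFrame({"Year": [2018, 2018, 2019, 2019],
                    "Sex": ["F", "M", "F", "M"],
                    "Count": [10, 12, 11, 13]})

grouped = toy.groupby(["Year", "Sex"]).sum()

# The index now has two levels, so a single row is selected with a tuple
print(grouped.loc[(2018, "F"), "Count"])
```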
Suppose we want to build a table showing the total number of babies born of each sex in each year. One way is to groupby using both columns of interest.
babynames.groupby(["Year", "Sex"]).agg(sum).head(6)
A more natural approach is to use a pivot table (like we saw in data 8).
babynames.head(5)
babynames_pivot = babynames.pivot_table(
    index='Year',              # the rows (turned into the index)
    columns='Sex',             # values of this column become the new columns
    values=['Count', 'Name'],  # the field(s) to be processed in each group
    aggfunc=np.max,            # group operation
)
babynames_pivot.head(6)
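Under the hood, a pivot table is essentially a groupby on two columns followed by unstacking one level of the resulting multi-index into columns. A sketch on toy data showing the two routes side by side:

```python
import pandas as pd

# Toy stand-in for babynames
toy = pd.DataFrame({"Year": [2018, 2018, 2019, 2019],
                    "Sex": ["F", "M", "F", "M"],
                    "Count": [10, 12, 11, 13]})

pivoted = toy.pivot_table(index="Year", columns="Sex",
                          values="Count", aggfunc="sum")

# The same table via groupby + unstack: Sex moves from the
# index into the columns
unstacked = toy.groupby(["Year", "Sex"])["Count"].sum().unstack("Sex")
print(pivoted)
```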
The basic idea behind pivot tables is shown in the image below.
In goal 1, we didn't take into account the unlikely possibility that the most popular name was actually spread across both birth sexes. For example, what if, in the table below, it turned out that there were 300 female Noahs born in CA in 2018? In that case, Noah would actually be the most popular name overall.
Since our queries are getting pretty long, I've stuck them inside parentheses which allows us to spread them over many lines.
(
babynames[babynames["Year"] == 2018]
.sort_values(by = "Count", ascending = False)
.head(5)
)
Try to add a single line to the operation above so that each row represents the sum of male and female babies born in 2018 with that name. To do this, fill in the ??? below.
(
babynames[babynames["Year"] == 2018]
#.???
.sort_values(by = "Count", ascending = False)
.head(5)
)
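One possible completion of the ??? (shown here on a toy frame so the idea stands alone): group by Name and sum the Count column before sorting, which combines the male and female rows for each name.

```python
import pandas as pd

# Toy stand-in for the 2018 slice of babynames
toy = pd.DataFrame({"Name": ["Noah", "Noah", "Emma"],
                    "Sex": ["M", "F", "F"],
                    "Count": [2500, 300, 2700]})

# Summing within each Name combines the male and female rows
combined = (toy.groupby("Name")[["Count"]]
               .sum()
               .sort_values(by="Count", ascending=False))
print(combined)
```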
def ammd2(L):
    return max(L) - min(L)
print(ammd2([1, 2, 3, 4, 5]))
ammd2 = lambda L: max(L) - min(L)
print(ammd2([1, 2, 3, 4, 5]))