Live Lecture 2 – Data 100, Summer 2020

by Suraj Rampure and Allen Shen

The goal of this demo is to walk through some of the misconceptions that students had when working through the Quick Check questions from the pandas lectures this week.

In [1]:
import numpy as np
import pandas as pd

We will use NBA player data from the 2018-2019 season. You can download this CSV yourself from basketballreference.com.

In [2]:
df = pd.read_csv('nba18-19.csv')
In [83]:
df
Out[83]:
Rk Player Pos Age Tm G GS MP FG FGA ... ORB DRB TRB AST STL BLK TOV PF PTS Name
0 1 Álex Abrines\abrinal01 SG 25 OKC 31 2 19.0 1.8 5.1 ... 0.2 1.4 1.5 0.6 0.5 0.2 0.5 1.7 5.3 Álex Abrines
1 2 Quincy Acy\acyqu01 PF 28 PHO 10 0 12.3 0.4 1.8 ... 0.3 2.2 2.5 0.8 0.1 0.4 0.4 2.4 1.7 Quincy Acy
2 3 Jaylen Adams\adamsja01 PG 22 ATL 34 1 12.6 1.1 3.2 ... 0.3 1.4 1.8 1.9 0.4 0.1 0.8 1.3 3.2 Jaylen Adams
3 4 Steven Adams\adamsst01 C 25 OKC 80 80 33.4 6.0 10.1 ... 4.9 4.6 9.5 1.6 1.5 1.0 1.7 2.6 13.9 Steven Adams
4 5 Bam Adebayo\adebaba01 C 21 MIA 82 28 23.3 3.4 5.9 ... 2.0 5.3 7.3 2.2 0.9 0.8 1.5 2.5 8.9 Bam Adebayo
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
703 528 Tyler Zeller\zellety01 C 29 MEM 4 1 20.5 4.0 7.0 ... 2.3 2.3 4.5 0.8 0.3 0.8 1.0 4.0 11.5 Tyler Zeller
704 529 Ante Žižić\zizican01 C 22 CLE 59 25 18.3 3.1 5.6 ... 1.8 3.6 5.4 0.9 0.2 0.4 1.0 1.9 7.8 Ante Žižić
705 530 Ivica Zubac\zubaciv01 C 21 TOT 59 37 17.6 3.6 6.4 ... 1.9 4.2 6.1 1.1 0.2 0.9 1.2 2.3 8.9 Ivica Zubac
706 530 Ivica Zubac\zubaciv01 C 21 LAL 33 12 15.6 3.4 5.8 ... 1.6 3.3 4.9 0.8 0.1 0.8 1.0 2.2 8.5 Ivica Zubac
707 530 Ivica Zubac\zubaciv01 C 21 LAC 26 25 20.2 3.8 7.2 ... 2.3 5.3 7.7 1.5 0.4 0.9 1.4 2.5 9.4 Ivica Zubac

708 rows × 31 columns

Misconception 1 – Indices don't need to be unique!

Usually, we like our indices to be unique, but they don't need to be... We'll make a copy of df here to demonstrate this.

In [5]:
df2 = df.copy()
df2.index = np.ones(len(df2))
In [6]:
df2
Out[6]:
Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
1.0 1 Álex Abrines\abrinal01 SG 25 OKC 31 2 19.0 1.8 5.1 ... 0.923 0.2 1.4 1.5 0.6 0.5 0.2 0.5 1.7 5.3
1.0 2 Quincy Acy\acyqu01 PF 28 PHO 10 0 12.3 0.4 1.8 ... 0.700 0.3 2.2 2.5 0.8 0.1 0.4 0.4 2.4 1.7
1.0 3 Jaylen Adams\adamsja01 PG 22 ATL 34 1 12.6 1.1 3.2 ... 0.778 0.3 1.4 1.8 1.9 0.4 0.1 0.8 1.3 3.2
1.0 4 Steven Adams\adamsst01 C 25 OKC 80 80 33.4 6.0 10.1 ... 0.500 4.9 4.6 9.5 1.6 1.5 1.0 1.7 2.6 13.9
1.0 5 Bam Adebayo\adebaba01 C 21 MIA 82 28 23.3 3.4 5.9 ... 0.735 2.0 5.3 7.3 2.2 0.9 0.8 1.5 2.5 8.9
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1.0 528 Tyler Zeller\zellety01 C 29 MEM 4 1 20.5 4.0 7.0 ... 0.778 2.3 2.3 4.5 0.8 0.3 0.8 1.0 4.0 11.5
1.0 529 Ante Žižić\zizican01 C 22 CLE 59 25 18.3 3.1 5.6 ... 0.705 1.8 3.6 5.4 0.9 0.2 0.4 1.0 1.9 7.8
1.0 530 Ivica Zubac\zubaciv01 C 21 TOT 59 37 17.6 3.6 6.4 ... 0.802 1.9 4.2 6.1 1.1 0.2 0.9 1.2 2.3 8.9
1.0 530 Ivica Zubac\zubaciv01 C 21 LAL 33 12 15.6 3.4 5.8 ... 0.864 1.6 3.3 4.9 0.8 0.1 0.8 1.0 2.2 8.5
1.0 530 Ivica Zubac\zubaciv01 C 21 LAC 26 25 20.2 3.8 7.2 ... 0.733 2.3 5.3 7.7 1.5 0.4 0.9 1.4 2.5 9.4

708 rows × 30 columns

Misconception 2 – How .loc and .iloc work

Recall, with both .loc and .iloc, we use square brackets. Whatever comes before the comma specifies which rows to select, and whatever comes after the comma specifies which columns to select. iloc stands for integer location, and works with integer positions.

Suppose we want the names of players who averaged more than 10 PTS per game last season. How can we do that?

In [8]:
df[df['PTS'] > 10]['Player']
Out[8]:
3               Steven Adams\adamsst01
7          LaMarcus Aldridge\aldrila01
10             Jarrett Allen\allenja01
19     Giannis Antetokounmpo\antetgi01
21           Carmelo Anthony\anthoca01
                    ...               
695             Delon Wright\wrighde01
698           Thaddeus Young\youngth01
699               Trae Young\youngtr01
700              Cody Zeller\zelleco01
703             Tyler Zeller\zellety01
Name: Player, Length: 223, dtype: object
In [9]:
df.loc[df['PTS'] > 10, 'Player']
Out[9]:
3               Steven Adams\adamsst01
7          LaMarcus Aldridge\aldrila01
10             Jarrett Allen\allenja01
19     Giannis Antetokounmpo\antetgi01
21           Carmelo Anthony\anthoca01
                    ...               
695             Delon Wright\wrighde01
698           Thaddeus Young\youngth01
699               Trae Young\youngtr01
700              Cody Zeller\zelleco01
703             Tyler Zeller\zellety01
Name: Player, Length: 223, dtype: object

What does the following line of code do?

In [17]:
df.loc[4:14, 'Age':'FGA']
Out[17]:
Age Tm G GS MP FG FGA
4 21 MIA 82 28 23.3 3.4 5.9
5 21 CLE 19 3 10.2 0.6 1.9
6 25 DEN 7 0 3.1 0.4 1.4
7 33 SAS 81 81 33.2 8.4 16.3
8 21 CHI 10 1 12.0 1.3 3.9
9 23 UTA 38 2 10.9 1.8 4.7
10 20 BRK 80 80 26.2 4.2 7.1
11 26 NYK 19 1 21.9 3.4 7.4
12 28 POR 81 81 28.3 3.2 7.3
13 25 ATL 48 4 9.6 1.3 3.3
14 25 MEM 43 40 29.8 3.5 6.4

What does the following line of code do?

In [18]:
df.iloc[4:14, 3:10]
Out[18]:
Age Tm G GS MP FG FGA
4 21 MIA 82 28 23.3 3.4 5.9
5 21 CLE 19 3 10.2 0.6 1.9
6 25 DEN 7 0 3.1 0.4 1.4
7 33 SAS 81 81 33.2 8.4 16.3
8 21 CHI 10 1 12.0 1.3 3.9
9 23 UTA 38 2 10.9 1.8 4.7
10 20 BRK 80 80 26.2 4.2 7.1
11 26 NYK 19 1 21.9 3.4 7.4
12 28 POR 81 81 28.3 3.2 7.3
13 25 ATL 48 4 9.6 1.3 3.3

Let's examine the behavior of .loc and .iloc on our weird DataFrame df2 where the index is equal to 1 for each row.

What does the following line of code do?

In [19]:
df2.iloc[4:14, 3:10]
Out[19]:
Age Tm G GS MP FG FGA
1.0 21 MIA 82 28 23.3 3.4 5.9
1.0 21 CLE 19 3 10.2 0.6 1.9
1.0 25 DEN 7 0 3.1 0.4 1.4
1.0 33 SAS 81 81 33.2 8.4 16.3
1.0 21 CHI 10 1 12.0 1.3 3.9
1.0 23 UTA 38 2 10.9 1.8 4.7
1.0 20 BRK 80 80 26.2 4.2 7.1
1.0 26 NYK 19 1 21.9 3.4 7.4
1.0 28 POR 81 81 28.3 3.2 7.3
1.0 25 ATL 48 4 9.6 1.3 3.3

What does the following line of code do?

In [25]:
df2.iloc[1]
Out[25]:
Rk                         2
Player    Quincy Acy\acyqu01
Pos                       PF
Age                       28
Tm                       PHO
G                         10
GS                         0
MP                      12.3
FG                       0.4
FGA                      1.8
FG%                    0.222
3P                       0.2
3PA                      1.5
3P%                    0.133
2P                       0.2
2PA                      0.3
2P%                    0.667
eFG%                   0.278
FT                       0.7
FTA                        1
FT%                      0.7
ORB                      0.3
DRB                      2.2
TRB                      2.5
AST                      0.8
STL                      0.1
BLK                      0.4
TOV                      0.4
PF                       2.4
PTS                      1.7
Name: 1.0, dtype: object

What does the following line of code do?

In [21]:
df2.loc[1]
Out[21]:
Rk Player Pos Age Tm G GS MP FG FGA ... FT% ORB DRB TRB AST STL BLK TOV PF PTS
1.0 1 Álex Abrines\abrinal01 SG 25 OKC 31 2 19.0 1.8 5.1 ... 0.923 0.2 1.4 1.5 0.6 0.5 0.2 0.5 1.7 5.3
1.0 2 Quincy Acy\acyqu01 PF 28 PHO 10 0 12.3 0.4 1.8 ... 0.700 0.3 2.2 2.5 0.8 0.1 0.4 0.4 2.4 1.7
1.0 3 Jaylen Adams\adamsja01 PG 22 ATL 34 1 12.6 1.1 3.2 ... 0.778 0.3 1.4 1.8 1.9 0.4 0.1 0.8 1.3 3.2
1.0 4 Steven Adams\adamsst01 C 25 OKC 80 80 33.4 6.0 10.1 ... 0.500 4.9 4.6 9.5 1.6 1.5 1.0 1.7 2.6 13.9
1.0 5 Bam Adebayo\adebaba01 C 21 MIA 82 28 23.3 3.4 5.9 ... 0.735 2.0 5.3 7.3 2.2 0.9 0.8 1.5 2.5 8.9
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
1.0 528 Tyler Zeller\zellety01 C 29 MEM 4 1 20.5 4.0 7.0 ... 0.778 2.3 2.3 4.5 0.8 0.3 0.8 1.0 4.0 11.5
1.0 529 Ante Žižić\zizican01 C 22 CLE 59 25 18.3 3.1 5.6 ... 0.705 1.8 3.6 5.4 0.9 0.2 0.4 1.0 1.9 7.8
1.0 530 Ivica Zubac\zubaciv01 C 21 TOT 59 37 17.6 3.6 6.4 ... 0.802 1.9 4.2 6.1 1.1 0.2 0.9 1.2 2.3 8.9
1.0 530 Ivica Zubac\zubaciv01 C 21 LAL 33 12 15.6 3.4 5.8 ... 0.864 1.6 3.3 4.9 0.8 0.1 0.8 1.0 2.2 8.5
1.0 530 Ivica Zubac\zubaciv01 C 21 LAC 26 25 20.2 3.8 7.2 ... 0.733 2.3 5.3 7.7 1.5 0.4 0.9 1.4 2.5 9.4

708 rows × 30 columns

What does the following line of code?

In [84]:
df.sort_values('PTS').loc[14:4]
Out[84]:
Rk Player Pos Age Tm G GS MP FG FGA ... ORB DRB TRB AST STL BLK TOV PF PTS Name
14 15 Kyle Anderson\anderky01 SF 25 MEM 43 40 29.8 3.5 6.4 ... 1.1 4.7 5.8 3.0 1.3 0.9 1.3 2.6 8.0 Kyle Anderson
640 473 Isaiah Thomas\thomais02 SG 29 DEN 12 0 15.1 2.8 8.3 ... 0.4 0.7 1.1 1.9 0.4 0.1 1.5 1.4 8.1 Isaiah Thomas
560 422 Austin Rivers\riverau01 SG 26 TOT 76 15 26.7 3.1 7.5 ... 0.3 1.8 2.1 2.2 0.6 0.3 0.9 2.7 8.1 Austin Rivers
325 253 Frank Jackson\jacksfr01 PG 20 NOP 61 16 19.2 3.2 7.3 ... 0.4 1.8 2.2 1.1 0.4 0.0 0.8 1.5 8.1 Frank Jackson
514 386 Jahlil Okafor\okafoja01 C 23 NOP 59 24 15.8 3.6 6.1 ... 1.4 3.3 4.7 0.7 0.3 0.7 0.9 1.6 8.2 Jahlil Okafor
690 521 Christian Wood\woodch01 PF 23 TOT 21 2 12.0 2.9 5.6 ... 0.8 3.1 4.0 0.4 0.3 0.5 0.8 0.8 8.2 Christian Wood
229 176 Markelle Fultz\fultzma01 SG 20 PHI 19 15 22.5 3.4 8.2 ... 1.4 2.3 3.7 3.1 0.9 0.3 1.3 2.7 8.2 Markelle Fultz
431 325 Boban Marjanović\marjabo01 C 30 PHI 22 3 13.9 3.4 5.5 ... 1.5 3.7 5.1 1.5 0.2 0.5 1.0 1.5 8.2 Boban Marjanović
304 234 Richaun Holmes\holmeri01 C 25 PHO 70 4 16.9 3.2 5.2 ... 1.6 3.1 4.7 0.9 0.6 1.1 0.7 2.8 8.2 Richaun Holmes
534 400 Cameron Payne\payneca01 PG 24 CLE 9 1 19.6 3.0 6.1 ... 0.3 1.8 2.1 2.6 0.9 0.3 1.2 1.7 8.2 Cameron Payne
71 63 Avery Bradley\bradlav01 SG 28 LAC 49 49 29.9 3.3 8.6 ... 0.7 2.0 2.7 2.0 0.6 0.3 1.2 2.7 8.2 Avery Bradley
461 348 Darius Miller\milleda01 SF 28 NOP 69 15 25.5 2.7 7.0 ... 0.2 1.7 1.9 2.1 0.6 0.3 0.9 2.4 8.2 Darius Miller
330 256 Justin Jackson\jacksju01 SF 23 DAL 29 11 18.3 3.2 6.6 ... 0.7 1.6 2.3 1.0 0.3 0.0 0.2 1.1 8.2 Justin Jackson
591 444 Landry Shamet\shamela01 SG 21 PHI 54 4 20.5 2.8 6.4 ... 0.3 1.2 1.4 1.1 0.4 0.1 0.5 2.0 8.3 Landry Shamet
105 85 Bruno Caboclo\cabocbr01 SF 23 MEM 34 19 23.5 2.8 6.6 ... 1.2 3.4 4.6 1.5 0.4 1.0 1.1 2.4 8.3 Bruno Caboclo
77 66 Mikal Bridges\bridgmi01 SF 22 PHO 82 56 29.5 3.0 6.9 ... 0.7 2.5 3.2 2.1 1.6 0.5 0.9 2.5 8.3 Mikal Bridges
198 154 Wayne Ellington\ellinwa01 SG 31 MIA 25 12 21.3 2.8 7.4 ... 0.2 1.6 1.9 1.2 1.0 0.1 0.6 1.6 8.4 Wayne Ellington
52 45 DeAndre' Bembry\bembrde01 SG 24 ATL 82 15 23.5 3.4 7.5 ... 0.7 3.7 4.4 2.5 1.3 0.5 1.7 2.3 8.4 DeAndre' Bembry
660 491 Noah Vonleh\vonleno01 PF 23 NYK 68 57 25.3 3.0 6.5 ... 1.7 6.1 7.8 1.9 0.7 0.8 1.3 2.6 8.4 Noah Vonleh
231 178 Langston Galloway\gallola01 SG 27 DET 80 4 21.8 2.9 7.3 ... 0.6 1.5 2.1 1.1 0.5 0.1 0.3 1.7 8.4 Langston Galloway
98 82 Alec Burks\burksal01 SG 27 UTA 17 0 15.8 2.8 6.7 ... 0.1 1.5 1.6 1.2 0.4 0.2 0.9 1.3 8.4 Alec Burks
382 290 Rodions Kurucs\kurucro01 SF 20 BRK 63 46 20.5 3.2 7.1 ... 0.9 3.0 3.9 0.8 0.7 0.4 1.2 2.3 8.5 Rodions Kurucs
418 318 Trey Lyles\lylestr01 PF 23 DEN 64 2 17.5 3.2 7.7 ... 0.7 3.2 3.8 1.4 0.5 0.4 1.1 1.5 8.5 Trey Lyles
706 530 Ivica Zubac\zubaciv01 C 21 LAL 33 12 15.6 3.4 5.8 ... 1.6 3.3 4.9 0.8 0.1 0.8 1.0 2.2 8.5 Ivica Zubac
374 284 Brandon Knight\knighbr03 PG 27 CLE 27 26 22.9 3.3 7.9 ... 0.3 1.5 1.9 2.3 0.7 0.1 0.9 1.8 8.5 Brandon Knight
549 411 Norman Powell\powelno01 SG 25 TOR 60 3 18.8 3.2 6.7 ... 0.3 2.1 2.3 1.5 0.7 0.2 1.1 1.6 8.6 Norman Powell
378 288 Kyle Korver\korveky01 SG-PF 37 TOT 70 0 19.1 2.9 6.9 ... 0.1 2.2 2.3 1.2 0.4 0.2 0.8 1.5 8.6 Kyle Korver
364 278 Frank Kaminsky\kaminfr01 C 25 CHO 47 0 16.1 2.9 6.3 ... 0.8 2.6 3.5 1.3 0.3 0.3 0.9 1.4 8.6 Frank Kaminsky
693 522 Delon Wright\wrighde01 PG 26 TOT 75 13 22.7 3.2 7.4 ... 0.9 2.6 3.5 3.3 1.2 0.4 1.0 1.4 8.7 Delon Wright
259 200 JaMychal Green\greenja01 PF 28 LAC 24 2 19.6 3.3 6.8 ... 1.3 5.2 6.5 0.6 0.5 0.3 1.0 2.9 8.7 JaMychal Green
562 422 Austin Rivers\riverau01 SG 26 HOU 47 13 28.6 3.3 8.0 ... 0.3 1.6 1.9 2.3 0.6 0.3 0.7 2.9 8.7 Austin Rivers
288 222 Mario Hezonja\hezonma01 SF 23 NYK 58 24 20.8 3.3 8.0 ... 0.5 3.6 4.1 1.5 1.0 0.1 1.5 1.9 8.8 Mario Hezonja
97 82 Alec Burks\burksal01 SG 27 TOT 64 24 21.5 3.0 7.4 ... 0.5 3.2 3.7 2.0 0.6 0.3 1.0 1.4 8.8 Alec Burks
705 530 Ivica Zubac\zubaciv01 C 21 TOT 59 37 17.6 3.6 6.4 ... 1.9 4.2 6.1 1.1 0.2 0.9 1.2 2.3 8.9 Ivica Zubac
4 5 Bam Adebayo\adebaba01 C 21 MIA 82 28 23.3 3.4 5.9 ... 2.0 5.3 7.3 2.2 0.9 0.8 1.5 2.5 8.9 Bam Adebayo

35 rows × 31 columns

Misconception 3 – using .str

The Player column is a little messy. Suppose we want to extract just the player's name (e.g. Ivica Zubac), and get rid of the weird code that follows the slash (e.g. \zubaciv01).

In [30]:
df['Player']
Out[30]:
0      Álex Abrines\abrinal01
1          Quincy Acy\acyqu01
2      Jaylen Adams\adamsja01
3      Steven Adams\adamsst01
4       Bam Adebayo\adebaba01
                ...          
703    Tyler Zeller\zellety01
704      Ante Žižić\zizican01
705     Ivica Zubac\zubaciv01
706     Ivica Zubac\zubaciv01
707     Ivica Zubac\zubaciv01
Name: Player, Length: 708, dtype: object
In [33]:
df['Player'].str.split('\\')
Out[33]:
0      [Álex Abrines, abrinal01]
1          [Quincy Acy, acyqu01]
2      [Jaylen Adams, adamsja01]
3      [Steven Adams, adamsst01]
4       [Bam Adebayo, adebaba01]
                 ...            
703    [Tyler Zeller, zellety01]
704      [Ante Žižić, zizican01]
705     [Ivica Zubac, zubaciv01]
706     [Ivica Zubac, zubaciv01]
707     [Ivica Zubac, zubaciv01]
Name: Player, Length: 708, dtype: object
In [34]:
df['Player'].str.split('\\')[0]
Out[34]:
['Álex Abrines', 'abrinal01']
In [35]:
df['Player'].str.split('\\').str[0]
Out[35]:
0      Álex Abrines
1        Quincy Acy
2      Jaylen Adams
3      Steven Adams
4       Bam Adebayo
           ...     
703    Tyler Zeller
704      Ante Žižić
705     Ivica Zubac
706     Ivica Zubac
707     Ivica Zubac
Name: Player, Length: 708, dtype: object
In [37]:
df['Player'].str[0]
Out[37]:
0      Á
1      Q
2      J
3      S
4      B
      ..
703    T
704    A
705    I
706    I
707    I
Name: Player, Length: 708, dtype: object

Can you think of a way to do this without using .str methods?

In [42]:
df['Player'].apply(lambda x: x[:x.index('\\')])
Out[42]:
0      Álex Abrines
1        Quincy Acy
2      Jaylen Adams
3      Steven Adams
4       Bam Adebayo
           ...     
703    Tyler Zeller
704      Ante Žižić
705     Ivica Zubac
706     Ivica Zubac
707     Ivica Zubac
Name: Player, Length: 708, dtype: object

Let's save this so we can use it later.

In [40]:
df['Name'] = df['Player'].str.split('\\').str[0]

Misconception 4: Grouping

One thing to note is that the result of calling .groupby on a DataFrame is not a Series or DataFrame – it is a pd.core.groupby.generic.DataFrameGroupBy object. To get this into a usable form, we typically have to use some aggregate function, or use .apply or .filter.

In [43]:
df.groupby('Tm')
Out[43]:
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x117998510>

Let's way we want to find the player that averaged the most points per game on each team.

What does the following line of code do?

In [44]:
df.sort_values('PTS', ascending = False).groupby('Tm').agg(lambda x: x.iloc[0])
Out[44]:
Rk Player Pos Age G GS MP FG FGA FG% ... ORB DRB TRB AST STL BLK TOV PF PTS Name
Tm
ATL 108 John Collins\collijo01 PF 21 61 59 30.0 7.6 13.6 0.560 ... 3.6 6.2 9.8 2.0 0.4 0.6 2.0 3.3 19.5 John Collins
BOS 249 Kyrie Irving\irvinky01 PG 26 67 67 33.0 9.0 18.5 0.487 ... 1.1 3.9 5.0 6.9 1.5 0.5 2.6 2.5 23.8 Kyrie Irving
BRK 433 D'Angelo Russell\russeda01 PG 22 81 81 30.2 8.1 18.7 0.434 ... 0.7 3.2 3.9 7.0 1.2 0.2 3.1 1.7 21.1 D'Angelo Russell
CHI 294 Zach LaVine\lavinza01 SG 23 63 62 34.5 8.4 18.0 0.467 ... 0.6 4.0 4.7 4.5 1.0 0.4 3.4 2.2 23.7 Zach LaVine
CHO 496 Kemba Walker\walkeke02 PG 28 82 82 34.9 8.9 20.5 0.434 ... 0.6 3.8 4.4 5.9 1.2 0.4 2.6 1.6 25.6 Kemba Walker
CLE 312 Kevin Love\loveke01 PF 30 22 21 27.2 5.0 12.9 0.385 ... 1.5 9.4 10.9 2.2 0.3 0.2 1.9 2.5 17.0 Kevin Love
DAL 142 Luka Dončić\doncilu01 SG 19 72 72 32.2 7.0 16.5 0.427 ... 1.2 6.6 7.8 6.0 1.1 0.3 3.4 1.9 21.2 Luka Dončić
DEN 269 Nikola Jokić\jokicni01 C 23 80 80 31.3 7.7 15.1 0.511 ... 2.9 8.0 10.8 7.3 1.4 0.7 3.1 2.9 20.1 Nikola Jokić
DET 202 Blake Griffin\griffbl01 PF 29 75 75 35.0 8.3 17.9 0.462 ... 1.3 6.2 7.5 5.4 0.7 0.4 3.4 2.7 24.5 Blake Griffin
GSW 124 Stephen Curry\curryst01 PG 30 69 69 33.8 9.2 19.4 0.472 ... 0.7 4.7 5.3 5.2 1.3 0.4 2.8 2.4 27.3 Stephen Curry
HOU 206 James Harden\hardeja01 PG 29 78 78 36.8 10.8 24.5 0.442 ... 0.8 5.8 6.6 7.5 2.0 0.7 5.0 3.1 36.1 James Harden
IND 389 Victor Oladipo\oladivi01 SG 26 36 36 31.9 6.9 16.3 0.423 ... 0.6 5.0 5.6 5.2 1.7 0.3 2.3 2.0 18.8 Victor Oladipo
LAC 212 Tobias Harris\harrito02 PF 26 55 55 34.6 7.7 15.5 0.496 ... 0.7 7.2 7.9 2.7 0.7 0.4 2.0 2.2 20.9 Tobias Harris
LAL 258 LeBron James\jamesle01 SF 34 55 55 35.2 10.1 19.9 0.510 ... 1.0 7.4 8.5 8.3 1.3 0.6 3.6 1.7 27.4 LeBron James
MEM 112 Mike Conley\conlemi01 PG 31 70 70 33.5 7.0 16.0 0.438 ... 0.6 2.8 3.4 6.4 1.3 0.3 1.9 1.8 21.1 Mike Conley
MIA 420 Josh Richardson\richajo01 SG 25 73 73 34.8 5.8 14.1 0.412 ... 0.7 2.9 3.6 4.1 1.1 0.5 1.5 2.7 16.6 Josh Richardson
MIL 18 Giannis Antetokounmpo\antetgi01 PF 24 72 72 32.8 10.0 17.3 0.578 ... 2.2 10.3 12.5 5.9 1.3 1.5 3.7 3.2 27.7 Giannis Antetokounmpo
MIN 480 Karl-Anthony Towns\townska01 C 23 77 77 33.1 8.8 17.1 0.518 ... 3.4 9.0 12.4 3.4 0.9 1.6 3.1 3.8 24.4 Karl-Anthony Towns
NOP 126 Anthony Davis\davisan02 C 25 56 56 33.0 9.5 18.3 0.517 ... 3.1 8.9 12.0 3.9 1.6 2.4 2.0 2.4 25.9 Anthony Davis
NYK 205 Tim Hardaway\hardati02 SG 26 46 46 32.6 6.1 15.8 0.388 ... 0.6 2.9 3.5 2.7 0.9 0.1 1.8 2.3 19.1 Tim Hardaway
OKC 183 Paul George\georgpa01 SF 28 77 77 36.9 9.2 21.0 0.438 ... 1.4 6.8 8.2 4.1 2.2 0.4 2.7 2.8 28.0 Paul George
ORL 492 Nikola Vučević\vucevni01 C 28 80 80 31.4 8.8 16.9 0.518 ... 2.8 9.2 12.0 3.8 1.0 1.1 2.0 2.0 20.8 Nikola Vučević
PHI 155 Joel Embiid\embiijo01 C 24 64 64 33.7 9.1 18.7 0.484 ... 2.5 11.1 13.6 3.7 0.7 1.9 3.5 3.3 27.5 Joel Embiid
PHO 61 Devin Booker\bookede01 SG 22 64 64 35.0 9.2 19.6 0.467 ... 0.6 3.5 4.1 6.8 0.9 0.2 4.1 3.1 26.6 Devin Booker
POR 305 Damian Lillard\lillada01 PG 28 80 80 35.5 8.5 19.2 0.444 ... 0.9 3.8 4.6 6.9 1.1 0.4 2.7 1.9 25.8 Damian Lillard
SAC 224 Buddy Hield\hieldbu01 SG 26 82 82 31.9 7.6 16.6 0.458 ... 1.3 3.7 5.0 2.5 0.7 0.4 1.8 2.5 20.7 Buddy Hield
SAS 8 LaMarcus Aldridge\aldrila01 C 33 81 81 33.2 8.4 16.3 0.519 ... 3.1 6.1 9.2 2.4 0.5 1.3 1.8 2.2 21.3 LaMarcus Aldridge
TOR 301 Kawhi Leonard\leonaka01 SF 27 60 60 34.0 9.3 18.8 0.496 ... 1.3 6.0 7.3 3.3 1.8 0.4 2.0 1.5 26.6 Kawhi Leonard
TOT 212 Tobias Harris\harrito02 PF 26 82 82 34.7 7.5 15.3 0.487 ... 0.8 7.0 7.9 2.8 0.6 0.5 1.8 2.2 20.0 Tobias Harris
UTA 354 Donovan Mitchell\mitchdo01 SG 22 77 77 33.7 8.6 19.9 0.432 ... 0.8 3.3 4.1 4.2 1.4 0.4 2.8 2.7 23.8 Donovan Mitchell
WAS 40 Bradley Beal\bealbr01 SG 25 82 82 36.9 9.3 19.6 0.475 ... 1.1 3.9 5.0 5.5 1.5 0.7 2.7 2.8 25.6 Bradley Beal

31 rows × 30 columns

In [47]:
df.sort_values('PTS', ascending = False).groupby('Tm').first()
Out[47]:
Rk Player Pos Age G GS MP FG FGA FG% ... ORB DRB TRB AST STL BLK TOV PF PTS Name
Tm
ATL 108 John Collins\collijo01 PF 21 61 59 30.0 7.6 13.6 0.560 ... 3.6 6.2 9.8 2.0 0.4 0.6 2.0 3.3 19.5 John Collins
BOS 249 Kyrie Irving\irvinky01 PG 26 67 67 33.0 9.0 18.5 0.487 ... 1.1 3.9 5.0 6.9 1.5 0.5 2.6 2.5 23.8 Kyrie Irving
BRK 433 D'Angelo Russell\russeda01 PG 22 81 81 30.2 8.1 18.7 0.434 ... 0.7 3.2 3.9 7.0 1.2 0.2 3.1 1.7 21.1 D'Angelo Russell
CHI 294 Zach LaVine\lavinza01 SG 23 63 62 34.5 8.4 18.0 0.467 ... 0.6 4.0 4.7 4.5 1.0 0.4 3.4 2.2 23.7 Zach LaVine
CHO 496 Kemba Walker\walkeke02 PG 28 82 82 34.9 8.9 20.5 0.434 ... 0.6 3.8 4.4 5.9 1.2 0.4 2.6 1.6 25.6 Kemba Walker
CLE 312 Kevin Love\loveke01 PF 30 22 21 27.2 5.0 12.9 0.385 ... 1.5 9.4 10.9 2.2 0.3 0.2 1.9 2.5 17.0 Kevin Love
DAL 142 Luka Dončić\doncilu01 SG 19 72 72 32.2 7.0 16.5 0.427 ... 1.2 6.6 7.8 6.0 1.1 0.3 3.4 1.9 21.2 Luka Dončić
DEN 269 Nikola Jokić\jokicni01 C 23 80 80 31.3 7.7 15.1 0.511 ... 2.9 8.0 10.8 7.3 1.4 0.7 3.1 2.9 20.1 Nikola Jokić
DET 202 Blake Griffin\griffbl01 PF 29 75 75 35.0 8.3 17.9 0.462 ... 1.3 6.2 7.5 5.4 0.7 0.4 3.4 2.7 24.5 Blake Griffin
GSW 124 Stephen Curry\curryst01 PG 30 69 69 33.8 9.2 19.4 0.472 ... 0.7 4.7 5.3 5.2 1.3 0.4 2.8 2.4 27.3 Stephen Curry
HOU 206 James Harden\hardeja01 PG 29 78 78 36.8 10.8 24.5 0.442 ... 0.8 5.8 6.6 7.5 2.0 0.7 5.0 3.1 36.1 James Harden
IND 389 Victor Oladipo\oladivi01 SG 26 36 36 31.9 6.9 16.3 0.423 ... 0.6 5.0 5.6 5.2 1.7 0.3 2.3 2.0 18.8 Victor Oladipo
LAC 212 Tobias Harris\harrito02 PF 26 55 55 34.6 7.7 15.5 0.496 ... 0.7 7.2 7.9 2.7 0.7 0.4 2.0 2.2 20.9 Tobias Harris
LAL 258 LeBron James\jamesle01 SF 34 55 55 35.2 10.1 19.9 0.510 ... 1.0 7.4 8.5 8.3 1.3 0.6 3.6 1.7 27.4 LeBron James
MEM 112 Mike Conley\conlemi01 PG 31 70 70 33.5 7.0 16.0 0.438 ... 0.6 2.8 3.4 6.4 1.3 0.3 1.9 1.8 21.1 Mike Conley
MIA 420 Josh Richardson\richajo01 SG 25 73 73 34.8 5.8 14.1 0.412 ... 0.7 2.9 3.6 4.1 1.1 0.5 1.5 2.7 16.6 Josh Richardson
MIL 18 Giannis Antetokounmpo\antetgi01 PF 24 72 72 32.8 10.0 17.3 0.578 ... 2.2 10.3 12.5 5.9 1.3 1.5 3.7 3.2 27.7 Giannis Antetokounmpo
MIN 480 Karl-Anthony Towns\townska01 C 23 77 77 33.1 8.8 17.1 0.518 ... 3.4 9.0 12.4 3.4 0.9 1.6 3.1 3.8 24.4 Karl-Anthony Towns
NOP 126 Anthony Davis\davisan02 C 25 56 56 33.0 9.5 18.3 0.517 ... 3.1 8.9 12.0 3.9 1.6 2.4 2.0 2.4 25.9 Anthony Davis
NYK 205 Tim Hardaway\hardati02 SG 26 46 46 32.6 6.1 15.8 0.388 ... 0.6 2.9 3.5 2.7 0.9 0.1 1.8 2.3 19.1 Tim Hardaway
OKC 183 Paul George\georgpa01 SF 28 77 77 36.9 9.2 21.0 0.438 ... 1.4 6.8 8.2 4.1 2.2 0.4 2.7 2.8 28.0 Paul George
ORL 492 Nikola Vučević\vucevni01 C 28 80 80 31.4 8.8 16.9 0.518 ... 2.8 9.2 12.0 3.8 1.0 1.1 2.0 2.0 20.8 Nikola Vučević
PHI 155 Joel Embiid\embiijo01 C 24 64 64 33.7 9.1 18.7 0.484 ... 2.5 11.1 13.6 3.7 0.7 1.9 3.5 3.3 27.5 Joel Embiid
PHO 61 Devin Booker\bookede01 SG 22 64 64 35.0 9.2 19.6 0.467 ... 0.6 3.5 4.1 6.8 0.9 0.2 4.1 3.1 26.6 Devin Booker
POR 305 Damian Lillard\lillada01 PG 28 80 80 35.5 8.5 19.2 0.444 ... 0.9 3.8 4.6 6.9 1.1 0.4 2.7 1.9 25.8 Damian Lillard
SAC 224 Buddy Hield\hieldbu01 SG 26 82 82 31.9 7.6 16.6 0.458 ... 1.3 3.7 5.0 2.5 0.7 0.4 1.8 2.5 20.7 Buddy Hield
SAS 8 LaMarcus Aldridge\aldrila01 C 33 81 81 33.2 8.4 16.3 0.519 ... 3.1 6.1 9.2 2.4 0.5 1.3 1.8 2.2 21.3 LaMarcus Aldridge
TOR 301 Kawhi Leonard\leonaka01 SF 27 60 60 34.0 9.3 18.8 0.496 ... 1.3 6.0 7.3 3.3 1.8 0.4 2.0 1.5 26.6 Kawhi Leonard
TOT 212 Tobias Harris\harrito02 PF 26 82 82 34.7 7.5 15.3 0.487 ... 0.8 7.0 7.9 2.8 0.6 0.5 1.8 2.2 20.0 Tobias Harris
UTA 354 Donovan Mitchell\mitchdo01 SG 22 77 77 33.7 8.6 19.9 0.432 ... 0.8 3.3 4.1 4.2 1.4 0.4 2.8 2.7 23.8 Donovan Mitchell
WAS 40 Bradley Beal\bealbr01 SG 25 82 82 36.9 9.3 19.6 0.475 ... 1.1 3.9 5.0 5.5 1.5 0.7 2.7 2.8 25.6 Bradley Beal

31 rows × 30 columns

What does the following line of code do?

In [49]:
df.groupby('Tm').agg(max)
Out[49]:
Rk Player Pos Age G GS MP FG FGA FG% ... ORB DRB TRB AST STL BLK TOV PF PTS Name
Tm
ATL 528 Vince Carter\cartevi01 SG 42 82 81 30.9 7.6 15.5 1.000 ... 3.6 6.2 9.8 8.1 1.3 1.1 3.8 3.3 19.5 Vince Carter
BOS 523 Terry Rozier\roziete01 SG 32 80 79 33.0 9.0 18.5 0.706 ... 1.8 5.2 6.7 6.9 1.8 1.3 2.6 3.0 23.8 Terry Rozier
BRK 511 Treveon Graham\grahatr01 SG 33 81 81 30.2 8.1 18.7 0.667 ... 2.7 6.0 8.6 7.0 1.2 1.5 3.1 2.8 21.1 Treveon Graham
CHI 487 Zach LaVine\lavinza01 SG 30 81 62 34.9 8.4 18.0 0.568 ... 2.1 7.6 9.0 6.0 1.8 1.3 3.4 3.6 23.7 Zach LaVine
CHO 527 Willy Hernangómez\hernawi01 SG 36 82 82 34.9 8.9 20.5 0.571 ... 2.2 4.7 6.8 5.9 1.2 0.8 2.6 3.3 25.6 Willy Hernangómez
CLE 529 Tristan Thompson\thomptr01 SG 37 82 75 32.2 6.5 14.7 0.553 ... 4.0 9.4 10.9 4.2 1.5 0.6 2.3 2.9 17.0 Tristan Thompson
DAL 459 Wesley Matthews\matthwe02 SG 40 81 72 32.3 7.0 16.5 0.644 ... 3.2 10.5 13.7 6.0 1.3 1.1 3.4 2.6 21.2 Wesley Matthews
DEN 524 Will Barton\bartowi01 SG 33 82 80 32.6 7.7 15.6 0.593 ... 2.9 8.0 10.8 7.3 1.4 0.9 3.1 3.1 20.1 Will Barton
DET 474 Zaza Pachulia\pachuza01 SG 37 82 82 35.0 8.3 17.9 0.584 ... 5.4 10.2 15.6 5.4 1.7 1.7 3.4 3.4 24.5 Zaza Pachulia
GSW 476 Stephen Curry\curryst01 SG 35 80 78 34.6 9.2 19.4 0.716 ... 2.4 6.8 8.2 6.9 1.4 1.5 2.9 3.6 27.3 Stephen Curry
HOU 483 Zhou Qi\qizh01 SG 36 82 82 36.8 10.8 24.5 1.000 ... 4.4 8.2 12.7 8.2 2.0 1.5 5.0 3.2 36.1 Zhou Qi
IND 525 Wesley Matthews\matthwe02 SG 32 82 81 31.9 6.9 16.3 0.590 ... 2.5 6.8 9.3 6.0 1.7 2.7 2.3 3.2 18.8 Wesley Matthews
LAC 530 Ángel Delgado\delgaan01 SG 34 82 73 34.6 7.7 15.5 0.615 ... 2.3 7.2 7.9 5.4 1.2 1.3 2.4 3.4 20.9 Ángel Delgado
LAL 530 Tyson Chandler\chandty01 SG 36 82 68 35.2 10.1 19.9 0.667 ... 2.7 7.4 8.5 8.3 1.5 2.0 3.6 2.9 27.4 Tyson Chandler
MEM 528 Yuta Watanabe\watanyu01 SG 34 70 70 33.7 8.0 16.0 0.571 ... 2.7 7.9 10.7 6.4 1.6 1.6 2.7 4.0 21.1 Yuta Watanabe
MIA 520 Yante Maten\matenya01 SG 38 82 73 34.8 5.8 14.1 0.576 ... 3.6 7.8 11.3 4.8 1.1 1.9 2.3 2.7 16.6 Yante Maten
MIL 521 Trevon Duval\duvaltr01 SG 38 81 81 32.8 10.0 17.3 0.667 ... 2.2 10.3 12.5 5.9 1.5 2.2 3.7 3.2 27.7 Trevon Duval
MIN 512 Tyus Jones\jonesty01 SG 33 77 77 36.1 8.8 17.1 0.566 ... 3.4 9.0 12.4 8.2 2.4 1.6 3.1 3.8 24.4 Tyus Jones
NOP 521 Wesley Johnson\johnswe01 SG 32 73 67 35.9 9.5 18.3 0.620 ... 3.1 8.9 12.0 7.7 1.6 2.4 3.1 3.4 25.9 Wesley Johnson
NYK 491 Wesley Matthews\matthwe02 SG 33 75 57 32.6 6.1 15.8 0.694 ... 3.9 8.0 11.4 5.4 1.3 2.4 2.6 3.3 19.1 Wesley Matthews
OKC 506 Álex Abrines\abrinal01 SG 34 80 80 36.9 9.2 21.0 0.595 ... 4.9 9.6 11.1 10.7 2.2 1.3 4.5 3.4 28.0 Álex Abrines
ORL 492 Wesley Iwundu\iwundwe01 SG 31 81 81 33.8 8.8 16.9 0.625 ... 2.8 9.2 12.0 5.3 1.0 1.4 2.1 2.8 20.8 Wesley Iwundu
PHI 457 Zhaire Smith\smithzh01 SG 34 79 79 35.0 9.1 18.7 0.652 ... 2.5 11.1 13.6 7.7 1.8 1.9 3.5 3.5 27.5 Zhaire Smith
PHO 502 Tyson Chandler\chandty01 SG 38 82 70 35.0 9.2 19.6 0.667 ... 3.1 7.1 10.3 6.8 1.6 1.1 4.1 3.1 26.6 Tyson Chandler
POR 484 Zach Collins\colliza01 SG 30 81 81 35.5 8.5 19.2 0.684 ... 3.7 7.0 10.4 6.9 1.1 1.4 2.7 3.5 25.8 Zach Collins
SAC 518 Yogi Ferrell\ferreyo01 SG 32 82 82 33.9 7.6 16.6 0.556 ... 2.6 6.1 8.4 7.3 1.6 1.0 2.8 2.8 20.7 Yogi Ferrell
SAS 507 Rudy Gay\gayru01 SG 38 82 81 34.9 8.4 17.1 0.645 ... 3.1 6.1 9.2 6.2 1.1 1.3 2.6 2.3 21.3 Rudy Gay
TOR 522 Serge Ibaka\ibakase01 SG 34 80 80 34.0 9.3 18.8 0.575 ... 2.1 6.0 8.1 8.7 1.8 1.4 2.8 3.0 26.6 Serge Ibaka
TOT 530 Wilson Chandler\chandwi01 SG-SF 38 82 82 34.7 7.5 15.3 0.641 ... 3.8 9.8 13.1 4.8 2.1 1.3 2.9 3.6 20.0 Wilson Chandler
UTA 486 Tyler Cavanaugh\cavanty01 SG 37 82 82 33.7 8.6 19.9 0.694 ... 3.8 9.0 12.9 6.1 1.4 2.3 2.8 2.9 23.8 Tyler Cavanaugh
WAS 508 Wesley Johnson\johnswe01 SG 33 82 82 36.9 9.3 19.6 1.000 ... 2.7 6.6 9.2 8.7 1.6 0.9 3.8 3.8 25.6 Wesley Johnson

31 rows × 30 columns

What does the following line of code do?

In [50]:
df.groupby('Tm').sort_values('PTS', ascending = False).agg(lambda x: x.iloc[0])
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-50-af8dead85976> in <module>
----> 1 df.groupby('Tm').sort_values('PTS', ascending = False).agg(lambda x: x.iloc[0])

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __getattr__(self, attr)
    561             return self[attr]
    562         if hasattr(self.obj, attr):
--> 563             return self._make_wrapper(attr)
    564 
    565         raise AttributeError(

/opt/anaconda3/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in _make_wrapper(self, name)
    602                 "using the 'apply' method".format(kind, name, type(self).__name__)
    603             )
--> 604             raise AttributeError(msg)
    605 
    606         self._set_group_selection()

AttributeError: Cannot access callable attribute 'sort_values' of 'DataFrameGroupBy' objects, try using the 'apply' method

Suppose we want to determine the names of all teams whose highest scorer averaged less than 25 PTS last season.

In [56]:
df.groupby('Tm').filter(lambda x: x['PTS'].max() < 25)['Tm'].unique()
Out[56]:
array(['ATL', 'MIA', 'CLE', 'DEN', 'SAS', 'CHI', 'UTA', 'BRK', 'NYK',
       'MEM', 'TOT', 'IND', 'DAL', 'ORL', 'SAC', 'MIN', 'BOS', 'LAC',
       'DET'], dtype=object)

Misconception 5: Grouping on multiple columns

What does the following line of code do?

In [70]:
df.groupby(['Tm', 'Pos']).mean()['PTS']
Out[70]:
Tm   Pos
ATL  C       5.550000
     PF      9.475000
     PG     11.000000
     SF      6.900000
     SG      6.500000
              ...    
WAS  C       7.775000
     PF      9.866667
     PG      9.700000
     SF      8.983333
     SG      8.040000
Name: PTS, Length: 161, dtype: float64
In [58]:
df.groupby(['Tm', 'Pos']).mean()['PTS']
Out[58]:
Tm   Pos
ATL  C       5.550000
     PF      9.475000
     PG     11.000000
     SF      6.900000
     SG      6.500000
              ...    
WAS  C       7.775000
     PF      9.866667
     PG      9.700000
     SF      8.983333
     SG      8.040000
Name: PTS, Length: 161, dtype: float64

What does the following line of code do?

In [59]:
df.groupby(['Tm', 'Pos']).mean()['PTS']['ATL', 'SF']
Out[59]:
6.8999999999999995
In [72]:
df.groupby(['Tm', 'Pos']).mean()['PTS'].loc[('ATL', 'SF')]
Out[72]:
6.8999999999999995

What does the following line of code do?

In [73]:
df.groupby(['Tm', 'Pos']).mean()['PTS'][:, 'SF']
Out[73]:
Tm
ATL     6.900000
BOS    15.700000
BRK     8.725000
CHI    12.375000
CHO     8.400000
CLE     4.800000
DAL     9.600000
DEN     5.700000
DET     4.566667
GSW    12.133333
HOU     5.700000
IND    12.650000
LAC    12.050000
LAL    13.375000
MEM     6.462500
MIA     9.800000
MIL     9.733333
MIN     8.800000
NOP     5.520000
NYK     8.800000
OKC     8.425000
ORL     5.000000
PHI     8.283333
PHO    10.620000
POR     7.650000
SAC     6.480000
SAS     1.800000
TOR    10.650000
TOT     7.206667
UTA     6.966667
WAS     8.983333
Name: PTS, dtype: float64

Misconception 6: Merging

In [75]:
college = pd.read_csv('college.csv')
In [76]:
college
Out[76]:
Unnamed: 0 active_from active_to birth_date college height name position url weight ... NCAA__3ptpg NCAA_efgpct NCAA_fgapg NCAA_fgpct NCAA_fgpg NCAA_ft NCAA_ftapg NCAA_ftpg NCAA_games NCAA_ppg
0 0 1991 1995 June 24, 1968 Duke University 6-10 Alaa Abdelnaby F-C /players/a/abdelal01.html 240.0 ... 0.0 NaN 5.6 0.599 3.3 0.728 2.5 1.8 134.0 8.5
1 1 1969 1978 April 7, 1946 Iowa State University 6-9 Zaid Abdul-Aziz C-F /players/a/abdulza01.html 235.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2 2 1970 1989 April 16, 1947 University of California, Los Angeles 7-2 Kareem Abdul-Jabbar C /players/a/abdulka01.html 225.0 ... NaN NaN 16.8 0.639 10.7 0.628 7.9 5.0 88.0 26.4
3 3 1991 2001 March 9, 1969 Louisiana State University 6-1 Mahmoud Abdul-Rauf G /players/a/abdulma02.html 162.0 ... 2.7 NaN 21.9 0.474 10.4 0.863 6.4 5.5 64.0 29.0
4 4 1998 2003 November 3, 1974 University of Michigan, San Jose State University 6-6 Tariq Abdul-Wahad F /players/a/abdulta01.html 223.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
4571 4571 2018 2018 January 4, 1997 NaN 6-11 Ante Zizic F-C /players/z/zizican01.html 250.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4572 4572 1983 1983 December 20, 1953 Kent State University 7-1 Jim Zoet C /players/z/zoetji01.html 240.0 ... NaN NaN 2.9 0.476 1.4 0.429 1.0 0.4 63.0 3.2
4573 4573 1971 1971 June 7, 1948 Duquesne University 6-1 Bill Zopf G /players/z/zopfbi01.html 170.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4574 4574 2017 2018 March 18, 1997 NaN 7-1 Ivica Zubac C /players/z/zubaciv01.html 265.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
4575 4575 1949 1949 December 19, 1919 George Washington University 6-3 Matt Zunic G-F /players/z/zunicma01.html 195.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN

4576 rows × 34 columns

How can we merge df with this new college dataframe?

In [77]:
df.merge(college, left_on = 'Name', right_on = 'name')[['Name', 'college', 'PTS', 'NCAA_ppg']]
Out[77]:
Name college PTS NCAA_ppg
0 Quincy Acy Baylor University 1.7 9.8
1 Steven Adams University of Pittsburgh 13.9 7.2
2 Bam Adebayo University of Kentucky 8.9 13.0
3 LaMarcus Aldridge University of Texas at Austin 21.3 13.5
4 Jarrett Allen University of Texas at Austin 10.9 13.4
... ... ... ... ...
552 Tyler Zeller University of North Carolina 0.0 12.8
553 Tyler Zeller University of North Carolina 11.5 12.8
554 Ivica Zubac NaN 8.9 NaN
555 Ivica Zubac NaN 8.5 NaN
556 Ivica Zubac NaN 9.4 NaN

557 rows × 4 columns

In [78]:
len(df), len(college)
Out[78]:
(708, 4576)
In [79]:
df.merge(college, left_on = 'Name', right_on = 'name', how = 'left')[['Name', 'college', 'PTS', 'NCAA_ppg']]
Out[79]:
Name college PTS NCAA_ppg
0 Álex Abrines NaN 5.3 NaN
1 Quincy Acy Baylor University 1.7 9.8
2 Jaylen Adams NaN 3.2 NaN
3 Steven Adams University of Pittsburgh 13.9 7.2
4 Bam Adebayo University of Kentucky 8.9 13.0
... ... ... ... ...
709 Tyler Zeller University of North Carolina 11.5 12.8
710 Ante Žižić NaN 7.8 NaN
711 Ivica Zubac NaN 8.9 NaN
712 Ivica Zubac NaN 8.5 NaN
713 Ivica Zubac NaN 9.4 NaN

714 rows × 4 columns

In [80]:
college['name'].value_counts()
Out[80]:
Charles Jones             3
Charles Smith             3
George Johnson            3
Bob Duffy                 2
Glenn Robinson            2
                         ..
Cleveland Buckner         1
Dave Henderson            1
Alton Ford                1
Micheal Ray Richardson    1
Alex Abrines              1
Name: name, Length: 4526, dtype: int64
In [82]:
college[college['name'] == 'Charles Jones']
Out[82]:
Unnamed: 0 active_from active_to birth_date college height name position url weight ... NCAA__3ptpg NCAA_efgpct NCAA_fgapg NCAA_fgpct NCAA_fgpg NCAA_ft NCAA_ftapg NCAA_ftpg NCAA_games NCAA_ppg
2075 2075 1984 1998 April 3, 1957 Albany State University 6-9 Charles Jones F-C /players/j/jonesch01.html 215.0 ... NaN NaN NaN NaN NaN NaN NaN NaN NaN NaN
2076 2076 1985 1989 January 12, 1962 University of Louisville 6-8 Charles Jones F /players/j/jonesch02.html 215.0 ... NaN NaN 5.7 0.558 3.2 0.604 4.0 2.4 129.0 8.7
2077 2077 1999 2000 July 17, 1975 Rutgers University, Long Island University 6-3 Charles Jones G /players/j/jonesch03.html 180.0 ... 3.0 NaN 19.7 0.429 8.4 0.640 4.0 2.5 108.0 22.5

3 rows × 34 columns