Defining NBA players by role with k-means clustering

With the rise of small ball and 3-point shooting, the old power forward position has been phased out of the NBA. Strong or athletic power forwards became small ball centers, while shooting power forwards became more and more like small forwards. Following this revolution, Brad Stevens claims that the NBA no longer has five positions – it has three (ball handlers, wings, and bigs).

Even within a position, players serve significantly different roles. While Clint Capela will spend almost all of his time on offense setting screens then rolling to the rim, Brook Lopez will spend most of it on the perimeter. Given this difference and the general direction of the NBA, it’s not informative anymore to just classify a player by the typical five positions.

By clustering, or grouping players by their stats, we can determine the different roles of an NBA player.

What is k-means clustering?

k-means clustering is a method of splitting a data set into a chosen number of groups (k) so that the points in each group have similar values. Though the concept seems mathematically daunting, it is not hard to visualize. As an example of k-means clustering, let’s look at the top five players in steals and blocks.

If we take the top five players in steals per game and the top five players in blocks per game, and compare their steals and blocks, we’ll see a very pronounced difference.

The large space between the groups makes it clear to us that the bottom right corner shows the steals leaders, while the top left shows the blocks leaders. So, just by looking at the graph, we can separate the data into two groups like this.

k-means clustering just makes this separation mathematical. First, the algorithm creates k random “cluster centroids,” where k is the number of clusters we want to create. It then assigns each data point to its closest centroid and moves each centroid to the average coordinates of the points assigned to it. It repeats these two steps until the centroids settle at the “best” position, where reassigning the points no longer moves them. Note that in this example, the clusters will end up being equally sized. However, they don’t always have to have the same number of data points.
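The assign-then-average loop can be sketched in a few lines of NumPy. This is only a minimal illustration, not the code behind the graphs in this post: the `kmeans` function, its parameters, and the steals/blocks numbers are all made up for the example.

```python
import numpy as np

def kmeans(points, k, n_iter=100, seed=0):
    """Plain k-means on an (n_samples, n_features) array."""
    rng = np.random.default_rng(seed)
    # Use k distinct data points, chosen at random, as the starting centroids.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Update step: move each centroid to the mean of its assigned points
        # (an empty cluster keeps its previous centroid).
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # centroids stopped moving, so the clusters are stable
        centroids = new_centroids
    return labels, centroids

# Made-up (steals, blocks) per-game values, not real player data.
steals_blocks = np.array([
    [2.2, 0.4], [2.0, 0.7], [1.9, 0.5], [2.0, 0.3], [1.9, 0.6],  # steals leaders
    [0.6, 2.7], [0.8, 2.4], [0.8, 2.3], [0.6, 2.2], [0.6, 2.0],  # blocks leaders
])
labels, centroids = kmeans(steals_blocks, k=2)
print(labels)
print(centroids)
```

Because the two groups sit far apart, the first five points should share one label and the last five the other, whichever points happen to be picked as starting centroids.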

Let’s see what happens when we create two clusters for the steals and blocks leaders.

The algorithm successfully grouped together the steals leaders and the blocks leaders. Each point was assigned to the cluster based on its distance from the cluster centroid (or center), represented by the large grey dot.

We can cluster with more than just two predictors. If we add points, rebounds, and assists as features to our clustering algorithm of the steals and blocks leaders, it separates Paul George, James Harden, and Russell Westbrook (one cluster) from Chris Paul, Jimmy Butler, Myles Turner, Mitchell Robinson, Rudy Gobert, Brook Lopez, and JaVale McGee (another cluster). So, when we added these other factors, it separated the true superstars from the rest.

We can also create more than two clusters. If we look at the top 50 players in PPG, we can clearly distinguish some roles with a few clusters. For example, let’s look at PPG and usage (USG%).

All the secondary scorers like Klay Thompson, CJ McCollum, and Tobias Harris were placed in a cluster at the bottom left, separate from the primary scorers. James Harden’s stats were so extraordinary that he earned his own cluster: even when the top 50 players in PPG are split into just three groups, his numbers are unique enough to stand alone.

Now, let’s create four clusters for assists and usage.

The four clusters separated the data into scorers, passers, stars, and James Harden. The scorers group contains shooters like Thompson, McCollum, and Buddy Hield who are often not going to handle the ball. A mix of point guards like Trae Young, De’Aaron Fox, and Mike Conley make up the passers group along with a few big men like Nikola Jokic, Nikola Vucevic, and Julius Randle. The stars group combines team-leading players like D’Angelo Russell and Lou Williams with the true superstars. Yet again, Harden earns his own group (hard to see in this graph, but it’s in the top right, behind where it says “Cluster 1” in the legend).


Now that we’ve looked at a few examples of k-means clustering, we can dive into the analysis. First, I restricted the data set to players who played at least 1,000 minutes and 50 games in the regular season. For these players, I recorded all of Basketball Reference’s stats (not all of them were used in the analysis – more on this a bit later). These included:

Shooting (raw) | Shooting (percentages) | Rebounding | Passing | Defense | Advanced efficiency stats | Other

* POS = position as a number; 1 = guard, 2 = wing, 3 = big

Dimensionality reduction

The above table has 44 different stats, many of which are related. For example, each counting stat except for points also has a rate stat (e.g. there’s both STL and STL%). This creates a problem, as players with more steals will usually also have a higher steal percentage. Therefore, we don’t gain much information, if any, by having both steals and steal percentage as parameters in the clusterer.

To solve this dimensionality issue, we’ll use something called principal component analysis (PCA). PCA essentially applies a mathematical transformation that turns all our features into a new set of “principal components.” These components have no actual meaning (i.e. they’re just numbers, not data like steals or steal percentage). With this transformation, we can work with fewer variables, as a handful of principal components capture most of the differences, or variance, in the data set.

Now that we’ve established how PCA works, we need to decide how many principal components we want. To measure the “effectiveness” of each number of principal components, we’ll look at the explained variance ratio, meaning “how much of the variance in our initial data set do these n principal components capture.” The graph below shows the explained variance ratio for each n_components between two and ten.

As the number of components increases, the difference in explained variance between the nth component and the (n-1)th component decreases. From what I’ve seen, there’s no commonly accepted way to determine how many components to use. Intuitively, I think looking at the derivative makes sense.

The derivative, which in this case represents the change in explained variance, can help us determine the optimal number of components by looking at where there are the sharpest changes in the derivative. This seems to occur at n_components = 6.

Visually, this means that after n_components = 6, the explained variance ratio increases at a very slow, decreasing rate. Before n_components = 6, the explained variance ratio is still increasing significantly. Therefore, I chose to use 6 components in the PCA. These 6 components explain 85.85% of the data set’s variance.
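Here is a rough sketch of how the explained variance ratio and its change from one component to the next could be computed. The post doesn’t say which tools were used, so scikit-learn, the standardization step, and the randomly generated `stats` array are all assumptions made for illustration.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
# Hypothetical stand-in for the player table: 200 "players" and 12 stats built
# from 4 hidden factors plus noise, so many of the columns are correlated.
hidden = rng.normal(size=(200, 4))
stats = hidden @ rng.normal(size=(4, 12)) + 0.3 * rng.normal(size=(200, 12))

# Assumption: put every stat on the same scale before running PCA.
scaled = StandardScaler().fit_transform(stats)

pca = PCA(n_components=10).fit(scaled)
# cumulative[n - 1] is the share of variance captured by the first n components.
cumulative = np.cumsum(pca.explained_variance_ratio_)
# gains[n - 2] is the extra variance gained by going from n - 1 to n components.
gains = np.diff(cumulative)

for n in range(2, 11):
    print(f"n_components={n}: {cumulative[n - 1]:.2%} explained, "
          f"+{gains[n - 2]:.2%} over n_components={n - 1}")
```

The `gains` values play the role of the “derivative” here: the point where they drop off sharply and then flatten out is the kind of spot the post picks as the number of components.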

Determining n_clusters

Along with determining the number of principal components, it’s important to find the “correct” number of clusters for our k-means clusterer to create. To find the right number of clusters, we’ll use silhouette scores. 

The silhouette score measures how well a data point fits its own cluster compared to other clusters by comparing the point’s average distance to the other points in its cluster with its average distance to the points in the nearest other cluster. If a point is “perfectly” matched to its cluster, it will have a silhouette score of 1, while the worst possible score is -1. The graph below shows the silhouette score for each n_clusters value no greater than 20.
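As a sketch of how such a sweep could be run, the snippet below fits k-means for several values of n_clusters and records the average silhouette score of each. It assumes scikit-learn, and the `components` array is randomly generated stand-in data, not the principal components from this analysis.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
# Hypothetical stand-in for 6 principal components: three loose blobs of 60 points.
centers = np.array([[0.0] * 6, [4.0] * 6, [-4.0] * 6])
components = np.vstack([c + rng.normal(size=(60, 6)) for c in centers])

scores = {}
for n_clusters in range(2, 11):
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=0)
    labels = model.fit_predict(components)
    # silhouette_score returns the mean silhouette over all points.
    scores[n_clusters] = silhouette_score(components, labels)

for n_clusters, score in scores.items():
    print(f"n_clusters={n_clusters}: silhouette={score:.3f}")
```

On real player data the scores would be much lower and closer together than on these artificial blobs, which is why the post goes on to compare improvements rather than raw scores.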

Though the silhouette score generally decreases as n_clusters increases, this doesn’t mean we’re going to use n_clusters = 2. With n_clusters = 2, the data will be separated in a way that gives us no information about player roles. Furthermore, the difference between n_clusters = 2 and n_clusters = 3 is negative, meaning that the clusters get worse if we use 3 clusters instead of 2.

To pick a number of clusters, we’ll look at this difference. However, we won’t look at the difference through subtraction; we’ll look at it in a standardized fashion. The percent improvement in silhouette score for n_clusters = x over n_clusters = x – 1 will equal 1 – (1 – silhouette score at x) / (1 – silhouette score at x – 1). This standardizes the percent improvement relative to the best possible silhouette score of 1. For example, going from a score of 0.20 to a score of 0.22 gives 1 – 0.78 / 0.80 = 2.5%. The graph below shows this improvement for each n_clusters value.
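The improvement formula is simple enough to write as a one-line function. The function name and the silhouette scores below are made up to show the calculation; they are not the values from this analysis.

```python
def silhouette_improvement(score_x, score_prev):
    """Share of the remaining gap to a perfect score of 1 that was closed."""
    return 1 - (1 - score_x) / (1 - score_prev)

# Made-up silhouette scores keyed by n_clusters.
scores = {9: 0.150, 10: 0.160, 11: 0.155, 12: 0.170}

# Improvement of each n_clusters value over the one before it.
improvements = {
    n: silhouette_improvement(scores[n], scores[n - 1])
    for n in scores
    if n - 1 in scores
}

for n, value in improvements.items():
    print(f"n_clusters={n}: {value:+.2%}")
```

A positive value means the score moved closer to 1 than it was at the previous n_clusters, and a negative value means it moved further away.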

We want to pick an n_clusters value where there’s a positive improvement in silhouette score. This leaves us with possible values of n_clusters = 10, 12, 13, 17, and 19 (n_clusters = 3, 7, and 8 also have positive improvements, but wouldn’t give us much information). Out of these values, n_clusters = 10, 12, and 13 had the best silhouette scores. Because n_clusters = 12 had the biggest improvement, we’ll use that.


Now that we’ve gone over k-means clustering, principal component analysis, and silhouette scores, we can finally group together NBA players. As we can see from the relatively low silhouette score, the clusters will not be extremely well defined. This combined with the fact that many players fit multiple archetypes makes the analysis imperfect. However, the trends in the clusters are still noticeable and interesting.

By looking at the types of players in each cluster, we can define the cluster as a “role,” such as a 3&D player. We’ll look at some of the players in the cluster who embody this role, and then examine the average stats of the cluster.

Note that the average stats of the cluster are not the cluster centers because we’re giving the clustering algorithm the principal components and not player stats. So, the average stats for each cluster are just the averages of each stat for all players in the given cluster.
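To make the distinction concrete, here is a small sketch of clustering on principal components and then averaging the original stats per cluster. It assumes pandas and scikit-learn, a standardize-then-PCA-then-k-means pipeline, and a made-up `players` table with smaller settings than the 6 components and 12 clusters used in this post.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
stat_names = ["PTS", "REB", "AST", "STL", "BLK"]
# Hypothetical per-game stats for 120 made-up players.
players = pd.DataFrame(
    rng.uniform(0, 1, size=(120, 5)) * [30, 13, 8, 2, 2.5],
    columns=stat_names,
)

# The clusterer only ever sees the principal components, not the raw stats.
scaled = StandardScaler().fit_transform(players[stat_names])
components = PCA(n_components=3).fit_transform(scaled)
players["cluster"] = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(components)

# Average the original stats over the players assigned to each cluster.
cluster_averages = players.groupby("cluster")[stat_names].mean()
print(cluster_averages.round(1))
```

The k-means centroids live in principal-component space, so `cluster_averages` is the readable summary: one row of ordinary per-game numbers for each cluster.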

Several players fit multiple of these roles, as evidenced by the fact the silhouette score is not very high. The table below shows the cluster, description, example players, and average stats.

| Role | Example players | PTS/REB/AST | STL/BLK | FG%/3P%/FT%/USG% |
| --- | --- | --- | --- | --- |
| 3&D forward | P.J. Tucker, OG Anunoby | 7.5/4.4/1.3 | 0.7/0.5 | 46%/33%/74%/15% |
| 3&D guard | Danny Green, Wesley Matthews | 8.9/2.4/1.6 | 0.6/0.2 | 44%/39%/83%/16% |
| ??? | Collin Sexton, Josh Jackson | 9.2/3.1/2.5 | 0.8/0.3 | 42%/32%/75%/19% |
| Do it all big | Al Horford, Paul Millsap | 12.9/6.9/2.2 | 1/1.2 | 51%/37%/74%/19% |
| Floor general | Ricky Rubio, Kyle Lowry | 11.2/4.2/5.3 | 1.3/0.4 | 44%/35%/78%/18% |
| Inside big | Julius Randle, LaMarcus Aldridge | 17.3/9.3/2.4 | 0.7/0.9 | 56%/25%/75%/24% |
| Perimeter scorer | Tobias Harris, Jayson Tatum | 14.7/5.9/2.1 | 0.8/0.5 | 46%/35%/77%/22% |
| Rim runner | Clint Capela, Jarrett Allen | 9.6/7.5/1.3 | 0.7/1.2 | 61%/9%/65%/17% |
| Shooter | Klay Thompson, Buddy Hield | 16.7/3.7/3.2 | 0.8/0.3 | 45%/38%/85%/23% |
| Star ball handler | James Harden, Damian Lillard | 27.3/6.6/5.9 | 1.4/0.5 | 47%/38%/84%/31% |
| Star big | Giannis Antetokounmpo, Joel Embiid | 21.9/12.2/5 | 1.3/1.5 | 53%/24%/73%/27% |
| Team-leading guard | Trae Young, Jrue Holiday | 21.9/4.6/6.1 | 1.3/0.4 | 45%/34%/81%/29% |

The “???” cluster consists mostly of young, inefficient players on bad teams and role players on bad teams. This isn’t necessarily a “role,” but all these players weren’t good enough to make it into the other roles for one reason or another.

The table below includes the role and stats of every player who met the qualifications (min. 50 games and 1,000 minutes).

| Player | Cluster | Role | PTS/REB/AST/STL/BLK/FG%/3P%/FT%/USG% |
| --- | --- | --- | --- |
| James Harden | 7 | Star ball handler | 36.1/6.6/7.5/2/0.7/0.442/0.368/0.879/40.5 |
| Paul George | 7 | Star ball handler | 288. |
| Giannis Antetokounmpo | 5 | Star big | 27.712. |
| Joel Embiid | 5 | Star big | 27.513. |
| LeBron James | 7 | Star ball handler | 27. |
| Stephen Curry | 7 | Star ball handler | 27. |
| Devin Booker | 8 | Team-leading guard | 26. |
| Kawhi Leonard | 7 | Star ball handler | 26. |
| Kevin Durant | 7 | Star ball handler | 266. |
| Anthony Davis | 5 | Star big | 25.9123. |
| Damian Lillard | 7 | Star ball handler | 25. |
| Bradley Beal | 8 | Team-leading guard | 25.655. |
| Kemba Walker | 8 | Team-leading guard | 25. |
| Blake Griffin | 7 | Star ball handler | 24. |
| Karl-Anthony Towns | 5 | Star big | 24.412. |
| Kyrie Irving | 7 | Star ball handler | 23.856. |
| Donovan Mitchell | 8 | Team-leading guard | 23. |
| Zach LaVine | 8 | Team-leading guard | 23.7/4.7/4.5/1/0.4/0.467/0.374/0.832/30.5 |
| Russell Westbrook | 5 | Star big | 22.911. |
| Klay Thompson | 6 | Shooter | 21. |
| Julius Randle | 11 | Inside big | 21. |
| LaMarcus Aldridge | 11 | Inside big | 21. |
| DeMar DeRozan | 8 | Team-leading guard | 21. |
| Luka Doncic | 8 | Team-leading guard | 21.2/7.8/6/1.1/0.3/0.427/0.327/0.713/30.5 |
| Jrue Holiday | 8 | Team-leading guard | 21. |
| Mike Conley | 8 | Team-leading guard | 21. |
| D'Angelo Russell | 8 | Team-leading guard | 21.1/3.9/7/1.2/0.2/0.434/0.369/0.78/31.9 |
| CJ McCollum | 6 | Shooter | 21/4/3/0.8/0.4/0.459/0.375/0.828/25.5 |
| Nikola Vucevic | 5 | Star big | 20.8/12/3.8/1/1.1/0.518/0.364/0.789/28 |
| Buddy Hield | 6 | Shooter | 20.752. |
| Nikola Jokic | 5 | Star big | 20. |
| Tobias Harris | 2 | Perimeter scorer | 207. |
| Lou Williams | 8 | Team-leading guard | 2035. |
| Danilo Gallinari | 6 | Shooter | 19. |
| John Collins | 11 | Inside big | 19.5/9.8/2/0.4/0.6/0.56/0.348/0.763/23.7 |
| Trae Young | 8 | Team-leading guard | 19. |
| Jimmy Butler | 8 | Team-leading guard | 18.7/5.3/4/1.9/0.6/0.462/0.347/0.855/22.3 |
| Kyle Kuzma | 2 | Perimeter scorer | 18. |
| Lauri Markkanen | 2 | Perimeter scorer | 18.791. |
| Brandon Ingram | 2 | Perimeter scorer | 18. |
| Khris Middleton | 6 | Shooter | 18.3/6/4.3/1/0.1/0.441/0.378/0.837/25.1 |
| Jamal Murray | 6 | Shooter | 18. |
| Tim Hardaway | 6 | Shooter | 18. |
| J.J. Redick | 6 | Shooter | 18. |
| Andrew Wiggins | 2 | Perimeter scorer | 18.1/4.8/2.5/1/0.7/0.412/0.339/0.699/24.4 |
| Bojan Bogdanovic | 6 | Shooter | 18/4.1/2/0.9/0/0.497/0.425/0.807/22.4 |
| Derrick Rose | 6 | Shooter | 182. |
| Andre Drummond | 5 | Star big | 17.315. |
| De'Aaron Fox | 8 | Team-leading guard | 17. |
| Pascal Siakam | 9 | Do it all big | 16. |
| Ben Simmons | 5 | Star big | 16. |
| Jordan Clarkson | 6 | Shooter | 16. |
| Spencer Dinwiddie | 6 | Shooter | 16. |
| Collin Sexton | 4 | ??? | 16.7/2.9/3/0.5/0.1/0.43/0.402/0.839/25.2 |
| Clint Capela | 3 | Rim runner | 16.612. |
| Montrezl Harrell | 11 | Inside big | 16.6/6.5/2/0.9/1.3/0.615/0.176/0.643/23.5 |
| Josh Richardson | 6 | Shooter | 16. |
| Harrison Barnes | 6 | Shooter | 16. |
| Deandre Ayton | 11 | Inside big | 16.310. |
| Eric Gordon | 6 | Shooter | 16. |
| Aaron Gordon | 2 | Perimeter scorer | 167. |
| Eric Bledsoe | 10 | Floor general | 15. |
| Rudy Gobert | 5 | Star big | 15.9/12.9/2/0.8/2.3/0.669/0/0.636/17.8 |
| Jayson Tatum | 2 | Perimeter scorer | 15.762. |
| Malcolm Brogdon | 6 | Shooter | 15. |
| Jusuf Nurkic | 11 | Inside big | 15.610. |
| Chris Paul | 10 | Floor general | 15. |
| Dennis Schroder | 4 | ??? | |
| Reggie Jackson | 6 | Shooter | 15. |
| Jeremy Lamb | 6 | Shooter | 15. |
| Kelly Oubre | 2 | Perimeter scorer | 15. |
| Evan Fournier | 6 | Shooter | 15. |
| Terrence Ross | 6 | Shooter | 15. |
| Serge Ibaka | 9 | Do it all big | 158. |
| Dwyane Wade | 2 | Perimeter scorer | 1544. |
| Marvin Bagley | 2 | Perimeter scorer | 14.9/7.6/1/0.5/1/0.504/0.313/0.691/24.2 |
| Emmanuel Mudiay | 4 | ??? | |
| Jabari Parker | 2 | Perimeter scorer | 14. |
| Kyle Lowry | 10 | Floor general | 14. |
| Bobby Portis | 2 | Perimeter scorer | 14. |
| Bogdan Bogdanovic | 6 | Shooter | 14.1/3.5/3.8/1/0.2/0.418/0.36/0.827/22.3 |
| Domantas Sabonis | 11 | Inside big | 14. |
| Steven Adams | 3 | Rim runner | 13. |
| Marcus Morris | 2 | Perimeter scorer | 13. |
| Otto Porter | 2 | Perimeter scorer | 13. |
| Jaren Jackson | 9 | Do it all big | 13. |
| Rudy Gay | 2 | Perimeter scorer | 13. |
| Joe Harris | 6 | Shooter | 13. |
| Enes Kanter | 11 | Inside big | 13. |
| Marc Gasol | 9 | Do it all big | 13. |
| Jerami Grant | 9 | Do it all big | 13. |
| Al Horford | 9 | Do it all big | 13. |
| Dennis Smith | 4 | ??? | |
| Taurean Waller-Prince | 1 | 3&D guard | 13. |
| Myles Turner | 9 | Do it all big | 13. |
| Jaylen Brown | 2 | Perimeter scorer | 134. |
| Cedi Osman | 4 | ??? | |
| Gary Harris | 4 | ??? | |
| Kevin Knox | 4 | ??? | |
| Ricky Rubio | 10 | Floor general | 12. |
| Paul Millsap | 9 | Do it all big | 12. |
| Justise Winslow | 10 | Floor general | 12. |
| Thaddeus Young | 9 | Do it all big | 12. |
| Trevor Ariza | 4 | ??? | |
| Brook Lopez | 9 | Do it all big | 12. |
| Jeff Green | 2 | Perimeter scorer | 12.341. |
| Hassan Whiteside | 3 | Rim runner | 12.311. |
| Wesley Matthews | 1 | 3&D guard | 12. |
| Joe Ingles | 10 | Floor general | 12. |
| JaVale McGee | 3 | Rim runner | 12/7.5/0.7/0.6/2/0.624/0.083/0.632/20.2 |
| Willie Cauley-Stein | 9 | Do it all big | 11. |
| Jae Crowder | 2 | Perimeter scorer | 11. |
| E'Twaun Moore | 1 | 3&D guard | 11. |
| Derrick Favors | 3 | Rim runner | 11. |
| Bryn Forbes | 1 | 3&D guard | 11. |
| D.J. Augustin | 6 | Shooter | 11. |
| Kent Bazemore | 4 | ??? | |
| Gordon Hayward | 2 | Perimeter scorer | 11. |
| Josh Jackson | 4 | ??? | |
| Kentavious Caldwell-Pope | 1 | 3&D guard | 11. |
| Malik Beasley | 1 | 3&D guard | 11. |
| Reggie Bullock | 1 | 3&D guard | 11.3/2.7/2/0.6/0.2/0.412/0.377/0.859/15.8 |
| Darren Collison | 10 | Floor general | 11. |
| Rodney Hood | 1 | 3&D guard | 11. |
| DeMarre Carroll | 2 | Perimeter scorer | 11. |
| Alex Len | 2 | Perimeter scorer | 11. |
| DeAndre Jordan | 3 | Rim runner | 1113. |
| Fred VanVleet | 6 | Shooter | 112. |
| Jarrett Allen | 3 | Rim runner | 10. |
| Trey Burke | 4 | ??? | |
| Tyler Johnson | 4 | ??? | 10.932. |
| Allonzo Trier | 4 | ??? | |
| Dewayne Dedmon | 9 | Do it all big | 10. |
| Taj Gibson | 9 | Do it all big | 10. |
| Shai Gilgeous-Alexander | 10 | Floor general | 10. |
| Damyean Dotson | 4 | ??? | |
| Dwight Powell | 3 | Rim runner | 10. |
| Dario Saric | 2 | Perimeter scorer | 10. |
| Marco Belinelli | 1 | 3&D guard | 10. |
| Thomas Bryant | 3 | Rim runner | 10. |
| Justin Holiday | 4 | ??? | |
| Monte Morris | 1 | 3&D guard | 10. |
| Wayne Ellington | 1 | 3&D guard | 10.3/2/1.4/1/0.1/0.403/0.371/0.796/17.2 |
| Danny Green | 1 | 3&D guard | 10.341. |
| Tyreke Evans | 4 | ??? | |
| Marvin Williams | 12 | 3&D forward | 10. |
| Kelly Olynyk | 12 | 3&D forward | 104. |
| Avery Bradley | 4 | ??? | |
| Patty Mills | 1 | 3&D guard | 9. |
| Derrick White | 10 | Floor general | 9.9/3.7/3.9/1/0.7/0.479/0.338/0.772/17.7 |
| Kevin Huerter | 4 | ??? | |
| Luke Kennard | 1 | 3&D guard | 9. |
| Nemanja Bjelica | 12 | 3&D forward | 9. |
| Jonathan Isaac | 12 | 3&D forward | 9. |
| Jeremy Lin | 4 | ??? | |
| Robin Lopez | 12 | 3&D forward | 9. |
| Tony Parker | 4 | ??? | |
| Al-Farouq Aminu | 12 | 3&D forward | 9. |
| JaMychal Green | 12 | 3&D forward | 9. |
| Markieff Morris | 12 | 3&D forward | 9. |
| Larry Nance | 9 | Do it all big | 9. |
| Nicolas Batum | 12 | 3&D forward | 9. |
| Jalen Brunson | 4 | ??? | |
| Gerald Green | 1 | 3&D guard | 9. |
| Landry Shamet | 1 | 3&D guard | 9. |
| Terry Rozier | 4 | ??? | |
| Bam Adebayo | 3 | Rim runner | 8. |
| Rondae Hollis-Jefferson | 4 | ??? | |
| Malik Monk | 4 | ??? | |
| Tomas Satoransky | 10 | Floor general | 8.9/3.5/5/1/0.2/0.485/0.395/0.819/14.1 |
| Marcus Smart | 10 | Floor general | 8.9/2.9/4/1.8/0.4/0.422/0.364/0.806/14.6 |
| Ish Smith | 4 | ??? | |
| Ivica Zubac | 3 | Rim runner | 8. |
| Alec Burks | 4 | ??? | 8.8/3.7/2/0.6/0.3/0.405/0.363/0.823/19 |
| Mario Hezonja | 4 | ??? | |
| Delon Wright | 10 | Floor general | 8. |
| Kyle Korver | 1 | 3&D guard | 8. |
| Norman Powell | 1 | 3&D guard | 8. |
| Rodions Kurucs | 12 | 3&D forward | 8. |
| Trey Lyles | 4 | ??? | |
| DeAndre' Bembry | 4 | ??? | |
| Langston Galloway | 1 | 3&D guard | 8. |
| Noah Vonleh | 12 | 3&D forward | 8. |
| Mikal Bridges | 12 | 3&D forward | 8. |
| Richaun Holmes | 3 | Rim runner | 8. |
| Darius Miller | 1 | 3&D guard | 8. |
| Frank Jackson | 4 | ??? | |
| Austin Rivers | 4 | ??? | |
| Davis Bertans | 1 | 3&D guard | 83. |
| Jamal Crawford | 4 | ??? | |
| Seth Curry | 1 | 3&D guard | 7. |
| Josh Hart | 12 | 3&D forward | 7.8/3.7/1.4/1/0.6/0.407/0.336/0.688/13.5 |
| James Johnson | 4 | ??? | |
| Mason Plumlee | 3 | Rim runner | 7.8/6.4/3/0.8/0.9/0.593/0.2/0.561/16.2 |
| Garrett Temple | 4 | ??? | |
| Ante Zizic | 12 | 3&D forward | 7. |
| Maurice Harkless | 12 | 3&D forward | 7. |
| Josh Okogie | 4 | ??? | |
| Patrick Beverley | 10 | Floor general | 7.653. |
| George Hill | 1 | 3&D guard | 7. |
| Jake Layman | 12 | 3&D forward | 7. |
| Rodney McGruder | 4 | ??? | |
| Miles Bridges | 12 | 3&D forward | 7.541. |
| Dorian Finney-Smith | 12 | 3&D forward | 7. |
| Shelvin Mack | 4 | ??? | |
| Iman Shumpert | 4 | ??? | 7.5/3/1.8/1/0.4/0.374/0.348/0.8/14.5 |
| Vince Carter | 1 | 3&D guard | 7. |
| Draymond Green | 10 | Floor general | 7. |
| Doug McDermott | 1 | 3&D guard | 7. |
| Mitchell Robinson | 3 | Rim runner | 7. |
| P.J. Tucker | 12 | 3&D forward | 7. |
| Justin Jackson | 1 | 3&D guard | 7. |
| Lance Stephenson | 4 | ??? | |
| OG Anunoby | 12 | 3&D forward | 72. |
| Derrick Jones | 12 | 3&D forward | 740. |
| Mike Muscala | 12 | 3&D forward | 73. |
| Pat Connaughton | 12 | 3&D forward | 6. |
| Quinn Cook | 1 | 3&D guard | 6. |
| Terrance Ferguson | 1 | 3&D guard | 6.9/1.9/1/0.5/0.2/0.429/0.366/0.725/10.6 |
| Stanley Johnson | 4 | ??? | |
| Tyus Jones | 4 | ??? | 6.924. |
| Wayne Selden | 4 | ??? | |
| Ersan Ilyasova | 12 | 3&D forward | 6. |
| Maxi Kleber | 12 | 3&D forward | 6.8/4.6/1/0.5/1.1/0.453/0.353/0.784/13.5 |
| Evan Turner | 4 | ??? | |
| Ryan Arcidiacono | 1 | 3&D guard | 6. |
| James Ennis | 12 | 3&D forward | 6. |
| Michael Kidd-Gilchrist | 12 | 3&D forward | 6.7/3.8/1/0.5/0.6/0.476/0.34/0.772/15.7 |
| Zach Collins | 12 | 3&D forward | 6. |
| Shaquille Harrison | 4 | ??? | 6.531. |
| Cory Joseph | 4 | ??? | |
| Jonathon Simmons | 4 | ??? | |
| Sterling Brown | 12 | 3&D forward | 6. |
| Gorgui Dieng | 12 | 3&D forward | 6. |
| T.J. McConnell | 4 | ??? | |
| Devin Harris | 4 | ??? | |
| Jonas Jerebko | 12 | 3&D forward | 6. |
| Kevon Looney | 3 | Rim runner | 6. |
| Wilson Chandler | 12 | 3&D forward | 64. |
| Tony Snell | 1 | 3&D guard | 62. |
| Yogi Ferrell | 1 | 3&D guard | 5. |
| Nik Stauskas | 1 | 3&D guard | 5. |
| Ed Davis | 3 | Rim runner | 5. |
| Juan Hernangomez | 12 | 3&D forward | 5. |
| Mike Scott | 1 | 3&D guard | 5. |
| Torrey Craig | 12 | 3&D forward | 5.7/3.5/1/0.5/0.6/0.442/0.324/0.7/12.4 |
| Andre Iguodala | 12 | 3&D forward | 5. |
| Jakob Poeltl | 3 | Rim runner | 5. |
| Tim Frazier | 4 | ??? | |
| Royce O'Neale | 12 | 3&D forward | 5. |
| Wesley Iwundu | 4 | ??? | |
| Anthony Tolliver | 1 | 3&D guard | 52. |
| Jared Dudley | 12 | 3&D forward | 4. |
| Nerlens Noel | 3 | Rim runner | 4. |
| Alfonzo McKinnie | 12 | 3&D forward | 4. |
| Bruce Brown | 4 | ??? | |


Many additional subsets can be made from these 12 clusters, and many players fit more than one role. For example, Giannis fits into the star big category because of his high rebound total, but he is also a star ball handler. A versatile player like Draymond Green or Marcus Smart will fit many of the roles.

Though the roles aren’t perfect, they embody the general 12 roles an NBA player will serve. These roles give much more information on a player’s purpose to a team than naming them by the typical five positions.
