Thursday, December 31, 2020

Correction and Feedback on a Statistical Method to Determine a Team of the "Decade"

This is a follow-up to this thread, which generated some very interesting discussion. Usually I like to leave work as it is, but a few good points were raised, and while discussing something unrelated with another poster I discovered an error in my calculation. This thread corrects that error and takes into account some suggestions that I personally felt were quite valid.

The first, funnily enough, came from me about 4 seconds after I posted the thread. Using 0.6×WPM to estimate the standard deviation for each player felt sketchy when I wrote it, and I regretted the choice immediately afterwards, simply because I didn't feel it was well justified. So I figured I'd justify it at slightly greater length and see if we can get a better estimate. To do this, I'll be using the 15 bowlers with 50+ Tests bowling in the first 4 roles, though looking at their overall statistics. I've also made a minor adjustment to the requirements here, with bowlers needing to bowl at least 20 overs per match. This changes virtually nothing, but I had already done it a day or so ago and figured I'd leave it in. In any case, the results are like the more general tests I did a while ago, though 0.60 does seem a bit low as a 'rule of thumb' estimate: the average ratio is 0.6410, with a standard deviation of 0.0802. Given this, rather than going through and redoing my code entirely, I've just upped the rule of thumb from 0.6000 to 0.7000. This is a dirty, dirty way of doing it, but the method was already dirty as hell.
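To put the bump in context, here is a rough sketch of the arithmetic. It assumes the 0.6410 figure is the mean of each bowler's standard deviation divided by their WPM; the function and variable names are made up for illustration, and only the four numbers come from the paragraph above.

```python
# Figures quoted above, assumed to describe the ratio
# (standard deviation of wickets per match) / WPM across the 15 bowlers.
MEAN_RATIO = 0.6410
SD_OF_RATIO = 0.0802

OLD_FACTOR = 0.60
NEW_FACTOR = 0.70


def sds_from_mean(factor: float) -> float:
    """How many standard deviations a rule-of-thumb factor sits from the sample mean."""
    return (factor - MEAN_RATIO) / SD_OF_RATIO


def estimated_sd(wpm: float, factor: float = NEW_FACTOR) -> float:
    """Rule-of-thumb standard deviation for a bowler with the given wickets per match."""
    return factor * wpm


print(f"old factor: {sds_from_mean(OLD_FACTOR):+.2f} SDs from the mean")
print(f"new factor: {sds_from_mean(NEW_FACTOR):+.2f} SDs from the mean")
print(f"estimated SD at 5.0 WPM: {estimated_sd(5.0):.2f}")
```

On these numbers the old factor sits about half a standard deviation below the sample mean and the new one about three quarters above it, so 0.70 errs on the side of more uncertainty rather than less.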

The second concern raised was around the nature of this being a team of the decade. The question was raised as to whether or not someone who has only been playing a handful of years this decade should even be considered. On this basis, on the corrected teams, I'll disqualify anyone who has not been active for at least 5 years of the decade. This will be defined by the year of their first and last tests that decade. So someone who played in 2011 and 2015 would be eligible, even if they didn't play in 2012-14.
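The eligibility rule can be sketched in a few lines. This is only an illustration of the rule as described above: the function names are hypothetical, and the span is counted inclusively so that 2011 to 2015 gives 5 years.

```python
def active_span_years(first_test_year: int, last_test_year: int) -> int:
    """Inclusive count of calendar years from a player's first to last Test of the decade."""
    return last_test_year - first_test_year + 1


def is_eligible(first_test_year: int, last_test_year: int, min_years: int = 5) -> bool:
    """True if the player was active for at least `min_years` years of the decade.

    Gaps in between do not matter; only the first and last years count.
    """
    return active_span_years(first_test_year, last_test_year) >= min_years


# The example from the text: played in 2011 and 2015, nothing in 2012-14.
print(is_eligible(2011, 2015))
# A four-year stretch falls short.
print(is_eligible(2011, 2014))
```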

The third was around selection criteria. Selecting an allrounder before a specialist does indeed make no sense, and I'm glad a couple of people raised this as a concern. The reason I agree with them here can be seen in the actual team as picked. Picking Ashwin as an allrounder first meant that a weaker batter was in the side, and an even weaker one was picked as the spinner in Herath. If we went the other way, it would be Ashwin as the spinner, and Jadeja as the allrounder. This makes sense, as ideally you pick your best bowlers as... well, the bowlers, and then others come into the frame due to their other qualities, rather than the other way around. This means that the order of picking players will go:

  1. Wicket Keepers
  2. Bowlers
  3. Openers
  4. Number 3
  5. Top Order Batsmen
  6. All Rounders

i.e., start from the hard-to-fill roles. There is certainly more discussion that can be had around this.
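The picking order above amounts to a greedy pass over the roles, skipping anyone already chosen or ruled ineligible. The sketch below is one way to write that, not the actual code behind this post: `pick_team` and `ROLE_SLOTS` are hypothetical names, splitting "Bowlers" into one spinner and three seamers is an assumption based on the sections that follow, and the example rankings are just the leading names from the tables later in the post, in B-Rating (or B-Ave) order.

```python
from typing import Dict, FrozenSet, List, Sequence, Tuple

# (role, number of places), in picking order. The spinner/seamer split and the
# slot counts are assumptions inferred from the sections of the post.
ROLE_SLOTS: Sequence[Tuple[str, int]] = (
    ("Wicket Keeper", 1),
    ("Spinner", 1),
    ("Seamers", 3),
    ("Openers", 2),
    ("Number 3", 1),
    ("Top Order", 2),
    ("All Rounder", 1),
)


def pick_team(
    rankings: Dict[str, List[str]],
    ineligible: FrozenSet[str] = frozenset(),
    role_slots: Sequence[Tuple[str, int]] = ROLE_SLOTS,
) -> Dict[str, List[str]]:
    """Fill each role in order from its ranked list, never picking a player twice."""
    already_picked: List[str] = []
    team: Dict[str, List[str]] = {}
    for role, slots in role_slots:
        chosen: List[str] = []
        for player in rankings.get(role, []):
            if len(chosen) == slots:
                break
            if player in ineligible or player in already_picked:
                continue
            chosen.append(player)
        already_picked.extend(chosen)
        team[role] = chosen
    return team


# Leading names per role, best first, taken from the tables in this post.
rankings = {
    "Wicket Keeper": ["de Villiers", "de Kock", "Watling"],
    "Spinner": ["Ashwin", "Ajmal", "Herath"],
    "Seamers": ["Cummins", "Bumrah", "Rabada", "Steyn", "Olivier"],
    "Openers": ["Warner", "Cook", "Azhar Ali"],
    "Number 3": ["Sangakkara", "Williamson"],
    "Top Order": ["Smith", "Sangakkara", "Kohli", "Williamson"],
    "All Rounder": ["Shakib", "Jadeja", "Ashwin"],
}

team = pick_team(rankings, ineligible=frozenset({"Ajmal", "Bumrah"}))
for role, players in team.items():
    print(f"{role}: {', '.join(players)}")
```

Because bowlers are picked before allrounders, Ashwin is taken as the spinner and is then skipped in the allrounder list, which is exactly the behaviour the reordering is meant to produce.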

Finally, the big one, and the reason I made this post: an error in how I propagated uncertainty. Calculations with a geometric mean should propagate uncertainty with a factor of a half, but... well, when I wrote the code I just forgot to put it in there. This means that in every calculation involving one of these geometric means, the uncertainty was doubled. This happened for wicket keepers, bowlers and allrounders, and twice over for allrounders. This does indeed have an impact. Whoopsies.
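For anyone wondering where the half comes from, here is a minimal sketch using standard first-order error propagation for a geometric mean of two quantities. It assumes the two inputs are positive and their uncertainties independent (added in quadrature), which may not match the exact formula in the original code; the function names and the input numbers are arbitrary and are not cricket data.

```python
import math


def geometric_mean_with_uncertainty(x: float, dx: float, y: float, dy: float):
    """Return (g, dg) for g = sqrt(x * y), to first order.

    Assumes independent uncertainties, so that
    dg / g = 0.5 * sqrt((dx / x)**2 + (dy / y)**2).
    The 0.5 comes from the square root (an exponent of one half).
    """
    g = math.sqrt(x * y)
    relative = 0.5 * math.hypot(dx / x, dy / y)
    return g, g * relative


def buggy_uncertainty(x: float, dx: float, y: float, dy: float) -> float:
    """The same calculation with the factor of a half left out."""
    g = math.sqrt(x * y)
    return g * math.hypot(dx / x, dy / y)


# Arbitrary illustrative inputs: 4.0 +/- 0.4 and 9.0 +/- 0.9.
g, dg = geometric_mean_with_uncertainty(4.0, 0.4, 9.0, 0.9)
dg_buggy = buggy_uncertainty(4.0, 0.4, 9.0, 0.9)
print(f"g = {g:.3f}, correct dg = {dg:.3f}, buggy dg = {dg_buggy:.3f}")
print(f"buggy / correct = {dg_buggy / dg:.1f}")
```

Whatever the inputs, leaving out the half exactly doubles the uncertainty, and nesting one geometric mean inside another compounds the error.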

While batters are completely unaffected by this error, I'll post their data anyhow, in case eligibility needs discussing. Eligibility will only be discussed for players who would be picked if not for it.

Anyhow, let's get picking, this time in selection order.

Wicket Keeper

Player                Mat  Runs  Ave    Dis  D/M    Rating  B-Rating
AB de Villiers (SA)   21   1955  63.06  83   3.952  15.79   13.33
Q de Kock (SA)        46   2902  40.31  206  4.478  13.43   12.80
BJ Watling (NZ)       64   3374  40.17  249  3.891  12.50   12.18
RR Pant (INDIA)       14   843   38.32  65   4.643  13.34   12.00
JM Bairstow (ENG)     48   3028  37.85  181  3.771  11.95   11.70
TD Paine (AUS)        29   1130  31.39  134  4.621  12.04   11.60
MJ Prior (ENG)        40   2069  39.04  142  3.550  11.77   11.50
Sarfaraz Ahmed (PAK)  48   2651  37.34  163  3.396  11.26   11.14
LD Chandimal (SL)     24   1602  41.08  72   3.000  11.10   10.94
MS Dhoni (INDIA)      37   1951  34.84  126  3.405  10.89   10.82

So, with the correction AB is clearly favoured, largely on his batting.

Spinner

Player                   Mat  W    WPM    Ave    Rating  B-Rating
R Ashwin (INDIA)         73   375  5.137  25.22  0.4513  0.4417
Saeed Ajmal (PAK)        26   145  5.577  25.46  0.4680  0.4401
HMRKB Herath (SL)        69   355  5.145  26.30  0.4423  0.4333
PP Ojha (INDIA)          13   71   5.462  24.27  0.4744  0.4247
RA Jadeja (INDIA)        50   216  4.320  24.49  0.4200  0.4108
Yasir Shah (PAK)         43   227  5.279  30.85  0.4136  0.4045
S Shillingford (WI)      11   56   5.091  29.00  0.4190  0.3895
Abdur Rehman (PAK)       18   79   4.389  26.85  0.4043  0.3874
NM Lyon (AUS)            98   394  4.020  31.64  0.3565  0.3554
Shakib Al Hasan (BDESH)  35   135  3.857  30.57  0.3552  0.3524

Ashwin, of course, wins. Ajmal was damn close, but would not have been eligible anyhow, as he was only playing for a 4 year stretch in the decade, 2011-2014.

Seamers

Player               Mat  W    WPM    Ave    Rating  B-Rating
PJ Cummins (AUS)     32   153  4.781  21.52  0.4714  0.4460
JJ Bumrah (INDIA)    16   76   4.750  20.68  0.4792  0.4327
K Rabada (SA)        43   197  4.581  22.96  0.4467  0.4317
DW Steyn (SA)        48   207  4.313  22.56  0.4373  0.4250
D Olivier (SA)       10   48   4.800  19.25  0.4994  0.4249
RJ Harris (AUS)      22   93   4.227  23.33  0.4256  0.4053
BW Hilfenhaus (AUS)  11   47   4.273  22.06  0.4401  0.3999
JM Anderson (ENG)    100  395  3.950  24.33  0.4029  0.3992
N Wagner (NZ)        51   219  4.294  26.33  0.4039  0.3969
MA Starc (AUS)       59   252  4.271  26.75  0.3996  0.3939

Bumrah's impressive record really stands out here, but ultimately he is not eligible, as he has only been playing since 2018, a 3-year stretch. This means an unchanged trio of Cummins, Rabada and Steyn is picked. Cummins is eligible as he debuted in 2011, and has played in 5 separate years anyhow (2011, 2017, 2018, 2019 and 2020). Bumrah's efforts deserve a mention though. He's yet to even play a home Test, yet boasts such a record, which is truly ridiculous. Not to go off on too much of a tangent here, but look at the best away records (min 50 wickets):

Player               Mat  W    WPM    Ave    Rating
WJ O'Reilly (AUS)    15   85   5.667  21.19  0.5171
JA Snow (ENG)        12   62   5.167  20.92  0.4970
Sir RJ Hadlee (NZ)   43   230  5.349  21.72  0.4962
PM Pollock (SA)      11   60   5.455  22.23  0.4953
J Garner (WI)        29   136  4.690  19.74  0.4874
PJ Cummins (AUS)     14   72   5.143  21.86  0.4850
JJ Bumrah (INDIA)    16   76   4.750  20.68  0.4792
FH Tyson (ENG)       13   56   4.308  18.96  0.4766
GD McGrath (AUS)     58   274  4.724  20.81  0.4764
Mohammad Asif (PAK)  19   96   5.053  22.50  0.4739

That is some high company, and honestly, it's hard to not get excited about a potential addition to the pantheon of all time greats. Not for this "decade" though. Moving along:

Openers

Player              Mat  Inns  Runs  Ave    B-Ave
DA Warner (AUS)     84   152   7205  49.69  47.60
AN Cook (ENG)       97   176   7482  44.54  43.61
Azhar Ali (PAK)     20   37    1556  45.76  42.03
CJL Rogers (AUS)    24   46    1996  44.36  41.81
TWM Latham (NZ)     54   94    3867  42.97  41.78
MA Agarwal (INDIA)  13   21    1005  47.86  41.67
CH Gayle (WI)       12   23    841   46.72  40.89
GC Smith (SA)       27   48    1843  41.89  40.25
D Elgar (SA)        56   100   3757  40.40  39.77
S Dhawan (INDIA)    34   58    2315  40.61  39.63

No eligibility concerns, it's Warner and Cook still. I did read some crying about Warner's away record. You can read my thoughts on that here.

Number 3

Player               Mat  Inns  Runs  Ave    B-Ave
KC Sangakkara (SL)   39   71    4068  61.64  51.84
KS Williamson (NZ)   72   124   6283  56.10  51.44
SPD Smith (AUS)      17   29    1744  67.08  46.99
CA Pujara (INDIA)    72   115   5314  48.31  46.06
HM Amla (SA)         61   100   4503  48.42  45.82
M Labuschagne (AUS)  10   17    1203  70.76  44.56
Azhar Ali (PAK)      56   95    4000  43.96  42.54
GS Ballance (ENG)    16   29    1254  46.44  41.80
IR Bell (ENG)        11   15    742   53.00  41.65
R Dravid (INDIA)     13   24    943   42.86  39.83

I got a number of posts questioning why Sangakkara would bat at three rather than play as keeper. As noted, he didn't keep this decade, and when he did keep in the previous decade he wasn't any better with the bat than someone like de Kock. He was a monster at 3, though, as shown above. He did indeed play in 5 years of the decade, from 2011-2015, so he is eligible. Sangakkara it is.

Other Top Order Batsmen

Player               Mat  Inns  Runs  Ave    B-Ave
SPD Smith (AUS)      71   127   7050  64.09  56.00
KC Sangakkara (SL)   40   77    4156  57.72  50.52
V Kohli (INDIA)      87   147   7318  53.42  50.24
KS Williamson (NZ)   79   138   6665  53.32  49.93
S Chanderpaul (WI)   35   61    2804  60.96  49.40
Younis Khan (PAK)    53   97    4659  54.17  49.32
AB de Villiers (SA)  49   80    4063  54.17  48.83
MJ Clarke (AUS)      47   86    3946  51.92  47.57
DA Warner (AUS)      84   155   7244  48.95  47.05
Misbah-ul-Haq (PAK)  54   95    3994  49.93  46.48

It's Smith and Kohli, as Sangakkara is at 3, just as before.

All Rounders

Player                   Mat  Bat-Ave  WPM  Bowl-Ave  Rating  AllRnd  B-AllRnd
Shakib Al Hasan (BDESH)  35   44.72    4    30.57     0.3552  3.985   3.607
RA Jadeja (INDIA)        50   35.67    4    24.49     0.4200  3.871   3.535
R Ashwin (INDIA)         73   27.48    5    25.22     0.4513  3.521   3.376
BA Stokes (ENG)          67   37.85    2    31.41     0.2740  3.220   3.143
JO Holder (WI)           45   32.05    3    27.95     0.3037  3.120   3.021
VD Philander (SA)        64   24.04    4    22.32     0.3960  3.085   3.008
MA Starc (AUS)           59   22.16    4    26.75     0.3996  2.976   2.914
C de Grandhomme (NZ)     24   37.03    2    31.64     0.2488  3.035   2.890
MG Johnson (AUS)         32   22.47    4    27.07     0.3963  2.984   2.887
CR Woakes (ENG)          38   27.52    3    29.30     0.3171  2.954   2.872

The correction really helps Shakib, whose only issue was uncertainty around his performances. This is entirely my mistake, and he makes the list handily in the end.

Final Corrected XI:

Position  Player         Bat-Ave  DPM    WPM    Bowl-Ave
1         Warner         49.69    NA     NA     NA
2         Cook           44.54    NA     NA     NA
3         Sangakkara     61.64    NA     NA     NA
4         Smith          64.09    NA     0.197  57.64
5         de Villiers†   63.06    3.952  NA     NA
6         Kohli*         53.42    NA     0.000  NA
7         Shakib         44.72    NA     3.857  30.57
8         Ashwin         27.48    NA     5.137  25.22
9         Cummins        16.54    NA     4.781  21.52
10        Steyn          13.53    NA     4.313  22.56
11        Rabada         11.43    NA     4.581  22.96

So yeah, that's not far off what a lot of people suggested in the comments. The idea of Kohli coming in at 6 after the keeper tickles me slightly, but I did just follow the method. Shakib in the side looks damn good too, and the issues surrounding a long tail have been well and truly dealt with.

submitted by /u/Anothergen

from Cricket https://ift.tt/2JCsErW