Rating Players Basic theory
#62
Posted 2011-March-12, 13:55
If my memory is working, this is how it worked 25 years ago:
1) Ratings run from a low of about 1200 (beginning players) to roughly 2800 (National Masters). Above that, a player's FIDE (International) rating is more important, although it is calculated in a similar fashion.
2) Each recorded game contributes to a player's rating, except as noted below. Typically, a tournament player will play 3 or 4 games per day in a local tournament.
3) A player's rating increases by 16 points for each win, and decreases by 16 points for each loss, plus or minus 4% of the difference in the players' ratings, up to a difference of 400 points. So, if you beat a player who is 400 points lower in the rankings than you are, the effect on your rating is (16 - (.04 * 400)) = 0. Similarly, if you lose to a player who is ranked 400 points higher than you, you don't lose any rating points. If you beat a player who is ranked 200 points higher, the effect is (16 + (.04 * 200)) = 24.
4) As a corollary to (3), players never lose rating points as a result of winning a game, and never gain points by losing a game. A player can never lose more than 32 points or gain more than 32 points as a result of a single game.
5) Ratings are provisional until a player has some number of rated games - 24 I think.
There are some grumblings about the chess rating system (of course, since there are grumblings about everything), but it has worked well. Bridge players who want a rating system should take a look at it.
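As a sanity check on the arithmetic in point 3, the per-game adjustment described above can be sketched in Python. The 16-point base and 4% factor are taken from the post as remembered; treat this as an illustration of that scheme, not the current USCF formula:

```python
def winner_gain(winner_rating, loser_rating):
    """Rating points gained by the winner under the scheme in point 3:
    16 for the win, plus 4% of the rating difference, with the
    difference capped at 400 points either way."""
    diff = loser_rating - winner_rating  # positive when you beat a higher-rated player
    diff = max(-400, min(400, diff))
    return 16 + 0.04 * diff
```

The loser's rating changes by the mirror-image amount, so (as point 4 notes) no single game can move a rating by more than 32 points.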
#63
Posted 2011-March-12, 14:07
matmat, on 2010-July-26, 02:26, said:
Excluding pairs that play and practice together (THE VAST VAST VAST VAST VAST VAST VAST VAST (that's a lot of VASTs) majority of whom DO NOT CHEAT) is ridiculous. It randomizes the field, causes bad and high-variance bridge, and rewards good luck rather than good actions.
How do you know that the (VAST *8) majority of established partnerships do not cheat?
I wouldn't say that they DO cheat, because I have no evidence. Apparently you have some evidence that they don't cheat. What is it?
#65
Posted 2011-March-12, 21:36
dlbalt, on 2011-March-12, 13:55, said:
I don't care much for the chess rating system. Last time i played, my p led the pawn of black, later crossed the rook with the king and sacrificed the queen to the opponents' knight. Before I could do anything my p already blew the board and my rating went down... and the other pair were NOOOOBS.
#66
Posted 2011-March-12, 21:38
dlbalt, on 2011-March-12, 14:07, said:
I wouldn't say that they DO cheat, because I have no evidence. Apparently you have some evidence that they don't cheat. What is it?
I am not sure how you get through life thinking that, unless you have some evidence to the contrary, people are doing you wrong.
Or perhaps all the partnerships you have been involved in have cheated?
#67
Posted 2011-March-14, 11:11
How good is LeBron? Really, really good, we all know that. But how good is LeBron when paired with a high school centre? Or with Random Ball Hog (I'm sure we all can think of at least one example). How good is LeBron, against competition at his level, playing pickup street against a professional team allowed to practise together? And how does that affect his rankings?
I'm a decent flight A player in the fields I play in. Not great, decent. With a team of people at my level, we were one exhausted cardplay mistake away from qualifying for day 2 of the Sat-Sun Swiss in Reno last year. But I was playing with my regular partner, my teammates played together frequently; if I had to play pickup - even if my partner was measurably "better" than the one I had - we wouldn't have done as well, pretty much guaranteed. So, what does that do to rating?
The other thing that ratings-that-can-drop do is to make people not *want* to play with new people, or lesser players, because it might affect their rating in ways they can't help. So, welcome to even *more* cliques and stratifying of the game. I haven't *proven*, but it's pretty obvious, that there is no way to set a single-person rating that can't be obviously gamed (either way, really - if I want a *bad* rating, so that I can get into an event where I can clean up, I can arrange that, without "dumping").
Really, the only sane thing one can measure is *partnership* ratings; and that's going to leave many players, who just don't play "regular" partnerships, out in the cold. It certainly won't help the reason "everybody" wants a rating system - "how good is he? Am I willing to play with him?" And even those would probably have to have separate "MPs/IMPs/BAM" ratings, the same way FIDE has separate regular/speed ratings - it's a different game.
Attendance points suck for rating. But really, very few things are *unambiguously* better - just less worse.
#68
Posted 2011-April-26, 15:22
dlbalt, on 2011-March-12, 13:55, said:
I think that developing a rating system for bridge requires considerable research. Apologies for the technical language that follows, but you cannot really discuss the technical merits of a rating system without some mathematical language. I don't have anything to say about social and business questions such as whether BBO should have a rating system, whether some games should be rated and others unrated and so on.
The post above gave the numerical calculations involved in running a rating system, but omitted the heart of the system -- the mathematical model that leads to these calculations. The basic model for chess, proposed by Arpad Elo, is the following:
1. We assume that every player has a current "average ability level" (this is the number that the rating system will be trying to measure). In any individual game, the player will play at an ability level which is randomly distributed around his "average" level, and the winner of the game is the player who happened to display a higher level during that game.
2. The key assumption (to be discussed below): the distribution of abilities around the average is normal (Gaussian), with a standard deviation of 200 rating points.
3. Say we have two players with true abilities r_1, r_2. We can calculate the probability that player 1 will beat player 2 (that's the probability that a random sample from N(r_1,200) is larger than a random sample from N(r_2,200)). This is the "expected result of the game". Now assume that r_1 and r_2 were only the ratings associated to the players instead of their true abilities. We can still compare the calculated probability to the actual result of the game (player 1 can score 0,0.5, or 1 point). If player 1 did better than expected, we infer that we underestimated r_1 and overestimated r_2, and make a small adjustment to the ratings accordingly (dlbalt's post above explains how this is done). Conversely if player 1 did worse than expected.
4. Mathematical fact: If assumption 2 is valid, then over a long series of games the ratings of all players will converge to their true ability levels.
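Under assumptions 1-2, the expected result in step 3 has a closed form: the difference of two independent N(r_i, 200) samples is itself normal with mean r_1 - r_2 and standard deviation 200*sqrt(2), so the win probability is a normal CDF evaluated at that difference. A minimal sketch of steps 2-3 (the 200-point standard deviation is assumption 2; K = 16 matches the update scheme quoted earlier in the thread):

```python
import math

SD = 200.0  # per-game ability spread, from assumption 2

def expected_score(r1, r2):
    """P(sample from N(r1, SD) > sample from N(r2, SD)).
    The difference is N(r1 - r2, SD * sqrt(2)), so this is
    Phi((r1 - r2) / (SD * sqrt(2)))."""
    z = (r1 - r2) / (SD * math.sqrt(2))
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def update(r1, r2, score, k=16):
    """Step 3's adjustment: nudge both ratings toward the observed
    result; `score` is player 1's result (0, 0.5, or 1)."""
    e = expected_score(r1, r2)
    return r1 + k * (score - e), r2 - k * (score - e)
```

Two equally rated players get an expected score of 0.5, so a win moves each rating by K/2 = 8 points in opposite directions.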
The main weakness of this model is the fixed standard deviation of 200 points. If all players of a given level had the same standard deviation, this would simply serve to fix the value of "1 rating point"; in practice they don't quite. Another weakness is that the rating update process needs a measure of "rating uncertainty" as input (if we are very confident that r_1 is close to player 1's ability, we want to make only a small change after the game; if we are unsure, we want to make a large change), and this measure affects the speed of convergence of the ratings to the true abilities. The original Elo system used a fixed value (later versions give players a higher uncertainty during their first games). Glickman's Glicko system improves on this in two ways. First, it modifies assumption 2 by assigning a different variance to each player (which the system will also try to measure). Second, it improves step 3 by assigning to every player a "ratings deviation" which more directly measures our uncertainty about his rating. The first change is fundamental (it makes the model closer to reality), while the second is designed to improve the convergence properties.
How does one test the system? Take a set of players and a large collection of games. Use the first part of the sample to calculate ratings as described above, using enough games for the ratings to converge. Then examine the remaining games to see whether the statistical model really predicts the results (i.e. whether the probability of winning is really given by the larger-of-two-Gaussian-samples model, and whether our model for the variance [either 200 points or per-player] agrees with reality). In chess this works to some extent -- see for example the papers by Mark Glickman.
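The fit-then-test procedure described here can be sketched as follows. The logistic expectation is a common stand-in for the Gaussian model, and mean squared error of predicted versus actual scores is one of several reasonable calibration measures (Glickman's own tests are more sophisticated):

```python
def logistic_expected(r1, r2):
    """Common logistic approximation to the Elo win probability."""
    return 1 / (1 + 10 ** ((r2 - r1) / 400))

def fit_and_test(games, k=16, start=1500):
    """Fit ratings on the first half of `games`, then report the mean
    squared error of predicted vs. actual scores on the second half.
    `games` is a list of (player_a, player_b, score_a) tuples with
    score_a in {0, 0.5, 1}."""
    ratings = {}
    half = len(games) // 2
    for a, b, s in games[:half]:
        ra = ratings.setdefault(a, start)
        rb = ratings.setdefault(b, start)
        e = logistic_expected(ra, rb)
        ratings[a] = ra + k * (s - e)
        ratings[b] = rb - k * (s - e)
    errors = [(s - logistic_expected(ratings.get(a, start),
                                     ratings.get(b, start))) ** 2
              for a, b, s in games[half:]]
    return sum(errors) / len(errors)
```

A lower error on the held-out half than a constant-0.5 predictor would achieve is (weak) evidence that the model captures something real about the players.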
What about adapting this to bridge? Here are some thoughts:
1. Formally, one could simply look at each hand played separately (say, looking at the points margin). Given enough time (hands), the ratings may converge. I think the general perception is that convergence here would be too slow, and that it's better to compare what a player did with the same cards than to compare NS to EW directly. Also, different models are probably needed for team games (only one other table) and duplicate games (many other tables).
2. On the other hand, bridge has more detailed scoring information than just "win/draw/lose", which should help. It is not a priori clear which component of the score best predicts future success.
This much is for rating pairs. But what we really want is rating individuals, and this raises the last question:
3. Can players be assigned individual ratings which effectively predict the performance of pairs? The naive answer is that established partnerships perform much better than pick-up partnerships with the same ability, but I don't know of research into how large this effect actually is and (more importantly) to what extent it limits the accuracy of individual ratings in predicting results. To get a genuine answer we have to create a statistical rating model and then test it against actual data.
Side thought: in trying to extract individual ability from pair data, it would be helpful to incorporate the identity of the declarer into the model (that is, probably the ability of declarer affects the result of the hand more than the ability of dummy), but since the identity of declarer is not simply a function of the cards I don't see an obvious way to do this.
In summary: before we discuss whether BBO should have a rating system, it may be worth doing some statistical research and actually *validating* a rating system for bridge.
#69
Posted 2011-April-26, 17:24
slior, on 2011-April-26, 15:22, said:
Developing an accurate rating system for a given population of players isn't particularly hard.
Convincing half the players that they are below average is much more daunting, as is the mind-numbing tedium of trying to convince numerically illiterate yahoos why this algorithm says that you suck...
Figure out how to deal with this and I'll invest the time/effort to develop an accurate rating system.
#72
Posted 2011-May-02, 11:46
So.... convincing.
#73
Posted 2011-May-02, 12:25
mgoetze, on 2011-May-02, 11:46, said:
For what it's worth, I had the chance to discuss this topic (bridge ratings) with Glickman after a talk he gave a few months back.
I posited (and Glickman concurred) that the best way to approach the problem was to focus on developing an accurate rating system for pairs.
Once you have an accurate system for rating pairs, you can then try to decompose accurate ratings for individuals out of a set of pair ratings.
It's entirely possible that Glickman doesn't believe any such thing and thought that agreeing with me was the best way to get me to go away.
#74
Posted 2011-May-02, 18:50
hrothgar, on 2011-May-02, 12:25, said:
I posited (and Glickman concurred) that the best way to approach the problem was to focus on developing an accurate rating system for pairs.
Once you have an accurate system for rating pairs, you can then try to decompose accurate ratings for individuals out of a set of pair ratings.
It's entirely possible that Glickman doesn't believe any such thing and thought that agreeing with me was the best way to get me to go away.
One of my partners always says this about rating pairs, but also says that no bridge organization will consider such a thing for pecuniary reasons.
#75
Posted 2011-October-23, 18:51
http://www.eloratings.net/system.html
and its "about ratings" page gives details. The important thing is to have a high weight constant in the early stages, reducing as the player gets more experience. A percentage in any event can be converted into a rating difference and compared with the player's expected result.
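For the conversion mentioned above: under the logistic Elo expectation used by eloratings.net, E = 1 / (1 + 10^(-d/400)), a session percentage can be inverted to the rating difference that would make it the expected result. A sketch:

```python
import math

def pct_to_rating_diff(pct):
    """Invert E = 1 / (1 + 10**(-d/400)): a score percentage
    (as a fraction strictly between 0 and 1) maps to the rating
    difference d that would make it the expected result."""
    return -400 * math.log10(1 / pct - 1)
```

A 50% game implies a difference of 0; a 75% game implies roughly +191 points, which can then be compared against the player's expected result to drive the update.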
#76
Posted 2012-April-26, 11:59
Jlall, on 2009-November-20, 15:58, said:
Of course BBO is now much bigger than OKB ever was, and OKB charged a membership fee, which kinda biases the results of how many play rated vs unrated a lot (if you're willing to pay 100 bucks a year, you'll probably want to play rated), so maybe this wouldn't be the case on BBO.
655321 ???
Really.
Take care
I will not follow...
#78
Posted 2012-April-28, 01:08
On the Hand Records screen, the following options appear
Interval to retrieve: days / weeks / months. Choosing months currently allows only the last month; add another option for the last 6 months.
Similarly, for the "Show summaries every" option, add "every 6 months".
Now get the programmers to extract the 6 month summaries into a sub-file and sort them from the highest average to the lowest average. The top X% get automatically graded expert, the next Y% get automatically graded advanced, the next Z% get graded intermediate etc.
Remove the self-rating option altogether from each player's profile. Instead, replace it programmatically with the rating as calculated above. One can decide on the frequency with which the rating gets recalculated and replaced in each player's BBO profile, e.g. daily / weekly / monthly. My guess is weekly should be fine (24h00 on Sundays, USA (BBO headquarters) time).
The only way you can progress from say, advanced to expert is to up your game. You can try and bullshit the system by playing with a lot of weaker players to get a higher average. However, as soon as you start playing against real experts you will be exposed, your average will plummet and you will drop back into a lower category where you probably belong anyway.
Other things to consider: a) new players may need to be excluded from the calculation until they have played enough hands; b) similarly for players who haven't played in a long time (consider maintaining their last average).
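The grading step above can be sketched as a percentile cut over the 6-month averages. The cutoff fractions below (top 10% expert, next 20% advanced, next 30% intermediate) are illustrative placeholders for the unspecified X / Y / Z:

```python
def grade_players(averages):
    """Sort players by 6-month average (descending) and label the top
    slices. `averages` maps player name to average score; the
    percentile cutoffs are placeholder values, not from the proposal."""
    cutoffs = [(0.10, "expert"), (0.30, "advanced"), (0.60, "intermediate")]
    ranked = sorted(averages, key=averages.get, reverse=True)
    n = len(ranked)
    grades = {}
    for i, player in enumerate(ranked):
        frac = (i + 1) / n  # fraction of the field at or above this player
        grades[player] = next((name for cut, name in cutoffs if frac <= cut),
                              "beginner")
    return grades
```

Rerunning this weekly over the extracted summaries would implement the replace-the-self-rating step described above.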
#80
Posted 2012-May-09, 19:49
(1) Suppose two very unequal partners pair up. Assume ratings something like ACBL masterpoints: Alice with 1200 and Bob with 50 partner up against Charlie and Doug with 500 each. If Alice and Bob win, does it mean that only Bob gains rating points since Alice was better than their opponents? Or do we count them both as their average of 625 points, so that neither gains anything?
(2) How to deal with a pair that used to be much better than they are now. I like the way WBF masterpoints decay over time; something like that might be called for.
(3) How to deal with players who avoid joining the league so that their points aren't counted. I have run up against some very good players in this category at clubs. In the US you could do the same thing by joining one of ACBL/ABA and playing at the other.
(4) For that matter, a person could have multiple 'nyms on BBO. I'm sure this is against the rules but I'm not at all sure it can be caught. Even if there are 5 BBO login names on the same PC, maybe they're a family or housemates.
My feeling is that rating systems are a good idea for clubs (though (1) through (3), at least, need to be dealt with) but online games should not award points of any kind, unless they're the online forum's own private points, because it's impossible to police adequately. Only noticing cheaters if they consistently get "too good" results will catch only the stupidly greedy.