http://homes.cs.wash...es/InfoGain.pdf
Anyone here familiar with decision trees (say in computer programming)? I don't quite get the idea of "entropy", especially the math, but it seems like this and other terminology (such as "information gain") could be useful in describing system design choices.
Decision Trees
#2
Posted 2016-June-21, 10:26
Yes, some of us do work with these topics on a regular basis.
The simplest explanation for entropy in the CS context is that it's a measure of randomness in a system. Regarding decision trees, a simple explanation is to think in terms of a bridge relay system: there's some amount of branching depending on the hand that's held, and each given hand has a well-defined path through the tree (determined by its HCP and shape).
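To make the relay-tree picture concrete, here's a minimal sketch in Python; the bids, thresholds, and shape tests are invented for illustration and don't correspond to any real relay system.

    def toy_relay_response(hcp, suit_lengths):
        """Walk a toy relay tree: every hand follows exactly one path to a leaf.
        suit_lengths is (spades, hearts, diamonds, clubs)."""
        if hcp <= 7:
            return "1D: negative, 0-7"                 # one branch ends here
        if max(suit_lengths) >= 5:
            return "1H: positive with a 5+ suit"       # further relays would locate it
        if min(suit_lengths) <= 1:
            return "1S: positive, singleton or void somewhere"
        return "1N: positive, no 5+ suit, no shortness"

    print(toy_relay_response(5, (4, 3, 3, 3)))    # 1D: negative, 0-7
    print(toy_relay_response(14, (5, 4, 3, 1)))   # 1H: positive with a 5+ suit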
Matt Ginsberg (of GIB fame) did some related research on computer bidding systems more than a decade ago, and exchanged emails with interested folks on the now-defunct exotic bidding systems mailing list, but it never went anywhere.
#3
Posted 2016-June-21, 10:33
Thanks. So can you give an example of a situation that would be high entropy vs low entropy?
I'm thinking that decision trees could be viewed in terms of several different outputs. For example, the output could be the final contract: 3N vs 4H vs 6D. Or the output could be one's hand pattern and high cards.
#4
Posted 2016-June-21, 11:40
straube, on 2016-June-21, 10:33, said:
So can you give an example of a situation that would be high entropy vs low entropy?
Finding a good source of high entropy is a really hard problem, but it's easier to think in terms of lower vs. higher entropy. A well-shuffled pack of cards has higher entropy (more randomness) than the same set of cards fresh out of the box (and presumably sorted).
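To put rough numbers on that, here's a quick sketch, using the standard convention that a uniformly shuffled deck is equally likely to be in any of its 52! orderings:

    import math

    # A perfectly shuffled deck: uniform over all 52! orderings.
    print(math.log2(math.factorial(52)))   # ~225.58 bits of entropy

    # A fresh, sorted pack: one known ordering, so no uncertainty at all.
    print(math.log2(1))                    # 0.0 bits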
Quote
I'm thinking that decision trees could be viewed in terms of several different outputs. For example, the output could be the final contract: 3N vs 4H vs 6D. Or the output could be one's hand pattern and high cards.
Computing optimal scores is typically done with a payoff matrix that compares the results of the various possible outcomes (assuming DD defence and play). Several programs already compute this from hand records (BridgeCaptain, etc.). The score can then be one of the parameters in the decision tree a robot uses to determine whether a sacrifice is profitable.
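As a toy illustration of that last point (scores invented for the example), the core of the sacrifice decision can be as simple as comparing two payoffs from the DD results:

    # Invented scores for illustration: suppose double-dummy analysis says the
    # opponents make 4S (-620 for us, they're vulnerable) while our 5H sacrifice
    # goes three down doubled (-500). The sacrifice loses less, so it's profitable.
    score_if_we_defend = -620
    score_if_we_sacrifice = -500
    print(score_if_we_sacrifice > score_if_we_defend)   # True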
#5
Posted 2016-June-21, 19:56
Entropy of different ways of partitioning sets is a common notion in many fields - from computer science to, say, calculating the diversity of a biological system based on the abundances of several species. In that context, entropy is usually defined as -Sum[ p_i * Log[p_i] ], or something proportional to it.
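In code, that definition is just a few lines (a sketch; base-2 logs give the answer in bits):

    import math

    def shannon_entropy(probs):
        """H = -sum(p * log2 p) over a discrete probability distribution, in bits."""
        return -sum(p * math.log2(p) for p in probs if p > 0)

    print(shannon_entropy([0.5, 0.5]))    # 1.0  (fair coin)
    print(shannon_entropy([1.0]))         # 0.0  (no uncertainty)
    print(shannon_entropy([0.25] * 4))    # 2.0  (four equally abundant species)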
In the bidding context, you can imagine a system where you always pass as dealer, with all 52C13 ~ 635 billion hands you might be dealt. That's the highest-entropy opening strategy you could have. With probability 1 you make a call consistent with e^27.177 ≈ 2^39.208 possible hands; to specify exactly which hand you hold, you need to pass partner at least another 39.2 bits of information.
You can also imagine systems where you either pass or open 1C as dealer, nothing else. Opening 1C if you hold exactly AK32 K64 Q7654 8, and passing on the other 635,013,559,599 hands conveys just a tiny bit more information to your partner than always passing does. On average, you now need to convey only an additional 39.20799644213 bits rather than 39.20799644219 bits to exactly specify your hand.
On the other hand, you could, say, pass with all 0-10 HCP hands and open 1C with all 11+ HCP hands. 56.2% of the time you'll hold one of ~357 billion weak hands, and 43.8% of the time you'll hold one of ~278 billion strong hands. On average you now need to convey only 38.219 additional bits of information to exactly specify your hand.
If you found a way to open exactly 50% of your hands and pass exactly 50% of your hands, you would pass 1 full bit of information and have only 38.208 to go.
If your goal was to convey maximum information at your first call, you'd split hands into 36 equal classes for each action from Pass through 7NT -- well, 16 classes of 17,639,265,545 hands and 20 classes of 17,639,265,544 -- and need only 34.038 more bits to exactly specify your hand.
Of course in the real world you run into some other issues: opening 7NT that often is going to result in a lot of sets; and lower opening bids have more future chances to convey information than higher opening bids; so we design bidding systems so that the lower openings are much more frequent than 7NT openings are.
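All the figures above can be reproduced with a few lines; here's a sketch (the 56.2%/43.8% split is taken from the text as given):

    import math

    N = math.comb(52, 13)          # 635,013,559,600 possible hands
    TOTAL = math.log2(N)           # ~39.208 bits to pin down one hand

    def remaining_bits(class_sizes):
        """Expected bits still needed after a call that partitions the hands
        into classes; equals TOTAL minus the entropy of the partition."""
        gained = -sum((k / N) * math.log2(k / N) for k in class_sizes)
        return TOTAL - gained

    print(remaining_bits([N]))                      # always pass: ~39.208
    print(remaining_bits([1, N - 1]))               # 1C on one exact hand: ~39.20799644213
    print(remaining_bits([0.562 * N, 0.438 * N]))   # pass 0-10 / open 11+: ~38.219
    sizes = [N // 36 + 1] * 16 + [N // 36] * 20     # 36 near-equal classes
    print(remaining_bits(sizes))                    # Pass through 7NT: ~34.038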
#6
Posted 2016-June-23, 10:30
Entropy??? Which of us is capable of thinking in terms of logs at the table?
entropy: H(X) = -Σ p_i log(p_i)
Use variance instead. Or its square root, the standard deviation: flat patterns have a low std dev; skewed patterns have a high one.
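For example (a sketch using Python's statistics module; pstdev treats the four suit lengths as the whole population):

    import statistics

    # Standard deviation of suit lengths as a table-friendly measure of shape.
    for pattern in [(4, 3, 3, 3), (5, 4, 2, 2), (7, 3, 2, 1)]:
        print(pattern, round(statistics.pstdev(pattern), 2))
    # (4, 3, 3, 3) 0.43  -- flat: low std dev
    # (5, 4, 2, 2) 1.3
    # (7, 3, 2, 1) 2.28  -- skewed: high std dev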
#7
Posted 2016-June-24, 09:42
Ah, Shannon theory, my favourite topic (even though it's been 20 years since I read the papers).
As others have said, entropy is an expression of how much you know or can predict about the original [edit: or future] thing from the information you have. English (and most natural languages) is fairly low-entropy; there's about one bit of entropy per letter of English text (more if it's a string of random words, less if it's a speech, even less if it's a speech on a known topic). You can tell that from the games of "can you read this? Only the first and last letter of each word is correct, the rest are scrambled" on the web, or similarly from the ability to follow bridge spoken in French (or another language). Note that low entropy in natural systems of communication is a Good Thing; without the redundancy and predictability, communication in a noisy environment (or with people with an accent, or over another discussion, or...) would be effectively impossible.
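You can see part of that directly: letter frequencies alone give English roughly 4 bits per letter, and it's the extra context (spelling, grammar, topic) that pulls the figure down toward 1. A sketch, using whatever sample text you like:

    from collections import Counter
    import math

    sample = "the quick brown fox jumps over the lazy dog while the band plays on"
    counts = Counter(c for c in sample.lower() if c.isalpha())
    total = sum(counts.values())
    h = -sum((n / total) * math.log2(n / total) for n in counts.values())
    print(round(h, 2))   # ~4 bits/letter from frequencies alone; context does the rest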
For information transmission, the goal is to minimize the entropy of the message so it is understandable even over a noisy channel. A competing goal is to minimize the size of the transmission / maximize its speed - and an entire department of Electrical Engineering is dedicated to discovering new ways to do this and optimizing the current ones.
For security, the goal is to maximize the entropy of the message (often, but not always, with a competing goal of minimizing the chance of being unable to recover it), so that even if the attacker has complete knowledge of all the old messages, new messages are almost as difficult to decrypt as before. The difference between pseudorandomness suitable for simulation and for crypto purposes is this matter of predictability: for a simulation, as long as the numbers come out in the correct distribution and streak lengths, you're basically good; for crypto, if knowledge of X bits of the output stream lets you predict the next ones (or if the same stream comes out whenever you trigger a "reset"), you might as well be sending cleartext. This is why a record of my postings on hand generation reads like a crypto class (viz. "you can't just use 'rand()'", "you can't use functions to 'improve' the randomness of a 32-bit seed", "just generate enough randomness and 'look it up' in the Big Book", and "it's probably worth (for world championships, at least) buying a source of true randomness and not having to worry about bad crypto-PRNGs").
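In Python, that distinction is the difference between the random module (a Mersenne Twister: ideal for simulation, but completely reproducible from its seed) and secrets/SystemRandom (which draws from the OS entropy source):

    import random
    import secrets

    deck = list(range(52))

    # Fine for simulation: good distribution, but a 32-bit seed gives only
    # 2^32 possible shuffles -- nowhere near the ~2^226 orderings of a deck.
    random.Random(123456789).shuffle(deck)

    # For anything adversarial, use the OS's entropy source instead.
    secrets.SystemRandom().shuffle(deck)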
For bridge (which I realize is all you care about :-)), the goal is to build a system that provides enough information to partner to make intelligent decisions while minimizing leakage of non-useful information that helps the opponents' defence; similarly, when you don't have the cards, to use "noise" (high-entropy calls, taking away bidding room) to disrupt the opponents' communication. The concepts of information theory (entropy, resilience to noise, information leakage) are a useful lens through which to view what you're doing.
#8
Posted 2016-June-24, 11:03
I guess I'm interested in the terminology to recognize and describe good vs poor system design.
As an example, consider the "Impossible Negative," where 1C-1D shows 0-7 or any 8+ (4441) hand. I think one would say that this is higher entropy than just 0-7.
But I haven't found a word or phrase to describe the unfortunate occurrence of, say, 1C-1D, 2S-3N, where 2S is forcing with spades and 3N identifies the 8+ (1444) hand. I think it's unfortunate anyway because the auction has gotten rather high rather fast. Opener presumed incorrectly that he was dealing with 0-7; had he known, he might have rebid 1S, and then partner could have bid 2N to show the same hand. In a sense, one might say that the decision tree for this auction starts at 3N (after the 2S rebid) or at 2N (after a 1S rebid).
Starting a decision tree early seems like it should be a design goal.
I have the same problem with my system (an adaptation of IMprecision) where 1C-1D, 3D is GF with long diamonds (this isn't an IMprecision sequence, btw). The 1D is 0-4 or 9+, and if opener were aware that pard had 9+ he'd simply rebid 2D and allow his hand to be relayed... but 2D isn't forcing, so opener sometimes has to make a false presumption that responder has the weak hand, which is suboptimal when he doesn't. "Proceeding on a false premise"? "Jumping the gun"? "Misdirection"? "Wrong forking"? Maybe of interest to no one, but it would be nice to have a term for taking a turn one wouldn't take if one had known it to be unnecessary or counterproductive. We could then say "this or that system has too much of [fill in the blank] going on."
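One way to put numbers on the "higher entropy" intuition from earlier in the thread (all frequencies invented for illustration): if the hands consistent with a call are assumed equally likely, the residual uncertainty after hearing it is just log2 of how many there are.

    import math

    N = math.comb(52, 13)

    def residual_bits(fraction_of_hands):
        """Bits still unknown after a call consistent with this fraction of all
        hands, assuming those hands are equally likely."""
        return math.log2(fraction_of_hands * N)

    # Invented frequencies: say 45% of hands are 0-7, and 3% are 8+ with 4441.
    print(residual_bits(0.45))          # 1D = 0-7 only:          ~38.06 bits
    print(residual_bits(0.45 + 0.03))   # 1D = 0-7 or 8+ (4441):  ~38.15 bits

The two-meaning 1D leaves opener about 0.09 bits more in the dark, which is the "higher entropy" being described; the design question is whether the bidding room saved elsewhere pays for it.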