The Home Stretch

I will be busy doing schoolwork the next few weeks: my Statistical Consulting class will have a “consultation memo” due Monday, which is simply a short summary of an actual statistical consult we students sat in on this past Monday. And on Tuesday I have to give a presentation for my Case Studies in Bioinformatics class. There are only two students in this class, and each of us needs to present an analysis of some 2D protein gel electrophoresis data.

There will be a few more hurdles, but I am rapidly approaching the end of my studies in the Master’s program. On Monday, I will be one of the two designated “head students” for the consult: when we sit in on an actual statistical consultation, another student and I will sit closest to the participants, and each of us will be responsible for giving a short presentation on the consultation the following Thursday.

On December 8, I need to give a presentation discussing/critiquing this paper; the teacher (who happens to be the head of the department) also requires a 3-5 page write-up. And December 14 is the deadline for a take-home exam for the Consulting class.

And of course there’s the big T. (Working title: Non-negative Matrix Factorization: Assessing Methods for Normalization and for Estimating the Number of Components.) In about a week, I must submit an advance copy of my thesis to my thesis committee; this means that the document must be in a presentable form by that time. (Not to worry, I think everything is falling into place.) Because of a departmental requirement, I have written my thesis in LaTeX. And on December 10, I need to defend the thesis, which means I need to compose a PowerPoint presentation for that day.

Whew! It seems like a lot. But I’m really coming into the home stretch here.


Machines Versus Biologics

A machine that eats organic creatures; there’s something unsettling, disturbing, wrong about that.

It reminds me of one of my favorite Stephen King short stories, The Mangler. To my delight, they have made a movie based on this short story, and there’s even a sequel. I haven’t seen either movie yet, but they are in my Netflix queue. (As an aside, the Netflix Prize may have been won! Via MetaFilter.)

I just remembered — some years ago, there was a report about a robot that eats slugs, and uses the energy from the slugs to power itself. It was called the SlugBot. Maybe they can revamp one of those robot lawn mowers so that it is powered by its own grass clippings; it would be a sort of robot cow, so maybe they could paint it with the “cow spot” pattern.

Man versus machine is a recurring theme in science fiction. SF author Gregory Benford wrote a sequence of books called The Galactic Center saga; I think there are seven books in the series. In Benford’s universe, there is an epic galactic war between all mechanical life (the Mechs) and all biological life, spanning thousands of years. In the Matrix movies, you have the sinister machines that enslave humans, using them as a source of energy, as if they were living batteries. The Matrix scenario was very reminiscent of a short story by Dean R. Koontz entitled Wake Up To Thunder which I read back in the 80’s, in an anthology of SF short stories; here, enslaved humans were used for computational power, which seems more plausible than using humans as a source of energy (but I note that we already have robots that use flies and slugs as sources of energy!). SF author Dan Simmons’ Hyperion series also pits machines (the TechnoCore) against humans; I might be mis-remembering, but I think the machines used humans for computational power every time humans used teleportation technology that the machines provided. In Battlestar Galactica, you’ve got the Cylons. And of course, in the Terminator movies there’s Skynet.

Addendum (07/16/09): Biomass-Eating Military Robot Is a Vegetarian, Company Says (via MetaFilter)

Addendum (07/17/09): Company Denies its Robots Feed on the Dead (via FARK)

Peanut Butter, Bad and Good

King Nut recalls salmonella-tainted peanut butter.

Reminds me of the fact that William F. Buckley, Jr. was a big fan of peanut butter; his favored brand was Red Wing. Mr. Buckley was once challenged to write an Ode to Peanut Butter, but he declined. Still, J.M. La Salla of Cheshire, CT, tried his hand at it, with a favorable response from Mr. Buckley.

According to this, Red Wing is now called Carriage House.

As an aside, the first talk in the Bio3 Seminar Series at Georgetown University was on multivariate methods of analysis, which were demonstrated on microarray data of wild type and a mutant corA strain of Salmonella typhimurium.

Centaur vs. Two-Handed Sword

For B.M.

According to the Monster Manual, 3rd ed. (Gary Gygax; Lake Geneva, WI:TSR Hobbies, Inc., 1978), p. 14, centaurs are armor class 5 (leaders are AC 4) and have 4 hit dice. (For the record, they also have 2 attacks per melee round, doing 1-6 damage with their hooves with one attack and a variable amount of damage with the other attack, depending on the human weapon they’re wielding.) Importantly, the size is listed as Large.

So the expected value of the number of hit points of a centaur is E{HP} = E{4*1D8} = 4*E{1D8} = 4*4.5 = 18.

According to the Player’s Handbook (Gary Gygax; Lake Geneva, WI:TSR Hobbies, Inc., 1978), p. 38, a non-magical two-handed sword does 3-18 hit points of damage against large creatures, meaning the expected value (conditional on your having actually hit) is E{3*1D6} = 3*E{1D6} = 3*3.5 = 10.5. So on average you’d need to connect with a two-handed sword about twice before the average centaur buys it.

However, you’re not guaranteed to connect every time you swing. So let’s estimate how many times you’d need to swing that sword to defeat the centaur. On page 38 of the Player’s Handbook, it says that two-handed swords have a +2 Armor Class Adjustment against both AC 4 and 5. And on page 74 of the Dungeon Master’s Guide (Gary Gygax; Lake Geneva, WI:TSR Hobbies, Inc., 1979), it says that a first-level fighter needs a 15 (16) to hit AC 5 (4). With the +2 Armor Class Adjustment of the two-handed sword against AC 5 (4), this means that he/she’d need “only” a 13 (14). So every time he/she swings that sword, the expected damage against AC 5 is ((20-13+1)/20)*10.5 = 4.2 (3.675 for AC 4). This means that, on average, a first-level fighter would need to swing 18 / 4.2 ~ 4 times to kill an average non-leader centaur (and about 5 times for an average leader centaur).

But maybe it’s not fair to have a 1st-level fighter (“Veteran”) take on a centaur. Perhaps we should instead match a 4th-level fighter (“Hero”) against the centaur. OK, going through the same computations for a 4th-level fighter, I compute that the expected damage against AC 5 every time he/she swings that two-handed sword is ((20-11+1)/20)*10.5 = 5.25 (4.725 for AC 4), meaning he/she’d need to swing about 3 times (okay, 3.43) before an average non-leader centaur bites the dust (3.81, closer to 4 times for an average centaur leader).
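The arithmetic above can be reproduced in a few lines. The sketch below uses Python rather than MATLAB, purely as an illustration; the function and variable names are made up for this example, and the to-hit numbers are the ones quoted in the text (the 12 for a 4th-level fighter against AC 4 is implied by the 4.725 figure rather than stated outright).

```python
from fractions import Fraction

def expected_die(sides):
    """Expected value of one fair die with the given number of sides."""
    return Fraction(sides + 1, 2)

def hit_probability(needed_roll):
    """Chance that a d20 roll meets or beats needed_roll (valid for 1..20)."""
    return Fraction(20 - needed_roll + 1, 20)

centaur_hp = 4 * expected_die(8)     # 4 hit dice of d8 -> 18
sword_damage = 3 * expected_die(6)   # 3-18 vs. large creatures -> 10.5

# To-hit numbers after the +2 weapon adjustment, as quoted in the post.
needed = {
    ("1st-level", "AC 5"): 13,
    ("1st-level", "AC 4"): 14,
    ("4th-level", "AC 5"): 11,
    ("4th-level", "AC 4"): 12,   # implied by the quoted 4.725
}

results = {}
for key, roll in needed.items():
    per_swing = hit_probability(roll) * sword_damage
    # Store (expected damage per swing, expected HP / damage per swing).
    results[key] = (float(per_swing), float(centaur_hp / per_swing))

for (fighter, ac), (per_swing, swings) in results.items():
    print(f"{fighter} vs {ac}: {per_swing:.3f} damage/swing, {swings:.2f} swings")
```

Note that the second number is a ratio of two averages, which is what the post computes; it is not quite the same thing as the average number of swings in an actual fight.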

Now, I don’t remember centaurs being quite this tough in Rogue. In Rogue, I thought that one or two hits would do. I think the rules were different in Rogue.

If I’ve made any mistakes in my computations, please let me know. But I have used the power of MATLAB (v. 7.4.0 R2007a, Student Version) to do these calculations, so they must be correct. 🙂

I knew there was a reason I’m studying to get a Master’s degree in biostatistics. If you studied statistics, you too could estimate fairly precisely whether you could take on that centaur. (Wait a sec, that word “precisely” bothers me. Maybe I should compute 95% confidence intervals…) You’d need to take your Panasonic Toughbook with you on your dungeon campaign, and be sure you’ve got MATLAB installed.

Future project: write a program that takes player character attributes (race, class, level, weapon, etc.) and monster attributes (hit dice, armor class, number of attacks, damage per attack, etc.) as input, and gives as output estimates for the outcome of combat (whether you’d win/lose, how many turns melee would last, damage dealt, etc.), with confidence intervals as appropriate. Actually, I bet somebody has already written this program.
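A very stripped-down version of that program might look like the Python sketch below. It is only a Monte Carlo toy under loudly stated assumptions: the centaur never fights back, hit points are rolled as 4d8, damage as 3d6, a d20 roll at or above the needed number always hits, and there are no special rules for natural 1s or 20s. All names are hypothetical. The interval it reports is the central 95% of simulated fights, not a formal confidence interval.

```python
import random
import statistics

def swings_to_kill(needed_roll, rng):
    """Simulate one fight; return the number of swings until the centaur drops.

    Assumptions (not rules from the books): the centaur does not attack,
    and a d20 roll >= needed_roll is always a hit.
    """
    hp = sum(rng.randint(1, 8) for _ in range(4))            # 4d8 hit points
    swings = 0
    while hp > 0:
        swings += 1
        if rng.randint(1, 20) >= needed_roll:
            hp -= sum(rng.randint(1, 6) for _ in range(3))   # 3d6 damage
    return swings

def summarize(needed_roll, trials=20_000, seed=0):
    """Return (mean swings, 2.5th percentile, 97.5th percentile)."""
    rng = random.Random(seed)
    counts = sorted(swings_to_kill(needed_roll, rng) for _ in range(trials))
    low = counts[int(0.025 * trials)]
    high = counts[int(0.975 * trials) - 1]
    return statistics.mean(counts), low, high

# 13 is the to-hit number quoted for a 1st-level fighter against AC 5.
mean_swings, low, high = summarize(needed_roll=13)
print(f"mean swings: {mean_swings:.2f}, central 95% of fights: {low} to {high}")
```

The simulated mean comes out higher than 18 / 4.2, because the killing blow usually does more damage than the centaur has left, so some of the expected damage per swing is wasted.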

I know. This was another really geeky post.

Published on 14 December 2008 at 7:01 pm

Exploratory vs. Hypothesis-Driven Experiments

On the night of December 9, 2008, I had dinner at the P.R. Grill in Pentagon City with Drs. K.H.K. and W.J.K., as well as with I.K. An animated dinner conversation accompanied the good food. This post is a follow-up to one of the topics covered.

In exploratory analyses or experiments, a great many variables may be correlated against an effect just to see which, if any, might possibly be related to the effect of interest. It is a sort of fishing expedition. Such “data mining” can lead to many false positives (Type I errors).

You may recall that I brought up as an example the case of butter production in Bangladesh. D. Leinweber sifted through a UN data CD ROM and correlated multiple (I do not know exactly how many) indicators with the S&P 500, and found that butter production in Bangladesh was the single best predictor of the S&P 500 (He Who Mines Data May Strike Fool’s Gold, Business Week, 6/16/1997). So why don’t we all follow Bangladeshi butter production to time the stock market?

Here’s a Gedankenexperiment. Imagine you perform 10,000 experiments, and that all of them are just random noise without a true effect or signal. If you do your significance testing with an alpha-level of 0.05, then you’d expect about 5% or 500 of those experiments to appear to be significant at the 0.05 level just by chance alone. (As an aside, the way we do statistics these days is apparently a horrible conflation of Fisher’s significance testing with p-values and Neyman-Pearson hypothesis testing with alpha-levels.) I.e., the data could be telling you that 500 of the experiments were “successes”, when in reality they were just random noise.
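The thought experiment is easy to run for real. The Python sketch below is one way to set it up, with every detail an assumption chosen for illustration: each “experiment” is 30 draws from a standard normal distribution, tested with a two-sided z-test of the (true) null hypothesis that the mean is zero.

```python
import random
from statistics import NormalDist, fmean

def null_experiment(rng, n=30):
    """Two-sided z-test p-value for the mean of n draws from N(0, 1).

    The null hypothesis (mean 0, known standard deviation 1) is true by
    construction, so any "significant" result is a false positive.
    """
    sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
    z = fmean(sample) * n ** 0.5
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))

def count_false_positives(experiments=10_000, alpha=0.05, seed=1):
    """Count how many pure-noise experiments come out 'significant'."""
    rng = random.Random(seed)
    return sum(null_experiment(rng) < alpha for _ in range(experiments))

false_positives = count_false_positives()
print(f"{false_positives} of 10000 pure-noise experiments had p < 0.05")
```

The count should land somewhere near 500, give or take a few dozen from run to run.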

Here’s another thought experiment. Ravens’ Stadium in Baltimore can hold 70,000 people. If the stadium were full and each of the 70,000 people flipped a coin fifteen times, there’s a greater than 88% chance that at least one person would flip fifteen heads in a row. But if this happened, it would be due to chance; it would not be because that particular person is an expert in flipping coins.
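The 88% figure is the chance that at least one of 70,000 people gets heads on all fifteen flips, assuming fair coins and independent flippers. A short Python check (the function name is just for this example):

```python
def chance_at_least_one(people, flips):
    """Probability that at least one of `people` gets heads on every flip."""
    p_single = 0.5 ** flips              # one person: all heads
    return 1.0 - (1.0 - p_single) ** people

prob = chance_at_least_one(people=70_000, flips=15)
print(f"{prob:.4f}")
```

The result is a little over 0.88, even though any one person's chance is only 1 in 32,768.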

So, we need multiple testing procedures to protect ourselves against these errors. For example, there is the famous Bonferroni correction. In functional brain imaging, we use methods based on (Gaussian) random field theory. More recently, we (in functional brain imaging) have started using methods based on controlling the expected false discovery rate (FDR).
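To make the first and last of these concrete, here is a minimal Python sketch of the Bonferroni correction and of the Benjamini-Hochberg step-up procedure, which is one common way of controlling the expected FDR. The p-values in the example are invented for illustration, and this is not meant to represent what any particular imaging package does.

```python
def bonferroni(p_values, alpha=0.05):
    """Reject H0 only for p-values at or below alpha / (number of tests)."""
    threshold = alpha / len(p_values)
    return [p <= threshold for p in p_values]

def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure for the expected FDR."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected

# Hypothetical p-values from six tests.
example = [0.001, 0.008, 0.012, 0.041, 0.20, 0.74]
bonf = bonferroni(example)
bh = benjamini_hochberg(example)
print("Bonferroni rejects:", sum(bonf))
print("Benjamini-Hochberg rejects:", sum(bh))
```

On these made-up numbers Bonferroni rejects two hypotheses and Benjamini-Hochberg rejects three, which illustrates the usual trade-off: FDR control is less conservative, at the price of tolerating a small expected fraction of false discoveries.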

Exploratory analyses can suggest some interesting phenomenon, e.g., that two variables are correlated; so, sometimes they are called “hypothesis-generating experiments”. But then you need to do a hypothesis-driven experiment to really convince everybody that the two variables are indeed correlated, and that they weren’t discovered in a fishing expedition (i.e., that it isn’t merely a case of finding somebody who flipped 15 heads in a row in Ravens’ Stadium). In a hypothesis-driven experiment, you declare at the outset that your hypothesis is that X and Y are correlated, and you design the experiment specifically to test that hypothesis. It’s a surgical strike, rather than a fishing expedition.

So, exploratory analyses are a case where the data may not truly reflect a “real” signal, which was actually your original question. I think that another case, somewhat slippery, is the selection of a significance threshold. The same t-test result might be significant at an alpha level of 0.05 but non-significant at an alpha level of 0.01. So, “significance” depends on the threshold you chose. (Again, I am aware that there are controversies regarding the use of p-values; e.g., see this book.)

Published on 14 December 2008 at 3:17 pm