| Parsimony beats out accuracy |
[Nov. 21st, 2007|11:05 pm] |
Say that in reality, E(Y | X1, X2) = beta1 * X1 + beta2 * X2 but that beta2 is small in magnitude. Then we may get a better estimate of the error variance by assuming the simpler model E(Y | X1, X2) = beta1 * X1. ( Read more... )
For which values of sigma^2 does the "unbiased" estimator have lower MSE than the naive estimator?
> toeplitz(0:3)
[,1] [,2] [,3] [,4]
[1,] 0 1 2 3
[2,] 1 0 1 2
[3,] 2 1 0 1
[4,] 3 2 1 0
...
1/(1*2) = 1/2. Let S_n = 1/(1*2) + 1/(2*3) + ... + 1/(n*(n+1)). Suppose that S_n = n/(n+1). Then
S_(n+1) = S_n + 1/((n+1)*(n+2))
= n/(n+1) + 1/((n+1)*(n+2))
= (n^2 + 2n + 1)/((n+1)(n+2))
= (n + 1)/(n + 2).
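The induction above is easy to sanity-check numerically. Here is a minimal sketch in Python using exact fractions; the function name `partial_sum` is made up for illustration.

```python
from fractions import Fraction

def partial_sum(n):
    """Return S_n = 1/(1*2) + 1/(2*3) + ... + 1/(n*(n+1)) as an exact fraction."""
    return sum(Fraction(1, k * (k + 1)) for k in range(1, n + 1))

# Compare the direct sum against the closed form n/(n+1) for small n.
for n in range(1, 11):
    assert partial_sum(n) == Fraction(n, n + 1)
```

This only checks finitely many cases, so it complements the proof rather than replacing it.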
Which of these worlds is better? X or one exactly the same as X (over all time) except that it contains one extra moment of happiness. (What if we learn that the extra moment of happiness belongs to a murderer just finishing the act? He dies a moment later cutting off all external effects of the difference in feeling.)
If a candidate grammar is impossible for human brains to implement (even approximately), then we can say that it's not the universal grammar underlying human language. (Right?) Can we say something similar for ethical systems?
If a custom is learned, does that mean that given infinite time, we could describe it exactly in writing? The channel doesn't have infinite capacity. (Right?)
What would have to be true about morality for the project of searching for a definition of right action to be not completely hopeless? What would a non-hopeless search look like? |
|
|
| Variation on the horse joke |
[Nov. 20th, 2007|11:55 am] |
An Imam was selling his horse in the market. An interested buyer came to him and requested if he could get a test drive. The Imam told the man that this horse is unique. In order to make it walk, you have to say Subhanallah. To make it run, you have to say Alhamdulillah and to make it stop, you have to say Allahu Akbar. The man sat on the horse and said Subhanallah. The horse started to walk. Then he said Alhamdulillah and it started to run. He kept saying Alhamdulillah and the horse started running faster and faster. All of a sudden the man noticed that the horse is running towards the edge of the hill that he was riding on. Being overly fearful, he forgot how to stop the horse. He kept saying all these words out of confusion. When the horse was just near the edge, he remembered Allahu Akbar and said it out loud. The horse stopped just one step away from the edge. The man took a deep breath, looked up towards the sky and said Alhamdulillah! (Source)
---
Subhanallah -- Glory to God/wow!
Alhamdulillah -- Praise be to Allah
Allahu Akbar -- God is great
|
|
|
| interactions, hypothesis testing, logarithms |
[Oct. 27th, 2007|11:53 am] |
Going against some advice from Gelman, I've written up a few notes for some students I'm tutoring. I'm interested in hearing any comments you have on them. One benefit he doesn't count, though, is the benefit to myself: at least I feel better prepared to explain the material both to myself (e.g., for the prelims) and to others (e.g., students and interviewers). I will try to take Gelman's points to heart though. And now it's back to C and statistics homework for me.
( Read more... ) |
|
|
| |
[Oct. 19th, 2007|03:27 pm] |
These are fun data: Downloading RIDGE Radar Images (Looks like the rain will clear up in a bit at least long enough for me to get home.) <-- I started writing this yesterday. I did avoid getting wet. (Mostly.)
AMS Books Online
Hey, it's sunny now.
/* Why you can't use {..} in place of do { ... } while(0) in C:
#define foo() do { stuff(); } while(0)
if (blahBlahBlah)
foo();
else
whatever();
==>
if (blahBlahBlah)
do { stuff(); } while(0);
else
whatever();
No problema
#define foo() { stuff(); }
...
==>
if (blahBlahBlah)
{ stuff(); };
else
whatever();
Parse error before 'else' -- how do you say that in Spanish?
How much code if any would it break to make {...} exactly equivalent to do { ... } while(0) and so avoid the garbage of depending on the compiler to optimize away the loop? I'm guessing none, but maybe you can find an example? I guess though one should really just be using inline functions in place of macros. (But we should all be using Haskell now anyway, right?) */ |
|
|
| |
[Oct. 9th, 2007|08:05 pm] |
The survreg function and related docs are not very explicit about which parameterization is being used for the Weibull. I guess we can work it out from the comment about the relation with the exponential, but I don't see that they've given their parameterization for that either.
rweibull does give its distribution, however: the cdf has the form 1 - exp(-(x/b)^a). So, we can generate variates from this and look at the estimates survreg gives back to get some idea of what's happening.
> y <- rweibull(10000, shape=4, scale=2)
> survreg(Surv(y, rep(1, 10000)) ~ 1)
...
Coefficients:
(Intercept)
0.6931954
Scale= 0.2501006
...
In our notes, we parameterize the Weibull to have cdf 1 - exp(-theta * t^a). Then theta = b^(-a), and b = theta^(-1/a). From the example code, we expect a = 1/Scale and theta = exp(-Intercept/Scale) = exp(Intercept)^(-a). ( plop, plop, fizz, fizz )
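To check the conversion against the numbers above, here is a small Python sketch. It assumes the relations just worked out (a = 1/Scale, theta = exp(-Intercept/Scale)) plus the implied b = exp(Intercept); the helper name `survreg_to_weibull` is hypothetical and not part of any package.

```python
import math

def survreg_to_weibull(intercept, scale):
    """Convert survreg's (Intercept, Scale) to Weibull shape a, scale b, and theta.

    Assumes a = 1/Scale, b = exp(Intercept), theta = b^(-a) = exp(-Intercept/Scale).
    """
    a = 1.0 / scale
    b = math.exp(intercept)
    theta = math.exp(-intercept / scale)
    return a, b, theta

# Estimates printed by survreg in the example above (true shape=4, scale=2).
a, b, theta = survreg_to_weibull(0.6931954, 0.2501006)
print(a, b, theta)  # a should land near 4, b near 2, theta near 2**-4
```

If the guessed parameterization were wrong, a and b would not come back close to the shape and scale passed to rweibull.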
//Regression Models and Life-Tables (Cox 1972) |
|
|
| Hurray! |
[Oct. 7th, 2007|09:40 pm] |
data set1;
input x y;
datalines;
1 11
2 22
3 33
;
data set2;
input x y;
datalines;
4 44
5 55
;
proc append base=set1 data=set2;
proc print data=set1; run;
State of the Art?! Returning a value using a global variable: (Here I'm following the code given in Return Code From Macro; Passing Parameter By Reference [PDF], although the &&& comes from my own experimentation and is still a bit of a mystery to me. (I could vaguely understand a && -- in my mental model, &x evaluates to the value associated with the name of the literal "x" and perhaps "global" variables then are just a hack based on passing in the name -- I think the text of the article says this -- but then why wouldn't one extra & be enough?)) ( &&&?!!!! ) And a fun note: Your life probably depends to some degree on the correctness of code like this. |
|
|
| Interesting |
[Sep. 30th, 2007|09:23 am] |
A lot of products seem to be switching to using interesterified oils/fats now that trans fats are required to be listed on the nutritional label. It sounds like interesterified fats may be just as bad though. I write this after eating some interesterified crackers.
Blogs mentioned in the description of the Data Analyst job at Facebook: Data Mining: Text Mining, Visualization, and Social Media. Many-to-Many Related: BlogPulse. According to the job posting, the "major internet audience measurement firms" are: comScore Media Metrix, Nielsen//NetRatings, Hitwise ("They realized that by working with ISP networks, (instead of recruiting individual panel members) it would be possible to anonymously monitor more people, and therefore report on more websites." -- so they pay off ISPs to give them lists of the URLs accessed by users or what?), Quantcast (uses in part similar arrangements to Hitwise?), Alexa, Compete (seems to agree with Facebook on which firms are most important).
(WeatherBug has a chief marketing officer?! I could've sworn the guy who created it was sitting in the back of a bus with us just a few years ago in California. Was it implemented in CrossBasic?) |
|
|
| Maxima bug??? |
[Sep. 29th, 2007|10:41 pm] |
Since glancing through Casella & Berger's chapter on using computer algebra systems for statistics, I've resolved to use such in my own homework. In particular I've been looking at Maxima. But this doesn't seem right: ( Read more... ) |
|
|
| |
[Sep. 23rd, 2007|01:28 pm] |
If your software provides an interface for creating content but requires login credentials to save that work, if it finds someone trying to save work without the needed login credentials, it should not throw out the uncredentialed work when it asks for login information. In fact, it's blindingly obvious that it's inexcusable to do so. There is nothing more precious than user-created content on a computer, and systems not built around this deserve to be sidelined for systems that are.
--posted using a system that doesn't follow this |
|
|
| |
[Sep. 22nd, 2007|02:46 am] |
"[D]escription and inference are kept separate beginning with the first sentence on page one. It is my opinion that the fusing of the two is the number one cause of confusion among statistics students." This fragment appears in the introduction to Modeling with Data. I wonder if this is a coherent statement. Doesn't description offer an inference about how the world is? I guess I should be more charitable though, and read on (as time allows) to see what he means.
( Read more... )
//Handbook of Mathematical Functions by Abramowitz and Stegun
//For some reason, Gmail just showed me this: Tips on Buying Bras for Men -- the Bro lives! |
|
|
| C -- making us less safe by trying to make us safer |
[Sep. 10th, 2007|03:34 pm] |
Golly, am I tired.
C justifiably blocks implicit conversions from X** to const X**, but also blocks X** ==> const X* const*. C++ allows this. I wonder why they haven't fixed it in newer C compilers. (Doesn't seem like it could break anything, and I guess many people respond to the situation by not using const at all.) gcc 4.0.1 still doesn't allow the safe conversion at least, perhaps leading people to do things like the following:
#include <stdio.h>

void print(int n, char* const* a)
{
    int i;
    const char* z = "Well, that is a surprise!";
    for (i = 0; i != n; ++i) {
        char* b = a[i];
        puts(a[i]);
        while (*b && *z) {
            *(b++) = *(z++);
        }
    }
}

int main(int numArgs, char** args)
{
    print(numArgs, args);
    print(numArgs, args);
    return 0;
} |
|
|
| The world needs another slipshod implementation of file fetching |
[Sep. 6th, 2007|04:43 pm] |
( Python source code )
//I think this entry needs more code so here's some R code to do logistic regression on a toy problem:
> x <- 1:100
> y <- rbinom(100, 100, (100:1)/101 + 1/102)
> f <- glm(cbind(y, 100-y) ~ x, family=binomial(link=logit))
> plot(f$fitted.values ~ x)
//GETTING STARTED WITH PROC LOGISTIC (This is the first day of the rest of your life.) Regression with SAS |
|
|
| |
[Sep. 1st, 2007|11:13 pm] |
Mime: Basics for Beginners -- Gold!:
The Statue: Try standing among real mannikins in a store. Choose an uncrowded place. Your aim is not to draw a large crowd for a show. The point is to make people wonder whether you are real or not. Assume a comfortable pose that you can hold for a long time. Use diaphragmatic breathing. Do not move.
When you tire, abruptly change position, staying in character. Avoid people who may try to touch you. Raise your hand and motion, no, no! If they persist, frown and push away their hands. Some people will try to make you smile. This harmless activity can be countered by focusing on a distant object or concentrating on a sad event. It works!
The discovery of ifndef X_H \ #define X_H \ ... #endif ("Inclusion Sandwich"): ( Read more... ) |
|
|
| |
[Aug. 26th, 2007|11:35 am] |
Yogy posed this problem a while back: Find the shortest subsequence y[i:j] of a given sequence y in which every element of a given set s appears.
I guess the first thing to do is just to come up with any solution at all. (Or should it be to look at similar problems and their solutions? Well, I don't have enough in mind anyway.) Let d = infinity and best = -1. For each index i in the sequence, we can set up a hash h = {} and a counter t = |s|. Then we can scan forward with j = i, i+1, ..., checking in constant time at each step whether y[j] is in s and then whether it is already in h. If it is in s but not yet in h, we can add y[j] to h and decrement t. When t reaches 0, we have found the shortest full-set subsequence starting at i. Then if j-i+1 < d, let d = j-i+1 and best = i. Clearly this is O(n^2).
Well, I hoped to put an orderly progression of thoughts here, but really I've proceeded more in a jumble of looking at a few trivial examples (abababc), wondering about building up doubly-linked lists to jump between occurrences of letters, wondering what good that would be, thinking about building up running counts of letters, thinking about building up tables of most recent occurrences, wondering what would be true of such a table if it were at the best match, then considering the following:
Let h be a hash such that h[x]=-infinity for x in s, d = infinity, and best = -1. Now for each index i = 1:n in y, if y[i] is in s, let h[y[i]] = i. Now compute m = min{h[x] : x in s}. If i - m + 1 < d, then let d = i - m + 1 and best = i. If we don't do anything fancy in computing the minimum, this runs in O(n*|s|).
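The last-occurrence idea above can be sketched in Python as follows. This is an illustrative version, not a tuned one: it uses 0-based indices, assumes s is non-empty, and returns the (start, end) pair rather than only the end index stored in best. The function name is made up.

```python
import math

def shortest_cover_last_seen(y, s):
    """Shortest stretch of y containing every element of s, via last-seen indices.

    Returns (start, end) as inclusive 0-based indices, or None if no stretch exists.
    Assumes s is non-empty. Runs in O(n * |s|) because of the min over s.
    """
    last = {x: -math.inf for x in s}  # h in the text: most recent index of each x
    d = math.inf                      # length of the best stretch so far
    best = None
    for i, v in enumerate(y):
        if v in last:
            last[v] = i
        m = min(last.values())        # earliest of the most recent occurrences
        if i - m + 1 < d:             # never true while some element is unseen
            d = i - m + 1
            best = (m, i)
    return best

print(shortest_cover_last_seen("abababc", set("abc")))  # (4, 6), i.e. "abc"
```

While any element of s is still unseen, m is -infinity, so i - m + 1 is infinite and the comparison fails, which matches the initialization in the text.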
Things to do again: Consider various data structures and what would be true if they corresponded to a solution. That is, look for conditions equivalent to what is gesucht -- how do you say this in English? "Searched for"? Sought!
Can you do it in O(n)?
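One standard way to get O(n) (expected, given constant-time hashing) is a two-pointer sliding window: grow the window on the right until it covers s, then shrink it from the left while it still does. The sketch below is one possible answer, not necessarily the intended one; the function name and the choice to return None for an empty s are my own assumptions.

```python
from collections import Counter

def shortest_cover_window(y, s):
    """Shortest window of y containing every element of s, by a sliding window.

    Returns (start, end) as inclusive 0-based indices, or None if no window exists.
    """
    s = set(s)
    if not s:
        return None         # assumption: an empty target set has no meaningful answer
    counts = Counter()      # occurrences of each element of s inside the window
    missing = len(s)        # elements of s not yet inside the window
    best = None
    left = 0
    for right, v in enumerate(y):
        if v in s:
            if counts[v] == 0:
                missing -= 1
            counts[v] += 1
        # Shrink from the left while the window still covers s.
        while missing == 0:
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            u = y[left]
            if u in s:
                counts[u] -= 1
                if counts[u] == 0:
                    missing += 1
            left += 1
    return best

print(shortest_cover_window("abababc", "abc"))  # (4, 6)
```

Each index enters the window once and leaves it at most once, so the two loops together do O(n) work.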
//Mark Twain: Science vs Luck |
|
|
| Sick hedgehog |
[Jun. 9th, 2007|09:18 am] |
Last week, we saw a sick hedgehog sitting in someone's driveway. Normally they're extremely skittish, but this one barely moved as we approached.
---
I don't enjoy seeing Paris Hilton suffer. Anyone else out there feel the same way? |
|
|
| |
[Jun. 5th, 2007|05:16 pm] |
I'm so tired, my eyes feel like rotten oranges. I want to rip them out.
I have just learned two new words. fabelhaft -- fabulous and der Anmachspruch -- pick up line. (anmachen means to prepare and X anmachen means to chat up X.) Germany is home to a shameless ripoff of Facebook called StudiVZ. One of the groups on it is named "Lach doch mal" ist kein wirklich toller Anmachspruch...., which I guess means something like '"Laugh already" isn't a really great pick up line.' ( Read more... )
The sun is shining. The field's been mowed. I'm going for a walk!
//das Ausrufezeichen -- exclamation point!, das Dasein -- existence/being -- what a beautiful word Ausrufezeichen!, vorziehen -- to choose, die Ueberzeugung -- belief/conviction, sich einbilden -- to imagine/fancy |
|
|
| |
[Jun. 3rd, 2007|10:10 pm] |
I really need to start using more change. Next to me is a stack of 9 two euro coins (1999, 2000, 2000, 2000, 2002, 2002, 2002, 2002, 2002). At the corner of the desk is a jar almost filled to the top with smaller coins. (I pick one coin from the top of it without looking -- a coppery five cent piece minted in 2002.)
New words for the day: der Stinkefinger -- middle finger: Er zeigt mir den Stinkefinger -- he flipped me off; bekloppt -- nuts, beschwipst -- tipsy, das Anliegen -- request, abartig -- kinky/abnormal, der Schrott -- crap/junk, auf dem Holzweg sein -- to be on the wrong track
At all times when kissing and such like things are begun, the woman should give a reply with a hissing sound. (Kama Sutra, Ch. VII)
[P]robability is simply an interpretation of the parameters which appear implicitly and directly as the characteristic elements of an admissible decision. (de Finetti, Probability, Induction and Statistics, p. 11)
//Complaints: The highlight color in Preview is not dark enough to make a match stand out from surrounding text, and there's no obvious way to set this. Since some update, Firefox hasn't shut down normally a single time -- it always hangs, forcing me to force quit it. I have never once had a use for selecting and dragging a link or a tab. There is no obvious way to disable this feature. The cursor blinks in textfields even when they don't have focus. Selected form elements do not show a highlight ring.
Say P(X=0.25) = 0.5 = P(X=0.75). Say P(Y=1|X)=X.
E(X) = 0.5 * (0.25 + 0.75) = 0.5. So, Var(X) = 0.5 * (2 * 0.25^2) = 0.0625
Say we observe Y = 0.
P(X = 0.25 | Y = 0) = P(Y = 0 | X = 0.25) * P(X = 0.25) / P(Y = 0)
= 0.75 * 0.5 / (P(X=0.25)*P(Y = 0 | X=0.25) + P(X = 0.75) * P(Y = 0 | X = 0.75))
= 0.75 / (0.75 + 0.25) = 0.75
Similarly P(X = 0.75 | Y = 0) = 0.25... Hmm. Is that right?
Then E(X | Y = 0) = 0.25 * 0.75 + 0.75 * 0.25 = 0.375.
And Var(X | Y = 0) = 0.75 * (0.25 - 0.375)^2 + 0.25 * (0.75 - 0.375)^2 = 0.046875. |
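The arithmetic above can be double-checked with exact fractions. A minimal Python sketch follows; the names `prior`, `posterior`, and `moments` are just illustrative.

```python
from fractions import Fraction as F

# Prior on X, with likelihood P(Y=1 | X) = X as stated above.
prior = {F(1, 4): F(1, 2), F(3, 4): F(1, 2)}

def moments(dist):
    """Return (mean, variance) of a discrete distribution {value: probability}."""
    mean = sum(x * p for x, p in dist.items())
    var = sum(p * (x - mean) ** 2 for x, p in dist.items())
    return mean, var

# Posterior after observing Y = 0: weight each x by P(Y=0 | X=x) = 1 - x.
joint = {x: p * (1 - x) for x, p in prior.items()}
evidence = sum(joint.values())
posterior = {x: w / evidence for x, w in joint.items()}

print(moments(prior))      # (Fraction(1, 2), Fraction(1, 16))
print(moments(posterior))  # (Fraction(3, 8), Fraction(3, 64))
```

Here 1/16 = 0.0625, 3/8 = 0.375 and 3/64 = 0.046875, so the numbers agree, and observing Y = 0 does shrink the variance of X.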
|
|