AMERICAN STATISTICAL ASSOCIATION (ASA)
MEETING OF THE ASA COMMITTEE ON ENERGY STATISTICS WITH THE ENERGY INFORMATION ADMINISTRATION (EIA) OF THE UNITED STATES DEPARTMENT OF ENERGY
Washington, D.C.
Friday, October 21, 2005

BETA COURT REPORTING www.betareporting.com 202-464-2400 800-522-2382

COMMITTEE MEMBERS:
NICOLAS HENGARTNER, Chair, Los Alamos National Laboratory
MARK BERNSTEIN, RAND Corporation
JOHNNY BLAIR, Abt Associates
MARK BURTON, University of Tennessee
JAE EDMONDS, University of Maryland
MOSHE FEDER, Research Triangle Institute
BARBARA FORSYTH, Westat
WALTER HILL, St. Mary's College of Maryland
NAGRAJ NEERCHAL, University of Maryland
THOMAS RUTHERFORD, MPSGE
DARIUS SINGPURWALLA, LECG
RANDY SITTER, Simon Fraser University

EIA PERSONNEL:
BOB ADLER
LEJLA ALIC
MARGOT ANDERSON
COLLEEN BLESSING
TOM BROENE
GUY CARUSO
JOHN PAUL DELEY
ALOULOU FAWZI
HOWARD BRADSHER-FREDRICK
STAN FREEDMAN
CAROL FRENCH
DWIGHT FRENCH
BILL GIFFORD
BEHJAT HOJJATI
SUSAN HOLTE
ALETHEA JENNINGS
RAY KASS
ROBERT KING
NANCY KIRKENDALL
TOM LORENZ
RUEY-PYNG LU
PRESTON McDOWNEY
RENEE MILLER
MICHAEL MORRIS
ERIK RASMUSSEN
MARK RODEKOHR
BHIMA SASTRI
MARK SCHIPPER
BOB SCHNAPP
JOHN STAUB
LAWRENCE STROUD
AMY SWEENEY
PHILLIP TSENG
KEN VAGTS
ANGELA VEITCH
BILL WATSON
SHAWNA WAUGH
PAULA WEIR

ALSO PRESENT:
SUSAN BUCCI, United States Department of Commerce, Census Bureau
STACEY COLE, United States Department of Commerce, Census Bureau
JOEL DOUGLAS, Science Applications International Corporation
VICKI HAITOT, United States Department of Commerce, Census Bureau
SUSAN LISS, Federal Highway Administration
RICHARD HOUGH, United
States Department of Commerce, Census Bureau
NANCY McGUCKIN, United States Department of Transportation
ASHLEY ROBINSON, United States Department of Commerce, Census Bureau
WILLIAM WEINIG, ASA
KATHLEEN WERT, ASA

* * * * *

C O N T E N T S

AGENDA SESSION: PAGE
Data Errors, Structural Change, and Time Series Shocks in the Electricity Market: 331
Frames Comparisons of the EIA-3 and EIA-860 with the Manufacturing Sector of the 2002 Economic Census and the 2002 Manufacturing Energy Consumption Survey: 402

* * * * *

P R O C E E D I N G S
(8:32 a.m.)

DR. HENGARTNER: All right, to order, enough of this Canadian thing. Good morning, everybody. Welcome back. We have another day of work here today. There is one announcement that I'd like to do before we start. I'd like to invite Tom Rutherford to replace Mr. Cleveland's discussion and he will be summarizing the break-out session on the relationship between various price series: are futures contracts good predictors of future spot prices? So that's the one change to the program that I have to announce.

Next thing, I'd like those who were not here yesterday to please introduce themselves. If you can use the microphone over there that would be good.

MS. ROBINSON: Hi, my name is Ashley Robinson and I am here from the Census Bureau.

MS. BUCCI: Susan Bucci from the Census Bureau.

MS. HAITOT: Vicki Haitot from the Census Bureau.

MR. HOUGH: Rick Hough from the Census Bureau.

MR. COLE: Last and least, Stacey Cole from Census Bureau.

MS. ALIC: Lejla Alic from the Office of Oil and Gas in EIA.

DR. HENGARTNER: Welcome.
Now to Nancy to introduce the next speaker.

MS. KIRKENDALL: I would like to introduce Joel. He's going to be making the presentation for Lindolfo Pedraza. Just to put this in context, in the past you have heard us talk about looking at outlier detection and various estimation methods for some of the electric power data, the monthly EIA 826, and there is an annual 861.

Earlier this summer the same data were being used in a historical time series framework by people in support of STEO, the Short-Term Energy Outlook, down in the Office of Energy Markets and End Use. And he observed that there are still some irregularities in the time series. There are some errors that have snuck into data especially at the state level and so he talked to Lindolfo and Lindolfo got excited about trying these ideas by looking at the time series context. And so that's how this stuff started. He came back and talked to me —————————— do some work on it.

So Lindolfo wanted to talk to you about his work and then two weeks ago he said oh, I'm going to Sweden, and so Joel is the one who gets stuck making the presentation today. He was gracious enough to agree to do it. So if we get into technical details and we don't know all the answers it's Lindolfo's fault. He's not here.

MR. DOUGLAS: Good morning. My name is Joel Douglas, as Nancy said, and I'm here to present the paper titled "Data Errors, Structural Change, and Time Series Shocks in the Electricity Market." The contributors to the paper are Nancy Kirkendall of the DoE, more specifically the Statistical Methods Group, Joe Sedransk of that specific group also, and Lindolfo Pedraza of Science Applications International Corporation. You went through Fred's story already so that's fine.
The two purposes of this paper, one is they wanted to improve the methods of imputation for not only data nonrespondents within a survey but also data that has not been sampled within that survey which needs to be estimated for on a monthly basis as opposed to a yearly basis. Again, a couple of months ago the data was being sifted through so that aggregates and databases of electricity generation and sales data could be compiled for forecasting techniques over at STEO. The two forms that were being used at the time were EIA 826 and the Form 861, which I will go into in a little bit, but essentially 826 is a sample from the universe of 861.

What the colleague hoped to find specifically in a Form 826 is the published versus the reported data. If you look at the blue line it is consistently below the red line and its peaks are the peaks of the red line and its troughs are also the troughs of the red line. The movement in sync is specifically what he was looking for and the spread between the two is made up for by the imputation system. So essentially the reported data is the sample and the published data is a sample plus the imputed values.

Here is another example. This one is Mississippi residential sales. The spread is a little bit different in this one, it's a little bit greater, but if you notice it's still consistent with the troughs and the peaks and everything moving in sync more or less from about 1993 until 2000 so for a good decade.

The data, though, that he found was not always what he was expecting, as you can see in Colorado commercial sales. Around June 1997 there was a significant dropoff in the reported 826 data.
This is to be expected in some cases when there's a very large nonrespondent although this was a significantly large nonrespondent if not more than one in that single month.

But as you can see in the red published line the imputation system imputed for the missing respondents as well as all of the firms not sampled within the survey was made up for and that is the purpose of the imputation system. The imputation system deals with things like this very, very well.

And just to be clear, when I say impute I am not just speaking about imputing values for nonrespondent firms within the sample but also imputing values for the nonsamples that make up the universe.

If you look at New York residential sales you'll see a different kind of data error. Obviously the previous error in Colorado was a lack of response or a misreported value in the less than category but here there was not a lack of reported value but there was a value that was reported in error. Probably someone reported in kilowatts as opposed to megawatts or what have you.

This type of problem is very vexing to the imputation system. It doesn't deal with these problems very well and, as you can see, the data error reported in blue was not fully corrected for as seen by the published data in red. It was dampened but it still existed throughout the published data.

DR. HENGARTNER: Joel, if there was an error like kilowatts instead of megawatts why does the blue line start to go up slowly?

MR. DOUGLAS: I am not sure.

MS. KIRKENDALL: That looks like there is a whole year where that goes on.

DR. HENGARTNER: But, I mean, if someone reports badly the whole thing would go up. It wouldn't be a trend.

MS.
KIRKENDALL: The other thing that gets complicated in figuring out some of these things is that after the data are published, you collect the data, you benchmark to the most recent 861, and then at the end of the year they adjust the monthly numbers so they add up to the annual number that reflects that year. And so you have to really look to see if that's what happened to those other data because of that smoothing that maybe something --

DR. BURTON: The smoothing pulled it up. That's --

MR. BERNSTEIN: You don't change the reported data. We're talking about the blue line there. That blue line starts going up before the peak. So there was somebody misreporting before this.

DR. HENGARTNER: There's something going on and that's actually interesting.

DR. BURTON: One taught the other how to misreport.

DR. HENGARTNER: It's an interesting error.

MR. DOUGLAS: Well, anyway the imputation system, like I said, was not really designed to fix problems like this. The question that the paper addresses is is it possible for the imputation system to fix problems like this. Are there certain algorithms or certain methods that could be incorporated just filling in missing values that could help or alleviate such obvious data errors?

A little background on the surveys. The EIA 861 is the Annual Electric Power Industry Report. The data, of course, is collected annually. The data represents a census of the entire universe of power industry participants and it also represents the frame from which the monthly sample can be drawn. The data collected is at the firm level and state level and broken into four sectors, the commercial, the residential, the industrial, and the transportation sectors.
Form EIA 826 is the monthly sample that's collected from the 861. It is a cut-off sample. The remaining nonsample firms are given an estimated or an imputed value on a monthly basis; therefore, every month you have a value whether imputed or reported for every single firm in the 861 but it's referred to for the rest of this presentation as 826 published data because it's not 861 census annual data.

The survey sampling methodology for the 826 is they use a cut-off sample due to the skewed nature of electricity sales and generation. A weighted regression is used to impute for the nonsample data. This is also due to the skewness of the sample. The survey model, like I said before, is the reported values plus the sum of all the imputed values and, again, as I said, the spread between the two lines that I showed you in Colorado and New York was the reported and then the reported plus the imputed, so just to review that.

The first data cleaning technique was used before we even reached regression imputation. It is called scatter plot editing. It allows data analysts to compare the currently reported data to previously reported data or previously reported aggregate data depending on what you choose. Hopefully this will catch misreported data or data that stands out from its peers and more specifically in context with its previously reported values. In a perfect world this would be the best way to catch them but obviously things slip by especially when you have several thousand points in the same graph.
Scatter plot editing is currently in place on a few surveys within at least CNEAF but also in the EIA and it's coming up on several other surveys so hopefully that will take care of a lot of the data problems in the future. In the meantime if that's not enough, again, just to refresh, as I said, imputing monthly EIA 826 with EIA 861, the 861 is the frame.

We're talking about electricity sales although total revenues are also reported on both forms. The specific regression equation with stochastic errors is Y equals beta X where Y is the state-level monthly revenue from the ith firm in the commercial, industrial, residential, or transportation sectors and X is the state-level annual revenue or sales from one of those sectors reported on an 861 and that's divided by 12 to give it a yearly or monthly average for the year and the beta value is the growth rate and that should take care of a lot of the seasonality between the data.

Here is a more formal representation of the equation. Y is the EIA 826 monthly reported data and X is the 861 annual data average for each month within the most recently finalized year. Currently it's 2003 but soon to be 2004.

Pretty much the heart of the paper dealt with data versus expectations. There is a need to reconcile the survey sampling methodology and the desired property of accuracy that methodology ascribes to and time series which has the desired property of consistency.

To repeat, the survey sampling is the reported plus the imputed whereas a time series could be anything from an indicator variable or a reported value plus lagged, reported, or estimated values. The question, of course, is can both be achieved simultaneously.
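The regression imputation described in the presentation can be sketched in a few lines. This is a minimal illustration only, not EIA's production system: the function names `fit_beta` and `published_total` are hypothetical, the weights 1/x**gamma are an assumption (the presentation says only that the regression is weighted), and the numbers in the usage note are invented. With gamma = 1 the slope reduces to the classical ratio estimator, the sum of reported y over the sum of their x.

```python
def fit_beta(x, y, gamma=1.0):
    """Weighted least-squares slope through the origin for y = beta * x + e.

    Weights are assumed to be 1 / x**gamma (x must be positive).
    gamma = 1 gives the ratio estimator sum(y) / sum(x).
    """
    # With w = x**(-gamma): sum(w*x*y) = sum(x**(1-gamma) * y)
    numerator = sum((xi ** (1.0 - gamma)) * yi for xi, yi in zip(x, y))
    # and sum(w*x*x) = sum(x**(2-gamma))
    denominator = sum(xi ** (2.0 - gamma) for xi in x)
    return numerator / denominator


def published_total(x_frame, y_reported, gamma=1.0):
    """Published total = reported values + imputed values.

    x_frame:    annual (861-style) value divided by 12, for every firm in the frame.
    y_reported: monthly (826-style) value, or None for a nonsampled or
                nonresponding firm.
    Returns (total, beta).
    """
    observed = [(xi, yi) for xi, yi in zip(x_frame, y_reported) if yi is not None]
    beta = fit_beta([xi for xi, _ in observed], [yi for _, yi in observed], gamma)
    reported_sum = sum(yi for _, yi in observed)
    # Firms with no monthly report get the regression prediction beta * x.
    imputed_sum = sum(beta * xi for xi, yi in zip(x_frame, y_reported) if yi is None)
    return reported_sum + imputed_sum, beta
```

For example, with invented annual values of 1200, 600, 120, and 60 (monthly averages 100, 50, 10, and 5) and monthly reports of 110 and 55 from the two largest firms only, `published_total([100.0, 50.0, 10.0, 5.0], [110.0, 55.0, None, None])` fits a slope of 1.1 and fills in the two small firms from it. The gap between the reported sum and the returned total corresponds to the spread between the blue and red lines in the charts discussed earlier.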
The process that Lindolfo, Nancy, and Joe Sedransk went through to begin analyzing these problems was that they first had to re-estimate the 826 line by using reported data and a consistent imputation system.

Now, the imputation system has not changed all that much over the past 10 or 15 years but for analysis purposes they needed to recreate the lines and have a consistent process by which to impute for missing and nonsampled data for cross monthly comparisons as well as cross yearly comparisons. Hopefully the newly estimated line matches closer with the historical line. Otherwise the relationship needs to start over because we need a base of not only the imputation system but a time series language to compare other alternative imputation methods.

And, as you can see, the 861 and 826 re-estimated line matches up quite well with the original until you get about 2002 when the relationship seems to break down for some reason, not quite known why but it's actually very close and, as Nancy pointed out a few days ago, there are very few changes within the imputation system over the years, mostly stratification levels, different samples being taken, and such.

But as you can see we have a nice base right now for comparison of the time series as well from now we can take this imputation system that created these lines and add or subtract stuff and attempt to fix data errors.

As I previously discussed, scatter plots, the first tool, were used in trying to uncover data errors within both 826 and 861. The next three methods are as follows. The second method was to incorporate automatic outlier and influential observation detection.
I will explain how we detect those in a minute and in that method they would be treated as erroneous or bad data and would be overwritten with an imputed value just as if they were a nonrespondent. The second would be to take those outlying and influential observations and treat them as add-ons, that is, assume their value is good but do not add them into the model and do not let its large effects rain in on the other estimated or imputed values for either sample or nonrespondent.

The third method is the experimental method which Lindolfo had started right before he had left to go to Sweden or Denmark or wherever he is this week. And he used the seemingly unrelated regression model or the SUR model, which I will get into also.

Now, influential and outlying observations. An influential observation is a reported value that, if omitted from the regression estimation, would considerably influence the value of its corresponding or predicted value. The way Nancy and Joe and Lindolfo decided to uncover these observations, they used DFFITS, greater than two over the square root of the count. An outlying observation is one with a considerable rift between its reported value and its predicted value and they use a studentized residual on that one greater than 3.5.

There are probably different methods of calculating these and as of right now both the influential and the outlying observations are treated as the same whereas you could probably in some cases treat influential observations as add-ons and treat outlying observations as erroneous data to be imputed for.

So the second step after scatter plots would be to re-estimate the 826 line by detecting the outliers and influentials and imputing for their values.
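The two detection rules quoted here, DFFITS greater than two over the square root of the count and a studentized residual greater than 3.5, can be sketched for a one-coefficient regression through the origin. This is a hedged illustration under stated assumptions, not the authors' code: it uses an unweighted fit, it assumes the studentized residual is the externally studentized (deleted) version, and the function names `flag_observations` and `overwrite_flagged` are hypothetical. With a single coefficient, the common textbook DFFITS cutoff of 2 * sqrt(p / n) equals the 2 / sqrt(n) rule described in the talk.

```python
import math


def flag_observations(x, y, resid_cut=3.5):
    """Flag influential (|DFFITS| > 2/sqrt(n)) and outlying (|t| > resid_cut) points.

    Fits y = beta * x by unweighted least squares through the origin
    (an assumption made for brevity). Returns two lists of booleans.
    """
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    beta = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    resid = [yi - beta * xi for xi, yi in zip(x, y)]
    sse = sum(e * e for e in resid)
    dffits_cut = 2.0 / math.sqrt(n)
    influential, outlying = [], []
    for xi, e in zip(x, resid):
        h = xi * xi / sxx  # leverage of this point
        # Error variance with this point deleted (one coefficient, so n - 2 d.f.).
        s2_deleted = (sse - e * e / (1.0 - h)) / (n - 2)
        s2_deleted = max(s2_deleted, 1e-12)  # guard against rounding to <= 0
        t = e / math.sqrt(s2_deleted * (1.0 - h))  # externally studentized residual
        dffits = t * math.sqrt(h / (1.0 - h))
        influential.append(abs(dffits) > dffits_cut)
        outlying.append(abs(t) > resid_cut)
    return influential, outlying


def overwrite_flagged(x, y, flagged):
    """Treat flagged points as erroneous: blank them, refit, impute beta * x.

    For the add-on treatment, the caller would instead keep the reported y
    and use only the refitted beta for the other imputations.
    """
    kept = [(xi, yi) for xi, yi, f in zip(x, y, flagged) if not f]
    beta = sum(xi * yi for xi, yi in kept) / sum(xi * xi for xi, _ in kept)
    y_new = [beta * xi if f else yi for xi, yi, f in zip(x, y, flagged)]
    return y_new, beta
```

A typical use would be to call `flag_observations`, combine the two flag lists with a logical or (the presentation treats both kinds of observation the same way), and pass the result to `overwrite_flagged`. A value reported in the wrong units, such as 44000 where about 44 was expected, produces a very large deleted residual and is flagged by both rules in this sketch.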
This, of course, is the extreme position on the other side of treating them as add-ons. Detecting 861 outliers should have the effect of smoothing extreme imputed values whereas 826 outlier and influential detection should have the effect of smoothing extreme monthly observations. As you can see, the results were mixed.

If you look in January 1994 the original trend of the time series was kept intact all the way almost into 1995 but, as you can see, it was not necessary because the shock up still occurred. It was not just a yearly shock that would go back down through this little time series line. In some cases the EIA 861 and 826 re-estimated or muted some of the shock. In some cases they exacerbated some of the shocks. Like I said before, the results are just mixed.

MR. RUTHERFORD: I don't understand the previous slide. The influential ones, I don't understand why you override the influential. Could you go back to the definition?

MR. DOUGLAS: Yes. The influential ones are not necessarily the largest but those that would have the most effect on the regression and, as I said, they were treated as erroneous data. But they could also be treated as add-ons or they could be kept and say that's good data and we want it in the model. The method that Nancy and Joe and Lindolfo chose was to treat both influential and outlying as the same, either impute for them or treat them as add-ons, of course, different methods. Like I said, it's the extreme position but that was the way —————— ————————————————————

DR. EDMONDS: Is there any difference between those two?

DR. HENGARTNER: Yes, one is an outlier in the X direction.
If you think of a regression of Y on X if one of the observations on the X-axis is way out that acts like leverage. It will move the regression line that's nullified on the X. That's an influential observation or possibly. And an outlier on the Y, it's higher than it ought to be. So there's a distinction there. Am I correct?

MS. KIRKENDALL: I think so.

DR. HENGARTNER: And so if you admit that both forms have outliers then that's ——— ——————

MR. RUTHERFORD: Yes, if you just drop off the data and estimate it or if you in a sense drop off the data and then use those data to impute the values and then put it back in that would be identical.

DR. HENGARTNER: Yes, when you impute there are going to be small differences because the imputed values are not necessarily on the regression line.

DR. EDMONDS: Oh, I thought you were using the regression to do the imputation.

DR. HENGARTNER: Oh, so it's not imputation. It's prediction?

MS. KIRKENDALL: Right.

DR. EDMONDS: So it would be the same if those are identical.

DR. HENGARTNER: Then it doesn't make a difference.

MS. KIRKENDALL: What doesn't make a difference?

DR. HENGARTNER: If you use the regression line to impute then --

DR. EDMONDS: Then dropping them off and just doing the regression or doing the regression and then using it to impute the value for the outlier and replacing it leaves you in the same spot.

DR. BURTON: But would you have the same regression line?

MS. KIRKENDALL: You get a different regression line especially with influential out because the influential actually moves the regression line with change of data.

DR.
EDMONDS: Oh, but I thought you dropped them out and then imputed them from the regression of the other --

MR. DOUGLAS: The regression was run and then that's how they are identified. Then they were taken out and blanked out. And then the regression was re-run and then imputed value was put in.

DR. EDMONDS: So that's how you did it.

DR. NEERCHAL: Take it out but I think maybe what you think is we go one step further and re-estimate your ——————————————

DR. EDMONDS: Right, then it would be identical.

DR. NEERCHAL: But they are not doing that because their objective is the imputation only.

DR. EDMONDS: Right.

DR. NEERCHAL: Also in the previous graph I think the structure in that one where the jump is the blue jumps first and then after a few years red jumps the same amount. And is there an artifact in these kinds of things?

MR. DOUGLAS: Well, each imputation year was run on its own so when you hit right here and the data jumps up the imputation then cleans it and keeps on with the original time series line but then when it's not —————————— this any more ———————————— here the data jumps up again. So it's probably not necessary to clean the data at that point. There were data errors that were not permanently cleaned by this method of automatic outlier influential detection.

MS. KIRKENDALL: This one could be that they had found some new companies and added it too.

MR. DOUGLAS: Exactly, and the imputations fix that but then it got to the point where it couldn't fix it any more, where it wasn't necessary to fix it any more, because it turned out to be probably good data.

MR.
BERNSTEIN: Nancy, just on a side comment for our discussion yesterday, this is an example of something that should be flagged or should have been flagged but still if that data --

MS. KIRKENDALL: Well, but this is just ongoing. I mean, we just discovered these funny data points this year.

MR. BERNSTEIN: Right, well, but now you know this one. Whoever that data is should be flagged and said if you're going to use pre-1994 data in this time series be aware that it does this strange thing. And once you know it somebody's got to write a little sticky and make a memo out of it.

DR. EDMONDS: Actually it looks like you've already imputed backwards.

MS. KIRKENDALL: We're going to have to figure how to do that.

MR. HILL: If I remember the state it said Colorado was the one where there was a very large low outlier. It was one of your earlier. I remember it was one of your ————— ——————

MR. DOUGLAS: Yes, Colorado was the one but it wasn't the industrial. It was the commercial sector.

MR. HILL: Is it different?

MR. DOUGLAS: Yes.

DR. SITTER: Could I ask a question? You said this was calculated by year?

MR. DOUGLAS: Yes, it's calculated on a monthly basis but if you have 1994 it's using 1993 as its regressor year and when we get to 1995 it uses 1994. So when you had the shock up in 1994 and then with the newly estimated it's taking out the influential and outlier observations in both 861 and 826 so it's reading a different set of data points than the other one.

And then in a similar example we have the Arizona residential sales. If you look at the red line it's well within the yellow and the blue lines.
The peaks and the troughs and the extremes are not nearly as much and that was when both again were re-estimated and for comparison purposes the blue line is just 861 re-estimated with outliers influential detected and imputed for, not 826.

So you can see the difference between the two lines but it almost mirrors the original data. So some of the extreme data points on a monthly basis were not corrected for or in this case if it even needs to be corrected for. It is more of a comparison. You see in almost every single case the red line maybe prints half of the way up the peaks and the troughs of the yellow and the blue line.

So again the results are mixed. Maybe the red line is better. Maybe it's taking out points or good data and should not be overwritten and the blue and the yellow line are more correct.

DR. NEERCHAL: I cannot see the blue at all except in two or three places --

MR. DOUGLAS: It mirrors the original data very, very well and that's just with the regressor data ——————————————

DR. HENGARTNER: What is under the yellow one?

DR. NEERCHAL: Right under the yellow one?

DR. HENGARTNER: Yes, it's sneaking under the yellow.

DR. NEERCHAL: Then it should be green.

MR. DOUGLAS: The third method, like I said, was to treat 826 influential and outlier observations as add-ons, assuming that their reported data was good and the value was kept intact but it was not kept in the regression model so the value did not affect the imputation or estimation of other nonsampled or nonrespondent firms. And again the results were mixed, as you can see.

If you notice, the re-estimated red line before 1995 actually made data worse for some reason.
From 1995 until about 2001 it mirrored it pretty well, didn't change it all that much. In 2002 it made it worse and then 2003 it actually made it a little bit more consistent.

So of these three methods discussed so far none of them really deals with any of the extreme data points that we've seen in some of the historical data or would have if they had been incorporated at the time the data was collected. So the two conclusions of these three methods are that automatic outlier and influential detection is not a global solution to deal with bad or erroneous data points coming through in the reported data and existing in the published data.

And a more positive conclusion is that possibly the detection of outliers and influentials could help analysts use scatter plots and run a regression and color code the most statistically significant points within that scatter plot, help them direct their efforts to those plants which need the most attention.

The final method that they tried, again with seemingly unrelated regression, this is still very experimental. In fact I just got an e-mail from Lindolfo this morning with a bunch more graphs. I think you got it too saying look at these, aren't they really cool, and they go along with some of the graphs I'm about ready to show you.

SUR allows for information feedback across regressor groups. It uses more information than other regression types, different co-variance relationships and it also utilizes different strata. I'm sure Nancy could go into that a little bit more if there are questions on that.

This example was the one we used back in the outlier influential detection.
As you can see with the SUR method, the blue line, that shock we were just talking about, the re-estimated one delayed it for a year but then it still shocked up. With SUR it smoothed it out and was not a single shock but it's more of a consistent time series. But if you notice in the middle about 1998 the SUR creates more extreme peaks and troughs than the outlier influential detection method used previously.

And then finally New York residential sales was the second example that I gave you. If you notice the point in December of 1999 of reported data that went through to the published data the SUR method actually smoothed that out although the model completely broke down in 2002 for some unknown reason and created more extreme peaks and troughs in the early '90s but the original problem of that one outlying point was solved although it did cause other problems, overall positive but mixed.

MS. KIRKENDALL: More work needed.

MR. DOUGLAS: Yes, more work needed. So anyway, in closing, improving upon the imputation methods by both acknowledging survey sampling data quality as well as the need for consistent time series processes is pretty important for EIA and various surveys. Its ramifications could be widespread not only as better time series and data quality but also by the analysts for cleaning the data and things like that. So anyway thank you for your patience and your time.

DR. HENGARTNER: Thank you very much, Joel. I'd like to invite Mark Burton to start the discussions.

DR. BURTON: Well, if it's an invitation does that mean I'm allowed to decline? As usual I feel like I'm equipped with a knife going to a gunfight.
This is interesting to me because a couple of years ago I had somebody ask me to produce some passenger vehicle mileage forecasts and I thought, I have no idea how to do this. So I talked to a friend of mine and she said, well, this is very easy. She said passenger vehicle miles are highly correlated with population and land area, land area is not going to change, and our friends at Census provide very good population forecasts, so do the cross-sectional regression and once you have those regression results use the population forecasts to drive the forecasts for the vehicle miles.

It worked very well, and this is in a sense a probably more elegant example of the same type of methodology, where you're using cross-sectional estimations to help not drive forecasts as much as tune them. And so at the broadest level I find it to be very satisfying, mostly because it reflects what I might have tried to do if I'd been given the job.

Beyond that I have mostly just a lot of questions, and I don't think that I can say anything that will add to the methodology, but I do have some questions that might help someone else do that. It seems like in the examples that are used the sample values before the imputation are almost consistently under the values that result after the imputation, so I'm wondering if there's something within the sampling process that's leading to that consistent under-reporting.

MS. KIRKENDALL: Well, I think that's just because it's from a sample, it's not from the total, and then the published line is an estimate for the total. So the sum of all the data reported on the sample will always be less than the estimated total.

DR. BURTON: So the difference between the two is the actual difference between the annual total and the firms in the sample that weren't a part of the sample? Is that what you are saying?

MS. KIRKENDALL: Yes, the difference between those two is the estimation that we made for those companies that weren't sampled.

DR. BURTON: The second question I have is, from the discussion it sounds like the regression that forms a basis for the imputation is simply regressing against the annual values. There's other information available for the firm on an annual basis. Has there been any thought of modifying that regression to include more information that might tighten the fit some? I have no idea whether that would be productive but --

MS. KIRKENDALL: Jim Knobb in CNEAF has done a lot of work with these regression equations and I believe he has tried multiple regressors. Particularly he has tried capacity.

DR. BURTON: That was done without any real improvement of it?

MR. DOUGLAS: ———————————————— wind energy and things like that.

DR. BURTON: Well, the idea is to keep the regressions as simple as possible unless you have got something that adds appreciably to the fit. So, I mean, if that makes me feel better, at least somebody else already had the same idea that I had. And along those lines, Nick, I want you to know that I have a note here that says "why the run-up in data errors for New York residential," so I'd actually written it down before you said it.

DR. HENGARTEN: Good job. I'll take your word for it.

DR. BURTON: Another question that I had is, does the set of sampled firms change over time and if so in what way?

MS. KIRKENDALL: It has changed over time.
I mean, this is a survey that started certainly in the early '90s and we had in the frame everybody that we knew about. And since then, of course, people have tracked the industry and added firms to the frame. They've also observed errors in the estimation and when they thought that the errors were too big they'd add a few firms. There have been changes in the industry in the early 2000s with deregulation and so they changed certain ———————————— firms.

DR. BURTON: Firm mergers?

MS. KIRKENDALL: Well, deregulation, they're no longer regulated now. They're public firms and we've changed how we treat them. So there have been changes over time, and this isn't always true either, but usually they make the changes with January data, but they'd add firms to a sample.

DR. BURTON: Is there any correlation between those changes and some of the data points that are at issue?

MS. KIRKENDALL: Maybe. I think so far what's happened is they've run all the methodology but we haven't really gone in to look into the detail to see what caused funny things, and that's probably a step that we need to do at some point.

DR. BURTON: The last question I have, and I apologize. Again I have very little to offer in terms of guidance and just a lot of questions, but the last question I have is, one of the purported goals as I read this was to try and distinguish between data issues and actual economic shocks. And I wasn't convinced that I was seeing how that happened, and it's not the first time I've missed something, but if you can help me see how this does that, because that's a tremendously important thing to do.

If it's an actual data shock then you've got to leave it and let it have whatever influence it has.
If it's an economic shock it should be there. If it's a data issue then you want to try and remove the influence. I couldn't quite see how that distinction was being made so if either one of you --

MS. KIRKENDALL: I'm not sure we are making that distinction yet. I think what's been tried so far is this automatic detection. You call them all outliers, either take them all out, don't use them at all and impute for them, or take them out of the regression but add them back in. I don't think that's the solution if you do it all completely automatically. I think Joel said that. So you probably do need to rely on somebody who's going to talk to the company and find out whether it was a real data point. You'll just never get out of having some of that go on, although in other venues we've had problems, as you heard, that respondents are not as knowledgeable as they used to be in this area. "I got it off of this form in my spreadsheet. Of course, it's right."

DR. BURTON: You guys are awfully knowledgeable about what's going on. Sometimes you probably can look at circumstances in a particular state and at a point of time and say, oh, yeah, that's when something happened to the grid or that's when this terminal went down. There probably would be instances where even without verifying it through the firms you guys know enough to find it.

DR. NEERCHAL: But you seem to put this drill down feature to the scatter plot. Does that give these remarks or something from the database for the user to see? Do you have a feature like that in the scatter plot?

DR. EDMONDS: It gives a list of all the firms that made that monthly aggregate for that sector and the state, whether they're imputed or whether they're observed, and their values.
One thing to notice, if you remember back to —————————— New York residential generation, the large spike up that occurred in December of 1999. That seemed like an odd month for one spike up to occur.

DR. BURTON: For it to be nontransitory, I mean, spikes are one thing. There you had a nontransitory shift where it just went up and from then on forward it stayed there, and from an analytical standpoint that's even more significant and something to deal with. The bottom line is this is very cool, and the fact that you guys are engaged in an effort like this I think reflects the sophistication or elegance pervasive among the work that you all do, so I like it.

DR. HENGARTEN: Thank you very much. Nagaraj, anything to add?

DR. NEERCHAL: I think first of all I should say that Joel did a fantastic job of presentation on this one. I think it was very, very helpful to me really because this new format about two discussants puts a little bit of added pressure on the lead discussant because they have to read it ahead of time and come prepared.

DR. BURTON: No, you don't.

DR. NEERCHAL: So it ended and I think that a lot of the details brought up today will really help if they are part of the documentation. For example, the fact that the equation of known plus unknown in a classical sampling paradigm, seen plus unseen kind of concept, I mean, it was very useful for me to really see it even though I was trying to get that from him. So you are doing a really good job.

And I think that I come to this from a different angle because I'm a statistician. I'm, you can say, a closet econometrician because my advisor was an econometrician and so I understand some of the language, but in general people may not understand this.
So the first thing I would say is that it seems to me that the objective is to impute data. Now, impute obvious errors, I think, like megawatt being mistaken for kilowatt and things like that, in that case it's really identifying an error, but once you identify it you know exactly what the answer should be.

In other cases you are really using the prediction ability of your model in a way because you don't know. Something has gone wrong. You have not identified the reason why it is. That is a prediction problem in a way, and you can call it impute because it is in the middle of the data.

And I wondered what exactly is fixing. Is it just enough to catch them or do you want a method that is robust with respect to these things? Those are questions that I couldn't quite get an answer to in the documentation. So I would like to see some discussion there saying what does it mean to say you have a method that's working. I think it would be nice if there is a description of that somewhere. And I think I would do that.

And some of my other comments are specific. I believe seemingly unrelated regression, the name was coined by Zellner because, say, there are two regression equations that would not really be thought of as in any way related. Maybe it's the economy of Japan, it's the economy of another country, and he just said if you put them together, do the regression simultaneously, you can get some small sample efficiency gain.

I think it is a small sample efficiency gain because you really are using the same data as you did earlier. You just use the correlations to do a second stage estimation, so as regarding that it will tell a different story. So I think that in this case, on the other hand, they are not unrelated at all.
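
[Editor's illustration, not part of the proceedings.] The two-stage idea Dr. Neerchal describes (fit each equation separately, then use the cross-equation correlation of the residuals in a second-stage estimate) can be sketched as feasible generalized least squares on a stacked system. This is a minimal sketch and not EIA's implementation: the function name `sur_fgls`, the simulated two-state data, and the error correlation of 0.8 are all assumptions chosen only for illustration.

```python
import numpy as np


def sur_fgls(X_list, y_list):
    """Two-step feasible GLS for seemingly unrelated regressions.

    X_list[i] is the (n, k_i) design matrix of equation i and y_list[i] is
    its (n,) response; all equations share the same n observation periods.
    Returns (per-equation OLS estimates, per-equation SUR estimates).
    """
    m, n = len(X_list), len(y_list[0])

    # Stage 1: ordinary least squares, one equation at a time.
    ols = [np.linalg.lstsq(X, y, rcond=None)[0] for X, y in zip(X_list, y_list)]
    resid = np.column_stack([y - X @ b for X, y, b in zip(X_list, y_list, ols)])

    # Cross-equation residual covariance, an (m, m) matrix.
    sigma = resid.T @ resid / n

    # Stage 2: GLS on the stacked system, whose error covariance is
    # kron(sigma, I_n) when observations are stacked equation by equation.
    ks = [X.shape[1] for X in X_list]
    X_big = np.zeros((m * n, sum(ks)))
    col = 0
    for i, X in enumerate(X_list):
        X_big[i * n:(i + 1) * n, col:col + ks[i]] = X
        col += ks[i]
    y_big = np.concatenate(y_list)
    omega_inv = np.kron(np.linalg.inv(sigma), np.eye(n))
    beta = np.linalg.solve(X_big.T @ omega_inv @ X_big,
                           X_big.T @ omega_inv @ y_big)
    return ols, np.split(beta, np.cumsum(ks)[:-1])


# Simulated data for two hypothetical "states" with correlated errors.
rng = np.random.default_rng(0)
n = 60
x1, x2 = rng.normal(size=n), rng.normal(size=n)
errs = rng.multivariate_normal([0.0, 0.0], [[1.0, 0.8], [0.8, 1.0]], size=n)
X1 = np.column_stack([np.ones(n), x1])
X2 = np.column_stack([np.ones(n), x2])
y1 = X1 @ np.array([1.0, 2.0]) + errs[:, 0]
y2 = X2 @ np.array([-1.0, 0.5]) + errs[:, 1]

ols, sur = sur_fgls([X1, X2], [y1, y2])
print("OLS:", ols)
print("SUR:", sur)
```

One way to sanity-check a sketch like this: when every equation has exactly the same regressors, the second stage reproduces the equation-by-equation estimates, so any difference between the two sets of estimates comes from the equations having different regressors and correlated errors.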
It has different states. There is some borrowing and lending going on between states.

So it's an obvious thing to do a simultaneous estimation or simultaneous system. It seems to me ———————————————— happens to fall under the seemingly unrelated methodology but they're not seemingly unrelated at all.

MS. KIRKENDALL: They are seemingly related.

DR. NEERCHAL: They're seemingly related.

MS. KIRKENDALL: Not obviously related, not seemingly unrelated.

DR. NEERCHAL: Or unseemingly related, so it seems to me that it's an obvious thing to do, and I think this was actually partly brought out last time, I think, when Joel presented something related to these data. How come you're not borrowing strength? I think the committee had brought this up, I remember. I think that seems to me an obvious thing to do, but on some references in the documentation, like the correspondence between regression estimators and the ratio estimators, those relationships have to be explored under the measurement error ———————— because ratio estimators are like regression without intercept in a sense, but that will assume that X with no measurement error or Y with measurement error, and so some details are there and those technical details are to be brought out. We cannot say in general one beats the other. I don't think it's possible to do that. And so this is obviously a very data-driven problem and I really like it, though. I think this is really useful to do it using all the states together. We should try to do it as a system.

Some specific things that maybe I should probably volunteer later. Even if you do them one state at a time you still get unbiased estimates.
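
[Editor's illustration, not part of the proceedings.] The remark that ratio estimators are "like regression without intercept" can be made concrete: the classical ratio estimate, total of y over total of x, is the slope of a no-intercept weighted least squares fit whose weights are proportional to 1/x. The sketch below shows that equivalence numerically. The firm values and the choice of annual sales as x and monthly sales as y are hypothetical, and nothing here is taken from EIA's production equations.

```python
def ratio_estimate(x, y):
    """Classical ratio estimator: total of y divided by total of x."""
    return sum(y) / sum(x)


def wls_through_origin(x, y, w):
    """Slope of a no-intercept regression of y on x with weights w."""
    num = sum(wi * xi * yi for xi, yi, wi in zip(x, y, w))
    den = sum(wi * xi * xi for xi, wi in zip(x, w))
    return num / den


# Hypothetical annual (x) and monthly (y) sales for five sampled firms.
annual = [120.0, 340.0, 95.0, 410.0, 230.0]
monthly = [11.0, 27.5, 8.2, 36.0, 18.9]

r = ratio_estimate(annual, monthly)
# Weights 1/x correspond to assuming the error variance grows in proportion to x.
b = wls_through_origin(annual, monthly, [1.0 / a for a in annual])

# Impute a monthly value for a hypothetical unsampled firm from its annual value.
unsampled_annual = 150.0
imputed_monthly = r * unsampled_annual

print(r, b, imputed_monthly)
```

Both versions treat x as known without error, which is the assumption Dr. Neerchal flags; with a different weighting, or with measurement error in x, the two estimators no longer coincide.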
Only when you do them together you get a more efficient estimate. That's the reality. They're not biased in any way. I don't think that is the reason why they are different.

I think these are also driven by the e-mail that Bill sent to us later, the e-mail exchange between Jim Knobb and, there was an e-mail if you would forward it to us later on. There's some comment about there are some production issues. It cannot be implemented during production. That's hard for me to comment on, that one, that I don't know, but I'm looking at it from a statistician's point of view.

It seems to me that you have to do them together. So if you can borrow strength from other states you have to try to do that, I think, at least see how much you gain. So I want to stay out of that controversy if possible.

And one thing occurred to me during the presentation. It seems to me that when you do seemingly unrelated the differences become less tractable because of the smoothing. That is the other side of smoothing, because before you smooth I asked him why it's jumped. He could easily explain, well, it looks like some data points are not cleaned and they were being left in the data segments and so on. Easy to explain why the differences are, but the moment you put this seemingly unrelated stuff, because it is borrowing stuff from everywhere, it is very hard to tell why the difference is still there. I think that is the flip side of smoothing. It's hard to track why they are different.

MS. KIRKENDALL: And I think Jim's bias is to keep it simple because this simple regression model has been used for a long time. He's done a good job.
And I still think that there can be smart things to do with the outliers and the influential, but he doesn't like the idea of the automatic outlier detection because he thinks somebody ought to look at it. He thinks a data person ought to make sure it's right.

He has really some good points and he's interested in a system that runs smoothly and is easy to use and easy to understand and does a good job. He thinks maybe fine tuning what they're doing is a good idea and he likes using scatter plots. He likes the visual way of looking at data.

DR. NEERCHAL: I think, suppose you have enough manpower, looking at the outliers is definitely a good thing. I don't think it's possible to tell some errors apart, but if it's an obvious error or apparent shift I think really many times it could be difficult to distinguish, I think.

So I think I'm partial to the idea that every time the model flags or something, look at it, but I think doing things simultaneously with all the states is easily the right thing to do. I think you'll have more to learn from that than doing them separately, definitely, so I want to just say that I definitely think that the one thing I wanted to say was that the documentation definitely needs improvement because especially when it complains that the documentation or the imputation is not very good. Then your documentation should be better. I think the presentation really helped me to understand better.

I think I'm with Mark on that one. I think this is an interesting work definitely from a statistical point of view and I think that I'd like to see, if not anything else, some simulation to see whether there is really efficiency gained by doing this. Are we unnecessarily ———— spending more energy?

DR. HENGARTEN: Thank you very much. Mark has been shaking his head. He can't wait.

MR. BERNSTEIN: One thing I need to say, on particularly the recent work we're just getting ready to publish, there is a lot of statistically significant difference between states particularly in demand and price relationships and things like that. So I would not do a national. The states do differ significantly enough except when they're in neighborhoods. So you can look in some senses to neighboring state trends but even then it's iffy. So I would caution against trying to pull these things together because there is enough difference.

DR. NEERCHAL: But you don't have to fit the same coefficients for all the states. I mean, you do it as a system, I'm asking you, you can build the flexibility of different coefficients, perhaps, but do them simultaneously, borrow the data from each other?

MR. BERNSTEIN: But it seems that that gets more complicated particularly if there are statistically significant differences between the states, I think, in both trends and then --

DR. BURTON: I think what he's saying is that you may have consumption here at one state and here in another but that those two are correlated. Even though they're very, very different the movements are correlated, that I'm selling to you or you're selling to me.

MR. BERNSTEIN: But actually there are not enough. There are enough differences in both trends and relationships between states, significant enough in some fuels and not other fuels. Electricity is significant, gas is not, residential is more significant than commercial, but it really changes. I mean, what we found in this latest work is it really changes from state to state and fuel to fuel.
And so you can't do the same thing for residential electricity that you're doing for commercial natural gas, for example. So that's on the one thing.

The other, as for Mark's other statement, I'm not sure what the objective was here. There seemed to be two objectives. It disturbs me that that New York residential spike could still be maintained in the data. I mean, I know this was yesterday's discussion but that continues to bug me. It shouldn't be there. For sure that wasn't real, that couldn't happen, it's impossible. Well, it's not impossible, I suppose, but highly unlikely and it should not persist in the database.

On the other hand Iowa, which looked weird, the one you showed, actually there are good explanations why that would occur. And so it is hard sometimes to distinguish but there are clearly times when you've got these spikes, when you know it's wrong, and it just has to get out of there.

MS. KIRKENDALL: That brings up yesterday's discussion and actually Bob Schnapp is here. Bob, would you like to come down to the table?

DR. HENGARTEN: Hi, Bob. Come on down, all the way down. Please join us.

MS. KIRKENDALL: Just sit at the table.

MR. SCHNAPP: Now I'm really in trouble.

MS. KIRKENDALL: I'll give you some background because you weren't here yesterday. Yesterday we weren't talking about electric power data. We were talking about other data anomalies in the EIA. And the committee said that they wanted to see flags when we saw something that was odd. Whether we knew what the answer was or not, they'd like to have users alerted to when there are funny things in the data. And, of course, you guys have data back to the 1990s.
Rather than have to keep correcting them I'd like to say that the 1990 data are fair ———————— and we don't need to mess with them any more, but there are some funny things in the old data.

So the research questions about putting flags on, how do you put flags on? How do you notify your users that there are some issues? Sometimes it will be that you know what the answer is and you could fix it if only you had the time and resources. And sometimes you might not know what the reason is. I'm not asking you a question. This is putting it in context because it's something that the committee has brought up.

MR. BERNSTEIN: But we did not say yes, that you had to fix every problem.

MS. KIRKENDALL: No.

MR. BERNSTEIN: You just need to make sure the user knew that you thought there was a problem.

MR. RUTHERFORD: That you're working on it.

MR. BERNSTEIN: That you're working on it, or you could say we're not working on it, it's too far in the past, but the fact is you just got to let the user know. That's what our focus is.

MS. KIRKENDALL: So how do you do that, though? I mean, envision a spreadsheet with a bunch of numbers on it, which is one of the ways we disseminate data. Do you have some other thing, a text message, some --

MR. BERNSTEIN: If there is a particular data point that you know is questionable you can put a comment there on that particular data point in the spreadsheet or somewhere else in the spreadsheet. I mean, there are different ways to do it. I'm generally not the user. It's one of my assistants who's the user. So I could sit down and ask them what's the best way for them to get it, but clearly if you know there is a problem then somehow let us know.

DR. BURTON: You may even run into an instance where a user seeing that flag says, oh, I know what's going on there. In a sense it's a way to enlist --

MS. KIRKENDALL: And could help you find out what the problem was.

MR. RUTHERFORD: But just to be the devil's advocate, how is it that the people who are collecting the data are supposed to know this is a problem or this is just an anomaly, right? I mean, a lot of times the perception of what actually constitutes an outlier or what constitutes a problem is something very specific to the context in which the data is used, and we can't expect the DoE analyst to be able to anticipate every particular way in which you could slice or look at the data samples.

MR. BERNSTEIN: No, and sure, there are going to be cases that are not as obvious as this. But if you get a case that's as obvious as this --

MR. RUTHERFORD: Then why do they have to flag it, because you as the user can see it as well as they can?

MR. BERNSTEIN: But only if you're actually plotting it. If you're taking the data and assuming it's okay and you're doing regressions on it and you're not actually plotting it, when we pull this big data set down we don't look to see that every number is okay. We assume it's okay and therefore start doing regressions on it. And the only way we uncovered these is because we actually started disaggregating the data to look at things in a more disaggregated way and noticed that six out of the 48 states we were looking at had some weird things in it. We wouldn't have seen it if we were doing the national --

MR. RUTHERFORD: But the thing is that there are so many different ways to slice it and then once you visualize it the right way then you can see the problem.

MR. BERNSTEIN: There are obvious ones and there are not obvious ones and they're not going to be able to flag everything, but the obvious ones when they see them and they know. And also, I mean, statistically you can figure this out as you're pulling data in.

DR. HENGARTEN: Jae?

DR. EDMONDS: I thought that was an excellent presentation. I go back to what is it we're trying to do and ask myself how does this serve the basic product, and it seems to me that what you're trying to do is improve the quality of the data that you report. And so it sounds like you're doing the right thing, which is using these approaches to flag stuff that you want to go back and ask questions about, go back and say are we actually getting reported what we intended to get reported.

There's an implication that there's another thing you might be doing, and I think are doing all the time, which is inferring the values for nonrespondents, which you've got to do if you're trying to make an estimate of the universe. There is a third thing you might be doing which flows over into the STEO. Are you drifting toward replacing primary data with your model output? That troubles me a bit.

DR. BURTON: It's much cleaner, though.

DR. EDMONDS: The data sets will all look great and every once in a while you'll find that you'll get a data set and you'll actually put together a little model and you'll stumble across, I got a perfect fit. This is really good. But it may be too good to be true and then you find, of course, those weren't really primary data. So I think, given that your principal product is the data and that the models are in service of getting the best primary data, I just caution, be careful about drifting over into that other piece of business.
It's important to infer for nonrespondents but don't let just a model override and overwrite the primary data, because it's really the primary data which is your product and in hindsight it may well turn out that this thing actually did happen when you go back and investigate it and there is real important information that you don't want to lose.

DR. BURTON: A couple of years ago I was assigned to try and estimate flood damages based on flood characteristics. And so we took all these damage estimates from flood events and started regressing them against the flows and other flood characteristics and the model fit was absolutely pristine and I looked and said, this is great. Look how good this is. Turns out once we started showing this to the Corps of Engineers they said, well, we didn't really actually have specific estimates. We just used the flood characteristics. No wonder my model fit so good.

DR. HENGARTEN: Walter?

MR. HILL: I found this presentation fascinating —————————————— quick question maybe given the time —————————————— In the Oklahoma data, for example, there is a shock and I'm wondering if after that you observed that there are upward estimates or perhaps the shock is propagating throughout that latter part ———————————— those latter observations because your regression model has —————————————— you may have looked at that but it might be something ——————————

MR. DOUGLAS: ———————— the base imputation for ———————————— should have matched up perfectly.

MR. HILL: It was not quite long enough for me ———————————————————— quickly. The other ———————————— so it does turn out that the problems with the estimate might occur because of the shock propagating a year later.

MS. KIRKENDALL: I think in a lot of these cases, especially where the re-estimation didn't do what we thought it might do, we need to take a look and see if we can figure out why.

MR. HILL: And —————————————— my other point, if that shock really should be there, if I was the data user I'd want to know that there was a low outlier in that particular observation rather than have that get cleaned up and —————————————— explanation for it ———————————— something happened to the weather that —————— that observation low if that was the case, or for the New York data it's the month before the millennium —————— it looks like the difference is more than celebrations but there might be some easy explanation.

DR. HENGARTEN: Do you want to reply?

MR. SCHNAPP: I've been taking down some notes as I had been hearing this. Let me give you a little bit of background. I'm not sure if it explains any of these data anomalies but let's see. In 2002 the form was changed so that we could collect information from energy service providers. I'm not just talking about Oklahoma here but I'm just talking about in general.

Before that it was easy. We would send the form to electric utilities. They would tell us who all their customers were, what their revenues, what their sales were. That was easy, but when the states deregulated then we had to start collecting information from energy service providers. Well, they only give you information about what their customers and their revenues are. So now you have to pick up the other piece of the revenue from the distributors.

So theoretically inside of a state those numbers should match.
The number of customers should match and the sales should match as well, the sales versus the amount of electricity that you're transporting there for them, and what we found is that they don't.

And so the 2001 data actually did reconcile that data. We worked with Nancy's group to reconcile that. That wasn't done in 2002 or 2003. It was done this year for 2004. And this year Tom Leckey has actually gone back and revised it for 2002 and 2003 so that they are consistent, and so we'll be putting out a revised historical Excel spreadsheet hopefully inside of next month or so here.

Another large change that happened was in 2003. There was a residential, commercial, industrial sectors and then there was another sector called other. And most of the stuff that went into other was things like public street lighting and irrigation and transportation. And we did away with that category and we created the transportation category so that our information could be consistent with the rest of EIA's data.

And we asked them to take whatever they're reporting that should have been in the commercial or the industrial sector and move it there moving forward. And so we did that with the 2003 data for the first time and that was fairly good data. I would say the 2004 data has gotten better, but what you see is then it couldn't explain some of the shocks that you see there, either things moving in or moving out. You need to add together the commercial, the industrial, and transportation in order to see what's happening there.
You also have about two years ago the acting administrator asked us to look at the sales data to see what was going on because there seemed to be something funny between the commercial and the industrial sectors. Having made lots and lots of phone calls, what we found out was that there was a number of different things going on. You have companies that merge and so the prior companies categorize certain customers, for example, as industrial because they give them a real low rate but they're really commercial customers which they categorize as industrial.

But when they're taken over by another company they say well, they're not industrial, they're commercials, they report to us as commercials. So now you see a movement of the customers and the sales and revenue going in that direction. And it might look odd but that's what happens. You also have --

MR. BERNSTEIN: Could I stop you for one second?

MR. SCHNAPP: Sure.

MR. BERNSTEIN: Do you explain that anywhere in the data?

MR. SCHNAPP: Well, I mean, the movements are fairly subtle. I mean, they're there but to explain every movement of a couple of percent is difficult.

MR. BERNSTEIN: It's not really a shock. A shock means something that's going to significantly change the analysis you're doing. So, I mean, there are subtle changes. That's fine.

MR. SCHNAPP: Right, and that's all they were talking about because like a percent or two here was throwing off the STEO model. And so they wanted to know what was going on there. We also have a case where a company starts to build a mall, for example, and all the electricity being sold to that company is categorized as industrial because an industrial facility is building this mall.
The question is after they're finished does that become a commercial sale or they leave as industrials and they do both.

So we can't control them as to who they're classifying as what. We give them our definitions and we can't sit there and hold their hand with every single customer that they have to figure out where they go.

The last thing that I wanted to mention was I'm pretty sure I'd come here and talked to you a couple of times about our Internet data collection system. And what that does is we have built-in edits so that when the respondents key in their data, when they're about to submit it, it automatically compares their data to the previous submission that they had. And if it's outside of a certain bound then it says you've got an error here. You need to fix this before you submit it.

They are allowed to submit it if they override it and explain why. And then we look at that explanation and see if it's reasonable and either accept that or we call them back. So our Internet data collection system right now for all of our forms we collect about 36,000 to 37,000 forms during the year. About 88 percent of them came in on the Internet this year.

And as far as the 826 is concerned it's probably closer to 95 percent came in on the Internet. So most of that already is clean and there are about 450 respondents on the 826 out of 3,000-ish that's on the 861. Those 450 account for somewhere between 85 and 90 percent of electricity sales.

So we're getting the bulk of it and what we're talking about in this presentation is doing the other 10 percent. So if you're looking at the national number, I mean, the impact is not going to be that great.
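[Editorial illustration, not part of the spoken record.] The built-in edit described above, a comparison against the previous submission with an override that requires an explanation, might look roughly like the following. This is a sketch under stated assumptions: the 25 percent tolerance, the function names, and the status strings are hypothetical, since the transcript does not give the actual bounds or logic of EIA's system.

```python
def within_bound(current, previous, tolerance=0.25):
    """True when current is within +/- tolerance (a fraction) of previous.

    The 25 percent default is an assumption; the real bound is not stated.
    """
    if previous == 0:
        return current == 0
    return abs(current - previous) <= tolerance * abs(previous)


def review_submission(current, previous, override_reason=None, tolerance=0.25):
    """Classify a submitted value against the respondent's previous value."""
    if within_bound(current, previous, tolerance):
        return "accepted"
    # Outside the bound: submission is allowed only with an explanation,
    # which an analyst then reviews (accept it or call the respondent back).
    if override_reason and override_reason.strip():
        return "needs_review"
    return "rejected"


print(review_submission(105, 100))
print(review_submission(200, 100))
print(review_submission(200, 100, "new industrial customer came online"))
```

The design point is that a failed edit does not block the respondent outright; it moves the burden to a written explanation that a person reads afterward.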
At the state level it's much more important and those are things that we still have to look at but we're very comfortable with our Internet data collection system. Particularly on the 826 and the 861 they really do keep perfecting them on the edit side.

And we're also, frankly, actually very excited about the scatter plots, which still could show us outliers, and the way they put it together, you see an outlier, you click on that dot, and it gives you all the information you need to know. So it's very useful to us and those are the comments that I have from having heard all of the things that I've heard here. I'm not sure if that's helped or hurt your understanding of it but that's what I am going to offer to you.

DR. HENGARTNER: Thank you. I'd like to resonate Jae's comment to understand what is it that we want to do. And the way I see it I don't see three problems. I see two problems and the first problem is the obvious one. It's the one I think everybody thinks is in terms of predicting or is the current estimate the correct one. And if you look at the models that we've used you use the past data and you're trying to figure out is this observation valid, do I make an imputation, is it an outlier.

And then there's the other question that we've talked about yesterday which is data quality. And data quality sometimes cannot be made on the fly. Sometimes 20/20 hindsight is a useful thing, especially for data quality. That's how sometimes we find those errors that, you know what, that thing just didn't look good in hindsight. And so when you're going to do the data quality you're allowed to use both past and future observations and that can change.
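[Editorial illustration, not part of the spoken record.] The difference between checking an observation against past data only and checking it with hindsight can be sketched as follows. This is not the model discussed at the meeting: the median-based rule, the window length, the tolerance, and the two toy series are all assumptions chosen to keep the example small.

```python
from statistics import median


def flag_past_only(series, i, window=3, tol=5.0):
    """Flag series[i] when it is far from the median of the preceding values."""
    past = series[max(0, i - window):i]
    return bool(past) and abs(series[i] - median(past)) > tol


def flag_past_and_future(series, i, window=3, tol=5.0):
    """Flag series[i] only when it is far from both past and future medians."""
    past = series[max(0, i - window):i]
    future = series[i + 1:i + 1 + window]
    if not past or not future:
        return False
    return (abs(series[i] - median(past)) > tol
            and abs(series[i] - median(future)) > tol)


# Invented series: a one-period spike and a change in level at index 4.
spike = [10, 10, 10, 10, 20, 10, 10, 10]
level_shift = [10, 10, 10, 10, 20, 20, 20, 20]

for name, series in (("spike", spike), ("level shift", level_shift)):
    print(name, flag_past_only(series, 4), flag_past_and_future(series, 4))
```

Using past data alone, both series look suspicious at index 4. Once later observations are available, only the spike stays flagged, because the shifted value agrees with what follows it.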
So, for example, if you have a change in level if you look at this and oh, yes, there's a change in level, that doesn't seem like an outlier. It's a break, it's something that happened, but it's not an outlier. And so really thinking of what use you want to do will influence what kind of data you're using and how you're using. And for data quality you're allowed to use future observations to predict the past. At least morally I would feel satisfied if you would do that.

So I know it's a small detail, maybe, but I think it's something to keep in mind of what use you want to do with the data because it influences your model and what you're going to do with it. There are still time series but sometimes the way you're going to use the data is different. So that was my comment and I think it goes along a lot with the discussion we're having here.

DR. NEERCHAL: I think we are running over time, I think.

DR. HENGARTNER: You already had your turn.

DR. NEERCHAL: I just wanted to mention I think the NHTSA people have a data set on accidents. They have a column there on blood alcohol content and this is a notoriously difficult number to get because many people refuse to take the test and sometimes they forget because it's such an emergency that they don't have time to give this test. And so this is one of the spottiest columns in that vehicle accidents data in NHTSA.

And they have done imputation and I think that Jae's comment reminded me so they still make the original data available. So it's not imputed data only and since you already know people in NHTSA they will be able to give you more details on it. And I think in terms of how they handle the publication of it it's a good example.

DR.
HENGARTNER: Well, thank you, everybody, for this lively discussion. I'd like to invite the committee to split in two. We have break-out session.

MR. COLE: One quick question or comment, would it be possible to include a small sample of below cut-off firms in a monthly survey? These cases right now are being imputed based upon a regression model. Or as an alternative would it be possible to develop two models, one for nonresponse for the larger firms and one for nonsample cases based upon the smaller observations in the sample?

MS. KIRKENDALL: They don't have very much nonresponse on this survey but it's nice to have an automatic imputation method. I mean, in fact usually they don't have any nonresponse or if they do it's on the really small ones.

MR. COLE: Would it be possible to build a model for nonsample cases based upon the smaller units in the sample? Small firms are different from big firms and they have different behaviors and they may in fact have a different pattern. That's all.

MS. KIRKENDALL: The initial survey design for this did have a probability sample and the problem with the small firms is they don't like to report to us and when they do report to us they report bad data.

DR. HENGARTNER: Any more questions? So I'd like to invite Mark Bernstein, Mark Burton, Jae Edmonds, and Tom Rutherford to go downstairs to Room 5E-069 and the other committee members will stay up here for the break-out session. We should reconvene at 10:45.

(Recess)

MR. HOUGH: Good morning, everyone. It's tough to follow that lively discussion but we'll give it a shot here. My name is Richard Hough. I'm familiar with most of you folks. I've been coming here to talk to you about some Census Bureau project.
Now I think this is the fourth straight meeting that I've had the pleasure of coming up here and talking to you. I'm here today to talk a little bit about the final two evaluations from the frames evaluation project that we undertook with the CNEAF surveys and CNEAF stands for coal, nuclear, electric, and alternative fuels.

I'm here with Vicki Haitot, who is the senior analyst on my staff and did the majority of the groundwork for this project. Also here today from Census is Susan Bucci, who is the branch chief of the Construction and Minerals Branch, where the work was conducted. Also Stacey Cole is the branch chief from the Research and Methodology Branch within the Manufacturing and Construction Division.

Also sitting in the back there is another member of my staff, Ashley Robinson, who worked with Vicki on the project, and Bill Bostic, who is the division chief of the Manufacturing and Construction Division and he'll be available to answer any questions that I cannot when we get to the end of the presentation.

I think most of you were here in the spring when I presented the results from the first three surveys that we did a frame evaluation on. This is actually the conclusion of the project today. A little overview of what we're going to talk about today, first of all I'm going to talk to you a little bit about background and some of the purposes of the frame evaluations, give a little bit of summary on the final two surveys that we evaluated, talk to you a little bit about the generalized two-step matching process that we went through to match establishments on both survey frames.

Then Vicki is going to come up and present some of the results.
She's going to talk to you a little bit about some of the implications of the frame structure and how we dealt with that. Then she's going to give you the results of both by coverage, by establishment counts. She's going to talk to you a little bit about coverage by volume and then we'll give a summary of what we found on these two surveys.

Then I'll come back up and talk to you a little bit about some general conclusions from the project as a whole, talk a little bit about the next steps for EIA off of these results, and then we have a question for the committee that we'd like to end with, maybe possibly having a little bit of a discussion.

The purpose of the frames evaluation, we wanted to evaluate the coverage of the EIA frames, we wanted to identify differences between the frames, and we wanted to supply EIA with any characteristics of the missing establishments to point them in the right direction to go and enhance their frames if they wanted to.

The final two evaluations that we're going to be talking about today, both evaluations were conducted using the 2002 survey frames. The first one was the EIA-3, which is the quarterly coal consumption and quality report for manufacturing plants. This survey collects data on coal consumption within the manufacturing sector of the United States.

The second evaluation that we conducted was actually an EIA frame subset of two EIA surveys. The first survey was the EIA-860. This is the annual electric generator report. This survey collects information about generators from electric power producers.
The EIA-906 is the monthly power plant report and this survey collects operational information about fuel consumption and electricity generation from regulated and unregulated electric power plants.

From these two surveys we created what we called the combined heat and power plants frame. The analysis that we're going to talk about will evaluate the coverage of these plants and most of them are in the manufacturing sector. The EIA-860 had over 5700 establishments of which 562 were nonregulated and on the 906 frame in 2002.

The nonregulated portion of the electric power industry consists of these combined heat and power plants and independent power producers. The analysis will focus on the 645 combined heat and power plants or what we'll call the CHPs that were self-classified on EIA's frame within the manufacturing sector and throughout the rest of this presentation we will refer to it as the CHP frame.

In order to do the analysis we had to match units on EIA's frames to units that we have at the Census. The first step was to match reporting units to our business register and we needed to do this to determine if there were manufacturing details available in what we call the back end dataset. Now, the business register is the universe of establishments that the Census Bureau uses. It's a large data set and it contains classification information where establishments are classified based on primary activity at the plant or at the facility that's being considered.

So only the establishments that we were able to match to manufacturing in this step of the matching process are included in the results that you'll see when Vicki comes up to talk to you.
And, as I just stated, getting ahead of myself a little bit, the second step was then to take these manufacturing establishments and match them to establishments that were on the 2002 manufacturing energy consumption survey.

The manufacturing energy consumption survey is a sample survey based on the economic census. The reason we used this survey in this time, if you remember in the spring we did the analysis with the 2002 economic census. The data was better for this type of analysis on the MECS. We have more detail for electricity consumption and production and we also have more detail on coal consumption.

For the economic census we did not have a specific piece of data for coal consumption that we could match to. Coal consumption in the economic census is considered a cost of fuels and we had no way to break out what portion of that fuel was spent on coal and what portion, for example, would be spent on natural gas and other fuels used at the establishment. So we matched to the manufacturing energy consumption survey and that's the data that will be used in the results.

Vicki's going to come up and talk to you a little bit now about what she found when she did the analysis and present some of the results, all of the results, actually.

MS. HAITOT: Good morning. The coal consumption survey frame has manufacturing establishments as the target population of the survey. We used a matching program, which came from Stacey's office, to match the companies on EIA-3 by name with establishments in the business register. And then we had to look at those to make sure that they were accurately matched. The matching program had a higher percentage of one to one matches.
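[Editorial illustration, not part of the spoken record.] Matching by name with a follow-up clerical review can be sketched as below. The Census Bureau's actual matching program is not described in the transcript, so everything here is hypothetical: the normalization rules, the suffix list, the function names, the status labels, and the company names are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical list of corporate suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "co", "llc", "company", "corporation"}


def normalize(name):
    """Lowercase, strip punctuation, and drop common corporate suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(word for word in words if word not in SUFFIXES)


def match_by_name(frame_names, register):
    """Match survey-frame names to (establishment_id, name) register rows."""
    index = defaultdict(list)
    for est_id, name in register:
        index[normalize(name)].append(est_id)

    results = {}
    for name in frame_names:
        ids = index.get(normalize(name), [])
        if len(ids) == 1:
            status = "one_to_one"
        elif ids:
            status = "needs_review"  # several candidates: a person decides
        else:
            status = "unmatched"
        results[name] = (status, ids)
    return results


# Invented example data.
frame = ["Acme Paper Co.", "Delta Cement, Inc.", "Zeta Mills"]
register = [(1, "ACME PAPER COMPANY"), (2, "Delta Cement"),
            (3, "Delta Cement Inc"), (4, "Omega Steel")]

for name, outcome in match_by_name(frame, register).items():
    print(name, outcome)
```

Exact matching on normalized names is deliberately conservative; anything other than a single candidate is routed to review rather than guessed.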
There is a threshold of 1,000 short tons of coal consumed annually which was easily implemented in this survey. There are 28 establishments on MECS that consume less than this and so we took them out of the analysis. For the CHP frame, which was the subset of the annual electric generator frame and the monthly power plant report, we had to match establishments a different way because the program had a very small percentage of matches.

The manufacturing establishments were a very small percentage of the target population for this survey and this survey had a threshold of 1-megawatt capacity. So there was no generic relationship between capacity and generation which the MECS has on their form. So we took the two files and we wanted to see what level of generation would be equivalent to the megawatt capacity of the generators, so where the matching on the MECS became less frequent with EIA's establishments we determined was the threshold for MECS, which was around 2,000 megawatt hours, so we took everybody that was generating less than 2,000 megawatt hours off the MECS for the comparison.

There was a little bit of confusion matching plants of the EIA-860 because of the way that they classified their establishments. The MECS is concerned with the activities of the facility level so our frame is based on which facilities. Well, EIA is concerned with activities going on in the generator stage so they're concerned with what's happening at each generator.

Well, it turned out that some of the generators could be owned by more than one company and they might not even be manufacturing.
Like, company 1 could be a utility while company 2 could be manufacturing and so it would be hard to figure out which one actually was at the facility or owned the facility so it made matching a little more difficult and it may result in some false nonmatches.

So the coverage by establishment count for the coal consumption survey, we used a ratio of 355 matched out of 452 total establishments in the sample for the 2002 MECS. We initially found 37 establishments on EIA's frame not in manufacturing. There were 23 in mining, four in utility, four were in government, four were in some other sector, and three we were unable to match. So that accounted for 14 percent of EIA's consumption, very small, so they're missing eight establishments off the MECS survey in