An Attempt to Crack The Real-World Mechanism [1]
by: Dhaniel Ilyas
Enough about my humble attempts at econometric philosophy and ethics issues in my previous two writings; I will now try to put down some words on semi-technical econometrics topics with simple explanations. My previous writings were in Indonesian. Since I know we have a noteworthy number of interested blog readers, I will now take a shot at writing articles in English.
Most econometric models contain unknown parameters. Estimating these parameters is crucial for understanding the behavior of the variables the model relates. To compute the parameter estimates we need two things: a model describing the interaction among variables through a certain set of parameters, and a sample made up of real observed data. Thus, if the model is correctly specified, it will describe the real-world mechanism which generated the data in our sample.
This process does not come without problems. First, nobody, not even the smartest econometricians alive, knows the ‘true’ real-world mechanism which generated the sample data. Second, the reliability of the data collected from surveys is often questionable. I have never tried to find any research concerning the data quality of Indonesia’s Central Bureau of Statistics (BPS) (maybe someone can point me to certain studies). There are, in fact, techniques in econometrics for minimizing the bias that comes from measurement error in the real data. But if the bias is severe, there is nothing much we can do.
Population is the base from which a sample is drawn. A model is made to explain what is going on in the population, using the sample data, in order to make inferences and forecasts. Once upon a time, when statistics was biostatistics, the object of study was the human population of a specific town, village, or country, from which random samples were drawn. The average weight of all members of the population would then be estimated by the sample mean of the individuals’ weights: the sample mean was an estimate of the population mean. The idea was to represent the population by the sample, which also saved time and money. In contrast, the use of the term population in econometrics is simply a metaphor.
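The biostatistics picture above can be sketched in a few lines of Python. The "population" here is simulated, purely for illustration; the point is only that the mean of a modest random sample lands close to the mean of the whole population:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical "population": body weights (kg) of 100,000 individuals.
population = rng.normal(loc=62.0, scale=9.0, size=100_000)

# Draw a random sample of 500 people and estimate the population mean
# by the sample mean, as the early biostatisticians did.
sample = rng.choice(population, size=500, replace=False)

print(f"population mean: {population.mean():.2f}")
print(f"sample mean:     {sample.mean():.2f}")
```

The two printed numbers should differ only slightly, which is exactly the time- and money-saving promise of sampling.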
A better way to explain this metaphor is by introducing the concept of the data-generating process (DGP). By this term we mean whatever real-world mechanism is at work in actual economic activity; it is precisely the mechanism that our econometric model is supposed to describe. Thus a DGP is the analog of the population in biostatistics, since samples can be drawn from a DGP just as they may be drawn from a population. If this seems too technical to digest, let us just say that if I were an all-knowing superman with a magic, super-complex mathematical equation that could perfectly describe everything that happens in the world in the form of parameters, variables, and some well-defined stochastic element, I could explain the past, present, and future with remarkable accuracy. That is exactly what we are after in building a model: cracking the real ‘true’ world mechanism (with our naïve and limited mathematical and computational abilities) as closely as we can.
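As a toy illustration of a DGP, we can briefly play the all-knowing superman: assume (entirely made up for this sketch) that the ‘true’ mechanism is a simple linear equation with a normal stochastic element, simulate data from it, and then see how close ordinary least squares gets to the truth it normally never knows:

```python
import numpy as np

rng = np.random.default_rng(42)

# The "true" DGP, known only to us because we are simulating it:
#   y = 2.0 + 0.5 * x + e,   e ~ N(0, 1)
beta_true = np.array([2.0, 0.5])

n = 1_000
x = rng.uniform(0, 10, size=n)
e = rng.normal(0, 1, size=n)
y = beta_true[0] + beta_true[1] * x + e

# The econometrician only sees (x, y) and estimates the parameters by OLS.
X = np.column_stack([np.ones(n), x])  # intercept column plus regressor
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

print(beta_hat)  # estimates should be close to the true (2.0, 0.5)
```

In real work the left-hand side of this script, the true equation, is exactly the part nobody has; all we ever hold is the sample and our tentative specification.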
A model is built to understand existing phenomena. A class of models may have a general (mathematical) form within which the members of the class are distinguished by the values of their parameters. In models that are not mathematically tractable, computationally intensive methods involving simulation, resampling, and the like may be used to make the desired inferences. The process of model building requires continuous refinement. The evolution of models proceeds from vague, tentative models to more complex ones, along with our understanding of the process being modeled. It is not possible to measure the bias or variance of a model selection procedure (over any arbitrary set of models), except in the relatively simple case of selection from some well-defined and simple set of possible models.
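A minimal sketch of the resampling-based, computationally intensive inference mentioned above is the bootstrap. Here, on a purely hypothetical ‘income’ sample, we get a confidence interval for the median without deriving any closed-form sampling distribution, simply by resampling and recomputing:

```python
import numpy as np

rng = np.random.default_rng(1)

# A small, hypothetical skewed sample (say, household incomes).
sample = rng.lognormal(mean=1.0, sigma=0.6, size=200)

# Bootstrap: resample with replacement many times and recompute the median,
# building an empirical sampling distribution by brute computation.
boot_medians = np.array([
    np.median(rng.choice(sample, size=sample.size, replace=True))
    for _ in range(5_000)
])

# A simple 95% percentile confidence interval for the median.
lo, hi = np.percentile(boot_medians, [2.5, 97.5])
print(f"median: {np.median(sample):.2f}, 95% CI: ({lo:.2f}, {hi:.2f})")
```

The appeal is exactly what the paragraph above describes: when the mathematics is intractable, computation stands in for derivation.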
Now, let me go a bit into the practice of modelling. With our limited ability to process all the data in the world, we need a reasonable strategy for the modelling process. First, we need to form a solid theory about the phenomena we want to observe. This is critically needed because, basically, we can throw any variable into our (simplified) model. Do you think falling leaves in autumn have an effect on a certain stock price? Although some ambitious modeller might believe that these seemingly uncorrelated variables are actually related, I doubt that an economist would pursue such a modelling strategy. And even if I have a good model that elegantly explains the change in one or more variables in terms of other variables, it does not imply true causation. Let me quote what Gujarati said in his book: “…a statistical relationship per se cannot logically imply causation…”. From these a priori or theoretical considerations we go forward into the ‘jungle of empirical study’. Many of us experience surprising results that mount into a variety of problems, but, as with life itself, we have to put up with them and struggle all the way. Thus we arrive at what people describe as a knowledge that is more of an ‘art’.
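To see why theory must discipline the choice of variables, here is a small sketch, assuming two completely independent simulated series, in the spirit of the falling-leaves example. Trending series that have nothing to do with each other can still show a sizeable sample correlation, which is precisely why correlation alone cannot carry a causal claim:

```python
import numpy as np

rng = np.random.default_rng(7)

# Two independent random walks: "leaves" and a "stock price".
# By construction, neither one causes or influences the other.
n = 500
leaves = np.cumsum(rng.normal(size=n))
price = np.cumsum(rng.normal(size=n))

r = np.corrcoef(leaves, price)[0, 1]
print(f"sample correlation between the two unrelated series: {r:.2f}")
```

Running this with different seeds often produces correlations far from zero, a well-known hazard (spurious correlation between trending series) that a solid a priori theory helps us avoid.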
Every day, we watch debates among outstanding economists, econometricians, physicists,[5] statisticians, and mathematicians, each with an economic model they believe is superior; but at the end of the day a winner is hardly found in the midst of our limited ability to comprehend the world perfectly. We also have to acknowledge the positive results and insights from each of their models, which have shaped our economic advancement to this day. Never forget that the ‘real’ judge is not a bunch of towering scientists with the privilege to pronounce a model good or bad. It is the effect of the positive spirit in the model, acknowledged by the real advancement of human beings in all the related aspects, which makes a model superior. Thus, as a modeller myself, I will say, “one effective and efficient model for each problem at hand.”