Thursday, March 9, 2017

PROJECT ON CORRELATION AND REGRESSION



CHAPTER ONE

1.0     INTRODUCTION

          Very often in practice, a relationship is found to exist between two or more variables. In some research problems, two measurements are taken on each of the units under consideration. We may be interested in finding a relationship between “heights of students and their respective weights”, “income and expenditure”, “unemployment and crime rate”, and so on.
          The statistical method used to find out whether any relationship exists between two sets of variables, and to establish an equation representing this relationship that can also be used for prediction, is known as “CORRELATION AND REGRESSION”. The measure of the closeness or degree of relationship between two or more variables is known as “Correlation Analysis”, while the equation of the line that represents the relationship between two or more variables, and which can also be used for prediction, is referred to as “Regression Analysis”.
          The major focus of correlation and regression is to study the changes in one variable, called the dependent variable “Y”, that are brought about by the values of the other variable, called the independent variable “X”. The independent variable can usually be manipulated or controlled, while the observed response is recorded as the dependent variable. A mathematical model relating the two variables can usually be formed. For this analysis, it is assumed that the measurement is at least on the interval scale and that the dependent and independent variables have a linear relationship.
          In the course of this project, we shall consider the sales and advertising expenditure of the Maltina drink, one of the products of International Breweries PLC, Ilesa, using the statistical methods of correlation and regression.

1.1     HISTORICAL BACKGROUND

          International Breweries PLC Ilesa was incorporated in Nigeria as a private limited liability company on 22nd December 1971 but commenced operation in December 1978. The objective was to establish a brewery to produce and market high quality beer with the brand name “Trophy”. Operating from its brewery, the company also brews and sells the “Major” brand of lager beer, a beer brewed with 100% local inputs. Besides lager beer, the company also produces and bottles “Maltina”, a non-alcoholic beverage, to increase the variety of its products. Over the years, these products have been accepted not only locally, with an NIS award, but also in international markets. All these products are now made from 100% local cereals, and the company has developed an unflagging resolve to stay in the front league in the production of high quality products in the ever-competitive beer and beverage markets, with the overall objective of satisfying the consuming public in Nigeria.
          International Breweries PLC Ilesa has well-organised sales and advertising units. These units are under the marketing department of the company. The sales and advertising managers, who are also members of the Sales and Advertising Practitioners Council of Nigeria (SAPCON), report to the head of the marketing department. The creative section of the sales and advertising units is under the leadership of a professional graphic artist. Each of the company’s products is handled by a separate advertising agency.
          The company uses outdoor media (billboards and street signs), print media (newspapers) and electronic media (radio and television) as means of advertising.
MALTINA
          Maltina is one of the products produced by International Breweries Plc and was introduced after the production of Trophy to boost the company's income. It is a non-alcoholic drink, which has generally contributed to the nationwide recognition of the product and the company. Its content is as follows:
          Each bottle holds 28 cl of liquid made from the following ingredients: malted sorghum, maize and sucrose.
          40% of the company's total annual sales is generated from Maltina, and the company spends much on advertising this product.
The company also encounters some problems, such as:
1.            GOVERNMENT POLICY: This occurs generally through government intervention, by the introduction of taxes and tariffs, which affect the income of the company.
2.            COMPETITORS' STRATEGIES: This occurs when there are close substitutes. Following the enactment of Decree 62 of the 1979 Constitution of the Federal Republic of Nigeria, which led to the free establishment of producers of both alcoholic and non-alcoholic drinks, the company has been facing a series of competition from close substitutes in the market, which reduces the total profit and overall income of the company. An example of a close substitute is malture, manufactured by Nigerian Breweries Plc (NB), which competes with this product of International Breweries Plc. Other products that compete with this product are Malta Guinness of Guinness Plc, Hi-Malt of Consolidated Breweries, and so on.
3.            TECHNOLOGICAL KNOW-HOW: Due to the uncontrolled and incessant inflation in the nation, the company can only afford fewer experts in manufacturing. This also leads to the importation of machines, which slows the development of the company.
4.            FINANCIAL PROBLEMS: The growth of the company is affected by the creeping inflation in the country. This has made the government stop the financial assistance grants rendered to companies. Both short-term and long-term loans have dropped drastically.

1.2     AIMS AND OBJECTIVES

          The main aim of this project work is to fulfil part of the requirements for the award of the National Diploma in Statistics. It is also to test the student's ability to apply the theoretical knowledge gained in the study of statistics, with particular application of regression and correlation, to real (practical) life. We are to consider the use of correlation and regression analysis on a certain product. The project objectives include the following:
·        To determine whether there exists any relationship between sales and advertising expenditure.
·        To determine the dependence of sales on advertising expenditure.
·        To measure the degree of association between the two variables.
·        To test the significance of the regression and correlation coefficients.
·        To verify the type of relationship between the two variables.
·        To use the necessary statistical concepts in order to explain the variation between the two variables, or how advertising expenditure affects sales.
·        To forecast the future behaviour of sales and advertising expenditure.
·        To improve the activities of International Breweries through reasonable recommendations based on the results of our statistical analysis.

1.3     RESEARCH HYPOTHESIS

          The following research hypotheses have been formulated:
H0:    There is no relationship between sales and advertising expenditure of Maltina.
H1:    There is a relationship between sales and advertising expenditure of Maltina.

1.4     SIGNIFICANCE OF THE STUDY

The study is important in the sense that it enables us to understand the rate at which advertising expenditure affects sales.
The data (information) used in this research work are based on the yearly sales and advertising expenditure of a product. Although a study of this nature might have been carried out before, its significance for academic knowledge (both economic and statistical) cannot be overemphasized.

1.5     SCOPE AND LIMITATION

This project, on the topic of regression and correlation analysis, analyses the data on the sales and advertising expenditure of the Maltina non-alcoholic drink, one of the products of International Breweries Plc, Ilesa, for the years 2005-2014. The ten-year set of data used in this study is a sample taken from the sales and advertising expenditure of Maltina; the regression and correlation coefficients obtained from it are sample estimates of the population parameters for IBPLC Ilesa. The analysis is carried out on these ten years of data, inferences are drawn, and useful recommendations are made for the company.
In the course of collecting statistical data, there are bound to be problems associated with it. The problems may be technical or attitudinal. In this project, some of the problems faced were:
(i)           Inability to go to every company that produces malt drinks to collect data. That is why we concentrated on International Breweries Plc, Ilesa.
(ii)          Time constraint: the time available for collecting the data was very short.
(iii)        Inability to meet respondents on time was another problem faced. The company was visited on several occasions before the data could be obtained.
(iv)        Economic problems, such as the increase in transport fares, also constituted a great problem.
(v)         The distance to the company also affected the data collection.

1.6     DEFINITION OF SOME TERMS

(1)         REGRESSION: This is a statistical device with the objective of analyzing the association between two or more variables.
(2)         CORRELATION: This is the measure of the degree of association between two or more quantitative variables.
(3)         DATA: Data are a set of numerical information collected for a particular purpose by an investigator.
i.             PRIMARY DATA: This refers to the statistical data (information) which the investigator originates himself for the purpose of the enquiry in hand.
ii.            SECONDARY DATA: This refers to those statistical data which are not originated by the investigator himself, but are obtained from some organizations, either in published or unpublished form.
(4)         BIVARIATE: This is a set of values which appear in pairs, whereby one value, Y, depends on the other value, X. Hence, sales is the dependent variable Y, while the advertising expenditure incurred by the company is the independent variable X.
(5)         D.F    -        Degrees of freedom
                   SST    -        Sum of squares total
                   SSE    -        Sum of squares error
                   SSA    -        Sum of squares among the groups
                   SS     -        Sum of squares
                   MS     -        Mean square
                   Fcal   -        Calculated value of the F-test
                   a, b   -        Parameters (regression coefficients)
                   SXY    -        Covariance of X and Y
                   SYY    -        Variance of Y
                   SXX    -        Variance of X
                   n      -        Number of observations
(6)         VARIABLE: This is a factor that changes quantitatively or qualitatively.
(7)         DEPENDENT VARIABLE: This is the variable that represents the observed outcome of an activity or a venture. It is the outcome of an experiment or of a production process. It arises in response to another variable that feeds it.
(8)         INDEPENDENT VARIABLE: This is a variable that can be controlled or manipulated at will by the investigator. It can be called an input.
(9)         X – A bold face lower case letter, or its upper case form, is used to represent a random variable.
Xi               - represents the ith value of the variable X
Σ                - represents summation
ΣXi              - represents the sum of the values of the variable X
ΣXY              - represents the sum of the products of X and Y
ΣX²              - represents the sum of the squares of X
(10)      IBPLC – This means International Breweries Public Limited Company.
(11)      CONFIDENCE INTERVAL: This can simply be referred to as the region of acceptance.
(12)      LINE OF BEST FIT: This is the line that best represents the data on the graph.


CHAPTER TWO

2.0       LITERATURE REVIEW

2.1       REGRESSION AND CORRELATION ANALYSIS

            In this chapter, we shall discuss the various methods that can be used to determine whether there exists any relationship between two variables, and how that relationship can be expressed numerically.
REGRESSION: This is a measure of the relationship between two or more variables (X, Y), where the value of one (Y) depends on the other (X). Regression analysis can also be referred to as the procedure by which an algebraic equation is formulated to estimate the value of a continuous random variable given the value of another quantitative variable. The variable for which the value is estimated by the regression equation is called the dependent variable; the variable used as the basis for the estimate is called the independent variable (Leonard J. Kazmier, 1979). When there is only one independent variable in the regression equation, we have a simple regression. When the regression is a linear one, we have a simple linear regression. When we have two or more independent variables, the regression is a multiple one.
ASSUMPTIONS FOR THE USE OF REGRESSION ANALYSIS
            Regression analysis is based on some assumptions, among which are:
(i)            The relationship between the two variables must be a linear one.
(ii)          The independent variable is fixed while the dependent variable is random.
(iii)         The conditional distributions of the dependent variable must have equal variances.

2.2       SCATTER DIAGRAM

            This is the diagram used to show the relationship between variables. Consider the case of a simple linear regression: a set of pairs of values (X, Y) is obtained, and the points represented by (X, Y) are plotted, Y against X. The graph (diagram) obtained is known as a SCATTER DIAGRAM. From this, one may judge whether the variables are linearly associated or not. If the graph shows a linear association, then the following regression model can be fitted to the data:
            Yi = α + βXi + ei ………………… (1)
This is the general model that is usually fitted. α and β are the population parameters: α is called the intercept (the value of Y when X = 0), β is called the slope or the regression coefficient (the rate at which Y changes for every unit change in X), and ei is the random error component.
            The sample estimate of Y is given as
            Yest = a + bXi ……………………… (2)
Then        Yi = a + bXi + ei …………………… (3)
            If data are available, ‘a’ and ‘b’ can be obtained, and so equation (2) can be estimated. The second part, ei, can be evaluated as (Yi – Yest).
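
As a rough illustration of how the fitted values and residuals are obtained once ‘a’ and ‘b’ are known, the sketch below evaluates Yest = a + bX and e = Y – Yest. The observations and the values of `a` and `b` are hypothetical and chosen only for illustration; they are not the project's data.

```python
# Hypothetical paired observations (illustrative only, not the IBPLC data).
x = [1.0, 2.0, 3.0, 4.0]
y = [3.1, 4.9, 7.2, 8.8]

# Assumed sample estimates of the intercept and slope, chosen for illustration.
a, b = 1.0, 2.0

# Fitted values: Yest = a + b * X
y_est = [a + b * xi for xi in x]

# Residuals: e = Y - Yest
residuals = [yi - yhat for yi, yhat in zip(y, y_est)]

for xi, yi, yhat, e in zip(x, y, y_est, residuals):
    print(f"X={xi:.1f}  Y={yi:.1f}  Yest={yhat:.1f}  e={e:+.1f}")
```

Small residuals scattered around zero would suggest that the assumed line describes the data reasonably well.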
            Examples of measurements that show a linear relationship:
(i)            Price of goods and quantity supplied or demanded.
(ii)          Income and expenditure.
(iii)         Experience (years) and efficiency of a cashier.

2.2.1   SHAPES OF SCATTER DIAGRAM

            When the values of Y are plotted against the value of X, any of the following diagrams may be obtained.
[Figure: scatter diagrams of Y against X showing (i) a positive linear relationship, (ii) a negative linear relationship, (iii) no correlation, (iv) a positive curvilinear relationship, and (v) a negative curvilinear relationship.]

2.3       REGRESSION EQUATION

            This is the equation of the regression line for a linear regression.  A simple linear regression equation is of the form Y = a + bXi, where a is the Y intercept (the value of Y at the point where X = 0) and b is the slope of the line (the change in Y which accompanies a change of one unit in X).  Ordinarily, the numerical constants ‘a’ and ‘b’ are estimated from sample data, and once they have been determined, we can substitute a given value of X into the equation and calculate the predicted value of Y.  The general equation of a straight line is given by:
            Y = a + bXi, where:
            b = Slope (gradient) of the line
            a = Intercept
            X = Independent variable
            Y = Dependent variable
            We consider the following deductions from the general equation of a straight line:
i.          The line Y = a + bXi cuts the Y-axis at ‘a’.
ii.         The line Y = a + bXi passes through the origin when a = 0.
iii.        The line Y = a + bXi slopes upward from left to right when b > 0.
iv.        The line Y = a + bXi slopes downward from left to right when b < 0.

2.4       METHOD OF FITTING REGRESSION LINE

            Observing the bivariate data helps a lot in deciding whether or not two variables X and Y are correlated.  Bivariate data (X, Y) are a set of values which appear in pairs, in which the value of one, Y, depends on the other, X.  The best way to determine whether or not any relationship exists between two variables X and Y is to draw a graph of the bivariate data. If the relationship appears linear, that is, all the points lie on or near a line, that line is called a “regression line”. The regression line can be fitted by:
(i)            Free hand method
(ii)          Semi-average method
(iii)         The least square method
(iv)         The mean method.

2.4.1   FREE HAND METHOD

            The regression line is fitted to the scatter diagram by eye. This method may not produce a unique regression line and regression coefficient, as the line drawn depends on individual judgment. It nevertheless shows, in a visible way, how a regression line could be fitted to a scatter diagram.
            The major deficiency of this method is that it does not produce a unique answer: different individuals obtain different “lines of best fit”, each according to his or her own judgment.

2.4.2   THE MEAN METHOD

The grand means X̄ and Ȳ of the two variables X and Y are computed, and the point (X̄, Ȳ) is plotted on the scatter diagram. A line is then drawn through the point (X̄, Ȳ) in such a way that the numbers of points in the scatter diagram above and below the line are equal or almost equal. The line so drawn is the regression line. The main disadvantage of this method is that it is subjective and may not lead to a unique regression line and regression coefficient.

2.4.3   SEMI AVERAGE METHOD

The technique consists of separating the data into two equal parts or groups, plotting the mean point for each group, and joining these two points with a straight line. The procedure is as follows:
STEP 1: Arrange the bivariate data in order of X-value.
STEP 2: Split the data into two equal groups (parts), a lower half and an upper half, dropping the middle item in case there is an odd number of items.
STEP 3: Compute the mean of X and the mean of Y for each group and plot the two mean points.
STEP 4: Join the two mean points with a straight line.
            To get the intercept on Y, extend the straight line to cross the Y-axis and read the Y value.  This is ‘a’.  The ‘b’ is obtained by calculating the ratio of the difference between the two means of Y to the difference between the two means of X.
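
The semi-average procedure can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions: the function name `semi_average_line` and the sample figures are hypothetical, and the middle pair is dropped when the number of items is odd.

```python
def semi_average_line(x, y):
    """Estimate intercept a and slope b by the semi-average method."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two (X, Y) pairs of equal length")
    # Step 1: order the pairs by X value.
    pairs = sorted(zip(x, y))
    n = len(pairs)
    half = n // 2
    # Step 2: lower and upper halves; with an odd number of items the
    # middle pair is left out (an assumption of this sketch).
    lower = pairs[:half]
    upper = pairs[n - half:]
    # Mean point of each half.
    x1 = sum(p[0] for p in lower) / half
    y1 = sum(p[1] for p in lower) / half
    x2 = sum(p[0] for p in upper) / half
    y2 = sum(p[1] for p in upper) / half
    # Slope: difference of the Y means over difference of the X means.
    b = (y2 - y1) / (x2 - x1)
    # Intercept: extend the line through (x1, y1) back to X = 0.
    a = y1 - b * x1
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5, 6]
y = [2, 4, 5, 4, 5, 7]
a, b = semi_average_line(x, y)
print(f"Y = {a:.4f} + {b:.4f} X")
```

Computing the intercept algebraically gives the same value as extending the drawn line to the Y-axis, without relying on reading a graph.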

2.4.4   THE LEAST SQUARE METHOD

            This is the best method for estimating ‘a’ and ‘b’ of the regression line with equation Y = a + bXi.  If ‘Y’ changes in some systematic way with a change in ‘X’, we should be able to predict ‘Y’ from ‘X’.
            Therefore, regression can also be described as the way in which predictions are made and their accuracy is determined.  The method works with the difference between the actual values and the estimated line.  The line can be denoted by:

$$Y_i = a + bX_i + e_i$$

where $e_i$ is the residual or error term.  Expanding from the above equation:

$$e_i = Y_i - a - bX_i$$

$$SSE = \sum e_i^2 = \sum (Y_i - a - bX_i)^2 \qquad \text{(i)}$$

where SSE is the sum of squares error.  To minimise SSE, we differentiate equation (i) with respect to a and b:

$$\frac{\partial SSE}{\partial a} = -2\sum (Y_i - a - bX_i) \qquad \text{(ii)}$$

$$\frac{\partial SSE}{\partial b} = -2\sum X_i (Y_i - a - bX_i) \qquad \text{(iii)}$$

Equate equations (ii) and (iii) to zero:

$$-2\sum (Y_i - a - bX_i) = 0 \qquad \text{(iv)}$$

$$-2\sum X_i (Y_i - a - bX_i) = 0 \qquad \text{(v)}$$

Divide equations (iv) and (v) by –2:

$$\sum (Y_i - a - bX_i) = 0 \qquad \text{(vi)}$$

$$\sum X_i (Y_i - a - bX_i) = 0 \qquad \text{(vii)}$$

Solving equation (vi):

$$\sum Y_i - na - b\sum X_i = 0$$

$$na = \sum Y_i - b\sum X_i$$

Divide through by n:

$$a = \frac{\sum Y_i}{n} - b\,\frac{\sum X_i}{n} = \bar{Y} - b\bar{X}$$

Expanding equation (vii):

$$\sum X_i Y_i - a\sum X_i - b\sum X_i^2 = 0$$

Substituting for a in this equation, we have:

$$\sum X_i Y_i - \left(\frac{\sum Y_i}{n} - b\,\frac{\sum X_i}{n}\right)\sum X_i - b\sum X_i^2 = 0$$

$$\sum X_i Y_i - \frac{\sum X_i \sum Y_i}{n} + \frac{b\left(\sum X_i\right)^2}{n} - b\sum X_i^2 = 0$$

Multiply through by n:

$$n\sum X_i Y_i - \sum X_i \sum Y_i + b\left(\sum X_i\right)^2 - nb\sum X_i^2 = 0$$

$$n\sum X_i Y_i - \sum X_i \sum Y_i = b\left[n\sum X_i^2 - \left(\sum X_i\right)^2\right]$$

$$b = \frac{n\sum X_i Y_i - \sum X_i \sum Y_i}{n\sum X_i^2 - \left(\sum X_i\right)^2}$$

After obtaining ‘a’ and ‘b’, their values can be substituted into the original equation to get the equation of the regression line, i.e. Y = a + bX.
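
The least squares estimates can be computed directly from the sums ΣX, ΣY, ΣXY and ΣX². The sketch below is a minimal illustration, with b taken as (nΣXY – ΣXΣY) / (nΣX² – (ΣX)²) and a as the mean of Y minus b times the mean of X; the function name `least_squares` and the data are hypothetical.

```python
def least_squares(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    sum_x = sum(x)
    sum_y = sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)
    # Slope: b = (n*SumXY - SumX*SumY) / (n*SumX2 - (SumX)^2)
    b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    # Intercept: a = mean(Y) - b * mean(X)
    a = sum_y / n - b * sum_x / n
    return a, b


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = least_squares(x, y)
print(f"Y = {a:.2f} + {b:.2f} X")
```

For these figures n = 5, ΣX = 15, ΣY = 20, ΣXY = 66 and ΣX² = 55, so b = 30/50 = 0.6 and a = 4 – 0.6(3) = 2.2.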

2.5       CHI - SQUARE TEST

            The relationship between two variables can be tested through the use of the chi-square test (χ²). Suppose that in a particular sample a set of possible events E1, E2, …, Ec are known to occur O1, O2, …, Oc times respectively, and that according to probability rules they are expected to occur e1, e2, …, ec times respectively. O1, O2, …, Oc are called the observed frequencies, and e1, e2, …, ec are called the expected or theoretical frequencies.
            Our aim is to assess whether the observed frequencies differ significantly or not from the expected ones. A measure of the discrepancy between the observed and expected frequencies is given by the test statistic called “CHI – SQUARE”, which is given by:

$$\chi^2_{cal} = \sum_{i=1}^{c} \frac{(O_i - e_i)^2}{e_i}$$

Note: When a set of n observations can be categorised in two ways, with r levels of one categorisation (the rows) and c levels of the other (the columns), we have an r x c contingency table, which may be arranged as in the table below.

                                        CATEGORIZATION 1
CATEGORIZATION 2      1        2        3        ...      c        TOTAL
        1             O11      O12      O13      ...      O1c      O1.
        2             O21      O22      O23      ...      O2c      O2.
        3             O31      O32      O33      ...      O3c      O3.
        .             .        .        .                 .        .
        .             .        .        .                 .        .
        .             .        .        .                 .        .
        r             Or1      Or2      Or3      ...      Orc      Or.
      TOTAL           O.1      O.2      O.3      ...      O.c      n

            The hypotheses to be tested are as follows:
H0: There is no association between sales and advertising expenditure
H1: There is an association between sales and advertising expenditure
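
For an r x c contingency table, the chi-square statistic can be sketched as below. The expected frequency of each cell is taken as (row total × column total) / n, which is the usual choice for a test of association but is an assumption here since the text does not state it; the function name and the observed counts are hypothetical.

```python
def chi_square_statistic(observed):
    """Return (chi-square, degrees of freedom) for an r x c table of counts."""
    rows = len(observed)
    cols = len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(rows):
        for j in range(cols):
            # Expected frequency under no association (assumed formula).
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # Degrees of freedom for an r x c table.
    df = (rows - 1) * (cols - 1)
    return chi2, df


# Hypothetical 2 x 2 table of observed frequencies, for illustration only.
observed = [[10, 20],
            [20, 10]]
chi2, df = chi_square_statistic(observed)
print(f"chi-square = {chi2:.4f} with {df} degree(s) of freedom")
```

The calculated value would then be compared with the tabulated chi-square value at the chosen significance level and (r – 1)(c – 1) degrees of freedom.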

2.6       TEST OF HYPOTHESIS

2.6.1   THE F - TEST

            Consider two independent samples taken from two populations assumed to be normally distributed. Let $n_1, n_2$ be the sample sizes drawn from the populations and $S_1^2, S_2^2$ be the sample estimates of the population variances $\sigma_1^2, \sigma_2^2$ respectively. Then the F-statistic needed for the test is $F = S_1^2 / S_2^2$ ……… (i)
The critical point of F at the specified significance level $\alpha$ is $F_{\alpha;\, n_1-1,\, n_2-1}$, where $n_1-1$ and $n_2-1$ are the degrees of freedom.
            Therefore, to test whether two distributions have the same shape, i.e. whether the two populations have equal variances $\sigma_1^2 = \sigma_2^2$, the corresponding hypotheses are set as follows:
H0: $\sigma_1^2 = \sigma_2^2 = \sigma^2$ (the distributions have the same shape)
H1: $\sigma_1^2 \neq \sigma_2^2$
            The critical value of F is $F_{\alpha/2;\, n_1-1,\, n_2-1}$ (two-tailed test).
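
The variance ratio can be computed with Python's standard `statistics` module, as in the minimal sketch below. The two samples are hypothetical; the critical value is not computed here and would still be read from an F table.

```python
from statistics import variance

# Hypothetical independent samples, for illustration only.
sample_1 = [12, 15, 11, 18, 14]
sample_2 = [10, 11, 9, 12, 13]

# Sample variances (divisor n - 1).
s1_sq = variance(sample_1)
s2_sq = variance(sample_2)

# F statistic: ratio of the two sample variances.
f_cal = s1_sq / s2_sq

# Degrees of freedom: (n1 - 1, n2 - 1).
df = (len(sample_1) - 1, len(sample_2) - 1)

print(f"S1^2 = {s1_sq}, S2^2 = {s2_sq}, F = {f_cal}, df = {df}")
```

Here the sample variances are 7.5 and 2.5, giving F = 3.0 with (4, 4) degrees of freedom.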

2.6.2   INTERVAL ESTIMATE AND SIGNIFICANCE TEST

            When a population parameter is estimated by a single number, the resulting estimate is known as a point estimate. On the other hand, it is an interval estimate if the estimate is stated as lying between two numbers. Thus, a statement such as μ = 2 is a point estimate of μ, while a statement that μ lies between two numbers is an interval estimate of μ. The level of significance is the area, or percentage, of the standard distribution curve covered by the rejection region. For a two-tailed test at the 95% confidence level, each rejection region covers 2.5% (0.025) of the standard normal distribution curve, while for a one-tailed test the rejection region covers 5% (0.05).

2.6.3   POSSIBLE TEST

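TEST ON SINGLE MEAN: Let ‘X’ be the variable measured on a sample of size ‘n’ obtained from the population of units of interest. Let $\mu$ be the population mean of the variable X (or the hypothesised mean of X). When $\sigma$ is unknown and the population is normal but n < 30, we use the t-distribution:

$$t_{cal} = \frac{\bar{X} - \mu}{S/\sqrt{n}} \quad \text{at } (n - 1) \text{ degrees of freedom} \qquad (1)$$

The critical value of t is $t_{\alpha,\, n-1}$ (one-tailed test).
When n is large, i.e. n > 30, we use the z-distribution for the calculation of the statistic:

$$Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \qquad (2)$$

            The critical value of Z is $z_{\alpha}$ (one-tailed test). For a test that is based on a proportion we can use

$$Z = \frac{p - P}{\sqrt{P(1 - P)/n}} \qquad (3)$$

where p is the sample proportion and P is the hypothesised population proportion.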
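
The small-sample test on a single mean can be sketched as follows, using the standard form t = (sample mean – hypothesised mean) / (S / √n) with n – 1 degrees of freedom. The function name `t_statistic`, the sample and the hypothesised mean are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t-test."""
    n = len(sample)
    # Standard error of the mean, using the sample standard deviation.
    standard_error = stdev(sample) / sqrt(n)
    t = (mean(sample) - mu0) / standard_error
    return t, n - 1


# Hypothetical sample and hypothesised mean, for illustration only.
sample = [12, 15, 11, 18, 14]
t_cal, df = t_statistic(sample, mu0=12)
print(f"t = {t_cal:.4f} with {df} degrees of freedom")
```

The calculated t would then be compared with the tabulated value at the chosen significance level.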
TEST FOR DIFFERENCE OF MEANS: In this test, let $X_1, X_2$ be the variables under consideration, from population 1 and population 2 respectively. Let $\bar{X}_1, \bar{X}_2$ be the sample means and $S_1^2, S_2^2$ the sample variances, with population standard deviations $\sigma_1, \sigma_2$. The test statistic can be calculated whether n is large or small.
(a)          For large samples, when $\sigma_1$ and $\sigma_2$ are known, use the Z statistic:

$$Z_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$

The critical value of Z is $Z_{1-\alpha/2}$.
(b)          For small samples, when $\sigma_1$ and $\sigma_2$ are unknown, use the t-statistic:

$$t_{cal} = \frac{\bar{X}_1 - \bar{X}_2}{s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$$

$$\text{where } s_p = \sqrt{\frac{(n_1 - 1)S_1^2 + (n_2 - 1)S_2^2}{n_1 + n_2 - 2}}$$

The critical value of t is $t_{(1-\alpha/2),\, v}$, where $v = n_1 + n_2 - 2$.
Note: The critical value depends on the type of test to be carried out, whether a two-tailed test or a one-tailed test.
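
A minimal sketch of the small-sample test for the difference of two means is given below. It uses the usual pooled-variance form, in which the pooled variance is ((n1 – 1)S1² + (n2 – 1)S2²) / (n1 + n2 – 2); the function name `pooled_t` and the two samples are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t(sample_1, sample_2):
    """Return (t, degrees of freedom) for the pooled two-sample t-test."""
    n1, n2 = len(sample_1), len(sample_2)
    # Pooled variance from the two sample variances.
    sp_sq = ((n1 - 1) * variance(sample_1)
             + (n2 - 1) * variance(sample_2)) / (n1 + n2 - 2)
    # Standard error of the difference between the two sample means.
    standard_error = sqrt(sp_sq) * sqrt(1 / n1 + 1 / n2)
    t = (mean(sample_1) - mean(sample_2)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical samples, for illustration only.
sample_1 = [12, 15, 11, 18, 14]
sample_2 = [10, 11, 9, 12, 13]
t_cal, v = pooled_t(sample_1, sample_2)
print(f"t = {t_cal:.4f} with v = {v} degrees of freedom")
```

With these figures the pooled variance is 5, so t = 3 / √2, which is about 2.12 with 8 degrees of freedom.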

2.7       CORRELATION ANALYSIS

If two quantities vary in such a manner that movement in one is associated with movement in the other, the quantities are said to be correlated.
Correlation analysis is a technique for estimating the closeness or degree of relationship between two or more variables. Thus, in correlation analysis we try to determine how well a linear equation or other equation describes or explains the relationship between variables. When only two variables (X and Y) are involved, the correlation is said to be SIMPLE. When more than two variables are involved, we speak of MULTIPLE CORRELATION. Correlation helps us to know more about the relationship between two or more variables. It does not tell us the cause of this relationship; it only shows whether the relationship exists or not. In the course of this study we shall limit our discussion to simple correlation, that is, the measure of the degree of association between two variables only.



2.7.1   PATTERNS OF CORRELATION
[Figure: scatter diagrams of Y against X showing (a) perfect and positive correlation (r = 1), (b) perfect and negative correlation (r = -1), (c) positive correlation (0 < r < 1), (d) negative correlation (-1 < r < 0), and (e) no correlation (r = 0).]
Note: If all the points of the scatter diagram lie on a straight line, then we say that the two variables are perfectly correlated (a and b). Perfect correlation could, however, be positive or negative depending on whether Y increases as X increases (a) or Y decreases as X increases (b).
If Y tends to increase as X increases, as in figure (c), then the correlation is said to be positive or direct. On the other hand, if Y tends to decrease as X increases, as in figure (d), then the correlation is called negative or inverse correlation. If there is no definite pattern in the direction of the variables X and Y, then we say the variables are uncorrelated or have zero correlation (figure (e)).

2.7.2   INTERPRETATION OF RESULTS

(i)         r  = + 1            means the variables have a perfect positive correlation
(ii)        r  = - 1             means the variables have a perfect negative correlation
(iii)       r  = 0               means there is no correlation between the variables
(iv)       -1 < r < 0          means there is an imperfect negative correlation between the variables
(v)        0 < r < 1            means there is an imperfect positive correlation between the variables
            It should be noted that the greater the magnitude of r, the stronger the association. For example, if r1 = 0.65 and r2 = 0.8, then the variables in the second case are more strongly positively correlated than those in the first.
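
The interpretation rules above can be written as a small helper. This is only an illustrative sketch; the function name `describe_r` is hypothetical, and in practice a computed r is rarely exactly 0 or ±1.

```python
def describe_r(r):
    """Return the interpretation of a correlation coefficient r."""
    if not -1 <= r <= 1:
        raise ValueError("r must lie between -1 and +1")
    if r == 1:
        return "perfect positive correlation"
    if r == -1:
        return "perfect negative correlation"
    if r == 0:
        return "no correlation"
    if r > 0:
        return "imperfect positive correlation"
    return "imperfect negative correlation"


for value in (1, 0.8, 0.65, 0, -0.4, -1):
    print(value, "->", describe_r(value))
```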

2.7.3   ASSUMPTIONS FOR THE USE OF CORRELATION ANALYSIS

Correlation analysis is based on some assumptions, among which are:
(i)            The relationship between the two variables must be a linear one.
(ii)          Both variables are random variables.
(iii)         The conditional distributions for each variable must have equal variances.
(iv)         Successive observations for each variable are uncorrelated.

2.8       METHOD OF FINDING CORRELATION

2.8.1   SCATTER DIAGRAM

            A scatter diagram can be used to assess the existence of correlation between two variables. It is the graphical representation of bivariate data, which helps to reveal the correlation (association) between the variables. The variables may be positively or negatively correlated. The dependent variable ‘Y’ is plotted along the vertical axis and the independent variable ‘X’ is plotted along the horizontal axis.
TYPES OF SCATTER DIAGRAM
Correlation can be graphically presented as:
(i)            Direct positive linear scatter diagram: This is an indication that as X increases, the value of Y also increases (b > 0).
[Figure: scatter diagram of Y against X with points rising from left to right (b > 0).]
(ii)          Direct negative linear scatter diagram: This indicates that as X increases, the value of Y tends to decrease (b < 0).
[Figure: scatter diagram of Y against X with points falling from left to right (b < 0).]
(iii)         Zero linear scatter diagram: This is observed when the points of the scatter diagram do not follow any specific pattern, i.e. X does not provide any useful information about Y; we can say there is no relationship between the two variables (b = 0).
[Figure: scatter diagram of Y against X with points showing no pattern (b = 0).]

2.8.2   CORRELATION COEFFICIENT

The coefficient of correlation, r, may be defined as the square root of the product of the two regression coefficients. That is,
$$r = \pm\sqrt{b_{yx}\, b_{xy}}$$
where $b_{yx}$ = regression coefficient of Y on X
            $b_{xy}$ = regression coefficient of X on Y
Also, r is the covariance of X and Y divided by the product of the standard deviation of X and the standard deviation of Y:
$$r = \frac{s_{xy}}{s_x s_y}$$
where $s_{xy}$ = covariance of X and Y
            $s_x$ = standard deviation of X, and $s_x^2$ = variance of X
            $s_y$ = standard deviation of Y, and $s_y^2$ = variance of Y
From this definition of the correlation coefficient, it can be shown that r can take any value between -1 and +1.
            r = +1 when there is a perfect positive relationship between X and Y, with a unit increase in X always leading to a constant increase in Y.
r = -1 when there is a perfect negative relationship, with a unit increase in X always leading to a constant decrease in Y.
r = 0 when there is no linear relationship between X and Y.
            The major methods used for calculating the correlation coefficient are:
(i)            Karl Pearson’s product moment coefficient of correlation (r)
(ii)          Spearman’s rank correlation coefficient (R)

2.8.3   KARL PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT

            Karl Pearson’s product moment correlation coefficient (or coefficient of correlation) is denoted by r and given by:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
where $-1 \le r \le 1$
Limitation: It cannot be used when direct quantitative measurement of the phenomenon under study is not possible.
COEFFICIENT OF DETERMINATION
            When one variable is used to predict the other, it is usually very important to assess its usefulness, because it may be used to predict future values with some measure of satisfaction. The coefficient of determination is the proportion of the total variation in the dependent variable that can be accounted for by the independent variable in the equation. This tells us the percentage of the total variation that the independent variable can explain. The coefficient of determination is defined according to Leonard (1979) as:
$$r^2 = \frac{\text{explained variation}}{\text{total variation}}$$
            OR
$$r^2 = 1 - \frac{\text{unexplained variation}}{\text{total variation}}$$

2.8.4   SPEARMAN’S RANK CORRELATION COEFFICIENT

The Spearman rank correlation is the non-parametric equivalent of the parametric correlation coefficient described above. If the data are in the form of ranks, or can conveniently be ranked, then we can apply this measure of correlation. If we have to rank the measurements ourselves, there is the possibility that some values will be repeated. When values are repeated, we assign their average rank to each of them. The coefficient is defined as:
$$R_{xy} = 1 - \frac{6\sum d^2}{n(n^2 - 1)}$$
where d = Rank(X) - Rank(Y) and n = number of objects being ranked
LIMITATIONS
(i)            The main limitation of this method is that it is not as accurate as Karl Pearson’s coefficient of correlation.
(ii)          It is not appropriate where n is more than thirty (30) unless the original data are ranks rather than scores.
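
The average-rank rule for repeated values described above can be sketched as follows. This is only an illustrative sketch: the function names `average_ranks` and `spearman` are hypothetical, and the simple formula with 6Σd² is applied as given in the text (with many ties it is only an approximation).

```python
def average_ranks(values):
    """Rank values from smallest (rank 1) upward; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        # extend the block while the next value is tied with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + 1 + end + 1) / 2  # average of the 1-based positions in the block
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's coefficient using R = 1 - 6*sum(d^2) / (n(n^2 - 1))."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n * n - 1))
```

For example, the two tied values in `[10, 20, 20, 30]` would occupy positions 2 and 3, so each receives the rank 2.5.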

2.9       TEST OF HYPOTHESIS (ii)

A statistical hypothesis is a statement about the distribution of one or more random variables, intended to correspond to some statement about the real world.
Hence, a statistical hypothesis is a statement about a population which we want to verify on the basis of information available from a sample.
TYPES OF HYPOTHESIS
a.            THE NULL HYPOTHESIS (H0): This is the hypothesis that is assumed true and tested for possible rejection. This hypothesis is always stated first. It is denoted by H0.
b.            THE ALTERNATIVE HYPOTHESIS (H1): This is the hypothesis that is accepted when the null hypothesis is rejected. It indicates the direction of the expected results. It is denoted by H1. Example of a test of hypothesis:
            H0 :      μ1 = μ2
            H1 :      μ1 ≠ μ2
TYPE I ERROR: This is the error of rejecting the null hypothesis when in fact it is true. Its probability is called the producer’s risk (α).
TYPE II ERROR: This is the error of accepting the null hypothesis when in fact it is false. Its probability is called the consumer’s risk (β).
CRITICAL REGION: The sampling distribution is divided into two regions: the acceptance region and the rejection region. The acceptance region is termed the confidence region, and the rejection region is termed the critical region. When a value is estimated by a sample value, the estimate is accepted if it falls within the confidence region (confidence range), with a confidence level of 100(1 - α)%. If it falls outside this range, the estimated value is rejected, implying that it is not fit to represent the predicted or specified value.








[Figure: panels (a), (b) and (c), each showing the acceptance region (area 1 - α) and the rejection regions (area α) of the sampling distribution]
            When “n” is large (n > 30) the Z-distribution is used, and when “n” is small (n < 30) we make use of the t-distribution, where “n” is the number of observations.
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
H0: ρ = 0 (there is no relationship between sales and advertisement)
H1: ρ ≠ 0 (there is a relationship between sales and advertisement)
If t-calculated > t-tabulated, reject the null hypothesis (H0)
If t-calculated < t-tabulated, accept the null hypothesis (H0)

2.9.1   ANALYSIS OF VARIANCE (ANOVA)

            The analysis of variance is a statistical technique for testing the null hypothesis for multiple means, H0: μ1 = μ2 = μ3 = … = μk, using the F-ratio, which is an extension of the t-test. The F-ratio is defined as:
$$F = \frac{\text{variance estimate between groups}}{\text{variance estimate within groups}} \qquad \text{(i)}$$
            The difference between this test and the t-test is that the F-ratio uses the ratio of two variance estimates to test whether the multiple means are significantly different, while the t-test uses the ratio of the difference between two means to the within-group variation to test whether the two means are significantly different.
            ANOVA can be classified according to the number of independent variables involved in the study. When only one independent variable is involved in the study of the dependent variable, the ANOVA is termed one-way. If two independent variables are used to study the dependent variable, we have two-way ANOVA.
            The partition of the sum of squares in the analysis of variance is:
            Syy    = bSxy + SSE
            TSS    = SSR + SSE
where SSE = sum of squares of errors
            TSS = total sum of squares
            SSR = sum of squares of regression
Source of variation    Degree of freedom    Sum of squares    Mean sum of squares    F-ratio
Regression             1                    SSR               SSR/1 = MSR            MSR/MSE
Error                  n-2                  SSE               SSE/(n-2) = MSE
Total                  n-1                  TSS

From the above table, we can see that SSR and SSE are values of independent chi-square variables with 1 and n-2 degrees of freedom respectively. Also, TSS is a value of a chi-square variable with n-1 degrees of freedom.
Where:
MSR   =          mean sum of squares of regression
MSE   =          mean sum of squares of error
SIGNIFICANCE TEST OF “r”
            If H0 is true, we would expect r to be distributed about zero with standard error:
$$S_r = \sqrt{\frac{1 - r^2}{n - 2}}$$

2.10    LIMITATION OF REGRESSION AND CORRELATION ANALYSIS

(i)            All assumptions for their use must be satisfied.
(ii)          Extrapolation may be dangerous because there is no statistical basis to assume that the linear relationship will apply outside the range of the sample data.
(iii)         A significant correlation does not necessarily indicate causation; it indicates only a common linkage in a sequence of events.
(iv)         The correlation coefficient may be misleading if it is a spurious one.


CHAPTER THREE

3.0    RESEARCH METHODOLOGY AND ANALYSIS OF DATA

3.1    INTRODUCTION 

This chapter presents the design of the study, the area of the study, sampling procedures, the sample, instrumentation, validity of the instrument, reliability of the instrument, procedure for data collection and method of data analysis.

3.2    THE DESIGN OF THE STUDY

This work is designed to look into the correlation and regression analysis of the sales and advertising expenditure of Maltina of International Breweries Plc, Ilesa, in Osun State, with the application of some statistical concepts to analyse the data collected for a period of ten years, to determine whether there exists any relationship between sales and advertising expenditure of Maltina, and to advise the company accordingly.
3.3    AREA OF THE STUDY
This research was carried out in Osun State, at International Breweries Plc, Ilesa. The company was incorporated in Nigeria as a private limited liability company on 22nd December 1971 but commenced operation in December 1978. The objective was to establish a brewery to produce and market high quality beer with the brand name “Trophy”. From its brewery, the company also brews and sells the “Major” brand of lager beer, a beer brewed with 100% local inputs. Besides lager beer, the company also produces and bottles “Maltina”, a non-alcoholic beverage drink, to increase the variety of its products. Over the years, these products have been accepted not only locally, with the NIS award, but also in international markets. All these products are now made from 100% local cereals, and the company has developed an unflagging resolve to stay in the front league in the production of high quality products in the very competitive beer and beverage markets, with the overall objective of satisfying the consuming public in Nigeria.
International Breweries Plc, Ilesa has well organised sales and advertising units. These units are under the marketing department of the company. The sales and advertising managers, who are also members of the Sales and Advertising Practitioners Council of Nigeria (SAPCON), report to the head of the marketing department. The creative section of the sales and advertising units of the company is under the leadership of a professional graphic artist. Each of the company’s products is handled by a separate advertising agency.

3.4    SOURCE OF DATA

The major source of data used in this study is secondary data, that is, statistical data obtained from organizations in either published or unpublished form.
3.5    METHODS OF DATA COLLECTION
The method used for the collection of the data is transcription from records. This method is useful when the information needed for a particular purpose is already recorded in a register maintained in one or more departments, which makes it easier to collect the data directly from the maintenance unit of International Breweries Plc, Ilesa, in Osun State.

3.6    PROBLEMS OF DATA COLLECTION

In Nigeria, and indeed in any third world country, accurate data may be very hard to get. This may be due to many reasons, ranging from individuals to corporations, agencies and even the government as a body. Some of the reasons are listed below:
i.             Lack of proper communication between users and producers of statistical data.
ii.            Difficulty in estimating variables which are of interest to planners.
iii.           Ignorance and illiteracy of respondents.
iv.          High proportion of non-response due to suspicion on the part of respondents.
v.           Lack of frames from which samples can be selected.
vi.          Wrong ordering of priorities, including misdirection of emphasis and bad utilization of human and material resources.

3.7    PRESENTATION OF DATA

Graphs, charts and tables are used to present information and to support the analysis and interpretation of statistical data. They are an excellent way of condensing information and the fastest means through which we receive complex information. After the collection of data, the first step is to prepare the data for computation by editing, verifying, checking coverage, summarising and coding.



3.8    THE SALES AND ADVERTISING EXPENDITURE OF MALTINA OF INTERNATIONAL BREWERIES PLC, ILESA (2005-2014)

YEAR     ADVERTISING EXPENDITURE          SALES
2005                     279,990        527,567
2006                     396,513        568,095
2007                     222,280      1,033,307
2008                     970,611      4,758,281
2009                   1,334,972      7,413,610
2010                   6,350,408     11,943,641
2011                   8,507,623     13,943,773
2012                  10,526,471     14,430,910
2013                  13,291,765     15,200,669
2014                  17,224,147     21,123,156

Source: International Breweries Plc, Ilesa.




CHAPTER FOUR

4.0       ANALYSIS AND INTERPRETATION OF DATA

Here, we deal with the mathematical or computational analysis of the data collected, that is, the sales and advertising expenditure of the Maltina drink of International Breweries Plc, Ilesa, for the period 2005 to 2014. The advertising expenditure, which is the independent variable, is represented by X, and sales, the dependent variable, is represented by Y.
YEAR     ADVERTISING EXPENDITURE          SALES
2005                     279,990        527,567
2006                     396,513        568,095
2007                     222,280      1,033,307
2008                     970,611      4,758,281
2009                   1,334,972      7,413,610
2010                   6,350,408     11,943,641
2011                   8,507,623     13,943,773
2012                  10,526,471     14,430,910
2013                  13,291,765     15,200,669
2014                  17,224,147     21,123,156
SOURCE:  The Sales and Advertising Expenditure Manager’s Office, International Breweries Plc, Ilesa.
In order to make the computation easier, the original data were coded by expressing the figures in hundred thousands, as shown in Table 4.1 below:
YEAR     ADVERTISING EXPENDITURE      SALES
2005                        2.80       5.28
2006                        3.97       5.68
2007                        2.22      10.33
2008                        9.71      47.58
2009                       13.35      74.14
2010                       63.50     119.44
2011                       85.08     139.44
2012                      105.26     144.31
2013                      132.92     152.01
2014                      172.24     211.23
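
The coding step can be sketched in a few lines of Python. The raw figures are those of Section 3.8; the function name `code` and the `unit` parameter are hypothetical names chosen for illustration, and rounding to two decimal places is assumed from the coded table.

```python
# Raw figures from Section 3.8 (2005 to 2014)
advertising = [279_990, 396_513, 222_280, 970_611, 1_334_972,
               6_350_408, 8_507_623, 10_526_471, 13_291_765, 17_224_147]
sales = [527_567, 568_095, 1_033_307, 4_758_281, 7_413_610,
         11_943_641, 13_943_773, 14_430_910, 15_200_669, 21_123_156]


def code(values, unit=100_000):
    """Express each figure in units of one hundred thousand, to 2 decimal places."""
    return [round(v / unit, 2) for v in values]


x = code(advertising)  # coded advertising expenditure (X)
y = code(sales)        # coded sales (Y)
```

Under these assumptions the coded columns sum to ΣX = 591.05 and ΣY = 909.44, the totals used in the computations that follow.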


4.1 ANALYSIS OF THE DATA

The data presented earlier shall be analyzed in order to draw conclusions and make reasonable recommendations. In this analysis, the following procedures shall be adopted:
i.            Regression analysis using the least squares method
ii.           Correlation analysis using Karl Pearson’s product moment and Spearman’s rank correlation methods
4.1.1          REGRESSION ANALYSIS USING LEAST SQUARE METHOD
YEAR     X (00,000)   Y (00,000)          XY           X²           Y²
2005           2.80         5.28       14.78         7.84        27.88
2006           3.97         5.68       22.55        15.76        32.26
2007           2.22        10.33       22.93         4.93       106.71
2008           9.71        47.58      462.00        94.28     2,263.86
2009          13.35        74.14      989.77       178.22     5,496.74
2010          63.50       119.44    7,584.44     4,032.25    14,265.91
2011          85.08       139.44   11,863.56     7,238.61    19,443.51
2012         105.26       144.31   15,190.07    11,079.67    20,825.38
2013         132.92       152.01   20,205.16    17,667.73    23,107.04
2014         172.24       211.23   36,382.25    29,666.62    44,618.11
TOTAL        591.05       909.44   92,737.51    69,985.91   130,187.40
Regression model:   $Y_i = \alpha + \beta X_i + e_i$
From the above information, we can use the normal equations to find the values of ‘a’ and ‘b’, the least squares estimates of α and β:
$$\hat{b} = \frac{n\sum XY - \sum X \sum Y}{n\sum X^2 - (\sum X)^2} \qquad \text{or} \qquad \hat{b} = \frac{S_{XY}}{S_{XX}}$$
Fitted regression line: $\hat{Y} = a + bX_i$
where n = 10, ΣX = 591.05, ΣXY = 92,737.51, ΣY = 909.44,
ΣX² = 69,985.91, ΣY² = 130,187.40
$$b = \frac{10(92737.51) - (591.05)(909.44)}{10(69985.91) - (591.05)^2} = \frac{927375.10 - 537524.51}{699859.10 - 349340.10} = \frac{389850.59}{350519.00}$$
            b = 1.1122
b ≈ 1.11 to 2 decimal places
OR
$$S_{XY} = \sum X_iY_i - \frac{\sum X_i \sum Y_i}{n}$$
$$S_{XX} = \sum X_i^2 - \frac{(\sum X_i)^2}{n}, \qquad S_{YY} = \sum Y_i^2 - \frac{(\sum Y_i)^2}{n}$$
Recall that ΣXiYi = 92,737.51, ΣXi = 591.05, ΣYi = 909.44, ΣXi² = 69,985.91 and (ΣXi)² = 349,340.10.
SXY = 92,737.51 - (591.05 × 909.44)/10
            = 92,737.51 - 53,752.45
SXY = 38,985.06
SXX = 69,985.91 - 349,340.10/10
            = 69,985.91 - 34,934.01
SXX = 35,051.90
SYY = 130,187.40 - (909.44)²/10
            = 130,187.40 - 82,708.11
SYY = 47,479.29


$$\hat{b} = \frac{S_{XY}}{S_{XX}} = \frac{38985.06}{35051.90} = 1.1122$$
            b = 1.11 to 2 d.p.
To calculate the value of ‘a’:
$$\hat{Y} = a + bX_i \quad\Rightarrow\quad a = \bar{Y} - b\bar{X} = \frac{\sum Y_i}{n} - b\,\frac{\sum X_i}{n}$$
Recall that ΣYi = 909.44, ΣXi = 591.05, n = 10
            a = 909.44/10 - 1.11(591.05/10)
            = 90.94 - 1.11(59.11)
            = 90.94 - 65.61
            a = 25.33
Therefore, the regression line Ŷ = a + bX is Ŷ = 25.33 + 1.11X.
This indicates a positive relationship between the sales and advertising expenditure of Maltina.
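
The least squares computation can be checked with a short Python sketch that starts from the column totals of the table in Section 4.1.1. The variable names are illustrative only; as in the hand calculation, b is rounded to two decimal places before a is computed.

```python
n = 10
sum_x, sum_y = 591.05, 909.44            # ΣX and ΣY from the table
sum_xy, sum_x2 = 92_737.51, 69_985.91    # ΣXY and ΣX² from the table

sxy = sum_xy - sum_x * sum_y / n         # corrected sum of cross products
sxx = sum_x2 - sum_x ** 2 / n            # corrected sum of squares of X

b = sxy / sxx                            # slope estimate
b_rounded = round(b, 2)                  # 1.11, the value carried forward in the text
a = sum_y / n - b_rounded * sum_x / n    # intercept estimate
```

The slope comes out at about 1.1122. The intercept is about 25.34 here, which agrees with the 25.33 above up to the rounding of the intermediate means in the hand calculation.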
With the above equation we can then calculate the estimated value of Y (i.e. Ŷ) by substituting each value of X into the regression line Ŷ = 25.33 + 1.11X, as shown below.

YEAR     ADVERTISING EXPENDITURE (X) N (00,000)     SALES (Y) N (00,000)     Y ESTIMATED
2005                                       2.80                     5.28           28.44
2006                                       3.97                     5.68           29.74
2007                                       2.22                    10.33           27.79
2008                                       9.71                    47.58           36.11
2009                                      13.35                    74.14           40.15
2010                                      63.50                   119.44           95.82
2011                                      85.08                   139.44          119.77
2012                                     105.26                   144.31          142.17
2013                                     132.92                   152.01          172.87
2014                                     172.24                   211.23          216.52
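
The estimated values can be reproduced by substituting each coded X into the fitted line. This is a minimal sketch; the list name `fitted` is an illustrative choice, and values that fall exactly on a rounding boundary (such as the 2010 figure) may differ in the last digit from the hand calculation.

```python
# Coded advertising expenditure (X), in hundred thousands
x = [2.80, 3.97, 2.22, 9.71, 13.35, 63.50, 85.08, 105.26, 132.92, 172.24]

a, b = 25.33, 1.11  # intercept and slope of the fitted line

# Estimated sales for each year, rounded to 2 decimal places
fitted = [round(a + b * xi, 2) for xi in x]
```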

4.1.2   CONFIDENCE INTERVAL AND TEST OF HYPOTHESIS ON β

CONFIDENCE INTERVAL: The t-distribution can be used to construct a 100(1 - α)% confidence interval for the parameter β.
            A 100(1 - α)% C.I. for the parameter β in the regression line E(Yi) = α + βxi is based on the statistic
$$t = \frac{b - \beta_0}{S/\sqrt{S_{xx}}}$$
with 0.05 level of significance and n-2 degrees of freedom.
The confidence interval for β is given by:
$$\Pr\left(-t_{(1-\alpha/2),\,n-2} \le \frac{b - \beta_0}{S/\sqrt{S_{xx}}} \le t_{(1-\alpha/2),\,n-2}\right) = 1 - \alpha$$
$$-t_{(1-\alpha/2),\,n-2}\,\frac{S}{\sqrt{S_{xx}}} \le b - \beta_0 \le t_{(1-\alpha/2),\,n-2}\,\frac{S}{\sqrt{S_{xx}}}$$
Rearranging to isolate β0:
$$b - t_{(1-\alpha/2),\,n-2}\,\frac{S}{\sqrt{S_{xx}}} \le \beta_0 \le b + t_{(1-\alpha/2),\,n-2}\,\frac{S}{\sqrt{S_{xx}}}$$
where b = 1.11
            Sxx = 35,051.90
            α = 0.05
Degrees of freedom = n - 2 = 10 - 2 = 8
Therefore: t(1-α/2), n-2
            = t(1 - 0.05/2), 10 - 2
            = t(1 - 0.025), 8
            = t0.975, 8
            = 2.306
≈ 2.31
To calculate S:
$$S = \sqrt{\frac{SSE}{n-2}}, \qquad \text{where } SSE = S_{yy} - b^2 S_{xx}$$
Recall that:
Syy = 47,479.29
Sxx = 35,051.90
b = 1.11
$$S = \sqrt{\frac{47479.29 - (1.11)^2(35051.90)}{10 - 2}} = \sqrt{\frac{47479.29 - 43187.45}{8}} = \sqrt{\frac{4291.84}{8}} = \sqrt{536.48}$$
S = 23.16
So also:
            Sxx = 35,051.90
            √Sxx = √35,051.90
            = 187.22
We have:
$$1.11 - \frac{(2.306)(23.16)}{187.22} \le \beta_0 \le 1.11 + \frac{(2.306)(23.16)}{187.22}$$
= 1.11 - (2.306 × 0.1237) ≤ β0 ≤ 1.11 + (2.306 × 0.1237)
= 1.11 - 0.2853 ≤ β0 ≤ 1.11 + 0.2853
= 0.8247 ≤ β0 ≤ 1.3953
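
The interval can be verified numerically with the sketch below. The tabulated value 2.306 is taken from the text rather than computed, and the variable names are illustrative.

```python
import math

n = 10
b = 1.11                           # slope estimate (2 d.p.)
sxx, syy = 35_051.90, 47_479.29    # corrected sums of squares
t_crit = 2.306                     # tabulated t(0.975, 8), as quoted in the text

sse = syy - b ** 2 * sxx           # error sum of squares
s = math.sqrt(sse / (n - 2))       # residual standard deviation S

half_width = t_crit * s / math.sqrt(sxx)
lower, upper = b - half_width, b + half_width
```

With these inputs S is about 23.16 and the interval runs from about 0.82 to about 1.40, in line with the hand calculation.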
4.1.3   TEST OF HYPOTHESIS ON β
H0: β0 = 0 (there is no significant linear relationship between sales and advertising expenditure)
H1: β0 ≠ 0 (there is a significant linear relationship between sales and advertising expenditure)
We use the t-distribution with n-2 degrees of freedom to establish the critical region and base our decision on the statistic
$$t = \frac{b - \beta_0}{S/\sqrt{S_{xx}}}$$
computed with n-2 d.f. and 0.05 level of significance.
We know that:
            b = 1.11,              β0 = 0
            S = 23.16
            √Sxx = 187.22
$$t = \frac{1.11 - 0}{23.16/187.22} = \frac{1.11}{0.1237}$$
tcal = 8.97
To find the tabulated t:
ttab = t(1-α/2), n-2
α = 0.05
n = 10
= t(1 - 0.05/2), 10-2
= t(1 - 0.025), 8
= t(0.975), 8
ttab = 2.306 ≈ 2.31
CRITICAL REGION: Reject H0 and accept H1 if tcal is greater than ttab (tcal > ttab); if tcal < ttab, H0 is accepted and H1 is rejected.
CONCLUSION: Since tcal is greater than ttab, i.e. (8.97 > 2.306), we reject H0 and accept H1. This indicates that there is a linear relationship between sales and advertisement.
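
The test statistic and the decision rule can be expressed as a short sketch. The inputs are the rounded values quoted above, and `reject_h0` is a hypothetical name for the outcome of the decision rule.

```python
import math

b, beta0 = 1.11, 0.0         # slope estimate and hypothesised value
s, sxx = 23.16, 35_051.90    # residual standard deviation and Sxx
t_tab = 2.306                # tabulated t(0.975, 8), as quoted in the text

t_cal = (b - beta0) / (s / math.sqrt(sxx))
reject_h0 = abs(t_cal) > t_tab   # two-tailed decision rule
```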

4.1.4   CONFIDENCE INTERVAL AND TEST OF HYPOTHESIS FOR α
CONFIDENCE INTERVAL: The t-statistic can also be used in the construction of a 100(1 - α)% confidence interval for the parameter α, the intercept of the regression line.
            A 100(1 - α)% confidence interval for the parameter α is based on the statistic
$$t = \frac{a - \alpha_0}{SE(a)}$$
where SE(a) denotes the standard error of a, with 0.05 level of significance and n-2 degrees of freedom.
The confidence interval for α is given by:
$$\Pr\left(-t_{(1-\alpha/2),\,n-2} \le \frac{a - \alpha_0}{SE(a)} \le t_{(1-\alpha/2),\,n-2}\right) = 1 - \alpha$$
Rearranging to isolate α0:
$$a - t_{(1-\alpha/2),\,n-2}\,SE(a) \le \alpha_0 \le a + t_{(1-\alpha/2),\,n-2}\,SE(a)$$
where a = 25.33
t(1-α/2), n-2 = t(1 - 0.05/2), 10-2
            = t(1 - 0.025), 8
            = t(0.975), 8
t(1-α/2), n-2 = 2.306
S = 23.16
n = 10
Sxx = 35,051.90
ΣX² = 69,985.91
Therefore:
$$\sqrt{\frac{\sum X^2}{n\,S_{xx}}} = \sqrt{\frac{69985.91}{10 \times 35051.90}} = \sqrt{\frac{69985.91}{350519.00}} = \sqrt{0.1997} = 0.4469$$
            SE(a) = 23.16/0.4469
            = 51.82
To compute the confidence interval:
25.33 - (2.306 × 51.82) ≤ α0 ≤ 25.33 + (2.306 × 51.82)
25.33 - 119.50 ≤ α0 ≤ 25.33 + 119.50
            -94.17 ≤ α0 ≤ 144.83

TEST OF HYPOTHESIS FOR α
H0: α = 0 (the intercept of the regression of sales (Y) on advertisement (X) is zero)
H1: α ≠ 0 (the intercept of the regression of sales (Y) on advertisement (X) is not zero)
The test statistic for α is given by:
$$t = \frac{a - \alpha_0}{SE(a)}$$
with n-2 degrees of freedom and 0.05 level of significance, where SE(a) denotes the standard error of a.
Recall that:
a = 25.33
α0 = 0
S = 23.16
ΣX² = 69,985.91
Sxx = 35,051.90
n = 10
and SE(a) = 51.82
To compute t-calculated:
$$t_{cal} = \frac{a - \alpha_0}{SE(a)} = \frac{25.33 - 0}{51.82} = \frac{25.33}{51.82}$$
tcal = 0.4888
To compute t-tabulated:
ttab = t(1-α/2), n-2
            = t(1 - 0.05/2), 10-2
            = t(1 - 0.025), 8
            = t0.975, 8
ttab = 2.306
CRITICAL REGION: Reject H0 and accept H1 if tcal is greater than ttab; otherwise accept H0 and reject H1.
CONCLUSION: Since tcal is less than ttab, i.e. (0.4888 < 2.306), H0 is accepted, and we conclude that α = 0.
This indicates that the intercept is not significantly different from zero at the 0.05 level of significance.




4.2       CORRELATION ANALYSIS USING KARL PEARSON’S PRODUCT MOMENT CORRELATION COEFFICIENT
            Karl Pearson’s product moment correlation coefficient is computed from the following table:
YEAR     X (00,000)   Y (00,000)          XY           X²           Y²
2005           2.80         5.28       14.78         7.84        27.88
2006           3.97         5.68       22.55        15.76        32.26
2007           2.22        10.33       22.93         4.93       106.71
2008           9.71        47.58      462.00        94.28     2,263.86
2009          13.35        74.14      989.77       178.22     5,496.74
2010          63.50       119.44    7,584.44     4,032.25    14,265.91
2011          85.08       139.44   11,863.56     7,238.61    19,443.51
2012         105.26       144.31   15,190.07    11,079.67    20,825.38
2013         132.92       152.01   20,205.16    17,667.73    23,107.04
2014         172.24       211.23   36,382.25    29,666.62    44,618.11
TOTAL        591.05       909.44   92,737.51    69,985.91   130,187.40
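Mathematically, the correlation coefficient is computed as:
$$r = \frac{n\sum XY - \sum X \sum Y}{\sqrt{\left[n\sum X^2 - (\sum X)^2\right]\left[n\sum Y^2 - (\sum Y)^2\right]}}$$
OR
$$r = \frac{S_{xy}}{\sqrt{S_{xx}\,S_{yy}}}$$
From the table above, we have the following results:
ΣXi = 591.05
ΣYi = 909.44
ΣXY = 92,737.51
ΣX² = 69,985.91
ΣY² = 130,187.40
To compute the value of r:
$$r = \frac{10(92737.51) - (591.05)(909.44)}{\sqrt{\left[10(69985.91) - (591.05)^2\right]\left[10(130187.40) - (909.44)^2\right]}} = \frac{389850.59}{\sqrt{(350519.00)(474792.89)}} = \frac{389850.59}{407950.89}$$
r = 0.9556
or, using
Sxy = 38,985.06
Sxx = 35,051.90
Syy = 47,479.29
$$r = \frac{38985.06}{\sqrt{35051.90}\,\sqrt{47479.29}} = \frac{38985.06}{187.22 \times 217.89} = \frac{38985.06}{40793.36}$$
            = 0.9556
Thus, using Karl Pearson’s product moment formula on the data for variables X and Y gives r = 0.9556, which indicates that the two variables are highly correlated. This signifies a strong positive linear association between the variables.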
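
As a cross-check, the first form of the formula can be evaluated directly from the column totals. This is a minimal sketch with illustrative variable names.

```python
import math

n = 10
sum_x, sum_y = 591.05, 909.44
sum_xy, sum_x2, sum_y2 = 92_737.51, 69_985.91, 130_187.40

numerator = n * sum_xy - sum_x * sum_y
denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))

r = numerator / denominator  # Pearson's product moment correlation coefficient
```

To four decimal places this reproduces r = 0.9556.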

4.2.1   TEST OF HYPOTHESIS FOR r USING t-DISTRIBUTION
In order to know whether r = 0.9556 is significant, the hypotheses are stated thus:
H0: ρ = 0 (there is no relationship)
H1: ρ ≠ 0 (there is a relationship)
Test at 0.05 level of significance.
CRITICAL REGION: Reject H0 if t-calculated is greater than t-tabulated, and accept H0 otherwise.
            From the table, t-tabulated is t(1-α/2), n-2 (since it is a two-tailed test) at n-2 degrees of freedom.
t(1-α/2), n-2 = t(1 - 0.05/2), 10-2
            = t0.975, 8
ttab = 2.306
Also, t-calculated is obtained as shown below:
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}}$$
where r = 0.9556
            n = 10
$$t = \frac{0.9556\sqrt{10-2}}{\sqrt{1-(0.9556)^2}} = \frac{0.9556 \times 2.8284}{\sqrt{0.0868}} = \frac{2.7028}{0.2946}$$
tcal = 9.1745               ≈ 9.17
DECISION RULE: Reject H0, since t-calculated is greater than t-tabulated, i.e. (9.1745 > 2.306).
CONCLUSION: Since tcal is greater than ttab, i.e. (9.1745 > 2.306), we reject H0 and accept H1. This means that there is a relationship between the two variables X and Y; hence r = 0.9556 is significant.
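
The same statistic can be computed without intermediate rounding, as in the sketch below. The tabulated value is taken from the text, and `reject_h0` is an illustrative name.

```python
import math

r, n = 0.9556, 10
t_tab = 2.306  # tabulated t(0.975, 8), as quoted in the text

t_cal = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
reject_h0 = t_cal > t_tab
```

Carried at full precision the statistic is about 9.17; the later decimal places differ slightly from the hand calculation because of the rounding of intermediate values there, and the decision is unchanged.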

4.2.2   COEFFICIENT OF DETERMINATION
            The coefficient of determination is the square of the correlation coefficient:
            r = 0.9556
            r² = (0.9556)²
            = 0.9132
            ≈ 0.91
INTERPRETATION: This shows that about 91% of the variation in the value of Y can be explained by changes in the value of X, leaving about 9% of the variation in Y to be explained by other factors.

4.2.3   THE SPEARMAN’S RANK CORRELATION CO-EFFICIENT
            This is another approach to computing the correlation coefficient, and it is given as:
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where Σdi² is the sum of the squared differences between rank Xi and rank Yi, and n is the number of observations in the sample. The ranks and their differences are computed in the table below.
YEAR        Xi         Y     RX     RY     d = (RX - RY)     d²
2005      2.80      5.28      2      1                 1      1
2006      3.97      5.68      3      2                 1      1
2007      2.22     10.33      1      3                -2      4
2008      9.71     47.58      4      4                 0      0
2009     13.35     74.14      5      5                 0      0
2010     63.50    119.44      6      6                 0      0
2011     85.08    139.44      7      7                 0      0
2012    105.26    144.31      8      8                 0      0
2013    132.92    152.01      9      9                 0      0
2014    172.24    211.23     10     10                 0      0
                                                        Σd² = 6

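Using the formula:
$$r = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$$
where
Σdi² = 6
n = 10
$$r = 1 - \frac{6(6)}{10(10^2 - 1)} = 1 - \frac{36}{10(99)} = 1 - \frac{36}{990}$$
r = 1 - 0.0364
            = 0.9636
            ≈ 0.96
INTERPRETATION: This shows that sales and advertising expenditure are highly and positively correlated.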
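
The rank computation can be reproduced with the ranks RX and RY from the table. This is a small sketch; the names `rank_x`, `rank_y` and `r_s` are illustrative.

```python
# Ranks of advertising expenditure (RX) and sales (RY), 2005 to 2014
rank_x = [2, 3, 1, 4, 5, 6, 7, 8, 9, 10]
rank_y = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

n = len(rank_x)
d2 = sum((rx - ry) ** 2 for rx, ry in zip(rank_x, rank_y))  # sum of squared rank differences

r_s = 1 - 6 * d2 / (n * (n ** 2 - 1))  # Spearman's rank correlation coefficient
```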
            From this computational analysis, we can deduce that both formulae confirm the strong degree of relationship between the sales and advertising expenditure of Maltina, as shown by the two coefficients (0.9556 and 0.9636), which are close to each other.
The rank correlation method is not directly applicable in this field of study, because it can only be used when the data are ranked according to their magnitude.

4.3       ANALYSIS OF VARIANCE FOR TESTING SIGNIFICANCE OF REGRESSION (ANOVA)
This is a statistical method used to compare more than two sample means, or a number of pertinent variables, in the same experiment. Estimation of σ² shows that Syy = bSxy + SSE. Partitioning the sum of squares, we have:
Syy = SSR + SSE
where TSS = Syy = total sum of squares
SSR = regression sum of squares
SSE = error (residual) sum of squares
            We know that SSR and SSE are values of independent chi-square variables with 1 and n-2 degrees of freedom, and that TSS is a value of a chi-square variable with n-1 degrees of freedom.
            The hypothesis of interest is H1, which says that there is a linear relationship between sales and advertising expenditure.
H0: β = 0
H1: β ≠ 0
At 0.05 level of significance, Ftab = F0.05, 1, 8 = 5.32
CRITICAL REGION: We reject H0 at 0.05 level of significance when Fcal > Ftab, i.e. when Fcal > F0.05, 1, n-2.
In practice we compute:
TSS = Syy
SSR = bSxy
SSE = TSS - SSR
$$F = \frac{SSR/1}{SSE/(n-2)}$$
Recall that:
Syy = 47,479.29
bSxy = (1.11)(38,985.06)
            = 43,273.42
SSE = TSS - SSR
            = 47,479.29 - 43,273.42
SSE = 4,205.87

4.3.1   ANALYSIS OF VARIANCE TABLE (ANOVA TABLE)
SOURCE OF VARIATION     DEGREE OF FREEDOM     SUM OF SQUARES     MEAN SQUARE     F-RATIO
REGRESSION              1                          43,273.42       43,273.42     82.3111
ERROR                   8                           4,205.87          525.73     -
TOTAL                   9                          47,479.29               -     -

DECISION RULE: We reject H0 and accept H1, since Fcal, which is 82.3111, is greater than Ftab, i.e. 5.32. This means that there is a linear relationship between sales and advertising expenditure. Therefore, the regression equation is significant at the 0.05 level of significance.
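
The entries of the ANOVA table can be recomputed from Sxy, Syy and the rounded slope, as in the following sketch. The variable names are illustrative, and the tabulated F value is taken from the text.

```python
n = 10
b = 1.11                           # slope estimate (2 d.p.)
sxy, syy = 38_985.06, 47_479.29    # Sxy and Syy from Section 4.1.1
f_tab = 5.32                       # tabulated F(0.05; 1, 8), as quoted in the text

ssr = b * sxy           # regression sum of squares
sse = syy - ssr         # error sum of squares
msr = ssr / 1           # mean square for regression (1 d.f.)
mse = sse / (n - 2)     # mean square for error (n - 2 d.f.)

f_cal = msr / mse
reject_h0 = f_cal > f_tab
```

The F-ratio comes out at about 82.3, matching the table up to the rounding of the mean squares.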


CHAPTER FIVE

5.0       CONCLUSION

5.1       SUMMARY OF ANALYSIS

ESTIMATE                              FORMULA USED                                          RESULT
1.  Regression of sales on            b = Sxy/Sxx                                           b = 1.11
    advertising expenditure           a = ΣYi/n - b(ΣXi/n)                                  a = 25.33
                                      Ŷ = a + bX                                            Ŷ = 25.33 + 1.11Xi

2.  Karl Pearson’s product            r = [nΣXY - ΣXΣY] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}   r = 0.9556
    moment correlation coefficient    or r = Sxy/√(Sxx·Syy)

3.  Spearman’s rank                   r = 1 - 6Σd²/[n(n² - 1)]                              r = 0.9636
    correlation coefficient

4.  Coefficient of determination      r² = (r)²                                             r² = 0.9132

5a. Test of β (beta)                  H0: β = 0                                             tcal = 8.97
                                      H1: β ≠ 0                                             ttab = 2.306
                                      t = (b - β0)/(S/√Sxx)
                                      Reject H0 if tcal > ttab and accept otherwise         Reject H0 and conclude that β ≠ 0

b.  Test of α (alpha)                 H0: α = 0                                             tcal = 0.4888
                                      H1: α ≠ 0                                             ttab = 2.306
                                      t = (a - α0)/SE(a)
                                      Reject H0 if tcal > ttab and accept otherwise         Accept H0 and conclude that α = 0

c.  Test of r                         H0: ρ = 0                                             tcal = 9.1745
                                      H1: ρ ≠ 0                                             ttab = 2.306
                                      t = r√(n-2)/√(1 - r²)
                                      Reject H0 if tcal > ttab and accept otherwise         Reject H0; r = 0.9556 is significant

5.2       FINDINGS

This project has shown the relationship that exists between the sales and advertising expenditure of International Breweries Plc, Ilesa, in Osun State over the 10-year period 1993-2002.
From the analysis in the previous chapters, and in relation to the aims and objectives of this project, we can conclude that there is a positive linear relationship between sales and advertising expenditure over the period under review; that is, there is dependency between the two variables X and Y.
When the hypothesis on the regression coefficient was tested using various approaches, we rejected the null hypothesis that the regression equation has no significant effect on sales and advertising expenditure, and accepted the alternative hypothesis that the regression equation has a significant effect on both. The conclusion, therefore, is that there exists a positive linear relationship between sales and advertising expenditure.
The regression equation Y = 1.11x + 25.33 was estimated using the least squares method, which gives the line of best fit for the linear relationship.
With this equation, future values can be predicted in order to verify whether the economy remains stable.
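
As an illustration of how the fitted equation can be used for prediction, here is a minimal Python sketch. It assumes that x is advertising expenditure and Y is sales, in the same units as the original data; the input values are hypothetical and are not taken from the project's data.

```python
def predict_sales(advertising):
    """Predicted Y from the fitted line Y = 1.11x + 25.33."""
    slope = 1.11       # estimated regression coefficient
    intercept = 25.33  # estimated constant term
    return slope * advertising + intercept

# Hypothetical advertising expenditure figures (illustrative only)
for x in (10.0, 20.0, 30.0):
    print(f"x = {x:5.1f} -> predicted Y = {predict_sales(x):.2f}")
```

Each extra unit of x raises the predicted Y by 1.11 units. Predictions for x values far outside the range observed in 1993-2002 should be treated with caution, since the line was fitted only to that range.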
We also calculated the correlation coefficient (r) using Karl Pearson’s product moment method. This gave 0.9556, which indicates a strong positive relationship between the variables X and Y. When the rank correlation method was used, the result was 0.9636, and when the coefficient of determination was used for explained and unexplained variation, it gave 0.9775; that is, 97.75% of the total variation could be explained, leaving a smaller part of 2.25% unexplained.
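
The two correlation measures named above can be sketched in a few lines of Python. This is a hedged illustration rather than the project's own computation: the data below are hypothetical, the function names are made up for the example, and the rank formula 1 - 6Σd²/(n(n² - 1)) is used under the assumption that there are no tied values.

```python
import math

def pearson_r(x, y):
    """Karl Pearson's product moment correlation coefficient."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank from smallest (1) to largest (n); assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(v) + 1 for v in values]

def spearman_r(x, y):
    """Rank correlation: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))

# Hypothetical figures for illustration only (not the project's data)
advertising = [2, 4, 5, 7, 9]
sales = [28, 29, 31, 33, 35]

print(f"Pearson r  = {pearson_r(advertising, sales):.4f}")
print(f"Spearman r = {spearman_r(advertising, sales):.4f}")
```

Both coefficients lie between -1 and +1, and values close to +1, such as the 0.9556 and 0.9636 reported here, indicate a strong positive relationship.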
Lastly, the execution of this project exposed the writers to the challenges of a career in statistics. It also reflects the stress involved in data collection and analysis. The need for accuracy and unbiased interpretation of figures cannot be overemphasized. Therefore, “statistics” is not only a theoretical course where formulae are the order of the day, but a body of methods and theory applied to numerical evidence in making decisions in the face of uncertainty.

5.3       RECOMMENDATIONS

            The main objective of International Breweries Plc, Ilesa is to generate revenue. The data used in this project show that the variables are positively and linearly correlated, i.e. sales increase as advertising expenditure increases. Sales and advertising expenditure for later years (i.e. 2005-2014) can even be predicted, and this shows that an increase in advertisement leads to an increase in sales.
            Based on the analysis carried out on the original (raw) data used in this project, and having examined the values obtained from the computations, the following recommendations are made:
i.              The day-to-day activities of the organisation must be monitored closely, with attention to the workers’ welfare; that is, the workers’ standard of living should be improved as a way of motivating them to put in their best in their respective work.
ii.            The organisation should adopt budgetary policies that will reflect the true relationship between advertising expenditure and sales.
iii.           Since an increase in advertising expenditure leads to a corresponding increase in sales, viable and profit-oriented economic projects should be embarked upon to generate more revenue for the organisation. For example: organizing Maltina Night at different areas, raffle draw bonanza, etc.
iv.           Highly competent and well-trained personnel should also be employed in order to achieve higher productivity, which in turn yields more profit for the company at large. Moreover, labour should be employed only in the areas where it is needed, in order to reduce the high rate of wastage.
v.            The company’s statistical unit should be better equipped, because it is indispensable and is saddled with the responsibility of collecting, collating, organizing, summarizing, analysing, presenting and interpreting data pertaining to the activities of the company. This will ensure the availability of accurate data in all other areas of the company, and will allow the statistical unit to keep confirming that sales and advertising expenditure are positively correlated, so as to increase and maintain maximum turnover, which can lead to huge profit over a specified period of time.






