A Regression Model Based on Uncertain Set

Traditional regression analysis is a method of statistical data analysis based on probability theory. Regression models play crucial roles in various branches of statistics, including design of experiments and econometrics. In regression models, the dependent variable is assumed to be stochastic, with randomness entering via the errors, while the independent variables are assumed to be deterministic. The regression coefficients, which explain the interdependency between the variables, are assumed to be crisp quantities. Whenever the values taken by the dependent variable are difficult to express as crisp quantities, traditional regression models become inapplicable. This paper provides a framework for dealing with such situations using uncertain sets of various forms. A solution obtained via linear programming is introduced along with an illustrative example.


Introduction
Regression analysis is one of the major branches of statistical theory. It aims at analyzing the extent of interdependency between variables by making use of observed data. Models developed from experimental data are then used for several tasks, including forecasting and decision making.
In a regression model, some variables are called independent variables and others are called dependent variables. The dependent variables change along with the independent variables. In the traditional regression model, the dependent variable is regarded as a random variable because of the random disturbance term. Researchers have made significant contributions to the theory of linear models as well as to their application in various fields. Regression analysis assumes that the experimental data and the predicted values of the dependent variables are crisp. Moreover, applying the model requires two conditions: each independent repeated experiment is conducted under the same conditions, and the number of experiments is large enough. However, in many practical problems these conditions cannot be met, and many systems also exhibit nondeterminacy, so that the input or output contains linguistic data. Under these circumstances, researchers began to use uncertain data, such as fuzzy data, to describe these phenomena.
The development of fuzzy set theory (Zadeh, 1965) created alternative directions for developing models that are similar to regression models but use imprecise or vague data. In the literature, such models are referred to as fuzzy regression models. The notion of fuzzy regression analysis was initiated by Tanaka et al. (1982), who established the linear model of fuzzy regression analysis. In a fuzzy regression model, a fuzzy functional relationship is given between the independent variables and the dependent variables. The input data and unknown coefficients may be crisp or fuzzy, and the predicted values of the dependent variables are fuzzy. Fuzzy linear regression models have received the attention of several researchers during the past three decades. Once the functional relationship is determined, the main task is to estimate the parameters. For different types of fuzzy numbers, Diamond (1987, 1988, 1997) and Korner (1998) used the least squares approach to estimate the parameters of the model. Su, Wang and Wang (2013) investigated parametric regression analysis, both linear and nonlinear, of imprecise data by using the fuzzy evidential EM algorithm. Based on the possibilistic approach, Černý et al. (2013) studied linear regression models for uncertain, indeterminate or interval data, taking into account the loss of information. Based on fuzzy neural networks, Ishibuchi et al. (2001) analyzed the fuzzy regression model by introducing asymmetric fuzzy coefficients. Muller et al. (2014) introduced an algorithm for the optimized identification and estimation of relevant parameters. Danesh et al. (2016) introduced the adaptive neuro-fuzzy inference system (ANFIS) for fuzzy nonparametric regression function prediction, where the input is crisp and the output is fuzzy. Chen et al. (2016) developed an approach to optimize the value of the membership degree h in fuzzy linear regression whose coefficients are asymmetric triangular fuzzy data. In addition, Sclove (2014) improved the method of estimating the coefficients in an orthogonal linear regression model by a point estimation that is more efficient than the ordinary one.
Linear regression models are applicable only when the distributional assumptions made in their construction via probability theory hold. They become irrelevant when randomness is replaced by impreciseness in the system being studied. Researchers working on fuzzy sets have developed alternative approaches to handle such situations. However, the lack of the laws of excluded middle and contradiction makes practitioners hesitant to accept conclusions drawn from such fuzzy models. To overcome this problem, Liu (2007) initiated a new branch of study, namely uncertainty theory, and refined it in 2010.
Uncertainty theory was established by analogy with measure theory and has developed into a branch of axiomatic mathematics. Liu (2010) introduced uncertain statistics, a methodology for collecting and interpreting experts' experimental data via uncertainty theory when no samples are obtainable. Estimating the parameters of uncertainty distributions with known functional form is one of several research problems tackled in uncertain statistics. To address this problem, Liu (2007) introduced the concept of uncertainty distribution. Wang et al. (2012) considered the Delphi method for estimating an uncertainty distribution based on data from multiple domain experts. A method based on the principle of least squares was suggested by Liu (2007) for estimating the parameters of an uncertainty distribution. Later, the method of moments was suggested by Wang and Peng (2014). Besides, a statistical method called uncertain hypothesis testing was suggested by Wang et al. (2012) to test whether two uncertainty distributions are equal.
To describe the relationship between variables involved in uncertain phenomena, Liu (2007) proposed an uncertain regression model based on uncertain variables. In the proposed model, uncertainty comes from the error component, which is treated as an uncertain variable. The model is assumed to receive crisp data as input and to produce output in terms of uncertain variables. Furthermore, an uncertain linear regression model was studied by Guo et al. (2017) and applied to predict China's GDP. Apart from these, for situations in which the data gathered from experts' knowledge are imprecise, Guo et al. (2011) proposed an uncertain regression model with an intrinsic error structure driven by an uncertain canonical process. In the above-mentioned uncertain regression models, the observed values of the variables are real numbers. In practice, however, the observations may carry multiple memberships, and the sample size may be small. Therefore, we turn our attention to the uncertain set, which can be applied to describe such uncertain phenomena. It is pertinent to note the difference between fuzzy and uncertain sets. While fuzzy sets use the concept of possibility measure, uncertain sets use the uncertain measure. Uncertain sets also accommodate several properties, including the independence of uncertain sets. Research contributions related to uncertain sets are reviewed in the forthcoming section.
The main objective of the paper is to introduce a regression model based on uncertain sets. In addition, the present work considers and evaluates methods for estimating the parameters of the proposed model. Uncertain linear regression models based on symmetrical triangular, symmetrical trapezoidal and normal uncertain sets are considered in turn. A procedure based on linear programming is suggested for estimating the parameters of the uncertain linear models, and appropriate performance evaluation criteria are also considered. The paper is organized as follows. The second section gives a brief introduction to uncertainty theory and uncertain statistics. The uncertain linear regression model using uncertain sets and the problem of estimating its parameters are considered in the third section. An illustrative example is given in the fourth section, and conclusions drawn from this study are given in the fifth section.

Preliminaries
In this section, we will introduce some fundamental definitions and theorems in uncertainty theory and uncertain statistics.

Uncertainty Theory
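Definition 1. (Liu, 2007) Let $\Gamma$ be a nonempty set, and let $\mathcal{L}$ be a $\sigma$-algebra over $\Gamma$. Each element $\Lambda \in \mathcal{L}$ is called an event. A number $\mathcal{M}\{\Lambda\}$ denotes the level that $\Lambda$ will occur. Then $\mathcal{M}$ is called an uncertain measure if it satisfies the following axioms:
Axiom 1: (Normality Axiom) $\mathcal{M}\{\Gamma\} = 1$ for the universal set $\Gamma$.
Axiom 2: (Duality Axiom) $\mathcal{M}\{\Lambda\} + \mathcal{M}\{\Lambda^{c}\} = 1$ for any event $\Lambda$.
Axiom 3: (Subadditivity Axiom) For every countable sequence of events $\{\Lambda_i\}$, we have
$$\mathcal{M}\Big\{\bigcup_{i=1}^{\infty} \Lambda_i\Big\} \le \sum_{i=1}^{\infty} \mathcal{M}\{\Lambda_i\}.$$
The triplet $(\Gamma, \mathcal{L}, \mathcal{M})$ is called an uncertainty space.
Besides, the product uncertain measure on the product $\sigma$-algebra $\mathcal{L}$ was defined by Liu (2009) as follows.
Axiom 4: (Product Axiom) Let $(\Gamma_k, \mathcal{L}_k, \mathcal{M}_k)$ be uncertainty spaces for $k = 1, 2, \cdots$. The product uncertain measure $\mathcal{M}$ is an uncertain measure satisfying
$$\mathcal{M}\Big\{\prod_{k=1}^{\infty} \Lambda_k\Big\} = \bigwedge_{k=1}^{\infty} \mathcal{M}_k\{\Lambda_k\},$$
where $\Lambda_k$ are arbitrarily chosen events from $\mathcal{L}_k$ for $k = 1, 2, \cdots$, respectively.
The concept of uncertain variable $\xi$ was introduced by Liu as a measurable function from an uncertainty space $(\Gamma, \mathcal{L}, \mathcal{M})$ to the set of real numbers. In order to describe an uncertain variable, the uncertainty distribution was defined.
Definition 2. (Liu, 2007) An uncertain variable is a measurable function $\xi$ from an uncertainty space $(\Gamma, \mathcal{L}, \mathcal{M})$ to the set of real numbers, i.e., for any Borel set $B$ of real numbers, the set $\{\xi \in B\} = \{\gamma \in \Gamma \mid \xi(\gamma) \in B\}$ is an event.
Definition 3. (Liu, 2007) The uncertainty distribution $\Phi$ of an uncertain variable $\xi$ is defined by $\Phi(x) = \mathcal{M}\{\xi \le x\}$ for any real number $x$.
Definition 4. (Liu, 2010) Let $\xi$ be an uncertain variable with a regular uncertainty distribution $\Phi(x)$. Then the inverse function $\Phi^{-1}(\alpha)$ is called the inverse uncertainty distribution of $\xi$.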
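As a concrete illustration of Definitions 3 and 4, the following minimal sketch uses a linear uncertainty distribution on an interval [a, b]. This is a standard textbook example from uncertainty theory and is not used elsewhere in this paper; the function names are hypothetical and chosen only for illustration.

```python
def linear_distribution(x: float, a: float, b: float) -> float:
    """Phi(x) = M{xi <= x} for a linear uncertain variable on [a, b] (assumes a < b)."""
    if x <= a:
        return 0.0
    if x >= b:
        return 1.0
    return (x - a) / (b - a)


def linear_inverse_distribution(alpha: float, a: float, b: float) -> float:
    """Phi^{-1}(alpha) = (1 - alpha) * a + alpha * b, defined for 0 < alpha < 1."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie strictly between 0 and 1")
    return (1.0 - alpha) * a + alpha * b


# Round trip on (a, b): the inverse distribution undoes the distribution.
x = linear_inverse_distribution(0.25, 2.0, 6.0)
alpha = linear_distribution(x, 2.0, 6.0)
```

The distribution is regular because it is continuous and strictly increasing wherever its value lies strictly between 0 and 1, which is what makes the inverse well defined.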

Uncertain Set
In order to model concepts whose boundaries are not sharp because of the ambiguity of human language, the uncertain set was proposed by Liu in 2010.
Definition 5. (Liu, 2010) An uncertain set is a function $\xi$ from an uncertainty space $(\Gamma, \mathcal{L}, \mathcal{M})$ to a collection of sets of real numbers such that both $\{B \subset \xi\}$ and $\{\xi \subset B\}$ are events for any Borel set $B$ of real numbers.
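Definition 6. (Liu, 2012) An uncertain set $\xi$ is said to have a membership function $\mu$ if, for any Borel set $B$ of real numbers, we have
$$\mathcal{M}\{B \subset \xi\} = \inf_{x \in B} \mu(x), \qquad \mathcal{M}\{\xi \subset B\} = 1 - \sup_{x \in B^{c}} \mu(x).$$
Definition 7. (Liu, 2007) Let $\xi_1, \xi_2, \cdots, \xi_n$ be uncertain sets on the uncertainty space $(\Gamma, \mathcal{L}, \mathcal{M})$, and let $f$ be a measurable function. Then $\xi = f(\xi_1, \xi_2, \cdots, \xi_n)$ is an uncertain set defined by
$$\xi(\gamma) = f(\xi_1(\gamma), \xi_2(\gamma), \cdots, \xi_n(\gamma))$$
for any $\gamma \in \Gamma$.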
Definition 12. (Liu, 2012) Let $\xi$ be an uncertain set with membership function $\mu$. Then the set-valued function
$$\mu^{-1}(\alpha) = \{x \in \mathbb{R} \mid \mu(x) \ge \alpha\}, \quad \alpha \in [0, 1],$$
is called the inverse membership function of $\xi$.
Theorem 1. ([16]) Let $\xi_1, \xi_2, \cdots, \xi_n$ be independent uncertain sets with inverse membership functions $\mu_1^{-1}(\alpha), \mu_2^{-1}(\alpha), \cdots, \mu_n^{-1}(\alpha)$, respectively, and let $f$ be a measurable function. Then $\xi = f(\xi_1, \xi_2, \cdots, \xi_n)$ has an inverse membership function
$$\lambda^{-1}(\alpha) = f(\mu_1^{-1}(\alpha), \mu_2^{-1}(\alpha), \cdots, \mu_n^{-1}(\alpha)).$$
Theorem 2. (Liu, 2007) Let $\xi$ be an uncertain set with inverse membership function $\mu^{-1}(\alpha)$. Then the membership function of $\xi$ is determined by
$$\mu(x) = \sup\{\alpha \in [0, 1] \mid x \in \mu^{-1}(\alpha)\}.$$
To measure the uncertainty of an uncertain set, the entropy for uncertain sets was introduced by Liu (2011). Similarly, Peng and Li (2013) defined the radical entropy for uncertain sets, discussed several of its important properties, and proposed a way to calculate it. Lu and Wang (2013) defined the triangular entropy for uncertain sets, discussed its properties, and studied its computational formula. Wang and Ha (2013) defined the quadratic entropy for uncertain sets and investigated the relationship between quadratic entropy and Liu's entropy. Besides, quadratic cross entropy was introduced to measure the difference between two uncertain sets. In addition, Yao (2014) gave the definition of sine entropy and applied it to portfolio selection and clustering. Uncertain sets have thus been applied widely.
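The link between Definition 12 and Theorem 2 can be checked numerically. The sketch below assumes a triangular uncertain set (a, b, c) with a < b < c, computes its inverse membership function as an interval, and then recovers the membership degree as the largest level whose interval still contains x. The helper names and the grid search over levels are illustrative choices, not part of the paper.

```python
from typing import Tuple

Triangular = Tuple[float, float, float]  # (a, b, c) with a < b < c


def membership(x: float, xi: Triangular) -> float:
    """Membership function of a triangular uncertain set."""
    a, b, c = xi
    if a <= x <= b:
        return (x - a) / (b - a)
    if b < x <= c:
        return (c - x) / (c - b)
    return 0.0


def inverse_membership(alpha: float, xi: Triangular) -> Tuple[float, float]:
    """mu^{-1}(alpha) = {x : mu(x) >= alpha}, an interval for 0 < alpha <= 1."""
    a, b, c = xi
    return (a + alpha * (b - a), c - alpha * (c - b))


def membership_from_inverse(x: float, xi: Triangular, steps: int = 1000) -> float:
    """Approximates sup{alpha : x in mu^{-1}(alpha)} on a grid of levels."""
    best = 0.0
    for k in range(1, steps + 1):
        alpha = k / steps
        lo, hi = inverse_membership(alpha, xi)
        if lo <= x <= hi:
            best = alpha  # level sets are nested, so the last hit is the largest
    return best


xi = (1.0, 2.0, 4.0)
direct = membership(3.0, xi)
recovered = membership_from_inverse(3.0, xi)
```

For this set the two values agree up to the grid resolution, which is the content of Theorem 2 in the triangular case.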
Theorem 3. (Zhao, 2008) A nonlinear programming problem of the special kind (1), in which the objective function contains absolute value symbols, can be transformed into a linear programming problem (2) by introducing two vectors of $n$ auxiliary variables each.
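A common way to carry out a transformation of this kind is variable splitting: each free quantity x_k is written as u_k - v_k with u_k, v_k >= 0, so that |x_k| can be replaced by the linear expression u_k + v_k. The sketch below only illustrates this identity; it is an assumption about the general idea and not necessarily the exact formulation of Zhao (2008). The names `split`, `u` and `v` are hypothetical.

```python
from typing import List, Tuple


def split(x: List[float]) -> Tuple[List[float], List[float]]:
    """Returns nonnegative u, v with x = u - v and at most one of u_k, v_k nonzero."""
    u = [max(xk, 0.0) for xk in x]
    v = [max(-xk, 0.0) for xk in x]
    return u, v


x = [3.0, -2.5, 1.0]
u, v = split(x)

# Objective written with absolute values (nonlinear in x).
abs_objective = sum(abs(xk) for xk in x)

# Same objective written linearly in the auxiliary variables (u, v).
linear_objective = sum(uk + vk for uk, vk in zip(u, v))
```

In a minimization problem with positive weights on u_k + v_k, an optimal solution never keeps both u_k and v_k positive, because lowering both by the same amount leaves u_k - v_k unchanged and reduces the objective. This is why the linear problem attains the same optimum as the problem with absolute values.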

Uncertain Regression Model
In real-life situations, practitioners come across a wide variety of data sets. Statistical tools are effective in the decision making process as long as the data are not imprecise. However, they do not perform well when uncertainty enters the system, and alternative approaches become necessary. For this reason, tools parallel to those available in statistics are being developed under uncertainty theory. Liu proposed an uncertain regression model, which is explained below.
Let $x$ be a vector of independent variables and $y$ be a dependent variable. Assume that the functional relationship between $y$ and $x$ can be expressed by the regression model
$$y = f(x \mid \beta) + \varepsilon \qquad (3)$$
where $\beta$ is an unknown vector of parameters and $\varepsilon$ is a disturbance term. If $f(x \mid \beta)$ is a linear function, i.e.
$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon \qquad (4)$$
where $p$ is the number of independent variables, then we obtain an uncertain linear regression model. In the traditional linear regression model, we assume that the observed values of $y$ are influenced by the independent variables and a random error. In other words, the dependent variable is a random variable. In the uncertain linear regression model (4), the disturbance term and $y$ are regarded as uncertain variables. This is the essential difference between the uncertain regression model and the traditional regression model.
In many practical problems, the observed values cannot be expressed as crisp quantities, but it may be possible to identify a range of possible values for $y$. For example, experts often give a range when predicting the price of a stock. In such cases, if we continue to use crisp values for $y$, the conclusions drawn from the model may not reflect the actual behavior of the system. To handle such situations, we consider the uncertain regression model based on uncertain sets, namely,
$$y = f(x \mid \beta) \qquad (5)$$
where $x$ is a vector of real numbers and $\beta$ and $y$ are uncertain sets.
Models of the form (5) can be used in place of conventional statistical models when the decision making process involves uncertainty. In this paper, we focus on the linear regression model, namely,
$$y(x_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}, \quad i = 1, 2, \cdots, n,$$
where $x_{ij} \in \mathbb{R}$, $j = 1, 2, \cdots, p$, and where $\beta_j$ and $y(x_i)$ are uncertain sets.

Parameter Estimation
Once the functional form of f is determined, next arises the problem of estimating the parameters β involved in the identified functional form.
This section presents a solution to the estimation of the parameters $\beta_0, \beta_1, \cdots, \beta_p$, which are assumed to be uncertain sets. The proposed method makes use of the observed values $(x_i, y_i)$ and is based on linear programming. This work considers three types of uncertain sets, namely symmetrical triangular, symmetrical trapezoidal and normal uncertain sets. In order to estimate the parameters, we first introduce the concept of fitted value. The fitted value $\mu_i$ indicates the degree to which the membership information of $y_i$ is contained in $y(x_i)$. It is an important index of the degree of congruence between the observed value $y_i$ and the theoretical value $y(x_i)$. We generally require that all the $\mu_i$ be no less than a specific value $H$ ($0 \le H \le 1$), which is fixed by taking into account the opinion of domain experts. That is,
$$\mu_i \ge H, \quad i = 1, 2, \cdots, n.$$
This guarantees that the degree of congruence between $y_i$ and $y(x_i)$ is at least $H$ for every $i$. The higher the value of $\mu_i$, the better the quality of the corresponding fit.

Regression Model Based on Symmetrical Triangular Uncertain Set
Let $\beta_0, \beta_1, \cdots, \beta_p$ be independent symmetrical triangular uncertain sets, where $\beta_j = (a_j, b_j, c_j)$, $j = 0, 1, \cdots, p$. Let $(x_i, y_i)$ be the observed values, where $x_i$ is a vector of crisp numbers with $x_{ij} \ge 0$ and $y_i = (m_i, p_i, n_i)$ is a symmetrical triangular uncertain set with inverse membership function
$$\mu_{y_i}^{-1}(\alpha) = [\,p_i + (\alpha - 1)(p_i - m_i),\; p_i + (1 - \alpha)(n_i - p_i)\,], \quad i = 1, 2, \cdots, n.$$
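Assume that the relationship between $x_i$ and $y(x_i)$ can be expressed by the linear function
$$y(x_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.$$
Then $y(x_i)$ is a symmetrical triangular uncertain set. By Theorem 1, with $x_{i0} = 1$ and $x_{ij} \ge 0$, $y(x_i)$ has the inverse membership function
$$\lambda_i^{-1}(\alpha) = \Big[\sum_{j=0}^{p} \big(b_j + (\alpha - 1)(b_j - a_j)\big) x_{ij},\; \sum_{j=0}^{p} \big(b_j + (1 - \alpha)(c_j - b_j)\big) x_{ij}\Big].$$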
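The effect of Theorem 1 on the linear function can be sketched in a few lines. Assuming triangular coefficients given as (left, center, right) triples and nonnegative crisp inputs, the left, center and right points of y(x_i) are the corresponding weighted sums of the coefficients. The function names and the numerical coefficients below are made up for illustration and do not come from the paper.

```python
from typing import Sequence, Tuple

Triangular = Tuple[float, float, float]  # (left, center, right)


def predict(x: Sequence[float], beta: Sequence[Triangular]) -> Triangular:
    """y(x) = beta_0 + beta_1 x_1 + ... + beta_p x_p for crisp x_j >= 0."""
    if len(beta) != len(x) + 1:
        raise ValueError("beta needs one more entry than x (the intercept)")
    if any(xj < 0 for xj in x):
        raise ValueError("this sketch assumes nonnegative inputs")
    row = [1.0, *x]  # x_0 = 1 carries the intercept
    left = sum(a * xj for (a, _, _), xj in zip(beta, row))
    center = sum(b * xj for (_, b, _), xj in zip(beta, row))
    right = sum(c * xj for (_, _, c), xj in zip(beta, row))
    return (left, center, right)


def inverse_membership(alpha: float, y: Triangular) -> Tuple[float, float]:
    """Level set of a triangular uncertain set at level alpha."""
    left, center, right = y
    return (center + (alpha - 1.0) * (center - left),
            center + (1.0 - alpha) * (right - center))


beta = [(1.0, 2.0, 3.0), (0.5, 1.0, 1.5)]  # made-up symmetrical coefficients
y_hat = predict([4.0], beta)
cut = inverse_membership(0.5, y_hat)
```

The nonnegativity check matters: a negative input would swap the roles of the left and right endpoints, so the simple endpoint-wise sums would no longer describe the level sets.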
The relationship among $\mu_i$, $y(x_i)$ and $y_i$ is shown in Figure 1. The requirement on $\mu_i$, namely $\mu_i \ge H$, should be satisfied. Generally speaking, the closer $y_i$ and $y(x_i)$ are, the better the fit. Hence, we aim to estimate the parameters in such a way that $y_i$ has the same membership function as $y(x_i)$. Because both are symmetrical triangular uncertain sets, they are in close agreement if the radius of $y_i$ and the radius of $y(x_i)$ are the same under the condition $\mu_i \ge H$. As a result of the above observations, the parameter estimation problem can be transformed into LP model (6). Model (6) cannot be solved directly because the parameter $\mu_i$ is unknown. Besides, it usually meets the condition $\mu_i \ge H$. If model (6) is established with $\mu_i = H$, it can be guaranteed that $\mu_i \ge H$ for each index $i$. Hence it can be transformed into LP model (7). The solution of model (7) gives the estimated values $(a_j, b_j, c_j)$ of $\beta_j$, $j = 0, 1, \cdots, p$.
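
Regression Model Based on Symmetrical Trapezoidal Uncertain Set
Let $\beta_0, \beta_1, \cdots, \beta_p$ be independent symmetrical trapezoidal uncertain sets, where $\beta_j = (a_j, b_j, c_j, d_j)$, $j = 0, 1, \cdots, p$. Assume that the relationship between $x_i$ and $y(x_i)$ can be expressed by the linear function
$$y(x_i) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}.$$
Then $y(x_i)$ is a symmetrical trapezoidal uncertain set, and its inverse membership function follows from Theorem 1.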
The relationship among $\mu_i$, $y(x_i)$ and $y_i$ is shown in Figure 2, and the requirement $\mu_i \ge H$ should be satisfied. Generally speaking, the closer $y_i$ and $y(x_i)$ are, the better the fit. Hence, we aim to estimate the parameters in such a way that $y_i$ has the same membership function as $y(x_i)$. Because both are symmetrical trapezoidal uncertain sets, they are in close agreement if their toplines and baselines are the same under the condition $\mu_i \ge H$. As a result of the above observations, the parameter estimation problem can be transformed into nonlinear programming (NLP) model (8). Model (8) cannot be solved directly because the parameter $\mu_i$ is unknown. Besides, it usually meets the condition $\mu_i \ge H$. If model (8) is established with $\mu_i = H$, it can be guaranteed that $\mu_i \ge H$ for each index $i$. Hence it can be transformed into NLP model (9).
It follows from Theorem 3 that the nonlinear programming model (9) is equivalent to LP model (10). The solution of this linear programming model gives the estimated values $(a_j, b_j, c_j, d_j)$ of $\beta_j$, $j = 0, 1, \cdots, p$.

Regression Model Based on Normal Uncertain Set
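Let $\beta_0, \beta_1, \cdots, \beta_p$ be independent normal uncertain sets, where $\beta_j = N(a_j, b_j)$, $j = 0, 1, \cdots, p$. The relationship among $\mu_i$, $y(x_i)$ and $y_i$ is shown in Figure 3. In general, the closer $y_i$ and $y(x_i)$ are, the better the fit. Hence, we aim to estimate the parameters in such a way that $y_i$ has the same membership function as $y(x_i)$. Because both are normal uncertain sets, they are in close agreement if the centers and dispersion degrees of their membership functions are the same under the condition $\mu_i \ge H$. The parameter estimation problem can therefore be transformed into NLP model (11). Model (11) cannot be solved directly because the parameter $\mu_i$ is unknown. If model (11) is established with $\mu_i = H$, it can be guaranteed that $\mu_i \ge H$ for each index $i$. Hence it can be transformed into NLP model (12).
It follows from Theorem 3 that NLP model (12) is equivalent to LP model (13). The solution of this linear programming model gives the estimated values $N(a_j, b_j)$ of $\beta_j$, $j = 0, 1, \cdots, p$.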

Evaluation Criteria
Once the uncertain regression model is fitted using the available data, the next question is how to assess the quality of the fit. Two measures for this purpose are presented below.
(1) $\Lambda_1$: the relative deviation between the centers of the fitted value and the observed data $y_i$.
(2) $\Lambda_2$: the ratio between the widths of the fitted value and the observed data $y_i$.
Generally speaking, if $\Lambda_1$ and $\Lambda_2$ are both within 30%, the fitted uncertain regression model is considered to be acceptable.
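The two criteria can be sketched in code under explicit assumptions. Here the first measure is taken as |center_fit - center_obs| / |center_obs| and the second as |width_fit - width_obs| / width_obs, which is one reading consistent with a 30% acceptance threshold; the paper's own formulas may differ. All function names and numbers are hypothetical.

```python
from typing import Tuple

Triangular = Tuple[float, float, float]  # (left, center, right)


def center_deviation(fitted: Triangular, observed: Triangular) -> float:
    """Assumed form of the first criterion: relative deviation between centers."""
    return abs(fitted[1] - observed[1]) / abs(observed[1])


def width_deviation(fitted: Triangular, observed: Triangular) -> float:
    """Assumed form of the second criterion: relative deviation between widths."""
    w_fit = fitted[2] - fitted[0]
    w_obs = observed[2] - observed[0]
    return abs(w_fit - w_obs) / w_obs


def acceptable(fitted: Triangular, observed: Triangular,
               threshold: float = 0.30) -> bool:
    """True when both assumed criteria are within the threshold."""
    return (center_deviation(fitted, observed) <= threshold
            and width_deviation(fitted, observed) <= threshold)


fitted = (8.0, 11.0, 14.0)    # made-up fitted set
observed = (7.5, 10.0, 12.5)  # made-up observation
ok = acceptable(fitted, observed)
```

With these made-up values the centers differ by 10% and the widths by 20%, so the pair would pass a 30% threshold under the assumed definitions.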

Numerical Experiment
In this section, an illustrative example is given in support of the estimation procedure suggested in the previous section for the uncertain regression model. Consider the relationship between the heat released by a cement during solidification, denoted by $y_i$, and two kinds of chemical compositions, denoted by $x_1$ and $x_2$. Industry experience has shown that $y_i$ is not a fixed value but assumes values over an identified interval. Hence it is reasonable to treat the heat as an uncertain set. The observed values are furnished in Table 1. It is known from past experience that there exists a linear relationship between $y$ and $x$, where $x = (1, x_1, x_2)$.
The above LP problem, which corresponds to the model explained in (14), has been solved by taking $H = 0.4$, and the following solution is obtained.

Conclusion
In this paper, a linear regression model based on uncertain sets has been proposed for investigating the relationship between variables in an uncertain situation. The proposed model assumes that the independent variables are crisp and that the dependent variable and the regression coefficients are uncertain sets. Estimation procedures for three kinds of uncertain sets, namely symmetrical triangular, symmetrical trapezoidal and normal uncertain sets, have been discussed in detail. An illustrative example has been given to support the proposed model and the method of estimation.
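Definition 8. An uncertain set $\xi$ is called triangular if it has a membership function
$$\mu(x) = \begin{cases} \dfrac{x - a}{b - a}, & a \le x \le b \\ \dfrac{x - c}{b - c}, & b \le x \le c \\ 0, & \text{otherwise} \end{cases}$$
denoted by $(a, b, c)$, where $a, b, c$ are real numbers with $a < b < c$. $\xi$ is called a symmetrical triangular uncertain set if and only if $c - b = b - a$. Then $c - b$ or $b - a$ is called the radius of $\xi$ and $b$ is the center of $\xi$.
Definition 9. An uncertain set $\xi$ is called trapezoidal if it has a membership function
$$\mu(x) = \begin{cases} \dfrac{x - a}{b - a}, & a \le x \le b \\ 1, & b \le x \le c \\ \dfrac{x - d}{c - d}, & c \le x \le d \\ 0, & \text{otherwise} \end{cases}$$
denoted by $(a, b, c, d)$, where $a, b, c, d$ are real numbers with $a < b < c < d$. $\xi$ is called a symmetrical trapezoidal uncertain set if and only if $d - c = b - a$. Then $d - c$ or $b - a$ is called the radius of $\xi$.
Definition 10. (Guo, 2014) An uncertain set $\xi$ is called normal if it has a membership function denoted by $N(a, b)$, where $a$ and $b$ are real numbers with $b > 0$. In particular, a normal uncertain set $N(a, b)$ is called standard if $a = 0$ and $b = 1$, and is denoted by $N(0, 1)$. The parameter $a$ is called the center of $\xi$.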
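The trapezoidal case can be made concrete with a short sketch. It evaluates the piecewise linear membership function of a trapezoidal uncertain set (a, b, c, d) and checks the symmetry condition d - c = b - a stated above. The function names and the tolerance parameter are illustrative assumptions.

```python
from typing import Tuple

Trapezoidal = Tuple[float, float, float, float]  # (a, b, c, d) with a < b < c < d


def trapezoidal_membership(x: float, xi: Trapezoidal) -> float:
    """Piecewise linear membership function of a trapezoidal uncertain set."""
    a, b, c, d = xi
    if a <= x < b:
        return (x - a) / (b - a)
    if b <= x <= c:
        return 1.0
    if c < x <= d:
        return (d - x) / (d - c)
    return 0.0


def is_symmetrical(xi: Trapezoidal, tol: float = 1e-12) -> bool:
    """Symmetry condition: the two sloping sides have equal horizontal extent."""
    a, b, c, d = xi
    return abs((d - c) - (b - a)) <= tol


def radius(xi: Trapezoidal) -> float:
    """Radius b - a of a symmetrical trapezoidal uncertain set."""
    if not is_symmetrical(xi):
        raise ValueError("radius is only defined here for symmetrical sets")
    a, b, _, _ = xi
    return b - a


xi = (1.0, 2.0, 4.0, 5.0)
values = [trapezoidal_membership(x, xi) for x in (1.5, 3.0, 4.5, 6.0)]
```

A triangular set can be viewed as the limiting case in which the flat top shrinks to the single point b = c, although the definitions above keep the two families separate.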

Figure 1. Parameter estimation of symmetrical triangular uncertain set

Figure 2. Parameter estimation of symmetrical trapezoidal uncertain set

Figure 3. Parameter estimation of normal uncertain set