ML: Linear Regression

In statistics, linear regression is a model that estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressors or independent variables). A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression.[1] This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.[2]
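
For the simple (one-variable) case, the least-squares estimates have a closed form: the slope is the sample covariance of x and y divided by the sample variance of x, and the intercept makes the fitted line pass through the point of means. A minimal pure-Python sketch follows; the function name and the hours/score data are hypothetical, chosen only for illustration:

```python
def fit_simple_linear_regression(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for one explanatory variable."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope = cov(x, y) / var(x); the 1/(n-1) factors cancel, so plain sums suffice.
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    b1 = sxy / sxx
    # The least-squares line always passes through (mean_x, mean_y).
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Hypothetical data: hours studied vs. exam score.
hours = [1, 2, 3, 4, 5]
score = [52, 55, 61, 64, 68]
intercept, slope = fit_simple_linear_regression(hours, score)
print(f"score ≈ {intercept:.2f} + {slope:.2f} * hours")  # prints: score ≈ 47.70 + 4.10 * hours
```

A multiple linear regression generalizes this to one coefficient per explanatory variable, which is usually solved in matrix form.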

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.
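
In symbols, with n observations and p explanatory variables (the notation is a common convention, not taken from the text), the model, the usual conditional-mean assumption, and the less common conditional-quantile variant can be written as below. The intercept β₀ is what makes the predictor affine rather than strictly linear:

```latex
\begin{aligned}
y_i &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} + \varepsilon_i, && i = 1, \dots, n \\
\mathbb{E}[\, y_i \mid x_i \,] &= \beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip} && \text{if } \mathbb{E}[\, \varepsilon_i \mid x_i \,] = 0 \\
Q_\tau(y_i \mid x_i) &= \beta_0(\tau) + \beta_1(\tau)\, x_{i1} + \cdots + \beta_p(\tau)\, x_{ip} && \text{(quantile regression, } 0 < \tau < 1\text{; } \tau = 0.5 \text{ is the median)}
\end{aligned}
```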

Linear regression is also a machine learning algorithm, specifically a supervised one: it learns from labelled datasets and maps the data points to the best-fitting linear function, which can then be used to make predictions on new data.[3]
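
The text does not say how the best-fitting function is found. One common machine-learning approach, assumed here rather than taken from the source, is batch gradient descent on the mean squared error over the labelled training set; the learned weights are then applied to new, unlabelled inputs. The function names, learning rate, epoch count and noiseless training data below are all hypothetical choices:

```python
def train_linear_regression_gd(features, targets, learning_rate=0.1, epochs=5000):
    """Learn weights and a bias for y ≈ w·x + b by batch gradient descent on the MSE."""
    n_samples = len(features)
    n_features = len(features[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(features, targets):
            # Error of the current model on one labelled example.
            error = sum(w * xj for w, xj in zip(weights, x)) + bias - y
            for j in range(n_features):
                grad_w[j] += 2.0 * error * x[j] / n_samples
            grad_b += 2.0 * error / n_samples
        # Move every parameter a small step against the gradient of the MSE.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def predict(weights, bias, x):
    """Apply the learned linear function to a new (unlabelled) input."""
    return sum(w * xj for w, xj in zip(weights, x)) + bias


# Hypothetical labelled training data generated from y = 3*x1 - 2*x2 + 5 (no noise).
train_x = [[i / 10, (i % 4) / 4] for i in range(20)]
train_y = [3 * x1 - 2 * x2 + 5 for x1, x2 in train_x]

w, b = train_linear_regression_gd(train_x, train_y)
print(w, b)                       # approximately [3.0, -2.0] and 5.0
print(predict(w, b, [2.5, 0.5]))  # approximately 3*2.5 - 2*0.5 + 5 = 11.5
```

For ordinary least squares the same minimum can also be computed in closed form; the iterative version is shown because it mirrors the learn-from-labelled-examples framing above.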

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.[4] This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.
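
Why linear-in-parameters models are easy to fit can be made concrete: stacking the observations into a design matrix X (with a leading column of ones) turns least squares into the normal equations XᵀXβ = Xᵀy, whose solution is β̂ = (XᵀX)⁻¹Xᵀy when XᵀX is invertible. The NumPy sketch below is a minimal illustration; `fit_ols` and the synthetic data are hypothetical, and it calls `np.linalg.lstsq` rather than inverting XᵀX explicitly, which is numerically safer:

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients [b0, b1, ..., bp] for y ≈ b0 + b1*x1 + ... + bp*xp."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the intercept b0 is estimated alongside the slopes.
    design = np.column_stack([np.ones(len(X)), X])
    # The model is linear in its parameters, so least squares reduces to a linear
    # system; lstsq solves it with an SVD-based routine instead of inverting X^T X.
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Hypothetical synthetic data: two explanatory variables plus small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 10.0, size=(100, 2))
y = 4.0 + 1.5 * X[:, 0] - 0.5 * X[:, 1] + rng.normal(0.0, 0.1, size=100)

coef = fit_ols(X, y)
print(coef)  # close to [4.0, 1.5, -0.5]; exact digits depend on the random noise
```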

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

If the goal is to reduce error (i.e., variance) in prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of response and explanatory variable values. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to predict the response.
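
A minimal sketch of this fit-then-predict workflow with NumPy: `np.polyfit` with `deg=1` fits a straight line (a degree-1 polynomial) by least squares, and `np.polyval` evaluates it at new explanatory values. The advertising/sales numbers are made up for illustration:

```python
import numpy as np

# Hypothetical observed data: advertising spend (x) and the resulting sales (y).
spend = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sales = np.array([3.1, 4.9, 7.2, 8.8, 11.0])

# Fit sales ≈ slope * spend + intercept; polyfit returns the highest power first.
slope, intercept = np.polyfit(spend, sales, deg=1)

# New explanatory values collected without a response: predict it from the fitted line.
new_spend = np.array([6.0, 7.5])
predicted_sales = np.polyval([slope, intercept], new_spend)
print(slope, intercept)   # approximately 1.97 and 1.09
print(predicted_sales)    # approximately 12.91 and 15.865
```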

If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.
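
To quantify the strength of the relationship and check whether an explanatory variable may have no linear relationship with the response, a common starting point is R² (the fraction of the response's variance explained by the fit) together with a t-statistic for each coefficient. The sketch assumes the classical OLS setting (independent errors with constant variance); `ols_summary` and the data are hypothetical, and the data are built so that y depends on x1 but not on x2:

```python
import numpy as np


def ols_summary(X, y):
    """Fit y ≈ b0 + X @ b by OLS; return coefficients, R^2 and t-statistics."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), X])
    p = design.shape[1]  # number of estimated coefficients, intercept included
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    rss = residuals @ residuals
    tss = ((y - y.mean()) ** 2).sum()
    r_squared = 1.0 - rss / tss  # share of the variance in y explained by the fit
    # Classical standard errors: sigma^2 * diag((X^T X)^-1), sigma^2 estimated with n - p dof.
    sigma2 = rss / (n - p)
    std_err = np.sqrt(sigma2 * np.diag(np.linalg.inv(design.T @ design)))
    t_stats = coef / std_err  # large |t| is evidence that the coefficient is non-zero
    return coef, r_squared, t_stats


# Hypothetical data: y depends on x1 but, by construction, not on x2.
idx = np.arange(20)
x1 = idx.astype(float)
x2 = (idx % 3).astype(float)
noise = np.where(idx % 2 == 0, 0.1, -0.1)  # small deterministic "noise" for reproducibility
y = 1.0 + 2.0 * x1 + noise

coef, r2, t = ols_summary(np.column_stack([x1, x2]), y)
print("coefficients:", coef)
print("R^2:", r2)
print("t-statistics:", t)
```

A coefficient whose |t| is small relative to the t distribution with n − p degrees of freedom (here 20 observations and 3 coefficients) is consistent with no linear relationship; in this example the t-statistic for x2 comes out far smaller than the one for x1.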

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the “lack of fit” in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Using the mean squared error (MSE) as the cost on a dataset with many large outliers can produce a model that fits the outliers more closely than the true data, because squaring the residuals gives large errors disproportionate weight. Cost functions that are robust to outliers should therefore be used when the dataset has many large outliers. Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms “least squares” and “linear model” are closely linked, they are not synonymous.
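
The effect of outliers on the MSE cost can be checked on a tiny synthetic dataset: nine points lie exactly on y = 2x + 1 and one response is replaced by a large outlier. The sketch compares ordinary least squares with least absolute deviations (LAD). Computing LAD by iteratively reweighted least squares (IRLS) is our own choice, one common approach rather than anything prescribed by the text, and the function names and parameters are hypothetical:

```python
import numpy as np


def fit_least_squares(x, y):
    """Ordinary least squares line: minimizes the sum of squared residuals."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef  # [intercept, slope]


def fit_least_absolute_deviations(x, y, iterations=100, eps=1e-8):
    """Approximate LAD line (minimizes the sum of |residuals|) via IRLS:
    each pass re-solves a weighted least squares problem with weights 1/|residual|."""
    design = np.column_stack([np.ones_like(x), x])
    coef = fit_least_squares(x, y)  # start from the OLS solution
    for _ in range(iterations):
        residuals = y - design @ coef
        weights = 1.0 / np.maximum(np.abs(residuals), eps)  # eps avoids division by zero
        sqrt_w = np.sqrt(weights)
        # Weighted least squares: scale each row by sqrt(weight), then solve as usual.
        coef, *_ = np.linalg.lstsq(design * sqrt_w[:, None], y * sqrt_w, rcond=None)
    return coef


x = np.arange(10, dtype=float)
y = 2.0 * x + 1.0
y[9] = 100.0  # one large outlier (the clean value would be 19)

ols_coef = fit_least_squares(x, y)
lad_coef = fit_least_absolute_deviations(x, y)
print("OLS [intercept, slope]:", ols_coef)  # slope pulled far above 2 (about 6.42)
print("LAD [intercept, slope]:", lad_coef)  # close to [1, 2]
```

The squared loss lets the single outlier drag the OLS slope far from 2, while the absolute loss keeps the LAD line on the nine clean points. LAD is not protected against every kind of outlier (points with extreme x values can still pull it), so it is worth comparing several robust options on real data.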
