Anything Goes -

ML: Linear Regression

Written by caveowner on February 12, 2025 in Uncategorized with no comments.

In statistics, linear regression is a model that estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressor or independent variable). A model with exactly one explanatory variable is a simple linear regression; a model with two or more explanatory variables is a multiple linear regression.[1] This term is distinct from multivariate linear regression, which predicts multiple correlated dependent variables rather than a single dependent variable.[2]

Dalam statistik, linear regression adalah model yang meng-estimasi hubungan linear antara respon skalar (variabel dependent) dan satu atau lebih variabel penjelas (regressor atau variabel independen). Model dengan tepat satu variabel penjelas adalah linear regression versi simpel; model dengan dua atau lebih variabel penjelas adalah linear regression multiple.[1] Term ini berbeda dari linear regression multivariate, yang memprediksi berbagai variabel dependensi yang terkorelasi bukannya variabel dependensi single.

In linear regression, the relationships are modeled using linear predictor functions whose unknown model parameters are estimated from the data. Most commonly, the conditional mean of the response given the values of the explanatory variables (or predictors) is assumed to be an affine function of those values; less commonly, the conditional median or some other quantile is used. Like all forms of regression analysis, linear regression focuses on the conditional probability distribution of the response given the values of the predictors, rather than on the joint probability distribution of all of these variables, which is the domain of multivariate analysis.

Dalam linear regression, model hubungan dibuat berdasarkan fungsi prediktor linear dengan parameter yang tidak diketahui yang di estimasi dari data. Umumnya, rata-rata kondisional dari respons yang diberikan nilai-nilai variabel penjelas (atau prediktor) diasumsikan sebagai fungsi afine dari nilai-nilai tersebut; lebih jarang, median kondisional atau beberapa kuantil lainnya digunakan. Seperti bentuk lainnya dari regression analysis, linear regression fokus pada distribusi probabilitas respon yang didistribusikan berdasarkan nilai dari prediktor, dan bukan berdasarkan distribusi probabilitas gabungan dari semua variabel ini, yang adalah domain dari multivariate analysis.
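The "affine function" assumption above can be written out as a formula. This is only a notational sketch: the symbols $y$ (response), $x_1, \dots, x_p$ (explanatory variables) and $\beta_0, \dots, \beta_p$ (unknown parameters) are assumed here and do not appear in the original text.

```latex
\mathbb{E}[\, y \mid x_1, \dots, x_p \,] = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
```

With $p = 1$ this is simple linear regression; with $p \ge 2$ it is multiple linear regression.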

Linear regression is also a supervised machine learning algorithm: it learns from labelled data and fits the linear function that best maps the inputs to the outputs, which can then be used to make predictions on new data.[3]

Regresi linear juga adalah tipe algoritma machine learning, lebih tepatnya algoritma supervised, yang dapat belajar dari dataset yang sudah diberi label dan memetakan data points kepada fungsi linear yang paling optimal yang dapat digunakan untuk memprediksi dataset yang baru.

Linear regression was the first type of regression analysis to be studied rigorously, and to be used extensively in practical applications.[4] This is because models which depend linearly on their unknown parameters are easier to fit than models which are non-linearly related to their parameters and because the statistical properties of the resulting estimators are easier to determine.

Regresi linear adalah tipe pertama dari regression analysis yang dipelajari secara menyeluruh, dan digunakan secara luas pada aplikasi praktikal. Ini karena model yang bergantung secara linear kepada parameter unknowns lebih mudah untuk fit daripada model yang tidak linear kepada parameter mereka dan karena atribut statistik dari estimasi yang dihasilkan jauh lebih mudah untuk ditentukan.

Linear regression has many practical uses. Most applications fall into one of the following two broad categories:

Regresi linear memiliki banyak kegunaan praktis. Sebagian besar aplikasi jatuh kepada dua kategori besar berikut:

If the goal is to reduce error (i.e., variance) in prediction or forecasting, linear regression can be used to fit a predictive model to an observed data set of values of the response and explanatory variables. After developing such a model, if additional values of the explanatory variables are collected without an accompanying response value, the fitted model can be used to make a prediction of the response.

Jika tujuannya adalah mengurangi error (i.e. reduksi variansi) pada prediksi atau ramalan, regresi linear dapat digunakan untuk memasangkan model prediktif kepada data set respons dan variabel penjelas yang telah diamati. Setelah mengembangkan model seperti itu, jika nilai tambahan dari variabel penjelas didapatkan tanpa nilai respon, model yang telah di-fit dapat digunakan untuk membuat prediksi respons.

If the goal is to explain variation in the response variable that can be attributed to variation in the explanatory variables, linear regression analysis can be applied to quantify the strength of the relationship between the response and the explanatory variables, and in particular to determine whether some explanatory variables may have no linear relationship with the response at all, or to identify which subsets of explanatory variables may contain redundant information about the response.

Jika tujuannya adalah menjelaskan variasi dari variabel respon yang dapat diatribusikan kepada variasi dalam variabel penjelas, analisis regresi linear dapat diaplikasikan untuk mengukur kekuatan dari hubungan antara respon dan variabel penjelas, dan secara khusus untuk menentukan apakah variabel penjelas tertentu mungkin tidak memiliki hubungan linear dengan respon sama sekali, atau untuk mengidentifikasi subset dari variabel penjelas yang mana yang mungkin mengandung informasi redundan dari respons.
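Both uses can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the data (`hours`, `scores`) is made up, and the helper names `fit_simple_ols`, `predict` and `r_squared` are hypothetical. The first use is prediction for a new value of the explanatory variable; the second is measuring how much of the variation in the response the model explains (here via R²).

```python
from statistics import mean


def fit_simple_ols(xs, ys):
    """Return (intercept, slope) of the least squares line through the data."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


def predict(intercept, slope, x):
    """Predicted response for one value of the explanatory variable."""
    return intercept + slope * x


def r_squared(xs, ys, intercept, slope):
    """Share of the variation in ys explained by the fitted line."""
    y_bar = mean(ys)
    ss_res = sum((y - predict(intercept, slope, x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up example data: hours studied vs. test score.
hours = [1, 2, 3, 4, 5]
scores = [52, 55, 61, 64, 68]

intercept, slope = fit_simple_ols(hours, scores)

# Use 1: predict the response for a new explanatory value with no response yet.
new_prediction = predict(intercept, slope, 6)

# Use 2: quantify how strongly the response is related to the explanatory variable.
explained = r_squared(hours, scores, intercept, slope)

print(f"slope={slope:.2f}, intercept={intercept:.2f}")
print(f"prediction for 6 hours: {new_prediction:.1f}")
print(f"R^2: {explained:.3f}")
```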

Linear regression models are often fitted using the least squares approach, but they may also be fitted in other ways, such as by minimizing the “lack of fit” in some other norm (as with least absolute deviations regression), or by minimizing a penalized version of the least squares cost function as in ridge regression (L2-norm penalty) and lasso (L1-norm penalty). Using the Mean Squared Error (MSE) as the cost on a dataset with many large outliers can produce a model that fits the outliers more than the true data, because MSE assigns higher importance to large errors. Cost functions that are robust to outliers should therefore be used when the dataset has many large outliers. Conversely, the least squares approach can be used to fit models that are not linear models. Thus, although the terms “least squares” and “linear model” are closely linked, they are not synonymous.

Model regresi linear biasanya di fit menggunakan pendekatan least squares, tetapi kadang mereka juga di fit menggunakan cara lain, seperti meminimalisasi “lack of fit” pada norma lain (seperti dengan least absolute deviations regression), atau dengan meminimalisasi versi penalized dari fungsi biaya least squares seperti dalam ridge regression (L2-norm penalty) dan lasso (L1-norm penalty). Penggunaan Mean Squared Error (MSE) sebagai biaya pada dataset yang memiliki banyak outlier yang besar dapat menghasilkan model yang lebih cocok dengan outlier daripada data yang sebenarnya, karena MSE memberikan bobot yang lebih tinggi pada error yang besar. Jadi, fungsi biaya yang robust terhadap outlier sebaiknya digunakan jika dataset memiliki banyak outlier yang besar. Sebaliknya, pendekatan least square dapat digunakan untuk melakukan fit pada model yang bukan model linear. Maka, meskipun term “least squares” dan “linear model” terhubung secara dekat, mereka tidak sinonim.
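The effect of outliers on MSE, and the ridge penalty, can be illustrated with a small sketch in plain Python. Everything here is assumed for illustration: the data is synthetic (points on the line y = 2x + 1 with one large outlier), `fit_line`, `mse` and `mae` are hypothetical helper names, and the ridge variant penalizes only the slope of a one-variable model. Mean absolute error (MAE) stands in for the least absolute deviations cost.

```python
from statistics import mean


def fit_line(xs, ys, ridge_lambda=0.0):
    """Fit y = intercept + slope * x.

    ridge_lambda = 0 gives ordinary least squares; a positive value adds an
    L2 penalty on the slope only (the intercept is left unpenalized).
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / (sxx + ridge_lambda)
    return y_bar - slope * x_bar, slope


def mse(xs, ys, line):
    """Mean squared error of a (intercept, slope) line on the data."""
    intercept, slope = line
    return mean((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def mae(xs, ys, line):
    """Mean absolute error of a (intercept, slope) line on the data."""
    intercept, slope = line
    return mean(abs(y - (intercept + slope * x)) for x, y in zip(xs, ys))


# Synthetic data: nine points exactly on y = 2x + 1, plus one large outlier.
xs = list(range(10))
ys = [2 * x + 1 for x in xs]
ys[-1] += 100

true_line = (1, 2)                               # the line behind the clean points
ols_line = fit_line(xs, ys)                      # least squares, pulled by the outlier
ridge_line = fit_line(xs, ys, ridge_lambda=50)   # penalized slope

# MSE rates the outlier-chasing line as better; MAE prefers the true line.
print("MSE  true:", mse(xs, ys, true_line), " ols:", mse(xs, ys, ols_line))
print("MAE  true:", mae(xs, ys, true_line), " ols:", mae(xs, ys, ols_line))
print("slopes  ols:", ols_line[1], " ridge:", ridge_line[1])
```

The ridge penalty shrinks the slope toward zero, which is a separate concern from outlier robustness; it appears in the same sketch only because both are variations on the least squares cost.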

ML: Bayesian Bandit

Written by caveowner on October 23, 2024 in Data Science with no comments.

Reference

the problem

Suppose you are at a casino and have a choice between N slot machines. Each of the N slot machines (bandits) has an unknown probability of letting you win; e.g., Bandit 1 may have P(win) = 0.9 and Bandit 2 may have P(win) = 0.3. We wish to maximize our winnings by playing the machine that has the highest probability of winning. The problem is determining which machine that is without playing each machine a million times, so that we can maximize the profit!

the general idea

The idea: let’s not pull each arm 1000 times to get an accurate estimate of its probability of winning. Instead, let’s use the data we’ve collected so far to determine which arm to pull. If an arm doesn’t win that often, but we haven’t sampled it too much, then its probability of winning is low, but our confidence in that estimate is also low – so let’s give it a small chance in the future. However, if we’ve pulled an arm many times and it wins often, then its probability of winning is high, and our confidence in that estimate is also high – let’s give that arm a higher chance of being pulled.
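The post does not name a specific algorithm, but one common way to realize this idea is Thompson sampling with a Beta posterior per arm. The sketch below assumes that technique; the function name `thompson_sampling`, the uniform Beta(1, 1) prior, and the simulated win probabilities (taken from the 0.9 and 0.3 example above) are illustrative choices, not the author's code.

```python
import random


def thompson_sampling(true_probs, n_pulls, seed=0):
    """Simulate a Bernoulli bandit and return (wins, losses) per arm.

    Each arm keeps a Beta(wins + 1, losses + 1) posterior over its win
    probability. On every round we draw one sample per arm and pull the
    arm with the highest sample, so uncertain arms still get explored.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    wins = [0] * n_arms
    losses = [0] * n_arms

    for _ in range(n_pulls):
        # Wide posteriors (few pulls) produce spread-out samples: exploration.
        samples = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n_arms)]
        arm = max(range(n_arms), key=lambda i: samples[i])

        # Simulated pull: true_probs is hidden from the strategy itself.
        if rng.random() < true_probs[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1

    return wins, losses


wins, losses = thompson_sampling([0.9, 0.3], n_pulls=1000)
pulls = [w + l for w, l in zip(wins, losses)]
print("pulls per arm:", pulls)
print("observed win rates:", [w / p if p else None for w, p in zip(wins, pulls)])
```

Most pulls should end up on the better arm, while the weaker arm is sampled only often enough to become reasonably sure it is weaker.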

the result

~~~

Tags: machine learning

Summary: Exploring the Full Potentials of IoT for Better Financial Growth and Stability: A Comprehensive Survey

Written by caveowner on December 23, 2023 in Uncategorized with no comments.

Basalt Dust

Written by caveowner on December 10, 2023 in Uncategorized with no comments.

Imagine you are a farmer and you have spent your whole life doing what farmers do: using fertilizer and pesticide to beef up your farm and make what you plant grow. But it's 2023, and apparently your farm is producing too much carbon.

Worry not, scientists have found a great solution for it in the form of basalt dust!

Basalt dust is crushed volcanic rock that, when sprinkled on farmland, will remove vast amounts of carbon dioxide, according to the University of Sheffield. But wait, not only that: basalt dust is also claimed to improve soil fertility and crop nutrition.

So what are those farmers waiting for? Well, it's still uncommon, so the farmers would need to go through some Gestalt Cycle of Change steps before trying out the new stuff, and once enough farmers start using it, the rest will follow!

Blockchain Blocks

Written by caveowner on November 4, 2023 in Uncategorized with no comments.

Records in a blockchain's distributed ledger that are used to record transactions across many computers, so that no block can be altered retroactively without also altering all subsequent blocks.
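Why changing one block forces changes to every later block can be shown with a toy hash chain. This is a simplified sketch under stated assumptions: each block stores the SHA-256 hash of its predecessor, the helper names (`block_hash`, `build_chain`, `first_invalid_block`) and the sample transactions are hypothetical, and real blockchains add consensus rules and more on top of this.

```python
import hashlib
import json

GENESIS_PREV_HASH = "0" * 64  # placeholder "previous hash" for the first block


def block_hash(index, data, prev_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "data": data, "prev_hash": prev_hash}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_chain(records):
    """Link records into a list of blocks, each pointing at its predecessor."""
    chain = []
    prev_hash = GENESIS_PREV_HASH
    for index, data in enumerate(records):
        current = block_hash(index, data, prev_hash)
        chain.append(
            {"index": index, "data": data, "prev_hash": prev_hash, "hash": current}
        )
        prev_hash = current
    return chain


def first_invalid_block(chain):
    """Return the index of the first block that fails verification, or None."""
    prev_hash = GENESIS_PREV_HASH
    for block in chain:
        expected = block_hash(block["index"], block["data"], block["prev_hash"])
        if block["prev_hash"] != prev_hash or block["hash"] != expected:
            return block["index"]
        prev_hash = block["hash"]
    return None


chain = build_chain(["alice pays bob 5", "bob pays carol 2", "carol pays dave 1"])

# Tamper with the middle block's data only: its stored hash no longer matches.
tampered = [dict(block) for block in chain]
tampered[1]["data"] = "bob pays carol 200"
data_only_result = first_invalid_block(tampered)

# Recompute that block's hash to hide the change: the NEXT block now breaks,
# because it still points at the old hash.
tampered[1]["hash"] = block_hash(1, tampered[1]["data"], tampered[1]["prev_hash"])
rehashed_result = first_invalid_block(tampered)

print(first_invalid_block(chain), data_only_result, rehashed_result)
```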

Distributed Ledger

Written by caveowner on November 1, 2023 in Uncategorized with no comments.

Consensus of replicated, shared, and synchronized digital data that is geographically spread (distributed) across many sites, countries, or institutions.[1] In contrast to a centralized database, a distributed ledger does not require a central administrator, and consequently does not have a single (central) point-of-failure.

Risk-Averse

Written by caveowner on August 14, 2023 in Uncategorized with no comments.

Reluctant to take risks.

The people involved in using PLCs on equipment are very risk-averse in the sense that they don’t want to be the one who made a machine not work because they applied a firmware update to it.

Source: https://arstechnica.com/security/2023/08/microsoft-finds-vulnerabilities-it-says-could-be-used-to-shut-down-power-plants/