Machine Learning: How does it work?

Machine learning is a type of artificial intelligence, where the computer "learns" about something without being explicitly programmed. The approach can be used for tasks such as spam filtering, image recognition, and natural language processing (NLP). This AI-based method has also been incorporated into other applications including self-driving cars, personal assistants, and online advertising.

In the broadest sense, we can talk about machine learning in two major contexts: Generalized AI and Application-Specific AI.

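Generalized AI, usually called Artificial General Intelligence (AGI), is the type of intelligence that aims to produce human-like behavior across a wide range of tasks rather than a single one. It should not be confused with Artificial Narrow Intelligence (ANI), which handles one kind of task. Familiar systems such as a robot performing a useful service, an information filtering system on the web providing search results like Google, or a self-driving car driving us to our destination without hitting any pedestrians behave in human-like ways within one domain, but each of them is still a narrow system.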

Application-Specific AI is aimed at solving one specific problem. In this context, two approaches come up often, and they can overlap, since a deep neural network can itself be trained in a supervised way:

Deep Machine Learning: This approach involves training deep neural networks to learn from large amounts of data and then using what was learned to solve problems.

Supervised Machine Learning: In supervised machine learning, the system is trained on a set of examples that have been manually labeled by humans. The labels are used to train a predictive model, which can then be used for classification or regression tasks in applications such as document classification, spam detection, image recognition, etc.

Machine learning algorithms typically need two things in order to operate: training data and a function called an objective. In supervised learning, training data consists of input vectors (lists of feature values), each paired with its correct output value. The objective function scores how far the model's predictions are from those correct outputs. Part of the data is usually held out, often by random selection, for testing purposes. The machine then uses the training data to find an input-output mapping that produces the correct output as often as possible, including on inputs it has not seen before.
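To make the idea of "training data plus an objective" concrete, here is a minimal sketch. The data values, the candidate list, and the names `training_data`, `objective`, and `best_w` are made up for illustration; real algorithms search for the mapping far more efficiently than by trying a fixed list of candidates.

```python
# Toy training data: (input, correct output) pairs. Values are invented.
training_data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0)]


def objective(w, data):
    """Mean squared error of the candidate mapping x -> w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)


# Candidate mappings: slopes 0.0, 0.1, ..., 4.0.
candidates = [i / 10 for i in range(41)]

# "Learning" here is simply picking the candidate with the lowest objective.
best_w = min(candidates, key=lambda w: objective(w, training_data))
print(best_w)  # prints 2.0
```

The objective is what turns "find a good mapping" into a concrete number that can be minimized.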

For example, consider a binary classifier (a classification algorithm that outputs one of only two values, such as 0 or 1). It takes in data as an input and tries to predict which category it belongs to by looking at patterns found within the data. For instance, if we have pictures of dogs and cats labeled as "dog" and "cat" respectively, our learning algorithm would be trained on this information so that it can tell them apart. The logic behind this process is: given any picture, predict whether it's of a dog or a cat. The question here is how one gets these labeled pictures in the first place. They are labeled manually by humans so that the machine can learn how to distinguish between them.

Characteristics of Machine Learning:

Most machine learning algorithms are trained on a set of training data against a specific objective function, also called a cost function or loss function. The choice of this function shapes the quality of the model, since it defines which errors are penalized and how heavily. Cost functions used in ML algorithms typically aim at minimizing prediction error or maximizing classification accuracy. Many standard cost functions exist, and they differ in how they weight errors: squared error, for example, penalizes large errors more heavily than absolute error does.

To measure how far predictions fall from the true values, some common metrics include mean absolute error (MAE), root mean squared error (RMSE), and cross-entropy. MAE and RMSE are used for regression: they measure the typical size of the gap between predicted and actual numeric values. Cross-entropy is used for classification: it measures how bad it is for an algorithm to assign a low probability to the category a data point actually belongs to. In other words, all three stay low when the machine succeeds and grow as it fails.
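The three metrics can be sketched in a few lines of plain Python. The function names and the `eps` clipping value are choices made for this illustration, and the cross-entropy shown is the binary (two-class) form.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: average size of the prediction gap."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: large gaps are penalized more heavily."""
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    return math.sqrt(mse)


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Average negative log-probability assigned to the true class.

    labels are 0 or 1; probs are predicted probabilities of class 1.
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(labels)


# Regression: one prediction is off by 3, the others are exact.
print(mae([1, 2, 3], [1, 2, 6]))   # prints 1.0
print(rmse([1, 2, 3], [1, 2, 6]))  # about 1.73, larger than the MAE

# Classification: confident and right scores lower than confident and wrong.
print(binary_cross_entropy([1, 0], [0.9, 0.1]))
print(binary_cross_entropy([1, 0], [0.1, 0.9]))
```

Note how the single large miss pulls RMSE above MAE, which is the difference in weighting between squared and absolute error.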

The cost function that is used will also vary depending on the type of problem that is being solved. The most common types are classification problems and regression problems. In classification problems, the output is discrete, meaning each piece of information is assigned to one of a fixed set of classes (examples include spam detection, face recognition, medical diagnosis, etc.). In regression problems, the output is continuous, meaning the answer can be any value within a range (examples include real-time price prediction, etc.).

Types of Algorithms in Machine Learning

To solve a given problem, practitioners must identify an appropriate learning algorithm to use. A general rule of thumb is that for classification problems we should choose algorithms with high accuracy on data they were not trained on, and for regression problems we should choose algorithms that have low error and are also robust to noise and outliers. Here are some examples:

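Linear Regression: Linear regression assumes a linear relationship between the inputs and the output, and predicts continuous values based on one or more input variables. It fits its coefficients by minimizing the sum of squared errors. This type of algorithm tends to be fast and scalable for big data sets because the least-squares solution can be computed directly rather than by searching through candidate solutions; however, it is sensitive to outliers and can become unstable when input variables are strongly correlated.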

Linear regression uses linearity to find an equation for predicting a continuous value: with one input variable the equation describes a straight line, and with two or more it describes a plane or hyperplane.
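For a single input variable, the least-squares line can be computed directly from the data. The sketch below uses the standard closed-form formulas; the function name `fit_line` and the sample points are invented for this example.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one input variable."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums around the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Points that lie exactly on y = 2x + 1.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(slope, intercept)  # prints 2.0 1.0

# Predict a continuous value for a new input.
print(slope * 10 + intercept)  # prints 21.0
```

No iteration over candidate solutions is needed, which is why this method scales well.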

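Logistic Regression: A binary classifier, logistic regression is used when there are only two possible outcomes (0/1). It passes a linear combination of the inputs through the sigmoid function to estimate the probability of the positive class, and it is trained by minimizing cross-entropy (log loss). A threshold on that probability, commonly 0.5, turns it into a 0/1 prediction; moving the threshold trades false positives against false negatives.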
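As a rough sketch of how such a classifier can be trained, the code below fits a one-variable logistic regression with plain gradient descent on the cross-entropy loss. The data, the learning rate, the number of epochs, and the function names are all assumptions chosen for this illustration; libraries use more sophisticated optimizers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.3, epochs=5000):
    """Fit weight w and bias b by gradient descent on mean cross-entropy."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean cross-entropy with respect to w and b.
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Turn the estimated probability into a 0/1 prediction."""
    return 1 if sigmoid(w * x + b) >= threshold else 0


# Invented data: small inputs belong to class 0, large inputs to class 1.
xs = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(xs, ys)
print([predict(w, b, x) for x in xs])  # matches ys on this toy data
```

Raising `threshold` above 0.5 would produce fewer false positives at the cost of more false negatives, and lowering it does the opposite.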

Neural Networks: In ML, neural networks are used for both classification and regression problems, and they are loosely inspired by how neurons in the human brain are connected. They suit data with many input dimensions and can produce one or several output values. A neural network consists of an input layer, one or more hidden layers of neurons that apply an activation function, and a final layer for producing results. They're used to solve problems that are non-linear or otherwise difficult to tackle using other algorithms.
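A tiny example shows why a hidden layer matters. XOR cannot be computed by a single linear unit, but a network with one hidden layer handles it. In this sketch the weights are hand-picked rather than learned, the activation is a simple step function (1 when its input is non-negative, otherwise 0), and all names are invented for illustration.

```python
def step(z):
    """Step activation: fires (1) when the weighted input is non-negative."""
    return 1 if z >= 0 else 0


def neuron(inputs, weights, bias, activation=step):
    """Weighted sum of the inputs plus a bias, passed through an activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)


def xor_network(x1, x2):
    # Hidden layer: two neurons looking at the same inputs.
    h_or = neuron([x1, x2], [1, 1], -0.5)   # fires if at least one input is 1
    h_and = neuron([x1, x2], [1, 1], -1.5)  # fires only if both inputs are 1
    # Output layer: "OR but not AND" is exactly XOR.
    return neuron([h_or, h_and], [1, -1], -0.5)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_network(a, b))
```

In a trained network the weights and biases would be found by an optimization procedure instead of being written by hand.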

Common activation functions include the sigmoid (also called the logistic function) and the step function; modern networks also frequently use the rectified linear unit (ReLU).

Support Vector Machines (SVM): An SVM is a supervised discriminative learning model that looks for the decision boundary with the largest margin between the classes. It has long been regarded as one of the strongest general-purpose models for classification problems. It's particularly useful when there is a single target label, when the data points are linearly separable (or can be made so with a kernel), and when the classes contain roughly the same number of samples.
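The pieces of a linear SVM can be sketched without any training: a hyperplane given by weights `w` and bias `b`, the decision rule, the hinge loss that SVM training minimizes, and the margin width. The hyperplane and the data points below are hand-picked assumptions for illustration; this is not a training procedure.

```python
import math


def decision_value(w, b, x):
    """Signed score of point x relative to the hyperplane w.x + b = 0."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def classify(w, b, x):
    return 1 if decision_value(w, b, x) >= 0 else -1


def hinge_loss(w, b, points, labels):
    """Mean hinge loss; zero only if every point is outside the margin."""
    losses = [max(0.0, 1 - y * decision_value(w, b, x))
              for x, y in zip(points, labels)]
    return sum(losses) / len(losses)


def margin_width(w):
    """Distance between the two margin boundaries w.x + b = +1 and -1."""
    return 2 / math.sqrt(sum(wi * wi for wi in w))


# Invented, linearly separable data with labels -1 and +1.
points = [(1.0, 1.0), (2.0, 1.0), (4.0, 4.0), (5.0, 4.0)]
labels = [-1, -1, 1, 1]

# Hand-picked hyperplane (an assumption, not the result of training).
w, b = (0.5, 0.5), -2.75

print([classify(w, b, x) for x in points])  # prints [-1, -1, 1, 1]
print(hinge_loss(w, b, points, labels))     # prints 0.0
print(margin_width(w))
```

Training an SVM amounts to searching for the `w` and `b` that keep the hinge loss low while making the margin as wide as possible.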

SVMs can also be expensive to train on large real-world data sets and sensitive to the choice of hyperparameters. SVMs fall into either the linear or the non-linear category depending on whether their decision boundary in the original input space is a hyperplane or not.

Tree-based algorithms: Many classification problems can be represented as a tree structure where every internal node represents a test on a feature, each edge leaving a node corresponds to one outcome of that test, and each leaf holds a predicted class (examples include C4.5, CHAID, etc.). Decision trees are used when data points have multiple input features and a single output value; they are built through recursive splitting, where each node is split on the test that scores best under a criterion such as information gain.
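Choosing a split by information gain can be sketched as follows. The code evaluates every "feature <= threshold" test on a small invented data set and keeps the one that reduces entropy the most. It covers a single split only, not a full tree, and the function names and data are assumptions for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, feature, threshold):
    """Entropy reduction from splitting on rows[i][feature] <= threshold."""
    left = [y for x, y in zip(rows, labels) if x[feature] <= threshold]
    right = [y for x, y in zip(rows, labels) if x[feature] > threshold]
    if not left or not right:
        return 0.0  # a split that separates nothing gains nothing
    n = len(labels)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(labels) - weighted


def best_split(rows, labels):
    """Return (gain, feature index, threshold) of the best single test."""
    best = (0.0, None, None)
    for feature in range(len(rows[0])):
        for threshold in sorted({r[feature] for r in rows}):
            gain = information_gain(rows, labels, feature, threshold)
            if gain > best[0]:
                best = (gain, feature, threshold)
    return best


# Invented data: feature 0 separates the classes, feature 1 is mostly noise.
rows = [(1, 7), (2, 3), (3, 8), (6, 2), (7, 9), (8, 1)]
labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

gain, feature, threshold = best_split(rows, labels)
print(gain, feature, threshold)  # best test is feature 0 <= 3, with gain 1.0
```

A full decision tree repeats this search on each resulting subset until the leaves are pure or a stopping rule applies.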

C4.5 and CHAID are two of the most popular examples of decision tree learning.

Boosting: Boosting is an ensemble learning technique that's often used when the data points have multiple input features and a single output value (examples include AdaBoost, GBDT, etc.). It involves training a sequence of weak classifiers, each on a reweighted version of the training data that puts more weight on the examples earlier classifiers got wrong, and then combining them to deliver better results than any individual classifier would deliver on its own. It grew out of the theoretical question of whether weak learners can be combined into a strong one, and it is also often applied to problems with imbalanced classes (where one class is present in much larger numbers compared to others).
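The following is a compact sketch of AdaBoost using one-dimensional threshold classifiers ("decision stumps") as the weak learners. The data set is a small invented example on which no single stump is perfect, and the helper names are assumptions for illustration. Labels are -1 and +1.

```python
import math


def stump_predict(x, threshold, polarity):
    """polarity +1 predicts +1 when x > threshold; polarity -1 is the reverse."""
    return polarity if x > threshold else -polarity


def best_stump(xs, ys, weights):
    """Stump with the lowest weighted error under the current weights."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            err = sum(w for x, y, w in zip(xs, ys, weights)
                      if stump_predict(x, threshold, polarity) != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def adaboost(xs, ys, rounds=3):
    n = len(xs)
    weights = [1.0 / n] * n  # start with a uniform distribution
    ensemble = []
    for _ in range(rounds):
        err, threshold, polarity = best_stump(xs, ys, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)  # vote of this stump
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, x):
    """Weighted vote of all stumps."""
    score = sum(a * stump_predict(x, t, p) for a, t, p in ensemble)
    return 1 if score >= 0 else -1


xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

one_round = adaboost(xs, ys, rounds=1)
three_rounds = adaboost(xs, ys, rounds=3)
print(sum(predict(one_round, x) != y for x, y in zip(xs, ys)))     # prints 3
print(sum(predict(three_rounds, x) != y for x, y in zip(xs, ys)))  # prints 0
```

A single stump misclassifies three points here, while the weighted vote of three stumps classifies all ten training points correctly.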

Machine learning algorithms are programs or models that learn from data and improve with experience, without a human having to program each rule explicitly. At its core, machine learning is applied mathematics: every algorithm described above comes down to adjusting numbers so that an objective function improves.
