From Basic Gates to Deep Neural Networks: The Definitive Perceptron Tutorial
Table of Contents
1. Introduction
2. The Mathematics Behind the Perceptron Model
3. The Perceptron Model as a Binary Classifier
4. Logic Gates and the Perceptron Model
5. Perceptrons for Multiplication and Transistor-like Functionality
6. Comparing the Perceptron Model to Logistic Regression
7. Creative and Unique Applications of the Perceptron Model
8. The Evolution of the Perceptron Model and Legacy in Deep Learning
9. Conclusion
References
Contact

Towards Mastering AI

TL;DR

The world of perceptrons is fascinating. Perceptron models are the building blocks of modern artificial intelligence. This blog post makes a long story short. Let us learn the story of the perceptron (i.e., from the original neural network to multilayer perceptrons and beyond). We will dig into the simple mathematics that lets the model be deployed as a binary classifier and as a simulated computer transistor, multiplier, and even logic gate. We will look at how the perceptron model paved the way for more advanced classifiers, such as logistic regression, SVMs, and deep learning. Sample code snippets and illustrations are provided throughout to boost our understanding. Moreover, we will examine the perceptron model through practical use cases to learn how and where it should be used.

Whether you’re a self-taught data scientist, an AI practitioner, or a seasoned professional proficient in ML, there is likely something in here for you! Let us dive deep and look wide at the model that was there when AI was in its infancy and is still here today. Let us look at how and in what ways the perceptron works, walk through its history, build models and gates, compare it to other models, and forecast where we’ll go from here.

· 1. Introduction
1.1 A Brief History of the Perceptron Model
1.2. The Importance of the Perceptron Model in Machine Learning
· 2. The Mathematics Behind the Perceptron Model
2.1. Linear Separability
2.2. The Perceptron Learning Algorithm
2.3. The Perceptron Convergence Theorem
· 3. The Perceptron Model as a Binary Classifier
3.1. Linear Classification
3.2. Limitations of the Perceptron Model
3.3. Multi-class Classification with the Perceptron Model
· 4. Logic Gates and the Perceptron Model
4.1. How Perceptrons Can Be Used to Generate Logic Gates
4.2. Example: Implementing a NAND Gate Using a Perceptron
4.3. Extending to Other Logic Gates: AND, OR, XOR
· 5. Perceptrons for Multiplication and Transistor-like Functionality
5.1. Analogies Between Perceptrons and Transistors
5.2. Performing Multiplication with Perceptrons
5.3. The Future of Perceptrons and Hardware Implementation
· 6. Comparing the Perceptron Model to Logistic Regression
6.1. Similarities Between Perceptron and Logistic Regression
6.2. Differences Between Perceptron and Logistic Regression
6.3. Selecting Between Perceptron and Logistic Regression
· 7. Creative and Unique Applications of the Perceptron Model
7.1. Optical Character Recognition (OCR)
7.2. Music Genre Classification
7.3. Intrusion Detection Systems
7.4. Sentiment Analysis
· 8. The Evolution of the Perceptron Model and Legacy in Deep Learning
8.1. The Evolution of Perceptrons to Multi-Layer Perceptrons (MLPs)
8.2. Deep Learning and Perceptron’s Legacy
8.3. The Future of Perceptrons and Deep Learning
· 9. Conclusion
· References
· Contact

1.1 A Brief History of the Perceptron Model

Warren McCulloch and Walter Pitts’ work on artificial neurons in 1943 [1] inspired a psychologist named Frank Rosenblatt to create the perceptron model in 1957 [2]. Rosenblatt’s perceptron was the first neural network (NN) to be described with an algorithm, paving the way for modern machine learning (ML) techniques. Upon its introduction, the perceptron received much attention from scientists and the general public. Some saw this new technology as an essential step toward intelligent machines, a model capable of learning and adapting [3].

Nevertheless, the perceptron’s popularity did not persist. In 1969, Marvin Minsky and Seymour Papert published their book, “Perceptrons,” which highlighted the limitations of the perceptron model, showing that it could not solve problems like the XOR classification [4] (Section 3). This work triggered a major loss of interest in NNs, turning researchers’ attention to other methods. The early years of the perceptron are summarized in the figure below.

Significant milestones in the history of the perceptron (1943–1982). Figure created by the author.

It took over a decade, but the 1980s saw interest in NNs rekindle, thanks in part to the introduction of multilayer NN training via the back-propagation algorithm by Rumelhart, Hinton, and Williams [5] (Section 5).

In 2012, building on these prior developments and on advances in computing power (i.e., GP-GPUs), big data, non-linear activations (i.e., ReLU), and dropout, the largest convolutional neural networks trained to date were produced. ImageNet provided the large labeled dataset needed to fill their capacity.

Significant milestones in the history of the perceptron (1985–1997). Figure created by the author.

Out came the rise of today’s frenzy for deep learning. Hence, the perceptron model plays a pivotal role in its foundation. The figures above and below list the remaining milestones (a continuation of the first timeline).

Significant milestones in the history of the perceptron (2006–2018). Figure created by the author.

1.2. The Importance of the Perceptron Model in Machine Learning

Despite its limitations, the perceptron model remains an important building block in ML. It is a fundamental part of artificial neural networks, which are now used in many different ways, from recognizing images to recognizing speech.

The simplicity of the perceptron model makes it a great starting point for people new to machine learning. It makes linear classification and learning from data easy to grasp. Also, the perceptron algorithm can be easily modified to create more complex models, such as multilayer perceptrons (MLPs) and support vector machines (SVMs), which can be used in more situations and get around many of the problems with the original perceptron model.

In the following sections, we will cover the mathematics behind the perceptron model; how it can be used as a binary classifier and to build logic gates; and how it can perform multiplication tasks like a computer’s transistors. We will also discuss the differences between the perceptron model and logistic regression and show how the perceptron model can be used in new and exciting ways.

2.1. Linear Separability

At its core, the perceptron model is a linear classifier. It aims to find a “hyperplane” (a line in two-dimensional space, a plane in three-dimensional space, or a higher-dimensional analog) separating two data classes. For a dataset to be linearly separable, a hyperplane must correctly separate all data points [6].

Mathematically, a perceptron model can be represented as follows:

y = f(w * x + b).

Here, x is the input vector; w is the weight vector; b is the bias term; and f is the activation function. In the case of a perceptron, the activation function is a step function that maps the output to either 1 or 0, representing the two classes (see the figure below).

Depiction of the unit step function, with the piece-wise conditions for mapping outputs to 0 or 1. Figure created by the author.

A perceptron model can be extended to handle multiple features in the input x, defined as follows:

y = f(w_1 * x_1 + w_2 * x_2 + ... + w_n * x_n + b).

The above equation, together with the step function applied to its output, determines the activation (i.e., turned off via 0 or on via 1), as depicted in the following figure.

Multivariate linear classification. Note that the weighted sum is passed through the activation, the step function mentioned above (source link).
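To make the formula concrete, here is a minimal sketch of the forward pass; the weights, bias, and input below are illustrative assumptions, not values from the post:

import numpy as np

def step(z):
    # Unit step activation: 1 if z > 0, else 0
    return np.where(z > 0, 1, 0)

# Assumed toy parameters and input
w = np.array([0.4, -0.7, 0.2])
b = 0.1
x = np.array([1.0, 0.5, 2.0])

y = step(np.dot(w, x) + b)   # weighted sum passed through the step function
print(y)                     # prints 1 for these particular values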

2.2. The Perceptron Learning Algorithm

The perceptron learning algorithm is a method for updating the weights and bias to reduce classification errors [2]. The algorithm can be summarized as follows:

  1. Initialize the weights and the bias to small random values.
  2. For each input-output pair (x, d), compute the predicted output y = f(w * x + b).
  3. Update the weights and bias based on the error e = d - y:

w = w + η * e * x

b = b + η * e,

where η is the learning rate, a small positive constant that controls the step size of the updates.

4. Repeat steps 2 and 3 for a fixed number of iterations or until the error converges (a minimal from-scratch sketch of these updates follows below).
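Before turning to a library, here is a minimal from-scratch sketch of these update rules; the toy dataset, learning rate, and epoch count are illustrative assumptions:

import numpy as np

# Toy, linearly separable dataset (assumed for illustration)
X = np.array([[2, 3], [1, 4], [4, 1], [3, 2]])
d = np.array([1, 1, 0, 0])

rng = np.random.default_rng(0)
w = rng.normal(scale=0.01, size=X.shape[1])   # small random initial weights
b = 0.0
eta = 0.1                                     # learning rate

for epoch in range(20):                       # fixed number of iterations
    errors = 0
    for x_i, d_i in zip(X, d):
        y_i = int(np.dot(w, x_i) + b > 0)     # step activation
        e = d_i - y_i                         # error term
        w += eta * e * x_i                    # w = w + η * e * x
        b += eta * e                          # b = b + η * e
        errors += int(e != 0)
    if errors == 0:                           # all points classified correctly
        break

print(w, b)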

We can use Python and scikit-learn to implement the steps above quickly:

import numpy as np
from sklearn.linear_model import Perceptron

# Toy, linearly separable dataset
X = np.array([[2, 3], [1, 4], [4, 1], [3, 2]])
y = np.array([1, 1, 0, 0])

perceptron = Perceptron()
perceptron.fit(X, y)

Then, using the fitted model, we can predict as follows:

new_data_point = np.array([[1, 2]])
prediction = perceptron.predict(new_data_point)
print(prediction)

The perceptron learning algorithm guarantees convergence if the data is linearly separable [7].

Boolean classification, where the classes are linearly separable. Image created by the author.

2.3. The Perceptron Convergence Theorem

Rosenblatt proved the perceptron convergence theorem in 1960. It says that if a dataset can be separated linearly, the perceptron learning algorithm will find a solution in a finite number of steps [8]. In other words, given enough time, the perceptron model will find weights and biases that correctly classify all data points in a linearly separable dataset.

But if the dataset is not linearly separable, the perceptron learning algorithm will not find an acceptable solution or converge. For this reason, researchers have developed more complex algorithms, like multilayer perceptrons and support vector machines, that can deal with data that is not linearly separable [9].

3.1. Linear Classification

As previously mentioned, the perceptron model is a linear classifier. It learns a decision boundary, a line (or hyperplane) in feature space separating the two classes [6]. When a new data point arrives, the perceptron model classifies it based on which side of the decision boundary it falls. Because it is simple, the perceptron is fast and easy to use, but it can only solve problems whose data are linearly separable.

3.2. Limitations of the Perceptron Model

One big problem with the perceptron model is that it cannot deal with data that is not linearly separable. The XOR problem is an example of a dataset that is impossible to divide with a single hyperplane, which prevents the perceptron from finding a solution [4]. Researchers have developed more advanced methods to get around this problem, such as multilayer perceptrons, which have more than one layer of neurons and can learn decision boundaries that are not straight lines [5].

The perceptron model is also sensitive to the learning rate and initial weights. For instance, if the learning rate is too low, convergence can be slow, whereas a large learning rate may cause oscillations or divergence. Similarly, the choice of initial weights can affect how quickly the solution converges and what it looks like [10].

3.3. Multi-class Classification with the Perceptron Model

Even though the basic perceptron model is designed for two-class problems, it can solve problems with more than two classes by training multiple perceptron classifiers, one for each category [11]. The most common approach is one-vs-all (OvA), in which a separate perceptron is trained to distinguish each class from all the others. Then, when classifying a new data point, the perceptron with the highest output is chosen as the predicted class.

Another approach is the one-versus-one (OvO) method, in which a perceptron is trained for each pair of classes. The final classification decision is made using a voting scheme, where each perceptron casts a vote for its predicted class, and the class with the most votes is chosen. While OvO requires training more classifiers than OvA, each perceptron only needs to handle a smaller subset of the data, which can benefit large datasets or problems with high computational complexity.
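As a hedged sketch of both strategies using scikit-learn's multiclass wrappers (the three-class toy dataset below is an illustrative assumption):

import numpy as np
from sklearn.linear_model import Perceptron
from sklearn.multiclass import OneVsRestClassifier, OneVsOneClassifier

# Assumed toy three-class dataset
X = np.array([[1, 1], [2, 1], [5, 5], [6, 5], [1, 6], [2, 7]])
y = np.array([0, 0, 1, 1, 2, 2])

ova = OneVsRestClassifier(Perceptron()).fit(X, y)   # one perceptron per class
ovo = OneVsOneClassifier(Perceptron()).fit(X, y)    # one perceptron per pair of classes

print(ova.predict([[5, 6]]))   # class with the highest decision score wins
print(ovo.predict([[5, 6]]))   # class with the most pairwise votes wins

Note that scikit-learn's Perceptron also handles multi-class targets directly using a one-vs-all scheme, so the explicit wrappers above are mainly for illustration.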

4.1. How Perceptrons Can Be Used to Generate Logic Gates

Perceptron models can be used to represent logic gates, which are the most basic building blocks of digital circuits. By appropriately adjusting the weights and biases of a perceptron, it can be trained to perform logical operations such as AND, OR, and NOT [12]. This link between perceptrons and logic gates shows that neural networks can do computation and have the potential to simulate complex systems.

Linearly separable logic gates: AND and OR (left and middle, respectively). On the other hand, XOR cannot be separated by a single linear classifier (right) but can be with a two-layer network (more on this later). Figure created by the author.

4.2. Example: Implementing a NAND Gate Using a Perceptron

A NAND gate is a fundamental logic gate that produces an output of 0 only when both inputs are 1, resulting in 1 in all other cases. The truth table for a NAND gate is as follows:

NAND Gate Truth Table. Table created by the author.

To implement a NAND gate using a perceptron, we can either manually set the weights and bias or train the perceptron using the perceptron learning algorithm. Here is a possible configuration of weights and bias:

w1 = -1;

w2 = -1;

b = 1.5.

With these parameters, the perceptron can be represented as:

y = f((-1 * A) + (-1 * B) + 1.5).

Training data, graphical depiction, and linear function for an AND gate. Figure created by the author.

Here, f is the step function, and A and B are the inputs. If you test this setup with values from the truth table, you will get the correct results for a NAND gate:

Truth table for the NAND logic gate, together with the output of the perceptron configured above. Created by the author.

In Python, the NAND gate can be implemented as follows:

def nand_gate(x1, x2):
    w1, w2, b = -1, -1, 1.5
    return int(w1 * x1 + w2 * x2 + b > 0)

binary_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for A, B in binary_inputs:
    print(f"({A}, {B}) --> {nand_gate(A, B)}")

As expected, this reproduces the NAND gate table summarized above:

(0, 0) --> 1
(0, 1) --> 1
(1, 0) --> 1
(1, 1) --> 0

A NAND gate can be used to build all other gates because it is functionally complete, meaning that any other logic function can be derived using just NAND gates. Here is a brief explanation of how to create some of the basic gates using NAND gates (a code sketch follows the list):

  1. NOT gate: Connect both inputs of the NAND gate to the same input value.
  2. AND gate: First, create a NAND gate and then pass its output through a NOT gate.
  3. OR gate: Apply a NOT gate to each input before feeding them into a NAND gate.
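Assuming the two-input nand_gate perceptron defined earlier, a minimal sketch of these three constructions looks like this (the function names are my own, introduced for illustration):

def not_gate(x):
    # NOT: tie both NAND inputs to the same value
    return nand_gate(x, x)

def and_from_nand(a, b):
    # AND: NAND followed by NOT
    return not_gate(nand_gate(a, b))

def or_from_nand(a, b):
    # OR: invert each input, then NAND the results
    return nand_gate(not_gate(a), not_gate(b))

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, and_from_nand(a, b), or_from_nand(a, b))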

To create a NAND gate that accepts an arbitrary number of inputs, you can use Python to define a function that takes a list of inputs and returns the NAND output. Here is a code snippet demonstrating this:

def nand_gate(inputs):
    assert len(inputs) > 1, "At least two inputs are required."

    # Helper function to create a 2-input AND gate
    def and_gate(x1, x2):
        w1, w2, b = 1, 1, -1.5
        return int(w1 * x1 + w2 * x2 + b > 0)

    # Reduce the inputs to a single AND output using the helper function
    result = and_gate(inputs[0], inputs[1])
    for i in range(2, len(inputs)):
        result = and_gate(result, inputs[i])

    # Negate the AND result to obtain the NAND output
    return 0 if result > 0 else 1

# Example usage: all 16 combinations of four binary inputs
inputs = [(0, 0, 0, 0),
          (0, 0, 0, 1),
          (0, 0, 1, 0),
          (0, 0, 1, 1),
          (0, 1, 0, 0),
          (0, 1, 0, 1),
          (0, 1, 1, 0),
          (0, 1, 1, 1),
          (1, 0, 0, 0),
          (1, 0, 0, 1),
          (1, 0, 1, 0),
          (1, 0, 1, 1),
          (1, 1, 0, 0),
          (1, 1, 0, 1),
          (1, 1, 1, 0),
          (1, 1, 1, 1)]

for A0, A1, A2, A3 in inputs:
    output = nand_gate((A0, A1, A2, A3))
    print(f"({A0}, {A1}, {A2}, {A3}) --> {output}")

This function uses a helper function (i.e., and_gate) to build a NAND gate with two or more inputs. The AND operation is applied repeatedly across the given inputs, and the final result, the negated value of that AND chain, is the output of the NAND gate for an arbitrary number of input bits.

4.3. Extending to Other Logic Gates: AND, OR, XOR

Similarly, perceptrons can model other logic gates, such as AND, OR, and NOT. For instance, an AND gate can be represented by a perceptron with weights w1 = 1, w2 = 1, and b = -1.5.

def and_gate(x1, x2):
    w1, w2, b = 1, 1, -1.5
    return int(w1 * x1 + w2 * x2 + b > 0)

binary_inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
for A, B in binary_inputs:
    print(f"({A}, {B}) --> {and_gate(A, B)}")

Again, the outputs mimic those of the intended AND gate:

(0, 0) --> 0
(0, 1) --> 0
(1, 0) --> 0
(1, 1) --> 1

However, a single perceptron cannot model the XOR gate, which is not linearly separable. Instead, a multi-layer perceptron or a combination of perceptrons must be used to solve the XOR problem [5].
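As a hedged sketch of that workaround, XOR can be composed from two layers of perceptron gates, for example as AND(OR(A, B), NAND(A, B)); the gate weights below restate the earlier two-input gates, and the OR weights (w1 = w2 = 1, b = -0.5) are an assumed but standard choice:

def and_gate(x1, x2):
    w1, w2, b = 1, 1, -1.5
    return int(w1 * x1 + w2 * x2 + b > 0)

def or_gate(x1, x2):
    w1, w2, b = 1, 1, -0.5
    return int(w1 * x1 + w2 * x2 + b > 0)

def nand_gate2(x1, x2):
    w1, w2, b = -1, -1, 1.5
    return int(w1 * x1 + w2 * x2 + b > 0)

def xor_gate(x1, x2):
    # Layer 1: OR and NAND perceptrons; Layer 2: an AND perceptron combines them
    return and_gate(or_gate(x1, x2), nand_gate2(x1, x2))

for A, B in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(f"({A}, {B}) --> {xor_gate(A, B)}")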

5.1. Analogies Between Perceptrons and Transistors

Transistors are the basic building blocks of electronic devices, responsible for simple operations like adding and multiplying. Interestingly, perceptrons can also be viewed as computational units that exhibit similar functionality. Perceptrons are mathematical constructs used in machine learning as artificial neurons, whereas transistors are physical components that control how electrical signals flow [13]. Still, as the last section showed, both systems can model and perform logical operations.

5.2. Performing Multiplication with Perceptrons

We can leverage perceptrons’ ability to implement binary logic to perform multiplication. For instance, consider the product of two binary digits (i.e., A and B), which can be represented as a simple AND gate. As demonstrated in Section 4, an AND gate can be modeled using a perceptron.

But for more complicated multiplication tasks involving binary numbers with more than two bits, we need to add more components, like half and full adders, which require a combination of logic gates [14]. Using perceptrons to build these components makes it possible to create an artificial neural network that can perform binary multiplication.

For instance, suppose we want to multiply two 2-bit binary numbers, A1A0 and B1B0. Then, we can break the multiplication down into a series of AND operations and additions:

  1. Compute the partial products: P00 = A0 * B0, P01 = A0 * B1, P10 = A1 * B0, and P11 = A1 * B1.
  2. Add the partial products using half and full adders, leading to a 4-bit binary product.

Each AND operation and addition can be done with perceptrons or groups of perceptrons that represent the required logic gates.

Using the AND gate function we set up in the last section, we can do the following in Python to implement perceptron-based multiplication:

# Two 2-bit numbers: A = A1A0 = 10 (2), B = B1B0 = 11 (3)
A1A0 = [1, 0]
B1B0 = [1, 1]

# Partial products via perceptron AND gates
P00 = and_gate(A1A0[1], B1B0[1])   # A0 * B0
P01 = and_gate(A1A0[1], B1B0[0])   # A0 * B1
P10 = and_gate(A1A0[0], B1B0[1])   # A1 * B0
P11 = and_gate(A1A0[0], B1B0[0])   # A1 * B1

# Implement a simple adder; ^ and & stand in for the XOR/AND half-adder logic
carry = P01 & P10
result = [P00, P01 ^ P10, P11 ^ carry, P11 & carry]   # bits from least to most significant
print(result)   # [0, 1, 1, 0] -> 0110 = 6 = 2 * 3

5.3. The Future of Perceptrons and Hardware Implementation

Even though perceptrons can act like transistors and perform basic logical operations, their hardware implementation is less efficient than traditional transistors. But recent advances in neuromorphic computing have shown that it may be possible to build hardware that behaves like neural networks, including perceptrons [15]. These neuromorphic chips could help machine learning tasks use less energy and open the door to new ways of thinking about computing.

6.1. Similarities Between Perceptron and Logistic Regression

Both the perceptron model and logistic regression are linear classifiers that can be used to solve binary classification problems. They both rely on finding a decision boundary (a hyperplane) that separates the classes in the feature space [6]. Furthermore, both can be extended to handle multi-class classification problems through techniques like one-vs-all and one-vs-one [11].

Let’s take a look at the differences in a Python implementation:

from sklearn.linear_model import LogisticRegression

# X and y are the same toy dataset used for the perceptron above
log_reg = LogisticRegression()
log_reg.fit(X, y)

new_data_point = np.array([[1, 2]])
prob_prediction = log_reg.predict_proba(new_data_point)
print(prob_prediction)

import numpy as np
from sklearn.linear_model import Perceptron, LogisticRegression

# Dataset
X = np.array([[2, 3], [1, 4], [4, 1], [3, 2]])
y = np.array([1, 1, 0, 0])

# Train Perceptron
perceptron = Perceptron()
perceptron.fit(X, y)

# Train Logistic Regression
log_reg = LogisticRegression()
log_reg.fit(X, y)

# Recent data point
new_data_point = np.array([[1, 2]])

# Perceptron prediction
perc_prediction = perceptron.predict(new_data_point)
print("Perceptron prediction:", perc_prediction)

# Logistic Regression prediction
log_reg_prediction = log_reg.predict(new_data_point)
print("Logistic Regression prediction:", log_reg_prediction)

# Logistic Regression probability prediction
prob_prediction = log_reg.predict_proba(new_data_point)
print("Logistic Regression probability prediction:", prob_prediction)

This outputs:

Perceptron prediction: [1]
Logistic Regression prediction: [1]
Logistic Regression probability prediction: [[0.33610873 0.66389127]]

6.2. Differences Between Perceptron and Logistic Regression

Even though the perceptron model and logistic regression have some similarities, there are some essential differences between the two:

  1. Activation function: The perceptron model uses a step function as its activation function, while logistic regression uses the logistic (sigmoid) function [10]. This difference results in the perceptron having a binary output (0 or 1), whereas logistic regression produces a probability value (between 0 and 1) representing the likelihood of an instance belonging to a particular class (see the short comparison sketch after this list).
  2. Loss function: The perceptron learning algorithm minimizes misclassification errors, whereas logistic regression minimizes the negative log-likelihood or cross-entropy loss [16]. This distinction makes logistic regression more robust to noise and outliers in the dataset, because it considers the magnitude of the errors rather than simply the number of misclassified instances.
  3. Convergence: The perceptron learning algorithm converges if the data is linearly separable but may fail to converge otherwise [7]. Logistic regression, on the other hand, employs gradient-based optimization techniques like gradient descent or Newton-Raphson, which are guaranteed to reach a global optimum for convex loss functions like the negative log-likelihood [17].
  4. Non-linearly separable data: While the perceptron model struggles with non-linearly separable data, logistic regression can be extended to handle non-linear decision boundaries by incorporating higher-order polynomial features or using kernel methods [18].
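To illustrate the first point, here is a minimal sketch contrasting the two activations on the same scores z = w · x + b (the score values are illustrative assumptions):

import numpy as np

def step(z):
    return np.where(z > 0, 1, 0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])   # assumed example scores
print(step(z))      # hard 0/1 labels: [0 0 0 1 1]
print(sigmoid(z))   # graded probabilities: approx. [0.12 0.38 0.50 0.62 0.88]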

6.3. Selecting Between Perceptron and Logistic Regression

The choice between the perceptron model and logistic regression depends on the problem and dataset. Logistic regression is more reliable and can deal with a broader range of problems because it produces probabilities and, when extended, can model non-linear decision boundaries. But the perceptron model may be simpler to use and require less computation in some situations, especially when dealing with linearly separable data.

7.1. Optical Character Recognition (OCR)

The perceptron model has been used in optical character recognition (OCR) tasks, where the goal is to recognize printed or handwritten text and convert it into machine-encoded text [19]. OCR pipelines typically preprocess the image to be read, extract features from it, and classify them with a perceptron or another machine learning algorithm. The perceptron model is a good choice for OCR tasks in which the character classes are linearly separable because it is easy to use and computationally efficient.

7.2. Music Genre Classification

Perceptrons can also be used for music genre classification, which involves identifying the genre of a given audio track. A perceptron model can be trained to classify audio into predefined genres [20] by extracting relevant characteristics of the audio signals, such as spectral or temporal features, and combining them. Even though more advanced methods like deep learning and convolutional neural networks often give better results, the perceptron model can work well, especially when only a few genres are involved or the features are linearly separable.

7.3. Intrusion Detection Systems

Intrusion detection systems (IDS) are used in cybersecurity to look for malicious behavior or unauthorized access to computer networks. An IDS can use perceptrons as classifiers over features such as packet size, protocol type, and connection length to determine whether network activity is normal or malicious [21]. Support vector machines and deep learning may detect intrusions better, but the perceptron model can be used for simple IDS tasks or as a baseline.

7.4. Sentiment Analysis

Perceptrons can be applied to sentiment analysis, a natural language processing task that determines the sentiment (e.g., positive, negative, or neutral) expressed in text. By turning text into numerical feature vectors such as term frequency-inverse document frequency (TF-IDF) representations [22], a perceptron model can be trained to classify text based on its tone. More advanced techniques like recurrent neural networks or transformers have since surpassed perceptrons in sentiment analysis performance. Nevertheless, perceptrons can still serve as an introduction to text classification or as a simpler alternative for specific use cases.
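As a hedged sketch of such a pipeline using scikit-learn (the tiny labeled corpus below is an illustrative assumption, not a real dataset):

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.pipeline import make_pipeline

# Assumed toy corpus: 1 = positive, 0 = negative
texts = ["I love this movie", "great acting and plot",
         "terrible film", "I hated every minute"]
labels = [1, 1, 0, 0]

# TF-IDF features feeding a perceptron classifier
clf = make_pipeline(TfidfVectorizer(), Perceptron())
clf.fit(texts, labels)

print(clf.predict(["what a great movie"]))   # expected to lean positive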

8.1. The Evolution of Perceptrons to Multi-Layer Perceptrons (MLPs)

The perceptron model can solve problems with linear decision boundaries, but it struggles with tasks that require non-linear decision boundaries. The introduction of multi-layer perceptrons (MLPs), consisting of multiple layers of perceptron-like units, marked a significant advancement in artificial neural networks [5]. MLPs can approximate any continuous function, given a sufficient number of hidden layers and neurons [23]. By employing the backpropagation algorithm, MLPs can be trained to solve more complex tasks, such as the XOR problem, which is not solvable by a single perceptron.
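As a hedged sketch, scikit-learn's MLPClassifier can typically fit XOR with a small hidden layer; the hyperparameters below are illustrative assumptions and may need a different solver or random seed to converge:

import numpy as np
from sklearn.neural_network import MLPClassifier

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

mlp = MLPClassifier(hidden_layer_sizes=(4,), activation="tanh",
                    solver="lbfgs", max_iter=2000, random_state=0)
mlp.fit(X, y)

print(mlp.predict(X))   # ideally [0 1 1 0], which no single perceptron can produce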

8.2. Deep Learning and Perceptron’s Legacy

The perceptron model laid the foundation for deep learning, a subfield of machine learning focused on neural networks with multiple layers (deep neural networks). The perceptron model was the basis for deep learning techniques like convolutional neural networks (CNNs) and recurrent neural networks (RNNs), which have reached state-of-the-art performance in tasks like image classification, natural language processing, and speech recognition [24].

In CNNs, the concept of weighted input signals and activation functions from perceptrons carries over to the convolutional layers, which apply filters to local input regions to learn spatial hierarchies in the data. In the same way, RNNs build on the perceptron model by adding recurrent connections, which let the network learn temporal dependencies in sequential data [25].

Deep learning versus other models: Google trend over time. Image created by the author following Carrie Fowle’s TDS Medium blog (link).

8.3. The Future of Perceptrons and Deep Learning

While fundamental, the perceptron model has largely been eclipsed by more sophisticated deep learning techniques. But it is still valuable for machine learning because it is a simple yet effective way to teach the fundamentals of neural networks and to inspire more complicated models. As deep learning keeps improving, the perceptron model’s core ideas and principles will likely remain and continue to influence the design of new architectures and algorithms.

This blog comprehensively explores the perceptron model, its mathematics, and its applications in binary classification and logic gate generation. By understanding these fundamentals, we have unlocked the potential to harness the perceptron’s power in various neat applications and even build more advanced models like multi-layer perceptrons (MLPs) and convolutional neural networks (CNNs).

We also compared perceptrons and logistic regression, highlighting the differences and similarities and examining the role of the perceptron as a foundation for more advanced techniques in ML. We extended this by placing the perceptron’s role in artificial intelligence in context, along with its historical significance and ongoing influence.

Let us remember that the perceptron is only one piece of the puzzle. There are countless other models and techniques, either already discovered or waiting to be, each with unique strengths and applications. Nevertheless, with the solid foundation provided by this tutorial, you are well-equipped to tackle the challenges and opportunities on your journey through artificial intelligence.

I hope this blog is engaging, informative, and inspiring, and I encourage you to continue learning and experimenting with the perceptron model and beyond. Embrace your newfound knowledge, and let your creativity and curiosity guide you toward the exciting world of AI and machine learning. Please share your thoughts and comments below!

[1] McCulloch, W. S., & Pitts, W. (1943). A logical calculus of the ideas immanent in nervous activity. Bulletin of Mathematical Biophysics, 5, 115–133.

[2] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.

[3] The New York Times (1958, July 8). New Navy Device Learns by Doing. The New York Times.

[4] Minsky, M., & Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press.

[5] Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning representations by back-propagating errors. Nature, 323(6088), 533–536.

[6] Duda, R. O., Hart, P. E., & Stork, D. G. (2001). Pattern Classification (2nd ed.). Wiley.

[7] Novikoff, A. B. (1962). On convergence proofs on perceptrons. Symposium on the Mathematical Theory of Automata, 12, 615–622.

[8] Rosenblatt, F. (1960). The perceptron: A theory of statistical separability in cognitive systems (Project PARA Report 60–3777). Cornell Aeronautical Laboratory.

[9] Cortes, C., & Vapnik, V. (1995). Support-vector networks. Machine Learning, 20(3), 273–297.

[10] Bishop, C. M. (2006). Pattern Recognition and Machine Learning. Springer.

[11] Rifkin, R., & Klautau, A. (2004). In defense of one-vs-all classification. Journal of Machine Learning Research, 5, 101–141.

[12] Minsky, M. L. (1961). Steps toward artificial intelligence. Proceedings of the IRE, 49(1), 8–30.

[13] Horowitz, P., & Hill, W. (1989). The Art of Electronics (2nd ed.). Cambridge University Press.

[14] Mano, M. M., & Ciletti, M. D. (2007). Digital Design (4th ed.). Prentice Hall.

[15] Merolla, P. A., Arthur, J. V., Alvarez-Icaza, R., Cassidy, A. S., Sawada, J., Akopyan, F., … & Modha, D. S. (2014). A million spiking-neuron integrated circuit with a scalable communication network and interface. Science, 345(6197), 668–673.

[16] Hastie, T., Tibshirani, R., & Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction (2nd ed.). Springer.

[17] Nocedal, J., & Wright, S. (2006). Numerical Optimization (2nd ed.). Springer.

[18] Schölkopf, B., & Smola, A. J. (2002). Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. MIT Press.

[19] LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551.

[20] Tzanetakis, G., & Cook, P. (2002). Musical genre classification of audio signals. IEEE Transactions on Speech and Audio Processing, 10(5), 293–302.

[21] Garcia-Teodoro, P., Diaz-Verdejo, J., Maciá-Fernández, G., & Vázquez, E. (2009). Anomaly-based network intrusion detection: Techniques, systems, and challenges. Computers & Security, 28(1–2), 18–28.

[22] Pang, B., Lee, L., & Vaithyanathan, S. (2002). Thumbs up? Sentiment classification using machine learning techniques. Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing, 10, 79–86.

[23] Hornik, K., Stinchcombe, M., & White, H. (1989). Multilayer feedforward networks are universal approximators. Neural Networks, 2(5), 359–366.

[24] LeCun, Y., Bengio, Y., & Hinton, G. (2015). Deep learning. Nature, 521(7553), 436–444.

[25] Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735–1780.

Want to Connect? Follow Dr. Robinson on LinkedIn, Twitter, Facebook, and Instagram. Visit my homepage for papers, blogs, email signups, and more!
