Keystroke dynamics, the behavioral biometric used in this article's machine learning models for user recognition, leverages the distinctive way each person types to verify their identity. It works by analyzing the two events that make up a keystroke on a computer keyboard, the Key-Press and the Key-Release, to extract typing patterns. This article examines how these patterns can be applied to create three machine learning models for user recognition.

The goal of this article can be split into two parts: three machine learning models (1. SVM, 2. Random Forest, 3. XGBoost) and a live single-endpoint API able to predict the user based on 5 input parameters: the ML model and 4 keystroke times.
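As a preview of the second part, here is a minimal sketch of what such a prediction request could carry. The field names and the choice of the four keystroke times are illustrative assumptions, not the API's actual schema:

```python
# Hypothetical payload for the prediction endpoint. The field names and the
# use of the four timing features defined later in this article (HT, PPT,
# RRT, RPT) are assumptions for illustration only.
payload = {
    "model": "SVM",   # which of the three trained models to use
    "ht": 0.12,       # Hold Time, in seconds
    "ppt": 0.25,      # Press-Press Time
    "rrt": 0.24,      # Release-Release Time
    "rpt": 0.13,      # Release-Press Time
}

# the five input parameters: the ML model plus 4 keystroke times
print(len(payload))
```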

Source: https://www.rootstrap.com/blog/a-primer-into-keystroke-recognition-technology

## The problem

The goal of this part is to build ML models for user recognition based on keystroke data. Keystroke dynamics is a behavioral biometric that uses the unique way an individual types to confirm that person's identity.

Typing patterns are predominantly extracted from computer keyboards. The patterns used in keystroke dynamics are derived mainly from the two events that make up a keystroke: the Key-Press and the Key-Release.

The Key-Press event occurs when a key is first pushed down, and the Key-Release occurs when that key is subsequently released.

In this step, a dataset of users' keystroke data is given, with the following details:

- keystroke.csv: this dataset contains the keystroke data collected from 110 users.
- Each user was asked to type a fixed 13-character string 8 times, and the keystroke data (key-press time and key-release time for every key) were collected.
- The dataset comprises 880 rows and 27 columns.
- The first column indicates the UserID, and the rest show the press and release times for the first through the thirteenth character.
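To make that layout concrete, here is a small synthetic stand-in for keystroke.csv. The column names (`user`, `press-0` … `press-12`, `release-0` … `release-12`) are an assumption taken from the parsing code shown later, not a guarantee about the real file:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Synthetic stand-in: 1 user, 2 repetitions of a 13-character string.
# 1 user column + 13 press columns + 13 release columns = 27 columns,
# matching the shape described for keystroke.csv.
cols = (['user']
        + [f'press-{j}' for j in range(13)]
        + [f'release-{j}' for j in range(13)])

rows = []
for rep in range(2):
    presses = np.sort(rng.uniform(0, 5, size=13))        # key-press times
    releases = presses + rng.uniform(0.05, 0.2, size=13)  # each release after its press
    rows.append(['user_1', *presses, *releases])

df = pd.DataFrame(rows, columns=cols)
print(df.shape)  # (2, 27)
```

The real dataset would have 880 such rows (110 users, 8 repetitions each).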

## You need to do the following steps:

1. Often, the raw data is not informative enough, and it is necessary to extract informative features from it.

In this regard, 4 features:

- Hold Time “HT”,
- Press-Press time “PPT”,
- Release-Release Time “RRT”,
- Release-Press time “RPT”

are introduced, and the definition of each of them is given below.
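For a pair of consecutive keys, these four features reduce to simple differences between the press and release timestamps, mirroring the formulas used in `process_csv()` below. A small sketch with made-up timestamps:

```python
# Timestamps (in seconds) for two consecutive keystrokes; values are made up.
press_1, release_1 = 0.00, 0.11   # first key
press_2, release_2 = 0.25, 0.38   # second key

ht = release_1 - press_1    # Hold Time: how long the first key is held down
ppt = press_2 - press_1     # Press-Press Time: between consecutive presses
rrt = release_2 - release_1 # Release-Release Time: between consecutive releases
rpt = press_2 - release_1   # Release-Press Time: gap from release to next press

print(ht, ppt, rrt, rpt)
```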

2. For every row in keystroke.csv, you must calculate these four features for every two consecutive keys.

3. After completing the previous step, you must calculate the mean and standard deviation of each feature per row. Consequently, you must have 8 features (4 means and 4 standard deviations) per row. → `process_csv()`

```python
def calculate_mean_and_standard_deviation(feature_list):
    from math import sqrt

    # calculate the mean
    mean = sum(feature_list) / len(feature_list)

    # calculate the squared differences from the mean
    squared_diffs = [(x - mean) ** 2 for x in feature_list]

    # calculate the sum of the squared differences
    sum_squared_diffs = sum(squared_diffs)

    # calculate the (sample) variance
    variance = sum_squared_diffs / (len(feature_list) - 1)

    # calculate the standard deviation
    std_dev = sqrt(variance)

    return mean, std_dev
```
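The helper uses the sample (n - 1) variance, so as a quick sanity check its output can be compared against Python's built-in `statistics` module, which uses the same definition:

```python
from statistics import mean, stdev

# example hold-time values (made up, in seconds)
feature_list = [0.11, 0.25, 0.27, 0.14, 0.20]

# statistics.mean / statistics.stdev use the same sample (n - 1) definition
# as the helper above, so both should agree on these inputs
m = mean(feature_list)
s = stdev(feature_list)
print(round(m, 4), round(s, 4))
```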

```python
import pandas as pd


def process_csv(df_input_csv_data):
    data = {
        'user': [],
        'ht_mean': [], 'ht_std_dev': [],
        'ppt_mean': [], 'ppt_std_dev': [],
        'rrt_mean': [], 'rrt_std_dev': [],
        'rpt_mean': [], 'rpt_std_dev': [],
    }

    # iterate over each row in the dataframe
    for i, row in df_input_csv_data.iterrows():
        # lists of hold, press-press, release-release and release-press times
        ht_list = []
        ppt_list = []
        rrt_list = []
        rpt_list = []

        # the IF selects only the expected rows of the csv
        if i < 885:
            # iterate over each pair of consecutive presses and releases
            for j in range(12):
                # calculate the hold time: release[j] - press[j]
                ht = row[f"release-{j}"] - row[f"press-{j}"]
                ht_list.append(ht)

                # calculate the press-press time: press[j+1] - press[j]
                if j < 11:
                    ppt = row[f"press-{j + 1}"] - row[f"press-{j}"]
                    ppt_list.append(ppt)

                # calculate the release-release time: release[j+1] - release[j]
                if j < 11:
                    rrt = row[f"release-{j + 1}"] - row[f"release-{j}"]
                    rrt_list.append(rrt)

                # calculate the release-press time: press[j+1] - release[j]
                if j < 11:  # was j < 10, which dropped the last key pair
                    rpt = row[f"press-{j + 1}"] - row[f"release-{j}"]
                    rpt_list.append(rpt)

            # ht_list, ppt_list, rrt_list, rpt_list each hold the calculated
            # values for one feature -> feature_list
            ht_mean, ht_std_dev = calculate_mean_and_standard_deviation(ht_list)
            ppt_mean, ppt_std_dev = calculate_mean_and_standard_deviation(ppt_list)
            rrt_mean, rrt_std_dev = calculate_mean_and_standard_deviation(rrt_list)
            rpt_mean, rpt_std_dev = calculate_mean_and_standard_deviation(rpt_list)

            data['user'].append(row['user'])
            data['ht_mean'].append(ht_mean)
            data['ht_std_dev'].append(ht_std_dev)
            data['ppt_mean'].append(ppt_mean)
            data['ppt_std_dev'].append(ppt_std_dev)
            data['rrt_mean'].append(rrt_mean)
            data['rrt_std_dev'].append(rrt_std_dev)
            data['rpt_mean'].append(rpt_mean)
            data['rpt_std_dev'].append(rpt_std_dev)
        else:
            break

    data_df = pd.DataFrame(data)
    return data_df
```

## All of the code can be found on my GitHub, in the KeystrokeDynamics repository:

Now that we have parsed the data, we can start building the models and training them.

## Support Vector Machine

```python
def train_svm(training_data, features):
    """
    SVM stands for Support Vector Machine, a type of machine learning algorithm
    used for classification and regression analysis.
    The SVM algorithm aims to find a hyperplane in an n-dimensional space that
    separates the data into two classes. The hyperplane is chosen in such a way
    that it maximizes the margin between the two classes, making the
    classification more robust and accurate.
    In addition, SVM can also handle non-linearly separable data by mapping the
    original features to a higher-dimensional space, where a linear hyperplane
    can be used for classification.

    :param training_data:
    :param features:
    :return: ML Trained model
    """
    import joblib
    from sklearn.svm import SVC

    # Split the data into features and labels
    X = training_data[features]
    y = training_data['user']

    # Train an SVM model on the data
    svm_model = SVC()
    svm_model.fit(X, y)

    # Save the trained model to disk
    svm_model_name = 'models/svm_model.joblib'
    joblib.dump(svm_model, svm_model_name)
```
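Once saved, the model can be loaded back with `joblib.load()` and used for prediction. A self-contained sketch on synthetic data (the real pipeline would use the 8 features produced by `process_csv()`; the two fake users and their timing distributions are made up):

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(42)

# Two fake "users" with clearly different typing-time distributions;
# the 8 columns stand in for the 4 means and 4 standard deviations.
X = np.vstack([rng.normal(0.2, 0.02, size=(20, 8)),
               rng.normal(0.5, 0.02, size=(20, 8))])
y = np.array(['user_1'] * 20 + ['user_2'] * 20)

svm_model = SVC()
svm_model.fit(X, y)

# Round-trip through joblib, as train_svm() does with 'models/svm_model.joblib'
path = os.path.join(tempfile.mkdtemp(), 'svm_model.joblib')
joblib.dump(svm_model, path)
loaded = joblib.load(path)

print(loaded.predict(X[:1])[0])  # a sample drawn from user_1
```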


```python
def train_random_forest(training_data, features):
    """
    Random Forest is a type of machine learning algorithm that belongs to the
    family of ensemble learning methods. It is used for classification,
    regression, and other tasks that involve predicting an output value based
    on a set of input features.
    The algorithm works by creating multiple decision trees, where each tree is
    built using a random subset of the input features and a random subset of
    the training data. Each tree is trained independently, and the final output
    is obtained by combining the outputs of all the trees in some way, such as
    taking the average (for regression) or the majority vote (for classification).

    :param training_data:
    :param features:
    :return: ML Trained model
    """
    import joblib
    from sklearn.ensemble import RandomForestClassifier

    # Split the data into features and labels
    X = training_data[features]
    y = training_data['user']

    # Train a Random Forest model on the data
    rf_model = RandomForestClassifier()
    rf_model.fit(X, y)

    # Save the trained model to disk
    rf_model_name = 'models/rf_model.joblib'
    joblib.dump(rf_model, rf_model_name)
```
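One practical perk of Random Forest for this task is that a fitted model reports how much each of the 8 keystroke features contributes to its decisions, via `feature_importances_`. A sketch on synthetic data where only one feature actually separates the users:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

feature_names = ['ht_mean', 'ht_std_dev', 'ppt_mean', 'ppt_std_dev',
                 'rrt_mean', 'rrt_std_dev', 'rpt_mean', 'rpt_std_dev']

# Synthetic data: 8 noisy features, but only ht_mean (column 0) differs
# between the two fake users.
X = rng.normal(0.3, 0.05, size=(60, 8))
y = np.array(['user_1'] * 30 + ['user_2'] * 30)
X[30:, 0] += 0.5  # shift ht_mean for the second user

rf = RandomForestClassifier(random_state=0).fit(X, y)

# Importances are normalized to sum to 1; the informative feature dominates
top = feature_names[int(np.argmax(rf.feature_importances_))]
print(top)  # ht_mean
```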


```python
def train_xgboost(training_data, features):
    """
    XGBoost stands for Extreme Gradient Boosting, a type of gradient boosting
    algorithm used for classification and regression analysis.
    XGBoost is an ensemble learning method that combines multiple decision
    trees to create a more powerful model. Each tree is built using a gradient
    boosting algorithm, which iteratively improves the model by minimizing a
    loss function.
    XGBoost has several benefits over other boosting algorithms, including its
    speed, scalability, and ability to handle missing values.

    :param training_data:
    :param features:
    :return: ML Trained model
    """
    import joblib
    import xgboost as xgb
    from sklearn.preprocessing import LabelEncoder

    # Split the data into features and labels
    X = training_data[features]

    # XGBoost needs integer class labels, so encode the user IDs
    label_encoder = LabelEncoder()
    y = label_encoder.fit_transform(training_data['user'])

    # Train an XGBoost model on the data
    xgb_model = xgb.XGBClassifier()
    xgb_model.fit(X, y)

    # Save the trained model to disk
    xgb_model_name = 'models/xgb_model.joblib'
    joblib.dump(xgb_model, xgb_model_name)
```
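Note that, unlike the SVM and Random Forest models, this model predicts encoded integer labels, so the fitted `LabelEncoder` is needed to map predictions back to user IDs. A minimal sketch of that round trip (scikit-learn only, no xgboost required; the user IDs are made up):

```python
from sklearn.preprocessing import LabelEncoder

users = ['user_7', 'user_2', 'user_7', 'user_13']

# fit_transform maps each distinct user ID to an integer class,
# exactly as done on training_data['user'] in train_xgboost()
label_encoder = LabelEncoder()
y = label_encoder.fit_transform(users)
print(list(y))

# Integer outputs of xgb_model.predict() are mapped back like this:
decoded = label_encoder.inverse_transform(y)
print(list(decoded))
```

In practice this means the fitted encoder (or at least its `classes_` list) has to be persisted alongside the model, since `train_xgboost()` above only saves the model itself.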
