Why You Should Consider Using Fortran As A Data Scientist
Background
What’s Fortran?
Benefits & Disadvantages
Setting Up Fortran
Performance Example: Knapsack Problem
Summary & Further Thoughts
References & Further Reading
Connect With Me!

An exploration of the advantages that Fortran can bring to Data Science and Machine Learning

Photo by Federica Galli on Unsplash

Python is widely considered the gold-standard language for Data Science, and the full range of packages, literature, and resources related to Data Science is readily available in Python. This isn’t necessarily a bad thing, because it means that there are many documented solutions for any data-related problem that you might encounter.

However, with the advent of larger datasets and the rise of more complex models, it might be time to explore other languages. This is where the old-timer, Fortran, may become popular again. Therefore, it is worthwhile for today’s Data Scientists to become aware of it and possibly even try to implement some solutions.

Fortran, short for Formula Translator, was the first widely used programming language and originated in the 1950s. Despite its age, it remains a high-performance computing language and can be faster than both C and C++.

Initially designed for scientists and engineers to run large-scale models and simulations in areas such as fluid dynamics and organic chemistry, Fortran is still frequently used today by physicists. I even learned it during my physics undergrad!

Its specialty lies in modelling and simulation, which are essential for various fields, including Machine Learning. Therefore, Fortran is well placed to tackle Data Science problems, as that’s exactly what it was invented to do decades ago.

Fortran has several key benefits over other programming languages such as C++ and Python. Here are some of the main points:

  • Fortran is a compact language with only five native data types: INTEGER, REAL, COMPLEX, LOGICAL, and CHARACTER. This simplicity makes it easy to read and understand, especially for scientific applications.
  • Fortran is often used to benchmark the speed of high-performance computers.
  • Fortran has a wide range of libraries available, mainly for scientific purposes. These libraries provide developers with a vast array of functions and tools for performing complex calculations and simulations.
  • Fortran has had multi-dimensional array support from the start, which is crucial for Machine Learning and Data Science applications such as Neural Networks.
  • Fortran was built specifically for pure number crunching, which is different from the more general-purpose use of C/C++ and Python.

However, it isn’t all sunshine and rainbows. Here are some of Fortran’s drawbacks:

  • Fortran is not ideal for character and text manipulation, so it is not optimal for natural language processing.
  • Even though Fortran has many libraries, the number is far from the total available in Python.
  • Fortran does not have as large a following as other languages. This means it doesn’t have much IDE and plugin support, or many Stack Overflow answers!
  • It’s explicitly a scientific language, so don’t try to build a website with it!

Homebrew

Let’s quickly go over how to install Fortran on your computer. First, you should install Homebrew, which is a package manager for macOS.

To install Homebrew, simply run the command from their website:

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

You can confirm Homebrew is installed by running the command brew help. If there are no errors, then Homebrew has been successfully installed on your system.

GCC Compiler

As Fortran is a compiled language, we need a compiler that can compile Fortran source code. Unfortunately, macOS doesn’t ship with a Fortran compiler pre-installed, so we need to install one ourselves.

A popular option is GCC (GNU Compiler Collection), which you can install through Homebrew: brew install gcc. GCC is a collection of compilers for languages like C, Go, and of course Fortran. The Fortran compiler in GCC is named gfortran, and it can compile all major versions of Fortran such as 77, 90, 95, 2003, and 2008. It is recommended to use the .f90 extension for Fortran code files, although there is some discussion on this topic.

To confirm that gfortran and GCC have been successfully installed, run the command which gfortran. The output should look something like this:

/opt/homebrew/bin/gfortran

The gfortran compiler is by far the most popular; however, several other Fortran compilers are available.

IDEs & Text Editors

Once we have our Fortran compiler, the next step is to choose an Integrated Development Environment (IDE) or text editor in which to write our Fortran source code. This is a matter of personal preference since there are many options available. Personally, I use PyCharm and install the Fortran plugin because I prefer not to have multiple IDEs. Other popular text editors suggested by the Fortran website include Sublime Text, Notepad++, and Emacs.

Running a Program

Before we move on to our first program, it is important to note that I won’t be giving a syntax or command tutorial in this article. Linked here is a short guide that covers all the basic syntax.

Below is a simple program called example.f90:

GitHub Gist by author.

Here’s how we compile it:

gfortran -o example example.f90  

This command compiles the code and creates an executable file named example. You can replace example with any other name you like. If you don’t specify a name using the -o flag, the compiler will use a default name, which is typically a.out on most Unix-based operating systems.

Here’s how to run the example executable:

./example

The ./ prefix indicates that the executable is in the current directory. The output from this command will look like this:

 Hello world
1

Now, let’s tackle a more ‘real’ problem!

Overview

The knapsack problem is a well-known problem that poses:

A set of items, each with a value and a weight, must be packed into a knapsack so as to maximize the total value whilst respecting the weight constraint of the knapsack.

Although the problem sounds simple, the number of candidate solutions increases exponentially with the number of items. This makes it intractable to solve by brute force beyond a certain number of items.

Heuristic methods such as the genetic algorithm can be used to find a ‘good enough’ or ‘approximate’ solution in a reasonable amount of time. If you’re interested in learning how to solve the knapsack problem using the genetic algorithm, check out my previous post.

The knapsack problem has various applications in Data Science and Operations Research, including inventory management and supply chain efficiency, which makes solving it efficiently important for business decisions.

In this section, we’ll see how quickly Fortran can solve the knapsack problem by pure brute force compared to Python.

Note: We will be focusing on the basic version, the 0-1 knapsack problem, where each item is either fully in the knapsack or not in it at all.
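For reference, the 0-1 version can be written compactly as an optimisation problem. The symbols here are my own notation rather than anything defined in the article: n is the number of items, v_i and w_i are the value and weight of item i, W is the knapsack’s weight limit, and x_i is 1 if item i is packed and 0 otherwise.

```latex
\begin{aligned}
\max_{x_1, \dots, x_n} \quad & \sum_{i=1}^{n} v_i x_i \\
\text{subject to} \quad & \sum_{i=1}^{n} w_i x_i \le W, \\
& x_i \in \{0, 1\}, \quad i = 1, \dots, n
\end{aligned}
```

Each x_i has two possible values, so a brute-force search has to consider 2^n assignments.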

Python

Let’s start with Python.

The following code solves the knapsack problem for 22 items using a brute-force search. Each item is encoded as a 0 (not in) or 1 (in) in a 22-element array (each element refers to an item). As each item has only 2 possible values, the total number of combinations is 2^(num_items). We utilise the itertools.product function, which computes the Cartesian product of all the possible solutions, and then we iterate through them.

GitHub Gist by author.

The output of this code:

Items in best solution:
Item 1: weight=10, value=10
Item 6: weight=60, value=68
Item 7: weight=70, value=75
Item 8: weight=80, value=58
Item 17: weight=170, value=200
Item 19: weight=190, value=300
Item 21: weight=210, value=400
Total value: 1111
Time taken: 13.78832197189331 seconds
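As a rough illustration of the approach described above, here is a minimal sketch of a brute-force search built on itertools.product. It is not the article’s Gist: the four items, their weights and values, the capacity of 50, and the function name brute_force_knapsack are all hypothetical, chosen only to keep the example small.

```python
import itertools

# Hypothetical data: the article's 22-item data set is not reproduced here.
weights = [10, 20, 30, 40]
values = [60, 100, 120, 50]
capacity = 50  # hypothetical weight limit


def brute_force_knapsack(weights, values, capacity):
    """Try every 0/1 assignment and keep the most valuable one that fits."""
    best_value = 0
    best_choice = (0,) * len(weights)
    # product([0, 1], repeat=n) yields all 2**n tuples of zeros and ones.
    for choice in itertools.product([0, 1], repeat=len(weights)):
        total_weight = sum(w * c for w, c in zip(weights, choice))
        if total_weight > capacity:
            continue  # over the weight limit, so not a valid packing
        total_value = sum(v * c for v, c in zip(values, choice))
        if total_value > best_value:
            best_value, best_choice = total_value, choice
    return best_value, best_choice


best_value, best_choice = brute_force_knapsack(weights, values, capacity)
print("Total value:", best_value)
print("Chosen items (1-based):", [i + 1 for i, c in enumerate(best_choice) if c])
```

With 22 items the same loop would visit 2^22 = 4,194,304 assignments, which is why the pure-Python run takes several seconds.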

Fortran

Now, let’s solve the same problem, with exactly the same variables, but in Fortran. Unlike Python, Fortran doesn’t come with a package for performing permutation and combination operations.

Our approach is to use the modulo operator to convert the iteration number into a binary representation. For example, if the iteration number is 6, the modulo of 6 by 2 is 0, which means the first item isn’t chosen. We then divide the iteration number by 2 to shift the bits to the right and take the modulo again to get the binary digit for the next item. This is repeated for every item (so 22 times) and eventually gives us every possible combination.

GitHub Gist by author.
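To make the decoding step concrete, here is a small sketch of the modulo-and-divide idea, written in Python purely for illustration (the article’s actual implementation is in Fortran and is not shown here). The function name decode_iteration is hypothetical.

```python
def decode_iteration(iteration, num_items):
    """Return a 0/1 list; element k says whether item k + 1 is chosen."""
    bits = []
    for _ in range(num_items):
        bits.append(iteration % 2)  # lowest bit decides the current item
        iteration //= 2             # integer division shifts the bits right
    return bits


# Iteration 6 is 110 in binary, read here from the lowest bit upwards.
print(decode_iteration(6, 4))  # [0, 1, 1, 0]
```

Looping the iteration number from 0 to 2^num_items - 1 and decoding each one visits every combination exactly once, which is what replaces itertools.product in this approach.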

Compile and execute using the Linux time command:

time gfortran -o brute brute_force.f90
time ./brute

Output:

 Items in best solution:
Item: 1 Weight: 10 Value: 10
Item: 6 Weight: 60 Value: 68
Item: 7 Weight: 70 Value: 75
Item: 8 Weight: 80 Value: 58
Item: 17 Weight: 170 Value: 200
Item: 19 Weight: 190 Value: 300
Item: 21 Weight: 210 Value: 400
Best value found: 1111
./brute 0.26s user 0.01s system 41% cpu 0.645 total

The Fortran code is ~21 times quicker!

Comparison

To get a more visual comparison, we can plot the execution time as a function of the number of items:

Plot generated by author in Python.

Fortran blows Python out of the water!

Even though the compute time for Fortran does increase, its growth isn’t nearly as large as it is for Python. This clearly shows the computational power of Fortran when it comes to solving optimisation problems, which are of critical importance in many areas of Data Science.
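A timing sweep of the kind behind such a plot could be set up roughly as follows. This is only a sketch under my own assumptions, not the author’s benchmarking script: the weights follow the 10 × index pattern visible in the output above, while the values, the capacity rule, the 12-item upper limit, and both function names are hypothetical.

```python
import itertools
import time


def best_value_brute_force(weights, values, capacity):
    """Return the best total value over all 0/1 assignments that fit."""
    best = 0
    for choice in itertools.product([0, 1], repeat=len(weights)):
        if sum(w * c for w, c in zip(weights, choice)) <= capacity:
            best = max(best, sum(v * c for v, c in zip(values, choice)))
    return best


def time_brute_force(max_items):
    """Time the brute-force search for 2..max_items items."""
    timings = []
    for n in range(2, max_items + 1):
        weights = [10 * (i + 1) for i in range(n)]
        values = [5 * (i + 1) + 3 for i in range(n)]  # hypothetical values
        capacity = sum(weights) // 2                  # hypothetical limit
        start = time.perf_counter()
        best_value_brute_force(weights, values, capacity)
        timings.append((n, time.perf_counter() - start))
    return timings


timings = time_brute_force(12)
for n, seconds in timings:
    print(f"{n:2d} items: {seconds:.4f} s")
```

Each extra item doubles the number of assignments, so the measured times should roughly double from one row to the next once n is large enough for the loop to dominate.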

Although Python has been the go-to for Data Science, languages like Fortran can still provide significant value, especially when dealing with optimisation problems, thanks to their inherent number-crunching abilities. Fortran outperforms Python in solving the knapsack problem by brute force, and the performance gap widens further as more items are added to the problem. Therefore, as a Data Scientist, you might want to consider investing your time in Fortran if you need an edge in computational power to solve your business and industry problems.

The full code used in this article can be found on my GitHub.

(All emojis designed by OpenMoji — the open-source emoji and icon project. License: CC BY-SA 4.0)
