So far in my data visualization series, I have covered the foundational elements of visualization design. These principles are essential to grasp before actually designing and building visualizations, as they ensure that the underlying data is done justice. If you have not done so already, I strongly encourage you to read my previous articles (linked above).
At this point, you may be ready to start building visualizations of your own. I’ll cover various ways to do this in future articles, and in the spirit of data science, many of these methods will require programming. To make sure you are ready for this next step, this article will consist of a brief review of Python essentials, followed by a discussion of their relevance to coding data visualizations.
The Basics—Expressions, Variables, Functions
Expressions, variables, and functions are the basic building blocks of all Python code, and indeed of code in any language. Let’s take a look at how they work.
Expressions
An expression is a statement which evaluates to some value. The simplest possible expression is a constant value of any type. For example, below are three simple expressions: the first is an integer, the second is a string, and the third is a floating-point value.
7
'7'
7.0
More complex expressions often consist of mathematical operations. We can add, subtract, multiply, or divide various numbers:
3 + 7
820 - 300
7 * 53
121 / 11
6 + 13 - 3 * 4
By definition, Python evaluates each of these expressions into a single value, following the mathematical order of operations summarized by the acronym PEMDAS (Parentheses, Exponents, Multiplication, Division, Addition, Subtraction) [1]. For instance, the final expression above evaluates to the number 7. (Do you see why?)
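If you want to check your reasoning, here is a minimal sketch you can run yourself. It only uses the built-in print and type functions, and the extra parenthesized expression is my own addition for contrast.

```python
# Multiplication happens before addition and subtraction,
# so 3 * 4 is evaluated first: 6 + 13 - 12
print(6 + 13 - 3 * 4)      # 7

# Parentheses override the default order: (6 + 13 - 3) * 4 = 16 * 4
print((6 + 13 - 3) * 4)    # 64

# Division with / always produces a floating-point value
print(121 / 11)            # 11.0

# type() reports the type of the value an expression evaluates to
print(type(7), type('7'), type(7.0))
```

Note that the last expression from the list above contains no division, so its result stays an integer.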
Variables
Expressions are great, but they aren’t super useful by themselves. When programming, you usually need to save the value of certain expressions so that you can use them in later parts of your program. A variable is a container which holds the value of an expression and lets you access it later. Here are the exact same expressions as in the first example above, but this time with their values saved in various variables:
int_seven = 7
text_seven = '7'
float_seven = 7.0
Variables in Python have several important properties:
- A variable’s name (the word to the left of the equal sign) must be one word, and it cannot start with a number. If you need to include multiple words in your variable names, the convention is to separate them with underscores (as in the examples above).
- You do not have to specify a data type when working with variables in Python, as you may be used to doing if you have experience programming in a different language. This is because Python is a dynamically typed language.
- Some other programming languages distinguish between the declaration and the assignment of a variable. In Python, we assign variables on the same line that we declare them, so there is no need for the distinction.
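The properties above can be seen in a short sketch. The variable names here are hypothetical ones I made up for illustration.

```python
# No type is declared anywhere: Python infers it from the assigned value
page_count = 7
print(type(page_count))    # <class 'int'>

# The same name can later hold a value of a different type
page_count = 'seven'
print(type(page_count))    # <class 'str'>

# Multi-word names use underscores; a name starting with a digit,
# such as 2nd_value, would be rejected with a syntax error
second_value = 7.0
print(type(second_value))  # <class 'float'>
```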
When a variable is declared, Python will always evaluate the expression on the right side of the equal sign into a single value before assigning it to the variable. (This connects back to how Python evaluates complex expressions.) Here is an example:
yet_another_seven = (2 * 2) + (9 / 3)
The variable above is assigned the value 7.0, not the compound expression (2 * 2) + (9 / 3).
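One way to convince yourself of this is the sketch below. It is a slight variation on the example above, with a helper variable named base that I introduced purely for illustration.

```python
base = 2
yet_another_seven = (base * 2) + (9 / 3)
print(yet_another_seven)   # 7.0

# Changing base afterwards does not affect the stored value,
# because only the result of the expression was saved
base = 10
print(yet_another_seven)   # 7.0
```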
Functions
A function can be thought of as a kind of machine. It takes something (or multiple things) in, runs some code that transforms the thing(s) you passed in, and outputs back exactly one value. In Python, functions are used for two primary reasons:
- To manipulate input variables of interest and come up with an output we need (much like mathematical functions).
- To avoid code repetition. By packaging code inside a function, we can just call the function whenever we need to run that code (as opposed to writing the same code over and over).
The easiest way to understand how to define functions in Python is to look at an example. Below, we have written a simple function which doubles the value of a number:
def double(num):
    doubled_value = num * 2
    return doubled_value

print(double(2)) # outputs 4
print(double(4)) # outputs 8
There are a number of details about the above example you should make sure you understand:
- The def keyword tells Python that you want to define a function. The word directly after def is the name of the function, so the function above is named double.
- After the name, there is a set of parentheses, inside which you put the function’s parameters (a fancy term which just means the function’s inputs). Important: If your function doesn’t need any parameters, you still need to include the parentheses; just don’t put anything inside them.
- At the end of the def statement, a colon must be used, otherwise Python will not be happy (i.e., it will throw an error). Together, the entire line with the def statement is called the function signature.
- All of the lines after the def statement contain the code that makes up the function, indented one level inward. Together, these lines make up the function body.
- The last line of the function above is the return statement, which specifies the output of a function using the return keyword. A return statement does not necessarily need to be the last line of a function, but after it is encountered, Python will exit the function, and no more lines of code will be run. More complex functions may have multiple return statements.
- You call a function by writing its name and putting the desired inputs in parentheses. If you are calling a function with no inputs, you still need to include the parentheses.
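Two of the points above, functions without parameters and functions with multiple return statements, are easier to see in code. The following is a minimal sketch; greet and describe_sign are hypothetical functions I made up for this purpose, and describe_sign uses an if statement, which goes slightly beyond the basics reviewed here.

```python
def greet():
    # No parameters, but the parentheses are still required
    return 'Hello!'


def describe_sign(num):
    # The first return statement that is reached ends the function
    if num < 0:
        return 'negative'
    if num == 0:
        return 'zero'
    return 'positive'


print(greet())            # Hello!
print(describe_sign(-3))  # negative
print(describe_sign(0))   # zero
print(describe_sign(12))  # positive
```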
Python and Data Visualization
Now then, let me address the question you may be asking yourself: Why all this Python review in the first place? After all, there are many ways you can visualize data, and they certainly aren’t all restricted by knowledge of Python, or even programming in general.
This is true, but as a data scientist, it is likely that you will need to program at some point, and within programming, it is exceedingly likely the language you use will be Python. When you’ve just been handed a data cleaning and analysis pipeline by the data engineers on your team, it pays to know how to quickly and effectively turn it into a set of actionable and presentable visual insights.
Python is important to know for data visualization generally speaking, for several reasons:
- It’s an accessible language. If you are just transitioning into data science and visualization work, it will be much easier to program visualizations in Python than it will be to work with lower-level tools such as D3 in JavaScript.
- There are many different and popular libraries in Python, all of which provide the ability to visualize data with code that builds directly on the Python basics we learned above. Examples include Matplotlib, Seaborn, Plotly, and Vega-Altair (previously known as just Altair). I’ll explore some of these, especially Altair, in future articles.
- Furthermore, the libraries above all integrate seamlessly with pandas, the foundational data science library in Python. Data in pandas can be directly incorporated into the code logic from these libraries to build visualizations; you often won’t even need to export or transform it before you can start visualizing.
- The basic principles discussed in this article may seem elementary, but they go a long way toward enabling data visualization:
  - Computing expressions correctly and understanding those written by others is essential to ensuring you are visualizing an accurate representation of the data.
  - You’ll often need to store specific values or sets of values for later incorporation into a visualization, and you’ll need variables for that.
  - Sometimes, you can even store a visualization itself in a variable for later use or display.
  - The more advanced libraries, such as Plotly and Altair, allow you to call built-in (and sometimes even user-defined) functions to customize visualizations.
- Basic knowledge of Python will enable you to integrate your visualizations into simple applications that can be shared with others, using tools such as Plotly Dash and Streamlit. These tools aim to simplify the process of building applications for data scientists who are new to programming, and the foundational concepts covered in this article will be enough to get you started using them.
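To make the connection between the basics and visualization concrete, here is a minimal sketch that draws a text-only bar chart using nothing but expressions, variables, and a function. It deliberately avoids any plotting library; the sales numbers and the make_bar function are made up for illustration, and a real project would use one of the libraries named above instead.

```python
# Values stored in variables for later use (made-up numbers)
jan_sales = 12
feb_sales = 18
mar_sales = 9

# An expression computes a derived value from the stored data
total_sales = jan_sales + feb_sales + mar_sales


def make_bar(label, value):
    # Repeat the '#' character once per unit to draw a horizontal bar
    return label + ' ' + '#' * value + ' ' + str(value)


print(make_bar('Jan', jan_sales))
print(make_bar('Feb', feb_sales))
print(make_bar('Mar', mar_sales))
print('Total:', total_sales)  # Total: 39
```

The same pattern of storing data, computing with it, and passing it to a function carries over when the function comes from a visualization library rather than from your own code.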
If that’s not enough to convince you, I’d urge you to click on one of the links above and start exploring some of these visualization tools yourself. Once you start seeing what you can do with them, you won’t go back.
Personally, I’ll be back in the next article to present my own tutorial for building visualizations. (A few of these tools may make an appearance.) Until then!
