Conceptual Frameworks for Data Science Projects

Conceptual frameworks are analytical structures for representing abstract concepts and organizing data. Data scientists routinely use such frameworks, knowingly or unknowingly, to derive project plans, select machine learning models that balance various trade-offs, and present findings and proposals to stakeholders. This article provides an overview of common types of conceptual frameworks, a simple three-step process for building custom frameworks, and tips for doing so successfully.

Note: All figures in the following sections were created by the author of this article.

Common Framework Types

Although conceptual frameworks come in many different shapes and sizes, four basic framework types stand out as being especially common in data science projects: hierarchies, matrices, process flows, and relational maps. We’ll briefly go over each of these framework types below.

Hierarchies

Hierarchical frameworks often take the shape of tree diagrams, starting with a root node and ending with several leaf nodes, as shown in Figure 1. For instance, the root node may represent an overarching concept in a taxonomy or an initial binary question in a decision tree. A node’s position within the hierarchy (or tree) gives us valuable information about its relationship to other nodes. Although Figure 1 labels the items within the hierarchy as “concepts,” they can be any type of entity. Entities may be neutral (e.g., concepts, topics, segments) or have some positive or negative valence (e.g., revenues, costs, problems, issues). The hierarchical structure can vary in depth and breadth.

Figure 1: Generic Structure of a Hierarchical Framework

In visual representations of hierarchies, vertical links between two entities are typically drawn explicitly and can be non-directional (simple lines) or directional (downward or upward arrows, depending on the flow of the connection). In contrast, horizontal links between entities at the same level of a hierarchy are typically not shown explicitly. Same-level entities may be subject to a natural ordering (e.g., temporal or spatial), which can be shown by placing them accordingly within the framework. For example, entities that occur earlier in an ordering should be placed to the left of entities that occur later. If the entities don’t have a natural ordering, you may still consider ordering them in some way (e.g., by level of importance or priority) to aid reasoning. Entities at the same level in a hierarchy should generally also be at the same level of abstraction.

In many situations, it helps if the nodes of a hierarchy are mutually exclusive and collectively exhaustive, or MECE (pronounced “me-see”), to a large extent. Being mutually exclusive means that the concepts represented by individual nodes have no major overlaps (i.e., no redundancies), while being collectively exhaustive means that the framework leaves out nothing essential. A MECE hierarchy can be useful for breaking down a broad concept into sub-concepts (or components) to identify key drivers of the whole.
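
The MECE idea can be checked mechanically when the items under each node can be listed. The following is a minimal sketch, assuming each node is represented as a set of items; the function name check_mece and the customer segments are hypothetical and only serve as an illustration.

```python
from typing import Dict, Set


def check_mece(parent: Set[str], children: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Report overlaps between child nodes and gaps relative to the parent."""
    overlaps: Set[str] = set()
    seen: Set[str] = set()
    for items in children.values():
        overlaps |= seen & items  # items already claimed by an earlier child
        seen |= items
    return {
        "overlaps": overlaps,    # non-empty: not mutually exclusive
        "gaps": parent - seen,   # non-empty: not collectively exhaustive
    }


# Hypothetical example: a customer base split into two segments.
customers = {"ann", "bob", "cy", "dee"}
segments = {
    "new": {"ann", "bob"},
    "returning": {"bob", "cy"},
}
report = check_mece(customers, segments)
print(report)
```

Here "bob" appears in both segments and "dee" appears in neither, so the breakdown is neither mutually exclusive nor collectively exhaustive.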

Matrices

A matrix is a tabular data structure consisting of rows and columns. Data scientists working on tabular use cases routinely use matrices for storing training data and model weights. Training machine learning models can yield high-dimensional matrices of weights that capture complex relationships between predictors and targets. Low-dimensional matrices like the one shown in Figure 2 can be useful for analyzing problems and communicating key insights.

Figure 2: Generic Structure of a Two-by-Two Matrix Framework

The generic two-by-two matrix shown in Figure 2 compares two different dimensions against each other. Such a matrix naturally yields four quadrants. By convention, the bottom-left quadrant (where both dimensions are “low”) is usually the undesirable region of the matrix, and the top-right quadrant (where both dimensions are “high”) represents the desirable region. For instance, the market research firm Gartner uses two-by-two matrices, branded as the “Magic Quadrant,” to analyze the competitive landscape in various industry sectors, with the market leaders plotted in the top-right region.

The dimensions of a matrix may represent continuous, ordinal, or categorical data types. Ideally, these dimensions (or axes) should be important to the overarching framework objective in some way (e.g., key sub-concepts, problems, or drivers in a given context). The interactions between these dimensions should be of particular interest as a source of insight, since it is these interactions that matrices can capture well.

In general, the MECE principle also applies to the choice of dimensions: they should collectively cover the important sub-concepts or drivers of the problem being investigated and avoid redundancies. Otherwise, looking at the interaction will be no different from looking at an individual dimension. If analyzing the interaction is not important, a hierarchical framework may be more suitable. Converting between a matrix framework and its hierarchical analog can be straightforward. For example, to transform the matrix in Figure 2 into a hierarchy, create a root node that defines the overall context, let its child nodes be Dimensions 1 and 2, and let their respective child nodes be “high” and “low.”
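
As a rough illustration of both ideas, the sketch below assigns an item to a quadrant and converts the matrix into its hierarchical analog. It assumes both dimensions are scored between 0 and 1 and split at a threshold of 0.5; the function names, the threshold, and the labels are illustrative assumptions rather than part of any standard.

```python
from typing import Dict, List, Tuple


def quadrant(dim1: float, dim2: float, threshold: float = 0.5) -> Tuple[str, str]:
    """Map two scores to a (Dimension 1, Dimension 2) quadrant label."""
    def level(score: float) -> str:
        return "high" if score >= threshold else "low"

    return level(dim1), level(dim2)


def matrix_to_hierarchy(context: str, dimensions: List[str]) -> Dict[str, Dict[str, List[str]]]:
    """Root node = context, child nodes = dimensions, grandchildren = high/low."""
    return {context: {dim: ["high", "low"] for dim in dimensions}}


print(quadrant(0.8, 0.3))
print(matrix_to_hierarchy("Context", ["Dimension 1", "Dimension 2"]))
```

Note that the hierarchy lists each dimension's levels separately, so the four pairwise combinations that the matrix shows directly are no longer explicit.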

Process Flows

A process flow defines a sequence of logically ordered activities that interact to achieve an overarching objective. For example, tools such as Dataiku and KNIME allow users to build data science pipelines as process flows, going from data ingestion all the way to modeling and report generation. Figure 3 depicts a generic process framework.

Figure 3: Generic Structure of a Process Framework

The entities of the process in Figure 3 are labeled as activities, but these could be steps, stages, operations, etc. The process starts with an activity (Activity 1), ends with an activity (Activity 3), and has one or more activities in between (Activity 2). Some inputs are typically fed into the process at the beginning and transformed over the sequence of activities to yield an output. Note that inputs and outputs may also enter and leave at intermediate steps during the process.

As with hierarchies and matrices, the MECE principle can be important in formulating the various activities of the process. If two activities have significant conceptual overlap, you could consider either grouping them into a single activity or breaking them up into a more granular set of distinct activities. For example, the intermediate activities in Figure 3 could have resulted from this kind of analysis; Activity 2 might be the result of merging some overlapping activities, while Activities 2.1-2.3 might be a granular breakdown of a different subset of those merged activities. If an activity or a larger part of the process repeats, then it can be represented as a cycle, whereby an activity transitions to another activity that has already occurred before.

The transition from one activity to another should meaningfully transform the inputs of the process (e.g., by increasing, reducing, combining, or otherwise altering the inputs in some way) with the aim of producing the desired output. If a transition doesn’t change the inputs, then the two activities on either side of the transition are likely redundant and should be merged or split up differently, as discussed above.
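
The point about transitions can be sketched in code by modeling a process as an ordered list of named functions and flagging any activity that returns its input unchanged. This is a minimal sketch; run_process and the three activities are hypothetical names, and an activity that changes nothing for one input may still do useful work for another, so a flag is a prompt to review the activity rather than proof of redundancy.

```python
from typing import Any, Callable, List, Tuple

Activity = Tuple[str, Callable[[Any], Any]]


def run_process(inputs: Any, activities: List[Activity]) -> Tuple[Any, List[str]]:
    """Run activities in order; return the output and the names of no-op activities."""
    unchanged: List[str] = []
    data = inputs
    for name, step in activities:
        result = step(data)
        if result == data:  # the transition did not transform the data
            unchanged.append(name)
        data = result
    return data, unchanged


# Hypothetical three-activity process.
activities: List[Activity] = [
    ("drop_missing", lambda rows: [r for r in rows if r is not None]),
    ("copy", lambda rows: list(rows)),
    ("total", lambda rows: sum(rows)),
]
output, unchanged = run_process([1, None, 2, 3], activities)
print(output, unchanged)
```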

Relational Maps

Relational maps shift the focus from individual concepts (or entities) to the relationships between them. Data scientists working with knowledge graphs or box-and-arrow “path diagrams” of causal relationships (as shown in Figure 4) will be familiar with this framework type.

Figure 4: Generic Structure of a Path Diagram

A relationship can generally be any function that links two different concepts together. Four types of relationships are especially common:

  • Transactional: A relationship can represent one or more transactions between entities. The transactions may involve the flow of tangible things (e.g., products bought and sold) or intangible things (e.g., information, money). Transactional relationships can incorporate directionality; a transaction can flow from A to B, from B to A, or in both directions, and each of these cases has a different meaning for the entities (e.g., they could be receivers, senders, or both).
  • Causal: Entities A and B may be causally related if A is responsible, at least in part, for the occurrence or state of B (or vice versa). The nature of the causal relationship may vary. The role of A is strong if its presence is sufficient to fully cause B (although A may not be the only entity that can fully cause B). The role of A is also strong if it is necessary to cause B (although A may not be able to do this alone). Furthermore, if A causes B, it doesn’t necessarily follow that B causes A; the notion of directionality is clearly important for specifying causal relationships.
  • Similarity-based: Entities may be related because they are similar or dissimilar in some way. For instance, entities A and B can be similar because they tend to appear in the same place or occur at the same time (and dissimilar if the occurrence of one entity tends to preclude the occurrence of the other). The notion of correlation is a mathematical formalization often used to construct measurable, similarity-based relationships. Note that two entities being correlated doesn’t necessarily mean that they are causally related (although if they are causally related, then they may also be correlated).
  • Membership-based: Entities can be linked together by being members of the same group, community, or category. For example, people can be related by being in the same neighborhood, grocery items can be part of the same product category, and a set of sub-concepts may be part of an overarching concept. Indeed, one could apply a hierarchical framework to drill down into successively deeper levels of membership of the entities under consideration.
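
The relationship types above can be represented as labeled, directed edges, and a similarity-based link can be quantified with a correlation coefficient. The following is a minimal sketch, assuming a plain edge list and the Pearson coefficient; every entity name is made up for illustration.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple

# Each edge is (source, target, relationship type). All names are hypothetical.
edges: List[Tuple[str, str, str]] = [
    ("supplier", "retailer", "transactional"),
    ("retailer", "customer", "transactional"),
    ("rain", "umbrella_sales", "causal"),
    ("apples", "fruit", "membership"),
]


def outgoing(edge_list: List[Tuple[str, str, str]], kind: str) -> Dict[str, List[str]]:
    """Group targets by source for one relationship type, preserving direction."""
    result: Dict[str, List[str]] = defaultdict(list)
    for source, target, edge_kind in edge_list:
        if edge_kind == kind:
            result[source].append(target)
    return dict(result)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


print(outgoing(edges, "transactional"))
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))
```

A coefficient near 1 or -1 indicates a strong linear association between the two series, but, as noted above, it says nothing on its own about which entity (if either) causes the other.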

How to Build Your Own Frameworks

The following three-step process can be used to build a custom framework:

  1. Define the framework’s objective.
  2. Identify the right building blocks (i.e., the framework type and dimensions).
  3. Put the building blocks together in an effective manner to address the framework’s objective.

Step 1: Define the Objective

In defining the framework’s objective, ask yourself: In what context will the framework be used? What should the framework accomplish? Can an existing framework be reused, perhaps with some minor modifications, or does a new one need to be built to suit your specific needs?

The construction of the framework should be tied to a higher goal, such as the delivery of a project, formulation of a decision, or creation of some documentation. Once the context has been properly understood, careful consideration should be given to what the framework should accomplish in concrete terms. Is the framework intended as a decision-making tool? Is the framework meant to structure the flow of an argument in a report or a presentation?

Just because you need a framework doesn’t mean that you have to build one yourself. In many situations, existing conceptual frameworks can be reused without significant modification. Spending some effort to maintain a solid, up-to-date overview of relevant existing frameworks avoids the downstream costs of “reinventing the wheel.” Reusing existing frameworks has advantages beyond not having to start from scratch; if the framework has been around for a while, its main features, as well as its strengths and limitations, may be well-documented and tested in several settings. Data science publishing platforms can be a good source for keeping abreast of conceptual frameworks related to data science projects.

Step 2: Identify the Framework Type and Dimensions

Having clarified the objective of the framework, it’s time to think more concretely about the construction of the framework itself. One of the main difficulties here is that conceptual frameworks are inherently not as tangible as physical ones (like molds in a factory). We tend to intuit the link between form and function (the framework and its purpose) more easily when the framework and its object are tangible. The hallmark of a good conceptual framework is its ability to turn a seemingly intangible argument or decision into something more tangible, and the key to this is representation.

Broadly speaking, there are two aspects that determine the representation of conceptual frameworks: the type of the framework and the dimensions of the framework. You are likely to notice the framework type first because it determines how the framework appears as a whole. The previous sections covered the four common framework types. The framework dimensions dictate what the framework can specifically represent (e.g., in terms of granularity and ordering). By adjusting the dimensions, the same framework type can be reused to generate a wide variety of different insights. Following are three common classes of framework dimensions:

  • Categorical: These dimensions consist of a finite set of discrete categories that fully describe the dimension. The categories need not be ordered (e.g., a set of products, customer segments, gender).
  • Ordinal: These dimensions are ordered, which means that you can analyze whether something is “less than,” “greater than,” “equal to,” and so on, in relation to something else (e.g., negative/positive, low/medium/high).
  • Continuous: Such dimensions take the notion of ordinal dimensions to a much more granular level. Being continuous means that the dimension is numerical and can include decimals (e.g., 1.23, -2.718, 3.14159).
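
The three classes of dimensions map naturally onto common data types, and a continuous dimension can be coarsened into an ordinal one when less granularity is needed. The sketch below is one possible encoding in Python; the segment names, the Level enum, and the cut points are illustrative assumptions.

```python
from enum import IntEnum

# Categorical: an unordered, finite set of categories (hypothetical segments).
SEGMENTS = frozenset({"retail", "wholesale", "online"})


class Level(IntEnum):
    """Ordinal: ordered categories that support comparisons such as <."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


def to_level(value: float, low_cut: float, high_cut: float) -> Level:
    """Coarsen a continuous value into an ordinal level using two cut points."""
    if value < low_cut:
        return Level.LOW
    if value < high_cut:
        return Level.MEDIUM
    return Level.HIGH


# Continuous values from the examples above, binned with assumed cut points.
for value in (1.23, -2.718, 3.14159):
    print(value, to_level(value, low_cut=1.0, high_cut=3.0).name)
```

Going in the other direction, from ordinal to continuous, is not possible without additional data, which is one reason to record a dimension at the finest granularity available.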

Step 3: Put It All Together

Once the framework type and dimensions have been identified, they can be combined to produce a custom framework. Often, the identification and combination steps are not explicitly separated, since you rarely do one without the other. But the framework type and its dimensions (the basic building blocks) are not necessarily wedded to each other. Some combinations may make more sense than others, and you can generally mix and match the building blocks in many ways, over several iterations, until the framework feels right. Being able to spot and exploit this combinatorial flexibility is an important skill that you should start developing from the outset of your framework-building journey.

Furthermore, there are broadly four “pathways of analysis” that capture the link between the framework and its objective:

  • Descriptive: Approaches the framework’s objective by gathering and organizing past information (e.g., using visuals such as graphs and tables, or written summaries). Doing so allows us to better describe and analyze what happened in the past, but it may not necessarily tell us why something happened, or whether it will happen again.
  • Diagnostic: Takes descriptive information about past events and goes a step further to look at why something happened. This is done by drilling down into the data, mining for clues and correlations, and trying to find a plausible link between cause and effect. As with the descriptive pathway, the focus is on the past.
  • Predictive: Differs from the prior two by asking and answering questions about the future. The focus is on making an informed guess about what will happen in the future by relying on a host of typically quantitative techniques that range from the simple (e.g., basic probability theory, linear models) to the more complex (e.g., neural nets).
  • Prescriptive: Goes beyond merely predicting future events to recommending ways to deal with them. The focus is on determining how to make something happen, or whether it should happen, in the future. The reasoning for the prescription can be quantitative (e.g., based on statistics or simulation modeling) or qualitative (e.g., based on personal experience).

Framework types and dimensions can therefore be combined in different ways to produce custom frameworks that lend themselves to descriptive, diagnostic, predictive, and prescriptive use cases.

Top Tips

This section gives five tips for building good conceptual frameworks. The tips are by no means an exhaustive list of the points that you should consider, but they represent a basic set of things to keep in mind.

Tip 1: Focus on the Objective and Audience

The process of building frameworks broadly consists of three steps: defining the objective, then identifying and combining the building blocks (framework types and dimensions) accordingly. While the first step will, by its nature, emphasize the strategic objective and audience of the framework, the focus in the latter two steps shifts to the nitty-gritty details of the framework’s building blocks. The deeper you get into the mechanics of the framework, the harder it can be to keep sight of the original objective. To keep the bigger picture in view, it can help to take a step back from time to time during the framework-building process and remind yourself of the strategic objective and audience. It may also help to defer part of the analysis until the necessary data becomes available and to seek regular feedback from colleagues and the audience of your framework where possible.

Tip 2: Keep It as Simple as Possible

To paraphrase a quote often attributed to Albert Einstein, one of the most accomplished builders of conceptual frameworks of the last century, we can say that a framework should be made as simple as possible, but not simpler. Since the process inherently involves trying out different combinations of framework types and dimensions, it can sometimes be tempting to snap more and more pieces together. Yet sacrificing simplicity can diminish the broader value of the framework in practice. Complex frameworks can be difficult to understand, apply, evaluate, and build; you may have to verify several assumptions and preconditions, and adjust many different levers within the framework.

Tip 3: Make It MECE 

Ensuring that a framework is MECE has some important benefits. From a theoretical standpoint, being MECE means that the sub-concepts follow a consistent, additive part-whole logic; you expect the sub-concepts to “add up” to form the larger concept. Crucially, this logic lets you substitute the set of sub-concepts for the larger concept (and vice versa) throughout your analysis. The additive logic of MECE also lets you compare different concepts in a rigorous manner; instead of saying that two concepts are similar, you can state precisely the extent to which they are similar by identifying the sub-concepts they share. From a practical perspective, being MECE means that you can “divide and conquer” big problems efficiently, and solutions to some sub-problems may be reusable. Sometimes you can even reach the solution of the larger problem without solving all the sub-problems (e.g., if the larger problem can be represented as a disjunction of the sub-problems). Bypassing sub-problems also works when you are solving the larger problem inductively (e.g., as in mathematical induction).

Tip 4: Make It Flexible 

Fundamentally, a conceptual framework should be designed to meet its overall objective, so you may be wondering why flexibility is an important aspect to consider. In practice, there are at least two types of situations in which flexibility can be a big help. In the first situation, you may be dealing with an objective that is a moving target, with some parts of the objective’s full scope changing (even slightly) from time to time; responding to such scope changes can be a pain if some flexibility is not baked into the framework. In the second situation, your framework may have to undergo several iterations, in which different framework types and dimensions are added, modified, and removed over the course of the framework’s evolution; a flexible design makes it much easier to accommodate such alterations of the framework’s shape and content. Modularity, scalability, robustness, extensibility, and portability, while typically associated with software engineering and architecture, are also relevant design considerations for building flexible conceptual frameworks.

Tip 5: Build It Iteratively

It would be great if you could come up with the perfect framework in a single go, but it rarely works out that way. Several factors can make the first iteration more of a first draft, to be followed by at least a few more. The overarching objective, and particularly its operational implications for building the framework, may not be fully clear at first. Over a few iterations, however, you will likely begin to get the hang of which framework types and dimensions work and which don’t. While your output after a given iteration may be far from perfect, it can nevertheless amount to a minimum viable product (MVP) if it yields a viable solution to the overarching objective with minimal effort and complexity. The MVP can be tested (e.g., with actual data and real users) to understand its strengths and weaknesses. Each successive iteration can produce an improved MVP by adding, removing, or changing features of the previous iteration.

The Wrap

Conceptual frameworks help us turn abstract ideas into concrete, tangible products that other people can see, use, and appreciate. This can be especially important for data scientists, or so-called “knowledge workers,” whose jobs involve collecting, analyzing, and deriving conclusions from data. If you are reading this article, you are probably a knowledge worker. To paraphrase famous management guru Peter Drucker, “It’s data that enables knowledge workers to do their job,” but it is the ability to meaningfully organize this data that leads to a job well done. That, in a nutshell, is why the proper use of conceptual frameworks can aid the successful design and delivery of data science projects.
