Extending PAC Learning to a Strategic Classification Setting

Why Strategic Classification Is Useful: Motivation

Binary classification is a cornerstone of machine learning. It was the first topic I was taught when I took an introductory course on the subject; the real-world example we examined back then was the problem of classifying emails as either spam or not spam. Other common examples include diagnosing a disease and screening résumés for a job posting.

The basic binary classification setup is intuitive and easily applicable to our day-to-day lives, and it can serve as a helpful demonstration of the ways we can leverage machine learning to solve human problems. But how often do we stop to consider the fact that people usually have a vested interest in the classification outcome of such problems? Spammers want their emails to make it through spam filters, not everyone wants their COVID test to come back positive, and job seekers may be willing to stretch the truth to score an interview. The data points aren't just data points: they're active participants in the classification process, often aiming to game the system to their own benefit.

In light of this, the canonical binary classification setup seems a bit simplistic. However, reexamining binary classification while dropping the implicit assumption that the objects we want to classify are uninfluenced by external stakes sounds unmanageably complex. The preferences that could affect the classification process come in so many different forms; how could we possibly take all of them into account?

It turns out that, under certain assumptions, we can. Through a clever generalization of the canonical binary classification model, the paper's authors [1] demonstrate the feasibility of designing computationally tractable, gaming-resistant classification algorithms.

From Data Points to Rational Agents: Preference Classes

First, if we want to be as realistic as possible, we have to properly consider the wide breadth of forms that real-world preferences can take among rational agents. The paper mentions five increasingly general categories of preferences (which I'll call preference classes). The names I use for them are my own, but they are based on the terminology used in the paper.

  1. Impartial: No preferences, just as in canonical binary classification.
  2. Homogeneous: Identical preferences across all the agents involved. For instance, within the set of people who are willing to fill out the paperwork necessary to apply for a tax refund, we can reasonably expect that everyone is equally motivated to get their money back (i.e., to be classified positively).
  3. Adversarial: Equally motivated agents aim to induce the opposite of their true labels. Consider bluffing in poker: a player with a weak hand (negatively classified) wants their opponents to think they have a strong hand (positively classified), and vice versa. For the "equally motivated" part, imagine all players bet the same amount.
  4. Generalized Adversarial: Unequally motivated agents aim to induce the opposite of their true labels. This isn't too different from the plain adversarial case. Still, it should be easy to see how a player with $100 on the line would be willing to go to greater lengths to deceive their opponents than a player betting $1.
  5. General Strategic: Anything goes. This preference class aims to encompass any conceivable set of preferences. All four of the previously mentioned preference classes are strict subsets of this one. Naturally, this class is the main focus of the paper, and most of the results demonstrated in the paper apply to it. The authors give the excellent example of college applications, where "students [who] have heterogeneous preferences over universities […] may manipulate their application materials during the admission process."

How can the canonical classification setup be modified to account for such rich agent preferences? The answer is astoundingly simple. Instead of limiting our scope to (x, y) ∈ X × { -1, 1 }, we consider data points of the form (x, y, r) ∈ X × { -1, 1 } × R. A point's r value represents its preference, which we can break down into two equally important components (a short code sketch follows this list):

  • The sign of r indicates whether the data point wants to be positively or negatively classified (r > 0 or r < 0, respectively).
  • The absolute value of r specifies how strong the data point's preference is. For instance, a data point with r = 10 would be much more strongly motivated to manipulate its feature vector x to ensure it ends up being positively classified than a data point with r = 1.
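To make the (x, y, r) format concrete, here is a minimal sketch of how such a data point might be represented in code. The class and field names are my own, purely illustrative:

```python
import numpy as np
from typing import NamedTuple

class StrategicPoint(NamedTuple):
    """A data point in the strategic setting: (feature vector, true label, preference)."""
    x: np.ndarray  # feature vector in X
    y: int         # true label, either -1 or 1
    r: float       # preference: sign = desired label, magnitude = strength of the preference

# A point whose true label is negative but which strongly prefers to be classified positively:
eager_applicant = StrategicPoint(x=np.array([0.5, 2.0]), y=-1, r=10.0)
```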

What determines the preference class we operate within is the set R. We can formally define each of the aforementioned preference classes in terms of R and see how the formal definitions align with their intuitive descriptions and examples (a small code sketch follows the list):

  1. Impartial: R = { 0 }. (This makes it abundantly clear that the strategic setup is simply a generalization of the canonical setup.)
  2. Homogeneous: R = { 1 }.
  3. Adversarial: R = { -1, 1 }, with the added requirement that all data points prefer to be classified as the opposite of their true labels.
  4. Generalized Adversarial: R ⊆ ℝ, with the added requirement that all data points prefer to be classified as the opposite of their true labels.
  5. General Strategic: R ⊆ ℝ.
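The formal definitions translate almost directly into code. Below is an illustrative sketch (my own, not from the paper) expressing each preference class as a check on a point's r value, with the adversarial classes also constrained by the true label y:

```python
def is_impartial(r: float) -> bool:
    return r == 0                          # R = {0}

def is_homogeneous(r: float) -> bool:
    return r == 1                          # R = {1}

def is_adversarial(y: int, r: float) -> bool:
    return r in {-1, 1} and r == -y        # R = {-1, 1}, preference opposes the true label

def is_generalized_adversarial(y: int, r: float) -> bool:
    return r != 0 and (r > 0) != (y > 0)   # any magnitude, preference opposes the true label

def is_general_strategic(r: float) -> bool:
    return True                            # R ⊆ ℝ: anything goes
```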

Giving Preference Magnitude Meaning: Cost Functions

Clearly, though, R on its own isn't enough to construct a complete general strategic framework. The very idea of a data point's preference having a certain magnitude is meaningless without tying it to the cost the data point incurs in manipulating its feature vector. Otherwise, any data point with a positive r, no matter how small, would have no reason not to manipulate its feature vector ad infinitum. This is where the concept of cost functions comes into play.

Let c: X × X → ℝ⁺. For simplicity, we'll assume (as the paper's authors do) that c is induced by seminorms. We say that a test data point (x, y, r) may transform its feature vector x into z ∈ X at a cost of c(z; x). It's important to note in this context that the paper assumes the training data is unmanipulated.

We can divide cost functions into two categories, with the former being a subset of the latter. An instance-invariant cost function is the same across all data points. To put it more formally:

∃ℓ: X → ℝ⁺ . ∀(x, y, r) ∈ X × { -1, 1 } × R . ∀z ∈ X . c(z; x) = ℓ(z − x)

In other words, there exists a single function ℓ such that for all data points and all potential manipulated feature vectors, c(z; x) simply takes the value of ℓ(z − x).
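As a concrete illustration (my own choice of seminorm, not one prescribed by the paper), taking ℓ to be the Euclidean norm gives an instance-invariant cost shared by every data point:

```python
import numpy as np

def instance_invariant_cost(z: np.ndarray, x: np.ndarray) -> float:
    """c(z; x) = ℓ(z - x), with the same seminorm ℓ (here the Euclidean norm) for every data point."""
    return float(np.linalg.norm(z - x))
```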

An instance-wise cost function may vary between data points. Formally:

∀(x, y, r) ∈ X × { -1, 1 } × R . ∃ℓₓ: X → ℝ⁺ . ∀z ∈ X . c(z; x) = ℓₓ(z − x)

In other words, each data point may have its own function ℓₓ, and c(z; x) takes the value of ℓₓ(z − x) for each individual data point.
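By contrast, here is a sketch of an instance-wise cost in which each data point carries its own weighted seminorm. The factory function and the specific weights are illustrative assumptions, not the paper's construction:

```python
import numpy as np
from typing import Callable

def make_instance_wise_cost(weights: np.ndarray) -> Callable[[np.ndarray, np.ndarray], float]:
    """Build ℓ_x as a weighted L1 seminorm; different data points get different weights."""
    def cost(z: np.ndarray, x: np.ndarray) -> float:
        return float(np.sum(np.abs(weights * (z - x))))
    return cost

# Two data points whose manipulation costs differ along each feature dimension:
cost_for_x1 = make_instance_wise_cost(np.array([1.0, 1.0]))
cost_for_x2 = make_instance_wise_cost(np.array([5.0, 0.5]))
```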

As we'll see in the final article in this series, while the difference between the two types of cost functions may seem subtle, instance-wise cost functions are significantly more expressive and harder to learn.

Preference Classes and Cost Functions in Action: An Example

Let's take a look at an example given in the paper to help hammer home the aspects of the setup we've covered so far.

Image by R. Sundaram, A. Vullikanti, H. Xu, F. Yao from PAC-Learning for Strategic Classification (used under CC-BY 4.0 license).

In this example, we have a decision boundary induced by a linear binary classifier and four data points with individual preferences. General strategic is the only applicable preference class in this case.

The dotted perimeter around each xᵢ shows the manipulated feature vectors z to which it would cost the point exactly 1 to move. Since we assume the cost function is induced by seminorms, everything inside a perimeter costs the corresponding data point less than 1 to move to. We can easily tell that the cost function in this example varies from data point to data point, which means it is instance-wise.

As we can see, the leftmost data point (x₁, -1, -1) has no incentive to cross the decision boundary, since it is on the negative side of the boundary while also having a negative preference. (x₄, -1, 2), however, wants to be positively classified, and since the reward for manipulating x to cross the boundary (which is 2) outweighs the cost of doing so (which is less than 1), it makes sense to go through with the manipulation. (x₃, 1, -2) is symmetric to (x₄, -1, 2), also deciding to manipulate its feature vector to achieve its desired classification outcome. Lastly, (x₂, -1, 1), whose cost function we can see is based on taxicab distance, opts to stay put despite its preference to be positively classified. This is because the cost of manipulating x₂ to cross the decision boundary would be greater than 1, surpassing the reward of 1 that the data point would stand to gain by doing so.

Assuming the agents our data points represent are rational, we can very easily tell when a data point should manipulate its feature vector (benefits outweigh costs) and when it shouldn't (costs outweigh benefits). The next step is to turn our intuitive understanding into something more formal.
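Restricting attention to a negatively classified point that prefers a positive label (as with x₂ and x₄ above), the rationality check boils down to a single comparison. The numbers below are illustrative, chosen to echo the figure:

```python
def should_manipulate(r: float, cheapest_cost_to_cross: float) -> bool:
    """A rational, positively inclined point manipulates only if the reward r
    gained by crossing the boundary exceeds the cheapest cost of crossing it."""
    return r > cheapest_cost_to_cross

print(should_manipulate(2.0, 0.9))  # True:  like x4, a reward of 2 vs. a cost below 1
print(should_manipulate(1.0, 1.3))  # False: like x2, a reward of 1 vs. a cost above 1
```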

Balancing Costs & Benefits: Defining Data Point Best Response

This leads us to define the data point best response:
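Written out (my reconstruction of the formula shown as an image in the original post, based on the breakdown below), the best response of a data point (x, y, r) to a classifier h is:

Δ(x; h, r) ∈ argmax_{z ∈ X} [ 𝕀(h(z) = 1) ⋅ r − c(z; x) ]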

So we're looking for the feature vector(s) z ∈ X that maximize… what exactly? Let's break down the expression we're aiming to maximize into more manageable parts.

  • h: A given binary classifier (h: X → { -1, 1 }).
  • c(z; x): As stated above, this expresses the cost of modifying the feature vector x into z.
  • 𝕀(h(z) = 1): Here, 𝕀(p) is the indicator function, returning 1 if the predicate p holds or 0 if it doesn't. The predicate h(z) = 1 is true if the vector z under consideration is positively classified by h. Putting that together, we find that 𝕀(h(z) = 1) evaluates to 1 for any z that is positively classified. If r is positive, that's good; if it's negative, that's bad.

The bottom line is that we want to find the vector(s) z for which 𝕀(h(z) = 1) ⋅ r, which we can call the realized reward, outweighs the cost of manipulating the original x into z by as much as possible. To put it in game-theoretic terms, the data point best response maximizes the utility of its corresponding agent in the context of the binary classification under consideration.
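Here is a minimal sketch of the best response as an argmax, assuming a finite set of candidate manipulations; the paper works over a continuous feature space, and all names here are mine:

```python
import numpy as np
from typing import Callable, Iterable

def best_response(x: np.ndarray, r: float,
                  h: Callable[[np.ndarray], int],
                  cost: Callable[[np.ndarray, np.ndarray], float],
                  candidates: Iterable[np.ndarray]) -> np.ndarray:
    """Pick the z maximizing the utility 𝕀(h(z) = 1) * r - c(z; x); staying at x is always an option."""
    def utility(z: np.ndarray) -> float:
        return (r if h(z) == 1 else 0.0) - cost(z, x)
    return max([x, *candidates], key=utility)

# Usage with a linear classifier h(z) = sign(w·z - b) and a Euclidean cost:
w, b = np.array([1.0, 0.0]), 1.0
h = lambda z: 1 if float(w @ z) - b >= 0 else -1
cost = lambda z, x: float(np.linalg.norm(z - x))
z_star = best_response(np.array([0.2, 0.0]), r=2.0, h=h, cost=cost,
                       candidates=[np.array([1.0, 0.0]), np.array([1.5, 0.0])])
# z_star crosses the boundary at the cheaper candidate, since the reward 2 exceeds the cost 0.8.
```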

Putting It All Together: A Formal Definition of the Strategic Classification Problem

Finally, we've laid all the necessary groundwork to formally define the strategic classification problem.

A diagram illustrating the formal definition of the strategic classification problem. Image by author.

Given a hypothesis class H, a preference class R, a cost function c, and a set of n data points drawn from a distribution D, we want to find a binary classifier h' that minimizes the loss as defined in the diagram above. Note that the loss is simply a modification of the canonical zero-one loss, plugging in the data point best response instead of h(x).
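In symbols, the strategic loss of h on a point (x, y, r) is 𝕀(h(Δ(x; h, r)) ≠ y), i.e., the zero-one loss evaluated at the best response rather than at x (again, my reconstruction of the diagram). Below is a sketch of the corresponding empirical loss, with delta standing in for any best-response routine such as the one sketched above:

```python
import numpy as np
from typing import Callable, Sequence, Tuple

def empirical_strategic_loss(h: Callable[[np.ndarray], int],
                             delta: Callable[[np.ndarray, float], np.ndarray],
                             sample: Sequence[Tuple[np.ndarray, int, float]]) -> float:
    """Fraction of sampled (x, y, r) whose manipulated feature vector delta(x, r)
    is classified by h differently from the true label y."""
    mistakes = sum(1 for (x, y, r) in sample if h(delta(x, r)) != y)
    return mistakes / len(sample)
```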

Conclusion

Starting from the canonical binary classification setup, we introduced the notion of preference classes. Next, we saw how to formalize that notion using an r value for each data point. We then saw how cost functions complement data point preferences. After that, we broke down an example before defining the key concept of the data point best response based on the ideas we explored beforehand. Lastly, we used the data point best response to define the modified zero-one loss used in the definition of the strategic classification problem.

Join me next time as I define and explain the strategic VC dimension, which is the natural next step from where we left off this time.

References

[1] R. Sundaram, A. Vullikanti, H. Xu, F. Yao. PAC-Learning for Strategic Classification (2021), International Conference on Machine Learning.
