Bailey Kacsmar, PhD Candidate at University of Waterloo – Interview Series

Bailey Kacsmar is a PhD candidate in the School of Computer Science at the University of Waterloo and an incoming faculty member at the University of Alberta. Her research interests are in the development of user-conscious privacy-enhancing technologies, through the parallel study of technical approaches for private computation alongside the corresponding user perceptions, concerns, and comprehension of these technologies. Her work aims at identifying the potential and the limitations for privacy in machine learning applications.

Your research interests are in the development of user-conscious privacy-enhancing technologies. Why is privacy in AI so important?

Privacy in AI is so important largely because AI in our world doesn’t exist without data. Data, while a useful abstraction, is ultimately something that describes people and their behaviours. We’re rarely working with data about tree populations and water levels; so, anytime we’re working with something that can affect real people we should be cognizant of that and understand how our system can do good, or harm. This is especially true for AI, where many systems profit from massive quantities of information or hope to make use of highly sensitive data (such as health data) to try to develop new understandings of our world.

What are some ways that you’ve seen machine learning betray the privacy of users?

Betrayed is a strong word. However, anytime a system uses information about people without their consent, without informing them, and without considering potential harms, it runs the risk of betraying individual or societal privacy norms. Essentially, this leads to betrayal by a thousand tiny cuts. Such practices may be training a model on users’ email inboxes, on users’ text messages, or on health data; all without informing the subjects of the data.

Could you define what differential privacy is, and what your views on it are?  

Differential privacy is a definition or technique that has risen to prominence in terms of its use for achieving technical privacy. Technical definitions of privacy, generally speaking, include two key features: what is being protected, and from whom. Within technical privacy, privacy guarantees are protections that are achieved given that a set of assumptions are met. These assumptions may be about the potential adversaries, system complexities, or statistics. It’s an incredibly useful technique that has a wide range of applications. However, what is important to keep in mind is that differential privacy is not equivalent to privacy.
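To make the guarantee concrete, the following is a minimal sketch in Python (using NumPy) of the Laplace mechanism, one standard way of satisfying differential privacy for a simple count query. The function name and parameter choices here are illustrative assumptions, not taken from any particular library or from the interview itself.

```python
# Minimal sketch of the Laplace mechanism for an epsilon-differentially
# private count query. Names and parameters are illustrative only.
import numpy as np

def laplace_count(records, epsilon=1.0):
    """Return a noisy count of `records`.

    A count query has sensitivity 1: adding or removing one person's
    record changes the true answer by at most 1, so Laplace noise with
    scale 1/epsilon suffices to mask any single contribution.
    """
    true_count = len(records)
    noise = np.random.laplace(loc=0.0, scale=1.0 / epsilon)
    return true_count + noise

# Smaller epsilon -> more noise -> stronger privacy, less accurate answer.
print(laplace_count(["user_a", "user_b", "user_c"], epsilon=0.5))
```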

Privacy is not limited to one definition or concept, and it is important to be aware of notions beyond it. For example, contextual integrity is a conceptual notion of privacy that accounts for things like how different applications or different organizations change the privacy perceptions of an individual with respect to a situation. There are also legal notions of privacy such as those encompassed by Canada’s PIPEDA, Europe’s GDPR, and California’s consumer protection act (CCPA). All of this is to say that we cannot treat technical systems as if they exist in a vacuum free from other privacy aspects, even when differential privacy is being employed.

Another privacy-enhancing form of machine learning is federated learning. How would you define what it is, and what are your views on it?

Federated learning is a way of performing machine learning when the model is to be trained on a collection of datasets that are distributed across several owners or locations. It is not intrinsically a privacy-enhancing form of machine learning. A privacy-enhancing form of machine learning must formally define what is being protected, who is being protected from, and the conditions that must be met for these protections to hold. For example, when we think of a simple differentially private computation, it guarantees that someone viewing the output will not be able to determine whether a certain data point was contributed or not.

Further, differential privacy does not make this guarantee if, for instance, there is correlation among the data points. Federated learning does not have this feature; it simply trains a model on a collection of data without requiring the holders of that data to directly provide their datasets to each other or to a third party. While that sounds like a privacy feature, what is needed is a formal guarantee that one cannot learn the protected information given the intermediaries and outputs that the untrusted parties will observe. This formality is especially important in the federated setting, where the untrusted parties include everyone providing data to train the collective model.
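To illustrate the structure being described, here is a minimal, hypothetical sketch in Python (with NumPy) of a FedAvg-style training round: each data holder computes an update locally, and only model parameters, never raw records, are sent to the coordinator. The function names, the linear-regression objective, and the synthetic clients are all assumptions for illustration; as noted above, this pattern by itself carries no formal privacy guarantee, since the shared updates can still leak information about the underlying data.

```python
# Sketch of one federated-averaging round: clients train locally and only
# send back model weights. No formal privacy guarantee is provided here.
import numpy as np

def local_update(weights, features, labels, lr=0.1, epochs=5):
    """One client's local linear-regression training via gradient descent."""
    w = weights.copy()
    for _ in range(epochs):
        grad = features.T @ (features @ w - labels) / len(labels)
        w -= lr * grad
    return w

def federated_round(global_weights, client_datasets):
    """Average the locally trained models, weighted by client data size."""
    updates, sizes = [], []
    for features, labels in client_datasets:
        updates.append(local_update(global_weights, features, labels))
        sizes.append(len(labels))
    sizes = np.array(sizes, dtype=float)
    return np.average(updates, axis=0, weights=sizes / sizes.sum())

# Two hypothetical data holders, each with its own local dataset.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(2)]
weights = np.zeros(3)
for _ in range(10):
    weights = federated_round(weights, clients)
print(weights)
```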

What are some of the current limitations of these approaches?

Current limitations could best be described as the nature of the privacy-utility trade-off. Even if you do everything else, communicate the privacy implications to those affected, evaluate the system for what you are attempting to do, and so on, it still comes down to this: achieving perfect privacy means we do not build the system, and achieving perfect utility will generally not have any privacy protections, so the question is how we determine the “ideal” trade-off. How do we find the right tipping point and build towards it such that we still achieve the desired functionality while providing the needed privacy protections?
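To see this trade-off numerically, here is a small sketch (again Python with NumPy, on purely synthetic data, assuming the Laplace mechanism sketched earlier) showing how tightening the privacy parameter epsilon increases the error of a simple bounded-range mean query.

```python
# Privacy-utility trade-off illustration: smaller epsilon (stronger privacy)
# means larger noise and a less accurate answer. Synthetic data only.
import numpy as np

rng = np.random.default_rng(42)
incomes = rng.uniform(20_000, 120_000, size=1_000)
true_mean = incomes.mean()
sensitivity = (120_000 - 20_000) / len(incomes)  # bounded-range mean query

for epsilon in [0.01, 0.1, 1.0, 10.0]:
    noisy = true_mean + rng.laplace(0.0, sensitivity / epsilon)
    print(f"epsilon={epsilon:>5}: error = {abs(noisy - true_mean):,.2f}")
```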

You currently aim to develop user-conscious privacy technology through the parallel study of technical solutions for private computation. Could you go into some detail on what some of these solutions are?

What I mean by these solutions is that we can, loosely speaking, develop any number of technical privacy systems. However, when doing so it is important to determine whether the privacy guarantees are reaching those affected. This can mean developing a system after finding out what kinds of protections the population values. This can mean updating a system after finding out how people actually use a system given their real-life threat and risk considerations. A technical solution might be a correct system that satisfies the definition I discussed earlier. A user-conscious solution would design its system based on inputs from users and others affected in the intended application domain.

You’re currently looking for interested graduate students to start in September 2024. Why do you think students should be interested in AI privacy?

I think students should be interested because it is something that will only grow in its pervasiveness within our society. To get some idea of how quickly these systems spread, look no further than the recent ChatGPT amplification through news articles, social media, and debates of its implications. We exist in a society where the collection and use of data is so embedded in our day-to-day life that we are almost constantly providing information about ourselves to various companies and organizations. These companies want to use the data, in some cases to improve their services, in others for profit. At this point, it seems unrealistic to think these corporate data usage practices will change. However, the existence of privacy-preserving systems that protect users while still allowing certain analyses desired by companies can help balance the risk-reward trade-off that has become such an implicit part of our society.
