And Sydney too. Consolidating some thoughts on an exciting two weeks of surprises, advances, and retreats in AI.
What follows brings together three twitter threads on the launch of ChatGPT, the launch of Bing Chat, the surprises, and the (too predictable) pullback of "Sydney".
1/ No exaggeration to say that ChatGPT has perhaps captured the imagination of the broadest set of individuals in the shortest time of any computing technology. That anyone can immediately use it and see for themselves is a big part of that. How will productivity using "it" evolve?
2/ The polarization of fear <> opportunity is consistent with basically everything these days. That makes me a bit sad. I'm definitely one of those who see this as an advance. I'm a believer in full throttle innovation and not second guessing at every step.
3/ Plenty of 4-D chess predicting where things will go. Who will win or lose? How much of a platform shift is "AI" or not? It's too soon to know. If PC, phone, cloud, or web are a guide — the wary/pessimists will quickly fall behind, because exponential growth is like that.
4/ There are parallels to learn from that help guide us on how technology will evolve. Not the one path, but the kinds of paths that may follow. History rhymes. Why? Because both producers and consumers are humans and humans follow patterns, though not precisely.
5/ First, in the next 6–12 months every product (site/app) that has a free form text field will have an "AI-enhanced" text field. All text entered (spoken) will be embellished, corrected, refined, or "run through" an LLM. Every text box becomes a prompt box.
6/ This is a trivial add for most any product. Some will enhance with more bells & whistles. For example there can be an automatic suggestion (API costs aside) or several specific "query expansions" that take the text and guide the enhancement. Everyone will call the API.
7/ This will be done to call attention to the new feature but also to add more surface area upon which to prove there's some depth to the work beyond just feeding what one types to the LLM.
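A minimal sketch of what "everyone will call the API" might look like in practice. Everything here is an assumption for illustration — the function name, the fixed instruction, and the `llm` parameter (any text-completion callable stands in for a real hosted API):

```python
def enhance_text(raw_text: str,
                 instruction: str = "Fix grammar and clarity.",
                 llm=None) -> str:
    """Run whatever the user typed through an LLM behind a fixed
    'query expansion' prompt. `llm` is any callable that takes a
    prompt string and returns text; in production it would wrap a
    hosted completion API, injected here so the wrapper is testable."""
    if llm is None:
        # No API configured: the text box behaves exactly as before.
        return raw_text
    prompt = f"{instruction}\n\nText:\n{raw_text}"
    return llm(prompt)

# Any text box becomes a prompt box: intercept the field contents on
# submit and offer the enhanced version back to the user for review.
fake_llm = lambda p: p.splitlines()[-1].capitalize()  # stand-in model
print(enhance_text("teh meeting is at 3pm", llm=fake_llm))
# prints "Teh meeting is at 3pm"
```

The injection of `llm` is deliberate: the "trivial add" is the wrapper, while the model behind it can be swapped per product or per cost budget.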
8/ This reminds me of the mundane example of spell-checking as it moved from a stand-alone feature, to integrated into word processing, to suites, and then 💥 it showed up in the browser. Suddenly it wasn't an app feature, every text box had squiggles.
9/ At first this was neat. Then we quickly learned just how much of what we type in edit boxes isn't amenable to spell checking. All those street names, part numbers, and people's names. If you blindly autocorrected you'd end up with gibberish. You still do. Don't judge prematurely!!
10/ LLMs will be like this too. In other words the invention will rapidly diffuse but actually "create work" for people, as in order to realize the benefit there needs to be a "human in the loop" to smooth out the enhanced text or "fix" it.
11/ People will be very critical of this step. Many will poo-poo the innovation saying that it takes more time to fix than it does to just do the work the first time. Grammar checking was like that as well. But not for everybody…
PS/ Some will immediately want to ban using a tool that's "flawed" or "removes humans". You might think spell checking is trivial, but I had to get permission to use it freshman year to write papers. In High School I had to ask the principal to use it. Just like calculators.
12/ Then after some period of time there will be a far more significant advance in the technology. Old spell checking was a dictionary on a local PC. Then Google used all it brought to the party to release "cloud spell check" which used…the web.
13/ Spelling (and grammar) haven't been the same since. That was 10 years after the first browsers and something like 25 years after the first stand-alone spell checkers for PCs, and now everyone has it. But nobody is differentiated by it. Why?
PS/ Here’s a history of red squiggles.
The idle loop is a devil’s playground. –Developers on the Word team https://hardcoresoftware.learningbyshipping.com/p/036-fancy-wizard-and-red-squiggles
14/ There were many stand-alone companies that offered spelling (or analogous writing tools). Many *word processors* focused on "better typing" and adding spelling, grammar, tabs, page numbers, etc. But then a feature war ensued. 100's and 1000's of new features.
15/ Many bemoan these feature wars as "bloat" but they play an important part in how customers consume products, and it means you can't be a "one trick pony" in the market. There is no "simple". Consumers acquire productivity tools for the "worst case" not the simple or easy case.
16/ Critically, the winning product is one that does the most *important* work, not the most mundane work. Ex, typewriters were good at filling out forms, but word processors weren't. It took years before forms were a WP thing, but books, manifestos, etc. → WORD!
17/ What matters is doing important work, not simply automating cheap or easy work. The tools that win will generalize to the most important problems people face. The cost of adding additional tools or "point solutions" is far higher than the savings.
18/ Ppl think automating the easy "eliminates jobs". That has not shown to be the case. In fact, what has tended to happen is more automation/creation tools mean more people add "creation" to their jobs. Now we all fill out forms, not only administrative support. (unknown src)
19/ Better tools bring creation closer to where the "human in the loop" adds the most value. PowerPoint is an ex of that. We all might bemoan "slides" but with a great tool (for *important* work) the most expert/knowledgeable will use the tools to do what was previously "support" work.
20/ When email was on the rise and Office was just Word, Excel, PowerPoint, we were deeply concerned that all "documents" would become simple ASCII mail messages — all those features wouldn't matter. *WRONG* Email made everyone a "content contributor", even big time execs.
21/ Ppl *did* stop using Word for one page meeting agendas. Many stopped taking notes in Word. (Ditto for equiv Excel or PowerPoint tasks.) BUT when work mattered and was *important*, the human-in-the-loop contribution mattered and so did having the "strongest" tool.
22/ That's why Office, _soul-sucking Office_ so to speak, is still used by hundreds of millions of people. It isn't inertia, it is because the *important* work is done there, and on the off chance something important needs to be done, those features matter.
23/ Many seem to think LLMs will "eliminate" jobs or wipe out whole swaths of creative work. I think what transpires will happen in two phases. First, all creation tools in use will be augmented with LLMs, and very quickly. Everyone will use these enhancements — human in the loop.
24/ Then over time new tools will emerge that will subsume the old tools. This is what happened as typewriters replaced typesetting, which were replaced by character word processing, replaced by graphical, and only now are we seeing changes in writing — 35 yrs later.
25/ Key is new tools, built from the ground up assuming LLMs, will need to get their footing doing the most important work, and they also need to do the mundane. They need to be approachable by the people doing the mundane and the important. Otherwise they're just point solns.
26/ Why things evolve this way is subtle. In the work environment, there is no shortage of "important" — everyone thinks their work is important. Every department. Every creative task. There are endless requirements or "needs" that will be thrown up as barriers to change.
27/ Everyone's job looks easy from the outside. But until you actually try to automate a job out of existence or replace the workflow or tooling of a job, you don't really know what's the human part or where the "work" happens.
28/ Most of all, once tools start getting used by more people, the need for what's produced goes way up. That means more people need to do more work. That means expectations for what's "good" or "important" all go up. New tools don't simply automate, they create work too.
29/ One final example. In the physical sciences and math, writing used to require special typesetting (math, physics) or even draftsmen (for molecular models, lab bench workflows). There were humans in every company/dept that did this work.
30/ LaTeX, ChemDraw, and more came along and then all researchers were creators. Papers had more diagrams and math. Expectations for presenting information went up. And those draftsmen/typesetters weren't just eliminated but used these tools for much more advanced work.
31/ This shows that even in highly domain specific and advanced tooling, massive improvements don't simply make everyone's job easier (or vanish) but add work for people — for experts — to do more, to create more, and most of all to be humans in the loop. // END
Adding to the excitement over LLMs and continuing the implications for productivity scenarios. LLMs represent the first tech advancement that has the potential to seamlessly deploy across 7B smartphones and thus can be a platform shift. HUGE bull here. 1/
2/ Tech has been pining for a platform shift for ~10 years as mobile/social/cloud settled in. What's next? Of course it has to be huge — the whole planet adopted the current platform. By layering on top of M/S/C and not requiring a device or legacy reset, the potential is huge. Yay!
3/ So much has been said already about "Disruption of Google", "Leadership of Microsoft", "Changing of the guard", and general "disruption" or a "whole new world". This is all premature or just silly. OTOH the technology will happen. The "cat is out of the bag" and mass deployment is next.
4/ The opinions have polarized like everything else in today's society. The arguments "for" or "against" quickly amount to picking sides and preparing for battle. "Misinformation", "Doesn't Understand", "Eliminates/reinvents all jobs" and more. Ugh.
5/ LLMs meet a necessary and important, but not sufficient, criterion for platform shifts. They don't yet work all the time; boundary cases are plentiful. Recall from "Hardcore Software" the first decade of PCs was literally "making them work". The Web? Security, broken links, etc.
6/ BUT BUT to dismiss some concerns is as naive as dismissing the exponential nature of change. There's a ton of hype. One thing I've learned about AI over a number of "winters" is the hype is due to "generalizing" what amounts to a significant advance in a point solution.
7/ The winters keep happening because the technologists and punditry tend to take a single advance and generalize it. Like advances in programming languages, AI can indeed make one scenario easier and doesn't have to make all scenarios easier/possible.
8/ That said, LLMs represent a collision of sorts across two dimensions — views of LLMs are by default either:
• optimistic/positive and the future of all software, productivity, creation
• skeptical/negative and needing to be a priori constrained and made into a "responsible" tech
9/ When in reality, most want a broad middle: let's all benefit from advances in the market. Those abusing or misusing it should be held accountable.
Never before has a broadly available new tech at such low cost so quickly been presumed a net negative by so many. Sigh.
10/ There are three areas (A, B, C) to be rightfully concerned about that come with the presumed negatives and temper the blind optimism.
The most important aspect is that this is what creates opportunities for next generation companies and makes it all so difficult for incumbents.
11/ (A) LLMs, especially with respect to web search, fundamentally break, or call into question, the internet (the web) itself. Yes.
The web relies on linking. Search relies on crawling. Entities that contribute to the web permit crawling for the benefit of being linked to.
12/ If LLMs simply use the crawling side of the web without returning links, then the incentives to permit crawling and ultimately linking go away.
For the biggest content sites that have subscriptions or can afford this it's okay. But as we saw with news, almost none can.
13/ Today this is already a tension filled area. BUT it isn't new. When I managed Microsoft's search efforts (2006–7), huge *effort* was going into properly creating snippets and competing with what became "Google Onebox". Providing answers without needing to click.
14/ The *effort* was how to avoid breaking the web, stealing content, and properly advancing fair use. This led to a series of industry changes such as licensing content, partnerships, search "verticals". And a failure of book scanning.
15/ So while everyone wants search to simply "tell me the answer", unless the search provider "knows" the answer or "bought" the answer, or it's public domain, it can't simply unilaterally crawl the web to find it for free.
16/ This leads to (B): the legal system has no framework for this mass scale reuse. Is this Betamax, Napster, YouTube/UGC? Copyright laws, private property, libel, fair use, derivative works, plagiarism, and more run right into both training LLMs and the output.
17/ The "cat is out of the bag" was the argument for the Betamax and later commercial skip. But those ended differently, and importantly home use is very different than the "storage, rebroadcast, retransmission" of someone else's data for MASS consumption.
18/ No amount of "needle threading" or too-clever lawyering will change the fact that training is making a copy of someone else's data, using it, and even if the exact copy is discarded, a derivative work is produced.
18A/ Imagine if every result came with every source that contributed to the prompt output — almost a debug view. Is that legit "sourcing" or essentially a denial of service attack on the legal system?
19/ In junior high school we all learned: first, you can't copy the Greek history section of an encyclopedia and call it a report on Sparta. Second, you can't simply move the words around and add a sentence from another work if most of the report was from World Book.
20/ Many lauded Bing's use of footnotes and sources. Putting aside the future role paid placement may have in those, one needs to be incredibly clear that these are not always sources in a verifiable/legit sense. You can't source something and say something different.
21/ But wait, 1) there's a link so go figure it out. And 2) it's just a computer algorithm taking text, rewriting it, adding other stuff, and so on, just like a person. EXACTLY, and a person (especially a money-making one) SHOULD BE RESPONSIBLE.
22/ In the past 25 Section 230 years, a person could post something on the internet and wherever they posted it didn't count as the publisher. Now we have a first party publisher, human directly or not, and that is exactly what isn't protected.
23/ Of course big companies know this, and so going through third party APIs or using other third parties to do scraping, rewriting, etc. seems like a form of legal insulation. Read the Getty v. Stability suit for exactly this reason.
24/ YouTube when it was acquired was viewed as a liability for Google because of copyright issues with UGC. Over time YouTube became a model for the legit protection of IP. It took a decade. And YT is very different now than it was then in that regard.
25/ There's no getting around the fact that one site can't simply move the words around, provide a link, and claim that's just "research" or "fair use" or a "derivative work". There are 200 yrs of law and constitutional protection at work.
26/ When it comes to "disruption", a big part of my belief is that Google spent 20+ years wrestling with this topic while also balancing business needs/desires. Their caution is the result of the reality they experienced.
27/ This leads to the third big area of concern (C) and that's the notion of "responsible AI", which I think will lead to a significant tempering of output but also a *HUGE* missed opportunity to make the world's knowledge more accessible.
28/ "Responsible AI" is the first time a technology spawned an almost police action before it was even deployed — primarily coming about during the early days of image recognition. Imagine if we had locked down early PCs a la Trustworthy Computing, but in 1985, or the web in…
29/ The most big shot companies of the US and their CEOs have created "Policy Recommendations for Responsible Artificial Intelligence" where before any real use/deployment they already called on congress to regulate AI [sic]. s3.amazonaws.com/brt.org/Busine…
30/ These of course appear "good" but they cannot possibly survive the complexity of information, knowledge, scientific peer review, political parties, school boards, and also the world of what's deemed "acceptable" at any given time.
31/ Much of Reddit has been consumed trying to get Bing or OpenAI to say bad words or worse, some "cancelable" offense. As it turns out this isn't difficult. Worse, it is easy to stumble into those crossed with clear factual errors.
32/ The first answers to these problems will be to retreat and only say things that are "established facts" and "acceptable in today's context". As we know, humans are not allowed to say bad things even if they caveat them with "this is how people talked" or "I'm quoting".
33/ The most mundane topics become off limits or "not worth the risk". Even in a business context, this is enormously difficult. I can't even make a complete list of all the times I dealt with spelling dictionaries, maps, clip art, or even fonts that were deemed "irresponsible".
34/ Quite simply, the whole idea of "default responsible" when it comes to generating content based on user questions, without human review of every input and output, is unsolvable. There has to be room for mistakes, offenses, or worse.
35/ But how can that be with a default commitment to responsible? Worse, even if legal liability is removed, even with a EULA/waiver/consent box, no entity wants the endless/ongoing PR crisis of the day every time a news event causes a new wave of prompts and generated answers.
36/ This commitment from CEOs/lawyers/comms/HR is an invitation to a priori regulate AI. In many ways this is the worst position to be in — regulating something before it has even really been invented. They asked for it to happen, promising only "responsible" output.
37/ So along with the current legal framework needing adjusting to account for an unprecedented scale of automated "use" (fair or otherwise), the notion of "responsible AI" will need to be revisited lest it twist itself around every side of every issue.
38/ This isn't "trustworthy computing" because that was binary — protect from bad people even at the expense of usability, as we stated. This is a proactive agenda designed to appease a subset of customers and can't possibly please all constituents of every issue.
39/ Responsible AI is much more like Google's IPO promise of "Don't be Evil". There was great skepticism about that at the time, and great hope. The skeptics were proven right because the world is complex and murky and unknown, not just good and evil.
40/ What does this mean in practice? First, big companies are going to end up continuing to constrain the scenarios and "sanitize" the output. Huge swathes of content will simply not exist for fear of being "irresponsible", "bad PR", or illegal (or potentially so).
41/ Big companies will end up focusing on mundane results, especially in search, that will effectively provide a better expression of "OneBox" answers for known topics with scrubbed inputs, prompt kill-lists, hand-coded default responses, apologies, etc.
42/ Second, productivity tools and use cases for LLMs will end up focusing on much more narrow cases of mundane and repetitive work. This is work where LLMs are basically improved grammar/spelling/templates for common interactions.
43/ The biggest barrier to using a basic Word template has been just customizing it to the specific customer/use context without breaking grammar. LLMs make this easy.
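A sketch of that template-customization case. Everything here is hypothetical — the function, the prompt wording, and `llm` (a placeholder for any text-completion callable, since no specific API is implied by the text):

```python
def customize_template(template: str, context: dict, llm) -> str:
    """Ask an LLM to adapt a boilerplate template to a specific
    customer/use context without breaking grammar — the mundane but
    common interaction described above. `llm` stands in for any
    text-completion callable (a hosted API in production)."""
    prompt = (
        "Rewrite the template below for this context, keeping the "
        "grammar intact.\n"
        f"Context: {context}\n"
        f"Template:\n{template}"
    )
    return llm(prompt)

# Usage with a stand-in model that just echoes the prompt head:
draft = customize_template(
    "Dear {name}, thank you for your order.",
    {"name": "Acme Corp", "tone": "formal"},
    llm=lambda p: p[:7],
)
print(draft)  # prints "Rewrite"
```

The point of the sketch is how little glue is needed: the hard part (rewriting without breaking grammar) moves entirely into the model.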
44/ LLMs will be valuable to a point for summarizing first party content, improving first party writing, or even modifying first party images using available/licensed images (eg, show this photo of our new product being used on an airplane).
45/ Unfortunately, these are not the "important" cases. These won't drive whole new layers of productivity tools. They will be great additions to existing workflows and tools, such as CS or CRM tools.
46/ Therein is the big opportunity: new tools that approach hard problems and high value prompts AND from the outset work within the developed legal framework while benefiting from the world's knowledge. Those have an enormous advantage.
Invent. Take risks BigCo can't. //END
PS/ Love this example. It shows how the much lauded notion of using sources only makes a generated answer more authoritative when in fact it isn't. This is a trivial compilation. Generated compilations/summaries should be noted "…according to the intern with no domain knowledge".
Microsoft limits Bing chat to five replies to stop the AI from getting real weird theverge.com/2023/2/17/2360… //NOOO. IMO a significant strategy error, misreading the failure of the past week & over-correcting. This compounds the error of conflating LLMs, Bing, and Search generally. 1/
2/ With OpenAI we *broadly* experienced the new wonders of a generative text platform. It gave us all a taste of a whole new type of creativity — generative creativity. In this thread I argued for the importance of "human in the loop" for productivity. (see above)
3/ Many people began to see a great deal of "fun" (or entertainment, or wasting time, or broadly creativity) via careful and crazy prompt engineering. ChatGPT has the makings of a whole new type of tool, but it was very early.
4/ Many were very quick to focus on the errors, hallucinations, and crazy side of what was generated. Many existing AI researchers from big tech piled on, seeing limitations as risks. Of course "Responsible AI" chimed in over concerns where OpenAI had put mitigations in place.
5/ MS chose to build on top of OpenAI with an additional layer but far more importantly positioned their additions as a reinvention of web search and as a disruptive force unlike anything Google had seen before. Microsoft v. Google, the rematch. Really? Chat reinvents search?
6/ I'm skipping over Google, but my thread below goes into the view that Google had been wrestling with AI and trying to be *accurate* for decades, and their success in search led to caution. This might feel like the innovator's dilemma. Or it could just be prudent. (See above)
7/ After 48 hours, prudent looked correct. Bing with the LLM morphed into Sydney and the excitement was quickly replaced by endless stories of crazy talk, uncanny experiences with Sydney, and a bunch of responsible AI problems.
8/ Then the AI researchers at other companies were quick to point out that this was all expected. Again, this anchoring was on accuracy, truth, tone, guardrails, and all the things expected from "computers are always right" and "search is for finding facts".
9/ So now we'll see endless punditry cycles about how AI was not ready and we need much more work on responsibility. The above limits on Bing are the first step. All a direct result of placing LLMs in the context of search and productivity.
Is this MS *causing* the next AI Winter?
10/ In search domains, the consumer expectation is in line with "computers", which is accuracy, emotionless, predictable, non-biased and so on. Sydney was, by virtue of engineering, NONE of those things. It was designed to surface patterns and combinations not yet imagined.
11/ In that sense, and that sense specifically, Sydney was a fun and novel breakthrough even over what we all had just begun to absorb with ChatGPT. This is where the misread/misfire starts to reveal itself. What was built was never going to be good at search.
12/ Search was the primary problem MSFT chose to solve — burning capital, not really profitable, plus it was 20 yrs of flailing (I managed the team Christopher Payne created and led in 2006–7; then Satya, then Qi Lu, etc). No surprise, MS might want payback as Google continued to *thrive*.
13/ It makes some sense to have seen LLMs as a broad extension to "answers" as I described in the previous thread. But there was a technology mismatch — LLMs are nowhere near ready to provide definitive answers. Google has been fighting this for decades, hence the conservatism.
14/ To me this is an example of a technology mismatch — in time, positioning, and even company — even though the concept/idea is exactly right. It's the way Windows CE phones were the right concept but entirely wrong implementation, wrong time, wrong company.
15/ Many of the most successful tech companies have collected a large library of right concept, but wrong technology base, wrong time, wrong approach. Often this is viewed as "too early".
Of course in this context CLIPPY is an incredible example. You're welcome.
16/ The thing about these situations is that fans and those involved can, years later, paint a positive picture of being too early. But really it was just wrong, like Apple Newton wrong, like Windows 8 wrong. Too many elements needed to align to make this right.
17/ Now, and I mean it, I'm not trying to be negative about the work. There's amazing work in ChatGPT, and Sydney was one of the most [unintentionally surprising] innovations in a long time. Like those products above, too many will be quick to close the books on it. DO NOT.
18/ The right answer is to listen to the experiment and the market. Now isn't the time to move on (as was done with those others). Now is the right time to, for lack of a better word, pivot. Sydney isn't about answers. Sydney isn't the recipe for Bing to outflank Google/Ads.
19/ Sydney has the potential to be a new kind of tool. Combine Sydney w/generative images, audio, video, and it's the genesis of a whole new era of tooling. The hallucinations, bias, randomness, and crazy are FEATURES NOT BUGS. They're what make it an entirely new creative tool.
20/ When we look back on platforms that were successful, most everything that some viewed as flaws and bugs turned out to be the features or the engineering constraints that legitimately turned a quirky thing into a platform.
21/ The browser *not* having the rendering power of Word was a feature. Lacking a security model was a feature. Lacking centralization was a feature. Broken links led to a whole series of inventions. The fragility of the PC compared to "IBM" unleashed innovation. And on and on.
22/ Everything we've seen in the past week or so is horrible if the goal was precise, correct, "Responsible AI approved" answers. Wrong tool, wrong user model, wrong positioning. Ten blue links + targeted ads are way superior — at least for now and the foreseeable future.
23/ But the industry has not seen a new creative tool along the lines of Sydney in a long time — since maybe JPEG. Now would be a great time to unleash the power of this creativity and see what's created. It might just be the next PC game, Netflix series, or maybe the metaverse.
24/ Is this all nutty crazy talk? Maybe, but it was a human generated argument so it has those flaws. It's a response to a misunderstood failure, a strategy that wasn't right at first, and what's certain to be a [predictable] over correction, and so on. // END
PS/ Now is the time to double-down. Separate from Search. Make Sydney itself more available, cheaper. Build more tooling. Find more developers. Add images, sound, motion, animation. Build a generative future.
PPS/ One of the hallmarks (necessary, not sufficient) of platform shifts is that a set of people emerges (and grows quickly) willing to make the new technology a hobby — to explore, poke, break, learn. To find limits. To create.
Sydney most definitely had those qualities.
PPPS/ Here Sydney is basically writing a script for a new streaming series featuring a belligerent character forced into celebrity who has their accountability challenged by equally belligerent power brokers. 🤔
A summary of potential reasons Bing went so off the rails relative to expectations. My view is that this only further emphasizes the mismatch between the technology and the scenario chosen. “Why *is* Bing so reckless?” @GaryMarcus
"One side felt that web search should remain the way it is while the other pushed for a chat-based interface. Ultimately, Microsoft decided to have both methods of search available and allow people to switch back and forth easily." // Of course they did.