Forget chat. AI that can hear, see, and click is already here.

Exhibit A: Google’s NotebookLM. NotebookLM is a research tool the company launched with little fanfare a year ago. A few weeks ago, Google added an AI podcasting tool called Audio Overview to NotebookLM, which allows users to create podcasts about anything. Add a link to, for example, your LinkedIn profile, and the AI podcast hosts will boost your ego for nine minutes. The feature has become a surprise viral hit. I wrote about all the weird and amazing ways people are using it here.

To give you a taste, I created a podcast of our 125th-anniversary magazine issue. The AI does a great job of picking some highlights from the magazine and giving you the gist of what they’re about. Have a listen below.

Multimodal generative content has also become markedly better in a very short time. In September 2022, I covered Meta’s first text-to-video model, Make-A-Video. Next to today’s technology, those videos look clunky and silly. Meta just announced its competitor to OpenAI’s Sora, called Movie Gen. The tool allows users to create custom videos and sounds from text prompts, edit existing videos, and turn images into videos.

The way we interact with AI systems is also changing, becoming less reliant on text. OpenAI’s new Canvas interface allows users to collaborate on projects with ChatGPT. Instead of relying on a traditional chat window, which requires users to do several rounds of prompting and regenerating text to get the desired result, Canvas allows people to select bits of text or code to edit.

Even search is getting a multimodal upgrade. In addition to inserting ads into AI overviews, Google has rolled out a new feature where users can upload a video and use their voice to search for things. In a demo at Google I/O, the company showed how you can open the Google Lens app, take a video of fish swimming in an aquarium, and ask a question about them. Google’s Gemini model will then search the web and give you an answer in the form of Google’s AI summary.

What unites these features is a more interactive, customizable interface and the ability to apply AI tools to many different types of source material. NotebookLM was the first AI product in a while that brought me wonder and delight, partly because of how different, realistic, and unexpected the AI voices were. But the fact that NotebookLM’s Audio Overviews became a hit despite being a side feature hidden inside a bigger product just goes to show that AI developers don’t really know what they’re doing. Hard to believe now, but ChatGPT itself was an unexpected hit for OpenAI.

We’re a few years into the multibillion-dollar generative AI boom. The huge investment in AI has contributed to rapid improvement in the quality of the resulting content. But we’ve yet to see a killer app, and these new multimodal applications are a result of the immense pressure AI companies are under to make money and deliver. Tech companies are throwing different AI tools at people and seeing what sticks.

Deeper Learning

AI-generated images can teach robots how to act
