Having your cake and eating it too: How Vizio built a next-generation data platform to enable BI reporting, real-time streaming, and AI/ML


By Parveen Jindal, Darren Liu, Alina Smirnova

VIZIO is the leading Smart TV brand in the USA, harnessing data from our Smart TVs to power our platform business and create engaging experiences for our customers. As a pacesetter in the data and analytics space, we have made great strides in innovating the viewing experience. As we considered the future needs of our business, we migrated to the Databricks Lakehouse to support our rapid growth.

Before the Databricks Lakehouse, we had no single platform for running a data-as-a-service business at large scale, which requires ingesting and processing data in real time from hundreds of thousands of TVs. So we got creative by stitching together many data services and leveraging a data warehouse to power our business. It was a capable system, but as data volumes and the number of new features the business wanted to add grew, it became prohibitively expensive and time-consuming to administer.

Moreover, it would have been a massive undertaking to bolt a separate real-time streaming and production ML system onto our existing Data Warehouse to support new features. This would have required us to build those systems from scratch and to take data (along with any models) out of the Data Warehouse entirely and govern it elsewhere.

First, we identified a slew of options for standardizing our future platform. We evaluated the following:

● Staying on our current Data Warehouse + homegrown solutions

● Moving to a different Data Warehouse (along with adding DBT, Airflow, an ML platform, a separate streaming layer, etc.)

● Self-hosting Spark and the other relevant services needed

● Databricks Lakehouse

Nearly all of these solutions were infeasible and simply created different “Frankenstein architectures” that would force us down the same path again.

TL;DR — Databricks was the most complete solution of any we tested. With the other Data Warehouse vendors we considered, we would have had to build our own systems for real-time streaming, exploratory data science, orchestration, and production MLOps. Databricks offered the full gamut of the aforementioned tooling, which enabled us to get to production quickly and manage the environment easily.

Here were the primary criteria that drove our decision:

● Open source — Databricks is built on open source components such as Spark, Delta Lake, and MLflow, which are battle-tested, industry-standard projects with years of support.

● Performance at scale — We process hundreds of terabytes of data a day, so having a platform robust enough to handle this scale and keep our business running was paramount.

○ Databricks with Photon was able to give us excellent performance for our join-heavy workloads, yet directly on the data lake with open table formats, and with costs growing only linearly with data growth, even at massive scale.

○ Specifically, Databricks Photon proved to be 3x faster for our needs than the other data warehouse vendors we evaluated. This gave us confidence that the system could scale well.

● Cost — When running a platform at this scale, keeping costs in line is essential. Databricks enabled us to scale our costs linearly as our data grew and to ensure we were running the platform in the most efficient fashion.

○ Specifically, Databricks is the only vendor we tested that enabled us to “form fit” compute to the right use case. For instance, for the best ETL performance we needed compute-optimized instances for higher parallelism in transformations; storage-optimized instances were best for our join-heavy, business-ready datasets; and memory-optimized instances were best for our real-time streaming workloads (see the cluster-spec sketch after this list). Other Data Warehouse vendors we considered offered either monolithic clusters or a T-shirt sizing model, neither of which gives us that optionality.

○ Thanks to Databricks Photon, we have a viable path to reduce our costs by up to 32% compared with the other options we evaluated.

○ Also, because compute and storage are decoupled, our costs scale linearly with data growth.

● Machine learning — Since we are a data-forward company, scaling our ML practice was very important to us.

○ We needed a solution that offered a multi-language notebook environment for exploratory data analysis and feature engineering, automated experiment tracking and governance, multi-node model training, production-grade model deployment for real-time inference, and a feature store to facilitate the reuse of features across the business (see the experiment-tracking sketch after this list).

● Real-time streaming — Our business requirements demanded increased data freshness, which only a streaming architecture could provide. Since we have hard SLAs to hit, it was critical to be able to control the frequency of micro-batches. Databricks met all of these criteria nicely.
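To make the “form fit” point concrete, here is a minimal sketch of per-workload compute selection. These are hypothetical Databricks Jobs API cluster specs with illustrative AWS instance types and runtime versions, not our actual configuration:

```python
# Hypothetical "new_cluster" specs: each workload gets the instance family
# that fits it, instead of one monolithic or T-shirt-sized cluster.
etl_cluster = {
    "spark_version": "13.3.x-photon-scala2.12",          # Photon-enabled runtime
    "node_type_id": "c5d.4xlarge",                       # compute optimized: parallel transforms
    "autoscale": {"min_workers": 2, "max_workers": 20},  # ephemeral, right-sized per run
}

serving_cluster = {
    "spark_version": "13.3.x-photon-scala2.12",
    "node_type_id": "i3.4xlarge",                        # storage optimized: join-heavy datasets
    "autoscale": {"min_workers": 2, "max_workers": 10},
}

streaming_cluster = {
    "spark_version": "13.3.x-scala2.12",
    "node_type_id": "r5d.4xlarge",                       # memory optimized: stateful streaming
    "num_workers": 8,                                    # fixed size for steady throughput
}
```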
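For the ML requirements above, experiment tracking on Databricks runs through MLflow. The following is a minimal sketch under assumed names (the run name, parameters, and model are hypothetical); runs logged this way are tracked and governed automatically in the workspace:

```python
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Toy training data standing in for real feature-engineered datasets
X, y = make_regression(n_samples=1_000, n_features=10, random_state=42)

with mlflow.start_run(run_name="viewership-model"):   # hypothetical run name
    params = {"n_estimators": 200, "max_depth": 8}
    model = RandomForestRegressor(**params).fit(X, y)
    mlflow.log_params(params)                          # automated experiment tracking
    mlflow.log_metric("r2", model.score(X, y))
    mlflow.sklearn.log_model(model, "model")           # versioned artifact, deployable for inference
```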

Ultimately, Databricks was the only platform that could handle ETL, monitoring, orchestration, streaming, ML, and data governance on a single platform. Not only was Databricks SQL + Delta able to run queries faster on real-world data, but we no longer needed to buy other services just to run the platform and add features down the road. This made the decision to move to a Lakehouse architecture very compelling for solving our current challenges while setting ourselves up for success on our future product roadmap.

As we actively transitioned in 2023, the benefits of the Databricks Lakehouse were palpable. Our core ETL pipelines, once hard to manage and scaling poorly, are now robust Databricks Workflows driving Structured Streaming jobs with full pipeline visibility.

What was once a manually managed series of monolithic batch loads on our previous Data Warehouse is now a fully elastic job running on ephemeral compute that automatically grows and shrinks to exactly the right capacity for that job.

For instance, in a single job, all parts of the Databricks Lakehouse work seamlessly with one another:

● Delta Lake — All tables are open Delta tables that are performant and easy to manage. With ZORDER, auto compaction, time travel, and more, we have a fully governed Lakehouse in an open format (see the maintenance sketch after this list).

● Workflows — A native, extremely robust orchestrator built right into the platform at no extra cost. It provides alerting, conditional task orchestration, and automatic cluster management, making the platform incredibly easy to run. All compute in Workflows (Jobs Compute) is ephemeral and autoscaling, which drastically reduces costs since it automatically fits the compute to the exact problem at hand at runtime (see the job-spec sketch after this list). This kind of native orchestration is nearly impossible on other Data Warehouses without adding third-party tools.

● Structured Streaming — All pipelines are now Structured Streaming jobs that provide automatic state management, failure recovery, incremental processing, and throughput management. Instead of brittle hourly batch logic in Python, all we need to do to get data faster is change the trigger interval of our pipeline, and Structured Streaming handles the rest (see the streaming sketch after this list). This makes broken state a thing of the past for our team.

● Notebooks — Pipelines can be built directly in notebooks and immediately scheduled as production jobs, cutting time to market in half without sacrificing governance. Now that Databricks offers IDE support, we have the best of both worlds.

● Photon — Our ETL is complex, and Databricks’ Photon engine made it possible to run our pipeline not only faster but also more cheaply than our previous Data Warehouse solution. Before Photon, this kind of performance for data-warehousing-style workloads (think lots of joins, groupings, and transformations) was simply impossible on an open data lake.

● Databricks SQL — Databricks’ native serverless warehousing offering runs our data quality system, which sends automatic alerts, creates native data quality profile dashboards, and lets users perform ad hoc SQL analytics directly on the Delta Lake, just like any other cloud warehouse, with easy startup and shutdown.
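As a sketch of the Delta Lake features called out above, the snippet below enables auto compaction, runs a ZORDER optimize, and reads an older snapshot via time travel. The `events` table, column names, and version number are hypothetical; `spark` is the session Databricks provides in every notebook:

```python
# `spark` is predefined in Databricks notebooks.

# Enable auto compaction on a hypothetical Delta table
spark.sql("""
    ALTER TABLE events SET TBLPROPERTIES (
        'delta.autoOptimize.autoCompact' = 'true'
    )
""")

# Co-locate frequently filtered columns for faster scans
spark.sql("OPTIMIZE events ZORDER BY (device_id, event_date)")

# Time travel: query the table as of an earlier (hypothetical) version
snapshot = spark.sql("SELECT * FROM events VERSION AS OF 42")
```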
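The Workflows behavior described above (alerting, conditional tasks, ephemeral compute) is declared in the job definition itself. Here is a hypothetical, trimmed-down Jobs API 2.1 payload, with illustrative paths, emails, and cluster settings:

```python
# Hypothetical Workflows job: two dependent tasks, built-in failure alerting,
# and ephemeral per-task clusters created and torn down on each run.
job_spec = {
    "name": "nightly-etl",
    "email_notifications": {"on_failure": ["data-eng@example.com"]},  # native alerting
    "tasks": [
        {
            "task_key": "ingest",
            "notebook_task": {"notebook_path": "/pipelines/ingest"},
            "new_cluster": {
                "spark_version": "13.3.x-photon-scala2.12",
                "node_type_id": "c5d.4xlarge",
                "autoscale": {"min_workers": 2, "max_workers": 20},
            },
        },
        {
            "task_key": "publish",
            "depends_on": [{"task_key": "ingest"}],  # runs only if ingest succeeds
            "notebook_task": {"notebook_path": "/pipelines/publish"},
            "new_cluster": {
                "spark_version": "13.3.x-photon-scala2.12",
                "node_type_id": "i3.4xlarge",
                "autoscale": {"min_workers": 2, "max_workers": 10},
            },
        },
    ],
}
```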
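Finally, the “change the trigger interval” point from the Structured Streaming item is literally a one-line change. A minimal sketch, assuming hypothetical source and sink tables; the checkpoint provides the automatic state management and failure recovery mentioned above:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # provided automatically in Databricks notebooks

(spark.readStream
    .table("raw_events")                                  # incremental reads from a Delta table
    .writeStream
    .option("checkpointLocation", "/chk/curated_events")  # state management + failure recovery
    .trigger(processingTime="5 minutes")                  # tighten to "1 minute" for fresher data
    .toTable("curated_events"))
```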

The result:

Putting all this together, we now have an architecture that lets us consolidate our data platform use cases (BI, AI, streaming) onto one unified platform that scales costs linearly with data, provides full observability and automated state management, and sets us up for success with our future plans for more pioneering advanced analytics products.

Not only are we set up to grow our business, but our engineers are happier, more productive, and can now focus on staying at the bleeding edge of Smart TV innovation.
