As LLMs continue to evolve, they are becoming smaller and smarter, enabling them to run directly on your phone. Take, for example, DeepSeek R1 Distill Qwen 2.5 with 1.5 billion parameters; this model really shows how advanced AI can now fit in the palm of your hand!
In this blog, we'll guide you through building a mobile app that lets you chat with these powerful models locally. The entire code for this tutorial is available in our EdgeLLM repository. If you've ever felt overwhelmed by the complexity of open-source projects, fear not! Inspired by the Pocket Pal app, we'll help you build a simple React Native application that downloads LLMs from the Hugging Face hub, ensuring everything stays private and runs on your device. We will use llama.rn, a binding for llama.cpp, to load GGUF files efficiently!
Why Should You Follow This Tutorial?
This tutorial is designed for anyone who:
- Is curious about integrating AI into mobile applications
- Wants to create a conversational app compatible with both Android and iOS using React Native
- Seeks to develop privacy-focused AI applications that operate entirely offline
By the end of this guide, you will have a fully functional app that allows you to interact with your favorite models.
0. Selecting the Right Models
Before we dive into building our app, let's talk about which models work well on mobile devices and what to consider when choosing them.
Model Size Considerations
When running LLMs on mobile devices, size matters significantly:
- Small models (1-3B parameters): Ideal for most mobile devices, offering good performance with minimal latency
- Medium models (4-7B parameters): Work well on newer high-end devices but may cause slowdowns on older phones
- Large models (8B+ parameters): Generally too resource-intensive for most mobile devices, but can be used if quantized to low-precision formats like Q2_K or Q4_K_M (see the rough size estimate after this list)
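As a rough sanity check, you can estimate a GGUF file's size from the parameter count and the bits per weight of the quantization format. This is an approximation that ignores metadata overhead, and the bits-per-weight values are approximate:

// Rough estimate: fileSize ≈ parameters × bitsPerWeight / 8
const approxSizeGB = (paramsInBillions: number, bitsPerWeight: number): number =>
  (paramsInBillions * 1e9 * bitsPerWeight) / 8 / 1e9;

console.log(approxSizeGB(1.5, 4.5).toFixed(2)); // ~0.84 GB for a 1.5B model at ~4.5 bits/weight (roughly Q4_K_M)
console.log(approxSizeGB(1.5, 8.5).toFixed(2)); // ~1.59 GB for the same model at ~8.5 bits/weight (roughly Q8_0)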
GGUF Quantization Formats
When downloading GGUF models, you'll encounter various quantization formats. Understanding these can help you pick the right balance between model size and performance:
Legacy Quants (Q4_0, Q4_1, Q8_0)
- Basic, straightforward quantization methods
- Each block is stored with:
• Quantized values (the compressed weights).
• One (_0) or two (_1) scaling constants
- Fast but less efficient than newer methods => not widely used anymore
K-Quants (Q3_K_S, Q5_K_M, …)
- Introduced on this PR
- Smarter bit allocation than legacy quants
- The K in “K-quants” refers to a mixed quantization format, meaning some layers get more bits for higher accuracy.
- Suffixes like _XS, _S, or _M refer to specific mixes of quantization (smaller = more compression), for example:
• Q3_K_S uses Q3_K for all tensors
• Q3_K_M uses Q4_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, and Q3_K for the rest.
• Q3_K_L uses Q5_K for the attention.wv, attention.wo, and feed_forward.w2 tensors, and Q3_K for the rest.
I-Quants (IQ2_XXS, IQ3_S, …)
- Still uses block-based quantization, but with some new features inspired by QuIP
- Smaller file sizes but may be slower on some hardware
- Best for devices with strong compute power but limited memory
Recommended Models to Try
Here are some models that perform well on mobile devices:
Finding More Models
To find additional GGUF models on Hugging Face:
- Visit huggingface.co/models
- Use the search filters:
- Visit the GGUF models page
- Specify the size of the model in the search bar
- Search for "chat" or "instruct" in the name for conversational models
When choosing a model, consider both the parameter count and the quantization level. For instance, a 7B model with Q2_K quantization might run better than a 2B model with Q8_0 quantization. So if a small model fits comfortably on your device, try using a bigger quantized model instead, as it might deliver better performance.
1. Setting Up Your Environment
React Native is a popular framework for building mobile applications using JavaScript and React. It allows developers to create apps that run on both Android and iOS platforms while sharing a significant amount of code, which speeds up the development process and reduces maintenance efforts.
Before you can start coding with React Native, you need to set up your environment properly.
Tools You Need
- Node.js: Node.js is a JavaScript runtime that allows you to run JavaScript code. It is essential for managing packages and dependencies in your React Native project. You can install it from Node.js downloads.
- react-native-community/cli: This command installs the React Native command line interface (CLI), which provides tools to create, build, and manage your React Native projects. Run the following command to install it:
npm i @react-native-community/cli
Virtual Device Setup
To run your app during development, you need an emulator or a simulator:
If you are curious about the difference between simulators and emulators, you can read this article: Difference between Emulator and Simulator. To put it simply, emulators replicate both hardware and software, while simulators only replicate software.
For setting up Android Studio, follow this excellent tutorial by Expo: Android Studio Emulator Guide
2. Create the App
Let’s start this project!
You can find the complete code for this project in the EdgeLLM repo here; there are two folders:
- EdgeLLMBasic: A basic implementation of the app with a simple chat interface
- EdgeLLMPlus: An enhanced version of the app with a more complex chat interface and extra features
First, we need to initialize the app using @react-native-community/cli:
npx @react-native-community/cli@latest init
Project Structure
App folders are organized as follows:
Default Files/Folders
- android/ - Contains native Android project files
- Purpose: To build and run the app on Android devices
- ios/ - Contains native iOS project files
- Purpose: To build and run the app on iOS devices
- node_modules/
- Purpose: Holds all npm dependencies used in the project
- App.tsx - The main root component of your app, written in TypeScript
- Purpose: Entry point to the app's UI and logic
- index.js - Registers the root component (App)
- Purpose: Entry point for the React Native runtime. You don't need to modify this file.
Additional Configuration Files
- tsconfig.json: Configures TypeScript settings
- babel.config.js: Configures Babel for transpiling modern JavaScript/TypeScript, which means it converts modern JS/TS code into older JS/TS code that's compatible with older browsers or devices.
- jest.config.js: Configures Jest for testing React Native components and logic.
- metro.config.js: Customizes the Metro bundler for the project. Metro is a JavaScript bundler specifically designed for React Native. It takes your project's JavaScript and assets, bundles them into a single file (or multiple files for efficient loading), and serves them to the app during development. Metro is optimized for fast incremental builds, supports hot reloading, and handles React Native's platform-specific files (.ios.js or .android.js).
- .watchmanconfig: Configures Watchman, a file-watching service used by React Native for hot reloading.
3. Running the Demo & Project
Running the Demo
To run the project and see what it looks like on your own virtual device, follow these steps:
- Clone the Repository:
git clone https://github.com/MekkCyber/EdgeLLM.git
- Navigate to the Project Directory:
cd EdgeLLMPlus (or cd EdgeLLMBasic)
- Install Dependencies:
npm install
- Navigate to the iOS Folder and Install Pods:
cd ios
pod install
- Start the Metro Bundler:
Run the following command in the project folder (EdgeLLMPlus or EdgeLLMBasic):
npm start
- Launch the App on the iOS or Android Simulator:
Open another terminal and run:
npm run ios
npm run android
This will build and launch the app on your emulator/simulator so you can test the project before we start coding.
Running the Project
Running a React Native application requires either an emulator/simulator or a physical device. We'll focus on using an emulator, as it provides a more streamlined development experience with your code editor and debugging tools side by side.
We start by making sure our development environment is ready; we must be in the project folder and run the following commands:
npm install
npm start
In a new terminal, we'll launch the app on our chosen platform:
npm run ios
npm run android
This should build and launch the app on your emulator/simulator.
4. App Implementation
Installing Dependencies
First, let's install the required packages. We aim to load models from the Hugging Face Hub and run them locally. To achieve this, we need to install:
- llama.rn: a binding for llama.cpp for React Native apps.
- react-native-fs: allows us to manage the device's file system in a React Native environment.
- axios: a library for sending requests to the Hugging Face Hub API.
npm install axios react-native-fs llama.rn
Let's run the app on our emulator/simulator as shown before so we can start the development.
State Management
We will start by deleting everything from the App.tsx file and creating an empty code structure like the following:
App.tsx
import React from 'react';
import {StyleSheet, Text, View} from 'react-native';
function App(): React.JSX.Element {
  return <View><Text>Hello World</Text></View>;
}
const styles = StyleSheet.create({});
export default App;
Inside the return statement of the App function we define the UI to render, and outside it we define the logic; all of the code will live inside the App function.
We will have a screen that looks like this:
The text "Hello World" isn't displayed properly because we're using a simple View component; we need to use a SafeAreaView component to display the text correctly. We'll deal with that in the next sections.
Now let's think about what our app needs to track for now:
- Chat-related:
- The conversation history (messages between user and AI)
- Current user input
- Model-related:
- Selected model format (like Llama 1B or Qwen 1.5B)
- Available GGUF files list for each model format
- Selected GGUF file to download
- Model download progress
- A context to store the loaded model
- A boolean to check whether the model is downloading
- A boolean to check whether the model is generating a response
Here's how we implement these states using React's useState hook (we'll need to import it from react):
State Management Code
import { useState } from 'react';
...
type Message = {
  role: 'user' | 'assistant' | 'system';
  content: string;
};
const INITIAL_CONVERSATION: Message[] = [
{
role: 'system',
content:
'This is a conversation between user and assistant, a friendly chatbot.',
},
];
const [conversation, setConversation] = useState<Message[]>(INITIAL_CONVERSATION);
const [selectedModelFormat, setSelectedModelFormat] = useState<string>('');
const [selectedGGUF, setSelectedGGUF] = useState<string | null>(null);
const [availableGGUFs, setAvailableGGUFs] = useState<string[]>([]);
const [userInput, setUserInput] = useState<string>('');
const [progress, setProgress] = useState<number>(0);
const [context, setContext] = useState<any>(null);
const [isDownloading, setIsDownloading] = useState<boolean>(false);
const [isGenerating, setIsGenerating] = useState<boolean>(false);
This will be added to the App.tsx file inside the App function but outside the return statement, as it's part of the logic.
The Message type defines the structure of chat messages, specifying that each message should have a role (either 'user', 'assistant', or 'system') and content (the actual message text).
Now that we have our basic state management set up, we need to think about how to:
- Fetch available GGUF models from Hugging Face
- Download and manage models locally
- Create the chat interface
- Handle message generation
Let's tackle these one by one in the next sections…
Fetching available GGUF models from the Hub
Let's start by defining the model formats our app is going to support and their repositories. Since llama.rn is a binding for llama.cpp, we need to load GGUF files. To find GGUF repositories for the models we want to support, we can use the search bar on Hugging Face and look for GGUF files for a specific model, or use the script quantize_gguf.py provided here to quantize the model ourselves and upload the files to our hub repository.
const modelFormats = [
{label: 'Llama-3.2-1B-Instruct'},
{label: 'Qwen2-0.5B-Instruct'},
{label: 'DeepSeek-R1-Distill-Qwen-1.5B'},
{label: 'SmolLM2-1.7B-Instruct'},
];
const HF_TO_GGUF = {
"Llama-3.2-1B-Instruct": "medmekk/Llama-3.2-1B-Instruct.GGUF",
"DeepSeek-R1-Distill-Qwen-1.5B":
"medmekk/DeepSeek-R1-Distill-Qwen-1.5B.GGUF",
"Qwen2-0.5B-Instruct": "medmekk/Qwen2.5-0.5B-Instruct.GGUF",
"SmolLM2-1.7B-Instruct": "medmekk/SmolLM2-1.7B-Instruct.GGUF",
};
The HF_TO_GGUF object maps user-friendly model names to their corresponding Hugging Face repository paths. For instance:
- When a user selects ‘Llama-3.2-1B-Instruct’, it maps to
medmekk/Llama-3.2-1B-Instruct.GGUF, which is one of the repositories containing the GGUF files for the Llama 3.2 1B Instruct model.
The modelFormats array contains the list of model options that will be displayed to users on the selection screen; we chose Llama 3.2 1B Instruct, DeepSeek R1 Distill Qwen 1.5B, Qwen 2 0.5B Instruct, and SmolLM2 1.7B Instruct as they are among the most popular small models.
Next, let's create a way to fetch and display the available GGUF model files from the hub for our chosen model format.
When a user selects a model format, we make an API call to Hugging Face using the repository path we mapped in our HF_TO_GGUF object. We're specifically looking for files that end with the '.gguf' extension, which are our quantized model files.
Once we receive the response, we extract just the filenames of those GGUF files and store them in our availableGGUFs state using setAvailableGGUFs. This allows us to show users a list of available GGUF model variants they can download.
Fetching Available GGUF Files
const fetchAvailableGGUFs = async (modelFormat: string) => {
if (!modelFormat) {
Alert.alert('Error', 'Please select a model format first.');
return;
}
try {
const repoPath = HF_TO_GGUF[modelFormat as keyof typeof HF_TO_GGUF];
if (!repoPath) {
throw new Error(
`No repository mapping found for model format: ${modelFormat}`,
);
}
const response = await axios.get(
`https://huggingface.co/api/models/${repoPath}`,
);
if (!response.data?.siblings) {
throw new Error('Invalid API response format');
}
const files = response.data.siblings.filter((file: {rfilename: string}) =>
file.rfilename.endsWith('.gguf'),
);
setAvailableGGUFs(files.map((file: {rfilename: string}) => file.rfilename));
} catch (error) {
const errorMessage =
error instanceof Error ? error.message : 'Failed to fetch .gguf files';
Alert.alert('Error', errorMessage);
setAvailableGGUFs([]);
}
};
Note: Make sure to import axios and Alert at the top of your file if not already imported.
We need to test that the function works correctly. Let's add a button to the UI to trigger it; instead of View we'll use a SafeAreaView component (more on that later), and we'll display the available GGUF files in a ScrollView component. The onPress function is triggered when the button is pressed:
<TouchableOpacity onPress={() => fetchAvailableGGUFs('Llama-3.2-1B-Instruct')}>
  <Text>Fetch GGUF Files</Text>
</TouchableOpacity>
<ScrollView>
  {availableGGUFs.map((file) => (
    <Text key={file}>{file}</Text>
  ))}
</ScrollView>
This should look something like this:
Note: For the complete code up to this point, check the
first_checkpoint branch in the EdgeLLMBasic folder here
Model Download Implementation
Now let's implement the model download functionality in the handleDownloadModel function, which should be called when the user clicks on the download button. It will download the selected GGUF file from Hugging Face and store it in the app's Documents directory:
Model Download Function
const handleDownloadModel = async (file: string) => {
const downloadUrl = `https://huggingface.co/${
HF_TO_GGUF[selectedModelFormat as keyof typeof HF_TO_GGUF]
}/resolve/predominant/${file}`;
setIsDownloading(true);
setProgress(0);
try {
const destPath = await downloadModel(file, downloadUrl, progress =>
setProgress(progress),
);
} catch (error) {
const errorMessage =
error instanceof Error
? error.message
: 'Download failed because of an unknown error.';
Alert.alert('Error', errorMessage);
} finally {
setIsDownloading(false);
}
};
We could have implemented the API requests inside the handleDownloadModel function, but we keep them in a separate file to keep the code clean and readable. handleDownloadModel calls the downloadModel function, located in src/api, which accepts three parameters: modelName, downloadUrl, and a progress callback function. This callback is triggered throughout the download process to update the progress. Before downloading, we need the selectedModelFormat state set to the model format we want to download.
Inside the downloadModel function we use the RNFS module, part of the react-native-fs library, to access the device's file system. It allows developers to read, write, and manage files on the device's storage. In this case, the model is stored in the app's Documents folder using RNFS.DocumentDirectoryPath, ensuring that the downloaded file is accessible to the app. The progress bar is updated accordingly to reflect the current download status, and the progress bar component is defined in the components folder.
Let's create src/api/model.ts and copy the code from the src/api/model.ts file in the repo. The logic should be easy to understand. The same goes for the progress bar component in the src/components folder; it's a simple colored View whose width reflects the progress of the download.
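If you want a sense of what such a downloadModel function can look like before opening the repo, here is a minimal sketch using react-native-fs's downloadFile; the repo's version is the source of truth and may differ in details:

// src/api/model.ts (sketch): download a GGUF file into the app's Documents folder
// and report progress as a percentage through the provided callback.
import RNFS from 'react-native-fs';

export const downloadModel = async (
  modelName: string,
  downloadUrl: string,
  onProgress: (progress: number) => void,
): Promise<string> => {
  const destPath = `${RNFS.DocumentDirectoryPath}/${modelName}`;
  const {promise} = RNFS.downloadFile({
    fromUrl: downloadUrl,
    toFile: destPath,
    progressDivider: 5, // throttle how often the progress callback fires
    progress: res => {
      // bytesWritten / contentLength gives the fraction downloaded so far
      onProgress(Math.floor((res.bytesWritten / res.contentLength) * 100));
    },
  });
  const result = await promise;
  if (result.statusCode !== 200) {
    throw new Error(`Download failed with status code ${result.statusCode}`);
  }
  return destPath;
};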
Now we need to test the handleDownloadModel function. Let's add a button to the UI to trigger it, and we'll display the progress bar. This will be added under the ScrollView we added before.
Download Model Button
<View style={{ marginTop: 30, marginBottom: 15 }}>
  {Object.keys(HF_TO_GGUF).map((format) => (
    <TouchableOpacity
      key={format}
      onPress={() => {
        setSelectedModelFormat(format);
      }}
    >
      <Text> {format} </Text>
    </TouchableOpacity>
  ))}
</View>
<Text style={{ marginBottom: 10, color: selectedModelFormat ? 'black' : 'gray' }}>
  {selectedModelFormat
    ? `Selected: ${selectedModelFormat}`
    : 'Please select a model format before downloading'}
</Text>
<TouchableOpacity
  onPress={() => {
    handleDownloadModel("Llama-3.2-1B-Instruct-Q2_K.gguf");
  }}
>
  <Text>Download Model</Text>
</TouchableOpacity>
{isDownloading && <ProgressBar progress={progress} />}
In the UI we show a list of the supported model formats and a button to download the model. When the user chooses a model format and clicks on the button, the progress bar should be displayed and the download should start. In this test we hardcoded the model to download as Llama-3.2-1B-Instruct-Q2_K.gguf, so we need to select Llama-3.2-1B-Instruct as the model format for the function to work. We should have something like:
Note: For the complete code up to this point, check the
second_checkpoint branch in the EdgeLLMBasic folder here
Model Loading and Initialization
Next, we'll implement a function to load the downloaded model into a Llama context, as detailed in the llama.rn documentation available here. If a context is already present, we'll release it, set the context to null, and reset the conversation to its initial state. Then we'll use the initLlama function to load the model into a new context and update our state with the newly initialized context.
Model Loading Function
import {initLlama, releaseAllLlama} from 'llama.rn';
import RNFS from 'react-native-fs';
...
const loadModel = async (modelName: string) => {
try {
const destPath = `${RNFS.DocumentDirectoryPath}/${modelName}`;
const fileExists = await RNFS.exists(destPath);
if (!fileExists) {
Alert.alert('Error Loading Model', 'The model file does not exist.');
return false;
}
if (context) {
await releaseAllLlama();
setContext(null);
setConversation(INITIAL_CONVERSATION);
}
const llamaContext = await initLlama({
model: destPath,
use_mlock: true,
n_ctx: 2048,
n_gpu_layers: 1
});
console.log("llamaContext", llamaContext);
setContext(llamaContext);
return true;
} catch (error) {
Alert.alert('Error Loading Model', error instanceof Error ? error.message : 'An unknown error occurred.');
return false;
}
};
We need to call the loadModel function when the user clicks on the download button, so we add it inside the handleDownloadModel function, right after the download completes successfully.
if (destPath) {
await loadModel(file);
}
To test the model loading, let's add a console.log inside the loadModel function to print the context, so we can see if the model is loaded correctly. We keep the UI the same as before, because clicking on the download button will trigger the handleDownloadModel function, and the loadModel function will be called inside it. To see the console.log output we need to open the Developer Tools; for that we press j in the terminal where we ran npm start. If everything is working correctly we should see the context printed in the console.

Note: For the complete code up to this point, check the
third_checkpoint branch in the EdgeLLMBasic folder here
Chat Implementation
With the model now loaded into our context, we can proceed to implement the conversation logic. We'll define a function called handleSendMessage, which will be triggered when the user submits their input. This function will update the conversation state and send the updated conversation to the model via context.completion. The response from the model will then be used to further update the conversation, which means the conversation is updated twice in this function.
Chat Function
const handleSendMessage = async () => {
if (!context) {
Alert.alert('Model Not Loaded', 'Please load the model first.');
return;
}
if (!userInput.trim()) {
Alert.alert('Input Error', 'Please enter a message.');
return;
}
const newConversation: Message[] = [
...conversation,
{role: 'user', content: userInput},
];
setIsGenerating(true);
setConversation(newConversation);
setUserInput('');
try {
const stopWords = [
'',
'<|end|>',
'user:',
'assistant:',
'<|im_end|>',
'<|eot_id|>',
'<|end▁of▁sentence|>',
'<|end▁of▁sentence|>',
];
const result = await context.completion({
messages: newConversation,
n_predict: 10000,
stop: stopWords,
});
if (result && result.text) {
setConversation(prev => [
...prev,
{role: 'assistant', content: result.text.trim()},
]);
} else {
throw new Error('No response from the model.');
}
} catch (error) {
Alert.alert(
'Error During Inference',
error instanceof Error ? error.message : 'An unknown error occurred.',
);
} finally {
setIsGenerating(false);
}
};
To test the handleSendMessage function we need to add a text input field and a button to the UI to trigger it, and we'll display the conversation in the ScrollView component.
Easy Chat UI
<View
style={{
flexDirection: "row",
alignItems: "center",
marginVertical: 10,
marginHorizontal: 10,
}}
>
<TextInput
style={{flex: 1, borderWidth: 1}}
value={userInput}
onChangeText={setUserInput}
placeholder="Type your message here..."
/>
<TouchableOpacity
onPress={handleSendMessage}
style={{backgroundColor: "#007AFF"}}
>
    <Text style={{ color: "white" }}>Send</Text>
  </TouchableOpacity>
</View>
<ScrollView>
  {conversation.map((msg, index) => (
    <Text style={{marginVertical: 10}} key={index}>{msg.content}</Text>
  ))}
</ScrollView>
If everything is implemented correctly, we should be able to send messages to the model and see the conversation in the ScrollView component. It's not beautiful of course, but it's a good start; we'll improve the UI later.
The result should look like this:
Note: For the complete code up to this point, check the
fourth_checkpoint branch in the EdgeLLMBasic folder here
The UI & Logic
Now that we have the core functionality implemented, we can focus on the UI. The UI is simple, consisting of a model selection screen with a list of models and a chat interface that includes a conversation history and a user input field. During the model download phase, a progress bar is displayed. We intentionally avoid adding many screens to keep the app simple and focused on its core functionality. To keep track of which part of the app is being used, we'll use another state variable called currentPage; it will be a string that can be either modelSelection or conversation. We add it to the App.tsx file.
const [currentPage, setCurrentPage] = useState<
'modelSelection' | 'conversation'
>('modelSelection');
For the CSS we'll use the same styles as in the EdgeLLMBasic repo; you can copy the styles from there.
We will start by working on the model selection screen in the App.tsx file. We'll add a list of model formats (you need to do the necessary imports and delete the previous code in the SafeAreaView component we used for testing):
Model Selection UI
<SafeAreaView style={styles.container}>
  <ScrollView contentContainerStyle={styles.scrollView}>
    <Text style={styles.title}>Llama Chat</Text>
    {/* Model Selection Section */}
    {currentPage === 'modelSelection' && (
      <View style={styles.card}>
        <Text style={styles.subtitle}>Select a model format</Text>
        {modelFormats.map(format => (
          <TouchableOpacity
            key={format.label}
            style={[
              styles.button,
              selectedModelFormat === format.label && styles.selectedButton,
            ]}
            onPress={() => handleFormatSelection(format.label)}>
            <Text style={styles.buttonText}>{format.label}</Text>
          </TouchableOpacity>
        ))}
      </View>
    )}
  </ScrollView>
</SafeAreaView>
We use SafeAreaView to ensure that the app is displayed correctly on devices with different screen sizes and orientations, as in the previous section, and we use ScrollView to allow the user to scroll through the model formats. We use modelFormats.map to map over the modelFormats array and display each model format as a button whose style changes when the format is selected. We also use the currentPage state to display the model selection screen only when currentPage is set to modelSelection; this is done with the && operator. The TouchableOpacity component allows the user to select a model format by pressing on it.
Now let's define handleFormatSelection in the App.tsx file:
const handleFormatSelection = (format: string) => {
setSelectedModelFormat(format);
setAvailableGGUFs([]);
fetchAvailableGGUFs(format);
};
We store the selected model format in the state and clear the previous list of GGUF files from other selections, and then we fetch the new list of GGUF files for the selected format.
The screen should look like this on your device:
Next, let's add the view to show the list of GGUF files already available for the selected model format; we'll add it below the model format selection section.
Available GGUF Files UI
{selectedModelFormat && (
  <View>
    <Text style={styles.subtitle}>Select a .gguf file</Text>
    {availableGGUFs.map((file, index) => (
      <TouchableOpacity
        key={index}
        style={[
          styles.button,
          selectedGGUF === file && styles.selectedButton,
        ]}
        onPress={() => handleGGUFSelection(file)}>
        <Text style={styles.buttonTextGGUF}>{file}</Text>
      </TouchableOpacity>
    ))}
  </View>
)}
We only show the list of GGUF files if the selectedModelFormat state isn't null, meaning a model format has been selected by the user.
We need to define handleGGUFSelection in the App.tsx file as a function that triggers an alert to confirm the download of the selected GGUF file. If the user clicks Yes, the download starts; otherwise the selected GGUF file is cleared.
Confirm Download Alert
const handleGGUFSelection = (file: string) => {
setSelectedGGUF(file);
Alert.alert(
'Confirm Download',
`Do you want to download ${file}?`,
[
{
text: 'No',
onPress: () => setSelectedGGUF(null),
style: 'cancel',
},
{text: 'Yes', onPress: () => handleDownloadAndNavigate(file)},
],
{cancelable: false},
);
};
const handleDownloadAndNavigate = async (file: string) => {
await handleDownloadModel(file);
setCurrentPage('conversation');
};
handleDownloadAndNavigate is a simple function that downloads the selected GGUF file by calling handleDownloadModel (implemented in the previous sections) and navigates to the conversation screen after the download is complete.
Now, after clicking on a GGUF file, we should see an alert to confirm or cancel the download:
We can add a simple ActivityIndicator to the view to display a loading state while the available GGUF files are being fetched. For that we need to import ActivityIndicator from react-native and define isFetching as a boolean state variable that is set to true at the start of the fetchAvailableGGUFs function and back to false when the function finishes, as you can see here in the code, and add the ActivityIndicator to the view just before the {availableGGUFs.map((file, index) => (...))}.
{isFetching && (
<ActivityIndicator size="small" color="#2563EB" />
)}
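The isFetching flag itself is just a boolean state wrapped around the existing fetch logic; a minimal sketch:

const [isFetching, setIsFetching] = useState<boolean>(false);

const fetchAvailableGGUFs = async (modelFormat: string) => {
  setIsFetching(true);
  try {
    // ... the existing axios request and setAvailableGGUFs logic from before ...
  } finally {
    // always reset the flag, whether the request succeeded or failed
    setIsFetching(false);
  }
};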
The app should look like this for a brief moment while the GGUF files are being fetched:
Now we should be able to see the different GGUF files available for each model format when we click on it, and we should see a confirmation alert when clicking on a GGUF file.
Next we need to add the progress bar to the model selection screen. We can do this by importing the ProgressBar component from src/components/ProgressBar.tsx in the App.tsx file as we did before, and adding it to the view just after the {availableGGUFs.map((file, index) => (...))} to display the progress bar while the model is being downloaded.
Download Progress Bar
{isDownloading && (
  <View style={styles.card}>
    <Text style={styles.subtitle}>Downloading: </Text>
    <Text style={styles.subtitle2}>{selectedGGUF}</Text>
    <ProgressBar progress={progress} />
  </View>
)}
The download progress bar will now be positioned at the bottom of the model selection screen. However, this means that users might have to scroll down to view it. To address this, we'll modify the display logic so that the model selection screen is only shown when the currentPage state is set to 'modelSelection' and there is no ongoing model download.
{currentPage === 'modelSelection' && !isDownloading && (
<View style={styles.card}>
    <Text style={styles.subtitle}>Select a model format</Text>
...
After confirming a model download, we should have a screen like this:
Note: For the complete code up to this point, check the
fifth_checkpoint branch in the EdgeLLMBasic folder here
Now that we have the model selection screen, we can start working on the conversation screen with the chat interface. This screen will be displayed when currentPage is set to conversation. We will add a conversation history and a user input field to the screen. The conversation history will be displayed in a scrollable view, and the user input field will sit at the bottom of the screen, outside the scrollable view so it stays visible. Each message will be displayed in a different color depending on the role of the message (user or assistant).
We need to add the view for the conversation screen just below the model selection screen:
Conversation UI
{currentPage == 'conversation' && !isDownloading && (
  <View style={styles.chatContainer}>
    <Text style={styles.greetingText}>
      🦙 Welcome! The Llama is ready to chat. Ask away! 🎉
    </Text>
    {conversation.slice(1).map((msg, index) => (
      <View key={index} style={styles.messageWrapper}>
        <View
          style={[
            styles.messageBubble,
            msg.role === 'user'
              ? styles.userBubble
              : styles.llamaBubble,
          ]}>
          <Text
            style={[
              styles.messageText,
              msg.role === 'user' && styles.userMessageText,
            ]}>
            {msg.content}
          </Text>
        </View>
      </View>
    ))}
  </View>
)}
We use different styles for the user messages and the model messages, and we use conversation.slice(1) to remove the first message from the conversation, which is the system message.
We can now add the user input field at the bottom of the screen and the send button (they should not be inside the ScrollView). As mentioned before, we'll use the handleSendMessage function to send the user message to the model and update the conversation state with the model response.
Send Button & Input Field
{currentPage === 'conversation' && (
<View style={styles.inputContainer}>
<TextInput
style={styles.input}
placeholder="Type your message..."
placeholderTextColor="#94A3B8"
value={userInput}
onChangeText={setUserInput}
/>
<View style={styles.buttonRow}>
<TouchableOpacity
style={styles.sendButton}
onPress={handleSendMessage}
disabled={isGenerating}>
<Text style={styles.buttonText}>
{isGenerating ? 'Generating...' : 'Send'}
        </Text>
      </TouchableOpacity>
    </View>
  </View>
)}
When the user clicks on the send button, the handleSendMessage function is called and the isGenerating state is set to true. The send button is then disabled and its text changes to 'Generating…'. When the model finishes generating the response, the isGenerating state is set back to false and the text changes back to 'Send'.
Note: For the complete code up to this point, check the
main branch in the EdgeLLMBasic folder here
The conversation page should now look like this:
Congratulations, you've just built the core functionality of your first AI chatbot; the code is available here! You can now start adding more features to the app to make it more user friendly and efficient.
The Other Functionalities
The app is now fully functional: you can download a model, select a GGUF file, and chat with the model, but the user experience isn't the best. In the EdgeLLMPlus repo, I've added some other features, like generation on the fly, automatic scrolling, inference speed tracking, displaying the thought process of models like DeepSeek-Qwen-1.5B, and more. We won't go into full detail here as it would make the blog too long; we'll go through some of the ideas and how to implement them, but the complete code is available in the repo.
Generation on the fly
The app generates responses incrementally, producing one token at a time rather than delivering the entire response in a single batch. This approach enhances the user experience, allowing users to start reading the response as it is being formed. We achieve this by using a callback function inside context.completion, which is triggered after each token is generated, enabling us to update the conversation state accordingly.
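Here is a sketch of the idea (not the repo's exact code): llama.rn's context.completion accepts a second argument, a partial-completion callback that receives each new token, which we use to keep updating the last assistant message:

// Append an empty assistant message first, then fill it token by token.
setConversation(prev => [...prev, {role: 'assistant', content: ''}]);
let assistantMessage = '';

const result = await context.completion(
  {
    messages: newConversation,
    n_predict: 10000,
    stop: stopWords,
  },
  (data: {token: string}) => {
    // Called for each generated token; rebuild the last message with the text so far.
    assistantMessage += data.token;
    setConversation(prev => {
      const updated = [...prev];
      updated[updated.length - 1] = {role: 'assistant', content: assistantMessage};
      return updated;
    });
  },
);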
Auto Scrolling
Auto scrolling ensures that the most recent messages or tokens are always visible to the user by automatically scrolling the chat view to the bottom as new content is added. To implement it, we use a reference to the ScrollView to allow programmatic control over the scroll position, and we call the scrollToEnd method to scroll to the bottom of the ScrollView when a new message is added to the conversation state. We also define an autoScrollEnabled state variable that is set to false when the user scrolls up more than 100px from the bottom of the ScrollView.
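A minimal sketch of that logic, to be placed inside the App component (names are illustrative, not the repo's exact ones):

import {useRef} from 'react';
import {ScrollView, NativeSyntheticEvent, NativeScrollEvent} from 'react-native';

const scrollViewRef = useRef<ScrollView>(null);
const [autoScrollEnabled, setAutoScrollEnabled] = useState<boolean>(true);

// Disable auto-scroll when the user scrolls up more than ~100px from the bottom.
const handleScroll = (event: NativeSyntheticEvent<NativeScrollEvent>) => {
  const {contentOffset, contentSize, layoutMeasurement} = event.nativeEvent;
  const distanceFromBottom =
    contentSize.height - contentOffset.y - layoutMeasurement.height;
  setAutoScrollEnabled(distanceFromBottom < 100);
};

// Call this whenever the conversation changes (e.g. from a useEffect on conversation).
const scrollToBottom = () => {
  if (autoScrollEnabled) {
    scrollViewRef.current?.scrollToEnd({animated: true});
  }
};

// <ScrollView ref={scrollViewRef} onScroll={handleScroll} scrollEventThrottle={16}>...</ScrollView>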
Inference Speed Tracking
Inference speed tracking is a feature that tracks the time taken to generate each token and displays it under each message generated by the model. This feature is easy to implement because the CompletionResult object returned by the context.completion function contains a timings property, a dictionary with many metrics about the inference process. We can use the predicted_per_second metric to track the speed of the model.
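For example (a sketch reusing the variables from handleSendMessage and assuming the timings field named above), once a completion finishes you can read the speed directly from the result:

const result = await context.completion({
  messages: newConversation,
  n_predict: 10000,
  stop: stopWords,
});
// predicted_per_second is the decoding speed in tokens per second
const tokensPerSecond = result?.timings?.predicted_per_second;
console.log(`Generation speed: ${tokensPerSecond?.toFixed(1)} tokens/s`);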
Thought Process
The thought process is a feature that displays the reasoning of models like DeepSeek-Qwen-1.5B. The app identifies special tokens like <think> and </think> to handle the model's internal reasoning or "thoughts." When a <think> token is encountered, the app enters a "thought block" where it accumulates tokens that represent the model's reasoning. Once the closing </think> token is detected, the collected thought is extracted and associated with the message, allowing users to toggle the visibility of the model's reasoning. To implement this we add thought and showThought properties to the Message type: message.thought stores the reasoning of the model, and message.showThought is a boolean that is set to true when the user clicks on the message to toggle the visibility of the thought.
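A simplified sketch of the parsing step (the repo handles this incrementally during streaming; here we just split a finished response, assuming the <think>...</think> delimiters mentioned above):

// Split a response that may contain a <think>...</think> block into reasoning and answer.
const parseThought = (text: string): {thought: string; content: string} => {
  const match = text.match(/<think>([\s\S]*?)<\/think>/);
  if (!match) {
    return {thought: '', content: text.trim()};
  }
  return {
    thought: match[1].trim(),
    content: text.replace(match[0], '').trim(),
  };
};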
Markdown Rendering
The app uses the react-native-markdown-display package to render markdown in the conversation. This package allows us to render code in a better format.
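Usage is straightforward; a sketch of a small wrapper that could replace the plain Text used for assistant messages (assuming the package's default Markdown component):

import React from 'react';
import Markdown from 'react-native-markdown-display';

// Render the message content as markdown so code blocks, lists, etc. are formatted nicely.
const MessageContent = ({content}: {content: string}) => (
  <Markdown>{content}</Markdown>
);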
Model Management
We added a checkDownloadedModels function to the App.tsx file that checks whether the model is already downloaded on the device; if it isn't, we download it, and if it is, we load it into the context directly. We also added some elements to the UI to show whether a model is already downloaded or not.
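A minimal sketch of that check, assuming the same Documents-folder path used for downloads (the repo's version may differ):

import RNFS from 'react-native-fs';

// Returns true if the GGUF file is already present in the app's Documents folder.
const checkDownloadedModels = async (file: string): Promise<boolean> => {
  const destPath = `${RNFS.DocumentDirectoryPath}/${file}`;
  return RNFS.exists(destPath);
};

// Usage: if already downloaded, load it directly; otherwise download first.
// (await checkDownloadedModels(file)) ? await loadModel(file) : await handleDownloadModel(file);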
Stop/Back Buttons
We added two important buttons to the UI: the stop button and the back button. The stop button stops the generation of the response and the back button navigates to the model selection screen. For that, we added a handleStopGeneration function to the App.tsx file that stops the generation of the response by calling context.stop and sets the isGenerating state to false. We also added a handleBack function to the App.tsx file that navigates to the model selection screen by setting the currentPage state to modelSelection.
5. How to Debug
Chrome DevTools Debugging
For debugging we use Chrome DevTools, as in web development:
1. Press j in the Metro bundler terminal to launch Chrome DevTools
2. Navigate to the "Sources" tab
3. Find your source files
4. Set breakpoints by clicking on line numbers
5. Use debugging controls (top right corner):
- Step Over – Execute current line
- Step Into – Enter function call
- Step Out – Exit current function
- Continue – Run until next breakpoint
Common Debugging Tips
- Console Logging
console.log('Debug value:', someValue);
console.warn('Warning message');
console.error('Error details');
This will log the output in the Chrome DevTools console.
- Metro Bundler Issues
If you encounter issues with the Metro bundler, you can try clearing the cache first:
npm start --reset-cache
- Build Errors
cd android && ./gradlew clean
cd ios && pod install
6. Additional Features We Can Add
To enhance the user experience, we can add some features like:
- Model Management:
- Allow users to delete models from the device
- Add a feature to delete all downloaded models from the device
- Add a performance tracking feature to the UI to track memory and CPU usage
- Model Selection:
- Allow users to search for a model
- Allow users to sort models by name, size, etc.
- Show the model size in the UI
- Add support for VLMs
- Chat Interface:
- Display the code in color
- Math Formatting
I'm sure you can think of some really cool features to add to the app; feel free to implement them and share them with the community 🤗
7. Acknowledgments
I would like to thank the following people for reviewing this blog post and providing useful feedback:
Their expertise and suggestions helped improve the quality and accuracy of this guide.
8. Conclusion
You now have a working React Native app that can:
- Download models from Hugging Face
- Run inference locally
- Provide a smooth chat experience
- Track model performance
This implementation serves as a foundation for building more sophisticated AI-powered mobile applications. Remember to consider device capabilities when choosing models and tuning parameters.
Happy coding! 🚀

