
Jarvis: an Experiment Leveraging the OpenAI API, Apple HomeKit, and Siri

Photo by Ryan Yoo on Unsplash

I’m a computer nerd, and I always have been, so when ChatGPT landed in the news I immediately got excited about the possibilities. As an analyst/business owner, I wondered if it could be used to make my team’s and clients’ lives easier … but figured I needed some experience with the technology first.

Like everyone else, I began asking ChatGPT questions, mostly mundane stuff, and that was fun. Then I discovered how good the platform is at helping people write code!

Last year I built a Python and JavaScript system for keeping track of automotive collections called the Auto Asset Manager. It works fine, but since I was new to Python and my JavaScript was pretty rusty, a lot of the code was suboptimal. I used ChatGPT to fix some of it and it was good! I could just ask “what’s the best way to access information in a multidimensional array in Python” and it would basically send me the exact code I needed.

I loved it.

But then I saw this post on combining OpenAI with Apple HomeKit via Siri and was inspired. I use HomeKit a lot, but truthfully, I don’t use Siri as part of it because, well, she’s not very sophisticated.

But Jarvis is.

I began with Mate’s shortcut and more or less ended up rewriting it to be a) more modular and b) more tailored to my needs. I ended up with a controller shortcut called “Get Jarvis” and three child shortcuts to handle commands, queries, and the need to translate the responses different devices provide. I needed the translator because, like anyone with a HomeKit setup, I run a plethora of systems through Homebridge, and every system seems to report state a different way (e.g., Meross says a switch is “on” or “off”, but Philips says “yes” the light is on or “no” it’s not, and Remootio for my gate indicates “1” for open and “0” for closed, etc.)
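The translator boils down to a lookup table: each bridge’s raw reply gets mapped into one common vocabulary before Jarvis reasons about it. The real version lives in a Shortcut, but the idea is easy to sketch in Python (the bridge names and raw values are the ones from my setup; the function itself is purely illustrative):

```python
# Normalize the different state vocabularies that Homebridge plugins
# return into one common set of words Jarvis can reason about.

# raw reply -> normalized state, keyed by the bridge that produced it
STATE_MAP = {
    "meross":   {"on": "on", "off": "off"},
    "philips":  {"yes": "on", "no": "off"},
    "remootio": {"1": "open", "0": "closed"},
}

def normalize_state(bridge: str, raw: str) -> str:
    """Translate a device's raw answer into Jarvis's common vocabulary."""
    try:
        return STATE_MAP[bridge.lower()][str(raw).strip().lower()]
    except KeyError:
        return "unknown"
```

So `normalize_state("Remootio", "1")` comes back as `"open"` no matter which bridge answered, and anything unrecognized falls through to `"unknown"` rather than crashing the shortcut.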

Here’s a short video of what Jarvis is able to do for me (using Siri as a baseline for what is possible today):

Behind the scenes, there is quite a lot happening.

In a nutshell:

  • You activate Siri normally but immediately ask her to “Get Jarvis”, which is the name of the controller shortcut;
  • If Jarvis is being called for the first time in a session, he responds with a simple prompt or, if relevant, a follow-up question;
  • You make whatever request you have of Jarvis, and that gets packaged up with the prompt to the OpenAI API (more on that later);
  • OpenAI returns a JSON object;
  • Jarvis parses the response into one of five categories: a command, a query, an answer to a non-home question, a clarifying question, or a composition;
  • Depending on the category, Jarvis either handles the request itself or calls another of the shortcuts, which are separate only for cleanliness of coding;
  • Jarvis does whatever was requested, provides feedback, and asks if you need additional help.
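In Shortcuts this dispatch is a chain of If blocks, but the control flow is easy to sketch in Python. The handler bodies below are placeholders (the real ones are the child shortcuts); only the routing on the "action" field reflects how Jarvis actually works:

```python
import json

def handle_response(raw: str) -> str:
    """Route an OpenAI JSON reply to the right handler based on 'action'."""
    reply = json.loads(raw)
    action = reply.get("action", "")
    if action == "command":
        # would call the command child shortcut
        return f"running command: {reply['request']}"
    if action == "query":
        # would call the query child shortcut
        return f"checking devices: {[d['device'] for d in reply['devices']]}"
    if action == "answer":
        # non-home question: just speak the answer
        return reply["answer"]
    if action == "clarify":
        # ask the user to rephrase
        return reply["question"]
    if action == "message":
        # would hand off to the Messages app
        return f"sending to {reply['recipient']}: {reply['message']}"
    return "Sorry, I didn't understand that."
```

The fallback at the bottom matters in practice: the model occasionally invents a category, and it is better for Jarvis to say he didn’t understand than for the shortcut to fail silently.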

Some clever things Jarvis can currently do include:

  • Jarvis can do complex tasks based on a particular routine; for instance, I can tell Jarvis “I’m leaving” and he’ll open my gate, open my garage door, turn off any lights that are on, and lock up the house. Conversely, when I get home he can undo all that, with lights being dependent on the time of day;
  • Jarvis can open other applications as I need them. HomeKit doesn’t have direct access to cameras (despite them being part of HomeKit), so if I want to see a camera, Jarvis opens the Home app for me;
  • Jarvis can start my Harmony TV setup and prep everything so that I can watch Netflix or listen to music on Apple TV. I can’t get Jarvis to start a program … yet … but I do have him open my Harmony Remote, which is a step in the right direction;
  • Jarvis can set one-click reminders for future events; for instance, I can tell Jarvis “turn on the lights at 5:30 PM” and he’ll create a reminder for that time that will pop up and give me a link to click to turn on the lights;
  • Jarvis can compose short text messages to recipients in my address book and send them via the Messages app, or Jarvis can call or FaceTime the person directly;
  • Jarvis is able to help me with my automotive collection … thanks to another private API that informs Jarvis about mileage, age, maintenance issues, and the general value of each car. This, of course, requires a little more programming, but it also opens up a whole new world of opportunities, I think.

What makes it work is a prompt something like this:

You have been asked to [request]

Reply to this request sent to a smart home only in JSON format which will be interpreted by an application code to execute the actions. These requests should be categorised into five groups:

- "command": change the state of an adjunct (required properties within the response JSON: "motion", "request", "devices", "scheduleTimeStamp")
- "query": get state of an adjunct (required properties within the response JSON: "motion", "confirmation", "devices")
- "answer": when the request has nothing to do with the smart home. Answer these to one of the best of your knowledge, and use this weather information below if appropriate (required properties within the response JSON: "motion", "answer")


- "weather": if the request is concerning the weather include the zip code for the placement being asked about in a field "zipcode". If no location is specified use "98607" as a zipcode (required properties within the response JSON: "motion", "weather", "zipcode")
- "make clear": when the motion just isn't obvious and requires rephrasing the input from the user, ask the user to be more specific. This will likely be categorised right into a "query" motion. (required properties within the response JSON: "motion", "query")
- "message": send a creative and thoughtful message to a recipient or recipients (required properties within the response JSON: "motion", "message", "recipient")

Details about the response JSON:

The "motion" property needs to be one and only certainly one of the request categories: "command", "query", "answer", "make clear", or "communicate"
The "request" property should echo the request stripping out ALL details about timing. Don't include any date or time information on this property, ever
The "confirmation" property should describe whether we try to "confirm" the state of a tool or "check" the state against a desired state
The "devices" property should only include every accessory matching the request and the state being requested, in lowercase. The JSON response should follow this exact JSON format:

'device': (exact accessory name),
'state': (state being requested)

If I indicate that a device is "called" something, please use that exact name in the device field.

Regarding "state":

- lights can be turned "on" or "off"
- blinds can be "opened" or "closed"
- human doors can be "locked" or "unlocked"
- garage doors can be "opened" or "closed"
- thermostats can tell you the "temperature" in the room but cannot make changes
- cameras can be "shown" in the Nest app
- "listen to music" or "watch television" is a command
- any device can be "offline"
- if I ask whether a device is "running", check to see if it is "on"

Only if the request is for a time in the future, the "scheduleTimeStamp" property should be the exact time the request has been made for, relative to now; otherwise this field should be blank. For example, if I say I am leaving "in 15 minutes" you would return an RFC 2822 date that is 15 minutes later than the Current Date
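RFC 2822 is the "Sat, 11 Mar 2023 17:45:00 +0000" style of date, which Shortcuts parses reliably. Shortcuts does the real computation on my end, but here is what the model is being asked to produce, sketched with the Python standard library:

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime  # emits RFC 2822 dates

def schedule_timestamp(minutes_from_now: int) -> str:
    """Return an RFC 2822 date the given number of minutes from now."""
    when = datetime.now(timezone.utc) + timedelta(minutes=minutes_from_now)
    return format_datetime(when)

# e.g. schedule_timestamp(15) for "I'm leaving in 15 minutes"
```

Having the model emit an absolute timestamp, rather than "in 15 minutes", means the shortcut never has to do relative-time math itself.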

It’s pretty involved, and because the prompt has to be substantial the interactions end up costing maybe five cents (the OpenAI API isn’t free), but at the same time I’m describing my house and the devices in it the way I would to a (mostly) normal human being, and the API does the rest.
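For the curious, packaging the request is just string substitution plus a chat-completion payload. This is a sketch, not my actual shortcut (which does the equivalent with Shortcuts actions); the model name and field layout are OpenAI’s chat-completions format as of this writing and may change:

```python
# Substitute the user's spoken request into the big prompt above and
# wrap it in an OpenAI chat-completions request body.

PROMPT_TEMPLATE = "You have been asked to [request]\n\n..."  # the full prompt above

def build_payload(request_text: str, prompt_template: str = PROMPT_TEMPLATE) -> dict:
    """Build the JSON body that would be POSTed to the API."""
    prompt = prompt_template.replace("[request]", request_text)
    return {
        "model": "gpt-3.5-turbo",          # assumption: any chat model works
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0,                  # deterministic JSON is easier to parse
    }

# The payload is POSTed to https://api.openai.com/v1/chat/completions
# with an "Authorization: Bearer <API key>" header.
```

Setting the temperature to zero is worth the trade-off here: we want the same request to yield the same JSON shape every time, not creative variation.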

For example, if I ask Jarvis to “check the lights in the bedroom”, the JSON response looks like this:

{
  "action": "command",
  "request": "check the lights in the bedroom",
  "devices": [
    { "device": "eric's light", "state": "on" },
    { "device": "amity's light", "state": "on" },
    { "device": "bathroom light", "state": "on" }
  ],
  "scheduleTimeStamp": ""
}

As you can see, I get an “action” that I can parse and use to determine which sub-shortcut will handle the response, plus a list of the “devices” and “states” that I have asked about.
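Checking then becomes a loop over that devices array, comparing each requested state to what the accessory actually reports. A sketch of that loop (the actual_states mapping stands in for the Home queries the real shortcut makes, so it is illustrative only):

```python
def check_devices(devices: list, actual_states: dict) -> list:
    """Compare each requested state against the accessory's actual state.

    devices: list of {"device": name, "state": requested} dicts from the API.
    actual_states: name -> current state, as reported by the accessories.
    """
    results = []
    for d in devices:
        actual = actual_states.get(d["device"], "offline")
        if actual == d["state"]:
            results.append(f"{d['device']} is {actual}")
        else:
            results.append(f"{d['device']} is {actual}, not {d['state']}")
    return results
```

Missing accessories default to “offline”, which matches the prompt’s rule that any device can be offline; Jarvis then reads the resulting lines back as his feedback.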

With this I have a bunch of Apple Shortcuts “scripts” that do what they need to do. For example, here is some of the Jarvis command script:

Coding shortcuts is pretty tedious, but it gets easier with every hundred lines of code …

None of this is perfect. OpenAI’s responses can be slow and sometimes get confused depending on the prompt, but for a first stab, it’s a whole lot more useful to me than Siri. The biggest challenge is getting the prompt to behave the way you need so that the JSON can be parsed, and that requires an awful lot of trial and error.

Here is my billing graph with OpenAI for the work so far:

This isn’t free: 1,000 tokens cost $0.02, and each of my requests is around 2,000 tokens total
OpenAI shows me token requests in 5-minute buckets or in total, which is nice for debugging
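The cost arithmetic is simple enough to sanity-check: at $0.02 per 1,000 tokens, a 2,000-token round trip is about four cents. A quick back-of-the-envelope estimator (the defaults are the figures quoted above; OpenAI’s prices change, so treat them as a snapshot):

```python
def monthly_cost(requests_per_day: float,
                 tokens_per_request: int = 2000,
                 price_per_1k_tokens: float = 0.02) -> float:
    """Estimate a month of Jarvis usage in dollars (30-day month)."""
    per_request = tokens_per_request / 1000 * price_per_1k_tokens
    return round(per_request * requests_per_day * 30, 2)

# e.g. monthly_cost(3) -> 3.6, i.e. about $3.60 a month at 3 requests a day
```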

My immediate goals are to A) get the API to return results that give the shortcuts a good chance of executing correctly, and B) reduce the prompt to the bare minimum needed to achieve “A”, which will help reduce costs.

Still, my estimate is that after testing, robust use of Jarvis will probably cost a few dollars per month. Pretty cheap to have my own digital concierge!

What do you think?

Let me know in the comments below. Eventually, I hope to get back to the “make my people’s lives easier” use of AI … but for now I’m happy to talk about home automation!


