AIKit and modAI: building a foundation for AI in MODX

This is a long stream-of-consciousness post reflecting on the last month or two working on AIKit and talking with John and Ryan at MODX about their modAI extra. Here's a TL;DR summary:

I've been integrating AI into MODX by developing AIKit, an AI Assistant interface with support for Tools and Vector Databases, to bring actual authoritative knowledge to it and perform actions directly in MODX, like creating resources. The goal is to provide a foundation that all sorts of AI implementations can build upon, without having to reinvent the wheel. Around the same time, MODX released modAI, which focuses primarily on in-context actions (e.g. clicking a button next to the summary field to generate a summary). I've been talking with them, and collaborating with John at the recent Hackathon, to find ways to combine our efforts into a single tool. We've now defined a path forward: taking modAI and John's work on it as the starting point, and collaborating on bringing AIKit's knowledge and integrations into that.

Earlier this year, I started on a little side project to bring AI into MODX.

Like many people, I was impressed by the speed at which AI was improving. While I had been skeptical before, it was becoming part of my day-to-day work: first helping with writing and editing content, and slowly becoming a more ingrained part of development too.

A lot of my content writing and editing happens directly in MODX, and going back and forth between different websites was getting cumbersome. So why not just incorporate their APIs directly?

And thus my adventure into AI began.

We have a range of extras that could benefit from having large language model support built in, and we've seen people build some of these already over the last months. Redactor, obviously, to help write, edit, and improve text. ContentBlocks, to write alt text for images or even fill out a full skeleton from a single prompt. Perhaps even generating images in MoreGallery.

All of these features individually are pretty simple to build. Send request, get a magical response. Little integrations like these have been popping up left, right, and center.

What became very clear very early to me is that we need to stop reinventing the wheel and start working together. Rather than building three standalone integrations that all look different, each sending their own requests to a model that may or may not support certain features, and may or may not have access to certain information about your site and content... we need a single foundation that those use cases (and many more) can easily hook into and share.

You see, AI (especially generative AI that you task with coming up with new things) is only as good as what you feed it. It's why a good prompt can make the difference between a mediocre and a great result.

But the prompt alone is not enough. It needs more context, authoritative knowledge about your products, services, or whatever your site is about.

If I'm writing an article about ContentBlocks, it needs to have an actual understanding of ContentBlocks to be useful and save me time. If it doesn't have that knowledge - it's just going to imagine random things, and I'll end up needing to rewrite all of it. 

Learning the lingo

The basics of implementing AI are easy. You make an API request and get a response that's like magic. The allure of a sparkle button that magically fixes everything is enticing, but it is also naive.

Once you start digging deeper, and you want to make your implementation *know things*, things get more complex. 

Terms like RAG, function calling, embeddings, vector databases, MCP... understanding the AI landscape and what all those terms mean and how they affect your usage is a journey - I'm still very much on that journey myself.

And then when you think you figured it out, everything suddenly changes with the release of a new model or concept. It's a wild ride to try and keep up.

Introducing AIKit for MODX

Trying to build a solid foundation for AI in MODX led me to build AIKit.

The goal for AIKit is to provide an assistant experience in MODX that is both knowledgeable and capable of taking actions for you. Accessible from both a button in the menu, and in-context, such as within the resource content. 

AIKit supports different LLM providers (OpenAI, Gemini, Claude, etc.) and lets you configure those for your site to ensure consistent language and tone. I also wanted to build it in such a way that it feeds the model information about your site, giving it that precious context that makes the difference between the model hallucinating and it actually saving you time. Automatically, and without a massive barrier to entry: AI like this should be available to the masses, not just the 1% of big agency sites.

It supports function calling, or tools, in an extendable way. That means the assistant can search for relevant resources and access that information to teach it about your site, services, products, and anything else it needs to know. It can then use that context to create resources, send a draft newsletter to MailChimp, or power whatever other integration you can dream up.
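To make the idea of function calling more concrete, here's a minimal sketch of what a tool definition can look like, following the OpenAI-style tools schema. The `create_resource` name and its parameters are invented for illustration, not AIKit's actual API:

```python
# Illustrative only: an OpenAI-style tool definition for a hypothetical
# "create_resource" action. The model decides when to call this tool and
# with which arguments; the integration then executes it in MODX and
# feeds the result back into the conversation.
create_resource_tool = {
    "type": "function",
    "function": {
        "name": "create_resource",
        "description": "Create a new resource (page) in MODX.",
        "parameters": {
            "type": "object",
            "properties": {
                "pagetitle": {"type": "string", "description": "Title of the new resource"},
                "content": {"type": "string", "description": "HTML body content"},
                "parent": {"type": "integer", "description": "ID of the parent resource"},
            },
            "required": ["pagetitle", "content"],
        },
    },
}

print(create_resource_tool["function"]["name"])
```

The extendability comes from the fact that each tool is just a schema plus a handler, so an extra can register its own tools without knowing anything about the chat UI or the model provider.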

At this point in the development, we were a few weeks before the SnowUp that happened the weekend of February 27th. The main topic there was going to be AI, and I set myself the goal of providing a functional proof of concept, while leaving enough of it open to continue shaping it there with the people in attendance. I knew very well that I had only scratched the surface, and was hoping to learn from people's experiences there, especially Sterc, who have been digging deep into this subject for a while now.

The SnowUp... and modAI

A few days before the SnowUp, I open sourced the GitHub repository for AIKit. It had come up in several conversations with people at that point and the proof of concept is where I wanted it to be for the SnowUp, so that was the right time to share it. 

On that very same day (I saw it three hours later), Ryan announces modAI: an AI tool for MODX.

Ignoring marketing hyperbole and general excitement about the potential of AI, on its first launch it is certainly a slick tool featuring support for various LLM providers, with a focus on those in-context actions: clicking a little "sparkle" button on a summary to automatically generate it, creating images from an image TV, that sort of thing.

Definitely worth looking into, but at this point I'm packing for my trip to Switzerland to show AIKit and hopefully learn from others there how to make it smarter.

I'm able to take a cursory look at the modAI code and drop them a note to share what I've done with AIKit. John (the developer on modAI) and I chat a bit about what we're up to, but due to conflicting travel plans and illness on both sides, finding a date for a proper call about it all takes a little while.

At the SnowUp, the team from Sterc in particular proves instrumental to my understanding of the more complicated parts of AI.

My initial approach of using function calling works great for obvious things, but the AI doesn't know what it can ask for. As a demo, asking it to "look up the weather in our office locations" works great when you add information that the office location can be found in the "about me" page, but if that page is actually called "about us", it fails.

Basically, the AI still doesn't have enough context. It needs to be able to understand the site more.

Over the weekend, through presentations, demos, and one-on-one collaboration, the Sterc team helps me get a better understanding of vector databases in particular, and how those can be used to feed the AI. Their many months of building and experimenting with complicated systems for their clients prove invaluable for learning what works, and what doesn't.

In short, a vector database is a type of index based on vectors: numeric representations of natural language produced by embedding models. Taking the example of the AI looking for "about me" vs "about us", the vectors representing those terms would be really close together, indicating they're probably related. And because numbers are quick to index and look up in a database, a search for the vector representing "about me" can tell us very quickly that the page "about us" is statistically very likely to be relevant. And of course we don't just index the page title; we index the full content.

So using a vector database we can take the user's prompt and, automatically and in their natural language, identify relevant resources, products, or whatever else we stored in the vector database. We can then include the information from those pages in the prompt we send to the AI model, giving it the precious authoritative knowledge it needs to generate accurate content or answer questions for us.

By the end of the SnowUp, AIKit supports Pinecone as a vector database source (through an interface other sources could implement), as well as some cool new features like creating resources from the assistant. I also published a first packaged release to make testing it easier, and hearing the feedback from people trying it out was amazing. Translating resources, building out an entire sitemap from a single prompt: people were already finding ways to use it I hadn't even imagined yet.

Now, about that whole modAI thing...

One foundation, all the implementations

AIKit is far from done at this point.

It's a great proof of concept and thanks to the collaborations at the SnowUp it has gotten closer to that mission of providing a smart foundation for various implementations to use.

It still lacks some key features that were part of the vision though. The chat assistant can look up relevant context and take actions for you, but it doesn't yet have image support (though Sinisa worked on a concept for that), it doesn't stream responses (which gives users feedback as the response is generated), and it especially lacks the ability to be triggered from different places in the manager and then put the result back for you, like modAI. I started working on that after my return from Switzerland.

John and I start chatting a little more at this point, sharing our thoughts and ideas on where we are and where we're going with our respective tools. Learning from each other and seeing if there's a way to combine our efforts.

With AIKit bringing knowledgeable AI while modAI brings contextual actions, neither of us really wants to reinvent what the other has been working on. We explicitly discuss this during a call with Ryan too, but at this stage I'm still struggling to see how to bridge the differences to get the best of both worlds. They're too different.

Fast forward just a few weeks to the Hackathon hosted by Sterc, that both John and I attend. 

Hackathon 

By the time we sit down together at the hackathon, the differences between modAI and AIKit have gotten much smaller. Much of the credit for that goes to John's continued work on modAI, which definitely took the vision for AIKit on board.

At this point modAI has gotten multi-step prompting to refine results in an assistant-like experience. John shows me his first implementation of function calling. And we talk quite extensively about what it would take to bring in that additional context and capability that sets AIKit apart, and how we should move forward. 

By lunch, it feels like we're starting to get somewhere.

At this point, modAI has the best UI, features image generation and vision support that I haven't even touched on in AIKit, and also supports native streaming mode across OpenAI, Gemini, and Claude. Adding those things to AIKit is of course possible, but not particularly straightforward, and it would end up duplicating a lot of the work John had already done.

But looking at it from the other perspective, the features that set apart AIKit are beginning to look much more attainable in modAI. Saving chats to the database to track token usage and be able to continue a conversation after a refresh. Function calling for retrieving and performing actions is almost there. And we figured out a way to add the additional context from vector databases or other tooling into modAI's architecture, while preserving its streaming functionality.

To me that is the most important part of AIKit: getting that additional authoritative knowledge into the assistant so it won't just generate imaginary content, but actually understands what you're asking for. And the path to add that to modAI is now clear.

Moving forward

Prior to the hackathon, I was thinking we'd probably end up needing to build a new third project (modAIKit!) to merge our efforts into a shared architecture somehow. We even started the Hackathon by asking AI to analyse the two projects and suggest ways to merge them.

But seeing John's progress and talking extensively about how we could tackle the more complex challenges, joining forces on modAI just makes the most sense. modAI is closer to accomplishing AIKit's goals and vision than AIKit is to accomplishing modAI's. And starting from scratch doesn't really make sense, either.

It's hard to let go of AIKit, but in the interest of giving the community a single foundation to build their AI implementations on top of, porting AIKit's defining features to modAI is just objectively the most effective way forward.

It's not quite there yet, and while its current functions work as expected, "production ready" is far from the term I'd use while there's still lots of work to be done. But massive kudos to John for getting it closer, every day (in the tools branch, if you want to take a look).

I'm excited to contribute to it directly, share what I've been learning, and collaborate on it more and more. AI is moving so fast that we absolutely need to collaborate and learn from each other.

To anyone who has been working with AI and is considering building an AI integration: I definitely invite you to join us working on modAI, and to learn about the ways we envision it being usable for all sorts of implementations in the future. Whether you need a one-shot prompt-response, an in-context action, or global assistant functionality... let's collaborate!

Let's stop reinventing the wheel, and instead collaborate on building the rest of the car together. The potential is out there waiting to be grabbed.

So what about AIKit?

For now, I will leave AIKit available for download, as it still serves its purpose as a globally available AI assistant, something modAI does not yet have released.

Definitely play with it if you haven't! It's fun, and exciting to see the assistant do work for you while also actually knowing things about you and your site! It truly shows the difference between an AI implementation that needs to make things up, and one that can actually use authoritative knowledge and run actions to save you time.

AIKit won't receive much in terms of updates, as I'll be focusing whatever time I can spare on helping John bring those features to modAI instead. But it's still a great proof of concept and whatever AIKit can do today, modAI will be able to do as well soon. And more.