
MeetCal: The Gap Between Demo and Product

I built an AI scheduling bot in Telegram in a weekend. Then spent eight weeks making it actually work. Here's what the demo didn't teach me.

Ilya Gindin

The demo worked perfectly. I sent “встреча завтра в 4” (“meeting tomorrow at 4”) to the bot, it created the event, the invite went out. I thought I was done. Then my first real user tried “на 4 часа” (“for 4 hours”) and the bot booked a 4pm meeting instead of a 4-hour one.

Intent: Why a Telegram Scheduling Bot

Most of my business conversations happen in Telegram. Someone says “let’s sync this week” and immediately you’re switching to Calendly, copying a link, pasting it back. Three context switches for one 30-minute call.

Telegram has about a billion monthly active users. People open it 21 times a day on average. Mini Apps — little web apps that run inside Telegram without any install — have stabilized at around 150-190 million monthly users after the crypto bubble deflated. That’s a massive native surface for utility tools, but roughly 60% of Mini Apps are still crypto-related. Scheduling felt obvious. Nobody had done it well.

The pitch to myself was simple: book a meeting without leaving Telegram. Share a link in chat, guest picks a time, calendar invite goes out. No new account, no app download. Just a deep link.

First Attempt: The 48-Hour Demo

I scaffolded the whole thing in a weekend: Vite, React 19, an Express backend, grammY for the Telegram bot layer. By day two I had a working Mini App with a timezone picker, booking endpoints that spat out ICS files, and tab navigation with animations.

By January 23rd I had Google Calendar OAuth. A few days later the thing was stable enough to show people. Vibe coding — the thing Karpathy named in early 2025 that became every developer’s workflow by year end — had given me a working product in days. The AI wrote most of the boilerplate. I described what I wanted, it generated it, I patched the edges.

That feeling when a demo works is dangerous. Everything looks solved. The happy path is so clean.

The Gap: Where Everything Broke

The AI scheduling assistant was built on raw fetch() calls to OpenRouter. One massive system prompt. The model had to do everything in a single shot: classify intent, extract entities, parse relative dates, handle ambiguity, format output as valid JSON.
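Roughly what that looked like, as a minimal sketch; the model slug and prompt here are illustrative stand-ins, not the production values:

```typescript
// Sketch of the first architecture: one fetch, one prompt, and the app
// trusts whatever JSON comes back.
const SYSTEM_PROMPT =
  "Classify intent, extract the meeting title, resolve relative dates, " +
  "and reply ONLY with JSON: { title, startAt, durationMinutes }.";

const res = await fetch("https://openrouter.ai/api/v1/chat/completions", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    model: "google/gemini-flash-1.5",
    messages: [
      { role: "system", content: SYSTEM_PROMPT },
      { role: "user", content: "встреча завтра в 4" },
    ],
  }),
});

const data = await res.json();
// The failure mode lives here: nothing stops the model from returning
// startAt: null, or 4am instead of 4pm, and the app runs with it anyway.
const event = JSON.parse(data.choices[0].message.content);
```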

Russian relative dates are a nightmare. “завтра в 4” — tomorrow at 4 — seems simple until you ask: 4am or 4pm? “в 4” in Russian conversational context almost always means 4pm, but the model didn’t know that. I’d get startAt: null back. Or it would pick 4am because it was technically ambiguous.

“на 4 часа” broke everything. That phrase means either “at 4 o’clock” or “for 4 hours” depending on context. In my testing, the model guessed wrong about half the time. Users don’t notice when they’re setting up a meeting — they notice when they show up at the wrong time.

Voice messages made it worse. Telegram lets you send voice notes, and I wanted the bot to handle those too. You transcribe them with Whisper, then parse the text. But transcription errors compound on top of parsing errors: “в четыре” (“at four”) transcribed as “вчетыре” (one word) broke date extraction entirely.
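The transcription step itself is short. A sketch with the OpenAI SDK, assuming the voice note has already been downloaded from Telegram (the file name is a placeholder):

```typescript
import OpenAI from "openai";
import { createReadStream } from "node:fs";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Telegram voice notes arrive as OGG/Opus; Whisper accepts them directly.
const transcription = await openai.audio.transcriptions.create({
  model: "whisper-1",
  file: createReadStream("voice.oga"),
  language: "ru", // hinting the language cuts down on mangled tokens like "вчетыре"
});

// This text still needs all the same date parsing as typed input,
// now with transcription noise layered on top.
const text = transcription.text;
```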

Then there were the timezone issues. A user in Tbilisi books a slot that shows correctly in their calendar but gets an invite with UTC+0 time. Supabase stores everything in UTC, the Mini App reads browser timezone, the bot reads Telegram timezone metadata — three different sources, three potential mismatches. I spent a week just on timezone edge cases.
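The rule that eventually ended the whack-a-mole: persist UTC, keep one explicit IANA zone per user, and convert only at the edges. A sketch with date-fns-tz; the zone values here are illustrative:

```typescript
import { fromZonedTime, formatInTimeZone } from "date-fns-tz";

// One source of truth for the guest's zone, captured once at onboarding.
const guestZone = "Asia/Tbilisi";

// Guest picks "16:00 on Feb 10" in their local zone → persist as UTC.
const utcStart = fromZonedTime("2025-02-10 16:00", guestZone);

// Render in whoever-is-looking's zone, never the server's.
const forGuest = formatInTimeZone(utcStart, guestZone, "yyyy-MM-dd HH:mm zzz");
const forHost = formatInTimeZone(utcStart, "Europe/Moscow", "yyyy-MM-dd HH:mm zzz");
```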

OAuth token refresh was its own thing. Google Calendar tokens expire. If a user sets up their calendar, comes back three weeks later, and books a meeting, the token is dead. The first version just failed silently. No error message. The meeting appeared booked on their end, but nothing appeared on the calendar. That’s the worst kind of bug: it looks fine until it isn’t.
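The fix was to let the googleapis OAuth2 client handle the refresh, persist rotated tokens, and make failures loud. A sketch, with the storage and notification helpers as hypothetical stand-ins:

```typescript
import { google } from "googleapis";

// Hypothetical persistence helpers; swap in your own storage layer.
declare function loadRefreshToken(userId: string): Promise<string>;
declare function saveTokens(userId: string, tokens: object): Promise<void>;
declare function notifyReauthNeeded(userId: string): Promise<void>;

async function insertEvent(userId: string) {
  const oauth2 = new google.auth.OAuth2(
    process.env.GOOGLE_CLIENT_ID,
    process.env.GOOGLE_CLIENT_SECRET,
    process.env.GOOGLE_REDIRECT_URI,
  );
  oauth2.setCredentials({ refresh_token: await loadRefreshToken(userId) });

  // The client auto-refreshes expired access tokens; persist what comes back
  // so a booking three weeks from now still works.
  oauth2.on("tokens", (tokens) => void saveTokens(userId, tokens));

  try {
    const calendar = google.calendar({ version: "v3", auth: oauth2 });
    await calendar.events.insert({
      calendarId: "primary",
      requestBody: {
        summary: "MeetCal: intro call",
        start: { dateTime: "2025-02-10T16:00:00+04:00" },
        end: { dateTime: "2025-02-10T16:30:00+04:00" },
      },
    });
  } catch (err) {
    // The first version swallowed this. Now the user gets told to reconnect.
    await notifyReauthNeeded(userId);
    throw err;
  }
}
```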

None of this showed up in the demo. Because in the demo, I was the user. I knew what the bot expected. I typed clean inputs. Real users don’t do that.

The Pivot: From String Parsing to Tool Calling

The architectural problem was that the model was doing too much. Intent classification, entity extraction, date resolution, output formatting — all in one prompt with one chance to get it right. When it got anything wrong, everything was wrong.

The fix was conceptually simple: split the responsibilities. The model decides what to do. Tools handle how to do it with validated schemas.

I migrated to the Vercel AI SDK with proper tool calling. Instead of asking the model to return a JSON blob with a meeting, I gave it a createEvent tool with a schema where startAt is a required ISO 8601 datetime string. The model can’t call the tool until it has resolved the exact time. If “в 4” is ambiguous, the model has to ask a clarifying question rather than guess.

The difference is enforcement. Before, the model could hallucinate a plausible-looking JSON with startAt: null and the app would try to use it. Now the schema rejects it before anything happens. The model is forced into the right behavior by the structure of the tool.
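Concretely, a sketch of the shape, using the AI SDK v4-era tool API; the booking helper and prompts are mine, not the production code:

```typescript
import { generateText, tool } from "ai";
import { createOpenRouter } from "@openrouter/ai-sdk-provider";
import { z } from "zod";

const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });

// Hypothetical stand-in for the deterministic booking path.
async function bookEvent(title: string, startAt: string, minutes: number) {
  return { ok: true, title, startAt, minutes };
}

const { text } = await generateText({
  model: openrouter("google/gemini-flash-1.5"),
  system:
    "You schedule meetings. If the requested time is ambiguous, " +
    "ask a clarifying question instead of guessing.",
  prompt: "встреча завтра в 4",
  tools: {
    createEvent: tool({
      description: "Create a calendar event once date and time are fully resolved.",
      parameters: z.object({
        title: z.string(),
        startAt: z.string().datetime(), // required ISO 8601; null can't get through
        durationMinutes: z.number().int().positive(),
      }),
      execute: ({ title, startAt, durationMinutes }) =>
        bookEvent(title, startAt, durationMinutes),
    }),
  },
});
// If the model couldn't resolve "в 4", `text` holds its clarifying question
// instead of a tool call.
```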

This is what people mean by “prompt engineering isn’t enough.” At some level of complexity, you need architecture, not better prompts. The tool-calling pattern turned an unreliable assistant into something I could actually test against.

Gemini Flash via OpenRouter handles the reasoning. Fast, cheap, good enough for scheduling intent. The tools handle the parts that need to be deterministic.

What Actually Ships a Product

The AI assistant is maybe 20% of the codebase. The other 80% is the operating system around it.

Payments took two weeks. The pro tier costs 350 Telegram Stars per month, paid in Telegram’s native in-app currency. Integrating Stars means handling webhooks, reconciling payment state with Supabase, gating features, and handling edge cases like failed renewals.
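The core of the Stars flow in grammY, as a sketch; the tier naming and the grant helper are mine:

```typescript
import { Bot } from "grammy";

const bot = new Bot(process.env.BOT_TOKEN!);

// Hypothetical helper that flips the pro flag in Supabase.
async function grantPro(telegramId: number, chargeId: string) {}

// Stars invoices use currency "XTR" and need no payment provider token.
async function sendProInvoice(chatId: number) {
  await bot.api.raw.sendInvoice({
    chat_id: chatId,
    title: "MeetCal Pro",
    description: "One month of MeetCal Pro",
    payload: `pro:${chatId}`, // echoed back on payment, used for reconciliation
    currency: "XTR",
    prices: [{ label: "Pro, 1 month", amount: 350 }],
  });
}

// Telegram requires an answer to every checkout within 10 seconds.
bot.on("pre_checkout_query", (ctx) => ctx.answerPreCheckoutQuery(true));

// Only this update proves money actually moved; state changes happen here.
bot.on("message:successful_payment", async (ctx) => {
  const p = ctx.message.successful_payment;
  await grantPro(ctx.from.id, p.telegram_payment_charge_id);
});

bot.start();
```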

Zoom integration was another week — the OAuth dance, token storage, creating meetings via API, attaching the link to the calendar invite, refreshing tokens when they expire at 3am.
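Creating the meeting itself is one call once the token is fresh. A sketch against Zoom’s Create Meeting endpoint; the meeting fields are illustrative:

```typescript
// Assumes `accessToken` was just refreshed via Zoom's OAuth token endpoint.
async function createZoomMeeting(accessToken: string): Promise<string> {
  const res = await fetch("https://api.zoom.us/v2/users/me/meetings", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${accessToken}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      topic: "MeetCal: intro call",
      type: 2, // 2 = scheduled meeting
      start_time: "2025-02-10T16:00:00Z",
      duration: 30, // minutes
      timezone: "UTC",
    }),
  });
  const meeting = await res.json();
  return meeting.join_url; // this link goes into the calendar invite
}
```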

Then notifications. People need reminders. Cron jobs on the server, checking upcoming meetings, sending Telegram messages 24 hours and 1 hour before. Preference settings so users can toggle what they get. Nobody writes blog posts about cron jobs, but they’re what make a product feel alive.
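The shape is simple, and the idempotency flag is the part that matters. A sketch with node-cron; the meeting model and query helpers are hypothetical:

```typescript
import cron from "node-cron";
import { Bot } from "grammy";

const bot = new Bot(process.env.BOT_TOKEN!);

interface Meeting {
  id: string;
  title: string;
  hostTelegramId: number;
  wantsOneHourReminder: boolean;
  oneHourReminderSent: boolean;
}

// Hypothetical helpers over the meetings table.
declare function meetingsStartingWithin(minutes: number): Promise<Meeting[]>;
declare function markReminderSent(meetingId: string, kind: "24h" | "1h"): Promise<void>;

// Every five minutes: find meetings inside the reminder window and ping once.
cron.schedule("*/5 * * * *", async () => {
  for (const m of await meetingsStartingWithin(60)) {
    if (m.wantsOneHourReminder && !m.oneHourReminderSent) {
      await bot.api.sendMessage(m.hostTelegramId, `"${m.title}" starts in an hour.`);
      await markReminderSent(m.id, "1h"); // never double-send
    }
  }
});
```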

Rescheduling was surprisingly complex: the original slot needs to free up, the new slot checks availability, the calendar invite updates, both parties get notified. Four operations that all need to succeed together or roll back.
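The shape that worked: do the database move first, then compensate if the external calls fail. A sketch with hypothetical helpers throughout, since the calendar and notification calls can’t join a database transaction:

```typescript
// Hypothetical helpers over Supabase and the calendar layer.
declare function getMeeting(id: string): Promise<{
  hostId: string;
  calendarEventId: string;
  startAt: Date;
}>;
declare function assertSlotFree(hostId: string, start: Date): Promise<void>;
declare function moveMeetingInDb(id: string, start: Date): Promise<void>;
declare function updateCalendarInvite(eventId: string, start: Date): Promise<void>;
declare function notifyBothParties(id: string, start: Date): Promise<void>;

async function reschedule(meetingId: string, newStart: Date) {
  const old = await getMeeting(meetingId);
  await assertSlotFree(old.hostId, newStart); // new slot must be open
  await moveMeetingInDb(meetingId, newStart); // frees old slot, takes the new one

  try {
    await updateCalendarInvite(old.calendarEventId, newStart);
    await notifyBothParties(meetingId, newStart);
  } catch (err) {
    await moveMeetingInDb(meetingId, old.startAt); // roll the database back
    throw err;
  }
}
```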

OG image generation for booking links, so when you paste a link in chat it shows a proper preview card. Dynamic SVG rendered to PNG on the server, no headless browser needed.
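With sharp, which rasterizes SVG natively, the whole endpoint reduces to a template string and one conversion; the layout values here are illustrative:

```typescript
import sharp from "sharp";

// Render a 1200×630 OG card from an SVG string; no headless browser involved.
function ogCard(title: string, host: string): Promise<Buffer> {
  const svg = `
    <svg width="1200" height="630" xmlns="http://www.w3.org/2000/svg">
      <rect width="1200" height="630" fill="#111827"/>
      <text x="60" y="280" font-size="64" fill="#ffffff" font-family="sans-serif">${title}</text>
      <text x="60" y="380" font-size="36" fill="#9ca3af" font-family="sans-serif">Book a time with ${host}</text>
    </svg>`;
  return sharp(Buffer.from(svg)).png().toBuffer();
}
```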

Inline bot mode — type @meetcal_bot in any chat and share a booking link without opening the Mini App. Took a day to build, and it’s the feature that best captures the original vision of scheduling without context switches.
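grammY makes inline mode a single handler. A sketch; the deep link format is illustrative:

```typescript
import { Bot, InlineQueryResultBuilder } from "grammy";

const bot = new Bot(process.env.BOT_TOKEN!);

// Answer "@meetcal_bot ..." typed in any chat with a shareable result.
bot.on("inline_query", async (ctx) => {
  const link = `https://t.me/meetcal_bot?start=book_${ctx.from.id}`; // illustrative deep link
  const result = InlineQueryResultBuilder
    .article("share-link", "Share my booking link")
    .text(`Pick a time that works for you: ${link}`);
  await ctx.answerInlineQuery([result], { cache_time: 0 });
});

bot.start();
```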

There’s a stat from a Composio report that sticks with me: 95% of AI pilots produce no business impact. Only about a third of AI projects make it to production at all. The framing I’ve seen is “teams built powerful kernels without an operating system.” That’s exactly what the first version of MeetCal was. A kernel. A very impressive demo with no operating system around it.

Where It Is Now

MeetCal is live. The bot is @meetcal_bot. It handles natural language booking in Russian and English, voice messages, timezone-aware scheduling, Google Calendar sync, Zoom link generation, reminders, rescheduling. Payments are working. The landing page has GSAP animations that I’m probably more proud of than I should be.

If I started over, I’d build the tool-calling architecture from day one. The raw prompt approach was faster to prototype but cost more time to fix than it would have taken to do right initially. The boundary between “AI decides” and “code executes” should be explicit from the start.

What surprised me most: how much of the hardest work had nothing to do with AI. The timezone handling, the token refresh logic, the notification scheduling — that’s where I spent the most debugging time. The AI part, once I got the architecture right, was actually the easy part.

The 48-hour demo got me excited. The eight weeks after it taught me how products actually work.

Takeaway

Every article about vibe coding celebrates the speed of the MVP. Fair — it is genuinely fast. But the story usually ends there, right when it gets interesting.

The demo-to-production gap is where you learn what you’re actually building. The AI assistant failing on “на 4 часа” forced me to understand what structured tool calling does architecturally. The OAuth token refresh bug forced me to think about state across long time horizons. The timezone mismatches forced me to understand how time actually works in distributed systems.

You can’t learn these things in the demo. The demo is clean by definition. Production is where the actual system lives.

Build the demo fast. Then stay in the gap long enough to understand it.
