A Few Weeks with Claude: Notes from the Busy Season Trenches

It’s been a few weeks since I downloaded Claude Cowork and took the jump beyond ChatGPT into more advanced AI tools. I realized pretty quickly that this wasn’t just another chatbot model upgrade. It was an entirely different category of tool for this multifaceted CPA-Contractor to put to work.

This isn’t a technical review or a support piece. I’m not an engineer and I’m not trying to become one anytime soon. I’m an accounting operator and small business owner working through overlapping busy season problems in real time, trying to make systems tighter and workflows better. What follows is just what happened when I started using Cowork inside actual production work instead of just playing in the sandbox.

The Summary

Cowork is a completely different tool and workflow than the ChatGPT I came from.

It’s easiest to think of Cowork as a machine. It will process the prompts you give it, with the context behind it (usually, more on that), and it will produce exceptional work with exceptional prompting. It will produce precisely nothing useful with lazy and bad prompting.

Contrast that with ChatGPT, which is a much more interactive model and reacts incredibly well to sloppy prompting. That's because it retains far more background context than Claude's models do.

I’ve started calling this contrast Machine AI and Bouncy AI.

Machine AI behaves like real work infrastructure. Reliability exists (or one would assume). Inputs lead to consistent, expected outputs. If you build the setup prompt and context correctly, the result is predictable. To a CPA, that matters more than most tech people realize. The work we do has to hold up over time instead of just sounding good on the screen. Reliability is mission critical; our profession does not exist without it.

Bouncy AI behaves like conversation. You throw something at it, it throws something back, and together you move forward quickly. That’s incredibly useful, but it’s not the same thing. And I don’t see that tech ever being reliable in the CPA sense. It’s a better search engine, or at least the legacy version that pretended advertising was the purpose.

Cowork is not bouncy. Not a search engine by any description. It is a professional data processing machine. It excels at keeping deep work moving once the work has structure.

The other thing I am finding about Bouncy and Machine AI is they are a deadly combo. The types of detailed prompts that Machine AI needs are perfectly crafted by bouncing them off a chat model, that staged iteration we are all being trained to conduct.

The Good

Output Quality is Near-Professional on First Pass

One of the first things I noticed working in Cowork was how close the first pass output was to something I could actually use.

ChatGPT usually takes several edits to get to professional-grade structure, and then you still have to do an AI-slop-to-human-language conversion at the end.

With Cowork, one well planned pass and a final review often gets me there. It's often very close to human language, first pass. Scary close.

The first time I saw this was a quick project that evolved into a lot more. The quick part: I had some client intake checklists I was using as onboarding material, and I wanted to move them upstream to use as a sales funnel tool. It took a few prompts, but in a couple quick passes I produced incredibly professional-looking materials that far exceeded my design skills.

This also came in handy for building out marketing materials for my landscape construction business. That marketing project had been sitting on the back burner, waiting for budget and time, for years. I knocked out the whole set of marketing assets on a Saturday morning for no spend, and the quality is incredible. Draft pass, edit, review, done. Again, impressive and a bit scary at once.

Context Documents Create Continuity Across Sessions

That Machine AI behavior I mentioned, the kind that takes a lot of work to prompt? The easiest way I have found to manage it is with permanent prompt-context references that coach the models on how to perform. In my case that spans multiple businesses and clients, which requires some creative and careful design.

Once designed, though, there’s a permanence and comfort about the context docs that is hard to appreciate until you start using them regularly. It builds a trust you can never achieve with a chatbot model.

Instead of restarting every session from scratch, the project accumulates structure over time. You tweak the context docs to maximize the outputs you want. It's almost infinitely customizable.

You end up falling into a cadence of permanent context and variable context for each project, and managing that is a separate workflow of its own. That's largely the design work I referenced, but it generally feels like useful work.

When the machines listen, and when humans keep them properly updated, they start behaving like institutional memory instead of the flighty conversation history chat models are prone to.

Claude Handles Complexity and Multi-Document Context Better Than Chat Models

Cowork starts separating itself when the work involves multiple, complicated source docs.

Narratives, exports, frameworks, assumptions, and historical structure can all live inside the same prompt environment if staged and instructed correctly. This is exactly as powerful as it sounds: first-pass analysis in minutes that would take a human days or weeks.

One of the more ambitious projects I have underway is to build institutional memory docs from a variety of sources, mostly my ramblings, over time. Without getting into the weeds, it pulls together a lot of nonfinancial and financial sources to develop better ways to document things that will remain proprietary and my property: the backbone of my businesses' operating systems. This has been living in my head for some time, and I attempted it several times over the years with chat models, to mostly frustrating failure and stalled progress. Cowork gave me the bones to see this project forward because its ability to digest vast amounts of data quickly is orders of magnitude better. What it does with that ingestion is entirely up to your prompting. At the end of the day, the ability to distill complicated, voluminous data into useful material is incredible. That will change the world, and quickly.

Systems Work Instead of Artifact Work

You may notice a theme emerging: a lot of the projects I have underway are very heavy lifts. Visionary, strategic, big work. Not exactly the workpapers, reconciliations, and reports that make up the more boring everyday Controller outputs.

That's not an accident. First, I have used it only very carefully in direct financial applications; I am still generally uncomfortable with that premise, given that I am operating open models. Second, as I feel it out, I sense that Cowork is much better suited to this sort of foundational planning and execution than to making annoying emails go away or automating that sales spreadsheet you have to update.

That's not to say Cowork can't build those tools someday; that is one of my goals as I advance my skills with it. But it is not that tool itself. It lives upstream of that work, which is more automation than intelligence.

This is good because as a Controller the thought of my staff, bookkeepers, etc. using AI as a processing tool is still absolutely terrifying. We just do not have guardrails built for that yet.

I Found a Structural Complexity Limit (and it actually told me so)

The most complex task I gave Cowork was to review and propose efficiency changes to a consolidated financial statement roll-up spreadsheet template, "the beast." I wanted to test its complexity limit. Any CPA or finance operator still following along knows those are the models that stretch our Excel muscles. I loaded in a suboptimal consolidation model just to have Cowork pick it apart: one with lots of manual work, but still well designed, with plenty of checks.

Cowork was not able to execute on this project whatsoever. The incredible part, though: it told me so. It broke key relationships between financial statements, and the output told me it couldn't get key check figures to tie, and why. It didn't try to hide it or pass it off, as ChatGPT is so prone to do. That's exactly the failure mode you need from a machine working inside financial structure. "You're on your own here" simultaneously reassured me, because I am still not replaceable, and disarmed me with its honesty.

The Bad

Prompting and Context is Everything

The exact thing that makes Cowork so incredibly powerful also makes it slow and painful to use in many situations where you may be inclined to slap some quick AI on a boring workflow. No way around it: it's kind of a pain in the ass. My initial thoughts were way off in many regards after first seeing it as an always-there assistant tool. There is a baseline lift to a Cowork project that many folks will never muster the energy to take on. It's really made for big work, aided by skilled human direction, not small processing, and certainly not letting the machine drive itself.

If you don’t know what you’re building yet, your preferred Chat tool is still the better place to start. You may end up building "something real" that is completely and totally delusional, but Chat will get you there nonetheless, with rockets and flair to match. Claude will just spin out. It's not a brainstorming tool like bouncy AI is. It’s not going to flatter you.

Review Iterations Are Trickier

Review cycles need to be designed before the prompt is written. That bouncy habit just does not work with machine AI.

That was not intuitive at first. It took me a couple passes to discover my favorite Claude workflow: the old-fashioned, comments-on-paper review. Seriously, if you shared my brain you would understand, but complicated work just hits differently on paper.

I have started printing the first pass from Cowork, marking it up with comments in the margins, and scanning the review copy back in. Bam: Cowork reads the comments, checks back, and corrects. Don't worry, I recycle the paper.

All of that needs to be prompted into the plan. I listed this under The Bad because Cowork isn't what I would call flexible on the fly.

With ChatGPT, you can steer mid-stream, and some of my best work has resulted from just that. With Cowork, you need to decide what the destination is before you start walking. Otherwise you will waste time and do rework. Correcting on the fly is much more challenging.

Tokens and Time/Cost Are Not There Yet - For Processing Work

This model absolutely burns through tokens even more than it burns through time. When you see what it is capable of, it makes sense.

Tokens are the accepted unit of measure for AI compute: chunks of text, roughly word fragments, that the model reads and writes, with usage metered per token. Like most tech-age units of measure, good luck developing real intuition for one. It’s the tech age. We’re just living in it. The point is that tokens are the unit of compute consumption, and Cowork burns through them quickly when doing the kind of deep work it excels at.

When you start running more complex projects, which is exactly where Cowork shines, it burns through tokens very quickly, sometimes in a single prompt. When everything goes right, that’s fine. It’s the cost of doing business.

The CAM Reconciliation Example

CAM reconciliations are a useful stress test because they sit at the intersection of several high-stakes accuracy requirements simultaneously. Multi-year expense pools have to reconcile cleanly. Allocation logic has to be defensible to tenants. Billing calculations carry real financial consequences if wrong. And owner reporting has to hold up under scrutiny from lenders or auditors. A tool that produces a confident-looking output with a traceability gap buried in the context summary isn't just unhelpful; in this context, it's operationally dangerous.

I was coaching Cowork up to analyze raw data and narratives and identify those holes so I could launch a process improvement project from there. Real work, but played out in a comfortable sandbox with anonymized data, so I could review it in a closed environment.

I took my time gathering all the data and writing the detailed prompt: about thirty minutes of setup. Ran the prompt. Everything looked great. It produced a seventeen-page report that looked strong on the surface. I printed the report and started reviewing. Early in the context summary I ran into this line:

GL Data: The provided GL export (2024–March 2026) contains detailed expense account activity for 2026 YTD only. Prior year (2024–2025) accounts reflect balance sheet activity. Full prior-year P&L GL detail was not available for this analysis. Noted as a traceability limitation.

That’s a what-just-happened moment.

Remember the master context prompt? One of its key instructions was:

If any aspect of the context is unclear, stop and ask clarification questions before proceeding.

It skipped that instruction and produced the report anyway.

At that point I went to correct it and got the dreaded one-prompt-output morning.

That’s where tokens stop being theoretical and start being operational.

The Cost Question is Unresolved

I racked up a lot of additional usage over a weekend sprint to test the limits of what I might incur monthly. This was a major buildout to launch the Future Ready Accounting Sprint, a project I have been working on to assess accounting technology and systems at real estate and skilled trades SMBs. This project has been in process for a long time, and Cowork remarkably advanced my launch timeline by months.

Product development is always an investment. Operating costs are different. Transaction processing that burns up tokens is math I have not done yet, but I don't see it mathing out in many cases, yet. That model is not going to work for most small business processing applications, though I expect we will see it evolve.

Cowork makes economic sense for architecture work, and it may build the tools that will lead to processing work, but it's not much of a tool for processing work right now. That points toward accountants designing their own systems, even as we keep hearing AI will replace the processing layer through a new wave of AI-empowered software. Those are fundamentally conflicting theories, and which one turns out to be more correct remains to be seen.

Reliability Still Matters More Than Capability

I am not a technologist or engineer. I understand AI consumes power and infrastructure and that the economics are still evolving.

But from a user perspective the issue isn’t physics. The issue is that I just burned all the compute and the model ignored a key instruction that should have stopped the process. It produced a structured but inferior work product anyway, despite being clearly prompted to output CPA-quality work.

That’s not really okay, and it severely limits deployment potential in client-facing cases.

Engineers evaluate capability. Operators evaluate reliability. Until the two get closer, AI has a fundamental disconnect for business, and specifically accounting, use cases.

AI as Writing Tools

Full disclosure, I wrote about 80% of this piece and wanted to see how Claude (chat) could fill in the gaps. I find that Claude writes in a much more human style than ChatGPT, when properly loaded with context, but it’s still thoroughly slop. I left those sections mostly intact here, and I think the savvy will notice.

I don’t like AI as a writing tool. I use bouncy AI sometimes for ideation and light editing, but I am really not a fan of turning language into uniform mush.

Conclusion

After a few weeks of using Cowork inside actual busy season workflows, what stands out isn’t that AI replaces thinking. It’s that it rewards structured thinking.

Claude behaves like infrastructure. It works best when the process already exists and you’re trying to strengthen it. It works best when you know what the deliverable should look like before you begin.

It is much less useful when the work itself is still undefined. That’s not a weakness. It’s a signal of where this uncertain tech is headed.

The people who benefit most from tools like this aren’t the people looking for answers. They’re the people already building systems. It just amplifies their power. AI in accounting is starting to look less like assistance software and more like a building tool for infrastructure. And like any infrastructure, it amplifies whatever discipline already exists upstream. If the process is clean, the output improves. If the process is unclear, the model just scales the confusion.

Right now, the biggest advantage isn’t knowing how to prompt better. It’s knowing how to design the work before the prompt exists.

Joe Minich, CPA
Founder of Ridgeline Business Solutions. Joe works as an embedded financial operator and readies businesses for the accounting of the future. Ridgeline helps owner-led businesses in property, real estate, and skilled trades navigate growth, transition, and operational challenges.
