That alone dropped our hallucination rate from ~40% to under 5% on a 300-query eval set. Also completely agree on testing bottom-up — we caught so many chunking bugs early by unit-testing the splitter ...
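For anyone wondering what unit-testing the splitter might look like in practice, here is a minimal sketch. It is not the splitter from this thread: `split_into_chunks`, its `chunk_size` and `overlap` parameters, and the character-based splitting are all assumptions made for illustration. The idea is to check properties that chunking bugs usually break: chunk length, overlap between neighbours, and lossless reassembly.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int, overlap: int) -> List[str]:
    """Hypothetical fixed-size character splitter with overlap (illustration only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks: List[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk reaches the end, so no tail-only duplicate is emitted.
        if start + chunk_size >= len(text):
            break
    return chunks


def test_chunks_respect_size():
    chunks = split_into_chunks("abcdefghijk", chunk_size=4, overlap=1)
    assert all(0 < len(c) <= 4 for c in chunks)


def test_neighbours_share_overlap():
    chunks = split_into_chunks("abcdefghijk", chunk_size=4, overlap=1)
    # The first `overlap` characters of each chunk repeat the previous chunk's tail.
    for prev, cur in zip(chunks, chunks[1:]):
        assert prev[-1:] == cur[:1]


def test_round_trip_loses_nothing():
    text = "abcdefghijk"
    overlap = 1
    chunks = split_into_chunks(text, chunk_size=4, overlap=overlap)
    # Dropping the overlapping prefix of every later chunk should rebuild the input.
    rebuilt = chunks[0] + "".join(c[overlap:] for c in chunks[1:])
    assert rebuilt == text


def test_empty_input_gives_no_chunks():
    assert split_into_chunks("", chunk_size=4, overlap=1) == []
```

These run under pytest as written, or by calling the test functions directly. The round-trip check tends to be the most useful one, since it catches both dropped and duplicated text at chunk boundaries.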