Paper Trail
I built this RAG agent as a fact checker for myself. arXiv is a huge repository of preprints and a great source for reading about computer science, math, and physics. I've included a hardcoded demo of a real generation (since API calls aren't free).
Highlights
- Breaks claims into sub-claims, each with its own verdict and confidence score.
- When a sub-claim (e.g. too recent, commercial) can’t be verified directly, an analogous search is done to find similar ideas.
- Opinions are also categorized independently.
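The decomposition described above can be sketched as a small data model. This is a hypothetical sketch; the field names are illustrative, not the project's actual schema:

```python
from dataclasses import dataclass, field
from typing import Literal

# Hypothetical data model for the sub-claim decomposition;
# names and fields are illustrative, not the project's real schema.
@dataclass
class SubClaim:
    text: str
    kind: Literal["factual", "opinion"] = "factual"   # opinions categorized separately
    verdict: Literal["supported", "refuted", "unverified"] = "unverified"
    confidence: float = 0.0                 # 0.0-1.0, assigned per sub-claim
    used_analogous_search: bool = False     # set when no direct evidence was found

@dataclass
class ClaimReport:
    claim: str
    sub_claims: list[SubClaim] = field(default_factory=list)

report = ClaimReport(
    claim="Transformers struggle with symbolic reasoning.",
    sub_claims=[
        SubClaim(text="LLMs show brittleness on symbolic tasks",
                 verdict="supported", confidence=0.8),
    ],
)
print(report.sub_claims[0].verdict)  # supported
```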
Tech Stack
- Python + FastAPI - Serves the pipeline
- LangGraph - Orchestration (categorizes the sub-claims non-linearly)
- Gemini 2.5 Flash - Synthesizes the output
- Chroma - Vector database
- all-MiniLM-L6-v2 - Sentence transformer (embeddings)
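To illustrate what the Chroma + all-MiniLM-L6-v2 pair does, here is a dependency-free sketch of embedding retrieval by cosine similarity. The toy 4-dimensional vectors stand in for the 384-dimensional sentence embeddings the real model produces:

```python
import math

# Toy "embeddings" stand in for the 384-dim vectors all-MiniLM-L6-v2
# produces; Chroma performs this nearest-neighbour search at scale.
corpus = {
    "paper_a": [0.9, 0.1, 0.0, 0.2],   # e.g. a symbolic-reasoning paper
    "paper_b": [0.1, 0.8, 0.3, 0.0],   # e.g. a chain-of-thought paper
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

query = [0.85, 0.15, 0.05, 0.1]        # embedding of one sub-claim
best = max(corpus, key=lambda doc_id: cosine(query, corpus[doc_id]))
print(best)  # paper_a
```

In the actual pipeline, Chroma handles the storage and search and the sentence transformer handles the embedding; the math above is just what those two pieces amount to.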
Notes
- First project dealing with NLP.
- During testing, I tried the same claim worded differently, and I was surprised to see different papers getting pulled every time.
- Results could be more consistent between runs.
- Checking claims about AI I see on social media always produces interesting results.
- I wonder how I can speed up the runtime; a run takes 60-90s on average…
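One likely speedup (an assumption on my part, not something the project does yet) is verifying sub-claims concurrently rather than sequentially, since each verification is mostly waiting on network I/O to the retrieval and LLM APIs. A minimal sketch with asyncio, where `verify_subclaim` is a hypothetical stand-in for the real retrieval + Gemini call:

```python
import asyncio

# Hypothetical stand-in for the real per-sub-claim retrieval + LLM call.
# asyncio.gather overlaps the network waits, so total latency approaches
# the slowest single sub-claim instead of the sum of all of them.
async def verify_subclaim(text: str) -> str:
    await asyncio.sleep(0.1)  # simulated retrieval + LLM latency
    return f"{text}: unverified"

async def verify_all(sub_claims: list[str]) -> list[str]:
    return await asyncio.gather(*(verify_subclaim(s) for s in sub_claims))

results = asyncio.run(verify_all(["claim A", "claim B", "claim C"]))
print(len(results))  # 3
```

Whether this helps in practice depends on how much of the 60-90s is API latency versus local embedding work.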
Demo
Sub-claim analysis
Academic literature indicates that transformer-based models, including leading MLLMs and LLMs, exhibit brittleness on tasks requiring stable symbolic manipulation and demonstrate struggles with visual-mathematical reasoning, recursive program synthesis, and certain mathematical reasoning benchmarks. These models show characteristic failure modes, such as variable confusion and inconsistency, on tasks that involve logical relations and mathematical abstraction.
Sources (3)
Chain-of-thought prompting is described as significantly enhancing the reasoning capability and improving multi-step problem-solving in large language models. These models are commonly understood to be Transformer-based, and the demonstrated improvements in reasoning potential confirm its purpose in addressing multi-step problems.