Estimation Methodology
Accurate estimation enables predictable delivery and sustainable development pace. YeboLearn uses story points with velocity tracking to forecast releases and manage commitments.
Story Point Fundamentals
What Story Points Measure
Story points represent:
- Complexity of the work
- Amount of work required
- Uncertainty and risk
- Learning curve
Story points do NOT represent:
- Hours or days (not time)
- Individual developer speed
- Perfect accuracy (estimates are ranges)
Why Story Points vs Hours?
- Abstract, relative sizing is easier than estimating absolute time
- Accounts for complexity, not just time
- Less pressure than hour estimates
- Team-based, not individual
- Velocity stabilizes over time
Fibonacci Scale
1, 2, 3, 5, 8, 13, 21
Why Fibonacci?
- Forces meaningful distinctions
- Reflects increasing uncertainty
- Prevents false precision (no "7.5 points")
Estimation Scale Reference
1 Point: Trivial
Time: 1-2 hours
Complexity: Very simple
Uncertainty: None
Examples:
- Fix typo in error message
- Update text on button
- Add simple validation
- Change constant value
Technical:
- No tests needed beyond smoke test
- No database changes
- No API changes
- Straightforward implementation
2 Points: Simple
Time: Half day
Complexity: Straightforward
Uncertainty: Minimal
Examples:
- Add new API endpoint (CRUD)
- Create basic form component
- Add email validation
- Update existing UI element
Technical:
- Basic unit tests needed
- Standard implementation pattern
- May need database migration
- Clear requirements
3 Points: Moderate
Time: 1 day
Complexity: Some complexity
Uncertainty: Low to medium
Examples:
- Build quiz submission flow
- Integrate third-party API (well-documented)
- Add search functionality
- Create reusable component with variants
Technical:
- Multiple unit tests needed
- Integration test recommended
- Some edge cases to handle
- May touch multiple files/modules
5 Points: Complex
Time: 2-3 days
Complexity: Significant
Uncertainty: Medium
Examples:
- Implement AI quiz generator
- Build student dashboard with analytics
- Create payment processing flow
- Refactor authentication system
Technical:
- Comprehensive tests required
- Multiple components/modules
- External dependencies
- Performance considerations
- Error handling important
8 Points: Very Complex
Time: 1 week
Complexity: High
Uncertainty: High
Examples:
- AI essay grading engine
- Offline mode implementation
- Real-time collaboration features
- Major database schema refactor
Technical:
- Extensive testing needed
- Complex state management
- Multiple integration points
- Significant edge cases
- Performance critical
- Should consider breaking down
13 Points: Epic (Split Required)
Time: 2 weeks or more
Complexity: Very high
Uncertainty: Very high
Action: Break into smaller stories
Example: "Build complete M-Pesa integration"
Should split into:
- API endpoint setup (3 pts)
- Payment initiation (5 pts)
- Webhook handling (5 pts)
- Error recovery (3 pts)
- Testing & documentation (3 pts)
Total: 19 points across 5 stories
Note: the split total (19 points) exceeds the original 13-point guess; breaking an epic down typically surfaces work the single estimate was hiding.
Estimation Process
Planning Poker
How It Works:
1. Story Presentation (2 min)
- Product owner reads story
- Explains user value and context
- Shows designs if applicable
2. Clarification (3 min)
- Team asks questions
- Technical approach discussed
- Dependencies identified
- Edge cases surfaced
3. Private Estimation
- Each team member selects card privately
- No discussion during selection
- Prevents anchoring bias
4. Simultaneous Reveal
- Everyone shows card at once
- Range of estimates visible
5. Discussion (5 min)
- Highest and lowest explain reasoning
- Different perspectives shared
- Hidden complexity surfaced
- Risks and assumptions discussed
6. Re-estimate
- Second round of estimation
- Usually converges
- If still divergent, more discussion
7. Consensus
- Team agrees on final estimate
- Record in story
- Move to next story
Example Session:
Story: AI generates personalized quiz from student's notes
Product Owner: Students can upload notes, AI creates a 10-question quiz
tailored to their learning gaps based on past performance.
Developer 1: Do we need to store the notes or is this ephemeral?
PO: Store for future quiz variations.
Developer 2: What AI model? Gemini?
PO: Yes, Gemini API.
Developer 3: How do we identify learning gaps?
PO: From quiz history and progress data we already track.
[Team estimates privately]
Reveal: 3, 5, 5, 8
Developer 1 (3 points): I thought this was just API integration,
we have similar quiz generation already.
Developer 4 (8 points): But we need to build the note parsing,
learning gap analysis, and personalization logic. Plus file uploads.
Discussion: Ah, the learning gap analysis is new complexity.
File upload is straightforward. Note parsing might be tricky.
[Team re-estimates]
Reveal: 5, 5, 5, 5
Consensus: 5 story points
Estimation Calibration
Reference Stories (Historical Examples):
These actual YeboLearn stories serve as benchmarks:
1 Point Reference:
Story: Fix login redirect after signup
What: Redirect users to dashboard instead of landing page after signup
Why: One-line route change, tested manually
Actual Effort: 1.5 hours
2 Points Reference:
Story: Add email validation to registration
What: Validate email format and uniqueness before allowing signup
Why: Standard validation, database check, error handling
Actual Effort: 4 hours
3 Points Reference:
Story: Create quiz results summary component
What: Display score, correct/incorrect answers, time taken
Why: React component with multiple sub-components, state management
Actual Effort: 6 hours
5 Points Reference:
Story: Implement M-Pesa payment webhook
What: Receive payment callbacks, update database, send confirmation
Why: External API integration, error handling, idempotency, testing
Actual Effort: 2 days
8 Points Reference:
Story: Build AI quiz generation API
What: Accept topic, generate questions using Gemini, store and return
Why: Complex AI integration, prompt engineering, rate limiting, caching
Actual Effort: 5 days
Calibration Exercise:
Before estimation session, team reviews:
"This story feels similar to [reference story] which was [X points]"
Example:
"This payment refund feature is similar to the M-Pesa webhook we did,
so probably 5 points like that one."
Velocity Tracking
Calculating Velocity
Sprint Velocity = Sum of completed story points
Sprint 25:
Completed Stories:
✓ Essay grading UI (8 pts)
✓ AI grading backend (13 pts)
✓ Teacher dashboard (5 pts)
✓ Performance optimization (5 pts)
✓ Bug fixes (4 pts across two small stories)
Total: 35 story points
Incomplete:
✗ Analytics improvements (3 pts) - carried to next sprint
Sprint 25 Velocity: 35 points
Important: Only count completed stories (done = in production).
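As a minimal sketch (the Story shape here is illustrative, not YeboLearn's actual data model), the calculation is just a filtered sum over done stories:

```typescript
// Illustrative story record; "done" means deployed to production.
interface Story {
  title: string;
  points: number;
  done: boolean;
}

// Velocity counts only completed stories; carried-over work contributes nothing.
function sprintVelocity(stories: Story[]): number {
  return stories
    .filter((s) => s.done)
    .reduce((sum, s) => sum + s.points, 0);
}

// Sprint 25 from the example above: 8 + 13 + 5 + 5 + 4 = 35.
const sprint25: Story[] = [
  { title: "Essay grading UI", points: 8, done: true },
  { title: "AI grading backend", points: 13, done: true },
  { title: "Teacher dashboard", points: 5, done: true },
  { title: "Performance optimization", points: 5, done: true },
  { title: "Bug fixes", points: 4, done: true },
  { title: "Analytics improvements", points: 3, done: false }, // carried over
];
console.log(sprintVelocity(sprint25)); // 35
```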
Rolling Average Velocity
Last 6 Sprints:
Sprint 20: 34 points
Sprint 21: 38 points
Sprint 22: 32 points (holiday week)
Sprint 23: 40 points
Sprint 24: 36 points
Sprint 25: 35 points
Simple Average: 35.8 points
Median: 35.5 points
Use for Planning: 32-36 points (conservative)
Why Rolling Average?
- Smooths out variance
- Accounts for team changes
- Adapts to skill improvement
- More reliable than single sprint
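A small sketch of the rolling-average and median calculation, using the sprint 20-25 figures above:

```typescript
// Velocities for sprints 20-25, from the table above.
const velocities = [34, 38, 32, 40, 36, 35];

const average = (xs: number[]): number =>
  xs.reduce((sum, x) => sum + x, 0) / xs.length;

const median = (xs: number[]): number => {
  const sorted = [...xs].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Even-length list: mean of the two middle values.
  return sorted.length % 2 === 0
    ? (sorted[mid - 1] + sorted[mid]) / 2
    : sorted[mid];
};

console.log(average(velocities).toFixed(1)); // "35.8"
console.log(median(velocities));             // 35.5
```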
Velocity Trends
Healthy Trends:
Increasing Velocity (Gradual):
30 → 32 → 35 → 36 → 38 → 40
Reasons:
✓ Team gaining experience
✓ Better tooling/automation
✓ Reduced context switching
✓ Improved estimation accuracy
✓ Less technical debt
Action: Great! Maintain quality standards.
Stable Velocity:
35 → 36 → 34 → 35 → 36 → 35
Reasons:
✓ Mature team
✓ Consistent estimation
✓ Predictable capacity
Action: Excellent! Reliable delivery.
Concerning Trends:
Decreasing Velocity:
40 → 38 → 35 → 32 → 28 → 25
Reasons:
⚠️ Accumulating technical debt
⚠️ Increased production support
⚠️ Team turnover
⚠️ Over-commitment leading to burnout
⚠️ More complex features
Action: Investigate root cause, address issues.
Erratic Velocity:
25 → 45 → 30 → 50 → 20 → 40
Reasons:
⚠️ Inconsistent estimation
⚠️ Poor sprint planning
⚠️ Frequent scope changes
⚠️ External dependencies
Action: Improve estimation, stabilize process.
Velocity Analysis
Sprint Retrospective Velocity Review:
Sprint 25 Velocity Analysis
Committed: 32 points
Delivered: 35 points
Achievement: 109%
Breakdown:
Planned Work: 32 points (100% completed)
Stretch Goals: 3 points (completed)
Unplanned Work: 0 points
Why We Over-Delivered:
✓ Stretch goals were well-estimated
✓ No production incidents
✓ Fast PR review cycle
✓ Good pair programming on complex work
Lessons:
• We can handle stretch goals reliably
• Consider committing to 34-36 points next sprint
• Keep pair programming on 8+ point stories
Capacity Planning
Team Capacity Calculation
Available Hours Per Sprint:
Developer: 80 hours per 2-week sprint
├─ Development: 56 hours (70%)
├─ Meetings: 8 hours (10%)
├─ Code Reviews: 8 hours (10%)
├─ Learning/Exploration: 4 hours (5%)
└─ Unexpected Issues: 4 hours (5%)
Effective Development: ~56 hours/developer/sprint
Team of 4 Developers:
Total Capacity: 4 × 56 = 224 hours
Hours per Story Point (Historical):
224 hours ÷ 35 points ≈ 6.4 hours/point
This varies by complexity:
- 1 point ≈ 2 hours
- 2 points ≈ 4 hours
- 3 points ≈ 7 hours
- 5 points ≈ 16 hours
- 8 points ≈ 32 hours
Planning Capacity:
Team Velocity: 35 points/sprint
Safety Buffer: 10%
Sprint Commitment: 32 points
Stretch Goals: 6 points
If everything goes perfectly: 38 points
Typical delivery: 32-35 points
Conservative: 28-32 points
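The arithmetic above fits in a short sketch; the constants mirror the worked example (4 developers, 80 hours, 70% development time, 10% buffer) and are illustrative, not real configuration:

```typescript
// Constants mirror the worked example above (all illustrative).
const DEVELOPERS = 4;
const HOURS_PER_SPRINT = 80;   // per developer, 2-week sprint
const DEV_TIME_FRACTION = 0.7; // 70% of hours go to development
const SAFETY_BUFFER = 0.1;     // commit 10% below velocity

const effectiveHours = DEVELOPERS * HOURS_PER_SPRINT * DEV_TIME_FRACTION; // 224
const velocity = 35; // rolling average, points per sprint
const hoursPerPoint = effectiveHours / velocity; // ~6.4

const commitment = Math.round(velocity * (1 - SAFETY_BUFFER)); // 32 (31.5 rounded)

console.log({ effectiveHours, hoursPerPoint: hoursPerPoint.toFixed(1), commitment });
```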
Adjusting for Factors
Holidays and PTO:
Normal Sprint: 35 points capacity
Sprint with 1 Developer on PTO (25% team):
Reduced Capacity: 35 × 0.75 = 26 points
Sprint with Christmas Holiday:
Reduced Capacity: 35 × 0.6 = 21 points
Action: Plan accordingly, under-commit
New Team Members:
First Sprint: 50% productive (learning, onboarding)
Second Sprint: 70% productive
Third Sprint: 90% productive
Fourth Sprint+: 100% productive
Team of 3 + 1 New Developer (assuming ~12 points per fully productive developer):
Sprint 1: (3 × 12) + (1 × 6) = 42 points
Sprint 2: (3 × 12) + (1 × 8) = 44 points
Sprint 3: (3 × 12) + (1 × 11) = 47 points
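A sketch of the ramp-up adjustment, assuming (as in the example above) roughly 12 points per fully productive developer; both figures are illustrative:

```typescript
// Assumed per-developer contribution, as in the example above (illustrative).
const POINTS_PER_DEV = 12;
// Onboarding ramp from the guideline above: 50% / 70% / 90%, then 100%.
const RAMP = [0.5, 0.7, 0.9];

function teamCapacity(experiencedDevs: number, newHireSprint: number): number {
  const rampFactor = RAMP[newHireSprint - 1] ?? 1.0;
  return Math.round(experiencedDevs * POINTS_PER_DEV + POINTS_PER_DEV * rampFactor);
}

console.log(teamCapacity(3, 1)); // 42
console.log(teamCapacity(3, 2)); // 44
console.log(teamCapacity(3, 3)); // 47
```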
Major Production Incidents:
Historical Impact:
Average: 0.5 incidents per sprint
Average Resolution: 8 hours (1 point equivalent)
Capacity Planning: Built into velocity average
If incident-free sprint: Deliver stretch goals
Release Forecasting
Forecasting Methodology
Simple Forecast:
Feature Size: 42 story points
Team Velocity: 35 points/sprint
Sprints Needed: 42 ÷ 35 = 1.2 sprints
Forecast: 2 sprints (round up for safety)
Timeline: 4 weeks
Confidence Intervals:
Based on velocity variance:
Optimistic (90th percentile): 40 points/sprint
→ 42 ÷ 40 = 1.05 sprints (just over 2 weeks)
Most Likely (median): 35 points/sprint
→ 42 ÷ 35 = 1.2 sprints (2.4 weeks, round up to 3)
Pessimistic (10th percentile): 28 points/sprint
→ 42 ÷ 28 = 1.5 sprints (round up to 2 full sprints = 4 weeks)
Communicate: "Between 2 and 4 weeks, most likely 3 weeks"
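A sketch of the three-scenario forecast; the percentile velocities are the example figures above, not derived from real sprint data here:

```typescript
// Percentile velocities from the example above (illustrative figures).
const featurePoints = 42;
const WEEKS_PER_SPRINT = 2;

const scenarios: Record<string, number> = {
  "optimistic (90th pct)": 40,
  "most likely (median)": 35,
  "pessimistic (10th pct)": 28,
};

for (const [name, velocity] of Object.entries(scenarios)) {
  const sprints = featurePoints / velocity;
  const weeks = sprints * WEEKS_PER_SPRINT;
  console.log(`${name}: ${sprints.toFixed(2)} sprints (~${weeks.toFixed(1)} weeks)`);
}
// optimistic (90th pct): 1.05 sprints (~2.1 weeks)
// most likely (median): 1.20 sprints (~2.4 weeks)
// pessimistic (10th pct): 1.50 sprints (~3.0 weeks)
// Rounding the pessimistic case up to whole sprints gives the 4-week bound.
```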
Multi-Feature Roadmap
Q1 2026 Roadmap Forecast:
Features Planned:
1. AI Essay Grading - 42 points
2. M-Pesa Integration - 26 points
3. Offline Mode - 34 points
4. WhatsApp Notifications - 18 points
Total: 120 points
Team Velocity: 35 points/sprint
Sprints Available: Q1 = 6 sprints
Total Capacity: 6 × 35 = 210 points
Buffer (20%): 210 × 0.8 = 168 points available
Feasibility: 120 points < 168 points ✓
Forecast:
Sprint 26-27: AI Essay Grading (42 pts)
Sprint 28: M-Pesa Integration (26 pts)
Sprint 29-30: Offline Mode (34 pts)
Sprint 31: WhatsApp Notifications (18 pts)
Remaining Capacity: 48 points
Use for: Bug fixes, performance, technical debt
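The feasibility check reduces to comparing planned points against buffered capacity, as in this sketch using the Q1 figures above:

```typescript
// Q1 roadmap figures from the example above.
const features = [
  { name: "AI Essay Grading", points: 42 },
  { name: "M-Pesa Integration", points: 26 },
  { name: "Offline Mode", points: 34 },
  { name: "WhatsApp Notifications", points: 18 },
];

const VELOCITY = 35; // points per sprint
const SPRINTS = 6;   // sprints in Q1
const BUFFER = 0.2;  // reserve 20% for bugs, performance, tech debt

const planned = features.reduce((sum, f) => sum + f.points, 0); // 120
const usable = VELOCITY * SPRINTS * (1 - BUFFER);               // 168

console.log(
  planned <= usable
    ? `Feasible: ${planned}/${usable} buffered points (${usable - planned} remaining)`
    : `Over capacity by ${planned - usable} points`
);
```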
Dependency Considerations:
If features depend on each other:
M-Pesa Integration (26 pts) must complete before
WhatsApp Payment Notifications (8 pts)
Total Sequential: 34 points = 1 sprint
Timeline Impact:
- Parallel: Faster delivery
- Sequential: Longer timeline
- Plan sprints accordingly
Estimation Best Practices
Do's
Estimate as a Team:
- Multiple perspectives surface hidden complexity
- Shared understanding of work
- Better accuracy than individual estimates
Use Historical Data:
- Reference similar past stories
- Calibrate against known benchmarks
- Learn from estimation accuracy
Re-estimate if Needed:
- If story changes significantly, re-estimate
- If actual effort differs greatly, analyze why
- Update reference stories
Account for Uncertainty:
- Higher points for higher uncertainty
- Create spike stories for unknowns
- Build buffer into risky work
Focus on Relative Sizing:
- "Is this bigger or smaller than X?"
- Don't agonize over perfect precision
- Estimates are ranges, not commitments
Don'ts
Don't Convert to Hours:
- Story points are abstract, keep them that way
- Conversion creates false precision
- Focus on relative complexity
Don't Estimate Individual Tasks:
- Estimate user stories, not technical tasks
- Task breakdown happens during sprint
- Too granular = wasted effort
Don't Pad Estimates:
- Trust the velocity to account for unknowns
- Padding leads to inflated points
- Use story points honestly
Don't Compare Developers:
- Velocity is team metric, not individual
- The same story is worth the same points regardless of who builds it
- Focus on team improvement
Don't Ignore Outliers:
- If an 8-point story took three weeks, understand why
- Update estimation approach if needed
- Learn from variance
Common Estimation Scenarios
Scenario 1: Unknown Technology
Story: Integrate WhatsApp Business API
Team: "We've never used WhatsApp API before."
Approach:
1. Create spike story (time-boxed: 1 day)
→ Research API, build proof-of-concept
→ Estimate remaining work after spike
2. Or estimate with high uncertainty
→ 8 points (vs 5 if familiar)
→ Account for learning curve
→ Pair programming to share knowledge
Scenario 2: Vague Requirements
Story: "Improve student dashboard"
Team: "This is too vague to estimate."
Approach:
1. Refuse to estimate until clarified
2. Work with product owner to define specifics
3. Break down into concrete stories:
- Add quiz completion chart (3 pts)
- Show recent activity feed (3 pts)
- Display upcoming assignments (2 pts)
Total: 8 points (now estimable)
Scenario 3: Dependency Uncertainty
Story: Launch M-Pesa integration
Team: "Depends on M-Pesa approval, outside our control."
Approach:
1. Estimate work we control (integration code)
2. Flag dependency in story
3. Don't commit to sprint until dependency resolved
4. Have backup work ready if blocked
Estimation Metrics
Estimation Accuracy
Track Accuracy Over Time:
Sprint 25:
Story: AI Grading Backend
Estimate: 13 points
Actual: Completed within the sprint as estimated (100% accurate)
Story: Teacher Dashboard
Estimate: 5 points
Actual: 8 points (60% accurate, underestimated)
Average Accuracy: 85%
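One way to score per-story accuracy (an illustrative metric, not an official YeboLearn formula) is the ratio of the smaller to the larger of estimate and actual:

```typescript
// Accuracy as the ratio of the smaller to the larger of estimate vs actual
// (an assumed metric for illustration, not an official formula).
function estimationAccuracy(estimate: number, actual: number): number {
  return Math.min(estimate, actual) / Math.max(estimate, actual);
}

console.log(estimationAccuracy(13, 13)); // 1.0   - AI Grading Backend, spot on
console.log(estimationAccuracy(5, 8));   // 0.625 - Teacher Dashboard, underestimated
```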
Improving Accuracy:
- Analyze under/over-estimated stories
- Update reference examples
- Improve breakdown of complex work
- Better upfront clarification
Velocity Predictability
Standard Deviation of Velocity:
Sprint 20-25: 34, 38, 32, 40, 36, 35
Average: 35.8
Std Dev: 2.9 (low variance = predictable)
Target: Std Dev < 5 points (good predictability)
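A sketch of the predictability check, computing the sample standard deviation of the six velocities above:

```typescript
// Sprint 20-25 velocities from the table above.
const sprintVelocities = [34, 38, 32, 40, 36, 35];

const mean =
  sprintVelocities.reduce((sum, v) => sum + v, 0) / sprintVelocities.length;
// Sample standard deviation (divide by n - 1).
const variance =
  sprintVelocities.reduce((sum, v) => sum + (v - mean) ** 2, 0) /
  (sprintVelocities.length - 1);
const stdDev = Math.sqrt(variance);

console.log(mean.toFixed(1));   // "35.8"
console.log(stdDev.toFixed(1)); // "2.9", well under the 5-point target
```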
Related Documentation
- Planning Overview - Planning process
- Sprint Structure - Sprint ceremonies
- Backlog Management - Story writing
- Development Workflow - Execution