Prompt Engineering Approach

Inspired by the following article:

Core Approach: Sequential Processing Pipeline

🎯 Strategic Decision: Sequential vs. Parallel

Recommendation: Sequential with Contextual Handoffs

Why Sequential?

  • Error Isolation: A failure in one phase doesn't cascade to the others
  • Context Building: Later phases benefit from earlier structured data
  • Quality Control: Clear validation points between phases
  • Debugging: Easy to identify problematic extraction types
  • User Control: Natural review/correction breakpoints

Pipeline Architecture

Reflection Text
    ↓
Phase 1: People Extraction & Name Resolution
    ↓ (validated people list)
Phase 2: Interaction Context & Sentiment Analysis
    ↓ (interaction dynamics)
Phase 3: Attribute Extraction (5 sub-phases)
    ↓ (structured friend data)
Phase 4: Cross-Person Connection Analysis
    ↓ (relationship insights)
Phase 5: Validation & Conflict Resolution
    ↓ (final review)
Final Output: Structured Data + User Review Interface
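
As a sketch of how these contextual handoffs could be wired together (all names here, such as run_pipeline and PhaseResult, are illustrative assumptions rather than an existing API):

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class PhaseResult:
    """Output of one pipeline phase, handed forward as context."""
    name: str
    data: dict[str, Any]
    confidence: float

def run_pipeline(reflection_text: str,
                 phases: list[Callable[[str, dict[str, Any]], PhaseResult]]) -> dict[str, Any]:
    """Run each phase in order, passing the accumulated context to the next one."""
    context: dict[str, Any] = {}
    for phase in phases:
        result = phase(reflection_text, context)
        context[result.name] = result.data  # contextual handoff to later phases
    return context

# Illustrative stand-ins for the real LLM-backed phases.
def extract_people(text: str, ctx: dict) -> PhaseResult:
    return PhaseResult("people", {"resolved": ["Sarah"]}, confidence=0.92)

def analyze_interactions(text: str, ctx: dict) -> PhaseResult:
    # Later phases can read earlier output, e.g. ctx["people"]["resolved"].
    return PhaseResult("interactions", {"sentiment": "positive"}, confidence=0.85)

if __name__ == "__main__":
    print(run_pipeline("Had coffee with Sarah today.", [extract_people, analyze_interactions]))
```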

🚀 Implementation Strategy

Phase Breakdown

Phase 1: People Extraction & Name Resolution ⭐ Critical Foundation

  • Extract all people mentioned (names, pronouns, relationships)
  • Match to existing friends vs. identify new people
  • Handle edge cases (nicknames, "my sister", pronouns)
  • Output: Resolved person list with confidence scores
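
One possible shape for this Phase 1 output, with field names chosen purely for illustration:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ResolvedPerson:
    mention: str                        # how the person appeared in the text ("my sister", "Dave")
    canonical_name: str                 # name after nickname / relationship resolution
    existing_friend_id: Optional[str]   # matched friend record, or None if new
    is_new: bool
    confidence: float                   # 0.0-1.0, drives later review behavior

# Example: "my sister" resolved to an existing friend, "Dave" flagged as a new person.
people = [
    ResolvedPerson(mention="my sister", canonical_name="Emma",
                   existing_friend_id="friend_17", is_new=False, confidence=0.88),
    ResolvedPerson(mention="Dave", canonical_name="Dave",
                   existing_friend_id=None, is_new=True, confidence=0.95),
]
```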

Phase 2: Interaction Context & Sentiment

  • Overall interaction sentiment and energy impact
  • Relationship dynamics observed
  • Context (planned vs. spontaneous, supportive vs. casual)
  • Output: Interaction metadata for each person

Phase 3: Sequential Attribute Extraction

Break into focused sub-phases:

  • 3a: Life Details & Family Information
  • 3b: Activities & Preferences
  • 3c: Support & Care Dynamics
  • 3d: Growth & Projects
  • 3e: Memories & Gratitude

Phase 4: Connection Analysis

  • Identify shared interests/activities between people
  • Spot potential introduction opportunities
  • Note complementary skills or life stages
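
A toy sketch of what this overlap detection could look like once Phase 3 has produced per-person interests (the data below is made up):

```python
from itertools import combinations

# Hypothetical structured output from Phase 3: person -> set of interests.
interests = {
    "Sarah": {"hiking", "board games", "cooking"},
    "Dave": {"board games", "photography"},
    "Emma": {"hiking", "photography"},
}

def shared_interest_pairs(interests: dict[str, set[str]]) -> list[tuple[str, str, set[str]]]:
    """Return every pair of people with at least one interest in common."""
    pairs = []
    for a, b in combinations(sorted(interests), 2):
        overlap = interests[a] & interests[b]
        if overlap:
            pairs.append((a, b, overlap))
    return pairs

for a, b, shared in shared_interest_pairs(interests):
    print(f"Possible introduction: {a} + {b} (shared: {', '.join(sorted(shared))})")
```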

Phase 5: Validation & Quality Control

  • Check for over-interpretation
  • Identify missing obvious information
  • Flag low-confidence extractions
  • Logical consistency review

🤔 Key Questions for You to Consider

Strategic Architecture Decisions

1. Error Handling Philosophy

  • Question: When one phase fails, do you continue with remaining phases or abort?
  • Options:
    • Graceful degradation (continue with partial data)
    • Hard stop (ensure data consistency)
    • User choice (let them decide in UI)
  • Consider: User frustration vs. data quality trade-offs
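
A minimal sketch of the graceful-degradation option, assuming each phase is a callable that may raise; failures are recorded and the pipeline continues with partial data:

```python
from typing import Any, Callable

def run_with_degradation(reflection: str,
                         phases: dict[str, Callable[[str, dict], dict]]) -> dict[str, Any]:
    """Run phases in order; a failing phase is logged and skipped instead of aborting the run."""
    context: dict[str, Any] = {}
    failures: list[str] = []
    for name, phase in phases.items():
        try:
            context[name] = phase(reflection, context)
        except Exception as exc:          # deliberate catch-all: isolate errors to one phase
            failures.append(f"{name}: {exc}")
            context[name] = None          # downstream phases see the gap explicitly
    context["_failed_phases"] = failures  # surfaced later in the user review interface
    return context
```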

2. User Involvement Level

  • Question: How much user validation do you want between phases?
  • Options:
    • Full automation (only final review)
    • Phase-by-phase confirmation
    • Exception-based (only when confidence is low)
  • Consider: User fatigue vs. accuracy, different user personas

3. New vs. Edit Logic Complexity

  • Question: How sophisticated should the new/edit detection be?
  • Options:
    • Simple append-only (everything is new)
    • Smart merge (detect conflicts, suggest updates)
    • Full diff analysis (track what changed and why)
  • Consider: Development complexity vs. user value

Technical Implementation Questions

4. Prompt Context Management

  • Question: How much previous extraction data do you include in later phases?
  • Trade-offs:
    • More context = better decisions but longer prompts
    • Less context = faster processing but potential inconsistencies
  • Consider: Token limits, cost, processing speed
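
A small sketch of that trade-off at prompt-assembly time; the 120-character preview cap and key names are arbitrary assumptions:

```python
import json

def build_phase_prompt(instructions: str, prior_context: dict, mode: str = "summary") -> str:
    """Assemble a phase prompt with no, summarized, or full prior-extraction context."""
    if mode == "none":
        context_block = ""
    elif mode == "summary":
        # Keep only top-level keys and a short preview of each value to bound token usage.
        preview = {k: str(v)[:120] for k, v in prior_context.items()}
        context_block = "Previously extracted (summarized):\n" + json.dumps(preview, indent=2)
    else:  # "full"
        context_block = "Previously extracted:\n" + json.dumps(prior_context, indent=2)
    return f"{instructions}\n\n{context_block}".strip()

print(build_phase_prompt(
    "Extract activities and preferences for each person mentioned.",
    {"people": ["Sarah", "Dave"], "interactions": {"sentiment": "positive"}},
    mode="summary",
))
```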

5. Confidence Threshold Strategy

  • Question: What confidence levels trigger different behaviors?
  • Examples:
    • >0.9: Auto-accept
    • 0.7-0.9: Add with user review flag
    • 0.5-0.7: Require user confirmation
    • <0.5: Reject or manual review
  • Consider: User patience, data quality requirements
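
Those example thresholds, expressed as a single routing function so the cut-offs live in one place (the values mirror the list above and would be tuned in practice):

```python
def route_by_confidence(confidence: float) -> str:
    """Map an extraction's confidence score to a handling action."""
    if confidence > 0.9:
        return "auto_accept"
    if confidence >= 0.7:
        return "accept_with_review_flag"
    if confidence >= 0.5:
        return "require_user_confirmation"
    return "reject_or_manual_review"

assert route_by_confidence(0.95) == "auto_accept"
assert route_by_confidence(0.75) == "accept_with_review_flag"
assert route_by_confidence(0.60) == "require_user_confirmation"
assert route_by_confidence(0.30) == "reject_or_manual_review"
```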

6. Retry and Recovery Logic

  • Question: How do you handle phase failures or low-quality extractions?
  • Options:
    • Automatic retry with modified prompts
    • Fallback to simpler extraction methods
    • Queue for manual review
  • Consider: Cost implications, user experience
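
A sketch of the retry-then-fallback option, assuming each phase returns None on a failed or low-quality extraction and that modified prompt variants are prepared ahead of time:

```python
from typing import Callable, Optional

def run_with_retry(phase: Callable[[str], Optional[dict]],
                   prompt_variants: list[str],
                   max_attempts: int = 3) -> dict:
    """Try progressively modified prompts; queue for manual review if every attempt fails."""
    attempts = 0
    for prompt in prompt_variants[:max_attempts]:
        attempts += 1
        result = phase(prompt)
        if result is not None:
            return {"status": "ok", "data": result, "attempts": attempts}
    return {"status": "needs_manual_review", "attempts": attempts}
```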

User Experience & Product Questions

7. Progress Transparency

  • Question: How much of the processing pipeline should users see?
  • Options:
    • Black box (just loading, then results)
    • Progress indicators (Phase 2 of 5...)
    • Detailed breakdown (now extracting activities...)
  • Consider: User anxiety vs. transparency value

8. Review Interface Complexity

  • Question: How granular should user review/editing be?
  • Options:
    • Bulk approve/reject by phase
    • Individual item editing
    • Batch operations with filtering
  • Consider: User time investment, accuracy needs

9. Learning and Adaptation

  • Question: How should the system learn from user corrections?
  • Options:
    • Store corrections for future prompt engineering
    • User-specific adaptation
    • Global system improvement
  • Consider: Privacy, personalization value, complexity

Data Quality & Validation Questions

10. Conflicting Information Handling

  • Question: When new extractions conflict with existing data, what's the strategy?
  • Scenarios:
    • New reflection says "Sarah loves hiking" but existing data says "Sarah dislikes outdoor activities"
    • Person's job title changes
    • Family situation updates
  • Consider: Information freshness, user trust, correction workflows
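
One conservative approach to the hiking example is append-only with conflict flagging (also the MVP recommendation at the end of this document): a new value never silently overwrites the old one, it is recorded alongside it for user review. Field names here are assumptions:

```python
from dataclasses import dataclass, field

@dataclass
class AttributeRecord:
    value: str
    source_reflection_id: str
    conflicts: list[dict] = field(default_factory=list)

def merge_attribute(existing: AttributeRecord, new_value: str, reflection_id: str) -> AttributeRecord:
    """Append-only merge: keep the current value, flag any disagreement for user review."""
    if new_value != existing.value:
        existing.conflicts.append({"value": new_value, "source_reflection_id": reflection_id})
    return existing

record = AttributeRecord("dislikes outdoor activities", "reflection_012")
merge_attribute(record, "loves hiking", "reflection_045")
print(record.conflicts)  # -> [{'value': 'loves hiking', 'source_reflection_id': 'reflection_045'}]
```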

11. Extraction Granularity

  • Question: How detailed should extractions be?
  • Trade-offs:
    • High detail = rich data but potential noise
    • High-level only = cleaner but less useful
  • Examples:
    • "Sarah likes board games" vs. "Sarah loves strategy board games, especially Wingspan, prefers 2-4 players"

12. Context Preservation

  • Question: How much original context should you preserve?
  • Options:
    • Store exact quotes for everything
    • Paraphrase and summarize
    • Link back to original reflection
  • Consider: Storage costs, user privacy, debugging needs
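
A sketch of the "link back to original reflection" option, where every extracted fact keeps its supporting quote plus a pointer to the source text; field names are illustrative:

```python
from dataclasses import dataclass

@dataclass
class ExtractedFact:
    person: str
    attribute: str
    value: str
    source_quote: str     # verbatim span from the reflection, for debugging and user trust
    reflection_id: str    # link back to the full original text

fact = ExtractedFact(
    person="Sarah",
    attribute="activities",
    value="strategy board games",
    source_quote="Sarah brought Wingspan again and we played for hours",
    reflection_id="reflection_045",
)
```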

πŸŽ›οΈ Configuration Decisions to Make

Processing Parameters



```yaml
# Example configuration to define
processing_config:
  phases:
    people_extraction:
      confidence_threshold: 0.8
      max_retry_attempts: 2
      enable_fuzzy_matching: true

    attribute_extraction:
      parallel_subphases: false          # Sequential for v1
      context_from_previous: "summary"   # none, summary, full
      confidence_threshold: 0.7

    validation:
      enabled: true
      auto_fix_obvious_errors: true
      flag_low_confidence: true

  user_interaction:
    progress_visibility: "phase_level"
    review_required_for: ["conflicts", "low_confidence"]
    batch_operations: true
```
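
If this configuration lives in a YAML file, a thin loader can expose it to the pipeline. This sketch assumes PyYAML and a hypothetical processing_config.yaml on disk:

```python
import yaml  # PyYAML, assumed as the config-parsing dependency

with open("processing_config.yaml") as f:
    config = yaml.safe_load(f)

people_cfg = config["processing_config"]["phases"]["people_extraction"]
print(people_cfg["confidence_threshold"])   # 0.8
print(people_cfg["enable_fuzzy_matching"])  # True
```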

Quality Thresholds

  • What confidence level requires user review?
  • How many extraction failures before manual fallback?
  • What constitutes "obvious" information that shouldn't be missed?

Cost Management

  • Token budget per reflection
  • Retry limits
  • Fallback to cheaper models for certain phases?

🧪 Testing Strategy Questions

13. Validation Approach

  • Question: How will you measure extraction quality?
  • Methods:
    • Manual review of sample extractions
    • User correction rate tracking
    • A/B testing different prompt strategies
  • Consider: Ground truth establishment, ongoing quality monitoring
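
A sketch of the simplest of these methods, computing a per-phase user correction rate from hypothetical review events:

```python
from collections import Counter

def correction_rate_by_phase(events: list[dict]) -> dict[str, float]:
    """events: one dict per extracted item, e.g. {"phase": "people_extraction", "corrected": True}."""
    totals, corrected = Counter(), Counter()
    for e in events:
        totals[e["phase"]] += 1
        corrected[e["phase"]] += int(e["corrected"])
    return {phase: corrected[phase] / totals[phase] for phase in totals}

events = [
    {"phase": "people_extraction", "corrected": False},
    {"phase": "people_extraction", "corrected": True},
    {"phase": "attribute_extraction", "corrected": False},
]
print(correction_rate_by_phase(events))  # -> {'people_extraction': 0.5, 'attribute_extraction': 0.0}
```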

14. Edge Case Handling

  • Question: What edge cases are most important to handle well?
  • Examples:
    • Very short reflections
    • Stream-of-consciousness writing
    • Mixed languages or cultural references
    • Highly emotional or sensitive content
  • Consider: User diversity, failing gracefully vs. preserving accuracy

🔄 Iteration and Improvement Strategy

15. Feedback Loop Design

  • Question: How will you use user corrections to improve the system?
  • Options:
    • Real-time prompt adjustment
    • Batch retraining periodically
    • User-specific customization
  • Consider: Privacy implications, system complexity

16. Performance Monitoring

  • Question: What metrics will guide prompt engineering improvements?
  • Candidates:
    • Processing success rate by phase
    • User correction frequency
    • Time to complete processing
    • User satisfaction scores
  • Consider: Leading vs. lagging indicators

🎯 Immediate Next Steps to Define

  1. Pick Your Error Handling Philosophy - This affects everything else
  2. Define Confidence Thresholds - Critical for user experience
  3. Choose User Involvement Level - Shapes the entire interaction model
  4. Establish Quality Metrics - How will you know if it's working?
  5. Design the Review Interface - Users need to easily correct/confirm extractions

💡 Recommended Starting Point

For MVP/v1, consider:

  • Conservative confidence thresholds (better to under-extract than over-extract)
  • Simple new-vs-edit logic (append-only with conflict flagging)
  • Phase-level progress indicators (build user trust)
  • Exception-based user review (only when confidence is low)
  • Rich context preservation (better debugging and user transparency)

This gives you a solid foundation that you can iterate and optimize based on real user behavior and feedback.