
How District Technology Directors Should Evaluate AI Vendors

EduGenius Team · 16 min read


District technology directors face a procurement environment unlike anything in the history of educational technology. Between January 2023 and December 2024, the number of AI-powered education products tripled, according to EdSurge's product index. LearnPlatform's 2024 EdTech Top 40 report found the average U.S. school district now uses 1,403 unique digital tools — a 24% increase from the previous year — and AI tools represent the fastest-growing category.

The fundamental challenge isn't finding AI products. It's evaluating them. Traditional edtech procurement focused on features, pricing, and compliance certifications. AI tools require those same checks plus a fundamentally new set of questions: Where does our data go when the AI processes it? Does student input train the model? What happens when the AI is wrong? Who is liable for AI-generated errors in instructional content?

A CoSN 2024 survey found that only 23% of districts had a formal evaluation process specifically designed for AI tools — the remaining 77% were either applying traditional procurement frameworks (which miss critical AI-specific risks) or making ad hoc decisions without systematic evaluation.

This guide provides a structured evaluation framework built for the specific demands of AI in education.


Why Traditional Procurement Frameworks Fall Short

Standard edtech evaluation asks: Does the tool work? Is it accessible? Does it meet security requirements? Is the price reasonable? Those are necessary questions — but they're insufficient for AI.

| Traditional Evaluation | AI-Specific Evaluation (Also Required) |
| --- | --- |
| Does it work as described? | How does it work? (What AI model? What training data?) |
| Is student data secure? | Is student data used to train AI models? |
| Is it accessible (WCAG, Section 508)? | Does the AI output reflect bias that could harm students? |
| Is the cost reasonable? | What's the total cost including training, integration, and AI compute scaling? |
| Does it integrate with our SIS/LMS? | Does it share data with third-party AI providers (OpenAI, Google, Anthropic)? |
| Is the vendor financially stable? | Can the vendor survive the AI market consolidation happening right now? |
| Does the content meet standards? | Does AI-generated content contain errors, hallucinations, or inappropriate material? |

The right side of this table represents what most districts are not currently evaluating — and where the greatest risks lie.


Phase 1: Initial Screening (Week 1)

Before investing time in deep evaluation, screen vendors against non-negotiable requirements. This eliminates 40-60% of candidates quickly.

Non-Negotiable Requirements Checklist

INITIAL SCREENING — PASS/FAIL

□ PRIVACY AND COMPLIANCE
  □ Vendor willing to sign your state's DPA
    (or SDPC National DPA)
  □ FERPA-compliant (school official exception
    documented)
  □ COPPA-compliant (if student-facing, K-8)
  □ SOC 2 Type II certification (or equivalent
    security audit)
  □ No student data used for AI model training
    (must be explicit, not implied)
  □ Clear data deletion policy upon contract
    termination

□ TECHNICAL
  □ Runs on your existing infrastructure
    (bandwidth, devices, browsers)
  □ SSO/SAML integration available
  □ WCAG 2.1 AA accessibility compliance
  □ SIS/LMS integration via standard protocols
    (LTI 1.3, OneRoster, SIF)

□ VENDOR VIABILITY
  □ Vendor has been operating for 2+ years
    (or has strong backing/revenue)
  □ Vendor has existing K-12 district customers
    (references available)
  □ Pricing is within budget range

Any FAIL = STOP. Do not proceed to Phase 2.

Questions for Vendor Sales Calls

During initial discovery calls, ask these five questions. How a vendor answers tells you as much as the answers themselves:

| Question | What a Good Answer Sounds Like | Red Flag |
| --- | --- | --- |
| "What AI model powers your product, and where is it hosted?" | "We use [specific model] hosted on [AWS/Azure/GCP]. Student data is processed in the US. We do not send data to third-party AI APIs without DPA coverage." | "We use proprietary AI" (no specifics), or unable to name the underlying model |
| "Is any student data — including prompts, interactions, and generated content — used to train or fine-tune your AI models?" | "No. Student data is not used for model training. This is documented in our DPA, Section X." | "Data may be used to improve our product," or a vague non-answer |
| "What happens to our data if we cancel the contract?" | "All district data is deleted within 30 days. We provide a data export in standard format and written certification of deletion." | "We retain aggregated/anonymized data" without a clear definition of anonymization |
| "Can you provide references from 3 districts of similar size and demographics?" | Readily provides references from districts you can verify | "Our customers prefer not to be contacted," or only provides testimonials |
| "What is your AI's error rate on educational content, and how do you measure accuracy?" | Provides specific accuracy metrics from internal testing or third-party evaluation | "Our AI is very accurate" with no data, or "Teachers review everything so errors don't matter" |

Phase 2: Deep Evaluation (Weeks 2-3)

For vendors that pass initial screening, conduct a structured deep evaluation across six dimensions.

Evaluation Scoring Framework

Score each dimension on a 1-5 scale. Weight the dimensions according to your district's priorities.

| Dimension | Weight (Suggested) | Evaluation Method |
| --- | --- | --- |
| Educational effectiveness | 25% | Teacher pilot feedback, content quality review, alignment to standards |
| Privacy and data governance | 25% | DPA review, privacy policy analysis, security documentation |
| Technical integration | 15% | IT staff testing, SSO integration test, bandwidth assessment |
| Usability and adoption | 15% | Teacher/admin usability testing, training requirements assessment |
| Total cost of ownership | 10% | 3-year cost projection including hidden costs |
| Vendor stability and support | 10% | Financial review, support responsiveness testing, roadmap review |
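The weighting above can be sketched as a small calculator. This is an illustrative sketch, not a prescribed tool: the weights come from the table, while the dimension names and example scores are our assumptions.

```python
# Suggested weights from the evaluation framework table (must sum to 1.0).
WEIGHTS = {
    "educational_effectiveness": 0.25,
    "privacy_data_governance": 0.25,
    "technical_integration": 0.15,
    "usability_adoption": 0.15,
    "total_cost_of_ownership": 0.10,
    "vendor_stability_support": 0.10,
}

def weighted_score(scores: dict) -> float:
    """Combine 1-5 dimension scores into a single weighted vendor score."""
    if set(scores) != set(WEIGHTS):
        raise ValueError("score every dimension exactly once")
    for dim, s in scores.items():
        if not 1 <= s <= 5:
            raise ValueError(f"{dim}: scores must be on a 1-5 scale")
    return round(sum(WEIGHTS[d] * s for d, s in scores.items()), 2)

# Hypothetical vendor: strong on privacy, weaker on integration and cost.
example = {
    "educational_effectiveness": 4,
    "privacy_data_governance": 5,
    "technical_integration": 3,
    "usability_adoption": 4,
    "total_cost_of_ownership": 3,
    "vendor_stability_support": 4,
}
print(weighted_score(example))  # 4.0
```

Adjust the weights to your district's priorities before scoring; the validation guard ensures no dimension is silently skipped.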

1. Educational Effectiveness (25%)

This is the dimension that most evaluation frameworks under-examine. An AI tool can be technically excellent, fully compliant, and beautifully designed — and still produce mediocre educational content.

Content quality assessment protocol:

Select 10 representative generation tasks:
- 3 tasks in your most-taught subjects
- 2 tasks requiring differentiation (IEP
  accommodations, ELL modifications)
- 2 tasks at different grade bands (K-2, 3-5,
  6-8, 9-12)
- 1 task with specific state standards alignment
- 1 task that requires cultural sensitivity
- 1 intentionally vague task (tests how tool
  handles ambiguity)

For each generated output, evaluate:
□ Factual accuracy (verified against standards
  and subject knowledge)
□ Grade-level appropriateness (reading level,
  cognitive demand)
□ Alignment to specified standards
□ Differentiation quality (not just simplified
  language but genuinely different approaches)
□ Bias check (representation, cultural
  sensitivity, assumptions)
□ Usability (can a teacher use this output
  directly, or does it require significant
  editing?)

Scoring: A tool that produces output teachers can use with minimal editing (10-15 minutes) on 7+ of 10 tasks scores 4-5. A tool requiring significant rework on most outputs scores 1-2. Platforms like EduGenius demonstrate the standard by providing Bloom's Taxonomy-aligned content with automatic differentiation and answer keys — benchmark vendor claims against these capabilities.
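The scoring rule in the paragraph above can be made explicit. The 4-5 and 1-2 cut points come from the text; the intermediate thresholds are our assumption and should be tuned to your rubric.

```python
def content_quality_score(usable_with_minimal_editing: int, total: int = 10) -> int:
    """Map task-review results to a 1-5 dimension score.

    usable_with_minimal_editing: tasks whose output a teacher could use
    after at most 10-15 minutes of editing.
    """
    rate = usable_with_minimal_editing / total
    if rate >= 0.9:
        return 5
    if rate >= 0.7:   # "7+ of 10 tasks" from the scoring paragraph
        return 4
    if rate >= 0.5:   # assumed midpoint
        return 3
    if rate >= 0.3:   # assumed
        return 2
    return 1          # significant rework on most outputs

print(content_quality_score(8))  # 4
```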

2. Privacy and Data Governance (25%)

Beyond the pass/fail screening, deep evaluation examines the specifics of data handling.

| Assessment Area | What to Examine | How to Verify |
| --- | --- | --- |
| Data flow mapping | Exactly what data enters the AI, where it's processed, what's stored, what's returned | Request data flow diagram from vendor; verify against DPA |
| Sub-processor disclosure | Does the vendor use third-party AI providers (OpenAI, Anthropic, Google)? If so, under what terms? | Request complete sub-processor list; verify DPAs extend to sub-processors |
| Model training isolation | Is district data used in any form of model training, fine-tuning, or reinforcement learning? | Verify in DPA; request written attestation |
| Data residency | Where is data stored and processed? (Matters for GDPR-affected districts and state laws) | Request data center locations; verify against compliance requirements |
| Incident history | Has the vendor had previous data breaches or privacy incidents? | Check vendor's breach disclosure history; search for news reports |

See Legal Considerations for AI in Education — FERPA, COPPA, and GDPR for the full legal framework governing these requirements.

3. Technical Integration (15%)

| Test | Method | Pass Criteria |
| --- | --- | --- |
| SSO integration | Configure SAML/OAuth with your IdP (Azure AD, Google Workspace, Clever) | Auto-provisioning works; teacher/student roles sync correctly |
| SIS/LMS integration | Test OneRoster or LTI 1.3 connection with your systems | Roster sync accurate; grade passback functional (if applicable) |
| Bandwidth impact | Monitor network during concurrent use (simulate 50+ users) | No significant impact on other services; acceptable latency (<3 seconds per AI response) |
| Device compatibility | Test on representative devices (Chromebooks, iPads, Windows laptops) | Full functionality on all district-supported devices |
| Offline/low-bandwidth | Test on degraded network (simulate rural/low-bandwidth conditions) | Graceful degradation; no data loss; clear error messages |
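For the bandwidth test, a helper like the following can turn collected per-request latencies into a pass/fail result. The 3-second ceiling comes from the table; using the 95th percentile (rather than the mean) and the minimum sample count are our assumptions.

```python
import statistics

def latency_passes(samples: list[float], ceiling: float = 3.0) -> bool:
    """Pass if the 95th-percentile AI response time is under the ceiling.

    samples: per-request latencies in seconds, gathered while ~50 users
    hit the tool concurrently.
    """
    if len(samples) < 20:
        raise ValueError("collect at least 20 samples for a stable estimate")
    p95 = statistics.quantiles(samples, n=20)[-1]  # 95th-percentile cut point
    return p95 < ceiling

# Illustrative data: a few slow responses are fine; a 20% tail over 3 s is not.
fast = [0.8] * 48 + [2.5, 2.9]
slow = [1.0] * 40 + [4.0] * 10
print(latency_passes(fast), latency_passes(slow))  # True False
```

Percentile-based criteria catch the slow tail that averages hide, which is what frustrated teachers actually experience.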

See Comparing AI Deployment Models — Cloud, On-Premise, and Hybrid for how deployment architecture affects these technical requirements.

4. Usability and Adoption (15%)

Test with 5-8 representative teachers covering different comfort levels with technology:

USABILITY TEST PROTOCOL

Task 1: First use (no training)
- Give teacher the tool with no instruction
- Ask them to complete a typical task
- Measure: time to first successful output,
  number of errors, frustration indicators

Task 2: After 15-minute orientation
- Provide brief orientation video or guide
- Ask teacher to complete 3 varied tasks
- Measure: success rate, output quality,
  confidence level

Task 3: After 1 week of daily use
- Teacher uses tool independently for 1 week
- Interview: What works? What doesn't? Would
  you keep using this?
- Measure: continued use rate, reported time
  savings, output quality over time

SCORING:
5 — Teachers productive within first use
4 — Teachers productive after brief orientation
3 — Teachers productive after structured training
2 — Teachers require ongoing support
1 — Teachers resist using despite training

5. Total Cost of Ownership (10%)

Vendor pricing tells you the license cost. Total cost of ownership tells you what you'll actually spend.

| Cost Category | Year 1 | Year 2 | Year 3 |
| --- | --- | --- | --- |
| License/subscription fees | Quote | Quote + escalation | Quote + escalation |
| Implementation/setup | Vendor fees + IT staff time | | |
| Training | Initial PD days + substitute costs | Refresher + new hire training | Refresher + new hire training |
| Integration | SSO/LMS setup (IT staff time or consultant) | Maintenance | Maintenance |
| Support overhead | Help desk tickets, IT troubleshooting | Ongoing | Ongoing |
| Usage scaling | Base usage | Growth (10-20% more users?) | Full adoption |
| TOTAL | Sum | Sum | Sum |
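A rough projection following the table above might look like this. Every dollar figure and rate here is a made-up placeholder for illustration; substitute your actual quotes, negotiated escalation cap, and staff-time estimates.

```python
def three_year_tco(license_y1: float, escalation: float = 0.05,
                   one_time: float = 0.0, recurring: float = 0.0,
                   usage_growth: float = 0.15) -> int:
    """Project 3-year total cost of ownership.

    license_y1:   year-1 license quote
    escalation:   annual price increase (cap this in the contract)
    one_time:     implementation/integration costs (year 1 only)
    recurring:    annual training + support overhead
    usage_growth: annual seat/usage growth applied to the license
    """
    total, license_cost = one_time, license_y1
    for _year in range(3):
        total += license_cost + recurring
        license_cost *= (1 + escalation) * (1 + usage_growth)
    return round(total)

# Hypothetical: $60k quote, $25k setup, $18k/yr training and support.
print(three_year_tco(60_000, one_time=25_000, recurring=18_000))  # 298933
```

Running the numbers this way makes the gap between the quoted license fee and the real budget line visible before the contract is signed.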

Hidden costs to watch for:

  • Per-seat pricing that escalates with adoption ("starts at $3/teacher" but grows)
  • AI compute charges based on usage volume
  • Premium features locked behind higher tiers
  • Integration costs not included in base price
  • Training costs for new staff each year

6. Vendor Stability and Support (10%)

| Factor | How to Assess | Why It Matters |
| --- | --- | --- |
| Funding/revenue | Ask about funding stage, revenue model, path to profitability | 30-40% of AI startups will not exist in 3 years (CB Insights, 2024) |
| Customer retention | Ask for retention rate and average contract duration | High churn = red flag for product quality or support |
| Support responsiveness | Submit a test support ticket during evaluation; measure response time | You will need support; test it before you commit |
| Product roadmap | Request roadmap for next 12 months; assess realism | Overpromising is common; grounded roadmaps indicate maturity |
| Exit strategy | Verify data portability, export formats, deletion timeline | If they fail, can you get your data out? |

Phase 3: Pilot Program (Weeks 4-8)

Don't go from evaluation to full deployment. A structured pilot reduces risk and generates the evidence you need for a deployment decision.

Pilot Design Template

PILOT PROGRAM STRUCTURE

Duration: 4-6 weeks (minimum 4 instructional
weeks)

Participants:
- 8-12 teachers (mix of tech-comfortable and
  tech-hesitant)
- 2-3 grade bands or subject areas
- At least 1 school site (2 if possible for
  comparison)

Clear Success Criteria (define BEFORE pilot):
□ X% of pilot teachers report net time savings
□ AI-generated content meets quality threshold
  on Y% of outputs
□ No privacy incidents or policy violations
□ Teacher satisfaction score ≥ X on 1-5 scale
□ Technical issues resolved within X hours

Data Collection:
- Weekly 5-minute teacher survey (quantitative)
- Bi-weekly focus group or interview (qualitative)
- Usage analytics from vendor dashboard
- IT support ticket log
- Content quality samples (3-5 per teacher)

Decision Framework:
- Meets ALL success criteria → Proceed to
  deployment planning
- Meets MOST criteria with manageable gaps →
  Negotiate improvements; conditional deployment
- Fails multiple criteria → Do not deploy; share
  specific feedback with vendor
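The decision framework above reduces to a simple mapping. The criterion names are illustrative, and treating "most criteria" as all-but-one is our assumption; define your own criteria and tolerance before the pilot starts.

```python
def pilot_decision(criteria: dict[str, bool]) -> str:
    """Map pass/fail success criteria to a deployment decision."""
    met, total = sum(criteria.values()), len(criteria)
    if met == total:
        return "proceed to deployment planning"
    if met >= total - 1:  # "most" criteria, one manageable gap (assumed)
        return "negotiate improvements; conditional deployment"
    return "do not deploy; share specific feedback with vendor"

# Hypothetical pilot outcome: everything passed except issue resolution time.
results = {
    "net_time_savings": True,
    "content_quality_threshold": True,
    "no_privacy_incidents": True,
    "teacher_satisfaction": True,
    "technical_issues_resolved": False,
}
print(pilot_decision(results))  # negotiate improvements; conditional deployment
```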

Pilot timing matters: Don't pilot during the first or last two weeks of school, during testing windows, or during a week with multiple half-days. You need normal instructional conditions to get meaningful results.


Contract Negotiation

For vendors that pass evaluation and pilot, negotiate contracts deliberately. This is where technology directors have more leverage than they often realize.

Negotiation Points

| Area | What to Negotiate | Typical Outcome |
| --- | --- | --- |
| Multi-year pricing | Lock rates for 2-3 years; cap annual increases at 3-5% | Most vendors will agree to caps in exchange for commitment |
| Usage tiers | Negotiate based on realistic adoption (not 100% Day 1) | Start with conservative tier; negotiate upgrade path |
| Exit clause | 90-day termination notice; no penalty after Year 1 | Protects against vendor underperformance |
| Data portability | Explicit right to export all district data in standard format | Essential; some vendors make export difficult |
| SLA guarantees | 99.5%+ uptime; 4-hour response for critical issues | Get specific commitments, not vague promises |
| Training inclusion | Initial training + annual refresher included in contract | Many vendors will include training to close the deal |
| Most-favored pricing | If vendor offers a lower price to a comparable district, you get the same rate | Uncommon but worth requesting for large districts |
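To see why the escalation cap is worth fighting for, compare capped and uncapped increases on a hypothetical $50k/year license. The 5% cap comes from the table; the 15% uncapped renewal rate is an assumption for illustration only.

```python
def contract_total(year1: float, annual_increase: float, years: int = 3) -> int:
    """Total license spend over a multi-year term with compounding increases."""
    return round(sum(year1 * (1 + annual_increase) ** y for y in range(years)))

capped = contract_total(50_000, 0.05)    # negotiated 5% cap
uncapped = contract_total(50_000, 0.15)  # assumed uncapped renewal rate
print(capped, uncapped, uncapped - capped)  # 157625 173625 16000
```

Even over a short three-year term, the cap saves a five-figure sum; over longer terms the compounding gap widens further.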

Key Takeaways

  • Only 23% of districts have AI-specific evaluation processes (CoSN, 2024). Traditional procurement frameworks miss critical AI-specific risks including model training data use, AI-generated content accuracy, and sub-processor data sharing. See AI for School Leaders — A Strategic Guide to Transforming Education Administration for strategic context.
  • The most important privacy question is not "Is student data secure?" but "Is student data used to train AI models?" Many AI vendors use customer data for model improvement unless explicitly prohibited in the DPA. See Legal Considerations for AI in Education — FERPA, COPPA, and GDPR for the full legal framework.
  • Evaluate educational effectiveness with the same rigor as technical compliance. A tool that passes every security check but produces mediocre content wastes money. Test with 10 representative tasks spanning subjects, grade levels, and differentiation needs. See Building a Culture of Innovation — Leading AI Adoption in Schools for adoption strategy.
  • Always pilot before full deployment. 4-6 weeks with 8-12 teachers using the tool under normal conditions generates better evidence than any demo or sales call. Define success criteria before the pilot starts — not after.
  • Total cost of ownership is 1.5-2.5× the license cost. Training, integration, support overhead, and usage scaling are real costs that vendor quotes don't include. Budget for the full picture. See AI for Student Enrollment Forecasting and Resource Planning for resource planning.
  • Negotiate from strength. Vendors want multi-year contracts; you want flexibility. Trade commitment for rate locks, training inclusion, and exit clauses. Your existing DPA compliance and pilot data are negotiation assets. See Best AI Content Generation Tools for Educators — Head-to-Head Comparison for tool comparison.

Frequently Asked Questions

Should we evaluate free AI tools with the same rigor as paid ones?

Yes — potentially with even more rigor. Free tools have a business model somewhere, and if you're not paying with money, you may be paying with data. Free tiers of commercial AI tools often have weaker privacy protections than paid tiers (less restrictive data use policies, model training on free-tier inputs, limited DPA availability). Apply the same privacy screening and DPA requirements regardless of cost. Some genuinely free educational tools (Khan Academy, certain Google Workspace features) have strong privacy practices — but verify, don't assume.

How do we evaluate vendors that use OpenAI or Anthropic as their underlying AI?

Many educational AI vendors are wrappers around major AI models (GPT-4, Claude, Gemini). This isn't inherently problematic, but it means your data flows through both the vendor AND the underlying AI provider. Ask: (1) Does the vendor have a DPA with their AI provider that extends your district's privacy protections? (2) Is your data excluded from model training at both the vendor level AND the underlying AI provider level? (3) Where is the data processed — the vendor's infrastructure or the AI provider's? The vendor's DPA with you is only as strong as their agreement with their sub-processor.

What's the minimum pilot duration?

Four instructional weeks is the minimum for meaningful results. Shorter pilots don't capture the learning curve — teachers need at least 2 weeks to get past initial friction, and you need at least 2 more weeks of data on actual productive use. If the vendor pushes for a 1-2 week pilot, they may be trying to capture the honeymoon period before teachers encounter the tool's limitations. Insist on 4 weeks minimum, and include at least one full instructional cycle (unit or grading period) if possible.

How do we handle teachers who want to buy AI tools with personal funds or classroom budgets?

This is a governance issue, not a technology issue. Any AI tool that accesses student data must go through the district's privacy review regardless of who pays for it. Establish a clear policy: all AI tools used in instructional settings require IT approval, even free tools, even personally purchased tools. This isn't bureaucracy — it's liability management. A teacher using an unapproved AI tool that experiences a data breach creates liability for the district, not just the teacher. Provide a streamlined approval path (48-72 hours for tools on an approved list) so teachers don't feel the process is a barrier.

#AI-vendor-evaluation #edtech-procurement #technology-director #AI-tools-schools #vendor-selection