How District Technology Directors Should Evaluate AI Vendors
District technology directors face a procurement environment unlike anything in the history of educational technology. Between January 2023 and December 2024, the number of AI-powered education products tripled, according to EdSurge's product index. LearnPlatform's 2024 EdTech Top 40 report found the average U.S. school district now uses 1,403 unique digital tools — a 24% increase from the previous year — and AI tools represent the fastest-growing category.
The fundamental challenge isn't finding AI products. It's evaluating them. Traditional edtech procurement focused on features, pricing, and compliance certifications. AI tools require those same checks plus a fundamentally new set of questions: Where does our data go when the AI processes it? Does student input train the model? What happens when the AI is wrong? Who is liable for AI-generated errors in instructional content?
A CoSN 2024 survey found that only 23% of districts had a formal evaluation process specifically designed for AI tools — the remaining 77% were either applying traditional procurement frameworks (which miss critical AI-specific risks) or making ad hoc decisions without systematic evaluation.
This guide provides a structured evaluation framework built for the specific demands of AI in education.
Why Traditional Procurement Frameworks Fall Short
Standard edtech evaluation asks: Does the tool work? Is it accessible? Does it meet security requirements? Is the price reasonable? Those are necessary questions — but they're insufficient for AI.
| Traditional Evaluation | AI-Specific Evaluation (Also Required) |
|---|---|
| Does it work as described? | How does it work? (What AI model? What training data?) |
| Is student data secure? | Is student data used to train AI models? |
| Is it accessible (WCAG, Section 508)? | Does the AI output reflect bias that could harm students? |
| Is the cost reasonable? | What's the total cost including training, integration, and AI compute scaling? |
| Does it integrate with our SIS/LMS? | Does it share data with third-party AI providers (OpenAI, Google, Anthropic)? |
| Is the vendor financially stable? | Can the vendor survive the AI market consolidation happening right now? |
| Does the content meet standards? | Does AI-generated content contain errors, hallucinations, or inappropriate material? |
The right side of this table represents what most districts are not currently evaluating — and where the greatest risks lie.
Phase 1: Initial Screening (Week 1)
Before investing time in deep evaluation, screen vendors against non-negotiable requirements. This step typically eliminates 40-60% of candidates quickly.
Non-Negotiable Requirements Checklist
INITIAL SCREENING — PASS/FAIL
□ PRIVACY AND COMPLIANCE
□ Vendor willing to sign your state's DPA
(or SDPC National DPA)
□ FERPA-compliant (school official exception
documented)
□ COPPA-compliant (if student-facing, K-8)
□ SOC 2 Type II certification (or equivalent
security audit)
□ No student data used for AI model training
(must be explicit, not implied)
□ Clear data deletion policy upon contract
termination
□ TECHNICAL
□ Runs on your existing infrastructure
(bandwidth, devices, browsers)
□ SSO/SAML integration available
□ WCAG 2.1 AA accessibility compliance
□ SIS/LMS integration via standard protocols
(LTI 1.3, OneRoster, SIF)
□ VENDOR VIABILITY
□ Vendor has been operating for 2+ years
(or has strong backing/revenue)
□ Vendor has existing K-12 district customers
(references available)
□ Pricing is within budget range
Any FAIL = STOP. Do not proceed to Phase 2.
Questions for Vendor Sales Calls
During initial discovery calls, ask these five questions. How a vendor answers often tells you as much as the answer itself:
| Question | What a Good Answer Sounds Like | Red Flag |
|---|---|---|
| "What AI model powers your product, and where is it hosted?" | "We use [specific model] hosted on [AWS/Azure/GCP]. Student data is processed in the US. We do not send data to third-party AI APIs without DPA coverage." | "We use proprietary AI" (no specifics) or unable to name the underlying model |
| "Is any student data — including prompts, interactions, and generated content — used to train or fine-tune your AI models?" | "No. Student data is not used for model training. This is documented in our DPA, Section X." | "Data may be used to improve our product" or vague non-answer |
| "What happens to our data if we cancel the contract?" | "All district data is deleted within 30 days. We provide a data export in standard format and written certification of deletion." | "We retain aggregated/anonymized data" without clear definition of anonymization |
| "Can you provide references from 3 districts of similar size and demographics?" | Readily provides references from districts you can verify | "Our customers prefer not to be contacted" or only provides testimonials |
| "What is your AI's error rate on educational content, and how do you measure accuracy?" | Provides specific accuracy metrics from internal testing or third-party evaluation | "Our AI is very accurate" with no data, or "Teachers review everything so errors don't matter" |
Phase 2: Deep Evaluation (Weeks 2-3)
For vendors that pass initial screening, conduct a structured deep evaluation across six dimensions.
Evaluation Scoring Framework
Score each dimension on a 1-5 scale. Weight the dimensions according to your district's priorities.
| Dimension | Weight (Suggested) | Evaluation Method |
|---|---|---|
| Educational effectiveness | 25% | Teacher pilot feedback, content quality review, alignment to standards |
| Privacy and data governance | 25% | DPA review, privacy policy analysis, security documentation |
| Technical integration | 15% | IT staff testing, SSO integration test, bandwidth assessment |
| Usability and adoption | 15% | Teacher/admin usability testing, training requirements assessment |
| Total cost of ownership | 10% | 3-year cost projection including hidden costs |
| Vendor stability and support | 10% | Financial review, support responsiveness testing, roadmap review |
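Once each dimension is rated 1-5, the composite is simple arithmetic. A minimal sketch of the weighted scoring, using the suggested weights from the table above (the vendor's 1-5 scores below are purely illustrative):

```python
# Weighted vendor score: each dimension is rated 1-5, then combined
# using the district's weights. Weights are the suggested values from
# the table; adjust them to your district's priorities.

WEIGHTS = {
    "educational_effectiveness": 0.25,
    "privacy_and_data_governance": 0.25,
    "technical_integration": 0.15,
    "usability_and_adoption": 0.15,
    "total_cost_of_ownership": 0.10,
    "vendor_stability_and_support": 0.10,
}

def weighted_score(scores: dict[str, float]) -> float:
    """Combine 1-5 dimension scores into a single weighted score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing dimension scores: {sorted(missing)}")
    return sum(WEIGHTS[d] * scores[d] for d in WEIGHTS)

# Illustrative vendor: strong on pedagogy and privacy, weak on cost.
vendor_a = {
    "educational_effectiveness": 4,
    "privacy_and_data_governance": 5,
    "technical_integration": 3,
    "usability_and_adoption": 4,
    "total_cost_of_ownership": 2,
    "vendor_stability_and_support": 3,
}
print(round(weighted_score(vendor_a), 2))  # → 3.8
```

Keeping the weights in one place makes it easy to rerun the same scoring across every vendor under evaluation and compare composites side by side.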
1. Educational Effectiveness (25%)
This is the dimension that most evaluation frameworks under-examine. An AI tool can be technically excellent, fully compliant, and beautifully designed — and still produce mediocre educational content.
Content quality assessment protocol:
Select 10 representative generation tasks:
- 3 tasks in your most-taught subjects
- 2 tasks requiring differentiation (IEP
accommodations, ELL modifications)
- 2 tasks at different grade bands (K-2, 3-5,
6-8, 9-12)
- 1 task with specific state standards alignment
- 1 task that requires cultural sensitivity
- 1 intentionally vague task (tests how tool
handles ambiguity)
For each generated output, evaluate:
□ Factual accuracy (verified against standards
and subject knowledge)
□ Grade-level appropriateness (reading level,
cognitive demand)
□ Alignment to specified standards
□ Differentiation quality (not just simplified
language but genuinely different approaches)
□ Bias check (representation, cultural
sensitivity, assumptions)
□ Usability (can a teacher use this output
directly, or does it require significant
editing?)
Scoring: A tool that produces output teachers can use with minimal editing (10-15 minutes) on 7+ of 10 tasks scores 4-5. A tool requiring significant rework on most outputs scores 1-2. Platforms like EduGenius demonstrate the standard by providing Bloom's Taxonomy-aligned content with automatic differentiation and answer keys — benchmark vendor claims against these capabilities.
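One way to make this scoring reproducible across reviewers is to record, per task, whether the output was usable with minimal editing, then map the count onto the 1-5 scale. A sketch under stated assumptions: the article only pins down the top (7+ of 10 usable scores 4-5) and bottom (significant rework on most scores 1-2) of the scale, so the mid-band cutoffs below are assumptions to adjust to taste:

```python
def content_quality_score(usable_with_minimal_edit: list[bool]) -> int:
    """Map per-task results (True = usable with <=15 min of editing)
    onto the 1-5 educational-effectiveness scale."""
    if len(usable_with_minimal_edit) != 10:
        raise ValueError("protocol calls for exactly 10 representative tasks")
    usable = sum(usable_with_minimal_edit)
    if usable >= 9:
        return 5   # nearly every output usable as-is
    if usable >= 7:
        return 4   # 7+ of 10 usable: the 4-5 band per the rubric
    if usable >= 5:
        return 3   # assumed mid-band (not specified in the rubric)
    if usable >= 3:
        return 2   # assumed cutoff
    return 1       # significant rework on most outputs

# Example: 8 of 10 tasks produced directly usable output.
results = [True] * 8 + [False] * 2
print(content_quality_score(results))  # → 4
```

Having reviewers log a simple usable/not-usable verdict per task also gives you concrete examples to share with the vendor when negotiating improvements.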
2. Privacy and Data Governance (25%)
Beyond the pass/fail screening, deep evaluation examines the specifics of data handling.
| Assessment Area | What to Examine | How to Verify |
|---|---|---|
| Data flow mapping | Exactly what data enters the AI, where it's processed, what's stored, what's returned | Request data flow diagram from vendor; verify against DPA |
| Sub-processor disclosure | Does the vendor use third-party AI providers (OpenAI, Anthropic, Google)? If so, under what terms? | Request complete sub-processor list; verify DPAs extend to sub-processors |
| Model training isolation | Is district data used in any form of model training, fine-tuning, or reinforcement learning? | Verify in DPA; request written attestation |
| Data residency | Where is data stored and processed? (Matters for GDPR-affected districts and state laws) | Request data center locations; verify against compliance requirements |
| Incident history | Has the vendor had previous data breaches or privacy incidents? | Check vendor's breach disclosure history; search for news reports |
See Legal Considerations for AI in Education — FERPA, COPPA, and GDPR for the full legal framework governing these requirements.
3. Technical Integration (15%)
| Test | Method | Pass Criteria |
|---|---|---|
| SSO integration | Configure SAML/OAuth with your IdP (Azure AD, Google Workspace, Clever) | Auto-provisioning works; teacher/student roles sync correctly |
| SIS/LMS integration | Test OneRoster or LTI 1.3 connection with your systems | Roster sync accurate; grade passback functional (if applicable) |
| Bandwidth impact | Monitor network during concurrent use (simulate 50+ users) | No significant impact on other services; acceptable latency (<3 seconds per AI response) |
| Device compatibility | Test on representative devices (Chromebooks, iPads, Windows laptops) | Full functionality on all district-supported devices |
| Offline/low-bandwidth | Test on degraded network (simulate rural/low-bandwidth conditions) | Graceful degradation; no data loss; clear error messages |
See Comparing AI Deployment Models — Cloud, On-Premise, and Hybrid for how deployment architecture affects these technical requirements.
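The bandwidth-impact row can be approximated with a simple concurrent load probe before involving real classrooms. A hedged sketch: the `send_request` stub below merely sleeps to simulate a vendor response and stands in for whatever call actually hits the vendor's endpoint; swap in a real request during testing:

```python
import time
import statistics
from concurrent.futures import ThreadPoolExecutor

LATENCY_BUDGET_S = 3.0  # "acceptable latency" threshold from the pass criteria

def send_request(_: int) -> float:
    """Stub for one AI request. Replace the sleep with a real call to
    the vendor's endpoint; returns observed latency in seconds."""
    start = time.perf_counter()
    time.sleep(0.05)  # simulated vendor response time (assumption)
    return time.perf_counter() - start

def concurrent_latency_probe(users: int = 50) -> dict[str, float]:
    """Fire `users` simultaneous requests and summarize latencies."""
    with ThreadPoolExecutor(max_workers=users) as pool:
        latencies = list(pool.map(send_request, range(users)))
    return {
        "p50": statistics.median(latencies),
        "max": max(latencies),
        "over_budget": sum(1 for l in latencies if l > LATENCY_BUDGET_S),
    }

summary = concurrent_latency_probe(50)
print(summary)
assert summary["over_budget"] == 0, "some responses exceeded the 3 s budget"
```

Run the probe during the school day while monitoring your network, so the measurement reflects real contention rather than idle conditions.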
4. Usability and Adoption (15%)
Test with 5-8 representative teachers covering different comfort levels with technology:
USABILITY TEST PROTOCOL
Task 1: First use (no training)
- Give teacher the tool with no instruction
- Ask them to complete a typical task
- Measure: time to first successful output,
number of errors, frustration indicators
Task 2: After 15-minute orientation
- Provide brief orientation video or guide
- Ask teacher to complete 3 varied tasks
- Measure: success rate, output quality,
confidence level
Task 3: After 1 week of daily use
- Teacher uses tool independently for 1 week
- Interview: What works? What doesn't? Would
you keep using this?
- Measure: continued use rate, reported time
savings, output quality over time
SCORING:
5 — Teachers productive within first use
4 — Teachers productive after brief orientation
3 — Teachers productive after structured training
2 — Teachers require ongoing support
1 — Teachers resist using despite training
5. Total Cost of Ownership (10%)
Vendor pricing tells you the license cost. Total cost of ownership tells you what you'll actually spend.
| Cost Category | Year 1 | Year 2 | Year 3 |
|---|---|---|---|
| License/subscription fees | Quote | Quote + escalation | Quote + escalation |
| Implementation/setup | Vendor fees + IT staff time | — | — |
| Training | Initial PD days + substitute costs | Refresher + new hire training | Refresher + new hire training |
| Integration | SSO/LMS setup (IT staff time or consultant) | Maintenance | Maintenance |
| Support overhead | Help desk tickets, IT troubleshooting | Ongoing | Ongoing |
| Usage scaling | Base usage | Growth (10-20% more users?) | Full adoption |
| TOTAL | Sum | Sum | Sum |
Hidden costs to watch for:
- Per-seat pricing that escalates with adoption ("starts at $3/teacher" but grows)
- AI compute charges based on usage volume
- Premium features locked behind higher tiers
- Integration costs not included in base price
- Training costs for new staff each year
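The TCO table above reduces to a short 3-year projection. A sketch with placeholder figures (every dollar amount, escalation rate, and growth rate below is an illustrative assumption, not a benchmark; substitute your actual quotes and staff-time estimates):

```python
# 3-year total cost of ownership projection. All figures are
# illustrative placeholders -- replace with your district's numbers.

LICENSE_YEAR1 = 30_000       # vendor quote (assumption)
ESCALATION = 0.05            # 5% annual increase (cap this in the contract)
IMPLEMENTATION = 8_000       # setup fees + IT staff time (Year 1 only)
TRAINING = {1: 12_000, 2: 4_000, 3: 4_000}    # initial PD, then refreshers
INTEGRATION = {1: 5_000, 2: 1_500, 3: 1_500}  # SSO/LMS setup, then maintenance
SUPPORT_OVERHEAD = 3_000     # help desk / troubleshooting, per year
USAGE_GROWTH = 0.15          # 15% more seats per year as adoption grows

def year_cost(year: int) -> float:
    """Full cost for one year: escalated, growth-adjusted license
    plus one-time and recurring overhead."""
    license_fee = LICENSE_YEAR1 * (1 + ESCALATION) ** (year - 1)
    license_fee *= (1 + USAGE_GROWTH) ** (year - 1)  # seat growth
    setup = IMPLEMENTATION if year == 1 else 0
    return license_fee + setup + TRAINING[year] + INTEGRATION[year] + SUPPORT_OVERHEAD

three_year_tco = sum(year_cost(y) for y in (1, 2, 3))
license_only = sum(LICENSE_YEAR1 * (1 + ESCALATION) ** (y - 1) for y in (1, 2, 3))
print(f"3-year TCO: ${three_year_tco:,.0f} vs. license-only: ${license_only:,.0f}")
```

With these placeholder inputs the projection lands at roughly 1.6× the license-only cost, consistent with the 1.5-2.5× range cited in the takeaways; your ratio will depend heavily on training and integration overhead.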
6. Vendor Stability and Support (10%)
| Factor | How to Assess | Why It Matters |
|---|---|---|
| Funding/revenue | Ask about funding stage, revenue model, path to profitability | 30-40% of AI startups will not exist in 3 years (CB Insights, 2024) |
| Customer retention | Ask for retention rate and average contract duration | High churn = red flag for product quality or support |
| Support responsiveness | Submit a test support ticket during evaluation; measure response time | You will need support; test it before you commit |
| Product roadmap | Request roadmap for next 12 months; assess realism | Overpromising is common; grounded roadmaps indicate maturity |
| Exit strategy | Verify data portability, export formats, deletion timeline | If they fail, can you get your data out? |
Phase 3: Pilot Program (Weeks 4-8)
Don't go straight from evaluation to full deployment. A structured pilot reduces risk and generates the evidence you need for a deployment decision.
Pilot Design Template
PILOT PROGRAM STRUCTURE
Duration: 4-6 weeks (minimum 4 instructional
weeks)
Participants:
- 8-12 teachers (mix of tech-comfortable and
tech-hesitant)
- 2-3 grade bands or subject areas
- At least 1 school site (2 if possible for
comparison)
Clear Success Criteria (define BEFORE pilot):
□ X% of pilot teachers report net time savings
□ AI-generated content meets quality threshold
on Y% of outputs
□ No privacy incidents or policy violations
□ Teacher satisfaction score ≥ X on 1-5 scale
□ Technical issues resolved within X hours
Data Collection:
- Weekly 5-minute teacher survey (quantitative)
- Bi-weekly focus group or interview (qualitative)
- Usage analytics from vendor dashboard
- IT support ticket log
- Content quality samples (3-5 per teacher)
Decision Framework:
- Meets ALL success criteria → Proceed to
deployment planning
- Meets MOST criteria with manageable gaps →
Negotiate improvements; conditional deployment
- Fails multiple criteria → Do not deploy; share
specific feedback with vendor
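The decision framework at the bottom of the template can be encoded so the go/no-go call is mechanical once criteria are defined. A sketch (the criterion names and the "manageable gap" threshold of at most one missed criterion are assumptions; use whatever criteria your district fixed before the pilot started):

```python
def pilot_decision(criteria_met: dict[str, bool],
                   max_manageable_gaps: int = 1) -> str:
    """Apply the go/no-go framework: all criteria met -> deploy;
    a small number of misses -> conditional; otherwise do not deploy."""
    misses = [name for name, met in criteria_met.items() if not met]
    if not misses:
        return "proceed to deployment planning"
    if len(misses) <= max_manageable_gaps:
        return ("conditional deployment; negotiate improvements on: "
                + ", ".join(misses))
    return "do not deploy; share feedback on: " + ", ".join(misses)

# Example: one manageable gap (support response time missed target).
results = {
    "time_savings_reported": True,
    "content_quality_threshold": True,
    "no_privacy_incidents": True,
    "teacher_satisfaction": True,
    "support_resolution_time": False,
}
print(pilot_decision(results))
```

Writing the rule down before the pilot (in code or on paper) keeps the decision anchored to the pre-agreed criteria rather than to post-hoc impressions.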
Pilot timing matters: Don't pilot during the first or last two weeks of school, during testing windows, or during a week with multiple half-days. You need normal instructional conditions to get meaningful results.
Contract Negotiation
For vendors that pass evaluation and pilot, negotiate contracts deliberately. This is where technology directors have more leverage than they often realize.
Negotiation Points
| Area | What to Negotiate | Typical Outcome |
|---|---|---|
| Multi-year pricing | Lock rates for 2-3 years; cap annual increases at 3-5% | Most vendors will agree to caps in exchange for commitment |
| Usage tiers | Negotiate based on realistic adoption (not 100% Day 1) | Start with conservative tier; negotiate upgrade path |
| Exit clause | 90-day termination notice; no penalty after Year 1 | Protects against vendor underperformance |
| Data portability | Explicit right to export all district data in standard format | Essential; some vendors make export difficult |
| SLA guarantees | 99.5%+ uptime; 4-hour response for critical issues | Get specific commitments, not vague promises |
| Training inclusion | Initial training + annual refresher included in contract | Many vendors will include training to close the deal |
| Most-favored pricing | If vendor offers a lower price to a comparable district, you get the same rate | Uncommon but worth requesting for large districts |
Key Takeaways
- Only 23% of districts have AI-specific evaluation processes (CoSN, 2024). Traditional procurement frameworks miss critical AI-specific risks including model training data use, AI-generated content accuracy, and sub-processor data sharing. See AI for School Leaders — A Strategic Guide to Transforming Education Administration for strategic context.
- The most important privacy question is not "Is student data secure?" but "Is student data used to train AI models?" Many AI vendors use customer data for model improvement unless explicitly prohibited in the DPA. See Legal Considerations for AI in Education — FERPA, COPPA, and GDPR for the full legal framework.
- Evaluate educational effectiveness with the same rigor as technical compliance. A tool that passes every security check but produces mediocre content wastes money. Test with 10 representative tasks spanning subjects, grade levels, and differentiation needs. See Building a Culture of Innovation — Leading AI Adoption in Schools for adoption strategy.
- Always pilot before full deployment. 4-6 weeks with 8-12 teachers using the tool under normal conditions generates better evidence than any demo or sales call. Define success criteria before the pilot starts — not after.
- Total cost of ownership is 1.5-2.5× the license cost. Training, integration, support overhead, and usage scaling are real costs that vendor quotes don't include. Budget for the full picture. See AI for Student Enrollment Forecasting and Resource Planning for resource planning.
- Negotiate from strength. Vendors want multi-year contracts; you want flexibility. Trade commitment for rate locks, training inclusion, and exit clauses. Your existing DPA compliance and pilot data are negotiation assets. See Best AI Content Generation Tools for Educators — Head-to-Head Comparison for tool comparison.
Frequently Asked Questions
Should we evaluate free AI tools with the same rigor as paid ones?
Yes — potentially with even more rigor. Free tools have a business model somewhere, and if you're not paying with money, you may be paying with data. Free tiers of commercial AI tools often have weaker privacy protections than paid tiers (less restrictive data use policies, model training on free-tier inputs, limited DPA availability). Apply the same privacy screening and DPA requirements regardless of cost. Some genuinely free educational tools (Khan Academy, certain Google Workspace features) have strong privacy practices — but verify, don't assume.
How do we evaluate vendors that use OpenAI or Anthropic as their underlying AI?
Many educational AI vendors are wrappers around major AI models (GPT-4, Claude, Gemini). This isn't inherently problematic, but it means your data flows through both the vendor AND the underlying AI provider. Ask: (1) Does the vendor have a DPA with their AI provider that extends your district's privacy protections? (2) Is your data excluded from model training at both the vendor level AND the underlying AI provider level? (3) Where is the data processed — the vendor's infrastructure or the AI provider's? The vendor's DPA with you is only as strong as their agreement with their sub-processor.
What's the minimum pilot duration?
Four instructional weeks is the minimum for meaningful results. Shorter pilots don't capture the learning curve — teachers need at least 2 weeks to get past initial friction, and you need at least 2 more weeks of data on actual productive use. If the vendor pushes for a 1-2 week pilot, they may be trying to capture the honeymoon period before teachers encounter the tool's limitations. Insist on 4 weeks minimum, and include at least one full instructional cycle (unit or grading period) if possible.
How do we handle teachers who want to buy AI tools with personal funds or classroom budgets?
This is a governance issue, not a technology issue. Any AI tool that accesses student data must go through the district's privacy review regardless of who pays for it. Establish a clear policy: all AI tools used in instructional settings require IT approval, even free tools, even personally purchased tools. This isn't bureaucracy — it's liability management. A teacher using an unapproved AI tool that experiences a data breach creates liability for the district, not just the teacher. Provide a streamlined approval path (48-72 hours for tools on an approved list) so teachers don't feel the process is a barrier.