Grammarly Document Data Handling Pipeline

Version: v20260423
Name: grammarly-data-handling
Implements a data pipeline for scoring large documents with the Grammarly API. It splits text into chunks that stay under the API's 100,000-character limit, scores each chunk separately, and averages the per-chunk scores (overall, correctness, clarity, engagement, and tone) into a single report. Useful for large-scale content analysis and data ingestion.

Overview

Handle large documents, text chunking, and data pipelines for the Grammarly API. Each request may contain at most 100,000 characters and 4 MB, and must contain at least 30 words.
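
Before sending a request, it can help to reject text that falls outside these limits instead of waiting for the API to return an error. The sketch below is a minimal pre-flight check; `checkRequestLimits` and its result shape are hypothetical, and the constants simply restate the limits above. It checks only characters and words: even at 4 bytes per character in UTF-8, 100,000 characters is about 400 KB, so for plain text the character limit is always reached long before the 4 MB one.

```typescript
// Limits as stated in this skill; confirm them against Grammarly's current docs.
const MAX_CHARS = 100_000;
const MIN_WORDS = 30;

interface LimitCheck {
  ok: boolean;
  problems: string[];
}

// Hypothetical pre-flight check for a single request body.
function checkRequestLimits(text: string): LimitCheck {
  const problems: string[] = [];
  // .length counts UTF-16 code units, so characters outside the Basic
  // Multilingual Plane (e.g. most emoji) count as two.
  if (text.length > MAX_CHARS) {
    problems.push(`too long: ${text.length} > ${MAX_CHARS} characters`);
  }
  // Whitespace-separated words; trim() avoids counting an empty leading token.
  const words = text.trim().split(/\s+/).filter(Boolean).length;
  if (words < MIN_WORDS) {
    problems.push(`too short: ${words} < ${MIN_WORDS} words`);
  }
  return { ok: problems.length === 0, problems };
}

const result = checkRequestLimits('Too short to score.');
console.log(result.ok, result.problems);
```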

Instructions

Step 1: Text Chunking

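// Split text into chunks of at most maxChars characters, breaking on blank
// lines ("\n\n") where possible. The 90,000 default leaves headroom under
// the API's 100,000-character limit.
function chunkText(text: string, maxChars = 90000): string[] {
  if (text.length <= maxChars) return [text];
  const chunks: string[] = [];
  let current = '';
  for (const p of text.split('\n\n')) {
    if (p.length > maxChars) {
      // A single paragraph this long cannot fit in any chunk, so flush the
      // pending chunk and hard-split the paragraph (possibly mid-word).
      if (current) chunks.push(current);
      current = '';
      for (let i = 0; i < p.length; i += maxChars) {
        chunks.push(p.slice(i, i + maxChars));
      }
      continue;
    }
    const candidate = current ? current + '\n\n' + p : p;
    if (candidate.length > maxChars) {
      chunks.push(current); // current is non-empty here, since p fits alone
      current = p;
    } else {
      current = candidate;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}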
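
Step 3 skips any chunk under 30 words, so a short final group of paragraphs can be silently left unscored. One way to reduce that, sketched here as an optional post-processing step on the chunker's output, is to fold short chunks into a neighbour when the result still fits. `mergeShortChunks` and `countWords` are hypothetical helpers, not part of this skill.

```typescript
// Count whitespace-separated words, ignoring leading/trailing whitespace.
function countWords(text: string): number {
  return text.trim().split(/\s+/).filter(Boolean).length;
}

// Hypothetical post-processing step: merge a chunk into the previous one
// whenever either has fewer than minWords words and the merged text still
// fits in maxChars. Chunks that cannot be merged are kept as they are.
function mergeShortChunks(chunks: string[], minWords = 30, maxChars = 90000): string[] {
  const out: string[] = [];
  for (const chunk of chunks) {
    const last = out.length - 1;
    if (last >= 0 && (countWords(out[last]) < minWords || countWords(chunk) < minWords)) {
      const merged = out[last] + '\n\n' + chunk;
      if (merged.length <= maxChars) {
        out[last] = merged;
        continue;
      }
    }
    out.push(chunk);
  }
  return out;
}
```

Applied as `mergeShortChunks(chunkText(text))`, a short trailing chunk is joined to the one before it instead of being dropped. A chunk can still end up under 30 words when merging would exceed `maxChars`, so the word check in Step 3 remains necessary.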

Step 2: Aggregate Scores Across Chunks

// Per-chunk score shape assumed by this pipeline.
interface ChunkScore {
  overallScore: number;
  correctness: number;
  clarity: number;
  engagement: number;
  tone: number;
}

// Average each metric across chunks (every chunk weighted equally), rounded.
function aggregateScores(scores: ChunkScore[]) {
  if (scores.length === 0) throw new Error('No chunks were scored');
  const avg = (key: keyof ChunkScore) =>
    Math.round(scores.reduce((sum, s) => sum + s[key], 0) / scores.length);
  return {
    overallScore: avg('overallScore'),
    correctness: avg('correctness'),
    clarity: avg('clarity'),
    engagement: avg('engagement'),
    tone: avg('tone'),
    chunkCount: scores.length,
  };
}
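
Averaging per chunk weights every chunk equally, so a short final chunk moves the report as much as a full 90,000-character one. If that matters, a length-weighted mean is one alternative. The sketch below assumes you record each chunk's character count alongside its scores; `weightedAggregate` and `WeightedScore` are hypothetical names, and whether a weighted mean better matches how Grammarly would score the whole document is an assumption the API does not document.

```typescript
// Hypothetical record pairing one chunk's scores with that chunk's length.
interface WeightedScore {
  chars: number;
  scores: Record<string, number>;
}

// Length-weighted mean of each metric: a chunk's weight is its share of the
// total characters scored, so short chunks move the result less.
function weightedAggregate(items: WeightedScore[]): Record<string, number> {
  const totalChars = items.reduce((sum, it) => sum + it.chars, 0);
  if (totalChars === 0) throw new Error('No text was scored');
  const result: Record<string, number> = {};
  // Metric names are taken from the first item; all items should share them.
  for (const key of Object.keys(items[0].scores)) {
    const weighted = items.reduce((sum, it) => sum + it.scores[key] * it.chars, 0);
    result[key] = Math.round(weighted / totalChars);
  }
  return result;
}

const report = weightedAggregate([
  { chars: 90000, scores: { overallScore: 80, clarity: 90 } },
  { chars: 10000, scores: { overallScore: 40, clarity: 50 } },
]);
console.log(report); // { overallScore: 76, clarity: 86 }
```

An equal-weight average of the same two chunks would report 60 and 70.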

Step 3: File Processing Pipeline

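import { readFile } from 'node:fs/promises';

// grammarlyClient is not defined in this skill; it is assumed to be an
// already-authenticated client (created elsewhere with your API token).
declare const grammarlyClient: {
  score(text: string): Promise<{ overallScore: number; correctness: number;
    clarity: number; engagement: number; tone: number }>;
};

const MIN_WORDS = 30; // API minimum words per request

async function scoreFile(filePath: string) {
  const text = await readFile(filePath, 'utf-8');
  const scores = [];
  for (const chunk of chunkText(text)) {
    // trim() stops surrounding whitespace from counting as an empty "word"
    const words = chunk.trim().split(/\s+/).filter(Boolean).length;
    if (words < MIN_WORDS) continue; // skipped: this text is not scored
    scores.push(await grammarlyClient.score(chunk)); // one request at a time
  }
  return aggregateScores(scores);
}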
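
scoreFile sends one request at a time, so its run time grows linearly with the number of chunks. If Grammarly's rate limits allow a few parallel requests (this skill does not say; check before raising `limit`), a bounded-concurrency helper such as the hypothetical `scoreWithConcurrency` below can shorten that. Results come back in input order, so they can be passed straight to aggregateScores.

```typescript
// Hypothetical helper: score chunks with at most `limit` requests in flight,
// returning results in the same order as the input chunks. `scoreChunk`
// stands in for whatever client call you use. If any call rejects, the
// returned promise rejects; calls already in flight are not cancelled.
async function scoreWithConcurrency<T>(
  chunks: string[],
  scoreChunk: (text: string) => Promise<T>,
  limit = 3,
): Promise<T[]> {
  if (limit < 1) throw new RangeError('limit must be at least 1');
  const results = new Array<T>(chunks.length);
  let next = 0;
  // Each worker claims the next unclaimed index until none remain. JavaScript
  // runs one callback at a time, so `next++` cannot race between workers.
  async function worker(): Promise<void> {
    while (next < chunks.length) {
      const i = next++;
      results[i] = await scoreChunk(chunks[i]);
    }
  }
  const workerCount = Math.min(limit, chunks.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

For example, after filtering out chunks under 30 words, `await scoreWithConcurrency(chunks, (c) => grammarlyClient.score(c), 2)` could replace the sequential loop in scoreFile.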

Info
Category: Data Science
Size: 2.27 KB
Updated at: 2026-04-28