Grammarly Data Handling

v20260423
grammarly-data-handling

Overview

Handle large documents, text chunking, and data pipelines for the Grammarly API. The API accepts at most 100,000 characters (4 MB) per request and requires a minimum of 30 words. This skill works around those limits by splitting text into manageable chunks, scoring each chunk, and aggregating the key metrics (such as correctness, clarity, and tone) into a single report.
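These limits can be checked before any request is sent. A minimal pre-flight sketch; the helper name and return shape are assumptions for illustration, not part of the Grammarly API:

```typescript
// Pre-flight check against the documented limits:
// at most 100,000 characters (4 MB payload), at least 30 words.
const MAX_CHARS = 100_000;
const MAX_BYTES = 4 * 1024 * 1024;
const MIN_WORDS = 30;

// Illustrative helper; not part of the Grammarly API itself.
function validateForGrammarly(text: string): { ok: boolean; reason?: string } {
  if (text.length > MAX_CHARS) return { ok: false, reason: 'over 100,000 characters' };
  if (new TextEncoder().encode(text).length > MAX_BYTES) return { ok: false, reason: 'over 4 MB' };
  if (text.trim().split(/\s+/).filter(Boolean).length < MIN_WORDS) {
    return { ok: false, reason: 'under 30 words' };
  }
  return { ok: true };
}
```

Inputs that fail the character or byte check are exactly the ones the chunking step below is designed to handle.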

Instructions

Step 1: Text Chunking

// Greedily pack whole paragraphs into chunks of at most maxChars.
// 90,000 leaves headroom under the API's 100,000-character cap.
function chunkText(text: string, maxChars = 90000): string[] {
  if (text.length <= maxChars) return [text];
  const paragraphs = text.split('\n\n');
  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    // Start a new chunk when adding this paragraph would overflow the limit.
    if ((current + '\n\n' + p).length > maxChars && current) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + '\n\n' + p : p;
    }
  }
  if (current) chunks.push(current);
  // Caveat: a single paragraph longer than maxChars is emitted oversized;
  // hard-split such paragraphs if your input can contain them.
  return chunks;
}
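A quick check of the splitting behavior, using a deliberately small limit (chunkText is repeated here so the snippet runs on its own):

```typescript
function chunkText(text: string, maxChars = 90000): string[] {
  if (text.length <= maxChars) return [text];
  const paragraphs = text.split('\n\n');
  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    if ((current + '\n\n' + p).length > maxChars && current) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + '\n\n' + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

// Three short paragraphs with a 25-character limit: the first two fit
// together, the third starts a new chunk.
const doc = ['alpha alpha', 'beta beta', 'gamma gamma'].join('\n\n');
const chunks = chunkText(doc, 25);
// chunks → ['alpha alpha\n\nbeta beta', 'gamma gamma']
```

Because splits happen only at blank lines, joining the chunks back with `'\n\n'` reproduces the original text exactly.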

Step 2: Aggregate Scores Across Chunks

interface ChunkScore {
  overallScore: number;
  correctness: number;
  clarity: number;
  engagement: number;
  tone: number;
}

// Average each metric across per-chunk results, rounded to whole numbers.
// Note: every chunk is weighted equally, regardless of its length.
function aggregateScores(scores: ChunkScore[]) {
  if (scores.length === 0) throw new Error('no chunks were scored');
  const avg = (arr: number[]) => arr.reduce((a, b) => a + b, 0) / arr.length;
  return {
    overallScore: Math.round(avg(scores.map(s => s.overallScore))),
    correctness: Math.round(avg(scores.map(s => s.correctness))),
    clarity: Math.round(avg(scores.map(s => s.clarity))),
    engagement: Math.round(avg(scores.map(s => s.engagement))),
    tone: Math.round(avg(scores.map(s => s.tone))),
    chunkCount: scores.length,
  };
}
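For example, two chunk results average metric by metric (aggregateScores is repeated here so the snippet runs on its own):

```typescript
function aggregateScores(
  scores: { overallScore: number; correctness: number; clarity: number; engagement: number; tone: number }[]
) {
  const avg = (arr: number[]) => arr.reduce((a, b) => a + b, 0) / arr.length;
  return {
    overallScore: Math.round(avg(scores.map(s => s.overallScore))),
    correctness: Math.round(avg(scores.map(s => s.correctness))),
    clarity: Math.round(avg(scores.map(s => s.clarity))),
    engagement: Math.round(avg(scores.map(s => s.engagement))),
    tone: Math.round(avg(scores.map(s => s.tone))),
    chunkCount: scores.length,
  };
}

const result = aggregateScores([
  { overallScore: 80, correctness: 90, clarity: 70, engagement: 60, tone: 80 },
  { overallScore: 90, correctness: 80, clarity: 90, engagement: 80, tone: 90 },
]);
// result → { overallScore: 85, correctness: 85, clarity: 80,
//            engagement: 70, tone: 85, chunkCount: 2 }
```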

Step 3: File Processing Pipeline

import fs from 'fs';

// Read a file, chunk it, score each chunk, and aggregate the results.
// grammarlyClient is the authenticated API client set up elsewhere with `token`.
async function scoreFile(filePath: string, token: string) {
  const text = fs.readFileSync(filePath, 'utf-8');
  const chunks = chunkText(text);
  const scores = [];
  for (const chunk of chunks) {
    // The API rejects inputs under 30 words, so skip chunks that short
    // (typically only a tiny trailing chunk).
    if (chunk.split(/\s+/).length >= 30) {
      scores.push(await grammarlyClient.score(chunk));
    }
  }
  return aggregateScores(scores);
}
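The whole pipeline can be exercised without touching the filesystem or the real API by injecting a stub scorer. The client interface below is an assumption for illustration; chunkText and aggregateScores are repeated so the sketch runs on its own:

```typescript
type ChunkScore = {
  overallScore: number; correctness: number; clarity: number;
  engagement: number; tone: number;
};

function chunkText(text: string, maxChars = 90000): string[] {
  if (text.length <= maxChars) return [text];
  const paragraphs = text.split('\n\n');
  const chunks: string[] = [];
  let current = '';
  for (const p of paragraphs) {
    if ((current + '\n\n' + p).length > maxChars && current) {
      chunks.push(current);
      current = p;
    } else {
      current = current ? current + '\n\n' + p : p;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

function aggregateScores(scores: ChunkScore[]) {
  const avg = (arr: number[]) => arr.reduce((a, b) => a + b, 0) / arr.length;
  return {
    overallScore: Math.round(avg(scores.map(s => s.overallScore))),
    correctness: Math.round(avg(scores.map(s => s.correctness))),
    clarity: Math.round(avg(scores.map(s => s.clarity))),
    engagement: Math.round(avg(scores.map(s => s.engagement))),
    tone: Math.round(avg(scores.map(s => s.tone))),
    chunkCount: scores.length,
  };
}

// A stub standing in for the real client: every chunk scores a flat 80.
const stubClient = {
  async score(_chunk: string): Promise<ChunkScore> {
    return { overallScore: 80, correctness: 80, clarity: 80, engagement: 80, tone: 80 };
  },
};

async function scoreText(text: string, client = stubClient) {
  const chunks = chunkText(text, 200);
  const scores: ChunkScore[] = [];
  for (const chunk of chunks) {
    if (chunk.split(/\s+/).length >= 30) scores.push(await client.score(chunk));
  }
  return aggregateScores(scores);
}

// Two 35-word paragraphs: long enough to split at 200 characters and to
// clear the 30-word minimum, so both chunks get scored.
const para = Array(35).fill('word').join(' ');
```

Swapping `stubClient` for the real authenticated client recovers the production pipeline; the same seam also makes retry or rate-limit wrappers easy to add later.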

Resources

Info

Category: Data Science
Name: grammarly-data-handling
Version: v20260423
Size: 2.27 KB
Updated: 2026-04-28