Modern Package Development: How I Built & Shipped a DX-First Library | Mridul Kumar

1 February, 2026 · 12 minutes read

I was building an AI chatbot called Blockpedia and wanted to add a speech-to-text feature. You know, like how ChatGPT, Gemini, and Claude have that little mic icon that lets you talk instead of type.

I didn't want to pay for Whisper API calls or deal with complex audio processing setups. Browsers already have the Web Speech API built-in - free, instant, and good enough for most use cases.

So I searched for React packages. Found a few. Used them. And quickly realized they all had the same problem: they just wrap the API without solving the actual hard parts.

The "Hard Parts" Nobody Handles

  • Mic permissions - Users click the button, nothing happens, they're confused. Where's the permission prompt? Was it denied before?
  • Silence detection - User stops talking but recording continues forever. Or it cuts off mid-sentence.
  • Cursor position - Text always appends to the end. What if I want to insert it where my cursor is?
  • Browser quirks - Works in Chrome, breaks in Safari. None of the packages I tried handled the webkit prefix.

So I did what any developer does when they can't find the right tool: I built it myself and published it on npm.

This post is the story of how I built @syntropy-labs/react-web-speech, but more importantly, it's a guide to modern npm package development in 2026 - the tools, the patterns, and the mindset.


What I Built

Before diving into how to build packages, let me show you what the end result looks like. The entire API is two hooks:

Basic usage
import { useSpeechInput } from '@syntropy-labs/react-web-speech'

function VoiceInput() {
  const { transcript, isListening, toggle, permissionState } = useSpeechInput({
    silenceTimeout: 2000,  // Auto-stop after 2s silence
    continuous: false,     // Stop after one phrase
  })

  return (
    <div>
      <button onClick={toggle}>
        {isListening ? '🔴 Listening...' : '🎤 Click to speak'}
      </button>
      <p>Permission: {permissionState}</p>  {/* 'prompt' | 'granted' | 'denied' */}
      <p>You said: {transcript}</p>
    </div>
  )
}

And for the cursor-aware insertion that existing packages don't handle:

Cursor-aware insertion
import { useRef, useState } from 'react'
import { useSpeechInputWithCursor } from '@syntropy-labs/react-web-speech'

function SmartTextarea() {
  const inputRef = useRef<HTMLTextAreaElement>(null)
  const [value, setValue] = useState('')

  const { isListening, toggle } = useSpeechInputWithCursor({
    inputRef,
    value,
    onChange: setValue,  // Text inserts at cursor position
  })

  return (
    <div>
      <textarea ref={inputRef} value={value} onChange={e => setValue(e.target.value)} />
      <button onClick={toggle}>{isListening ? 'Stop' : 'Speak'}</button>
    </div>
  )
}
✓ Permission Tracking

Know exactly if permission is prompt/granted/denied before the user clicks anything

✓ Smart Silence Detection

Configurable timeout that actually works - stops cleanly when user stops talking

✓ Cursor-Aware Insertion

Text goes exactly where the cursor is, not just appended to the end

✓ Cross-Browser Support

Handles webkit prefixing for Safari, graceful degradation for Firefox

~5KB gzipped. Fully typed. Tree-shakeable. Zero dependencies beyond React.


Try It Yourself

Don't take my word for it - here's the actual package running on this page. Click the mic and say something.

Browser Support:

🟢 Chrome
🟢 Edge
🟡 Safari
🔴 Firefox

Demo 1: Basic Speech Input

The useSpeechInput hook handles everything - permission state, listening state, transcript, and automatic silence detection.
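
Reading the permission state before the first click can be sketched with the browser Permissions API. This is a minimal illustration, not the package's actual implementation: queryMicPermission and PermissionsLike are hypothetical names, and falling back to 'prompt' is an assumption for browsers that lack the API or reject the 'microphone' permission name.

```typescript
// Hypothetical sketch: read the mic permission state without triggering a prompt.
type MicPermission = 'prompt' | 'granted' | 'denied'

// Minimal shape of navigator.permissions, declared locally so the sketch
// also type-checks outside the browser.
interface PermissionsLike {
  query(descriptor: { name: string }): Promise<{ state: string }>
}

async function queryMicPermission(permissions?: PermissionsLike): Promise<MicPermission> {
  // No Permissions API: the state is unknown until the user is actually asked.
  if (!permissions) return 'prompt'
  try {
    const status = await permissions.query({ name: 'microphone' })
    const state = status.state
    if (state === 'granted' || state === 'denied') return state
    return 'prompt'
  } catch {
    // Some browsers throw for permission names they do not recognise.
    return 'prompt'
  }
}
```

In the browser this would be called as queryMicPermission(navigator.permissions), typically once when the component mounts, so the UI can explain a 'denied' state instead of showing a dead button.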


Demo 2: Cursor-Aware Text Insertion

This is what other packages don't do. Position your cursor anywhere in the text below, then speak - your words will be inserted exactly where the cursor is.
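
The core of cursor-aware insertion is a string splice at the current selection. The helper below is a minimal sketch under that assumption; insertAtSelection and InsertResult are hypothetical names, not the package's internal utilities.

```typescript
// Hypothetical helper: splice spoken text into a value at the current selection.
interface InsertResult {
  value: string // the new field value
  caret: number // where the caret should sit after the insert
}

function insertAtSelection(
  value: string,
  text: string,
  selectionStart: number,
  selectionEnd: number = selectionStart,
): InsertResult {
  // Clamp so a stale selection can never index past the string.
  const start = Math.max(0, Math.min(selectionStart, value.length))
  const end = Math.max(start, Math.min(selectionEnd, value.length))
  // Anything selected between start and end is replaced by the new text.
  const next = value.slice(0, start) + text + value.slice(end)
  return { value: next, caret: start + text.length }
}

insertAtSelection('Hello world', 'big ', 6)      // { value: 'Hello big world', caret: 10 }
insertAtSelection('Hello world', 'there', 6, 11) // { value: 'Hello there', caret: 11 }
```

In a component, selectionStart and selectionEnd would come from the textarea ref, the new value would go to onChange, and the caret would be restored with setSelectionRange(caret, caret) after the re-render.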


What's happening under the hood?

  • Permission tracking - The hook checks mic permission state before you even click, so you can show appropriate UI
  • Silence detection - Recording stops automatically 2 seconds after you stop talking (configurable)
  • Cursor position - The hook captures and restores the selection range, inserting text at the exact cursor position
  • Browser compatibility - Handles webkit prefixing for Safari automatically
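
Silence detection from the list above can be modelled as a timer that restarts on every speech result and fires once nothing has arrived for the configured timeout. This is a sketch of the general technique, not the package's code; createSilenceTimer is a hypothetical name.

```typescript
// Hypothetical sketch: a resettable countdown that fires after a quiet period.
function createSilenceTimer(timeoutMs: number, onSilence: () => void) {
  let handle: ReturnType<typeof setTimeout> | undefined

  // Stop the countdown, e.g. when recognition ends for another reason.
  const cancel = (): void => {
    if (handle !== undefined) clearTimeout(handle)
    handle = undefined
  }

  // Restart the countdown; call this each time the recognizer reports speech.
  const reset = (): void => {
    cancel()
    handle = setTimeout(() => {
      handle = undefined
      onSilence()
    }, timeoutMs)
  }

  return { reset, cancel }
}
```

Wired to a recognizer, reset() would run on each result event and onSilence would call the recognizer's stop() method, with cancel() called when recognition ends so no stray timer fires afterwards.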

The Modern Package Toolchain (2026)

The JavaScript ecosystem moves fast. Here's what I used and why - these are the tools that will save you hours of configuration hell.

tsdown

Build Tool

tsdown is the spiritual successor to tsup, powered by Rolldown (the Rust bundler). It's what you should be using in 2026.

Why tsdown over tsup/esbuild?
  • 10x faster builds (Rust-based)
  • Native ESM + CJS dual output
  • Auto-generates .d.ts files
  • Zero config for most cases

Migration from tsup?
  • Same API surface
  • Drop-in replacement
  • Same config file format
  • Just change the package name

tsdown.config.ts
import { defineConfig } from 'tsdown'

export default defineConfig({
  entry: ['./src/index.ts'],
  format: ['esm', 'cjs'],      // Dual package support
  dts: true,                    // Generate TypeScript declarations
  clean: true,                  // Clean dist/ before build
  treeshake: true,              // Remove unused code
  external: ['react', 'react-dom'],  // Don't bundle peer deps
})

Vitest + Testing Library

Testing

Jest is fine, but Vitest is faster and has native TypeScript support. For React hooks, pair it with @testing-library/react.

src/__tests__/useSpeechInput.test.ts
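import { renderHook, act } from '@testing-library/react'
import { afterEach, beforeEach, describe, it, expect, vi } from 'vitest'
import { useSpeechInput } from '../hooks/useSpeechInput'

// jsdom has no Web Speech API, so stub a minimal recognizer.
// Assumption: the hook reads window.SpeechRecognition and listens via
// onstart/onend; adapt the mock if your hook uses addEventListener instead.
class MockSpeechRecognition {
  onstart: (() => void) | null = null
  onend: (() => void) | null = null
  start() { this.onstart?.() }
  stop() { this.onend?.() }
  abort() { this.onend?.() }
}

describe('useSpeechInput', () => {
  beforeEach(() => {
    vi.stubGlobal('SpeechRecognition', MockSpeechRecognition)
  })

  afterEach(() => {
    vi.unstubAllGlobals()
  })

  it('should initialize with correct defaults', () => {
    const { result } = renderHook(() => useSpeechInput())

    expect(result.current.isListening).toBe(false)
    expect(result.current.transcript).toBe('')
    expect(result.current.isSupported).toBe(true)
  })

  it('should toggle listening state', () => {
    const { result } = renderHook(() => useSpeechInput())

    act(() => result.current.toggle())
    expect(result.current.isListening).toBe(true)
  })
})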

Changesets

Versioning

Forget manually bumping versions. Changesets handles semver, changelogs, and npm publishing in one workflow.

Terminal
# Add a changeset for your changes
npx changeset

# When ready to release, version all packages
npx changeset version

# Publish to npm
npx changeset publish

It asks you what kind of change (patch/minor/major) and generates a changelog entry. No more "what version should this be?" debates.
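
The three choices map onto plain semver arithmetic. The function below is only an illustration of that arithmetic for MAJOR.MINOR.PATCH versions, not how Changesets is implemented; bumpVersion is a hypothetical name, and pre-release tags are deliberately rejected.

```typescript
// Hypothetical sketch: what patch/minor/major do to a MAJOR.MINOR.PATCH version.
type Bump = 'patch' | 'minor' | 'major'

function bumpVersion(version: string, bump: Bump): string {
  const parts = version.split('.').map(Number)
  if (parts.length !== 3 || parts.some((n) => !Number.isInteger(n) || n < 0)) {
    throw new Error(`Expected MAJOR.MINOR.PATCH, got "${version}"`)
  }
  const [major, minor, patch] = parts
  switch (bump) {
    case 'major':
      return `${major + 1}.0.0` // breaking change: reset minor and patch
    case 'minor':
      return `${major}.${minor + 1}.0` // new feature: reset patch
    case 'patch':
      return `${major}.${minor}.${patch + 1}` // bug fix
  }
}

bumpVersion('0.1.2', 'patch') // '0.1.3'
bumpVersion('0.1.2', 'minor') // '0.2.0'
bumpVersion('0.1.2', 'major') // '1.0.0'
```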

Husky + lint-staged

Git Hooks

Automatically lint and format code before every commit. Never push broken code again.

package.json
{
  "lint-staged": {
    "*.{js,ts,tsx}": ["eslint --fix", "prettier --write"]
  },
  "scripts": {
    "prepare": "husky"
  }
}
.husky/pre-commit
npx lint-staged

2020 vs 2026 Tooling

Purpose | 2020 | 2026
Bundler | Rollup / Webpack | tsdown (Rust)
Testing | Jest | Vitest
Type Generation | tsc (slow) | tsdown dts (fast)
Versioning | Manual + npm version | Changesets
Linting | ESLint (slow) | ESLint + Oxlint (fast)

Project Structure

Here's the structure I used. It's pretty standard, but the key is keeping it simple:

Project structure
react-web-speech/
├── src/
│   ├── index.ts              # Public exports only
│   ├── hooks/
│   │   ├── useSpeechInput.ts
│   │   └── useSpeechInputWithCursor.ts
│   ├── utils/
│   │   └── cursor.ts         # Cursor manipulation utilities
│   └── __tests__/
│       └── useSpeechInput.test.ts
├── dist/                      # Generated by tsdown
│   ├── index.mjs             # ESM build
│   ├── index.cjs             # CommonJS build
│   ├── index.d.mts           # Type declarations (ESM)
│   └── index.d.cts           # Type declarations (CommonJS)
├── tsdown.config.ts
├── tsconfig.json
├── package.json
└── README.md

Key Principle: Explicit Public API

Your src/index.ts should only export what users need. Don't export internal utilities or types unless they're actually useful to consumers.

src/index.ts
// Only export what users need
export { useSpeechInput } from './hooks/useSpeechInput'
export { useSpeechInputWithCursor } from './hooks/useSpeechInputWithCursor'

// Export types for TypeScript users
export type {
  SpeechInputOptions,
  SpeechInputResult,
  PermissionState
} from './types'

// Internal utilities stay internal
// DON'T export: insertTextAtCursor, getSelectionRange, etc.

The package.json Anatomy

This is where most people mess up. Here's every field that matters for a modern npm package:

package.json
{
  "name": "@syntropy-labs/react-web-speech",
  "version": "0.1.2",
  "description": "React hooks for Web Speech API with mic permissionStates, listening states, and cursor-aware text insertion",
  "type": "module",

  "main": "./dist/index.cjs",
  "module": "./dist/index.mjs",
  "types": "./dist/index.d.cts",

  "exports": {
    ".": {
      "import": {
        "types": "./dist/index.d.mts",
        "default": "./dist/index.mjs"
      },
      "require": {
        "types": "./dist/index.d.cts",
        "default": "./dist/index.cjs"
      }
    }
  },

  "files": ["dist"],
  "sideEffects": false,

  "peerDependencies": {
    "react": ">=17.0.0",
    "react-dom": ">=17.0.0"
  },
  "peerDependenciesMeta": {
    "react-dom": { "optional": true }
  },

  "scripts": {
    "build": "tsdown",
    "dev": "tsdown --watch",
    "test": "vitest",
    "lint": "eslint src/",
    "lint:fix": "eslint src/ --fix",
    "release": "changeset publish"
  },

  "keywords": ["react", "speech", "voice", "web-speech-api", "hooks"],
  "license": "MIT",
  "repository": "https://github.com/syntropyLabs/react-web-speech"
}

"type": "module"

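Tells Node.js to treat .js files in this package as ES modules. Files with explicit .mjs or .cjs extensions, like the dist output above, are interpreted by their extension regardless. Without the field, older Node.js versions treat plain .js files as CommonJS, and import/export syntax in them throws.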

"exports" (Conditional exports)

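The modern way to define entry points. Supports both ESM (import) and CJS (require) with proper type declarations for each. Within each condition, "types" must come first, because conditions are matched in the order they are written. This is what lets Node.js, bundlers, and TypeScript each pick the right file.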
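
To see why key order matters, here is a deliberately simplified model of how a resolver walks a conditional exports entry. It is a teaching sketch, not Node's or TypeScript's real resolution algorithm; resolveExport and ExportTarget are hypothetical names, and the entry mirrors the exports map shown above.

```typescript
// Simplified model of conditional exports resolution (illustration only).
type ExportTarget = string | { [condition: string]: ExportTarget }

function resolveExport(target: ExportTarget, conditions: string[]): string | undefined {
  if (typeof target === 'string') return target
  // Keys are tried in the order they are written; the first match wins.
  for (const [condition, value] of Object.entries(target)) {
    if (condition === 'default' || conditions.includes(condition)) {
      const resolved = resolveExport(value, conditions)
      if (resolved !== undefined) return resolved
    }
  }
  return undefined
}

const entry: ExportTarget = {
  import: { types: './dist/index.d.mts', default: './dist/index.mjs' },
  require: { types: './dist/index.d.cts', default: './dist/index.cjs' },
}

resolveExport(entry, ['import'])          // './dist/index.mjs'   (runtime ESM import)
resolveExport(entry, ['require'])         // './dist/index.cjs'   (runtime require)
resolveExport(entry, ['types', 'import']) // './dist/index.d.mts' (type lookup for an ESM import)
```

If "default" were listed before "types", the type lookup in this model would stop at the JavaScript file and never reach the declarations, which is the failure the ordering rule guards against.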

"sideEffects": false

Tells bundlers your package has no side effects, enabling aggressive tree-shaking. If users only import one hook, they shouldn't get the other in their bundle.

"peerDependencies"

React packages should NEVER bundle React. Instead, list it as a peer dependency so users' existing React version is used. This prevents the dreaded "multiple React instances" error.

"files": ["dist"]

Only publish the dist folder to npm. Don't publish src/, tests, or config files. Keeps your package size small. npm always includes package.json, README, and LICENSE regardless of this field.


The DX-First Mindset

DX (Developer Experience) isn't just about having good docs. It's about designing your API so developers fall into the pit of success. Here's how I thought about it:

1. Sensible Defaults

The hook should work with zero configuration. Only require options when there's no obvious default.

✗ Bad: Requires config
useSpeech({ lang: 'en-US', continuous: false, ... })
✓ Good: Works immediately
useSpeechInput() // Just works
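
One common way to get zero-config behaviour is to merge caller options over a defaults object. The sketch below assumes that approach; ExampleOptions, DEFAULT_OPTIONS, and resolveOptions are hypothetical names, the 2000 ms timeout matches the value used earlier in this post, and the 'en-US' language default is an assumption.

```typescript
// Hypothetical sketch: every option is optional for the caller,
// and the hook always works with a fully populated options object.
interface ExampleOptions {
  lang: string
  continuous: boolean
  silenceTimeout: number
}

const DEFAULT_OPTIONS: ExampleOptions = {
  lang: 'en-US',        // assumed default language
  continuous: false,    // stop after one phrase
  silenceTimeout: 2000, // auto-stop after 2s of silence
}

function resolveOptions(options: Partial<ExampleOptions> = {}): ExampleOptions {
  // Later spreads win, so caller values override the defaults.
  // Note: a key explicitly set to undefined would also override its default.
  return { ...DEFAULT_OPTIONS, ...options }
}

resolveOptions()                        // all defaults
resolveOptions({ silenceTimeout: 500 }) // only the timeout changes
```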

2. Discoverable API

Return objects, not arrays. Let TypeScript autocomplete guide developers.

✗ Bad: Array destructuring
const [text, listening, toggle] = useSpeech()
✓ Good: Named properties
const { transcript, isListening } = useSpeechInput()

3. Handle Edge Cases Internally

Don't force users to handle browser quirks. Do it inside your package.

Internal browser handling
// Inside the hook - users never see this
const SpeechRecognition =
  typeof window !== 'undefined'  // window does not exist during server-side rendering
    ? window.SpeechRecognition || window.webkitSpeechRecognition  // Safari support
    : undefined

const isSupported = typeof SpeechRecognition !== 'undefined'

// Expose a simple boolean to users
return { isSupported, ... }

4. Fail Gracefully

If the browser doesn't support the feature, don't crash. Return a sensible state.

Graceful degradation
// If Web Speech API isn't supported
if (!isSupported) {
  return {
    isListening: false,
    transcript: '',
    isSupported: false,
    permissionState: 'denied' as const,
    start: () => {},  // No-op
    stop: () => {},   // No-op
    toggle: () => {}, // No-op
  }
}

Publishing to npm

Once your package is ready, publishing is straightforward:

First-time setup
# Login to npm (create account at npmjs.com if needed)
npm login

# For scoped packages (@your-name/package)
npm login --scope=@your-scope

# Initialize changesets (one time)
npx changeset init
Publishing workflow
# 1. Make your changes and commit them

# 2. Create a changeset describing the change
npx changeset
# > What kind of change? (patch/minor/major)
# > Summary: "Add silence detection timeout option"

# 3. When ready to release, version the package
npx changeset version
# This bumps version in package.json and updates CHANGELOG.md

# 4. Build and publish
npm run build
npx changeset publish
# or: npm run release (the script shown earlier only runs changeset publish, so build first)

Pro Tip: Provenance

Add provenance to your publishes for supply chain security. This proves your package was built from your GitHub repo:

package.json
{
  "publishConfig": {
    "provenance": true
  }
}

Requires publishing from a supported CI provider such as GitHub Actions, with the id-token: write permission granted to the workflow. npm will show a "Provenance" badge on your package page.


Key Takeaways

  1. Build what you actually need. The best packages come from solving your own problems. If you can't find a good solution, others probably can't either.
  2. Use modern tooling. tsdown, Vitest, and Changesets will save you hours. The Rust-based tools are genuinely faster.
  3. Obsess over DX. Sensible defaults, discoverable APIs, graceful degradation. Make it impossible for users to use your package wrong.
  4. Handle the hard parts. Browser quirks, edge cases, permissions - don't just wrap an API, solve the actual problems developers face.
  5. Ship it. Your first version doesn't need to be perfect. v0.1.0 is better than an unpublished masterpiece.

That's how I built and shipped my first npm package. If you're building something and can't find the right tool - maybe it's time to build it yourself.

Resources:

  • tsdown - github.com/rolldown/tsdown
  • Changesets - github.com/changesets/changesets
  • Vitest - vitest.dev
  • Web Speech API (MDN) - developer.mozilla.org