Enhance User Experience with Text-to-Speech: A Practical Implementation

NextJS

Web API

Javascript

Text To Speech

A few days ago I implemented a Text to Speech feature on my website's blog page. It wasn't a necessary addition, but I saw it as a valuable learning opportunity. With just 30 lines of JavaScript code, I integrated Text to Speech functionality, complemented by CSS animations for the buttons that control the audio playback.

In this post I'll explain how I implemented it on my website, with code examples.

Prerequisites

  • Any level of JavaScript experience

Concepts used

  • Web API
  • JS Event handlers
  • React Hooks

Background on Web Speech API

We will need SpeechSynthesisUtterance. This interface of the Web Speech API represents a speech request. It contains the content the speech service should read and information about how to read it, such as language, pitch, rate and volume.

The default value for pitch, rate and volume is 1, and each can be changed on the utterance before it is spoken.
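
To make those settings concrete, here is a minimal sketch of a helper that fills in the defaults and keeps values inside the ranges documented for the Web Speech API (pitch 0 to 2, rate 0.1 to 10, volume 0 to 1). The names UtteranceSettings, normalizeSettings and applySettings are made up for illustration and are not part of the API. The "en" default for lang is also an assumption that matches the rest of this post. applySettings only expects an object with lang, pitch, rate and volume properties, which a SpeechSynthesisUtterance has.

```typescript
// Hypothetical helper type, not part of the Web Speech API.
type UtteranceSettings = {
  lang: string;
  pitch: number;
  rate: number;
  volume: number;
};

const clampValue = (value: number, min: number, max: number): number =>
  Math.min(max, Math.max(min, value));

// Fill in the defaults (1 for pitch, rate and volume) and clamp to the documented ranges.
const normalizeSettings = (input: Partial<UtteranceSettings> = {}): UtteranceSettings => ({
  lang: input.lang ?? "en", // assumption: English, as used later in this post
  pitch: clampValue(input.pitch ?? 1, 0, 2),
  rate: clampValue(input.rate ?? 1, 0.1, 10),
  volume: clampValue(input.volume ?? 1, 0, 1),
});

// Copy the normalized settings onto any utterance-like object and return it.
const applySettings = <T extends UtteranceSettings>(
  target: T,
  input?: Partial<UtteranceSettings>
): T => Object.assign(target, normalizeSettings(input));

// In the browser this could be used as:
// const utterance = applySettings(new SpeechSynthesisUtterance("Hello"), { rate: 0.95 });
```

Keeping the clamping in a pure function means it can be checked without a browser, and out-of-range values never reach the speech service.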

Let's start building

Create a file that holds all the functions related to the Text To Speech functionality, such as speak, pause and stop. speak accepts a parameter containing the actual text to be spoken. I created a file named text-to-speech.ts inside the common/lib folder.

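First, check whether the browser supports the Web Speech API. For this I created a function named isTTSSupported(). It returns a boolean, and I can use it anywhere in the app. Because Next.js also renders pages on the server, where window does not exist, the function checks for window first.

export const isTTSSupported = (): boolean => {
  // window is undefined while Next.js renders on the server, so guard it first.
  return typeof window !== "undefined" && "speechSynthesis" in window;
};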

Now let's get a voice for the text to be spoken in.

const getVoices = () => {
  let voices = speechSynthesis.getVoices();
  if (!voices.length) {
    // Sometimes the voices are not initialized yet, so we call speak with an empty string.
    // This nudges the browser to load them; the list can still be empty until it has finished.
    const utterance = new SpeechSynthesisUtterance("");
    speechSynthesis.speak(utterance);
    voices = speechSynthesis.getVoices();
  }
  return voices;
};
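
The voice list often loads asynchronously, so the first getVoices() call can return an empty array. Browsers that load voices this way fire a voiceschanged event on speechSynthesis once the list is ready. Below is a hedged sketch that waits for that event. whenVoicesReady, VoiceInfo and VoiceSource are hypothetical names, and VoiceSource only describes the small part of speechSynthesis the helper relies on.

```typescript
// Minimal shapes covering the parts of speechSynthesis this sketch uses (names are hypothetical).
type VoiceInfo = { name: string; lang: string };
type VoiceSource<V extends VoiceInfo> = {
  getVoices(): V[];
  addEventListener(type: "voiceschanged", listener: () => void): void;
  removeEventListener(type: "voiceschanged", listener: () => void): void;
};

// Call onReady once with a non-empty voice list, either now or after "voiceschanged" fires.
const whenVoicesReady = <V extends VoiceInfo>(
  source: VoiceSource<V>,
  onReady: (voices: V[]) => void
): void => {
  const available = source.getVoices();
  if (available.length > 0) {
    onReady(available);
    return;
  }
  const handleChange = () => {
    const loaded = source.getVoices();
    if (loaded.length === 0) return; // still nothing, keep waiting
    source.removeEventListener("voiceschanged", handleChange);
    onReady(loaded);
  };
  source.addEventListener("voiceschanged", handleChange);
};
```

In the browser you would pass speechSynthesis as the source. A callback is used instead of a Promise so the helper stays easy to call from an event handler or a useEffect.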

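Now the setup for our text to speech is ready. Next we will set up the function for speaking the text.

export const speak = (text: string, onEndEvent: () => void) => {
  try {
    if (isTTSSupported()) {
      const voices = getVoices();

      const speakData = new SpeechSynthesisUtterance();
      speakData.text = text;
      speakData.lang = "en";
      // voices can be empty on the first call; null lets the browser use its default voice.
      speakData.voice = voices[0] ?? null;
      speakData.rate = 0.95; // slightly slower than the default rate of 1
      // Notify the caller when speaking has finished.
      speakData.addEventListener("end", () => {
        onEndEvent();
      });

      if (speechSynthesis.paused) {
        // Resume the utterance that was paused earlier.
        speechSynthesis.resume();
      } else {
        // Clear anything still queued, then speak from the beginning.
        speechSynthesis.cancel();
        speechSynthesis.speak(speakData);
      }
    }
  } catch (err) {
    throw new Error("Text can't be played");
  }
};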

Here is my speak function, which accepts two parameters: text and onEndEvent. As the name suggests, text is the actual text to be played when the Play button is clicked. onEndEvent is a callback that notifies the component that speaking has ended, so it can clear the state and reset the buttons.
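
One caveat with voices[0]: the first voice in the list is not guaranteed to be an English one, even though lang is set to "en". Here is a small sketch of choosing a voice by language instead. pickVoice and VoiceChoice are hypothetical names, and the helper only reads the lang and default properties that SpeechSynthesisVoice objects expose.

```typescript
// Only the fields of SpeechSynthesisVoice this sketch needs (type name is hypothetical).
type VoiceChoice = { name: string; lang: string; default: boolean };

// Prefer a voice whose language matches, then the browser default, then the first voice.
const pickVoice = <V extends VoiceChoice>(voices: V[], lang: string): V | null => {
  const wanted = lang.toLowerCase();
  const matches = voices.filter((voice) => {
    // Voice languages usually look like "en-US"; normalize "en_US" style tags as well.
    const voiceLang = voice.lang.toLowerCase().replace("_", "-");
    return voiceLang === wanted || voiceLang.startsWith(`${wanted}-`);
  });
  return (
    matches.find((voice) => voice.default) ??
    matches[0] ??
    voices.find((voice) => voice.default) ??
    voices[0] ??
    null
  );
};

// Inside speak() this could replace voices[0]:
// speakData.voice = pickVoice(getVoices(), "en");
```

Returning null when nothing is available is deliberate: assigning null to the voice property leaves the choice to the browser.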

if (speechSynthesis.paused) {
    speechSynthesis.resume();
} else {
    speechSynthesis.cancel();
    speechSynthesis.speak(speakData);
}

In this code block, I check whether speech was paused: if so, I just resume it, and if not, I play from the beginning. I call speechSynthesis.cancel() before speaking new text because speak() adds the utterance to a queue, and anything still queued or playing would otherwise be spoken first. To avoid such side effects it is better to clear the queue completely and then play the provided text.
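
The same decision can be isolated into a small function that is easy to check without a browser. This is only a sketch: SpeechQueue lists the members of speechSynthesis the logic touches, and playOrResume is a hypothetical name, not something from text-to-speech.ts.

```typescript
// The subset of speechSynthesis used by the play logic (type name is hypothetical).
type SpeechQueue<U> = {
  readonly paused: boolean;
  resume(): void;
  cancel(): void;
  speak(utterance: U): void;
};

// Resume paused speech; otherwise clear the queue and start the new utterance.
const playOrResume = <U>(queue: SpeechQueue<U>, utterance: U): "resumed" | "started" => {
  if (queue.paused) {
    queue.resume();
    return "resumed";
  }
  queue.cancel(); // drop anything still queued or playing
  queue.speak(utterance);
  return "started";
};
```

Returning which branch was taken is optional, but it makes the behaviour visible to callers and to tests that pass in a fake queue.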

Integration with Component

In my [slug].js I display the article content inside an article tag. To get a reference to that element I used the useRef hook.

const articleRef = useRef(null);

<article ref={articleRef} className="my-5">
 ...blog content
</article>

To initialize the text I want to play, I used the useEffect and useState hooks. The useEffect has two dependencies, articleRef and post: if post changes the text needs to be updated, and when the component loads articleRef should hold the reference to the target element.

// The ref is attached to an <article> element, so HTMLElement is the matching type.
const articleRef = useRef<HTMLElement>(null);
const [articleText, setArticleText] = useState("");

useEffect(() => {
    if (articleRef.current) {
      setArticleText(`${post.title}. ${post.excerpt}. ${articleRef.current.innerText}`);
    }
  }, [articleRef, post]);
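
Joining the pieces with ". " works, but it can produce doubled punctuation when the title or excerpt already ends with a period, and innerText may carry extra blank lines. Here is a possible helper, under the hypothetical name buildSpeechText, that tidies the parts before they go to setArticleText:

```typescript
// Hypothetical helper: join the parts of a post into one string for speech.
const buildSpeechText = (...parts: Array<string | null | undefined>): string =>
  parts
    .map((part) => (part ?? "").replace(/\s+/g, " ").trim()) // collapse whitespace and newlines
    .filter((part) => part.length > 0) // skip empty or missing parts
    .map((part) => (/[.!?]$/.test(part) ? part : `${part}.`)) // end each part with punctuation
    .join(" ");

// Inside the useEffect this could be used as:
// setArticleText(buildSpeechText(post.title, post.excerpt, articleRef.current.innerText));
```

Collapsing newlines is a design choice made here for simplicity. If paragraph breaks should stay audible as pauses, the whitespace step would need to be gentler.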

Once this is done, the Text To Speech functionality is almost ready: I have the text to play on click and all the checks and playback functions around the Web Speech API. Now I need to create the Play and Pause buttons.

I created a separate component that handles the buttons and playback of the text. It accepts the text to be played, which is sent to the speak() function we created in text-to-speech.ts.
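
The pause() and cancel() helpers used by the component are not listed above. Here is a minimal sketch of how such wrappers could look. It is an assumption, not the exact code from text-to-speech.ts. PlaybackControl and createPlaybackHelpers are hypothetical names. The sketch also accounts for one detail of the Web Speech API specification: cancel() does not change the paused state, so cancelling while paused could otherwise leave the next play stuck on the resume branch. Browsers may differ on this.

```typescript
// The members of speechSynthesis these wrappers need (type name is hypothetical).
type PlaybackControl = {
  readonly paused: boolean;
  readonly speaking: boolean;
  pause(): void;
  resume(): void;
  cancel(): void;
};

// Build pause/cancel helpers around a speechSynthesis-like object.
const createPlaybackHelpers = (synth: PlaybackControl) => ({
  // Pause only when something is actually being spoken and not already paused.
  pause: (): void => {
    if (synth.speaking && !synth.paused) synth.pause();
  },
  // Stop and clear the queue. Resume first so the paused flag is cleared,
  // since cancel() on its own is specified to leave that flag untouched.
  cancel: (): void => {
    if (synth.paused) synth.resume();
    synth.cancel();
  },
});
```

In the browser you would pass window.speechSynthesis, guarded by isTTSSupported() so nothing runs during server rendering.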

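import { useEffect, useState } from "react";
import classNames from "classnames";
import { FaPause, FaPlay } from "react-icons/fa";
import { MdReplay } from "react-icons/md";
// The two local import paths below are assumptions; adjust them to your project layout.
import { cancel, isTTSSupported, pause, speak } from "common/lib/text-to-speech";
import styles from "./text-to-speech.module.css";

type Props = {
  text: string;
};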

const TextToSpeech = ({ text }: Props) => {
  const [isStarted, setStarted] = useState(false);
  const [isPlayed, setPlayed] = useState(false);

  useEffect(() => {
    handlePlayFromStart();
  }, [text]);

  const handleSpeak = () => {
    setStarted(true);
    setPlayed(true);
    speak(text, handlePlayFromStart);
  };

  const handlePause = () => {
    setStarted(false);
    pause();
  };

  const handlePlayFromStart = () => {
    setPlayed(false);
    setStarted(false);
    cancel();
  };

  return (
    <div className={styles.ttsButtonContainer}>
      {text && isTTSSupported() && (
        <>
          {!isStarted ? (
            <button
              aria-label="Play"
              title="Play"
              className={classNames("btn", styles.playButton)}
              onClick={handleSpeak}
            >
              <FaPlay />
            </button>
          ) : (
            <button
              title="Pause"
              aria-label="Pause"
              className={classNames("btn", styles.pauseButton)}
              onClick={handlePause}
            >
              <FaPause />
            </button>
          )}

          {isPlayed && (
            <button
              title="Play From Start"
              aria-label="Play From Start"
              className={classNames("btn", styles.cancelButton)}
              onClick={handlePlayFromStart}
            >
              <MdReplay />
            </button>
          )}
        </>
      )}
    </div>
  );
};

When this component loads or the text prop changes, it resets the state, so the text plays from the start the next time the Play button is clicked. I have added three controls here:

  • Play
  • Pause
  • Play from start

If the text finishes playing, the component automatically resets its state to the default, because we passed a callback to speak() that runs on the utterance's end event.
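
The button behaviour described above boils down to a tiny state machine over the two flags. Here is one way to write it down as a pure function. PlayerState, PlayerAction and nextPlayerState are illustrative names, and the component itself uses two useState calls rather than this function.

```typescript
// Mirrors the two flags in the component:
// isStarted = audio is currently playing, isPlayed = playback was started at least once.
type PlayerState = { isStarted: boolean; isPlayed: boolean };
type PlayerAction = "play" | "pause" | "reset";

const initialPlayerState: PlayerState = { isStarted: false, isPlayed: false };

const nextPlayerState = (state: PlayerState, action: PlayerAction): PlayerState => {
  switch (action) {
    case "play": // handleSpeak: show Pause and reveal "Play From Start"
      return { isStarted: true, isPlayed: true };
    case "pause": // handlePause: show Play again, keep "Play From Start" visible
      return { ...state, isStarted: false };
    case "reset": // handlePlayFromStart, also run when speech ends or the text changes
      return initialPlayerState;
  }
};
```

Written this way, the same function could be handed to React's useReducer if the component ever grows more states, but for three buttons the two useState flags are perfectly adequate.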

Conclusion

This is how I created the Text To Speech functionality on my website's blog. I hope you like this approach. I would also like to know if there is a better approach than this.

Please share this article if you liked it.

