Calling Whisper, OpenAI's Open-Source Speech-to-Text Project, via JNI

Open Source | 2026-1-8

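Whisper is an automatic speech recognition (ASR) system developed by OpenAI. It has strong multilingual recognition and can handle speech with a variety of accents and background noise, converting it accurately into text. It supports many languages, including English, Chinese, French, and German, and is widely used for meeting notes, subtitle production, and organizing spoken content.

Whisper runs on multiple operating systems, including Windows, macOS, and Linux.

Official Whisper repository: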

https://github.com/openai/whisper

WhisperJNI (a JNI wrapper for whisper.cpp that lets you transcribe speech to text in Java):

https://github.com/GiviMAD/whisper-jni

whisper-jni is a Java library that wraps whisper.cpp (not OpenAI's official PyTorch implementation) through JNI (Java Native Interface). As a result, its GPU support works completely differently from that of the Python openai-whisper package.

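whisper.cpp (and therefore whisper-jni) runs inference on the CPU by default; GPU acceleration such as CUDA is not enabled out of the box.

whisper.cpp is a C++ implementation developed by ggerganov that focuses on being lightweight, cross-platform, and dependency-free (it needs only ggml). It was designed from the start to run efficiently on the CPU, which makes it a good fit for mobile and embedded devices. It does not depend on PyTorch or TensorFlow, and NVIDIA CUDA support is an optional build-time backend rather than part of the default build.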

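In summary:

1: If you take this approach, avoid the larger models (see http://www.javacui.com/opensource/777.html ). The small model is enough for basic needs.

2: whisper.cpp natively supports only 16-bit PCM WAV input, so it is best to convert other audio formats with FFmpeg.

Conversion command for reference (16 kHz, mono, 16-bit little-endian PCM):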

ffmpeg.exe -i 1767594398074.mp3 -ar 16000 -ac 1 -c:a pcm_s16le 1767594398074.wav

3: Chinese transcriptions come out in Traditional characters by default, so we use the opencc4j library to convert them to Simplified.
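
Point 2 can be checked in code before a file is handed to whisper. The following is a minimal sketch and not part of whisper-jni: `WavFormatCheck` and `isWhisperReady` are hypothetical names, and the target format (16-bit signed little-endian PCM, 16 kHz, mono) is an assumption taken from the FFmpeg command above. It uses only `javax.sound.sampled` from the JDK.

```java
import javax.sound.sampled.AudioFormat;
import javax.sound.sampled.AudioSystem;
import java.io.File;

/** Hypothetical helper: checks that a WAV file matches the FFmpeg output format above. */
final class WavFormatCheck {

    /** Returns true only for 16-bit signed little-endian PCM, 16 kHz, mono (assumed target). */
    static boolean isWhisperReady(AudioFormat format) {
        return AudioFormat.Encoding.PCM_SIGNED.equals(format.getEncoding())
                && format.getSampleSizeInBits() == 16
                && format.getChannels() == 1
                && Math.round(format.getSampleRate()) == 16000
                && !format.isBigEndian();
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java WavFormatCheck.java <file.wav>");
            return;
        }
        // Reads only the file header, not the audio data
        AudioFormat format = AudioSystem.getAudioFileFormat(new File(args[0])).getFormat();
        System.out.println(format + " -> "
                + (isWhisperReady(format) ? "ready" : "convert with FFmpeg first"));
    }
}
```

Running the check up front gives a clear error message instead of a garbled transcription when someone passes, say, a 44.1 kHz stereo file.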

So the steps are:

1: First convert the MP3 to WAV
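
If the conversion should happen from Java rather than by hand, the FFmpeg command shown earlier can be launched with `ProcessBuilder`. This is only a sketch under assumptions: `FfmpegWavConverter` is a hypothetical helper, it assumes an `ffmpeg` executable is on the PATH, and it adds FFmpeg's `-y` flag (overwrite the output without prompting), which the original command does not use, so that the process cannot block waiting for input.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.List;

/** Hypothetical helper that shells out to ffmpeg; assumes ffmpeg is on the PATH. */
final class FfmpegWavConverter {

    /** Builds the same arguments as the manual command, plus -y to overwrite the output. */
    static List<String> buildCommand(Path input, Path output) {
        return List.of(
                "ffmpeg", "-y",
                "-i", input.toString(),
                "-ar", "16000",       // 16 kHz sample rate
                "-ac", "1",           // mono
                "-c:a", "pcm_s16le",  // 16-bit signed little-endian PCM
                output.toString());
    }

    /** Runs ffmpeg and fails if it exits with a non-zero code. */
    static void convert(Path input, Path output) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(buildCommand(input, output))
                .inheritIO() // show ffmpeg's own output in the console
                .start();
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("ffmpeg exited with code " + exitCode);
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 2) {
            System.err.println("usage: java FfmpegWavConverter.java <input.mp3> <output.wav>");
            return;
        }
        convert(Path.of(args[0]), Path.of(args[1]));
    }
}
```

Passing the arguments as a list avoids shell quoting problems with file names that contain spaces.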

2: Add the dependencies to the POM. The required DLL and SO files are bundled inside the whisper-jni JAR.

<dependency>
	<groupId>io.github.givimad</groupId>
	<artifactId>whisper-jni</artifactId>
	<version>1.7.1</version>
</dependency>
<dependency>
	<groupId>com.github.houbb</groupId>
	<artifactId>opencc4j</artifactId>
	<version>1.6.2</version>
</dependency>

3: Sample Java code

package com.example.springboot;
import com.github.houbb.opencc4j.util.ZhConverterUtil;
import io.github.givimad.whisperjni.WhisperFullParams;
import io.github.givimad.whisperjni.WhisperJNI;
import lombok.extern.slf4j.Slf4j;
import org.junit.jupiter.api.Test;
import javax.sound.sampled.AudioInputStream;
import javax.sound.sampled.AudioSystem;
import javax.sound.sampled.UnsupportedAudioFileException;
import java.io.File;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.file.Path;
@Slf4j
public class WhisperTest {
    @Test
    public void test1() {
        var modelPath = "D:\\Soft\\Whisper\\ggml-small.bin";
        var audioPath = "D:\\Soft\\Whisper\\1767594398074.wav";
        try {
            // Load the native library bundled in the JAR
            WhisperJNI.loadLibrary();
        } catch (IOException e) {
            // Nothing below can work without the native library, so fail fast
            throw new IllegalStateException("Failed to load the whisper-jni native library", e);
        }
        WhisperJNI.setLibraryLogger(null); // disable whisper.cpp log output
        var whisper = new WhisperJNI();
        float[] samples = readAudio(new File(audioPath));
        if (samples.length == 0) {
            throw new IllegalStateException("No audio samples read from " + audioPath);
        }
        try (var ctx = whisper.init(Path.of(modelPath))) {
            var params = new WhisperFullParams();
            // Other language codes: "en" English, "es" Spanish, "fr" French, "de" German,
            // "it" Italian, "ko" Korean, "pt" Portuguese, "ja" Japanese
            params.language = "zh"; // Chinese
            params.nThreads = 8; // number of threads
            long startTime = System.nanoTime(); // start timing
            int result = whisper.full(ctx, params, samples, samples.length);
            long endTime = System.nanoTime(); // stop timing
            long transcriptionTime = endTime - startTime;
            System.out.println("Transcription time: " + (double) transcriptionTime / 1_000_000 + " ms");
            if (result != 0) {
                throw new RuntimeException("Transcription failed with code " + result);
            }
            int numSegments = whisper.fullNSegments(ctx);
            for (int i = 0; i < numSegments; i++) {
                String text = whisper.fullGetSegmentText(ctx, i);
                // The text comes back in Traditional Chinese; convert it to Simplified
                String simplifiedText = ZhConverterUtil.toSimple(text);
                System.out.println(simplifiedText);
            }
        } catch (IOException e) {
            throw new IllegalStateException("Failed to load the model " + modelPath, e);
        }
    }
    private static float[] readAudio(File file) {
        // Expects a WAV file with 16-bit integer samples, 16000 Hz, little-endian
        try (AudioInputStream audioInputStream = AudioSystem.getAudioInputStream(file)) {
            var format = audioInputStream.getFormat();
            if (format.getSampleSizeInBits() != 16 || format.isBigEndian()) {
                throw new IllegalStateException("Expected 16-bit little-endian PCM, got " + format);
            }
            // Read the whole stream; a single read() call is not guaranteed to fill the buffer
            ByteBuffer captureBuffer = ByteBuffer.wrap(audioInputStream.readAllBytes());
            captureBuffer.order(ByteOrder.LITTLE_ENDIAN);
            if (captureBuffer.capacity() == 0) {
                throw new IllegalStateException("No audio data in " + file);
            }
            // View the bytes as 16-bit integer samples (short in Java)
            var shortBuffer = captureBuffer.asShortBuffer();
            // Convert the samples to f32 in the range [-1, 1]
            float[] samples = new float[captureBuffer.capacity() / 2];
            var i = 0;
            while (shortBuffer.hasRemaining()) {
                samples[i++] = Float.max(-1f, Float.min(((float) shortBuffer.get()) / (float) Short.MAX_VALUE, 1f));
            }
            return samples;
        } catch (IOException | UnsupportedAudioFileException e) {
            // Fail loudly instead of returning an empty array that would be passed on to whisper
            throw new IllegalStateException("Could not read audio file " + file, e);
        }
    }
}
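
The 16-bit-to-float conversion inside `readAudio` is easy to get subtly wrong (byte order, scaling, clamping), so it can help to try it in isolation. The sketch below is a standalone illustration with hypothetical names (`PcmToFloat`, `toFloats`); it follows the same scaling by `Short.MAX_VALUE` and the same clamp to [-1, 1] as the sample code, and needs no audio file or native library.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical standalone version of the sample conversion used in readAudio. */
final class PcmToFloat {

    /** Converts little-endian 16-bit PCM bytes to floats in [-1, 1]. */
    static float[] toFloats(byte[] pcm) {
        var shorts = ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer();
        float[] samples = new float[shorts.remaining()];
        for (int i = 0; i < samples.length; i++) {
            float scaled = shorts.get(i) / (float) Short.MAX_VALUE;
            // Short.MIN_VALUE scales to slightly below -1, so clamp the result
            samples[i] = Math.max(-1f, Math.min(scaled, 1f));
        }
        return samples;
    }

    public static void main(String[] args) {
        // Three samples: 0, Short.MAX_VALUE (0x7FFF), Short.MIN_VALUE (0x8000)
        byte[] pcm = {0x00, 0x00, (byte) 0xFF, 0x7F, 0x00, (byte) 0x80};
        for (float sample : toFloats(pcm)) {
            System.out.println(sample); // prints 0.0, 1.0, -1.0
        }
    }
}
```

The low byte comes first in each pair, which is why `0xFF, 0x7F` decodes to `0x7FFF`; reading the same bytes as big-endian would give a completely different waveform.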

4: Output

Transcription time: 3722.8131 ms
据黑龙江省疾病预防控制中心消息
小寒节气标志著我国大部分地区
进入一年中最寒冷时段
寒潮频繁、气温皱降、空气乾燥
是呼吸道传染病 新脑血管疾病的高发期
未保障公众健康
黑龙江省疾控中心特发布健康提示
指导公众科学防寒保暖、健康过冬

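As you can see, with the small model an ordinary audio file is transcribed correctly. The drawback is the extra work: the audio first has to be converted to WAV, and the result then still has to be converted from Traditional to Simplified Chinese.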
