JCHub

AI
A/V

Free Voice Clone TTS Survey

Preface

With the arrival of the era of large-model voice conversations (ChatGPT-4o, Gemini Live, Doubao, etc.), high-naturalness, zero/few-shot voice cloning has become one of the core pain points for AI application deployment. Whether it's AI short-drama dubbing, personalized digital humans, voice customer service, podcast/audiobook production, or localized private deployment, the quality, latency, VRAM usage, and cross-language capabilities of voice-clone TTS directly determine the user experience. This article is a record of evaluating and comparing more than a dozen open-source TTS solutions in early 2025.

TTS Evaluation Survey (Basic Tools)

Before comparing specific models, here is a brief list of commonly used objective and subjective evaluation methods to help verify results.

TTS Evaluation Practices in the Era of Large-Model Voice Conversations

Link: QECon Tech Sharing – TTS Evaluation Practices in the Era of Large-Model Voice Conversations

Microsoft Pronunciation Assessment Service

Link: Using Pronunciation Assessment

This guide explains how to use the pronunciation assessment feature of Azure Speech Service to evaluate a user's pronunciation programmatically. It reports metrics such as accuracy, fluency, and completeness, and is suitable for language learning, speech training, and similar scenarios.

seed-tts-eval (Most Commonly Used Objective Metrics)

Link: https://github.com/BytedanceSpeech/seed-tts-eval

This is used for the most basic evaluations. Almost every TTS model paper reports these two metrics: Word Error Rate (WER) and Speaker Similarity (SIM). For WER, the automatic speech recognition (ASR) engines are Whisper-large-v3 for English and Paraformer-zh for Chinese. For speaker similarity, a WavLM-large model fine-tuned on speaker verification is used to extract speaker embeddings, and the cosine similarity is computed between each test speech sample and its reference speech sample.

Mainstream…
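The two seed-tts-eval metrics described above, WER and SIM, can be illustrated with a minimal sketch. This is not the seed-tts-eval implementation: it assumes the ASR transcript and the speaker embeddings have already been produced elsewhere (by Whisper/Paraformer and WavLM respectively), and the function names `word_error_rate` and `cosine_similarity` are chosen here for illustration only. Tokens can be words or, for Chinese, single characters.

```python
from math import sqrt
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Token-level edit distance divided by the number of reference tokens."""
    if not reference:
        raise ValueError("reference must contain at least one token")
    # prev[j] holds the edit distance between reference[:i-1] and hypothesis[:j].
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]  # turning i reference tokens into an empty hypothesis
        for j, hyp_tok in enumerate(hypothesis, start=1):
            cost = 0 if ref_tok == hyp_tok else 1
            curr.append(min(
                prev[j] + 1,         # deletion: reference token missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra token in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1] / len(reference)


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    ref = "the cat sat on the mat".split()
    hyp = "the cat sat on mat".split()  # hypothetical ASR output of a TTS sample
    print("WER:", word_error_rate(ref, hyp))
    # Toy 3-dimensional vectors standing in for real speaker embeddings.
    print("SIM:", cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
```

Lower WER means the synthesized speech is more intelligible to the ASR model, and a SIM closer to 1 means the cloned voice is closer to the reference speaker in embedding space.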

March 19, 2025 · Jeff
AI

Getting GraphRAG + Ollama Running Locally

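Conda environment

[code block missing from source]

Ollama configuration

[code block missing from source]

Use Ollama to download the required large language model and embedding model; Alibaba's Qwen and nomic are used here.

[code block missing from source]

Code download and dependency installation

[code block missing from source]

Environment configuration

Create a directory to hold the input text dataset, for example txt or csv documents.

[code block missing from source]

Initialize the ./ragtest directory to generate the default configuration files.

[code block missing from source]

The settings.yaml file under ./ragtest is the default configuration file and is set up for ChatGPT models. Because we are using a local model through Ollama, the configuration has to be changed. The following content can directly replace the configuration in settings.yaml:

[code block missing from source]

The changes are as follows:

[code block missing from source]

For a description of the configuration parameters, see: https://microsoft.github.io/graphrag/posts/config/json_yaml/

Prompt tuning

In the current working directory ragtest, the prompts that the LLM uses to extract entities, relationships, and so on are stored by default in the prompts subdirectory; the prompt directory can be changed in settings.yaml. The default prompt templates perform poorly on specialized domains. The officially provided prompt-tuning method can be used to replace the default templates and improve results in a specific domain.

Automatic templates

By setting parameters such as domain and language, we can generate prompt templates that match our requirements. Below is an auto prompt tuning example for a book about film shooting. It generates new prompt templates in the prompt directory that are a better fit for the filmmaking domain.

[code block missing from source]

For the detailed parameters, see: https://microsoft.github.io/graphrag/posts/prompt_tuning/auto_prompt_tuning/

Manual prompt tuning

Write your own prompt template according to the specification; see: https://microsoft.github.io/graphrag/posts/prompt_tuning/manual_prompt_tuning/

Running the index

This step uses the LLM to extract entities, relationships, and so on, and builds the knowledge graph. It takes a long time.

[code block missing from source]

Search

The structures extracted during indexing supply the material that is passed to the LLM as context for answering questions. There are two query modes, local and global search:

  • Local search: reasons about questions on a specific entity, using the entity's associated information in the graph together with the relevant text chunks from the original documents.
  • Global search: reasons about questions on the corpus as a whole, using the community summaries.

Local search

[code block missing from source]

Global search

[code block missing from source]

References

[1] https://microsoft.github.io/graphrag/posts/query/overview/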

February 15, 2025 · Jeff

