<?xml version="1.0"?>
<?xml-stylesheet type="text/css" href="http://www.cslt.org/mediawiki/skins/common/feed.css?303"?>
<feed xmlns="http://www.w3.org/2005/Atom" xml:lang="zh-cn">
		<id>http://www.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2016-4-14</id>
		<title>Sinovoice-2016-4-14 - Revision history</title>
		<link rel="self" type="application/atom+xml" href="http://www.cslt.org/mediawiki/index.php?action=history&amp;feed=atom&amp;title=Sinovoice-2016-4-14"/>
		<link rel="alternate" type="text/html" href="http://www.cslt.org/mediawiki/index.php?title=Sinovoice-2016-4-14&amp;action=history"/>
		<updated>2026-04-04T04:34:21Z</updated>
		<subtitle>Revision history for this page on the wiki</subtitle>
		<generator>MediaWiki 1.23.3</generator>

	<entry>
		<id>http://www.cslt.org/mediawiki/index.php?title=Sinovoice-2016-4-14&amp;diff=19757&amp;oldid=prev</id>
		<title>Zhangzy: Created page with "==Data==  *16K LingYun :* 2000h data ready :* 4300h real-env data to label  * YueYu :* Total 250h(190h-YueYu + 60h-English) :* Add 60h YueYu :* CER: 75%-&gt;76%  * WeiY..."</title>
		<link rel="alternate" type="text/html" href="http://www.cslt.org/mediawiki/index.php?title=Sinovoice-2016-4-14&amp;diff=19757&amp;oldid=prev"/>
				<updated>2016-04-14T06:32:35Z</updated>
		
		<summary type="html">&lt;p&gt;Created page with "==Data==  *16K LingYun :* 2000h data ready :* 4300h real-env data to label  * YueYu :* Total 250h(190h-YueYu + 60h-English) :* Add 60h YueYu :* CER: 75%-&amp;gt;76%  * WeiY..."&lt;/p&gt;
&lt;p&gt;&lt;b&gt;New page&lt;/b&gt;&lt;/p&gt;&lt;div&gt;==Data==&lt;br /&gt;
&lt;br /&gt;
* 16K LingYun&lt;br /&gt;
:* 2000h of data ready&lt;br /&gt;
:* 4300h of real-environment data to be labeled&lt;br /&gt;
&lt;br /&gt;
* YueYu (Cantonese)&lt;br /&gt;
:* Total: 250h (190h YueYu + 60h English)&lt;br /&gt;
:* Added 60h of YueYu&lt;br /&gt;
:* CER: 75%-&amp;gt;76%&lt;br /&gt;
&lt;br /&gt;
* WeiYu (Uyghur)&lt;br /&gt;
:* 50h used for training&lt;br /&gt;
:* 120h labeled and ready&lt;br /&gt;
&lt;br /&gt;
==Model training==&lt;br /&gt;
===Big-Model Training===&lt;br /&gt;
* SVD weight-matrix factoring of the 7*2048 net (10000h), to improve decoding speed&lt;br /&gt;
:* The SVD factoring itself looks OK, but fine-tuning after factoring still does not work.&lt;br /&gt;
  Base WER (%):&lt;br /&gt;
  relu_2000_mpe_1000H: 17.72&lt;br /&gt;
  relu_1200_mpe_1000H: 18.60&lt;br /&gt;
&lt;br /&gt;
  WER (%) after SVD, by factored layer and number of nodes retained:&lt;br /&gt;
  | layer / nodes retained |  200  |  400  |  600  |  800  | 1000  | 1200  | 1400  | 1600  |&lt;br /&gt;
  |        hidden 2        |       |       | 22.53 | 20.30 | 19.01 |       |       |       |&lt;br /&gt;
  |        hidden 7        |       | 18.92 | 18.30 | 17.92 |       |       |       |       |&lt;br /&gt;
  |         final          |       |       | 18.32 | 18.00 | 17.83 |       |       |       |&lt;br /&gt;
* 7*1024 net: cross-entropy training on the full data set, then MPE: 0.2 improvement&lt;br /&gt;
* SVD factoring of the 7*1024 net, to speed up decoding&lt;br /&gt;
&lt;br /&gt;
* 8k&lt;br /&gt;
&lt;br /&gt;
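A minimal pure-Python sketch of SVD weight-matrix factoring, assuming the number of nodes retained in the SVD table is the rank k kept for a layer: an m x n weight matrix W is replaced by A = U_k diag(s) (m x k) and B = V_k^T (k x n), so the forward pass y = A (B x) costs k(m + n) instead of mn multiply-adds. All function names are hypothetical, and top_k_svd uses power iteration with deflation as a stand-in for a library SVD.&lt;br /&gt;

```python
import math
import random


def matvec(a, x):
    # Dense matrix (list of rows) times vector.
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def top_k_svd(w, k, iters=200, seed=0):
    """Approximate the k largest singular triplets (s, u, v) of w by power
    iteration on w^T w, projecting out previously found right vectors."""
    rng = random.Random(seed)
    wt = transpose(w)
    triplets = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(len(wt))]
        for _ in range(iters):
            z = matvec(wt, matvec(w, v))
            # Deflation: keep the iterate orthogonal to earlier right vectors.
            for _, _, vp in triplets:
                dot = sum(a * b for a, b in zip(z, vp))
                z = [a - dot * b for a, b in zip(z, vp)]
            norm = math.sqrt(sum(a * a for a in z))
            v = [a / norm for a in z]
        wv = matvec(w, v)
        s = math.sqrt(sum(a * a for a in wv))
        triplets.append((s, [a / s for a in wv], v))
    return triplets


def factor_layer(w, k):
    """Split w (m x n) into a = U_k diag(s) (m x k) and b = V_k^T (k x n)."""
    triplets = top_k_svd(w, k)
    a = [[s * u[i] for s, u, _ in triplets] for i in range(len(w))]
    b = [v for _, _, v in triplets]
    return a, b


def dense_cost(m, n):
    return m * n  # multiply-adds for y = W x


def factored_cost(m, n, k):
    return k * (m + n)  # multiply-adds for y = A (B x)


if __name__ == "__main__":
    # Toy 6 x 5 matrix of exact rank 2 (sum of two outer products).
    p, q = [1, 2, 0, -1, 3, 1], [2, -1, 1, 0, 1]
    r, t = [0, 1, 1, 2, -1, 1], [1, 1, -2, 1, 0]
    w = [[p[i] * q[j] + r[i] * t[j] for j in range(5)] for i in range(6)]
    a, b = factor_layer(w, k=2)
    x = [0.5, -1.0, 2.0, 0.0, 1.5]
    err = max(abs(d - f) for d, f in zip(matvec(w, x), matvec(a, matvec(b, x))))
    print("max abs error:", err)
    print(dense_cost(2048, 2048), factored_cost(2048, 2048, 600))
```

For a 2048 x 2048 hidden layer with k = 600 this gives 2,457,600 instead of 4,194,304 multiply-adds per frame, about 59% of the dense cost; fine-tuning after factoring is meant to recover the accuracy lost by truncation.&lt;br /&gt;
&lt;br /&gt;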
===Embedding===&lt;br /&gt;
* 10000h chain model (5*400+800): done.&lt;br /&gt;
:* The decoding beam significantly affects the chain model's performance; this needs more investigation.&lt;br /&gt;
* 5*576-2400 TDNN model&lt;br /&gt;
&lt;br /&gt;
===SinSong Robot===&lt;br /&gt;
* Tested with the 10000h (7*2048, xent) model&lt;br /&gt;
  ------------------------------------------------&lt;br /&gt;
    condition | clean | replay (0.5m) | real-env&lt;br /&gt;
  ------------------------------------------------&lt;br /&gt;
    WER (%)   |   3   | 18 (MPE: 14)  | very poor&lt;br /&gt;
  ------------------------------------------------&lt;br /&gt;
&lt;br /&gt;
* Plan to record in a restaurant on April 10.&lt;br /&gt;
&lt;br /&gt;
===Character LM===&lt;br /&gt;
* 9-gram character LM done for all data except Sogou-2T.&lt;br /&gt;
* Worse than the word LM (9% -&amp;gt; 6%).&lt;br /&gt;
* Add word-boundary tags to the character-LM training data.&lt;br /&gt;
* Merge the character LM and the word LM:&lt;br /&gt;
:* Union&lt;br /&gt;
:* Composition: succeeded.&lt;br /&gt;
* Two-step decoding: first with the character-based LM, then with the word-based LM.&lt;br /&gt;
&lt;br /&gt;
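One possible way to add word-boundary tags to character-LM training text, sketched in Python: each segmented word is split into characters and a boundary token is placed between words, so the n-gram model can learn where words end; stripping the tokens again recovers the word sequence that a word-based LM pass could then rescore. The tag name WB and both function names are hypothetical; the tagging scheme actually used in these experiments is not specified.&lt;br /&gt;

```python
BOUNDARY = "WB"  # hypothetical boundary token; must not collide with a real character


def add_boundary_tags(segmented_line, tag=BOUNDARY):
    """'今天 天气 很好' -> '今 天 WB 天 气 WB 很 好' (input is space-segmented words)."""
    tokens = []
    for i, word in enumerate(segmented_line.split()):
        if i:
            tokens.append(tag)
        tokens.extend(word)  # extending with a str adds one character at a time
    return " ".join(tokens)


def strip_boundary_tags(tagged_line, tag=BOUNDARY):
    """Invert add_boundary_tags: rebuild the word list from tagged characters."""
    words, current = [], []
    for tok in tagged_line.split():
        if tok == tag:
            if current:
                words.append("".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        words.append("".join(current))
    return words


if __name__ == "__main__":
    tagged = add_boundary_tags("今天 天气 很好")
    print(tagged)
    print(strip_boundary_tags(tagged))
```

&lt;br /&gt;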
===Project===&lt;br /&gt;
* Pingan and YueYu: too many deletion errors&lt;br /&gt;
:* TDNN deletion error rate &amp;gt; DNN deletion error rate&lt;br /&gt;
:* The TDNN silence scale is too sensitive across different test cases.&lt;br /&gt;
 &lt;br /&gt;
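Deletions are one of the three error types in the standard word (or character) error rate, WER = (S + D + I) / N, where S, D and I are the substitutions, deletions and insertions in a minimum-edit-distance alignment against an N-token reference. A minimal Python sketch that reports the deletion rate separately, so TDNN and DNN outputs could be compared on that axis; the function names are hypothetical, and ties between equally cheap alignments are broken by tuple order, so per-type counts may differ slightly from other scoring tools.&lt;br /&gt;

```python
def align_counts(ref, hyp):
    """Return (subs, dels, ins) for a minimum-edit-distance alignment of the
    token lists ref and hyp. For CER, pass lists of characters."""
    n, m = len(ref), len(hyp)
    # d[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j].
    d = [[None] * (m + 1) for _ in range(n + 1)]
    d[0][0] = (0, 0, 0, 0)
    for i in range(1, n + 1):
        d[i][0] = (i, 0, i, 0)  # every reference token deleted
    for j in range(1, m + 1):
        d[0][j] = (j, 0, 0, j)  # every hypothesis token inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c, s, dl, ins = d[i - 1][j - 1]
            miss = 0 if ref[i - 1] == hyp[j - 1] else 1
            diag = (c + miss, s + miss, dl, ins)
            c, s, dl, ins = d[i - 1][j]
            deletion = (c + 1, s, dl + 1, ins)
            c, s, dl, ins = d[i][j - 1]
            insertion = (c + 1, s, dl, ins + 1)
            # Lowest cost wins; ties go to fewer substitutions, then fewer deletions.
            d[i][j] = min(diag, deletion, insertion)
    _, s, dl, ins = d[n][m]
    return s, dl, ins


def error_rates(ref, hyp):
    """Return (error_rate, deletion_rate) as fractions of len(ref); ref must be non-empty."""
    s, dl, ins = align_counts(ref, hyp)
    return (s + dl + ins) / len(ref), dl / len(ref)


if __name__ == "__main__":
    ref = "我 想 订 一 张 机 票".split()
    hyp = "我 想 一 张 票".split()
    print(align_counts(ref, hyp), error_rates(ref, hyp))
```

&lt;br /&gt;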
==SID==&lt;br /&gt;
===Digit===&lt;br /&gt;
* Same Channel test EER: 100% &lt;br /&gt;
:* Speaker verification&lt;br /&gt;
:* Phone channel&lt;br /&gt;
&lt;br /&gt;
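EER is the error rate at the decision threshold where the false-acceptance rate (impostor trials accepted) equals the false-rejection rate (target trials rejected). A minimal Python sketch that sweeps the threshold over the observed trial scores and returns the average of the two rates where they are closest; the function name and the toy scores are hypothetical, and scoring tools that interpolate between thresholds can report slightly different values.&lt;br /&gt;

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER from verification scores; a trial is accepted when
    its score is at or above the threshold."""
    best_gap, best_eer = None, None
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: a target trial scoring below the threshold.
        frr = sum(1 for s in target_scores if thr > s) / len(target_scores)
        # False acceptance: a non-target trial scoring at or above it.
        far = sum(1 for s in nontarget_scores if s >= thr) / len(nontarget_scores)
        gap = abs(far - frr)
        if best_gap is None or best_gap > gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


if __name__ == "__main__":
    targets = [0.9, 0.8, 0.7, 0.4]      # same-speaker trial scores
    nontargets = [0.1, 0.2, 0.3, 0.75]  # different-speaker trial scores
    print(equal_error_rate(targets, nontargets))
```

With the threshold at 0.7, one of the four target trials and one of the four impostor trials fall on the wrong side, so the sketch returns 0.25, i.e. a 25% EER.&lt;br /&gt;
&lt;br /&gt;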
* Cross-channel&lt;br /&gt;
:* PLDA adaptation on mic wav data: EER from 9% to 7% (20-30 speakers)&lt;/div&gt;</summary>
		<author><name>Zhangzy</name></author>	</entry>

	</feed>