[15.07] CRNN

I Want It All!
OCR is a long-established field, and advances in deep learning have put it back in the spotlight.
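Problem Definition
With the spread of deep learning, the field of OCR (optical character recognition) has found new room to grow. Compared with traditional classification methods built on hand-designed features, CNN-based (convolutional neural network) OCR methods show higher performance and stronger generalization.
One of the most important advantages is that a CNN learns features from images automatically. This removes the need to design and select features by hand, which saves a great deal of labor and also reduces the consumption of computational resources.
Still, a key question in text recognition remains: how should the classification be done effectively?
There are several mainstream approaches to the final text classification. First, let's borrow a figure from the neighboring CHAR paper: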
1. Dictionary Encoding
Dictionary encoding is the simplest and most direct approach. The system first defines a dictionary that contains every possible label (usually words or phrases). When it processes an image, the recognized text is classified as one of the dictionary entries. In short, the model's task is to pick the best-matching word from a predefined word list.
If the text in the image is not in the dictionary, this approach stops working, because the model cannot recognize out-of-dictionary content. It is therefore a poor fit for random strings such as randomly generated passwords or phone numbers.
In addition, when the dictionary is very large (for example, hundreds of thousands of entries), system efficiency suffers considerably and the dictionary scales poorly.
2. Character Sequence Encoding
Character sequence encoding is another common approach. Unlike dictionary encoding, it does not depend on a predefined dictionary and instead classifies the text directly as a sequence of characters. The system does not need to know any specific words; it classifies each character in the image and assembles the complete character sequence.
It is more challenging than dictionary encoding. The model must learn the features of each character and combine characters correctly into a full sequence, which places higher demands on its capacity. Because characters depend on one another, a model without good contextual understanding may produce incorrect character combinations.
3. N-gram Encoding
Bag-of-N-gram encoding, or N-gram encoding, is a compromise that mixes the word-level and character-level approaches. An N-gram is a sequence of N characters, where N can be 2 (bi-gram), 3 (tri-gram), or larger. This encoding recognizes not only single characters but also character combinations, so it captures contextual information better.
As N grows, the number of N-gram combinations increases sharply and so does the computational cost, and efficiency can drop, especially on long text sequences. For some vocabulary, N-grams cannot accurately capture the whole word, and particularly when N is small, recognition accuracy may fall short of dictionary encoding or character sequence encoding.
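To make the N-gram idea and its growth in size concrete, here is a minimal sketch. The function names (`char_ngrams`, `bag_of_ngrams`, `vocabulary_size`) and the 36-symbol alphabet (26 letters plus 10 digits) are assumptions made for illustration, not anything defined in the paper.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return the contiguous character n-grams of `word`, left to right."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def bag_of_ngrams(word, max_n):
    """Count every n-gram of length 1..max_n, ignoring where it occurs."""
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(char_ngrams(word, n))
    return bag


def vocabulary_size(alphabet_size, max_n):
    """Upper bound on the number of distinct n-grams of length 1..max_n."""
    return sum(alphabet_size ** n for n in range(1, max_n + 1))


bigrams = char_ngrams("hello", 2)          # ['he', 'el', 'll', 'lo']
bag = bag_of_ngrams("hello", 2)            # 'l' is counted twice
sizes = [vocabulary_size(36, n) for n in (1, 2, 3)]  # [36, 1332, 47988]
```

With this hypothetical alphabet, the output space grows from 36 classes for single characters to nearly 48,000 once tri-grams are included, which is the cost increase described above.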
Taking all of this together, the authors argue that a good model should meet several requirements:
- End-to-end learning: learn features directly from images, with no preprocessing or separate steps.
- Convolutional features: use learned convolutional features rather than hand-designed ones.
- No character-level annotation: train the model without character-level annotations and learn text strings directly from images.
- Unconstrained: handle arbitrary strings without being restricted to a specific dictionary.
- Small model size: a small storage footprint and efficient execution.
That may look a little greedy, so let's see how the authors pulled it off.
Solution
Model Architecture
The CRNN model consists of three parts. Let's walk through them with reference to the figure above.
Convolutional Layers
Let's start from the bottom of the figure above, the convolutional layers. Here a familiar convolutional neural network (CNN) extracts features from the image.
Suppose the input text image has size 32x128. Viewed column by column, it is a sequence of length 128 in which each column is a 3x32-dimensional feature vector (assuming an RGB image).
One option is to feed the raw image directly into a sequence model.
However, that makes the model very complex, and as the sequence gets longer the model can become hard to train. So the authors chose to first extract features from the image with a convolutional network and then feed those features into the sequence model.
Using a convolutional network raises another problem, though: the model loses a large amount of sequence information.
In the example above, with a 32x128 text image and a typical backbone, there are usually five downsampling steps, so the final feature map has size 1x4.
This is clearly undesirable: too little information is left for prediction!
So the authors made a small change to the convolutional network. The approach is very simple: change only the MaxPooling operation. It was originally kernel=2, stride=2, and they changed it to kernel=(2, 1), stride=(2, 1).
This way the width of the feature map stays the same and only the height shrinks. In other words, a 32x128 input image yields a feature map of size 1x128 after five downsampling steps.
In the actual implementation, to keep the sequence from being too wide, the paper downsamples the width twice and the height four times.
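The size arithmetic above can be checked with a few lines of code. This is only a sketch: `pooled_size` is a hypothetical helper, and it assumes that each pooling step divides a dimension exactly by its stride, ignoring padding and any convolution that changes the size.

```python
def pooled_size(height, width, pools):
    """Apply a list of (stride_h, stride_w) poolings using integer division."""
    for stride_h, stride_w in pools:
        height //= stride_h
        width //= stride_w
    return height, width


# Typical backbone: five square 2x2 poolings.
square = pooled_size(32, 128, [(2, 2)] * 5)                  # (1, 4)

# Height-only pooling, kernel=(2, 1), stride=(2, 1), five times.
height_only = pooled_size(32, 128, [(2, 1)] * 5)             # (1, 128)

# Mixed schedule: width halved twice, height halved four times.
mixed = pooled_size(32, 128, [(2, 2)] * 2 + [(2, 1)] * 2)    # (2, 32)
```

The mixed schedule keeps 32 time steps for a 128-pixel-wide input, compared with only 4 under square pooling.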
Recurrent Layers
Next come the recurrent layers in the middle.
We have just turned the input text image into sequence data. The next step is to feed that data into a sequence model, and here the authors use a BiLSTM to process the sequence.
LSTM is an improved RNN (recurrent neural network) designed mainly to ease the long-term dependency problem of traditional RNNs. As time steps progress, an RNN struggles to retain earlier information, which is especially problematic for long-range dependencies in a sequence. LSTM uses a memory cell and gating mechanisms (input, forget, and output gates) to control the flow of information, so it can keep important information across long sequences and discard what is not needed. This makes it good at processing long sequence data.
BiLSTM combines the advantages of LSTM and bidirectional networks. It processes the data with two LSTM layers: one runs forward (from the start of the sequence to the end) and the other runs backward (from the end to the start). As a result, the information at each time step includes context from both directions. In tasks such as language modeling, speech recognition, and machine translation, BiLSTM captures the overall context better and improves prediction accuracy.
Returning to our problem: after the CNN extracts features from the image, the sequence length is set by the width of the image. Each time step corresponds to one block of the original image (as in the figure above), and the receptive field of each time step is determined by the design of the convolutional network.
Feeding this feature sequence into the BiLSTM yields a higher-level feature representation that can then be used to predict the text in the image.
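Before the recurrent layers can run, the feature map has to be sliced into one vector per column. The sketch below shows that step on plain nested lists; `map_to_sequence` and the `feature_map[channel][row][column]` layout are assumptions chosen for illustration, and a real implementation would do the same thing with a tensor reshape.

```python
def map_to_sequence(feature_map):
    """Turn feature_map[c][h][w] into a list of W vectors of length C*H.

    Each vector collects every channel and row of a single column, so the
    sequence runs left to right across the image.
    """
    channels = len(feature_map)
    height = len(feature_map[0])
    width = len(feature_map[0][0])
    return [
        [feature_map[c][h][w] for c in range(channels) for h in range(height)]
        for w in range(width)
    ]


# Toy feature map: 2 channels, height 1, width 3.
fmap = [
    [[1, 2, 3]],       # channel 0
    [[10, 20, 30]],    # channel 1
]
sequence = map_to_sequence(fmap)   # [[1, 10], [2, 20], [3, 30]]
```

Each element of `sequence` is one time step for the BiLSTM, and its position in the list matches a horizontal position in the image.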
Unaligned Sequence Labels
This corresponds to the transcription layer in the figure above.
As mentioned at the start, labeling at the character level is very laborious.
To sidestep the character alignment problem, the paper therefore adopts CTC (Connectionist Temporal Classification) to predict the character sequence.
CTC is a technique designed specifically for unaligned sequence labeling: given the per-frame predictions $y = y_1, \ldots, y_T$, it ignores the position of each label within the label sequence $l$. This makes it particularly well suited to sequence data such as speech recognition and handwriting recognition.
Unlike traditional sequence models, CTC does not need each time step to be matched exactly to a specific character position, which greatly simplifies training.
Suppose the input is a sequence $y = y_1, \ldots, y_T$, where $T$ is the sequence length. Each $y_t$ is a probability distribution over the label set $L' = L \cup \{\text{blank}\}$, where $L$ contains all labels in the task (for example, English letters) and "blank" denotes the blank label. The blank label indicates that no character is emitted at a given time step, which makes it useful for handling irregular sequence lengths.
Because the per-frame predictions may repeat the same character or emit blank labels over multiple time steps, CTC uses a mapping function $\mathcal{B}$ to remove this redundancy and obtain the final label sequence. Concretely, it first merges consecutive repeated characters and then removes the blank labels.
For example, suppose the model predicts the following per-frame output for the word "hello":
--hh-e-l-ll-oo--
Here - denotes the blank label. Repeated characters are merged and the blanks are removed, so after the mapping function $\mathcal{B}$ the sequence becomes:
hello
The key property of CTC is that it handles this redundancy and maps the output sequence to a concise label sequence.
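The mapping function is short enough to write out directly. The sketch below is one possible implementation; the name `ctc_collapse` and the use of `-` as the blank symbol are assumptions taken from the example, not from any library.

```python
from itertools import groupby

BLANK = "-"  # blank symbol, following the example above


def ctc_collapse(path, blank=BLANK):
    """Merge consecutive repeats first, then drop the blank symbol."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


decoded = ctc_collapse("--hh-e-l-ll-oo--")   # 'hello'
no_blank = ctc_collapse("hheelloo")          # 'helo'
```

The second call shows why the blank label is needed: without a blank between the two `l` runs, they merge into a single `l`.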
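In CTC, we want the conditional probability of a label sequence $l$ given the per-frame predictions $y$. Since many different per-frame sequences $\pi$ can map to the same label sequence $l$, CTC sums the probabilities of all such $\pi$ to obtain the probability of $l$:
$$
p(l \mid y) = \sum_{\pi : \mathcal{B}(\pi) = l} p(\pi \mid y)
$$
Here, the probability of each per-frame sequence $\pi$ is defined as:
$$
p(\pi \mid y) = \prod_{t=1}^{T} y^{t}_{\pi_t}
$$
where $y^{t}_{\pi_t}$, the entry of $y_t$ for label $\pi_t$, is the probability that the model outputs label $\pi_t$ at time step $t$.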
To make this concrete, consider an example with the word "cat".
Suppose the input is a segment of audio and the model makes the following predictions at each time step:
| Time step | c | a | t | blank |
|---|---|---|---|---|
| 1 | 0.6 | 0.1 | 0.1 | 0.2 |
| 2 | 0.1 | 0.7 | 0.1 | 0.1 |
| 3 | 0.1 | 0.2 | 0.6 | 0.1 |
| 4 | 0.2 | 0.2 | 0.2 | 0.4 |
At each time step, the model predicts a probability for every label, including the blank label. For example, label c has the highest probability at the first time step, and blank has the highest probability at the fourth.
In this case, several per-frame prediction sequences map to the final label sequence "cat":
- The sequence $(c, a, t, \text{blank})$ can be mapped to "cat";
- The sequence $(c, \text{blank}, a, t)$ can also be mapped to "cat".
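For a table this small, the sum over paths can be computed by brute force. The sketch below enumerates all $4^4 = 256$ per-frame sequences, keeps the ones that collapse to "cat", and adds up their probabilities. The names `PROBS`, `collapse`, and `label_probability` are illustrative, and the approach is only practical for toy inputs.

```python
from itertools import groupby, product

BLANK = "-"

# Per-time-step probabilities copied from the table above.
PROBS = [
    {"c": 0.6, "a": 0.1, "t": 0.1, BLANK: 0.2},
    {"c": 0.1, "a": 0.7, "t": 0.1, BLANK: 0.1},
    {"c": 0.1, "a": 0.2, "t": 0.6, BLANK: 0.1},
    {"c": 0.2, "a": 0.2, "t": 0.2, BLANK: 0.4},
]


def collapse(path, blank=BLANK):
    """Merge consecutive repeats, then remove blanks."""
    merged = [symbol for symbol, _ in groupby(path)]
    return "".join(symbol for symbol in merged if symbol != blank)


def path_probability(path, probs):
    """Product of the per-step probabilities along one path."""
    p = 1.0
    for t, symbol in enumerate(path):
        p *= probs[t][symbol]
    return p


def label_probability(label, probs):
    """Sum the probabilities of every path that collapses to `label`."""
    alphabet = list(probs[0])
    paths = [
        path for path in product(alphabet, repeat=len(probs))
        if collapse(path) == label
    ]
    return sum(path_probability(path, probs) for path in paths), paths


p_cat, cat_paths = label_probability("cat", PROBS)
print(len(cat_paths), round(p_cat, 4))   # 7 0.182
```

Seven paths collapse to "cat" here, and the single most likely one, (c, a, t, blank) with probability 0.1008, accounts for just over half of the total of 0.182.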
CTC sums the probabilities of all possible per-frame prediction sequences to obtain the total probability of the label sequence "cat". During training, the model's objective is to minimize the negative log-likelihood of the correct label sequence "cat", so as training progresses the model produces increasingly accurate per-frame predictions.
In this way, CTC learns effectively and predicts the correct label sequence without needing the character at each time step to be labeled explicitly.
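Enumerating every path becomes infeasible for realistic sequence lengths, so CTC implementations normally compute the same sum with a forward dynamic program. The sketch below follows the standard CTC forward recursion on the "cat" table; `ctc_forward` and `ctc_loss` are hypothetical names, and the code works in plain probabilities rather than the log space a real implementation would use for numerical stability.

```python
import math

BLANK = "-"

PROBS = [
    {"c": 0.6, "a": 0.1, "t": 0.1, BLANK: 0.2},
    {"c": 0.1, "a": 0.7, "t": 0.1, BLANK: 0.1},
    {"c": 0.1, "a": 0.2, "t": 0.6, BLANK: 0.1},
    {"c": 0.2, "a": 0.2, "t": 0.2, BLANK: 0.4},
]


def ctc_forward(probs, label, blank=BLANK):
    """Return p(label | y) using the CTC forward recursion."""
    # Extended label: a blank before, between, and after every character.
    ext = [blank]
    for ch in label:
        ext += [ch, blank]
    size = len(ext)

    # A path may start with the leading blank or with the first character.
    alpha = [0.0] * size
    alpha[0] = probs[0][blank]
    if size > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        new_alpha = [0.0] * size
        for s in range(size):
            total = alpha[s]                      # stay on the same symbol
            if s >= 1:
                total += alpha[s - 1]             # advance by one
            # Skipping a blank is allowed only between different characters.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += alpha[s - 2]
            new_alpha[s] = total * probs[t][ext[s]]
        alpha = new_alpha

    # A valid path ends on the last character or on the trailing blank.
    return alpha[-1] + (alpha[-2] if size > 1 else 0.0)


def ctc_loss(probs, label, blank=BLANK):
    """Negative log-likelihood of `label`, the quantity minimized in training."""
    return -math.log(ctc_forward(probs, label, blank))


p_cat = ctc_forward(PROBS, "cat")   # about 0.182, the same as the full path sum
loss = ctc_loss(PROBS, "cat")
```

The recursion visits each (time step, extended-label position) pair once, so its cost grows with the product of the two lengths instead of exponentially with the number of time steps.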
Training Strategy
The authors used the synthetic dataset released by Jaderberg et al. as training data for scene text recognition:
- Text Recognition Data: this dataset contains 8 million training images with their corresponding ground-truth text. The images are generated by a synthetic engine and look highly realistic.
The model was trained once on this synthetic data alone and then tested on all the real-world test datasets, with no fine-tuning on any of them.
The network configuration details are as follows:
- The convolutional layers are based on the VGG architecture, adjusted to suit English text recognition.
- In the third and fourth max-pooling layers, rectangular pooling windows of size $1 \times 2$ (width x height) replace the conventional square ones.
- Two batch normalization layers are inserted, one after the fifth and one after the sixth convolutional layer, which greatly speeds up training.
- Training uses the ADADELTA algorithm with its parameter $\rho$ set to 0.9. During training, all images are resized to $100 \times 32$ to speed up training.
- Test images are scaled to a height of 32; the width is scaled proportionally with the height but is kept at no less than 100 pixels.
Evaluation Metrics
The authors evaluated the model on the following four common scene text recognition benchmark datasets:
- ICDAR 2003 (IC03)
  - The test set contains 251 scene images annotated with text bounding boxes.
  - For a fair comparison with prior work, images containing non-alphanumeric characters or fewer than three characters are usually ignored. After filtering, 860 cropped text images are used as the test set.
  - Each test image comes with a 50-word lexicon, and there is also a full lexicon that merges the lexicons of all images, used for evaluation.
- ICDAR 2013 (IC13)
  - The test set inherits and corrects part of the IC03 data, ending up with 1,015 cropped text images with accurate labels.
  - Unlike IC03, IC13 provides no lexicon, so evaluation uses no lexicon assistance (that is, the lexicon-free setting).
- IIIT 5K-Word (IIIT5k)
  - The test set contains 3,000 cropped text images collected from the internet, covering a wider variety of fonts and language variations.
  - Each image comes with two lexicons, a small one with 50 words and a large one with 1,000 words, used for lexicon-assisted evaluation.
- Street View Text (SVT)
  - The test set consists of 249 scene images collected from Google Street View, from which 647 text images are cropped.
  - Each text image comes with a 50-word lexicon, used for lexicon-assisted evaluation.
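To give a feel for what lexicon-assisted evaluation means, the sketch below snaps a lexicon-free prediction to the closest word in a per-image lexicon using edit distance. This is a deliberate simplification and an assumption on our part: the paper's own lexicon-based decoding procedure is not described in this note, and `edit_distance` and `constrain_to_lexicon` are illustrative names.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (row-by-row DP)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,          # delete ca
                curr[j - 1] + 1,      # insert cb
                prev[j - 1] + cost,   # substitute or match
            ))
        prev = curr
    return prev[-1]


def constrain_to_lexicon(prediction, lexicon):
    """Pick the lexicon word closest to the raw prediction."""
    return min(lexicon, key=lambda word: edit_distance(prediction, word))


lexicon = ["hello", "help", "yellow"]           # toy stand-in for a 50-word lexicon
fixed = constrain_to_lexicon("he1lo", lexicon)  # 'hello'
```

A small lexicon like this can correct single-character mistakes, which is why accuracy with a 50-word lexicon is generally higher than in the lexicon-free setting.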
Discussion
Multi-Aspect Model Comparison
To show more comprehensively where CRNN improves on other methods, the authors provide the table above:
- E2E Train: whether the model supports end-to-end training, with no preprocessing or step-by-step operations required.
- Conv Ftrs: whether it uses convolutional features learned from training images rather than hand-designed features.
- CharGT-Free: whether the model can be trained without character-level annotations.
- Unconstrained: whether it can handle out-of-dictionary words and random sequences without being restricted to a specific dictionary.
- Model Size: the storage size of the model.
According to the table, CRNN has the advantage on many of these points: it supports end-to-end training, needs no character-level annotations, is not restricted to a specific dictionary, and has a small model size.
Comparison with Previous Methods
The table above shows the recognition accuracy of the CRNN model on four public datasets, compared with state-of-the-art deep learning models.
In the constrained-lexicon setting, CRNN outperforms other methods on most benchmarks and, on average, beats the best previously proposed text recognition model. Its results are particularly strong on the IIIT5k and SVT datasets.
Because CRNN does not depend on a predefined dictionary, it can recognize random strings (such as phone numbers), sentences, and other scripts (such as Chinese), and it remains competitive on all test datasets.
In the unconstrained-lexicon setting, CRNN achieves the best result on the SVT dataset.
The tables in this paper mostly list methods by author name, which makes them hard to look up, so interested readers are encouraged to consult the original paper.
Extending the Generalization
OCR is not only about text.
CRNN applies beyond text recognition to other domains, for example optical music recognition (OMR).
Traditional OMR methods require image preprocessing (such as binarization), staff-line detection, and recognition of individual notes. The authors recast OMR as a sequence recognition problem and used CRNN to predict the note sequence directly from the image.
- For simplicity, the task here focuses only on pitch recognition, ignores chords, and assumes that all scores are in C major.
To prepare training data for CRNN, the authors collected 2,650 images from the musescore site.
Each image contains a score fragment of 3 to 20 notes, with the pitch sequence labeled by hand. Data augmentation such as rotation, scaling, and added noise expanded the training set to 265k samples.
For comparison, the authors evaluated two commercial OMR engines, Capella Scan and PhotoScore.
As the table above shows, CRNN substantially outperforms both commercial systems on every dataset.
Capella Scan and PhotoScore do relatively well on the "Clean" dataset, but their performance drops markedly on synthetic and real-world data.
The main reason is that these systems rely on robust binarization to detect staff lines and notes, and on synthetic and real-world scenes binarization tends to fail because of poor lighting, noise, and cluttered backgrounds. In contrast, CRNN uses convolutional features, which are highly robust to noise and distortion.
In addition, the recurrent layers of CRNN can exploit contextual information in the score: recognizing a note depends not only on the note itself but also on its neighbors. For example, comparing the vertical positions of notes makes it possible to identify a particular note more accurately.
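Conclusion
Let's revisit the requirements the authors set out at the beginning:
- End-to-end training: no preprocessing or step-by-step operations; features are learned directly from images.
- Convolutional features: uses convolutional features learned from training images rather than hand-designed features.
- No character-level labels: needs no character-level annotation and learns text strings directly from images.
- Unconstrained: handles arbitrary strings without being restricted to a specific dictionary.
- Small model size: a small storage footprint that runs efficiently.
CRNN delivers on all of these, which makes it a landmark work in OCR. It is recommended reading for anyone working in the field.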