IEICE TRANSACTIONS on Information and Systems

Impact Factor: 0.59
Eigenfactor: 0.002
Article Influence: 0.1
CiteScore: 1.4


Advance publication (published online immediately after acceptance)

A+Block: Web Security Add-on Service by Just Pointing to Your Website URL
Shota FUJII Shohei KAKEI Masanori HIROTOMO Makoto TAKITA Yoshiaki SHIRAISHI Masami MOHRI Hiroki KUZUNO Masakatu MORII

Publicized: 2025/02/06
PAPER
Aspect-level Cross-linguistic Multi-layer Sentiment Analysis: A Case Study on User Preferences for Mask Attributes During the COVID-19 Pandemic
Haoran LUO Tengfei SHAO Tomoji KISHI Shenglei LI

Publicized: 2025/01/31
PAPER
Handwritten Character Image Generation for Effective Data Augmentation
Chee Siang LEOW Tomoki KITAGAWA Hideaki YAJIMA Hiromitsu NISHIZAKI

Publicized: 2025/01/31
PAPER
Solve Point Competition with Label Reassignment in Unmanned Aerial Vehicle Object Detector
Dengtian YANG Lan CHEN Xiaoran HAO

Publicized: 2025/01/27
LETTER
DPDF-AEC: MULTI-SCALE DUAL PATH CONVOLUTION RECURRENT NETWORK WITH DEEP FILTERING BLOCK FOR ACOUSTIC ECHO CANCELLATION
Rong HUANG Yue XIE

Publicized: 2025/01/23
LETTER
Detecting Praising Behaviors Based on Multimodal Information
Toshiki ONISHI Asahi OGUSHI Ryo ISHII Akihiro MIYATA

Publicized: 2025/01/23
PAPER
PICSU: A Japanese Vocabulary Learning System with Object Recognition and Lexical Database
Meihua XUE Kazuki SUGITA Koichi OTA Wen GU Shinobu HASEGAWA

Publicized: 2025/01/23
PAPER
A Spatial Graph Convolutional Network based on Node Semantic for Graph Classification
Jinyong SUN Zhiwei DONG Zhigang SUN Guoyong CAI Xiang ZHAO

Publicized: 2025/01/20
PAPER
Societal Bias in Image Captioning: Identifying and Measuring Bias Amplification
Yusuke HIROTA Yuta NAKASHIMA Noa GARCIA

Publicized: 2025/01/20
PAPER
Mitigating Gender Bias Amplification in Image Captioning
Yusuke HIROTA Yuta NAKASHIMA Noa GARCIA

Publicized: 2025/01/20
PAPER
Toward an Understanding of Musical Factors in Judging a Song on First Listen
Kosetsu TSUKUDA Tomoyasu NAKANO Masahiro HAMASAKI Masataka GOTO

Publicized: 2025/01/17
PAPER
Hail Intelligent Recognition Algorithm Based on HAM-Unet
ZhengYu LU PengFei XU

Publicized: 2025/01/17
PAPER
Title Information for Transformer-based Japanese Document Emphasis
Binggang ZHUO Ryota HONDA Masaki MURATA

Publicized: 2025/01/16
PAPER
Improved Quantum Approximate Optimization Algorithm based on Conditional Value-at-Risk for Portfolio Optimization
Qingqing YU Rong JIN

Publicized: 2025/01/15
PAPER
An Interpretable Multi-level Feature Disentanglement Algorithm for Speech Emotion Recognition
Huawei TAO Ziyi HU Sixian LI Chunhua ZHU Peng LI Yue XIE

Publicized: 2025/01/10
LETTER
A Lightweight Dendritic ShuffleNet for Medical Image Classification
Qianhang DU Zhipeng LIU Yaotong SONG Ningning WANG Zeyuan JU Shangce GAO

Publicized: 2025/01/10
PAPER
Section Min-Hash Approximating Time Series Search based on Dynamic Time Warping
Ryota TOMODA Hisashi KOGA

Publicized: 2025/01/09
PAPER
Communication Performance of ROS and ROS 2-based IoT Systems for Smart Home Applications
Reina SASAKI Atsuko TAKEFUSA Hidemoto NAKADA Masato OGUCHI

Publicized: 2025/01/09
PAPER
Non-Cooperative Rational Synthesis Problem for Probabilistic Strategies
So KOIDE Yoshiaki TAKATA Hiroyuki SEKI

Publicized: 2025/01/09
LETTER
Sports performance prediction for college students through ensemble learning algorithm
Huang Rong Qian Zewen Ma Hao Han Zhezhe Xie Yue

Publicized: 2025/01/07
PAPER
Pre-trained BERT Model Retrieval: Inference-Based No-Learning Approach using k-Nearest Neighbour Algorithm
Huu-Long PHAM Ryota MIBAYASHI Takehiro YAMAMOTO Makoto P. KATO Yusuke YAMAMOTO Yoshiyuki SHOJI Hiroaki OHSHIMA

Publicized: 2025/01/07
PAPER
GAMPALv2: An Anomaly Detection Mechanism for Internet Traffic by Predicting Flow Size Range from Time Features
Taku WAKUI Fumio TERAOKA Takao KONDO

Publicized: 2025/01/07
PAPER
Copy-Move Forgery Detection via Dimensionality Reduction and Double Quantization Feature
Shaobao Wu Zhihua Wu Meixuan Huang

Publicized: 2025/01/06
PAPER
POEM: Pruning with Output Error Minimization for Compressing Deep Neural Networks
Koji KAMMA Toshikazu WADA

Publicized: 2024/12/27
PAPER
Structural Relation Multi-class Token Transformer for Weakly Supervised Semantic Segmentation
Dingjie PENG Wataru KAMEYAMA

Publicized: 2024/12/25
PAPER
Spotlight Contents Extraction from Text-based Online Discussion
Zhizhong WANG Wen GU Zhaoxing LI Koichi OTA Shinobu HASEGAWA

Publicized: 2024/12/20
PAPER
Evaluation of Sense Embeddings with Optimized Vector-Meaning Correspondence
Tomoaki YAMAZAKI Seiya ITO Kouzou OHARA

Publicized: 2024/12/20
PAPER
A Necessary and Sufficient Condition for Controlled Generation of Right Linear Grammars with Unknown Behaviors
Daihei ISE Satoshi KOBAYASHI

Publicized: 2024/12/19
PAPER
Online Communication Environment Design for Encouraging Reciprocal Use of Information among Groups
Masanari ICHIKAWA Yugo TAKEUCHI

Publicized: 2024/12/19
PAPER
Self-supervised Neural Architecture Search for Multimodal Deep Neural Networks
Shota SUZUKI Satoshi ONO

Publicized: 2024/12/18
LETTER
Enhancing GPU Performance through Complexity-Effective Out-of-Order Execution using Distance-based ISA
Reoma MATSUO Toru KOIZUMI Hidetsugu IRIE Shuichi SAKAI Ryota SHIOYA

Publicized: 2024/12/16
PAPER
A Transformer-based fully trainable point process
Hirotaka HACHIYA Fumiya NISHIZAWA

Publicized: 2024/12/12
PAPER
Removing Mislabeled Data from Trained Models via Machine Unlearning
Issa SUGIURA Shingo OKAMURA Naoto YANAI

Publicized: 2024/12/11
PAPER
A component placement mechanism for latency-constrained applications in cloud-edge environments
Mudai KOBAYASHI Mohammad Mikal Bin Amrul Halim Gan Takahisa SEKI Takahiro HIROFUCHI Ryousei TAKANO Mitsuhiro KISHIMOTO

Publicized: 2024/12/11
LETTER
AdaCF: Adaptively Updating Adjacency Matrix in Collaborative Filtering
Chi ZHANG Luwei ZHANG Toshihiko YAMASAKI

Publicized: 2024/12/10
PAPER
CLOCK-DPP: Hybrid disk buffer replacement policy for SSDs with dirty page preservation for write intensive environments
Jung Min Lim Wonho Lee Jun-Hyeong Choi Jong Wook Kwak

Publicized: 2024/12/10
LETTER
Propagation-based Code Clone Analysis for Detecting Smart Contract Vulnerability
Zhuo ZHANG Donghui LI Kun JIANG Ya LI Junhu WANG Xiankai MENG

Publicized: 2024/12/10
LETTER
Performance Enhancement of the LFSR-Based Unpredictable Random Number Generator in Rocket Core
Takayoshi SHIKANO Shuichi ICHIKAWA

Publicized: 2024/12/10
PAPER
BMR: a new BBR algorithm with moderate delivery rate
Shotaro ISHIKURA Ryosuke MINAMI Miki YAMAMOTO

Publicized: 2024/12/10
PAPER
Clinical Learning Order-guided Deep Neural Network For Brain Tumor Segmentation
Pengfei ZHANG Jinke WANG Yuanzhi CHENG Shinichi TAMURA

Publicized: 2024/12/06
PAPER
A Boosting Method Based on Center-of-gravity Oversampling and Pruning for Classifying Imbalanced Data
Fengqi GUO Qicheng LIU

Publicized: 2024/12/05
PAPER
Multi-modal Fake News Detection Enhanced by Fine-grained Knowledge Graph
Runlong HAO Hui LUO Yang LI

Publicized: 2024/11/29
PAPER
FusionReg: LiDAR-Camera Fusion Regression Enhancement for 3D Object Detection
Rongchun XIAO Yuansheng LIU Jun ZHANG Yanliang HUANG Xi HAN

Publicized: 2024/11/27
PAPER
An Accelerated Integrity-secured Name Resolution Architecture Using Two Full-service Resolvers with and without DNSSEC Validation in Parallel
Yong JIN Kazuya IGUCHI Nariyoshi YAMAI Rei NAKAGAWA Toshio MURAKAMI

Publicized: 2024/11/22
PAPER
Leveraging Heterogeneous Programmable Data Planes for Security and Privacy of Cellular Networks, 5G & Beyond
Toru HASEGAWA Yuki KOIZUMI Junji TAKEMASA Jun KURIHARA Toshiaki TANAKA Timothy WOOD K. K. RAMAKRISHNAN

Publicized: 2024/11/21
INVITED PAPER
DGA-based Malware Communication Detection from DoH Traffic Using Hierarchical Machine Learning Analysis
Rikima MITSUHASHI Yong JIN Katsuyoshi IIDA Yoshiaki TAKAI

Publicized: 2024/11/21
PAPER
Exploiting Multi-Level Data Uncertainty for Japanese-Chinese Neural Machine Translation
Zezhong LI Jianjun MA Fuji REN

Publicized: 2024/11/19
LETTER
Improving Sentiment Analysis with an Ensemble Transformers Model on Health Pandemic based on Twitter data
Lorenzo Mamelona TingHuai Ma Jia Li Bright Bediako-Kyeremeh Benjamin Kwapong Osibo

Publicized: 2024/11/19
PAPER
APW: Asymmetric Padded Winograd to reduce thread divergence for computational efficiency on SIMT architecture
Wonho LEE Jong Wook KWAK

Publicized: 2024/11/14
LETTER
Effects of Numerical Method Selection on Fully-Pipelined FPGA Accelerators for Neural Simulations
Xiaoxiao ZHOU Yukinori SATO

Publicized: 2024/11/13
PAPER
A Text-to-Lyrics Generation Method Leveraging Image-based Semantics and Reducing Plagiarism Risk
Kento WATANABE Masataka GOTO

Publicized: 2024/11/13
PAPER
Multimodal Voice Activity Projection for Turn-taking and Effects on Speaker Adaptation
Kazuyo ONISHI Hiroki TANAKA Satoshi NAKAMURA

Publicized: 2024/11/13
PAPER
On a Perturbation Concept in Regular Interconnection Networks
Takashi YOKOTA Kanemitsu OOTSU

Publicized: 2024/11/12
LETTER
Bayesian-Optimization-based auto Optical Mark array recognition for flexible paper answer sheet
Chenbo SHI Wenxin SUN Jie ZHANG Junsheng ZHANG Chun ZHANG Changsheng ZHU

Publicized: 2024/11/12
PAPER
Selecting Source Code Generation Tools Based on Bandit Algorithms
Masateru TSUNODA Ryoto SHIMA Amjed TAHIR Kwabena Ebo BENNIN Akito MONDEN Koji TODA Keitaro NAKASAI

Publicized: 2024/11/11
LETTER
On the Application of Bandit Algorithm for Selecting Clone Detection Methods
Masateru TSUNODA Takuto KUDO Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI Kenichi MATSUMOTO

Publicized: 2024/11/11
LETTER
Lightweight Neural Data Sequence Modeling by Scale Causal Blocks
Hiroaki AKUTSU Ko ARAI

Publicized: 2024/11/08
PAPER
Recaptured Image Detection Based on Multi-Scale Residual Features of Discriminative Regions
Lanxi LIU Pengpeng YANG Suwen DU Sani M. ABDULLAHI

Publicized: 2024/11/08
PAPER
Learn Discriminative Features for Small Object Detection through Multi-scale Image Degradation with Contrastive Learning
Xiaoguang TU Zhi HE Gui FU Jianhua LIU Mian ZHONG Chao ZHOU Xia LEI Juhang YIN Yi HUANG Yu WANG

Publicized: 2024/11/05
PAPER
Joint Distribution-Aligned Dual-Sparse Linear Regression for Cross-Stimulus Speech-Based Depression Detection
Yingying LU Cheng LU Yuan ZONG Feng ZHOU Chuangao TANG

Publicized: 2024/11/01
LETTER
Multi-grained Guaranteeable Requirement Analysis for Iterative Adaptation
Jialong LI Takuto YAMAUCHI Takanori HIRANO Jinyu CAI Kenji TEI

Publicized: 2024/10/31
PAPER
A fully digital transmitting-receiving platform for MIMO radar waveform diversity experiment
Wei LEI Yue ZHANG Hanfeng XIE Zebin CHEN Zengping CHEN Weixing LI

Publicized: 2024/10/30
PAPER
Leveraging Different Boolean Function Decompositions to Reduce T-Count in LUT-based Quantum Circuit Synthesis
David CLARINO Naoya ASADA Atsushi MATSUO Shigeru YAMASHITA

Publicized: 2024/10/30
PAPER
Criticality and Tolerance in Injection Timing in Cup-Stacking Method for Collective Communication
Takashi YOKOTA Kanemitsu OOTSU

Publicized: 2024/10/28
PAPER
An anchor-free Siamese tracker with multi-attention and corner detection mechanism
Xiaokang Jin Benben Huang Hao Sheng Yao Wu

Publicized: 2024/10/28
PAPER
Effect of Politeness on Trust in Re-enter Requests to User by Smart Speaker -Pilot Study-
Tomoki MIYAMOTO

Publicized: 2024/10/23
LETTER
Fine-tuning Models for Final Disagreement Anticipation in Negotiation Mid-Dialogues
Ken WATANABE Katsuhide FUJITA

Publicized: 2024/10/10
PAPER
Deepfake speech detection: approaches from acoustic features related to auditory perception to deep neural networks
Masashi UNOKI Kai LI Anuwat CHAIWONGYEN Quoc-Huy NGUYEN Khalid ZAMAN

Publicized: 2024/10/07
INVITED PAPER
Video Watermarking Method Based on 3D U-Net Robust Against Re-shooting
Takaharu TSUBOYAMA Ryota TAKAHASHI Motoi IWATA Koichi KISE

Publicized: 2024/10/07
PAPER
UTStyleCap4K: Generating Image Captions with Sentimental Styles
Chi ZHANG Li TAO Toshihiko YAMASAKI

Publicized: 2024/10/02
PAPER
FP-GNN: A Graph Neural Network for Hardware Trojan Detection in Gate-Level Netlist
Ann Jelyn TIEMPO Yong-Jin JEONG

Publicized: 2024/10/01
LETTER
A Multi-Agent Deep Reinforcement Learning Algorithm for Task offloading in future 6G V2X Network
Jiakun LI Jiajian LI Yanjun SHI Hui LIAN Haifan WU

Publicized: 2024/09/24
PAPER
Building Defect Prediction Models by Online Learning Considering Defect Overlooking
Nikolay FEDOROV Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI

Publicized: 2024/09/09
LETTER
The Impact of Defect (Re) Prediction on Software Testing
Yukasa MURAKAMI Yuta YAMASAKI Masateru TSUNODA Akito MONDEN Amjed TAHIR Kwabena Ebo BENNIN Koji TODA Keitaro NAKASAI

Publicized: 2024/09/09
LETTER
Bilaterally Colored Finite Automata and Bilaterally Colored Regular Expressions
Akira ITO Yoshiaki TAKAHASHI

Publicized: 2024/08/20
PAPER
Strategies and Equilibria on Indistinguishability of Winning Objectives and Related Decision Problems
Rindo NAKANISHI Yoshiaki TAKATA Hiroyuki SEKI

Publicized: 2024/08/20
PAPER
Computational Complexity of Yajisan-Kazusan and Stained Glass
Chuzo IWAMOTO Ryo TAKAISHI

Publicized: 2024/08/16
PAPER
A Bigram Based ILP Formulation for Break Minimization in Sports Scheduling Problems
Koichi FUJII Tomomi MATSUI

Publicized: 2024/08/08
PAPER
(15/14)n Flips are (almost) Sufficient to Sort Heydari and Sudborough's Pancake Stack
Kazuyuki AMANO

Publicized: 2024/08/05
LETTER
Overlapping of Lattice Unfolding for Cuboids
Takumi SHIOTA Tonan KAMATA Ryuhei UEHARA

Publicized: 2024/08/05
PAPER
An FPT Algorithm for the Exact Matching Problem and NP-hardness of Related Problems
Hitoshi MURAKAMI Yutaro YAMAGUCHI

Publicized: 2024/08/01
PAPER
Escape from the Room
Kento KIMURA Tomohiro HARAMIISHI Kazuyuki AMANO Shin-ichi NAKANO

Publicized: 2024/07/11
PAPER
Online combinatorial linear optimization via a Frank-Wolfe-based metarounding algorithm
Ryotaro MITSUBOSHI Kohei HATANO Eiji TAKIMOTO

Publicized: 2024/07/11
PAPER
Space-efficient FPT Algorithms for Degeneracy
Naohito MATSUMOTO Kazuhiro KURITA Masashi KIYOMI

Publicized: 2024/05/31
The Least Core of Routing Game Without Triangle Inequality
Tomohiro KOBAYASHI Tomomi MATSUI

Publicized: 2024/05/30
Enumerating floorplans with Aligned Columns
Shin-ichi NAKANO

Publicized: 2024/05/30
An IP Core Protection Scheme Based on Hybrid Lightweight Encryption for Neuromorphic Computing System
Ming PAN

The article processing charge of this paper has not been paid.

Publicized: 2022/09/14


Volume E99-D No.10 (Publication Date: 2016/10/01)

Special Section on Recent Advances in Machine Learning for Spoken Language Processing

FOREWORD Open Access
Norihide KITAOKA

FOREWORD

Page(s): 2422
Sensitivity-Characterised Activity Neurogram (SCAN) for Visualising and Understanding the Inner Workings of Deep Neural Network Open Access
Khe Chai SIM

INVITED PAPER

Publicized: 2016/07/19
Page(s): 2423-2430
A Deep Neural Network (DNN) is a powerful machine learning model that has been successfully applied to a wide range of pattern classification tasks. DNNs are very good at learning complex mapping functions. As a result, they can be trained and deployed largely as black boxes, without an in-depth understanding of the inner workings of the model. However, this often leads to solutions and systems that achieve high performance but offer very little insight into how and why they work. This paper introduces the Sensitivity-Characterised Activity Neurogram (SCAN), a novel approach for understanding the inner workings of a DNN by analysing and visualising the sensitivity patterns of the neuron activities. SCAN constructs a low-dimensional visualisation space for the neurons so that the neuron activities can be visualised in a meaningful and interpretable way. The embedding of the neurons within this visualisation space can be used to compare neurons, both within the same DNN and across different DNNs trained for the same task. This paper presents observations from using SCAN to analyse DNN acoustic models for automatic speech recognition.
Speaker Adaptive Training Localizing Speaker Modules in DNN for Hybrid DNN-HMM Speech Recognizers
Tsubasa OCHIAI Shigeki MATSUDA Hideyuki WATANABE Xugang LU Chiori HORI Hisashi KAWAI Shigeru KATAGIRI

PAPER-Acoustic modeling

Publicized: 2016/07/19
Page(s): 2431-2443
Among various training concepts for speaker adaptation, Speaker Adaptive Training (SAT) has been successfully applied to the standard Hidden Markov Model (HMM) speech recognizer, whose states are associated with Gaussian Mixture Models (GMMs). Meanwhile, owing to the high discriminative power of Deep Neural Networks (DNNs), a new type of speech recognizer structure that combines DNNs and HMMs has been vigorously investigated in the speaker adaptation research field. Along these two lines, it is natural to seek further improvement of a DNN-HMM recognizer by employing the training concept of SAT. In this paper, we propose a novel speaker adaptation scheme that applies SAT to a DNN-HMM recognizer. The scheme works as follows:
- It allocates a Speaker Dependent (SD) module to one of the intermediate layers of the DNN.
- It treats the remaining layers as a Speaker Independent (SI) module.
- It jointly trains the SD and SI modules while switching the SD module speaker by speaker.

We implement the scheme using a DNN-HMM recognizer whose DNN has seven layers, and evaluate its utility on TED Talks corpus data. In the supervised adaptation scenario, our Speaker-Adapted (SA) SAT-based recognizer reduces the word error rate by 8.4% relative to the baseline SI recognizer and by 0.7% relative to the lowest word error rate of the SA SI recognizer. The corresponding reductions in the unsupervised adaptation scenario are 6.4% and 0.6%. Statistical testing showed the error reductions of our SA-SAT-based recognizers to be significant. The results also show two further findings. First, our SAT-based adaptation outperforms its SI-based counterpart regardless of which layer holds the SD module. Second, the inner layers of the DNN seem more suitable for SD module allocation than the outer layers.
Investigation of DNN-Based Audio-Visual Speech Recognition
Satoshi TAMURA Hiroshi NINOMIYA Norihide KITAOKA Shin OSUGA Yurie IRIBE Kazuya TAKEDA Satoru HAYAMIZU

PAPER-Acoustic modeling

Publicized: 2016/07/19
Page(s): 2444-2451
Audio-Visual Speech Recognition (AVSR) is one of the techniques for enhancing the robustness of speech recognizers in noisy or real-world environments. Meanwhile, Deep Neural Networks (DNNs) have recently attracted considerable attention from researchers in the speech recognition field because they can drastically improve recognition performance. There are two ways to employ DNN techniques for speech recognition:
- In the hybrid approach, the emission probability of each Hidden Markov Model (HMM) state is computed using a DNN.
- In the tandem approach, a DNN is incorporated into the feature extraction scheme.

In this paper, we investigate and compare several DNN-based AVSR methods, mainly to clarify how audio and visual modalities should be incorporated using DNNs. We carried out recognition experiments on the CENSREC-1-AV corpus and discuss the results to identify the best DNN-based AVSR modeling. The most suitable method turns out to be a tandem-based method that uses audio and visual Deep Bottle-Neck Features (DBNFs) with multi-stream HMMs. It is followed by a hybrid approach and by another tandem scheme using audio-visual DBNFs.
Investigation of Combining Various Major Language Model Technologies including Data Expansion and Adaptation Open Access
Ryo MASUMURA Taichi ASAMI Takanobu OBA Hirokazu MASATAKI Sumitaka SAKAUCHI Akinori ITO

PAPER-Language modeling

Publicized: 2016/07/19
Page(s): 2452-2461
This paper aims to investigate the performance improvements made possible by combining various major language model (LM) technologies, and to reveal the interactions between LM technologies in spontaneous automatic speech recognition tasks. Recent practical LMs clearly have several problems, and using any single major LM technology in isolation does not appear to offer sufficient performance. For this reason, combining various LM technologies has also been examined. However, previous works focused only on modeling technologies with limited text resources. They did not consider other important technologies in practical language modeling, namely the use of external text resources and unsupervised adaptation. This paper therefore uses external text resources in addition to manual transcriptions of the target speech recognition tasks. It also adds unsupervised LM adaptation based on multi-pass decoding to the combination. We divide LM technologies into three categories and employ key ones, including recurrent neural network LMs and discriminative LMs. Our experiments show that combining various LM technologies is effective not only in in-domain tasks, the subject of our previous work, but also in out-of-domain tasks. Furthermore, we reveal the relationships between the technologies in both tasks.
N-gram Approximation of Latent Words Language Models for Domain Robust Automatic Speech Recognition Open Access
Ryo MASUMURA Taichi ASAMI Takanobu OBA Hirokazu MASATAKI Sumitaka SAKAUCHI Satoshi TAKAHASHI

PAPER-Language modeling

Publicized: 2016/07/19
Page(s): 2462-2470
This paper aims to improve the domain robustness of language modeling for automatic speech recognition (ASR). To this end, we focus on applying the latent words language model (LWLM) to ASR. LWLMs are generative models whose structure is based on Bayesian soft class-based modeling with a vast latent variable space. Their flexible attributes help us efficiently realize the effects of smoothing and dimensionality reduction, and thus address the data sparseness problem. LWLMs constructed from limited domain data are therefore expected to robustly cover multiple unknown domains in ASR. However, this flexibility seriously increases computational complexity. Rigorously computing the generative probability of an observed word sequence requires considering a huge number of possible latent word assignments. Since this is computationally impractical, some approximation is inevitable for ASR implementation. To solve this problem and apply the approach to ASR, this paper presents an n-gram approximation of the LWLM. The n-gram approximation approximates the LWLM with a simple back-off n-gram structure and enables LWLM-based robust one-pass ASR decoding. Our experiments verify the effectiveness of the approach by evaluating perplexity and ASR performance on both in-domain and out-of-domain data sets.
Investigation of Using Continuous Representation of Various Linguistic Units in Neural Network Based Text-to-Speech Synthesis
Xin WANG Shinji TAKAKI Junichi YAMAGISHI

PAPER-Speech synthesis

Publicized: 2016/07/19
Page(s): 2471-2480
Building high-quality text-to-speech (TTS) systems is an important yet challenging research topic when expert knowledge of the target language and/or time-consuming manual annotation of speech and text data is unavailable. In this kind of TTS system, it is vital to find a representation of the input text that is both effective and easy to acquire. Recently, the continuous representation of raw word inputs, called “word embedding”, has been successfully used in various natural language processing tasks. It has also been used as an additional or alternative linguistic input feature for neural-network-based acoustic models in TTS systems. In this paper, we further investigate using this embedding technique to represent phonemes, syllables and phrases for acoustic models based on recurrent and feed-forward neural networks. Experimental results show that most of these continuous representations cannot significantly improve system performance when fed into the acoustic model. This holds whether they are used as an additional component or as a replacement for the conventional prosodic context. However, subjective evaluation shows one exception. The continuous representation of phrases achieves a significant improvement when it is combined with the prosodic context as input to the feed-forward acoustic model.
Statistical Bandwidth Extension for Speech Synthesis Based on Gaussian Mixture Model with Sub-Band Basis Spectrum Model
Yamato OHTANI Masatsune TAMURA Masahiro MORITA Masami AKAMINE

PAPER-Voice conversion

Publicized: 2016/07/19
Page(s): 2481-2489
This paper describes a novel statistical bandwidth extension (BWE) technique based on a Gaussian mixture model (GMM) and a sub-band basis spectrum model (SBM). In the SBM, each dimensional component represents a specific acoustic space in the frequency domain. The proposed method can perform BWE on speech data with an arbitrary frequency bandwidth, whereas conventional methods can only convert from fixed narrow-band data. The method proceeds in four steps:
1. A GMM is trained in advance on SBM parameters extracted from full-band spectra.
2. According to the bandwidth of the input signal, the trained GMM is reconstructed into a GMM of the joint probability density of the low-band and high-band SBM components.
3. High-band SBM components are estimated from the low-band SBM components of the input signal using the reconstructed GMM.
4. BWE is achieved by adding the spectra decoded from the estimated high-band SBM components to those of the input signal.

To construct the full-band signal from the narrow-band one, we apply this method to log-amplitude spectra and aperiodic components. Objective and subjective evaluation results show that the proposed method robustly extends the bandwidth of speech data for the log-amplitude spectra. Experimental results also indicate that the aperiodic component extracted from the upsampled narrow-band signal performs as well as both the restored and the full-band aperiodic components in the proposed method.
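The estimation step described above can be illustrated with the standard minimum mean-square error (MMSE) mapping for a joint GMM. This is a sketch under assumptions, not the authors' implementation. It assumes that:
- the observed low-band SBM components are the leading `n_low` dimensions of each full-band feature vector;
- "reconstructing" the GMM for a given bandwidth amounts to partitioning each component's mean and covariance into low-band and high-band blocks, which is valid because Gaussian marginals are sub-blocks;
- the high band is estimated as the posterior-weighted conditional mean.

The function name `gmm_bwe_estimate` is hypothetical.

```python
import numpy as np


def gmm_bwe_estimate(x_low, weights, means, covs, n_low):
    """MMSE estimate of high-band features from low-band ones under a joint GMM.

    weights: (M,) mixture weights; means: (M, D); covs: (M, D, D).
    Assumes (hypothetically) a GMM trained on full-band feature vectors whose
    first n_low dimensions are the low-band components.
    """
    x_low = np.asarray(x_low, dtype=float)
    weights = np.asarray(weights, dtype=float)
    means = np.asarray(means, dtype=float)
    covs = np.asarray(covs, dtype=float)

    log_resp = np.empty(len(weights))
    cond_means = []
    for m in range(len(weights)):
        # Partition the full-band Gaussian into low-band (x) and high-band (y) blocks;
        # the marginal over x is just the x-block of the mean and covariance.
        mu_x, mu_y = means[m, :n_low], means[m, n_low:]
        s_xx = covs[m, :n_low, :n_low]
        s_yx = covs[m, n_low:, :n_low]
        diff = x_low - mu_x
        sol = np.linalg.solve(s_xx, diff)  # Sigma_xx^{-1} (x - mu_x)
        _, logdet = np.linalg.slogdet(s_xx)
        # log w_m + log N(x; mu_x, Sigma_xx)
        log_resp[m] = np.log(weights[m]) - 0.5 * (
            diff @ sol + logdet + n_low * np.log(2.0 * np.pi)
        )
        # Conditional mean E[y | x, m] = mu_y + Sigma_yx Sigma_xx^{-1} (x - mu_x)
        cond_means.append(mu_y + s_yx @ sol)

    # Posterior p(m | x), normalized in a numerically stable way.
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    return resp @ np.array(cond_means)
```

For example, consider a single component whose two dimensions have unit variance and covariance 0.8. An observed low-band value of 1.0 then yields a high-band estimate of 0.8. Because `n_low` is a runtime argument, the same trained full-band GMM serves any input bandwidth. This mirrors, at a sketch level, the arbitrary-bandwidth property claimed above.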
A Statistical Sample-Based Approach to GMM-Based Voice Conversion Using Tied-Covariance Acoustic Models
Shinnosuke TAKAMICHI Tomoki TODA Graham NEUBIG Sakriani SAKTI Satoshi NAKAMURA

PAPER-Voice conversion

Publicized: 2016/07/19
Page(s): 2490-2498
This paper presents a novel statistical sample-based approach for Gaussian Mixture Model (GMM)-based Voice Conversion (VC). Although GMM-based VC offers promising flexibility for model adaptation, the quality of converted speech is significantly worse than that of natural speech. This paper addresses inaccurate modeling, which is one of the main causes of this quality degradation. We recently proposed statistical sample-based speech synthesis using rich context models for high-quality and flexible Hidden Markov Model (HMM)-based Text-To-Speech (TTS) synthesis. That method introduces ideas from unit selection synthesis to produce high-quality speech while preserving the flexibility of the original HMM-based TTS. In this paper, we apply this idea to GMM-based VC. Rich context models are first trained for individual joint speech feature vectors and then gathered mixture by mixture to form a Rich context-GMM (R-GMM). In conversion, speech parameters are first initialized using over-trained probability distributions. An iterative generation algorithm using R-GMMs then converts them. The proposed method utilizes individual speech features, yet its formulation is the same as that of conventional GMM-based VC. It can therefore produce high-quality speech while keeping the flexibility of the original GMM-based VC. The experimental results demonstrate that the proposed method yields significant improvements in terms of speech quality and speaker individuality in converted speech.
Policy Optimization for Spoken Dialog Management Using Genetic Algorithm
Hang REN Qingwei ZHAO Yonghong YAN

PAPER-Spoken dialog system

Publicized: 2016/07/19
Page(s): 2499-2507
Optimizing spoken dialog management policies is a non-trivial task because of erroneous inputs from the speech recognition and language understanding modules. The dialog manager sometimes needs to ground uncertain semantic information to fully understand the needs of human users and successfully complete the required dialog tasks. Approaches based on reinforcement learning are currently mainstream in academia and have proven effective, especially in noisy environments. However, in reinforcement learning the dialog strategy is often represented by a complex numerical model and is thus incomprehensible to humans. Dialog system designers find the trained policies very difficult to verify or modify, which greatly limits their deployment in commercial applications. In this paper, we propose a novel framework that uses a genetic algorithm to optimize dialog policies specified in a human-readable domain language. We present learning algorithms that use a user simulator and real human-machine dialog corpora. Experimental results show that the proposed approach achieves performance on par with some state-of-the-art reinforcement learning algorithms while maintaining a comprehensible policy structure.
Neural Network Approaches to Dialog Response Retrieval and Generation
Lasguido NIO Sakriani SAKTI Graham NEUBIG Koichiro YOSHINO Satoshi NAKAMURA

PAPER-Spoken dialog system

Pubricized:
2016/07/19
Page(s):
2508-2517
In this work, we propose a new statistical model for building robust dialog systems that use neural networks to either retrieve or generate dialog responses based on existing data sources. For the retrieval task, we propose an approach that uses paraphrase identification during retrieval. This is done by employing recursive autoencoders and dynamic pooling to determine whether two sentences of arbitrary length have the same meaning. For both the generation and retrieval tasks, we propose a model using long short-term memory (LSTM) neural networks: an LSTM encoder first reads the user's utterance into a continuous vector-space representation, and an LSTM decoder then generates the most probable word sequence. An evaluation based on objective and subjective metrics shows that, compared with standard example-based dialog baselines, the proposed approaches can better handle user inputs that are not well covered in the database.
Re-Ranking Approach of Spoken Term Detection Using Conditional Random Fields-Based Triphone Detection
Naoki SAWADA Hiromitsu NISHIZAKI

PAPER-Spoken term detection

Pubricized:
2016/07/19
Page(s):
2518-2527
This study proposes a two-pass spoken term detection (STD) method. The first pass uses phoneme-based dynamic time warping (DTW) STD, and the second pass recomputes the detection scores produced by the first pass using conditional random field (CRF)-based triphone detectors. In the second pass, we treat STD as a sequence labeling problem. We use CRF-based triphone detection models built on features generated from multiple types of phoneme-based transcriptions. The models learn recognition error patterns, such as phoneme-to-phoneme confusions, within the CRF framework. Consequently, the models can detect each triphone comprising a query term together with a detection probability. In experimental evaluations on two test collections, the CRF-based approach worked well for re-ranking the DTW-based detections, yielding absolute F-measure improvements of 2.1% and 2.0% on the two collections, respectively.
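For readers unfamiliar with the first pass, the sketch below shows phoneme-level subsequence DTW in Python: it finds the stretch of a phoneme transcription that aligns with a query at minimum cost, with free start and end points. The unit substitution, insertion, and deletion costs are assumptions for illustration only; the abstract does not specify the local distance the authors use, and the function name `subsequence_dtw` is hypothetical.

```python
def subsequence_dtw(query, doc, sub_cost=None):
    """Align `query` (a phoneme list) against any contiguous segment of `doc`.

    Returns (cost, (start, end)) where doc[start:end] is the best-matching
    segment. Insertions and deletions cost 1; substitutions use `sub_cost`.
    """
    if sub_cost is None:
        sub_cost = lambda a, b: 0.0 if a == b else 1.0  # assumed 0/1 phoneme distance
    n, m = len(query), len(doc)
    inf = float("inf")
    # D[i][j]: best cost of aligning query[:i] with a doc segment ending at j
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    start = [[0] * (m + 1) for _ in range(n + 1)]
    for j in range(m + 1):
        D[0][j], start[0][j] = 0.0, j  # the match may start anywhere in doc
    for i in range(1, n + 1):
        D[i][0] = D[i - 1][0] + 1.0    # query phonemes with no doc counterpart
        for j in range(1, m + 1):
            D[i][j], start[i][j] = min(
                (D[i - 1][j - 1] + sub_cost(query[i - 1], doc[j - 1]), start[i - 1][j - 1]),
                (D[i - 1][j] + 1.0, start[i - 1][j]),  # query phoneme deleted
                (D[i][j - 1] + 1.0, start[i][j - 1]),  # extra phoneme in doc
            )
    end = min(range(m + 1), key=lambda j: D[n][j])  # the match may end anywhere
    return D[n][end], (start[n][end], end)
```

For example, `subsequence_dtw(["k", "a", "t"], ["s", "u", "k", "a", "t", "o"])` locates the exact match at `doc[2:5]` with cost 0. In a two-pass system such as the one described above, the candidate segments and their costs from this kind of pass are what a second-stage model would re-score.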
Spoken Term Detection Using SVM-Based Classifier Trained with Pre-Indexed Keywords
Kentaro DOMOTO Takehito UTSURO Naoki SAWADA Hiromitsu NISHIZAKI

PAPER-Spoken term detection

Pubricized:
2016/07/19
Page(s):
2528-2538
This study presents a two-stage spoken term detection (STD) method that uses the same STD engine twice, together with a support vector machine (SVM)-based classifier that verifies the terms detected by the STD engine. In a front-end process, the STD engine is used to pre-index the target spoken documents with a keyword list built from an automatic speech recognition result. The STD result includes a set of keywords and their detection intervals (positions) in the spoken documents. For keywords with competing intervals, we rank them by STD matching cost and select the one with the longest duration among the competing detections. The selected keywords are registered in the pre-index and are then used to train an SVM-based classifier. In the query term search process, a query term is searched for by the same STD engine, and the output candidates are verified by the SVM-based classifier. Our proposed two-stage STD method with pre-indexing was evaluated on the NTCIR-10 SpokenDoc-2 STD task and drastically outperformed the traditional STD method based on dynamic time warping and a confusion network-based index.
Acoustic Scene Analysis Based on Hierarchical Generative Model of Acoustic Event Sequence
Keisuke IMOTO Suehiro SHIMAUCHI

PAPER-Acoustic event detection

Pubricized:
2016/07/19
Page(s):
2539-2549
We propose a novel method for estimating acoustic scenes, such as user activities (e.g., “cooking,” “vacuuming,” “watching TV”) or situations (e.g., “being on the bus,” “being in a park,” “meeting”), utilizing information about acoustic events. Some existing methods estimate acoustic scenes by associating a combination of acoustic events with an acoustic scene. However, because they associate acoustic events directly with acoustic scenes, these methods cannot adequately express acoustic scenes, e.g., “cooking,” that have more than one subordinate category, e.g., “frying ingredients” or “plating food.” In this paper, we propose an acoustic scene estimation method based on a hierarchical probabilistic generative model of acoustic event sequences that takes into account the relations among acoustic scenes, their subordinate categories, and acoustic event sequences. In the proposed model, each acoustic scene is represented as a probability distribution over its unsupervised subordinate categories, called “acoustic sub-topics,” and each acoustic sub-topic is represented as a probability distribution over acoustic events. Acoustic scene estimation experiments with real-life sounds showed that the proposed method could correctly extract subordinate categories of acoustic scenes.
Improved End-to-End Speech Recognition Using Adaptive Per-Dimensional Learning Rate Methods
Xuyang WANG Pengyuan ZHANG Qingwei ZHAO Jielin PAN Yonghong YAN

LETTER-Acoustic modeling

Pubricized:
2016/07/19
Page(s):
2550-2553
The introduction of deep neural networks (DNNs) has led to significant improvements in automatic speech recognition (ASR) performance. However, the overall ASR system remains complicated because of its dependence on the hidden Markov model (HMM). Recently, a new end-to-end ASR framework has been proposed that uses recurrent neural networks (RNNs) to directly model context-independent targets with the connectionist temporal classification (CTC) objective function; it achieves results comparable to those of the hybrid HMM/DNN system. In this paper, we investigate per-dimensional learning rate methods, including ADAGRAD and ADADELTA, to improve the recognition performance of the end-to-end system, motivated by the fact that the blank symbol used in CTC dominates the output and these methods give frequent features small learning rates. Experimental results show that ADADELTA achieves a relative word error rate (WER) reduction of more than 4% and an absolute improvement of 5% in label accuracy on the training set, while requiring fewer training epochs.
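A minimal sketch of the two per-dimensional update rules named above, in plain Python for a flat parameter list. It follows the standard ADAGRAD and ADADELTA update equations; the hyperparameters (`lr`, `rho`, `eps`) are common defaults chosen here as assumptions, not the settings used in the letter, and the class and function names are illustrative.

```python
import math

class Adadelta:
    """ADADELTA: per-dimension step dx = -(RMS[dx] / RMS[g]) * g."""

    def __init__(self, dim, rho=0.95, eps=1e-6):
        self.rho, self.eps = rho, eps
        self.avg_sq_grad = [0.0] * dim   # running average E[g^2]
        self.avg_sq_delta = [0.0] * dim  # running average E[dx^2]

    def step(self, params, grads):
        out = []
        for k, (x, g) in enumerate(zip(params, grads)):
            self.avg_sq_grad[k] = self.rho * self.avg_sq_grad[k] + (1 - self.rho) * g * g
            dx = -(math.sqrt(self.avg_sq_delta[k] + self.eps)
                   / math.sqrt(self.avg_sq_grad[k] + self.eps)) * g
            self.avg_sq_delta[k] = self.rho * self.avg_sq_delta[k] + (1 - self.rho) * dx * dx
            out.append(x + dx)
        return out


def adagrad_step(params, grads, hist, lr=0.01, eps=1e-8):
    """ADAGRAD: divide each dimension's learning rate by the root of its
    accumulated squared gradients. `hist` is updated in place."""
    out = []
    for k, (x, g) in enumerate(zip(params, grads)):
        hist[k] += g * g
        out.append(x - lr * g / (math.sqrt(hist[k]) + eps))
    return out
```

With `Adadelta(2).step([1.0, -2.0], [2.0, -4.0])`, both coordinates move toward zero by almost exactly the same amount (about sqrt(eps / (1 - rho))) even though their gradients differ by a factor of two. This illustrates how per-dimension scaling shrinks the effective learning rate of dimensions with large or frequent gradients.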
Multi-Task Learning in Deep Neural Networks for Mandarin-English Code-Mixing Speech Recognition
Mengzhe CHEN Jielin PAN Qingwei ZHAO Yonghong YAN

LETTER-Acoustic modeling

Pubricized:
2016/07/19
Page(s):
2554-2557
Multi-task learning in deep neural networks has proven effective for acoustic modeling in speech recognition. In this paper, the technique is applied to Mandarin-English code-mixing recognition. For the primary task of senone classification, three schemes of auxiliary tasks are proposed to introduce language information into the networks and improve the prediction of language switching. On a real-world Mandarin-English test corpus from mobile voice search, the proposed schemes improved recognition in both languages and reduced the overall error rate by 3.5%, 3.8%, and 5.8% relative, respectively.
Speeding up Deep Neural Networks in Speech Recognition with Piecewise Quantized Sigmoidal Activation Function
Anhao XING Qingwei ZHAO Yonghong YAN

LETTER-Acoustic modeling

Pubricized:
2016/07/19
Page(s):
2558-2561
This paper proposes a new quantization framework for the activation function of deep neural networks (DNNs). We implement a fixed-point DNN by quantizing the activations into powers of two. The costly multiplications in DNN computation can then be replaced with low-cost bit shifts, greatly reducing computation and making DNN-based speech recognition much easier to deploy on embedded systems. Experiments show that the proposed method causes no performance degradation.
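The following Python sketch illustrates why power-of-two activations remove multipliers: if an activation is rounded to 2**-k, multiplying a fixed-point weight by it reduces to a right shift by k bits. The log-domain rounding rule, the Q8 weight format, and the cutoff below which an activation is treated as zero are assumptions for illustration; the letter's actual piecewise quantization levels are not given in the abstract, and the function names are hypothetical.

```python
import math

def quantize_pow2(a, max_shift=7):
    """Round an activation a in [0, 1] to 2**-k and return the shift k.
    Returns None when a is so small that it is treated as zero."""
    if a <= 2.0 ** -(max_shift + 0.5):
        return None
    k = round(-math.log2(a))          # nearest power of two in the log domain
    return min(max(k, 0), max_shift)  # clamp to the supported shift range

def shift_dot(weights_q, activations, max_shift=7):
    """Fixed-point dot product. weights_q are integers in Q8 format
    (real value = w / 256); each product w * 2**-k becomes w >> k."""
    acc = 0
    for w, a in zip(weights_q, activations):
        k = quantize_pow2(a, max_shift)
        if k is not None:
            acc += w >> k  # arithmetic shift: floors for negative w
    return acc
```

For instance, `shift_dot([256, -128, 64], [1.0, 0.5, 0.3])` returns 208, i.e. 208 / 256 = 0.8125, which equals the floating-point result 1.0 * 1.0 - 0.5 * 0.5 + 0.25 * 0.25 once 0.3 has been rounded to 0.25. The price is the quantization error on each activation, which is the trade-off the letter evaluates.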
Short Text Classification Based on Distributional Representations of Words
Chenglong MA Qingwei ZHAO Jielin PAN Yonghong YAN

LETTER-Text classification

Pubricized:
2016/07/19
Page(s):
2562-2565
Short texts usually suffer from data sparseness because they do not provide sufficient term co-occurrence information. In this paper, we show how to mitigate this problem in short text classification through word embeddings. We assume that a short text document is a specific sample of one distribution in a Gaussian-Bayesian framework. Furthermore, a fast clustering algorithm is used to expand and enrich the context of short texts in the embedding space. This approach is compared with classical bag-of-words approaches and neural-network-based methods. Experimental results validate the effectiveness of the proposed method.

Regular Section

HISTORY: An Efficient and Robust Algorithm for Noisy 1-Bit Compressed Sensing
Biao SUN Hui FENG Xinxin XU

PAPER-Fundamentals of Information Systems

Pubricized:
2016/07/06
Page(s):
2566-2573
We consider the problem of sparse signal recovery from 1-bit measurements. Due to noise in the acquisition and transmission process, some quantized bits may be flipped to their opposite states. These sign flips may result in severe performance degradation. In this study, a novel algorithm, termed HISTORY, is proposed. It consists of Hamming support detection and coefficient recovery. The HISTORY algorithm has high recovery accuracy and is robust to strong measurement noise. Numerical results are provided to demonstrate the effectiveness and superiority of the proposed algorithm.
A Linear Time Algorithm for Finding a Spanning Tree with Non-Terminal Set V_NT on Cographs
Shin-ichi NAKAYAMA Shigeru MASUYAMA

PAPER-Fundamentals of Information Systems

Pubricized:
2016/07/12
Page(s):
2574-2584
Given a graph G=(V,E), where V and E are the vertex set and the edge set, respectively, and a subset V_NT of vertices called the non-terminal set, a spanning tree with non-terminal set V_NT is a connected, acyclic spanning subgraph of G that contains all the vertices of V and in which no vertex of the non-terminal set is a leaf. When each edge has a nonnegative integer weight, the problem of finding a minimum spanning tree with non-terminal set V_NT of G was known to be NP-hard. However, the complexity of finding such a spanning tree on general graphs in which every edge has weight one was unknown. In this paper, we consider this problem and first show that it is NP-hard on general graphs even when every edge has weight one. We also show that, if G is a cograph, a spanning tree with non-terminal set V_NT of G can be found in linear time when every edge has weight one.
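The definition above can be made concrete with a small checker. Note that with unit edge weights every spanning tree of an n-vertex graph has weight n - 1, so the problem reduces to deciding whether a spanning tree with non-terminal set V_NT exists at all. The Python sketch below only verifies a candidate tree against the definition; it is not the authors' linear-time cograph algorithm, the function name is hypothetical, and vertices are assumed to be numbered 0..n-1.

```python
def is_spanning_tree_with_nt(n, graph_edges, tree_edges, non_terminals):
    """Check that tree_edges is a spanning tree of the graph on vertices
    0..n-1 in which no vertex of non_terminals is a leaf."""
    g = {frozenset(e) for e in graph_edges}
    if len(tree_edges) != n - 1:
        return False
    parent = list(range(n))

    def find(v):  # union-find root lookup with path halving
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    degree = [0] * n
    for u, v in tree_edges:
        if frozenset((u, v)) not in g:
            return False  # tree edge must belong to G
        ru, rv = find(u), find(v)
        if ru == rv:
            return False  # edge would close a cycle
        parent[ru] = rv
        degree[u] += 1
        degree[v] += 1
    # n - 1 edges and no cycle on n vertices means a connected spanning tree
    return all(degree[v] >= 2 for v in non_terminals)
```

For a 4-cycle 0-1-2-3 with chord (0, 2) and V_NT = {0, 2}, the path 1-0-2-3 qualifies, whereas the path 0-1-2-3 does not, because vertex 0 is a leaf there.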
Competitive Strategies for Evacuating from an Unknown Affected Area
Qi WEI Xuehou TAN Bo JIANG

PAPER-Fundamentals of Information Systems

Pubricized:
2016/06/22
Page(s):
2585-2590
This article presents efficient strategies for evacuating from an unknown affected area in the plane. Evacuation is the process of moving away from a threat or hazard such as a natural disaster. Suppose that one agent or n (n ≥ 3) agents are lost in an unknown convex region P. The agents know neither the boundary of P nor their own positions. We seek competitive strategies that evacuate the agents from P as quickly as possible. The performance of a strategy is measured by its competitive ratio, i.e., the ratio of the length of the evacuation path to that of the shortest path. We give a 13.812-competitive spiral strategy for one agent and prove that it is optimal among all monotone and periodic strategies by showing a matching lower bound. We also give a new competitive strategy, EES, for n (n ≥ 3) agents, and refine it to be more efficient, with an analysis of its performance.
Reliability-Enhanced ECC-Based Memory Architecture Using In-Field Self-Repair
Gian MAYUGA Yuta YAMATO Tomokazu YONEDA Yasuo SATO Michiko INOUE

PAPER-Dependable Computing

Pubricized:
2016/06/27
Page(s):
2591-2599
Embedded memory is used extensively in SoCs and is rapidly growing in size and density. It enables SoCs to offer more features, but at the cost of occupying the largest share of the chip area. With the continued scaling of nanoscale device technology, large memories are prone to aging-induced faults and soft errors, which degrade reliability. In-field test and repair, as well as ECC, can be used to maintain reliability; recently, these methods have been combined so that uncorrectable words are repaired while correctable words are left to the ECC. In this paper, we propose a novel in-field repair strategy for an ECC-based memory architecture that repairs uncorrectable words and, possibly, correctable words. It executes an adaptive reconfiguration method that ensures 'fresh' memory words are always used until the spare words run out. Experimental results demonstrate that our strategy enhances reliability with a small area overhead.
Shilling Attack Detection in Recommender Systems via Selecting Patterns Analysis
Wentao LI Min GAO Hua LI Jun ZENG Qingyu XIONG Sachio HIROKAWA

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2016/06/27
Page(s):
2600-2611
Collaborative filtering (CF) has been widely used in recommender systems to generate personalized recommendations. However, recommender systems using CF are vulnerable to shilling attacks, in which attackers inject fake profiles to manipulate recommendation results. Thus, shilling attacks pose a threat to the credibility of recommender systems. Previous studies mainly derive features from characteristics of item ratings in user profiles to detect attackers, but these methods suffer from low accuracy when attackers adopt new rating patterns. To overcome this drawback, we derive features from properties of item popularity in user profiles, which are determined by users' different selecting patterns. This feature extraction method is based on the prior knowledge that attackers select items to rate according to man-made rules, whereas normal users select them according to their inner preferences. Machine learning classification approaches then use these features to detect and remove attackers. Experimental results on the MovieLens dataset and the Amazon review dataset show that our proposed method improves detection performance. In addition, the results justify the practical value of features derived from selecting patterns.
Latent Attribute Inference of Users in Social Media with Very Small Labeled Dataset
Ding XIAO Rui WANG Lingling WU

PAPER-Artificial Intelligence, Data Mining

Pubricized:
2016/07/20
Page(s):
2612-2618
With the surge of social media platforms, users' profile information has become a valuable resource for enhancing social network services. However, the attribute information of most users is incomplete, so it is important to infer users' latent attributes. Contemporary attribute inference methods rest on the basic assumption that there is enough labeled data to train a model. In social media, however, it is very expensive and difficult to label a large amount of data. In this paper, we study the latent attribute inference problem with a very small labeled dataset and propose the SRW-COND solution. To overcome the scarcity of labeled data, SRW-COND first extends the labeled data with a simple but effective greedy algorithm. SRW-COND then employs a supervised random walk process to effectively utilize the known attribute information and the link structure of users. Experiments on two real datasets illustrate the effectiveness of SRW-COND.
Semi-Incremental Recognition of On-Line Handwritten Japanese Text
Cuong-Tuan NGUYEN Bilan ZHU Masaki NAKAGAWA

PAPER-Image Recognition, Computer Vision

Pubricized:
2016/07/19
Page(s):
2619-2628
This paper presents a semi-incremental recognition method for on-line handwritten Japanese text and its evaluation. As text becomes longer, recognition time and waiting time grow if the text is recognized only after it is written (batch recognition). Incremental methods have therefore been proposed that trigger recognition at every stroke, but they lower recognition rates and incur more CPU time. We propose semi-incremental recognition, which employs a local processing strategy that focuses on a recent sequence of strokes, defined as a “scope,” rather than on every new stroke. For the latest scope, we build and update a segmentation and recognition candidate lattice and advance the best-path search incrementally. We use the result of the best-path search in the previous scope to exclude unnecessary segmentation candidates, which reduces the number of candidate characters to be recognized and thus the processing time. We also reuse the segmentation and recognition candidate lattice of the previous scope for the latest scope. Moreover, triggering recognition every several strokes saves CPU time. Experiments on the TUAT-Kondate database show the effectiveness of the proposed semi-incremental recognition method not only in reducing processing time and waiting time, but also in recognition accuracy.
LRU-LC: Fast Estimating Cardinality of Flows over Sliding Windows
Jingsong SHAN Jianxin LUO Guiqiang NI Yinjin FU Zhaofeng WU

LETTER-Fundamentals of Information Systems

Pubricized:
2016/06/29
Page(s):
2629-2632
Estimating the cardinality of flows over sliding windows on high-speed links remains challenging under time and space constraints. To solve this problem, we present a novel data structure that maintains a summary of the data, and we propose a constant-time update algorithm for quickly evicting expired information. Moreover, we give a further memory-reducing scheme at the cost of a very small loss of accuracy.
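As a rough illustration of how a least-recently-used ordering can make eviction cheap, the Python sketch below combines a linear-counting bitmap with an `OrderedDict` that keeps bits in order of last update, so expired bits always sit at the head. This pairing is only our reading of the name LRU-LC; the abstract does not describe the data structure, and the class name, hash choice, and parameters are hypothetical. Eviction here is amortized constant time per update, and timestamps are assumed to be non-decreasing.

```python
import hashlib
import math
from collections import OrderedDict

class SlidingLinearCounter:
    """Linear counting over a sliding window (hypothetical sketch)."""

    def __init__(self, m, window):
        self.m, self.window = m, window
        self.live = OrderedDict()  # bit index -> last set time, oldest first

    def _bit(self, item):
        h = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        return int.from_bytes(h, "big") % self.m

    def _expire(self, now):
        # Pop bits from the head while they fall outside the window.
        while self.live:
            _, ts = next(iter(self.live.items()))
            if ts > now - self.window:
                break
            self.live.popitem(last=False)

    def add(self, item, t):
        b = self._bit(item)
        self.live[b] = t
        self.live.move_to_end(b)  # most recently touched bit goes to the tail
        self._expire(t)

    def estimate(self, now):
        self._expire(now)
        zeros = self.m - len(self.live)
        if zeros == 0:
            return float("inf")  # bitmap saturated: m is too small
        return -self.m * math.log(zeros / self.m)  # linear counting estimator
```

Adding one distinct flow per time step for 100 steps with `window=10` leaves only the last ten flows live, so `estimate(99)` returns roughly 10. A memory-reducing variant, such as the one the letter mentions, would have to shrink the per-bit timestamps, which this sketch does not attempt.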
LAB-LRU: A Life-Aware Buffer Management Algorithm for NAND Flash Memory
Liyu WANG Lan CHEN Xiaoran HAO

LETTER-Computer System

Pubricized:
2016/06/21
Page(s):
2633-2637
NAND flash memory has been widely used in storage systems. Aiming to design an efficient buffer policy for NAND flash memory, we propose a life-aware buffer management algorithm named LAB-LRU, which manages the buffer with three LRU lists. A life value is defined for every page, and active pages with higher life values can stay longer in the buffer. The definition of the life value considers the effects of access frequency, recency, and the cost of flash read and write operations. A series of trace-driven simulations show that the proposed LAB-LRU algorithm significantly outperforms the previous best-known algorithms in terms of buffer hit ratio, the numbers of flash write and read operations, and overall runtime.
A Secure Light-Weight Public Auditing Scheme in Cloud Computing with Potentially Malicious Third Party Auditor
Yilun WU Xinye LIN Xicheng LU Jinshu SU Peixin CHEN

LETTER-Information Network

Pubricized:
2016/06/23
Page(s):
2638-2642
Public auditing is a new technique for protecting the integrity of outsourced data in the remote cloud. Users delegate auditing to a third party auditor (TPA) and assume that every result from the TPA is correct. In reality, however, the TPA is not always trustworthy. In this paper, we consider a scenario in which the TPA may damage the reputation of the cloud server by cheating users, and propose a novel public auditing scheme to address this security issue. Our analysis and evaluation show that the scheme is both secure and efficient.
Mining Spatial Temporal Saliency Structure for Action Recognition
Yinan LIU Qingbo WU Linfeng XU Bo WU

LETTER-Pattern Recognition

Pubricized:
2016/07/06
Page(s):
2643-2646
Traditional action recognition approaches use pre-defined rigid areas, such as spatial pyramids and cuboids, to process space-time information. However, most action categories occur in an unconstrained manner; that is, the same action can occur at different places in different videos. A better video representation is therefore needed to deal with space-time variations. In this paper, we introduce the idea of mining spatial temporal saliency. To better handle the uniqueness of each video, we use a space-time over-segmentation approach such as supervoxels. We choose three different saliency measures that take into account not only appearance cues but also motion cues. Furthermore, we design a category-specific mining process to find the discriminative power in each action category. Experiments on action recognition datasets such as UCF11 and HMDB51 show that the proposed spatial temporal saliency video representation can match or surpass some state-of-the-art alternatives in action recognition.
Transfer Semi-Supervised Non-Negative Matrix Factorization for Speech Emotion Recognition
Peng SONG Shifeng OU Xinran ZHANG Yun JIN Wenming ZHENG Jinglei LIU Yanwei YU

LETTER-Speech and Hearing

Pubricized:
2016/07/01
Page(s):
2647-2650
In practice, emotional speech utterances are often collected with different devices or under different conditions, which leads to a discrepancy between training and testing data and results in a sharp decrease in recognition rates. To solve this problem, this letter presents a novel transfer semi-supervised non-negative matrix factorization (TSNMF) method. A semi-supervised non-negative matrix factorization (SNMF) algorithm, utilizing both labeled source data and unlabeled target data, is adopted to learn common feature representations. Meanwhile, the maximum mean discrepancy (MMD) is employed as a similarity measure to reduce the distance between the feature distributions of the two databases. Finally, the TSNMF algorithm, which optimizes the SNMF and MMD objectives jointly, is proposed to obtain robust feature representations across databases. Extensive experiments demonstrate that, compared with state-of-the-art approaches, the proposed method can significantly improve cross-corpus recognition rates.
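For reference, the biased empirical estimate of the squared MMD between samples X and Y with kernel k is the mean of k over X×X, plus the mean over Y×Y, minus twice the mean over X×Y. The Python sketch below computes it with a Gaussian kernel. The kernel choice and bandwidth `sigma` are assumptions, since the abstract does not say which kernel TSNMF uses; in the actual method the samples would be learned feature representations rather than raw vectors.

```python
import math

def gaussian_kernel(a, b, sigma=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * sigma^2))."""
    d2 = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-d2 / (2 * sigma ** 2))

def mmd2(X, Y, sigma=1.0):
    """Biased empirical estimate of the squared MMD between sample sets X and Y."""
    kxx = sum(gaussian_kernel(a, b, sigma) for a in X for b in X) / len(X) ** 2
    kyy = sum(gaussian_kernel(a, b, sigma) for a in Y for b in Y) / len(Y) ** 2
    kxy = sum(gaussian_kernel(a, b, sigma) for a in X for b in Y) / (len(X) * len(Y))
    return kxx + kyy - 2 * kxy
```

The value is zero when the two sample sets coincide and grows as the distributions move apart, which is why minimizing it alongside the factorization objective pulls source and target features toward a common distribution.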
Fast Coding-Mode Selection and CU-Depth Prediction Algorithm Based on Text-Block Recognition for Screen Content Coding
Mengmeng ZHANG Ang ZHU Zhi LIU

LETTER-Image Processing and Video Processing

Pubricized:
2016/07/12
Page(s):
2651-2655
As an important extension of high-efficiency video coding (HEVC), screen content coding (SCC) includes several new coding modes, such as Intra Block Copy (IBC), palette-based coding (Palette), and Adaptive Color Transform (ACT). These new tools have improved screen content coding performance. This paper proposes a novel fast algorithm that classifies coding units (CUs) as text CUs or non-text CUs. For text CUs, the Intra mode is skipped in the compression process, whereas for non-text CUs, the IBC mode is skipped. The depth range of the current CU is then predicted from the depth level of its left neighboring CU. Compared with the reference software HM16.7+SCM5.4, the proposed algorithm reduces encoding time by 23% on average, with an approximately 0.44% increase in Bjøntegaard delta bit rate and a negligible peak signal-to-noise ratio loss.
Local Multi-Coordinate System for Object Retrieval
Go IRIE Yukito WATANABE Takayuki KUROZUMI Tetsuya KINEBUCHI

LETTER-Image Processing and Video Processing

Pubricized:
2016/07/06
Page(s):
2656-2660
Encoding multiple SIFT descriptors into a single vector is a key technique for efficient object image retrieval. In this paper, we propose an extension of local coordinate system (LCS) for image representation. The previous LCS approaches encode each SIFT descriptor by a single local coordinate, which is not adequate for localizing its position in the descriptor space. Instead, we use multiple local coordinates to represent each descriptor with PCA-based decorrelation. Experiments show that this simple modification can improve retrieval performance significantly.
Illumination-Invariant Face Representation via Normalized Structural Information
Wonjun KIM

LETTER-Image Processing and Video Processing

Pubricized:
2016/06/21
Page(s):
2661-2663
A novel method for illumination-invariant face representation is presented based on the orthogonal decomposition of the local image structure. One important advantage of the proposed method is that image gradients and the corresponding intensity values are used simultaneously in our decomposition procedure, preserving the original texture while yielding an illumination-invariant feature space. Experimental results demonstrate that the proposed method is effective for face recognition and verification even under diverse lighting conditions.
Robust and Adaptive Object Tracking via Correspondence Clustering
Bo WU Yurui XIE Wang LUO

LETTER-Image Recognition, Computer Vision

Pubricized:
2016/06/23
Page(s):
2664-2667
We propose a new visual tracking method in which the target appearance is represented by combining color distribution and keypoints. First, the object is localized via a keypoint-based tracking and matching strategy, in which a new clustering method is used to remove outliers. Second, the tracking confidence is evaluated with the color template. Depending on the tracking confidence, local and global keypoint matching is performed adaptively. Finally, we propose a target appearance update method in which new appearances can be learned and added to the target model. The proposed tracker is compared with five state-of-the-art tracking methods on a recent benchmark dataset. Both qualitative and quantitative evaluations show that our method performs favorably.
Robust Hybrid Finger Pattern Identification Using Intersection Enhanced Gabor Based Direction Coding
Wenming YANG Wenyang JI Fei ZHOU Qingmin LIAO

LETTER-Image Recognition, Computer Vision

Pubricized:
2016/07/06
Page(s):
2668-2671
Automated biometric identification using finger vein images has attracted increasing interest among researchers, with emerging applications in human biometrics. The traditional feature-level fusion strategy is limited and expensive. To solve this problem, this paper investigates the possible use of infrared hybrid finger patterns on the back of the finger, which contain both finger-vein and finger-dorsal texture information in a single original image, and a database of the proposed hybrid patterns is established. Accordingly, an Intersection enhanced Gabor based Direction Coding (IGDC) method is proposed. Experiments achieve a recognition rate of 98.4127% and an equal error rate of 0.00819 on our newly established database, which is fairly competitive.