Uncovering the World of Sound, from Human Language to Noh Singing


Katunobu Itou, Professor,

Department of Digital Media, Faculty of Computer and Information Sciences

Posted Jan. 27, 2020


Professor Katunobu Itou has played a pioneering role in research on Japanese speech recognition systems and other forms of language media. He continues to push forward boldly, expanding his research horizons into such fields as the digital analysis of singing in Noh theater.

A pioneer in spoken language research, taking on research challenges that others avoid

My main research field is computer processing of sound and language media.

My starting point was research on the processing of spoken language (language used by humans to communicate with one another). Ever since my student days, I’ve been deeply involved in the development of speech recognition technologies and spoken dialog systems, which use computer-based analysis of the language produced in human conversation.

Developing cutting-edge technology is always a struggle. Initially, there were many limitations on the research environment, such as the use of supercomputers. Even when I could think up an effective algorithm, I couldn’t conduct experiments to verify it, because there was no space to store the speech data for analysis, and no computers capable of high-speed processing. Research on music required especially large volumes of data, and was considered extremely difficult to achieve, even if you had the will to do so.

Over time, processing speeds improved, data handling capacity increased, and networks became dramatically faster. Great advances were made toward the development of practical applications in speech recognition research. Speech data can now be accumulated on servers, and data for analysis can be exchanged instantaneously across the network. In my student days I never thought that this kind of research environment would be possible, and I continue to be surprised at the progress we’ve made.

I find the process of generating new frameworks in unexplored areas to be very significant, and this guides my approach to research.

Research in areas which people rarely tackle is an exercise in repeated trial and error, but it can be pursued with free thinking. I originally got into speech recognition technology precisely because it was an area that few researchers were working in.

Now, my research has extended to various types of sound, including music. In my lab we’re engaged in a number of unique research projects conceived by students, including speech conversion for voice actors in animation films, and a system for automatic conversion of orchestral scores for small ensembles.

Utilizing Hosei’s research diversity in analysis of Noh recitation

A new research initiative I’ve been pursuing in recent years is the digital analysis of singing (utai) in Noh theater, enlisting the help of The Nogami Memorial Noh Theatre Research Institute of Hosei University (Noh Theater Research Institute).

To analyze sound, you need to record the original sound and convert it to data by uploading to a computer. Sound itself is invisible, but sound data can be rendered visually in speech waveform graphs and the like. You can extract specific elements of the sound and perform many other types of analysis.
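As a rough illustration of turning sound into data and extracting one element from it, the following Python sketch synthesizes a tone in place of a real recording, stores it as 16-bit integer samples, and computes the loudness (RMS level) of successive frames. The sample rate, the frame size, and the function names `synthesize_tone` and `frame_rms` are assumptions chosen for this example, not details of the lab's actual tools.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for illustration)

def synthesize_tone(freq_hz, duration_s, amplitude=0.5):
    """Return a sine tone as a list of 16-bit integer samples."""
    n = int(SAMPLE_RATE * duration_s)
    return [
        int(round(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)))
        for i in range(n)
    ]

def frame_rms(samples, frame_size):
    """Split samples into non-overlapping frames and return each frame's RMS level."""
    levels = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        levels.append(math.sqrt(sum(s * s for s in frame) / frame_size))
    return levels

# A loud tone followed by a quieter one stands in for a recorded sound.
signal = synthesize_tone(200, 0.1, amplitude=0.8) + synthesize_tone(200, 0.1, amplitude=0.2)
levels = frame_rms(signal, frame_size=400)

# Print a crude text "graph" of loudness over time, one row per frame.
for index, level in enumerate(levels):
    print(index, "#" * int(level / 1000))
```

Plotting `signal` directly would give the waveform graph mentioned above. The per-frame levels are one simple example of an element extracted from the sound.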

The singing in Noh theater includes distinctive intonation that can’t be conveyed simply by writing out the melody in a musical score. What’s more, performance of the same piece of music can vary among the different schools of Noh in terms of the use of vibrato, emphasis, accents, and other nuances. The music can be rather impenetrable for unfamiliar listeners.

My research involves approximating the melody line and creating a visual rendering of the kinds of expressions that are added to it (see illustration). I am hoping that this kind of approach will prompt more objective understanding of this traditional performing art and furnish a basis for the advancement of Noh research.

Hosei University has a total of 15 undergraduate faculties and graduate schools, as well as many research facilities. But I always think it’s a shame that there are so few opportunities to interact across these different parts of the university in our research and study activities. Today, IT (information technology) is something that we all use, and I sense that if we work together on it we could achieve more meaningful research and learning.

I believe that the connections with the Noh Theater Research Institute cultivated through my research are a first step toward the development of more horizontal relationships. I want this to be a point of departure for reaching out in many different directions.

Fostering technologists for Japan’s future

There are many workplaces where you can utilize knowledge of computer science. Relationships with electronics manufacturers are especially strong, and students who studied technology at university have gone on to contribute greatly to the advancement of industry.

In recent years, however, it seems that electronics manufacturers are struggling to procure new talent. I’m concerned that at this rate there won’t be enough technologists to drive our society forward. Fostering high-caliber technologists is an urgent priority.

Looking at my own students, I’m sometimes frustrated at their hesitant attitude, but also believe that they do have the potential to succeed if they really set their minds to achieving a goal. I want to give them a supportive push forward so they can become active players in society.

Seeking to contribute to human and social progress by unravelling the mysteries of sound

As a graduate student, at the encouragement of my supervising professor, I went to work as an intern at ATR (the Advanced Telecommunications Research Institute International). There I was able to pursue research on the foundations of speech recognition technology.

Developing speech recognition systems for Japanese is considered difficult because the language does not have a fixed word order as English does, and the boundaries between individual words are unclear. Fortunately, ATR had speech data of articles from the Nihon Keizai Shimbun newspaper read aloud word by word. Using this data set and methods of statistical analysis, I created a Japanese-language version of a language model (N-gram model), which estimates the probabilities of different word combinations appearing in well-formed Japanese sentences. Being able to pursue such cutting-edge research while still a student was a great motivation for me. Ultimately this model achieved over 90% accuracy and became the mainstream approach in subsequent research on speech recognition technology. I felt that all my hard work had paid off.
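To make the idea of an N-gram model concrete, here is a minimal Python sketch of a bigram model (N = 2) with add-one smoothing. It is not the model built at ATR. The three-sentence corpus of romanized, pre-segmented words is invented for illustration, and the function names are assumptions. Pre-segmenting the words sidesteps the word-boundary problem described above.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and word pairs over tokenized sentences with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded[:-1])             # every token that has a successor
        bigrams.update(zip(padded, padded[1:]))  # adjacent word pairs
    vocab = {w for tokens in sentences for w in tokens} | {"</s>"}
    return unigrams, bigrams, vocab

def bigram_prob(prev, word, unigrams, bigrams, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + len(vocab))

def sentence_prob(tokens, unigrams, bigrams, vocab):
    """Probability of a whole sentence as a product of bigram probabilities."""
    padded = ["<s>"] + tokens + ["</s>"]
    p = 1.0
    for prev, word in zip(padded, padded[1:]):
        p *= bigram_prob(prev, word, unigrams, bigrams, vocab)
    return p

# Hypothetical toy corpus (already split into words), not real newspaper data.
corpus = [
    ["kabuka", "ga", "agaru"],
    ["kabuka", "ga", "sagaru"],
    ["kinri", "ga", "agaru"],
]
model = train_bigram(corpus)

well_formed = sentence_prob(["kabuka", "ga", "agaru"], *model)
scrambled = sentence_prob(["ga", "agaru", "kabuka"], *model)
print(well_formed > scrambled)
```

A speech recognizer can use such scores to prefer word sequences that resemble well-formed sentences when the acoustic evidence alone is ambiguous. Practical systems use far larger corpora, longer contexts, and more refined smoothing.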

Sound has many different elements, and not all of them can be detected by the human ear. Human hearing has limits. We know that low-frequency sounds can be heard relatively well, while high-frequency ones are hard to hear. This is probably connected to the low frequency of the human voice, which is around 100-200 Hz for men and 200-300 Hz for women.

But human ears can distinguish not only different pitches of sound but also fine differences in its makeup. For example, if a piano and a violin both play a note at exactly the same pitch, we still hear them as different sounds, perceiving the difference in timbre. Analyzing the sounds produced by musical instruments reveals variation in the intensity of each frequency. When rendered in graphical form we can see that the distribution varies from instrument to instrument. It is these differences that enable us to perceive the same melody as different when played on different instruments. The same mechanism accounts for why each person’s voice sounds different to us. Our ability to hear details in the composition of sound is also what enables speech to become meaningful language.
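The point about timbre can be sketched in Python with two synthetic notes that share the same pitch but differ in the strength of their harmonics. The harmonic amplitudes below are invented for illustration and are not measurements of a real piano or violin. The sample rate, the 200 Hz pitch, and the function names are likewise assumptions. The intensity at each frequency is measured with a single discrete Fourier transform bin.

```python
import cmath
import math

SAMPLE_RATE = 8000   # Hz (assumed)
N = 800              # samples analyzed, so frequency bins are 10 Hz apart
F0 = 200             # shared fundamental pitch in Hz (assumed)

def make_note(harmonic_amps):
    """Sum sine harmonics of F0; harmonic_amps[k] is the amplitude of harmonic k + 1."""
    return [
        sum(a * math.sin(2 * math.pi * F0 * (k + 1) * i / SAMPLE_RATE)
            for k, a in enumerate(harmonic_amps))
        for i in range(N)
    ]

def magnitude_at(samples, freq_hz):
    """Amplitude of the component at freq_hz, computed from one DFT bin."""
    total = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / SAMPLE_RATE)
                for i, s in enumerate(samples))
    return 2 * abs(total) / len(samples)

def harmonic_profile(samples, count=4):
    """Intensity at the fundamental and its first few harmonics."""
    return [magnitude_at(samples, F0 * k) for k in range(1, count + 1)]

# Two hypothetical instruments playing the same 200 Hz note.
tone_a = make_note([1.0, 0.5, 0.25, 0.125])
tone_b = make_note([1.0, 0.1, 0.6, 0.05])

for name, tone in (("tone_a", tone_a), ("tone_b", tone_b)):
    print(name, [round(m, 3) for m in harmonic_profile(tone)])
```

Both notes have the same fundamental, so they have the same pitch, yet their harmonic profiles differ. That difference in the distribution of intensity across frequencies is what the text describes as timbre.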

I think that sound research is an area that will continue to evolve. Just as we can now do things that were previously impossible, I hope to continue tackling new challenges in order to contribute to the progress of society and humankind.

Katunobu Itou, Professor

Department of Digital Media, Faculty of Computer and Information Sciences

Born in Osaka in 1965.

Graduated from the Department of Information Engineering, Faculty of Engineering, Tokyo Institute of Technology, and completed the Doctoral program in Information Engineering, Graduate School of Science and Engineering at the same institution. Doctor of Engineering. Worked as a Senior Researcher at the Electrotechnical Laboratory (now the National Institute of Advanced Industrial Science and Technology), then as an associate professor in the Nagoya University Graduate School of Information Science. Appointed to his current position of Professor in the Faculty of Computer and Information Sciences of Hosei University in 2008.