Audio-visual documents carry much more semantic information than textual data. At the same time, computers can now store and manipulate large amounts of audio-visual data, so its volume is growing exponentially. As with textual information, the more data becomes available, the more important content description becomes. Searching textual data with textual queries is natural, since the query and the data are of the same type. Unfortunately, describing the semantics of audio-visual documents well enough to enable good search capabilities is not easy for machines, yet it is precisely such semantic descriptions that make searching audio-visual documents easy and effective.
For example, typical audio-visual queries could be:
- Find the video file where I said: "I'm leaving this town."
- Find the video file where the president of France promises the people a low unemployment rate.
- Find the video file where Einstein presented his relativity theory in the USA.
- Find a video containing a mountain close to the sea, where some children are playing.
- Find a music excerpt that is happy, fast, sad, loud, ...
Such queries are very likely in the near future for a society that relies more and more on knowledge.
This is the context of my thesis: studying the ability of machines to listen to audio-visual data and to describe its content. I work essentially on the audio data, so I study machines that "listen" and describe.
The work relates to the following fields:
- Multimedia description and content-based retrieval.
- Artificial perception systems, artificial intelligence.
- Audio signal analysis.