Seeking sound samples in a massive database can be a tedious and time-consuming task. Even when metadata are available, query results may remain far from the timbre expected by users. This problem stems from the nature of query specification, which does not account for the underlying complexity of audio data. The Query By Example (QBE) paradigm tries to tackle this shortcoming by finding audio clips similar to a given sound example. However, it requires users to have a well-formed sound file of what they seek, which is not always a valid assumption. Furthermore, most audio-retrieval systems rely on a single measure of similarity, which is unlikely to convey the perceptual similarity of audio signals. In this paper, we address an innovative way of querying generic audio databases by simultaneously optimizing the temporal evolution of multiple spectral properties. We show how this problem can be cast into a new approach merging multiobjective optimization and time series matching, called MultiObjective Time Series (MOTS) matching. We formally state this problem and report an efficient implementation. This approach introduces a multidimensional assessment of similarity in audio matching, which copes with the multidimensional nature of timbre perception and yields a set of efficient propositions rather than a single best solution. To demonstrate the performance of our approach, we show its efficiency in audio classification tasks. By introducing a selection criterion based on the hypervolume dominated by a class, we show that our approach outperforms state-of-the-art methods in audio classification even with a small number of features. We also demonstrate its robustness to several classes of audio distortions. Finally, we introduce two innovative applications of our method for sound querying.
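The core multiobjective idea above, returning the set of nondominated matches rather than a single ranked best, can be sketched as follows. This is a minimal illustration only, not the paper's implementation: the two objectives, the score values, and all function names are assumptions for the example.

```python
# Sketch: each candidate sound gets one distance per spectral descriptor
# (lower is better); we keep the Pareto-efficient (nondominated) set
# instead of collapsing the objectives into a single similarity score.

def dominates(a, b):
    """True if score vector a is at least as good as b on every
    objective and strictly better on at least one (lower is better)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_front(scores):
    """Return the indices of the nondominated score vectors."""
    return [i for i, a in enumerate(scores)
            if not any(dominates(b, a) for j, b in enumerate(scores) if j != i)]

# Toy example: (spectral-centroid distance, loudness distance) per candidate.
scores = [(0.2, 0.9), (0.5, 0.4), (0.6, 0.5), (0.7, 0.3)]
front = pareto_front(scores)  # candidate 2 is dominated by candidate 1
```

Here candidate 2 is beaten by candidate 1 on both axes and drops out, while the other three represent different trade-offs between the descriptors, which is exactly the "set of efficient propositions" the abstract describes.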
Sound engineers need to access vast collections of sound effects for their film and video productions. Sound effects providers rely on text-retrieval techniques to offer their collections, and annotation of audio content is currently done manually, which is an arduous task. Automatic annotation methods, normally fine-tuned to reduced domains such as musical instruments or small sound-effects taxonomies, are not mature enough to label any possible sound in great detail. A general sound recognition tool would require, first, a taxonomy that represents the world and, second, thousands of classifiers, each specialized in distinguishing small details. To tackle the taxonomy definition problem we use WordNet, a semantic network that organizes real-world knowledge. To overcome the need for a huge number of classifiers to distinguish many different sound classes, we use a nearest-neighbor classifier with a database of isolated sounds unambiguously linked to WordNet concepts. We report experimental results on a general sound annotator: 30% concept prediction is achieved on a database of over 50,000 sounds and over 1,600 concepts.
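The single-classifier scheme above, labeling a query with the concepts of its nearest neighbor in a concept-tagged sound database, can be sketched as follows. The feature vectors, the Euclidean metric, and the toy tags are illustrative assumptions, not the system's actual descriptors.

```python
import math

# Sketch: instead of one classifier per concept, annotate a query sound
# with the WordNet-style concept tags of its nearest neighbor among
# isolated, unambiguously tagged sounds. Entries are toy placeholders.

database = [
    # (feature vector, concept tags)
    ([0.1, 0.8], {"dog", "bark"}),
    ([0.9, 0.2], {"car", "engine"}),
    ([0.2, 0.7], {"dog", "growl"}),
]

def annotate(query):
    """Return the concept tags of the nearest database sound."""
    _, concepts = min(database, key=lambda entry: math.dist(query, entry[0]))
    return concepts

tags = annotate([0.12, 0.78])  # closest to the first entry
```

A single distance function thus replaces thousands of specialized classifiers, and coverage grows simply by adding tagged sounds to the database.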