Object Recognition

 

Theory

  • Incoming information from the world is continuously extracted and processed by our sensory modalities. As this occurs, we rapidly generate top-down predictions about the environment based on associative connections drawn from our long-term memory of past experiences. This helps us make sense of and interact with the world, guiding our cognition and behaviour.

  • The general ‘proactive brain framework’ suggests that this predictive process involves comparing incoming inputs (external sensory information or an internally generated thought) with analogous representations in our memory. This comparison co-activates associated representations (context frames), generating predictions about what else is likely to be important in a particular context, thereby aiding object and context recognition (a toy sketch of this process appears at the end of this list).

    • An example of this process: input (sofa) → compared with analogous representation in memory (similar sofas seen before) → co-activation of associated representations in memory (coffee table, pillows) → generation of predictions about what other elements will be present and what the context itself is.

  • Objects tend to occur together in similar contexts and are related to each other and/or the environment they are found in. The stored memory representations of objects are therefore linked and clustered together based on their degree of relatedness. Clusters of associated representations are defined as ‘context frames’:

    • Within context frames, certain elements are generally expected to appear together, forming predictions.

    • Context frames contain both spatial and non-spatial information about the represented objects.

    • Context frames are formed from past experiences via implicit or explicit learning.

    • An example of a context frame: living room, sofa, TV

  • Associative processes are important for prediction and hence visual object recognition:

    • Predictions are initiated by top-down mechanisms that influence bottom-up processes (the extraction of sensory inputs) by reducing how much of a bottom-up input needs to be extracted, which leads to faster interpretation of information. The predictive process therefore requires only a minimal bottom-up input to occur.

    • Only the low spatial frequency (LSF) components of a new bottom-up input are required to initiate top-down mechanisms, generating predictions that facilitate object and context recognition.

    • A study by Bar and Ullman (1996) found that, during recognition tasks involving an object within a scene containing another object, the spatial configuration of the objects affected recognition performance. Objects arranged in an improper spatial configuration relative to each other within a scene increased response times and errors in object recognition. The authors suggest that recognition memory must contain not only the types of objects that occur together in a scene, but also the spatial relationships between them.
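
  • A minimal, illustrative sketch of the associative prediction process described above (written in Python; the context frames, objects and positions below are invented for illustration and are not drawn from the literature):

      # Context frames: clusters of representations that tend to co-occur,
      # holding both non-spatial (identity) and spatial (typical relation) information.
      CONTEXT_FRAMES = {
          "living room": {
              "sofa":         {"typical_position": "against a wall"},
              "TV":           {"typical_position": "facing the sofa"},
              "coffee table": {"typical_position": "in front of the sofa"},
              "pillows":      {"typical_position": "on the sofa"},
          },
          "kitchen": {
              "fridge": {"typical_position": "against a wall"},
              "stove":  {"typical_position": "set into the counter"},
              "sink":   {"typical_position": "under the window"},
          },
      }

      def predict_from_input(observed_object):
          """Given a minimal bottom-up input (a single recognised object),
          co-activate the context frames containing it and return predictions
          about the likely context and the other objects expected there."""
          predictions = {}
          for context, frame in CONTEXT_FRAMES.items():
              if observed_object in frame:  # analogous representation found in memory
                  predictions[context] = {obj: info for obj, info in frame.items()
                                          if obj != observed_object}  # co-activated associations
          return predictions

      # Example: seeing a sofa predicts a living-room context and its typical contents.
      print(predict_from_input("sofa"))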

Relevant brain architecture

  • The brain regions responsible for context-based associative predictions are the medial temporal lobe (MTL), medial parietal cortex (MPC) and medial prefrontal cortex (MPFC). These regions were identified by contrasting responses to objects with strong contextual associations against responses to objects with weak contextual associations.

  • Early visual areas send the LSF components of a new input image to the orbitofrontal cortex (OFC) in order to activate the associative processes that generate a prediction within the temporal cortex (a toy sketch of this coarse-to-fine idea follows below).
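
  • A rough computational analogy for this coarse-to-fine route (in Python with NumPy; the keep_fraction parameter, the template store and the similarity measure are all assumptions made for illustration, not a model taken from the source):

      import numpy as np

      def low_spatial_frequencies(image, keep_fraction=0.1):
          """Keep only the lowest spatial frequencies of an image: a crude
          stand-in for the coarse LSF signal projected to the OFC."""
          f = np.fft.fftshift(np.fft.fft2(image))
          rows, cols = image.shape
          r = max(1, int(rows * keep_fraction))
          c = max(1, int(cols * keep_fraction))
          mask = np.zeros_like(f)
          mask[rows // 2 - r:rows // 2 + r, cols // 2 - c:cols // 2 + c] = 1
          return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

      def initial_guesses(image, lsf_templates):
          """Compare the LSF version of the input with stored LSF templates and
          return candidate identities ranked by similarity: the 'initial guess'
          that then constrains slower, detailed bottom-up analysis."""
          lsf = low_spatial_frequencies(image).ravel()

          def similarity(tpl):
              t = tpl.ravel()
              return float(lsf @ t) / (np.linalg.norm(lsf) * np.linalg.norm(t) + 1e-9)

          return sorted(lsf_templates, key=lambda name: similarity(lsf_templates[name]),
                        reverse=True)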


 
Josh Artus