PAOLO PULCINI
Neural IR: a personal journey between the latent and the observable space
Get the slides here: paoloearth.github.io/Neural_IR_slides/
A bit of context
IR is a big field.
Neural IR is an interesting new sub-field of it.
Books could be written on the topic, but time and space are limited, so ...
I hope you will enjoy my selection
Goals
Or things that you should know by the end of the presentation
- General Overview
About the applications of NNs to IR tasks
- Able to Experiment
And implement a basic neural IR system
- Know where to look
When developing your own neural IR system
Roadmap
Not to get lost
- Neural IR
- Representation is everything
- Unsupervised Learning
- Query-document matching
- (Supervised) Learning
- NN Architectures & Inputs
- Conclusions
Bonus: Live Demo
Neural IR: What / Why / How
- What is it?
Neural IR is the application of shallow or deep neural networks to IR tasks
- Why could neural IR be a good idea?
Since 2010, the application of NNs to computer vision, speech recognition, and other real-world applications has led to several breakthroughs. IR, a relatively new scenario for NNs, could benefit as well.
- How are NNs being applied to IR tasks?
The characteristics of the application play the main role in defining the problem. Different architectures (and datasets) solve different problems.
Where are NN used?
Categorizations:
- NN influences the representation of the query
- NN influences the representation of the documents
- NN influences the matching/relevance estimation
- NN influences any combination of the above
* * *
Representation: be wise enough to choose the one that best suits your problem (I)
Terms as vectors
Vector representations are by far the most common.
Two main categories:
- Local representation (aka one-hot)
- Distributed representation
Vectors allow for arithmetic operations
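To make both points concrete, here is a minimal numpy sketch (the five-term vocabulary is hypothetical) of a local one-hot representation and of the arithmetic it allows: summing the one-hot vectors of a document's terms yields its bag-of-words count vector.

```python
import numpy as np

# Hypothetical fixed vocabulary (local representation: one dimension per term)
vocab = ["fox", "dog", "jumps", "quick", "lazy"]
index = {term: i for i, term in enumerate(vocab)}

def one_hot(term):
    """Local (one-hot) representation: a unique unit vector per term."""
    v = np.zeros(len(vocab))
    v[index[term]] = 1.0
    return v

# Arithmetic on term vectors: summing the one-hot vectors of a document's
# terms produces its bag-of-words count vector.
doc = ["quick", "fox", "jumps", "fox"]
bow = sum(one_hot(t) for t in doc)
print(bow)  # [2. 0. 1. 1. 0.] -> "fox" counted twice

# Terms outside the fixed vocabulary have no representation:
# one_hot("cat") would raise a KeyError.
```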
Similarity
Different representation schemes define distinct notions of similarity between terms in the corresponding vector space. This leads to different levels of generalization, so it is important to learn a term representation that suits each specific task.
Local representations
- Each term is a unique entity
- Terms outside of the fixed vocabulary have no representation
Distributed representations
- Each term is represented by a (sparse or dense) vector of hand-crafted features or a latent representation
- The feature extraction procedure should allow a notion of "similarity" to be defined over these properties (a toy sketch below)
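As a toy illustration of a distributed representation built from hand-crafted features, the sketch below (the mini-corpus is made up) represents each term by the counts of its neighbouring terms, so that terms appearing in similar contexts end up with similar vectors.

```python
import numpy as np
from collections import Counter

# Hypothetical mini-corpus
docs = [["quick", "fox", "jumps"],
        ["lazy", "dog", "sleeps"],
        ["quick", "dog", "jumps"]]

vocab = sorted({t for d in docs for t in d})

def context_vector(term, window=1):
    """Distributed representation: a term described by the counts of the
    terms that co-occur with it within the given window."""
    counts = Counter()
    for doc in docs:
        for i, t in enumerate(doc):
            if t == term:
                lo, hi = max(0, i - window), min(len(doc), i + window + 1)
                counts.update(doc[lo:i] + doc[i + 1:hi])
    return np.array([counts[t] for t in vocab], dtype=float)

fox, dog = context_vector("fox"), context_vector("dog")
# "fox" and "dog" share the contexts "quick" and "jumps", so their
# feature vectors overlap and their similarity is non-zero.
cos = fox @ dog / (np.linalg.norm(fox) * np.linalg.norm(dog))
print(f"cosine(fox, dog) = {cos:.2f}")  # 0.71
```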
Representation: be wise enough to choose the one that best suits your problem (II)
Distributional hypothesis
"A word is characterized by the company it keeps "
Firth (1957)
Observed
- Representations that are explicitly measurable from the data, categorized on the basis of:
- Distributional features (e.g., occurrence in documents, neighbouring terms with or without distances)
- Weighting schemes applied over the raw counts (e.g., TF-IDF)
- Can capture interesting relationships, but the resultant representations are highly sparse and high-dimensional (see the TF-IDF sketch below)
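As a concrete instance of a weighting scheme over raw counts, the sketch below uses scikit-learn's TfidfVectorizer (one assumption: a recent scikit-learn version; any TF-IDF implementation would do) and shows the sparse, high-dimensional output.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical mini-corpus
corpus = [
    "the quick brown fox jumps over the lazy dog",
    "the lazy dog sleeps",
    "a quick brown fox",
]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(corpus)  # sparse document-term matrix

# One dimension per vocabulary term: sparse and high-dimensional
print(X.shape)                         # (3, number_of_terms)
print(vectorizer.get_feature_names_out())
print(X.toarray().round(2))            # TF-IDF weights instead of raw counts
```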
Embeddings
- Lower-dimensional representations that are learnt from the data and assimilate the properties of the terms and the inter-term relationships observable in the original feature space.
- NB: with both representations it is possible to use cosine similarity as the metric.
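A minimal sketch of the NB above: cosine similarity is representation-agnostic, so the same formula applies to a sparse observed vector and to a dense embedding (all vector values below are made up for illustration).

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity: works for any pair of equal-length vectors."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Sparse observed representation (e.g., TF-IDF weights; made-up values)
sparse_a = np.array([0.0, 1.2, 0.0, 0.0, 0.7, 0.0])
sparse_b = np.array([0.0, 0.9, 0.4, 0.0, 0.6, 0.0])

# Dense embedding (e.g., learnt by word2vec; made-up values)
dense_a = np.array([0.21, -0.53, 0.11, 0.80])
dense_b = np.array([0.19, -0.47, 0.02, 0.75])

print(f"cosine(sparse) = {cosine(sparse_a, sparse_b):.2f}")
print(f"cosine(dense)  = {cosine(dense_a, dense_b):.2f}")
```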
Example: "The quick brown fox jumps over the lazy dog"
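The example sentence above is a handy way to show what "the company a word keeps" means in practice: the sketch below (window size 2 is an arbitrary choice) extracts the (term, context) pairs that distributional methods such as word2vec's skip-gram learn from.

```python
# (term, context) pairs within a window of 2, as consumed by skip-gram-style models
sentence = "the quick brown fox jumps over the lazy dog".split()
window = 2

pairs = [(sentence[i], sentence[j])
         for i in range(len(sentence))
         for j in range(max(0, i - window), min(len(sentence), i + window + 1))
         if i != j]

# First few pairs: ('the', 'quick'), ('the', 'brown'), ('quick', 'the'), ...
print(pairs[:8])
```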