What is a knowledge graph?
A knowledge graph is a structured data model that represents real-world information as a network of entities (nodes) connected by relationships (edges). It organizes facts in a machine-readable way to support querying, reasoning, and integration across sources.
It is built from triples of the form subject-predicate-object, such as 'Paris-capitalOf-France'. These triples are typically stored in a graph database or triple store, which allows flexible connections between many types of data.
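As a rough illustration of how triples can be stored and queried, here is a minimal in-memory sketch. The `Triple` type, the `match` helper and the extra facts are hypothetical; real systems use a graph database or triple store with a query language such as SPARQL or Cypher.

```python
from typing import NamedTuple, Optional, Set, List


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# A tiny in-memory "triple store" (illustrative only).
triples: Set[Triple] = {
    Triple("Paris", "capitalOf", "France"),
    Triple("Berlin", "capitalOf", "Germany"),
    Triple("France", "memberOf", "EuropeanUnion"),
    Triple("Germany", "memberOf", "EuropeanUnion"),
}


def match(store: Set[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return triples matching a pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    )


# Which entity is the capital of France?
print(match(triples, predicate="capitalOf", obj="France"))
# → [Triple(subject='Paris', predicate='capitalOf', obj='France')]
```

Because every fact has the same three-part shape, adding a new kind of relationship needs no schema migration: it is just another triple.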
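Ontologies or schemas define the types of entities and relationships, giving the graph meaning and enabling consistent inference rules that deduce new facts from existing ones. For example, if Paris is a CapitalCity and every CapitalCity is a City, the graph can infer that Paris is a City.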
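A minimal sketch of this kind of inference, assuming a hypothetical schema that uses `type` and `subClassOf` predicates in the spirit of RDFS. It applies two rules repeatedly (forward chaining) until no new facts appear: subclass links are transitive, and an instance of a class is also an instance of its superclasses. Production reasoners are far more efficient and support richer rule sets.

```python
from typing import Set, Tuple

Fact = Tuple[str, str, str]

# Hypothetical data (Paris is a CapitalCity) plus ontology (class hierarchy).
facts: Set[Fact] = {
    ("Paris", "type", "CapitalCity"),
    ("CapitalCity", "subClassOf", "City"),
    ("City", "subClassOf", "Place"),
}


def infer(facts: Set[Fact]) -> Set[Fact]:
    """Apply two subclass rules until a fixpoint is reached."""
    inferred = set(facts)
    while True:
        new: Set[Fact] = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                # Both rules need the second fact to be "b subClassOf d".
                if b != c or p2 != "subClassOf":
                    continue
                if p1 == "subClassOf":
                    new.add((a, "subClassOf", d))  # transitivity
                elif p1 == "type":
                    new.add((a, "type", d))        # type propagation
        if new <= inferred:  # nothing new was derived
            return inferred
        inferred |= new


print(sorted(infer(facts) - facts))
# → [('CapitalCity', 'subClassOf', 'Place'), ('Paris', 'type', 'City'), ('Paris', 'type', 'Place')]
```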
Data from different sources can be merged by linking entities that refer to the same real-world thing, creating a unified view that supports semantic search and the discovery of complex patterns.
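To show the merging step, here is a minimal sketch with two hypothetical sources that use different identifiers for the same entities. It assumes the equivalence links (similar in spirit to `owl:sameAs`) are already known; discovering them, a task known as entity resolution, is usually the hard part in practice. The `same_as` map handles only one level of aliasing; chains of aliases would need something like a union-find structure.

```python
from typing import Dict, Set, Tuple

Fact = Tuple[str, str, str]

# Two hypothetical sources with different naming schemes.
source_a: Set[Fact] = {("Paris", "capitalOf", "France")}
source_b: Set[Fact] = {
    ("geo:PAR", "hasLandmark", "geo:EiffelTower"),
    ("geo:FRA", "officialLanguage", "French"),
}

# Equivalence links: alias -> canonical identifier (assumed given).
same_as: Dict[str, str] = {"geo:PAR": "Paris", "geo:FRA": "France"}


def canonical(entity: str) -> str:
    """Map an alias to its canonical ID, or keep the entity unchanged."""
    return same_as.get(entity, entity)


def merge(*sources: Set[Fact]) -> Set[Fact]:
    """Rewrite every subject and object to its canonical ID, then union."""
    return {
        (canonical(s), p, canonical(o))
        for src in sources
        for (s, p, o) in src
    }


merged = merge(source_a, source_b)
print(sorted(t for t in merged if t[0] == "Paris"))
# → [('Paris', 'capitalOf', 'France'), ('Paris', 'hasLandmark', 'geo:EiffelTower')]
```

After merging, a single lookup on `Paris` returns facts that originally lived in two separate datasets.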
Example
Google's Knowledge Graph connects entities like musicians, albums, and tour dates so that a search for 'Taylor Swift' instantly shows related facts, songs, and events pulled from many sources.
Why it matters
Knowledge graphs supply structured, verifiable context that improves search accuracy, powers recommendation engines, and can help large language models reduce hallucinations by grounding their answers in explicit facts.
Frequently asked questions
Unlike a traditional relational database, which organizes data in rigid tables, a knowledge graph focuses on the relationships and meaning between entities, making it easier to connect and flexibly query diverse information.
Related terms
An ontology is a formal, structured model that defines the key concepts in a domain and the relationships between them, allowing data to be organized and interpreted with explicit meaning.
Batch size is the number of training examples processed together in a single forward and backward pass during model training.
Chunking is the process of breaking large datasets, documents, or files into smaller, fixed-size or semantically meaningful segments. It is a common data preprocessing step in AI/ML pipelines to manage memory and enable efficient processing.
Cosine similarity measures how similar two vectors are by computing the cosine of the angle between them, ignoring their magnitudes.
Data augmentation is a technique that artificially increases the size and diversity of a training dataset by creating modified versions of existing data samples.
Data labeling is the process of adding tags or annotations to raw data so that machine learning models can learn from it during training.