bootleg.symbols package
Submodules
bootleg.symbols.constants module
Constants.
bootleg.symbols.entity_profile module
Entity profile.
- class bootleg.symbols.entity_profile.EntityObj(*, entity_id: str, mentions: List[Tuple[str, float]], title: str, description: str, types: Dict[str, List[str]] = None, relations: List[Dict[str, str]] = None)
Bases: pydantic.main.BaseModel
Base entity object class to check types.
- description: str
- entity_id: str
- mentions: List[Tuple[str, float]]
- relations: Optional[List[Dict[str, str]]]
- title: str
- types: Optional[Dict[str, List[str]]]
- class bootleg.symbols.entity_profile.EntityProfile(entity_symbols, type_systems=None, kg_symbols=None, edit_mode=False, verbose=False)
Bases: object
Entity Profile object to handle and manage entity, type, and KG metadata.
- add_entity(entity_obj)
Add an entity to the profile.
- Parameters
entity_obj – JSON object of entity metadata
- add_mention(qid: str, mention: str, score: float)
Add the mention with its score to the QID.
- Parameters
qid – QID
mention – mention
score – score
- add_relation(qid, relation, qid2)
Add the relation triple.
- Parameters
qid – head QID
relation – relation
qid2 – tail QID
- add_type(qid, type, type_system)
Add a type to the QID in the given type system.
- Parameters
qid – QID
type – type name
type_system – type system
- get_all_types(type_system)
Return list of all type names for a type system.
- Parameters
type_system – type system
Returns: List of strings
- get_desc(qid)
Get the description of an entity QID.
- Parameters
qid – entity QID
Returns: string
- get_eid(qid)
Get the entity EID (internal number) of an entity QID.
- Parameters
qid – entity QID
Returns: integer
- get_entities_of_type(typename, type_system)
Get all entities of type typename in type system type_system.
- Parameters
typename – type name
type_system – type system
Returns: List of QIDs
- get_mentions(qid)
Get the mentions for the QID.
- Parameters
qid – QID
Returns: List of mentions
- get_mentions_with_scores(qid)
Get the mentions with their scores associated with the QID.
- Parameters
qid – QID
Returns: List of tuples [mention, score]
- get_qid_cands(mention)
Get the entity QID candidates of the mention.
- Parameters
mention – mention
Returns: List of QIDs
- get_qid_count_cands(mention)
Get the entity QID candidates with their scores of the mention.
- Parameters
mention – mention
Returns: List of tuples [QID, score]
- get_relations_between(qid, qid2)
Check whether two QIDs are connected in the KG and return their relation.
- Parameters
qid – QID one
qid2 – QID two
Returns: string relation or None
- get_relations_tails_for_qid(qid)
Get dict of relation to tail qids for given qid.
- Parameters
qid – QID
Returns: Dict relation to list of tail qids for that relation
- get_type_typeid(type, type_system)
Get the type ID of type in the type_system type system.
- Parameters
type – type
type_system – type system
Returns: type id
- get_types(qid, type_system)
Get the type names associated with the given QID in the type_system type system.
- Parameters
qid – QID
type_system – type system
Returns: list of typename strings
- classmethod load_from_cache(load_dir, edit_mode=False, verbose=False, no_kg=False, no_type=False, type_systems_to_load=None)
Load a pre-saved profile.
- Parameters
load_dir – load directory
edit_mode – edit mode flag, default False
verbose – verbose flag, default False
no_kg – load kg or not flag, default False
no_type – load types or not flag, default False. If True, this will ignore type_systems_to_load.
type_systems_to_load – list of type systems to load, default None, which loads all type systems
Returns: entity profile object
- classmethod load_from_jsonl(profile_file, max_candidates=30, max_types=10, max_kg_connections=100, edit_mode=False)
Load an entity profile from the raw jsonl file.
Each line is a JSON object with entity metadata.
Example object:
{ "entity_id": "C000", "mentions": [["dog", 10.0], ["dogg", 7.0], ["animal", 4.0]], "title": "Dog", "types": {"hyena": ["animal"], "wiki": ["dog"]}, "relations": [ {"relation": "sibling", "object": "Q345"}, {"relation": "sibling", "object": "Q567"} ] }
- Parameters
profile_file – file where jsonl data lives
max_candidates – maximum entity candidates
max_types – maximum types per entity
max_kg_connections – maximum KG connections per entity
edit_mode – edit mode
Returns: entity profile object
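The record layout above can be sketched with only the standard library. This is a hypothetical helper, parse_profile_jsonl, and not Bootleg's loader: it requires entity_id, mentions, and title, treats description as optional (defaulting to an empty string, like the desc argument of EntitySymbols.add_entity), and groups the relation objects into a relation-to-tail-QIDs dict. It does not apply max_candidates, max_types, or max_kg_connections.

```python
import json
from typing import Any, Dict, Iterable, List

# Hypothetical helper: fields this sketch treats as mandatory.
REQUIRED_FIELDS = ("entity_id", "mentions", "title")


def parse_profile_jsonl(lines: Iterable[str]) -> List[Dict[str, Any]]:
    """Parse JSONL entity records into plain dicts (sketch, not Bootleg's loader)."""
    entities = []
    for line_no, line in enumerate(lines, start=1):
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        missing = [field for field in REQUIRED_FIELDS if field not in record]
        if missing:
            raise ValueError(f"line {line_no}: missing fields {missing}")
        # Group relation objects into relation -> list of tail QIDs.
        relation_tails: Dict[str, List[str]] = {}
        for rel in record.get("relations") or []:
            relation_tails.setdefault(rel["relation"], []).append(rel["object"])
        entities.append(
            {
                "qid": record["entity_id"],
                "mentions": [(str(m), float(s)) for m, s in record["mentions"]],
                "title": record["title"],
                # Assumption: a missing description becomes "".
                "description": record.get("description", ""),
                "types": record.get("types") or {},
                "relation_tails": relation_tails,
            }
        )
    return entities
```

For the Dog record above, the sketch yields relation_tails == {"sibling": ["Q345", "Q567"]}.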
- mention_exists(mention)
Check if mention exists.
- Parameters
mention – mention
Returns: Boolean
- property num_entities_with_pad_and_nocand
Get the number of entities, including the PAD and UNK entities.
Returns: integer
- prune_to_entities(entities_to_keep)
Remove all entities except those in entities_to_keep.
- Parameters
entities_to_keep – List or Set of entities to keep
- reidentify_entity(qid, new_qid)
Rename qid to new_qid.
- Parameters
qid – old QID
new_qid – new QID
- remove_mention(qid, mention)
Remove the mention from being associated with the QID.
- Parameters
qid – QID
mention – mention
- remove_relation(qid, relation, qid2)
Remove the relation triple.
- Parameters
qid – head QID
relation – relation
qid2 – tail QID
- remove_type(qid, type, type_system)
Remove the type from QID in the given type system.
- Parameters
qid – QID
type – type to remove
type_system – type system
bootleg.symbols.entity_symbols module
Entity symbols.
- class bootleg.symbols.entity_symbols.EntitySymbols(alias2qids: Union[Dict[str, list], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], qid2title: Dict[str, str], qid2desc: Optional[Dict[str, str]] = None, qid2eid: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, alias2id: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, max_candidates: int = 30, alias_cand_map_dir: str = 'alias2qids', alias_idx_dir: str = 'alias2id', edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)
Bases: object
Entity Symbols class for managing entity metadata.
- add_entity(qid, mentions, title, desc='')
Add entity QID to our mappings with its mentions and title.
- Parameters
qid – QID
mentions – List of tuples [mention, score]
title – title
desc – description
- add_mention(qid: str, mention: str, score: float)
Add mention to QID with the associated score.
If the mention is already associated with the QID, an error is raised; call set_score instead. If the mention already has max_candidates candidates, its last candidate is removed and replaced by the QID.
- Parameters
qid – QID
mention – mention
score – score
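The candidate-eviction rule above can be sketched on a plain dict that maps each mention to a list of (QID, score) pairs. The function name and the dict stand-in for the vocabulary trie are hypothetical, and re-sorting the candidates by descending score after an insertion is an assumption, not something stated above.

```python
from typing import Dict, List, Tuple

Candidates = List[Tuple[str, float]]


def add_mention(alias2qids: Dict[str, Candidates], qid: str, mention: str,
                score: float, max_candidates: int = 30) -> None:
    """Hypothetical stand-in for EntitySymbols.add_mention on a plain dict."""
    cands = alias2qids.setdefault(mention, [])
    if any(cand_qid == qid for cand_qid, _ in cands):
        # Existing (mention, QID) pairs are updated with set_score, not re-added.
        raise ValueError(f"{mention!r} already maps to {qid}; use set_score")
    if len(cands) >= max_candidates:
        cands.pop()  # evict the last candidate to make room for qid
    cands.append((qid, score))
    # Assumption: candidates stay sorted by descending score.
    cands.sort(key=lambda pair: pair[1], reverse=True)
```

With max_candidates=2 and candidates [("Q1", 10.0), ("Q2", 5.0)] for "dog", adding Q3 with score 7.0 evicts Q2 and leaves [("Q1", 10.0), ("Q3", 7.0)].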
- alias_exists(alias)
Check whether an alias exists.
- Parameters
alias – alias string
Returns: boolean
- get_alias2qids_dict()
Get the alias2qids mapping.
Keys are aliases; each value is a list of [QID, sort_value] candidate pairs.
Returns: Dict alias2qids mapping
- get_alias_from_idx(alias_idx)
Get the alias from the numeric index.
- Parameters
alias_idx – alias numeric index
Returns: alias string
- get_alias_idx(alias)
Get the numeric index of an alias.
- Parameters
alias – alias
Returns: integer representation of alias
- get_eid_cands(alias, max_cand_pad=False)
Get the EID candidates for an alias.
- Parameters
alias – alias
max_cand_pad – whether to pad with -1 or not if fewer than max_candidates candidates
Returns: List of EID ints
- get_mentions(qid)
Get the mentions for the QID.
- Parameters
qid – QID
Returns: List of mentions
- get_mentions_with_scores(qid)
Get the mentions and the associated score for the QID.
- Parameters
qid – QID
Returns: List of tuples [mention, score]
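Both mention lookups invert the alias-to-candidates mapping. A minimal sketch over a plain dict (a hypothetical stand-in for the trie-backed storage) is below; it scans every alias, whereas a production implementation would more likely keep a QID-to-mentions reverse index. The function names are illustrative only.

```python
from typing import Dict, List, Tuple

Alias2Qids = Dict[str, List[Tuple[str, float]]]


def mentions_with_scores_for(alias2qids: Alias2Qids, qid: str) -> List[Tuple[str, float]]:
    """Collect every (mention, score) pair whose candidate list contains qid."""
    return [
        (mention, score)
        for mention, cands in alias2qids.items()
        for cand_qid, score in cands
        if cand_qid == qid
    ]


def mentions_for(alias2qids: Alias2Qids, qid: str) -> List[str]:
    """Same lookup, keeping only the mention strings."""
    return [mention for mention, _ in mentions_with_scores_for(alias2qids, qid)]
```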
- get_qid_cands(alias, max_cand_pad=False)
Get the QID candidates for an alias.
- Parameters
alias – alias
max_cand_pad – whether to pad with ‘-1’ or not if fewer than max_candidates candidates
Returns: List of QID strings
- get_qid_count_cands(alias, max_cand_pad=False)
Get the [QID, sort_value] candidates for an alias.
- Parameters
alias – alias
max_cand_pad – whether to pad with [‘-1’,-1] or not if fewer than max_candidates candidates
Returns: List of [QID, sort_value]
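The three candidate getters differ only in what they return and in the padding value used when max_cand_pad is set. The sketch below assumes a plain dict in place of the alias trie and a hypothetical qid2eid dict in place of the QID-to-EID vocabulary; an unknown alias simply raises KeyError here, since the behavior for missing aliases is not stated above.

```python
from typing import Dict, List, Tuple, Union

Alias2Qids = Dict[str, List[Tuple[str, float]]]
Pair = List[Union[str, float]]

PAD_QID = "-1"  # QID padding value documented for get_qid_cands
PAD_EID = -1  # EID padding value documented for get_eid_cands


def qid_count_cands(alias2qids: Alias2Qids, alias: str, max_candidates: int,
                    max_cand_pad: bool = False) -> List[Pair]:
    """[QID, sort_value] candidates for alias, optionally padded with ["-1", -1]."""
    cands: List[Pair] = [[qid, score] for qid, score in alias2qids[alias]]
    if max_cand_pad:
        pad_count = max(0, max_candidates - len(cands))
        # Build a fresh list per slot so padding entries are not shared objects.
        cands.extend([PAD_QID, -1] for _ in range(pad_count))
    return cands


def qid_cands(alias2qids: Alias2Qids, alias: str, max_candidates: int,
              max_cand_pad: bool = False) -> List[str]:
    """QID strings only; padding slots come out as "-1"."""
    pairs = qid_count_cands(alias2qids, alias, max_candidates, max_cand_pad)
    return [str(pair[0]) for pair in pairs]


def eid_cands(alias2qids: Alias2Qids, qid2eid: Dict[str, int], alias: str,
              max_candidates: int, max_cand_pad: bool = False) -> List[int]:
    """Integer EIDs; padding slots come out as -1."""
    qids = qid_cands(alias2qids, alias, max_candidates, max_cand_pad)
    return [PAD_EID if qid == PAD_QID else qid2eid[qid] for qid in qids]
```

Fixed-length padding lets a batch of mentions share one candidate tensor shape; the EIDs in any example are arbitrary.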
- classmethod load_from_cache(load_dir, alias_cand_map_dir='alias2qids', alias_idx_dir='alias2id', edit_mode=False, verbose=False)
Load entity symbols from load_dir.
- Parameters
load_dir – directory to load from
alias_cand_map_dir – alias2qid directory
alias_idx_dir – alias2id directory
edit_mode – edit mode flag
verbose – verbose flag
- prune_to_entities(entities_to_keep)
Remove all entities except those in entities_to_keep.
- Parameters
entities_to_keep – Set of entities to keep
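Pruning has to filter every alias's candidate list, not just the entity metadata. A sketch of the alias side follows; prune_alias_map is a hypothetical name, it returns a new dict rather than editing in place, and dropping aliases that are left with no candidates is an assumption.

```python
from typing import Dict, Iterable, List, Tuple

Alias2Qids = Dict[str, List[Tuple[str, float]]]


def prune_alias_map(alias2qids: Alias2Qids, entities_to_keep: Iterable[str]) -> Alias2Qids:
    """Return a copy of alias2qids restricted to entities_to_keep."""
    keep = set(entities_to_keep)  # set membership makes each check O(1)
    pruned: Alias2Qids = {}
    for alias, cands in alias2qids.items():
        kept = [(qid, score) for qid, score in cands if qid in keep]
        if kept:  # assumption: aliases with no surviving candidates are dropped
            pruned[alias] = kept
    return pruned
```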
- reidentify_entity(old_qid, new_qid)
Rename old_qid to new_qid.
- Parameters
old_qid – old QID
new_qid – new QID
- remove_mention(qid, mention)
Remove the mention from those associated with the QID.
- Parameters
qid – QID
mention – mention to remove
- set_desc(qid: str, desc: str)
Set the description for a QID.
- Parameters
qid – QID
desc – description
bootleg.symbols.kg_symbols module
KG symbols class.
- class bootleg.symbols.kg_symbols.KGSymbols(qid2relations: Union[Dict[str, Dict[str, List[str]]], bootleg.utils.classes.nested_vocab_tries.ThreeLayerVocabularyTrie], max_connections: Optional[int] = 50, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)
Bases: object
KG Symbols class for managing KG metadata.
- add_entity(qid, relation_dict)
Add a new entity to our relation mapping.
- Parameters
qid – QID
relation_dict – dictionary of relation -> list of connected other_qids by relation
- add_relation(qid, relation, qid2)
Add a relationship triple to our mapping.
If the QID already has max_connections tail QIDs through relation, the last one is removed and replaced by qid2.
- Parameters
qid – head entity QID
relation – relation
qid2 – tail entity QID
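The eviction rule above can be sketched on a nested dict shaped like the qid2relations constructor argument (head QID to relation to tail QIDs). The function name is hypothetical, the dict stands in for ThreeLayerVocabularyTrie, and silently ignoring a duplicate triple is an assumption.

```python
from typing import Dict, List

QID2Relations = Dict[str, Dict[str, List[str]]]


def add_relation(qid2relations: QID2Relations, qid: str, relation: str,
                 qid2: str, max_connections: int = 50) -> None:
    """Hypothetical stand-in for KGSymbols.add_relation on a nested dict."""
    tails = qid2relations.setdefault(qid, {}).setdefault(relation, [])
    if qid2 in tails:
        return  # assumption: duplicate triples are ignored
    if len(tails) >= max_connections:
        tails.pop()  # documented rule: the last tail is replaced by qid2
    tails.append(qid2)
```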
- get_qid2relations_dict()
Return a dictionary form of the QID-to-relations mapping.
Returns: Dict of head QID to relation to list of tail QIDs
- get_relations_between(qid1, qid2)
Check whether two QIDs are connected in the KG and return the relations between them.
- Parameters
qid1 – QID one
qid2 – QID two
Returns: string relation or empty set
- get_relations_tails_for_qid(qid)
Get dict of relation to tail qids for given qid.
- Parameters
qid – QID
Returns: Dict relation to list of tail qids for that relation
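Both lookups read the head QID's relation dict. The sketch below uses hypothetical names over a plain nested dict and treats edges as directed (head to tail), so argument order matters; whether Bootleg also checks the reverse direction is not stated above. It returns a set of relation names, empty when the QIDs are not connected.

```python
from typing import Dict, List, Set

QID2Relations = Dict[str, Dict[str, List[str]]]


def relations_between(qid2relations: QID2Relations, qid1: str, qid2: str) -> Set[str]:
    """Relations r such that (qid1, r, qid2) is a triple; empty set if none."""
    return {rel for rel, tails in qid2relations.get(qid1, {}).items() if qid2 in tails}


def relation_tails_for_qid(qid2relations: QID2Relations, qid: str) -> Dict[str, List[str]]:
    """Copy of the relation -> tail QIDs dict for qid (empty if qid has no edges)."""
    return {rel: list(tails) for rel, tails in qid2relations.get(qid, {}).items()}
```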
- classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)
Load KG symbols from load_dir.
- Parameters
load_dir – directory to load from
prefix – prefix added to the beginning of the file names
edit_mode – edit mode flag
verbose – verbose flag
Returns: KGSymbols
- prune_to_entities(entities_to_keep)
Remove all entities except those in entities_to_keep.
- Parameters
entities_to_keep – Set of entities to keep
- reidentify_entity(old_qid, new_qid)
Rename old_qid to new_qid.
- Parameters
old_qid – old QID
new_qid – new QID
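In a KG, a QID can appear both as a head key and inside other entities' tail lists. The sketch below renames it in both places on a plain nested dict; the function name is hypothetical, and both the tail rewriting and the refusal to overwrite an existing new_qid are assumptions rather than documented behavior of reidentify_entity.

```python
from typing import Dict, List

QID2Relations = Dict[str, Dict[str, List[str]]]


def reidentify_in_kg(qid2relations: QID2Relations, old_qid: str, new_qid: str) -> None:
    """Rename old_qid to new_qid as a head key and inside every tail list."""
    if new_qid in qid2relations:
        # Assumption: refuse to merge into an existing entity.
        raise ValueError(f"{new_qid} already exists")
    if old_qid in qid2relations:
        qid2relations[new_qid] = qid2relations.pop(old_qid)
    for rel_dict in qid2relations.values():
        for tails in rel_dict.values():
            # Rewrite tail lists in place; no dict changes size during iteration.
            tails[:] = [new_qid if tail == old_qid else tail for tail in tails]
```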
bootleg.symbols.type_symbols module
Type symbols class.
- class bootleg.symbols.type_symbols.TypeSymbols(qid2typenames: Union[Dict[str, List[str]], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], max_types: Optional[int] = 10, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)
Bases: object
Type Symbols class for managing type metadata.
- add_entity(qid, types)
Add an entity QID with its types to our mappings.
- Parameters
qid – QID
types – list of type names
- add_type(qid, typename)
Add the type to the QID.
If the QID already has max_types types, the last type is removed and replaced by typename.
- Parameters
qid – QID
typename – type name
- get_entities_of_type(typename)
Get all entity QIDs of type typename.
- Parameters
typename – typename
Returns: List
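Finding the entities of a type reverses the QID-to-types mapping that get_qid2typename_dict returns. Two ways to do that over a plain dict are sketched below; both function names are hypothetical, and which strategy Bootleg uses is not stated above.

```python
from collections import defaultdict
from typing import Dict, List


def entities_of_type(qid2typenames: Dict[str, List[str]], typename: str) -> List[str]:
    """Linear scan: QIDs whose type list contains typename."""
    return [qid for qid, types in qid2typenames.items() if typename in types]


def build_type_index(qid2typenames: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Invert qid -> types into type -> QIDs once, for repeated lookups."""
    index: Dict[str, List[str]] = defaultdict(list)
    for qid, types in qid2typenames.items():
        for typename in types:
            index[typename].append(qid)
    return dict(index)
```

The scan costs time proportional to the total number of (QID, type) pairs on every query; the index pays that cost once and then answers each query with a dict lookup.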
- get_qid2typename_dict()
Return dictionary of qid to typenames.
Returns: Dict of QID to list of typenames.
- get_types(qid)
Get the type names associated with the given QID.
- Parameters
qid – QID
Returns: list of typename strings
- classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)
Load type symbols from load_dir.
- Parameters
load_dir – directory to load from
prefix – prefix added to the beginning of the file names
edit_mode – edit mode flag
verbose – verbose flag
Returns: TypeSymbols
- prune_to_entities(entities_to_keep)
Remove all entities except those in entities_to_keep.
- Parameters
entities_to_keep – Set of entities to keep
- reidentify_entity(old_qid, new_qid)
Rename old_qid to new_qid.
- Parameters
old_qid – old QID
new_qid – new QID
Module contents
Symbols init.