bootleg.symbols package

Submodules

bootleg.symbols.constants module

Constants.

bootleg.symbols.constants.check_qid_exists(func)[source]

Check QID exists.

bootleg.symbols.constants.edit_op(func)[source]

Edit op.

bootleg.symbols.entity_profile module

Entity profile.

class bootleg.symbols.entity_profile.EntityObj(*, entity_id: str, mentions: List[Tuple[str, float]], title: str, description: str, types: Dict[str, List[str]] = None, relations: List[Dict[str, str]] = None)[source]

Bases: pydantic.main.BaseModel

Base entity object class to check types.

description: str
entity_id: str
mentions: List[Tuple[str, float]]
relations: Optional[List[Dict[str, str]]]
title: str
types: Optional[Dict[str, List[str]]]

class bootleg.symbols.entity_profile.EntityProfile(entity_symbols, type_systems=None, kg_symbols=None, edit_mode=False, verbose=False)[source]

Bases: object

Entity Profile object to handle and manage entity, type, and KG metadata.

add_entity(entity_obj)[source]

Add entity to our dump.

Parameters

entity_obj – JSON object of entity metadata

add_mention(qid: str, mention: str, score: float)[source]

Add the mention with its score to the QID.

Parameters
  • qid – QID

  • mention – mention

  • score – score

add_relation(qid, relation, qid2)[source]

Add the relation triple.

Parameters
  • qid – head QID

  • relation – relation

  • qid2 – tail QID

add_type(qid, type, type_system)[source]

Add type to QID for the given type system.

Parameters
  • qid – QID

  • type – type name

  • type_system – type system

get_all_mentions()[source]

Return list of all mentions.

Returns: List of strings

get_all_qids()[source]

Return all entity QIDs.

Returns: List of strings

get_all_types(type_system)[source]

Return list of all type names for a type system.

Parameters

type_system – type system

Returns: List of strings

get_all_typesystems()[source]

Return list of all type systems.

Returns: List of strings

get_desc(qid)[source]

Get the description of an entity QID.

Parameters

qid – entity QID

Returns: string

get_eid(qid)[source]

Get the entity EID (internal number) of an entity QID.

Parameters

qid – entity QID

Returns: integer

get_entities_of_type(typename, type_system)[source]

Get all entities of type typename for type system type_system.

Parameters
  • typename – type name

  • type_system – type system

Returns: List of QIDs

get_mentions(qid)[source]

Get the mentions for the QID.

Parameters

qid – QID

Returns: List of mentions

get_mentions_with_scores(qid)[source]

Get the mentions with their scores associated with the QID.

Parameters

qid – QID

Returns: List of tuples [mention, score]

get_qid_cands(mention)[source]

Get the entity QID candidates of the mention.

Parameters

mention – mention

Returns: List of QIDs

get_qid_count_cands(mention)[source]

Get the entity QID candidates with their scores of the mention.

Parameters

mention – mention

Returns: List of tuples [QID, score]

get_relations_between(qid, qid2)[source]

Check whether two QIDs are connected in the KG and return their relation.

Parameters
  • qid – QID one

  • qid2 – QID two

Returns: string relation or None

get_relations_tails_for_qid(qid)[source]

Get dict of relation to tail qids for given qid.

Parameters

qid – QID

Returns: Dict relation to list of tail qids for that relation

get_title(qid)[source]

Get the title of an entity QID.

Parameters

qid – entity QID

Returns: string

get_type_typeid(type, type_system)[source]

Get the type id of the given type in the type_system type system.

Parameters
  • type – type

  • type_system – type system

Returns: type id

get_types(qid, type_system)[source]

Get the type names associated with the given QID for the type_system system.

Parameters
  • qid – QID

  • type_system – type system

Returns: list of typename strings

classmethod load_from_cache(load_dir, edit_mode=False, verbose=False, no_kg=False, no_type=False, type_systems_to_load=None)[source]

Load a pre-saved profile.

Parameters
  • load_dir – load directory

  • edit_mode – edit mode flag, default False

  • verbose – verbose flag, default False

  • no_kg – load kg or not flag, default False

  • no_type – load types or not flag, default False. If True, this will ignore type_systems_to_load.

  • type_systems_to_load – list of type systems to load, default is None, which means all type systems

Returns: entity profile object

classmethod load_from_jsonl(profile_file, max_candidates=30, max_types=10, max_kg_connections=100, edit_mode=False)[source]

Load an entity profile from the raw jsonl file.

Each line is a JSON object with entity metadata.

Example object:

{
    "entity_id": "C000",
    "mentions": [["dog", 10.0], ["dogg", 7.0], ["animal", 4.0]],
    "title": "Dog",
    "types": {"hyena": ["animal"], "wiki": ["dog"]},
    "relations": [
        {"relation": "sibling", "object": "Q345"},
        {"relation": "sibling", "object": "Q567"},
    ],
}

Parameters
  • profile_file – file where jsonl data lives

  • max_candidates – maximum entity candidates

  • max_types – maximum types per entity

  • max_kg_connections – maximum KG connections per entity

  • edit_mode – edit mode

Returns: entity profile object
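
As a rough illustration of the file format described above, the sketch below writes two entity records to a jsonl file and reads them back with the standard library only. It does not call Bootleg. The helper names write_profile_jsonl and read_profile_jsonl, the second record, and the choice of required keys are assumptions made for this example; Bootleg's own validation may differ. Note that strict JSON does not allow the trailing commas shown in the example object, so each line has to be serialized without them.

```python
import json
import os
import tempfile

# Keys assumed to be mandatory for this sketch; Bootleg's own checks may differ.
REQUIRED_KEYS = ("entity_id", "mentions", "title")


def write_profile_jsonl(path, entities):
    """Write one JSON object per line (hypothetical helper)."""
    with open(path, "w", encoding="utf-8") as f:
        for ent in entities:
            f.write(json.dumps(ent) + "\n")


def read_profile_jsonl(path):
    """Read the file back, checking each record for the assumed keys."""
    entities = []
    with open(path, encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            line = line.strip()
            if not line:
                continue  # skip blank lines
            ent = json.loads(line)
            missing = [k for k in REQUIRED_KEYS if k not in ent]
            if missing:
                raise ValueError(f"line {line_no}: missing keys {missing}")
            entities.append(ent)
    return entities


example_entities = [
    {
        "entity_id": "C000",
        "mentions": [["dog", 10.0], ["dogg", 7.0], ["animal", 4.0]],
        "title": "Dog",
        "types": {"hyena": ["animal"], "wiki": ["dog"]},
        "relations": [
            {"relation": "sibling", "object": "Q345"},
            {"relation": "sibling", "object": "Q567"},
        ],
    },
    # Second record is made up for illustration.
    {"entity_id": "Q345", "mentions": [["cat", 5.0]], "title": "Cat"},
]

with tempfile.TemporaryDirectory() as tmp_dir:
    profile_path = os.path.join(tmp_dir, "profile.jsonl")
    write_profile_jsonl(profile_path, example_entities)
    loaded = read_profile_jsonl(profile_path)
```

A file produced this way has the one-object-per-line shape that load_from_jsonl expects as profile_file.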

mention_exists(mention)[source]

Check if mention exists.

Parameters

mention – mention

Returns: Boolean

property num_entities_with_pad_and_nocand

Get the number of entities including a PAD and UNK entity.

Returns: integer

prune_to_entities(entities_to_keep)[source]

Remove all entities except those in entities_to_keep.

Parameters

entities_to_keep – List or Set of entities to keep
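
To make the effect of pruning concrete, here is a minimal sketch on plain dictionaries. It is not Bootleg's implementation: the function works on stand-in qid2title and alias2qids dicts, and dropping aliases that are left with no candidates is an assumption of this example.

```python
def prune_to_entities(qid2title, alias2qids, entities_to_keep):
    """Toy pruning over plain dicts; returns new, filtered mappings."""
    keep = set(entities_to_keep)  # accept a list or a set
    pruned_titles = {q: t for q, t in qid2title.items() if q in keep}
    pruned_aliases = {}
    for alias, cands in alias2qids.items():
        kept = [c for c in cands if c[0] in keep]
        if kept:  # assumption: aliases with no remaining candidates are dropped
            pruned_aliases[alias] = kept
    return pruned_titles, pruned_aliases


qid2title = {"Q1": "Dog", "Q2": "Cat"}
alias2qids = {
    "dog": [["Q1", 10.0]],
    "pet": [["Q1", 3.0], ["Q2", 2.0]],
    "cat": [["Q2", 5.0]],
}
titles, aliases = prune_to_entities(qid2title, alias2qids, ["Q1"])
```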

qid_exists(qid)[source]

Check if QID exists.

Parameters

qid – entity QID

Returns: Boolean

reidentify_entity(qid, new_qid)[source]

Rename qid to new_qid.

Parameters
  • qid – old QID

  • new_qid – new QID

remove_mention(qid, mention)[source]

Remove the mention from being associated with the QID.

Parameters
  • qid – QID

  • mention – mention

remove_relation(qid, relation, qid2)[source]

Remove the relation triple.

Parameters
  • qid – head QID

  • relation – relation

  • qid2 – tail QID

remove_type(qid, type, type_system)[source]

Remove the type from QID in the given type system.

Parameters
  • qid – QID

  • type – type to remove

  • type_system – type system

save(save_dir)[source]

Save the profile.

Parameters

save_dir – save directory

save_to_jsonl(profile_file)[source]

Write the entity dump to a file in jsonl format.

Parameters

profile_file – file to save the data

update_entity(entity_obj)[source]

Update the metadata associated with the entity.

The entity must already be in our dump to be updated.

Parameters

entity_obj – JSON of entity metadata.

bootleg.symbols.entity_symbols module

Entity symbols.

class bootleg.symbols.entity_symbols.EntitySymbols(alias2qids: Union[Dict[str, list], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], qid2title: Dict[str, str], qid2desc: Optional[Dict[str, str]] = None, qid2eid: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, alias2id: Optional[bootleg.utils.classes.nested_vocab_tries.VocabularyTrie] = None, max_candidates: int = 30, alias_cand_map_dir: str = 'alias2qids', alias_idx_dir: str = 'alias2id', edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]

Bases: object

Entity Symbols class for managing entity metadata.

add_entity(qid, mentions, title, desc='')[source]

Add entity QID to our mappings with its mentions and title.

Parameters
  • qid – QID

  • mentions – List of tuples [mention, score]

  • title – title

  • desc – description

add_mention(qid: str, mention: str, score: float)[source]

Add mention to QID with the associated score.

If the mention already exists, an error is thrown telling the caller to use set_score instead. If the mention already has the maximum number of candidates, its last candidate is removed and replaced by QID.

Parameters
  • qid – QID

  • mention – mention

  • score – score
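
The capping rule described above can be sketched on a plain dictionary as follows. This is a toy model, not Bootleg's code: the free function add_mention, the ValueError type, and the assumption that the candidate list is already sorted with the highest score first are all choices made for this example.

```python
def add_mention(alias2qids, qid, mention, score, max_candidates=30):
    """Toy version of the capping rule on a dict of mention -> [[QID, score], ...]."""
    cands = alias2qids.setdefault(mention, [])
    if any(q == qid for q, _ in cands):
        raise ValueError(f"{mention!r} already maps to {qid}; use set_score instead")
    if len(cands) >= max_candidates:
        cands.pop()  # drop the last candidate to make room for the new QID
    cands.append([qid, score])


alias2qids = {"dog": [["Q1", 10.0], ["Q2", 7.0]]}
# "dog" is already full at max_candidates=2, so Q2 is replaced by Q3.
add_mention(alias2qids, "Q3", "dog", 4.0, max_candidates=2)
```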

alias_exists(alias)[source]

Check alias existence.

Parameters

alias – alias string

Returns: boolean

get_alias2qids_dict()[source]

Get the alias2qids mapping.

Each key is an alias and each value is a list of candidates, where every candidate is a two-element list [QID, sort_value].

Returns: Dict alias2qids mapping

get_alias_from_idx(alias_idx)[source]

Get the alias from the numeric index.

Parameters

alias_idx – alias numeric index

Returns: alias string

get_alias_idx(alias)[source]

Get the numeric index of an alias.

Parameters

alias – alias

Returns: integer representation of alias

get_all_alias_vocabtrie()[source]

Get a trie of all aliases.

Returns: Vocab trie of all aliases.

get_all_aliases()[source]

Get all aliases.

Returns: Dict_keys of all aliases

get_all_qids()[source]

Get all QIDs.

Returns: Dict_keys of all QIDs

get_all_titles()[source]

Get all QID titles.

Returns: Dict_values of all titles

get_desc(id)[source]

Get description for QID.

Parameters

id – QID string

Returns: description string

get_eid(id)[source]

Get the EID for the QID.

Parameters

id – QID string

Returns: EID int

get_eid_cands(alias, max_cand_pad=False)[source]

Get the EID candidates for an alias.

Parameters
  • alias – alias

  • max_cand_pad – whether to pad with -1 or not if fewer than max_candidates candidates

Returns: List of EID ints

get_mentions(qid)[source]

Get the mentions for the QID.

Parameters

qid – QID

Returns: List of mentions

get_mentions_with_scores(qid)[source]

Get the mentions and the associated score for the QID.

Parameters

qid – QID

Returns: List of tuples [mention, score]

get_qid(id)[source]

Get the QID associated with EID.

Parameters

id – EID

Returns: QID string

get_qid2eid_dict()[source]

Get the qid2eid mapping.

Returns: Dict qid2eid mapping

get_qid2title_dict()[source]

Get the qid2title mapping.

Returns: Dict qid2title mapping

get_qid_cands(alias, max_cand_pad=False)[source]

Get the QID candidates for an alias.

Parameters
  • alias – alias

  • max_cand_pad – whether to pad with ‘-1’ or not if fewer than max_candidates candidates

Returns: List of QID strings

get_qid_count_cands(alias, max_cand_pad=False)[source]

Get the [QID, sort_value] candidates for an alias.

Parameters
  • alias – alias

  • max_cand_pad – whether to pad with [‘-1’,-1] or not if fewer than max_candidates candidates

Returns: List of [QID, sort_value]
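
The effect of max_cand_pad can be pictured with a small stand-alone helper. The function pad_candidates below is hypothetical and only mirrors the padding values named in the parameter descriptions; it is a sketch, not the library's code.

```python
def pad_candidates(cands, max_candidates, pad_value=("-1", -1)):
    """Truncate to max_candidates, then pad with [QID, sort_value] placeholders."""
    padded = [list(c) for c in cands[:max_candidates]]
    padded += [list(pad_value) for _ in range(max_candidates - len(padded))]
    return padded


cands = [["Q1", 10.0], ["Q2", 7.0]]
padded = pad_candidates(cands, max_candidates=4)
# Keeping only the first element gives the padded QID list.
qids_only = [qid for qid, _ in padded]
```

Padding to a fixed length is convenient when candidate lists are later packed into arrays of a uniform shape.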

get_title(id)[source]

Get title for QID.

Parameters

id – QID string

Returns: title string

classmethod load_from_cache(load_dir, alias_cand_map_dir='alias2qids', alias_idx_dir='alias2id', edit_mode=False, verbose=False)[source]

Load entity symbols from load_dir.

Parameters
  • load_dir – directory to load from

  • alias_cand_map_dir – alias2qids directory

  • alias_idx_dir – alias2id directory

  • edit_mode – edit mode flag

  • verbose – verbose flag

prune_to_entities(entities_to_keep)[source]

Remove all entities except those in entities_to_keep.

Parameters

entities_to_keep – Set of entities to keep

qid_exists(qid)[source]

Check QID existence.

Parameters

qid – QID string

Returns: boolean

reidentify_entity(old_qid, new_qid)[source]

Rename old_qid to new_qid.

Parameters
  • old_qid – old QID

  • new_qid – new QID

remove_mention(qid, mention)[source]

Remove the mention from those associated with the QID.

Parameters
  • qid – QID

  • mention – mention to remove

save(save_dir)[source]

Dump the entity symbols.

Parameters

save_dir – directory string to save

set_desc(qid: str, desc: str)[source]

Set the description for a QID.

Parameters
  • qid – QID

  • desc – description

set_score(qid: str, mention: str, score: float)[source]

Change the score of the mention for the QID and re-sort the candidates.

Highest score is first.

Parameters
  • qid – QID

  • mention – mention

  • score – score
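
A minimal sketch of the update-then-re-sort behaviour, again on a plain dictionary rather than Bootleg's internal structures. The free function set_score and the KeyError raised for an unknown QID are assumptions of this example.

```python
def set_score(alias2qids, qid, mention, score):
    """Update one candidate's score and re-sort so the highest score is first."""
    cands = alias2qids[mention]
    for cand in cands:
        if cand[0] == qid:
            cand[1] = score
            break
    else:
        raise KeyError(f"{qid} is not a candidate for {mention!r}")
    cands.sort(key=lambda c: c[1], reverse=True)


alias2qids = {"dog": [["Q1", 10.0], ["Q2", 7.0], ["Q3", 4.0]]}
# Q3 moves from last to first once its score exceeds the others.
set_score(alias2qids, "Q3", "dog", 12.0)
```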

set_title(qid: str, title: str)[source]

Set the title for a QID.

Parameters
  • qid – QID

  • title – title

bootleg.symbols.kg_symbols module

KG symbols class.

class bootleg.symbols.kg_symbols.KGSymbols(qid2relations: Union[Dict[str, Dict[str, List[str]]], bootleg.utils.classes.nested_vocab_tries.ThreeLayerVocabularyTrie], max_connections: Optional[int] = 50, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]

Bases: object

KG Symbols class for managing KG metadata.

add_entity(qid, relation_dict)[source]

Add a new entity to our relation mapping.

Parameters
  • qid – QID

  • relation_dict – dictionary of relation -> list of connected other_qids by relation

add_relation(qid, relation, qid2)[source]

Add a relationship triple to our mapping.

If the QID already has the maximum number of connections through relation, the last other_qid is removed and replaced by qid2.

Parameters
  • qid – head entity QID

  • relation – relation

  • qid2 – tail entity QID
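
The replacement rule can be sketched on a nested dictionary of head QID -> relation -> list of tail QIDs, which is the dict form accepted by the constructor. The free function below is a toy model; in particular, silently ignoring a triple that is already present is an assumption of this example and not documented behaviour.

```python
def add_relation(qid2relations, qid, relation, qid2, max_connections=50):
    """Toy version of the capping rule for KG triples."""
    tails = qid2relations.setdefault(qid, {}).setdefault(relation, [])
    if qid2 in tails:
        return  # assumption: an existing triple is left untouched
    if len(tails) >= max_connections:
        tails.pop()  # remove the last tail to make room for qid2
    tails.append(qid2)


kg = {"Q1": {"sibling": ["Q345", "Q567"]}}
# "sibling" is full at max_connections=2, so Q567 is replaced by Q999.
add_relation(kg, "Q1", "sibling", "Q999", max_connections=2)
# A new head QID and relation are created on demand.
add_relation(kg, "Q2", "parent", "Q1")
```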

get_all_relations()[source]

Get all relations in our KG mapping.

Returns: Set

get_qid2relations_dict()[source]

Return a dictionary form of the qid to relations mapping object.

Returns: Dict of head qid to relation to list of tail qids

get_relations_between(qid1, qid2)[source]

Check whether two QIDs are connected in the KG and return the relations between them.

Parameters
  • qid1 – QID one

  • qid2 – QID two

Returns: string relation or empty set
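
One way to picture this lookup is shown below, on the same nested-dictionary form. This is a sketch under assumptions: it treats qid1 as the head and qid2 as the tail, and it returns a set of relation names that is empty when no triple connects the two, which may not match the library's actual return type.

```python
def get_relations_between(qid2relations, qid1, qid2):
    """Collect every relation whose tail list for qid1 contains qid2."""
    return {
        relation
        for relation, tails in qid2relations.get(qid1, {}).items()
        if qid2 in tails
    }


kg = {"Q1": {"sibling": ["Q345", "Q567"], "friend": ["Q345"]}}
connected = get_relations_between(kg, "Q1", "Q345")
not_connected = get_relations_between(kg, "Q1", "Q000")
```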

get_relations_tails_for_qid(qid)[source]

Get dict of relation to tail qids for given qid.

Parameters

qid – QID

Returns: Dict relation to list of tail qids for that relation

classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)[source]

Load KG symbols from load_dir.

Parameters
  • load_dir – directory to load from

  • prefix – prefix to add to the beginning of the file name

  • edit_mode – edit mode flag

  • verbose – verbose flag

Returns: KGSymbols

prune_to_entities(entities_to_keep)[source]

Remove all entities except those in entities_to_keep.

Parameters

entities_to_keep – Set of entities to keep

reidentify_entity(old_qid, new_qid)[source]

Rename old_qid to new_qid.

Parameters
  • old_qid – old QID

  • new_qid – new QID

remove_relation(qid, relation, qid2)[source]

Remove a relation triple from our mapping.

Parameters
  • qid – head entity QID

  • relation – relation

  • qid2 – tail entity QID

save(save_dir, prefix='')[source]

Dump the kg symbols.

Parameters
  • save_dir – directory string to save

  • prefix – prefix to add to the beginning of the file name

bootleg.symbols.type_symbols module

Type symbols class.

class bootleg.symbols.type_symbols.TypeSymbols(qid2typenames: Union[Dict[str, List[str]], bootleg.utils.classes.nested_vocab_tries.TwoLayerVocabularyScoreTrie], max_types: Optional[int] = 10, edit_mode: Optional[bool] = False, verbose: Optional[bool] = False)[source]

Bases: object

Type Symbols class for managing type metadata.

add_entity(qid, types)[source]

Add an entity QID with its types to our mappings.

Parameters
  • qid – QID

  • types – list of type names

add_type(qid, typename)[source]

Add the type to the QID.

If the QID already has maximum types, the last type is removed and replaced by typename.

Parameters
  • qid – QID

  • typename – type name

get_all_types()[source]

Return all typenames.

get_entities_of_type(typename)[source]

Get all entity QIDs of type typename.

Parameters

typename – typename

Returns: List
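
Looking up entities by type amounts to inverting the QID to type names mapping. The sketch below builds such an inverted index from a plain dict; build_type_index is a hypothetical helper, and whether Bootleg keeps an index like this internally is not stated here.

```python
from collections import defaultdict


def build_type_index(qid2typenames):
    """Invert QID -> [typename, ...] into typename -> [QID, ...]."""
    index = defaultdict(list)
    for qid, typenames in qid2typenames.items():
        for typename in typenames:
            index[typename].append(qid)
    return dict(index)


qid2typenames = {"Q1": ["animal", "dog"], "Q2": ["animal"], "Q3": []}
type_index = build_type_index(qid2typenames)
# Unknown type names fall back to an empty list.
animals = type_index.get("animal", [])
plants = type_index.get("plant", [])
```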

get_qid2typename_dict()[source]

Return dictionary of qid to typenames.

Returns: Dict of QID to list of typenames.

get_types(qid)[source]

Get the type names associated with the given QID.

Parameters

qid – QID

Returns: list of typename strings

classmethod load_from_cache(load_dir, prefix='', edit_mode=False, verbose=False)[source]

Load type symbols from load_dir.

Parameters
  • load_dir – directory to load from

  • prefix – prefix to add to the beginning of the file name

  • edit_mode – edit mode flag

  • verbose – verbose flag

Returns: TypeSymbols

prune_to_entities(entities_to_keep)[source]

Remove all entities except those in entities_to_keep.

Parameters

entities_to_keep – Set of entities to keep

reidentify_entity(old_qid, new_qid)[source]

Rename old_qid to new_qid.

Parameters
  • old_qid – old QID

  • new_qid – new QID

remove_type(qid, typename)[source]

Remove the type from the QID.

Parameters
  • qid – QID

  • typename – type name to remove

save(save_dir, prefix='')[source]

Dump the type symbols.

Parameters
  • save_dir – directory string to save

  • prefix – prefix to add to the beginning of the file name

Module contents

Symbols init.