Data & runtime — PDF / Excel / Python / APT

The data-extraction and runtime tool namespaces: reading and searching PDFs, reading and writing Excel workbooks, a persistent Python runtime, and local package management. All of these tools operate on the Leif host: paths are Leif-workspace paths, and the apt_* tools install on Leif. For files that live on another host, transfer them to Leif first (see Shell & file tools).

PDF

read_pdf(file_path, start_page=None, end_page=None, max_chars=100000)
extract_pdf_tables(file_path, page=None)
search_pdf(file_path, search_term, case_sensitive=False, max_results=50)
get_pdf_info(file_path)

read_pdf extracts text, bounded by max_chars and an optional page range (start_page, end_page). extract_pdf_tables is the tool for tabular data; omit page to cover all pages. search_pdf finds a term without reading the whole document. get_pdf_info returns metadata such as the page count. Together they are useful for vendor price lists and quotes that arrive as PDFs.
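
Because read_pdf output is bounded by max_chars, a long document is easier to read in page windows. The following is a minimal sketch of computing those windows. page_windows is a hypothetical helper, not part of the tool namespace, and it assumes pages are 1-indexed with an inclusive end_page, which the signatures above do not confirm. The page count would come from get_pdf_info.

```python
def page_windows(page_count, window):
    """Yield (start_page, end_page) pairs covering pages 1..page_count.

    Assumption: pages are 1-indexed and end_page is inclusive.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    for start in range(1, page_count + 1, window):
        yield start, min(start + window - 1, page_count)


# A 10-page document read 4 pages at a time.
windows = list(page_windows(10, 4))
print(windows)  # prints [(1, 4), (5, 8), (9, 10)]
```

Each pair would then be passed as start_page and end_page to one read_pdf call.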

Excel

read_excel(path, sheet_name=0, header_row=0)
write_excel(path, data, sheet_name="Sheet1")
append_to_excel(path, data, sheet_name=None)
convert_excel_to_csv(excel_path, csv_path=None, sheet_name=0)
search_excel(path, search_term, sheet_name=None, case_sensitive=False)
get_excel_info(path)

sheet_name accepts an index (0) or a name. For write_excel and append_to_excel, data is a list of row dicts, one dict per row. convert_excel_to_csv is handy for getting a workbook into the CSV shape the pricing importer expects, though the import itself still reads from nvrbackup (see Pricing App File Import).
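
When rows start out as a header plus lists of values (for example, from a parsed table), they need reshaping into the list-of-dicts form that data takes. A small sketch of that conversion follows. rows_to_records is a hypothetical helper, not part of the tool namespace, and the column names are made up for illustration.

```python
def rows_to_records(header, rows):
    """Turn a header and a list of value lists into a list of row dicts."""
    records = []
    for row in rows:
        if len(row) != len(header):
            raise ValueError(f"expected {len(header)} values, got {len(row)}")
        records.append(dict(zip(header, row)))
    return records


header = ["sku", "description", "unit_price"]  # illustrative column names
rows = [
    ["A-100", "Widget", 4.25],
    ["B-200", "Gadget", 11.0],
]
data = rows_to_records(header, rows)
print(data[0])
```

The resulting data list is the shape that write_excel(path, data) takes.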

Python runtime

A persistent Python environment — variables defined in one execute_python call survive into the next unless you clear them.

execute_python(code, clear_namespace=False, timeout=None)
get_python_namespace()
install_python_package(package)
list_python_packages()
validate_script(file_path, run_checks=True)

clear_namespace=True resets the environment for a clean run; get_python_namespace shows what’s currently defined. Install missing deps with install_python_package. validate_script syntax-checks a file before you run it. This is the escape hatch for any data wrangling the dedicated tools don’t cover.
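
What validate_script checks beyond syntax is not spelled out here. As a rough local equivalent of the syntax check alone, Python's standard ast module can parse source without running it. check_syntax is a hypothetical helper for illustration, not the tool's implementation.

```python
import ast


def check_syntax(source, filename="<script>"):
    """Return None if source parses, else a short description of the syntax error."""
    try:
        ast.parse(source, filename=filename)
    except SyntaxError as err:
        return f"{filename}:{err.lineno}: {err.msg}"
    return None


print(check_syntax("x = 1\nprint(x)\n"))          # valid source: prints None
print(check_syntax("def broken(:\n    pass\n"))   # invalid source: prints an error description
```

Parsing catches syntax errors only. Missing imports and runtime failures still surface when the script actually runs.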

Local packages (APT)

apt_install(packages, update_cache=True)
apt_search(search_term)

apt_install installs packages on the Leif host; packages is a list of package names. apt_search looks up available packages by search_term.