"""Gumbo HTML parser.

These are the Python bindings for Gumbo. All public API classes and functions
are exported from this module. They include:

- ctypes representations of all structs and enums defined in gumbo.h. The
  naming convention is to take the C name and strip off the "Gumbo" prefix.

- A low-level wrapper around the gumbo_parse function, returning the classes
  exposed above. Usage:

    import gumbo
    with gumbo.parse(text, **options) as output:
        do_stuff_with_doctype(output.document)
        do_stuff_with_parse_tree(output.root)

- Higher-level bindings that mimic the API provided by html5lib. Usage:

    from gumbo import html5lib

  This requires that html5lib be installed (it uses html5lib's treebuilders),
  and is intended as a drop-in replacement.

- Similarly, higher-level bindings that mimic BeautifulSoup and return
  BeautifulSoup objects. For this, use:

    import gumbo
    soup = gumbo.soup_parse(text, **options)

  It returns a soup object, like BeautifulSoup.BeautifulSoup(text) does.
"""

from gumbo.gumboc import *

try:
    from gumbo import html5lib_adapter as html5lib
except ImportError:
    # html5lib not installed
    pass

try:
    from gumbo.soup_adapter import parse as soup_parse
except ImportError:
    # BeautifulSoup not installed
    pass
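
Because both adapters are imported inside `try`/`except ImportError` blocks that just `pass`, the names `gumbo.html5lib` and `gumbo.soup_parse` exist only when html5lib or BeautifulSoup is installed; calling code can feature-detect them with `hasattr(gumbo, "soup_parse")`. The sketch below shows the same optional-dependency guard factored into a helper. `try_import` is a hypothetical name (it is not part of Gumbo), and the missing package name is made up for illustration.

```python
import importlib
import types
from typing import Optional


def try_import(module_name: str) -> Optional[types.ModuleType]:
    """Import module_name if it is installed; return None otherwise.

    Hypothetical helper mirroring the try/except ImportError guards above:
    a missing optional dependency is swallowed instead of breaking the
    import of the whole package.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:  # also catches ModuleNotFoundError, its subclass
        return None


# A standard-library module is always importable, so this returns the module.
json_mod = try_import("json")

# A made-up package name that is not expected to be installed.
missing = try_import("not_a_real_optional_dependency")

if missing is None:
    print("optional dependency unavailable; feature disabled")
```

Returning `None` instead of leaving the name unbound, as the module above does, lets callers test availability with a plain `is None` check rather than `hasattr`.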