Metadata-Version: 2.1
Name: playwrightcapture
Version: 1.15.7
Summary: A simple library to capture websites using playwright
Home-page: https://github.com/Lookyloo/PlaywrightCapture
License: BSD-3-Clause
Author: Raphaël Vinot
Author-email: raphael.vinot@circl.lu
Requires-Python: >=3.8,<4.0
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Telecommunications Industry
Classifier: License :: OSI Approved :: BSD License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Topic :: Internet
Classifier: Topic :: Security
Provides-Extra: recaptcha
Requires-Dist: SpeechRecognition (>=3.8.1,<4.0.0); extra == "recaptcha"
Requires-Dist: beautifulsoup4 (>=4.11.1,<5.0.0)
Requires-Dist: dateparser (>=1.1.1,<2.0.0)
Requires-Dist: lxml (>=4.9.1,<5.0.0)
Requires-Dist: playwright (>=1.27.0,<2.0.0)
Requires-Dist: pydub (>=0.25.1,<0.26.0); extra == "recaptcha"
Requires-Dist: requests (>=2.28.1,<3.0.0); extra == "recaptcha"
Requires-Dist: w3lib (>=2.0.1,<3.0.0)
Project-URL: Repository, https://github.com/Lookyloo/PlaywrightCapture
Description-Content-Type: text/markdown

# Playwright Capture

Simple replacement for [splash](https://github.com/scrapinghub/splash) using [playwright](https://github.com/microsoft/playwright-python).

# Install

```bash
pip install playwrightcapture
```

# Usage

A very basic example:

```python
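import asyncio

from playwrightcapture import Capture

async def main(url: str):
    async with Capture() as capture:
        await capture.prepare_context()
        return await capture.capture_page(url)

entries = asyncio.run(main("https://example.com"))  # any URL to capture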
```

`entries` is a dictionary that contains (if all goes well) the HAR, the screenshot, all the cookies of the session, the URL as it is in the browser at the end of the capture, and the full HTML page as rendered.
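
As an illustration of working with that dictionary, here is a minimal sketch that writes whatever artifacts are present to a directory. The helper `save_entries`, the output file names, and the key names (`har`, `png`, `html`, `cookies`, `last_redirected_url`) are assumptions made for this example; inspect `entries.keys()` to confirm what your installed version actually returns.

```python
import json
from pathlib import Path
from typing import Any, Dict, List


def save_entries(entries: Dict[str, Any], dest: Path) -> List[str]:
    """Write the capture artifacts found in `entries` to `dest`.

    Returns the names of the files written. All key names below are
    assumptions; missing or empty keys are skipped silently.
    """
    dest.mkdir(parents=True, exist_ok=True)
    written: List[str] = []

    if entries.get("har") is not None:  # assumed: HAR as a JSON-serializable dict
        (dest / "capture.har").write_text(json.dumps(entries["har"]), encoding="utf-8")
        written.append("capture.har")
    if entries.get("png") is not None:  # assumed: screenshot as raw PNG bytes
        (dest / "screenshot.png").write_bytes(entries["png"])
        written.append("screenshot.png")
    if entries.get("html") is not None:  # assumed: rendered page as a string
        (dest / "page.html").write_text(entries["html"], encoding="utf-8")
        written.append("page.html")
    if entries.get("cookies") is not None:  # assumed: list of cookie dicts
        (dest / "cookies.json").write_text(json.dumps(entries["cookies"]), encoding="utf-8")
        written.append("cookies.json")
    if entries.get("last_redirected_url"):  # assumed: final URL as a string
        (dest / "final_url.txt").write_text(entries["last_redirected_url"], encoding="utf-8")
        written.append("final_url.txt")
    return written
```

Calling `save_entries(entries, Path("capture_output"))` after a capture would then leave one file per available artifact, and skipping absent keys keeps the helper usable when a capture only partially succeeds.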


# reCAPTCHA bypass

No black magic: it is just a reimplementation of a [well-known technique](https://github.com/NikolaiT/uncaptcha3),
as implemented [here](https://github.com/Binit-Dhakal/Google-reCAPTCHA-v3-solver-using-playwright-python)
and [here](https://github.com/embium/solverecaptchas).

This module will try to bypass reCAPTCHA on protected websites if you install it this way:

```bash
pip install playwrightcapture[recaptcha]
```

This will install `requests`, `pydub` and `SpeechRecognition`. To work, `pydub`
requires `ffmpeg` or `libav`; see the [install guide](https://github.com/jiaaro/pydub#installation)
for more details.
`SpeechRecognition` uses the Google Speech Recognition API to turn the audio file into text (I hope you appreciate the irony).
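
Before relying on the bypass, it can help to verify that the prerequisites listed above are in place. The following is a minimal sketch using only the standard library; the helper names `missing_modules` and `find_audio_backend` are hypothetical and not part of this package. It assumes the extras are importable as `requests`, `pydub` and `speech_recognition`, and that the audio backend is exposed on the `PATH` as `ffmpeg` or `avconv` (the libav binary).

```python
import importlib.util
import shutil
from typing import List, Optional, Sequence


def missing_modules(
    names: Sequence[str] = ("requests", "pydub", "speech_recognition"),
) -> List[str]:
    """Return the top-level module names from `names` that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def find_audio_backend(
    candidates: Sequence[str] = ("ffmpeg", "avconv"),
) -> Optional[str]:
    """Return the path of the first audio converter found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing Python modules:", ", ".join(missing))
    if find_audio_backend() is None:
        print("Neither ffmpeg nor avconv was found on PATH.")
```

Running the script prints nothing when everything is available, which makes it easy to drop into a setup or CI step.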

