# hypothesmith
Hypothesis strategies for generating Python programs, something like CSmith.

This is definitely pre-alpha, but if you want to play with it feel free!
You can even keep the shiny pieces when - not if - it breaks.

Get it today with [`pip install hypothesmith`](https://pypi.org/project/hypothesmith/),
or by cloning [the GitHub repo](https://github.com/Zac-HD/hypothesmith).

You can run the tests, such as they are, with `tox` on Python 3.6 or later.
Use `tox -va` to see what environments are available.

## Usage
This package provides two Hypothesis strategies for generating Python source code.

The generated code will always be syntactically valid, and is useful for testing
parsers, linters, auto-formatters, and other tools that operate on source code.

> DO NOT EXECUTE CODE GENERATED BY THESE STRATEGIES.
>
> It could do literally anything that running Python code is able to do,
> including changing, deleting, or uploading important data.  Arbitrary
> code can be useful, but "arbitrary code execution" can be very, very bad.

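If you only need to confirm that a generated program is valid, parsing it is
enough.  Here is a minimal sketch using only the standard library; the helper
name `parses_without_running` is hypothetical and not part of Hypothesmith:

```python
import ast


def parses_without_running(source_code: str) -> bool:
    """Return True if `source_code` parses as a module.

    `ast.parse()` only builds a syntax tree; it never executes the program.
    """
    try:
        ast.parse(source_code)
    except (SyntaxError, ValueError):
        # ValueError covers inputs such as null bytes on some Python versions.
        return False
    return True
```
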
#### `hypothesmith.from_grammar(start="file_input", *, auto_target=True)`

Generates syntactically-valid Python source code based on the grammar.

Valid values for ``start`` are ``"single_input"``, ``"file_input"``, or
``"eval_input"``; respectively a single interactive statement, a module or
sequence of commands read from a file, and input for the ``eval()`` function.

If ``auto_target`` is ``True``, this strategy uses ``hypothesis.target()``
internally to drive towards larger and more complex examples.  We recommend
leaving this enabled, as the grammar is quite complex and only simple examples
tend to be generated otherwise.

#### `hypothesmith.from_node(node=libcst.Module, *, auto_target=True)`

Generates syntactically-valid Python source code based on the node types
defined by the [`LibCST`](https://libcst.readthedocs.io/en/latest/) project.

You can pass any subtype of `libcst.CSTNode`.  Alternatively, you can use
Hypothesis' built-in `from_type(node_type).map(lambda n: libcst.Module([n]).code)`,
after Hypothesmith has registered the required strategies.  However, this does
not include automatic targeting, and limitations of LibCST may lead to invalid
code being generated.

## Notable bugs found with Hypothesmith
- [BPO-38953](https://bugs.python.org/issue38953) `tokenize` -> `untokenize` roundtrip bugs.
- [`lib2to3` errors on `\r` in comment](https://github.com/psf/black/issues/970)
- [Black fails on files ending in a backslash](https://github.com/psf/black/issues/1012)
- [At least three round-trip bugs in LibCST](https://github.com/Instagram/LibCST#acknowledgements)
  (search commits for "hypothesis")
- [Invalid code generated by LibCST](https://github.com/Instagram/LibCST/issues/287)

## Changelog

### 0.1.1 - 2020-05-17
- Emit some debug info to help diagnose a possible upstream bug in CPython

### 0.1.0 - 2020-04-24
- Added `auto_target=True` argument to the `from_node()` strategy.
- Improved `from_node()` generation of comments and trailing whitespace.

### 0.0.8 - 2020-04-23
- Added a `from_node()` strategy which uses [`LibCST`](https://pypi.org/project/libcst/)
  to generate source code.  This is a proof-of-concept rather than a robust tool,
  but IMO it's a pretty cool concept.

### 0.0.7 - 2020-04-19
- The `from_grammar()` strategy now takes an `auto_target=True` argument, to
  drive generated examples towards (relatively) larger and more complex programs.

### 0.0.6 - 2020-04-08
- Added support for non-ASCII identifiers

### 0.0.5 - 2019-11-27
- Updated project metadata and started testing on Python 3.8

### 0.0.4 - 2019-09-10
- Depends on more recent Hypothesis version, with upstreamed grammar generation.
- Improved filtering rejects fewer valid examples, finding another bug in Black.

### 0.0.3 - 2019-08-08
- Checks validity at statement level, which makes filtering much more efficient.
- Improved testing, input validation, and code comments.

### 0.0.2 - 2019-08-07
- Improved filtering and fixing of source code generated from the grammar.
- This version found a novel bug: `"pass #\\r#\\n"` is accepted by the
  built-in `compile()` and `exec()` functions, but not by `black` or `lib2to3`.

### 0.0.1 - 2019-08-06
- Initial release.  This is a minimal proof of concept, generating source code
  from the grammar and rejecting it if we get errors from `black` or `tokenize`.
  Cool and promising, but not very useful at this stage.
