Metadata-Version: 2.4
Name: scanc
Version: 1.2.2
Summary: AI-ready code-base scanner that outputs Markdown or XML.
Home-page: https://github.com/mqxym/scanc
Author: mqxym
Author-email: maxim@omg.lol
License: MIT
Project-URL: Source, https://github.com/mqxym/scanc
Project-URL: Bug Tracker, https://github.com/mqxym/scanc/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Environment :: Console
Classifier: Topic :: Software Development :: Code Generators
Classifier: Topic :: Software Development :: Documentation
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENCE
Requires-Dist: click>=8.0
Requires-Dist: treelib>=1.6.1
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: tox; extra == "dev"
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.7; extra == "tiktoken"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# scanc

![Test Status](https://github.com/mqxym/scanc/actions/workflows/python-publish.yml/badge.svg)

> scanc = scan c(ode) <br>
> A fast, pure‑Python project code‑scanner that outputs clean, AI‑ready Markdown or XML.

`scanc` helps you **spill an entire codebase into an LLM prompt** (or a file) in seconds, while keeping noise low and letting you check the project structure and token count before you paste.

---

## Features

| Feature                       | Description                                                      |
| ----------------------------- | ---------------------------------------------------------------- |
| **Blazing Fast, Pure‑Python** | Zero native dependencies; easy to install and run anywhere.      |
| **Smart Default Ignores**     | Automatically skips `node_modules`, `.venv`, `.git`, and more.   |
| **Flexible Filters**          | Include/exclude by *extension*, *filename*, or *regex* patterns. |
| **Optional Directory Tree**   | Prepend a fenced tree diagram of your project structure.         |
| **Token Counter**             | Estimate LLM token costs with `tiktoken` before you paste.       |
| **Cross‑Platform CLI**        | Works on macOS, Linux, and Windows out of the box.               |

---

## Installation

```bash
# Optional: use a virtual environment
python3 -m venv --prompt scanc-env .venv
source .venv/bin/activate

pip install scanc                # core tool only
pip install "scanc[tiktoken]"    # adds optional token‑counter support (quotes needed in zsh)
```

## Quickstart

Scan a directory and emit Markdown:

```bash
scanc .                         # scan the current folder
scanc -e py,js --tree           # only .py and .js files, plus the directory tree
scanc -f xml                    # emit XML instead of Markdown (new in v1.2.0)
scanc -e py -x "tests" | less   # only .py files, skipping paths that match "tests"
scanc --tokens gpt-4o           # print only the token count for GPT-4o (needs tiktoken)
scanc -e py | pbcopy            # scan and copy to the clipboard (macOS)
```
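The `pbcopy` line above only works on macOS. A minimal sketch of a portable alternative follows; the `clip_cmd` function name is hypothetical, and it simply picks the first clipboard tool it finds on `PATH` (`pbcopy` on macOS, `wl-copy` or `xclip` on Linux, `clip.exe` on Windows via WSL or Git Bash).

```bash
# Hypothetical helper: print the clipboard command available on this machine.
# Returns non-zero if none of the known tools is installed.
clip_cmd() {
  if command -v pbcopy >/dev/null 2>&1; then
    echo "pbcopy"                        # macOS
  elif command -v wl-copy >/dev/null 2>&1; then
    echo "wl-copy"                       # Linux (Wayland)
  elif command -v xclip >/dev/null 2>&1; then
    echo "xclip -selection clipboard"    # Linux (X11)
  elif command -v clip.exe >/dev/null 2>&1; then
    echo "clip.exe"                      # Windows (WSL, Git Bash)
  else
    return 1
  fi
}

# Usage (left unquoted on purpose so "xclip -selection clipboard" splits into words):
# scanc -e py . | $(clip_cmd)
```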

Write output directly to a file:

```bash
scanc -e ts --tree -o scan.md src/
cat scan.md
```
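For a scan you repeat often, the flags can be bundled into a shell function. A minimal sketch, with a hypothetical function name `scan_py` and an example flag set built only from options in the CLI reference below: Python files only, paths matching `tests` or `migrations` skipped (assuming, as the quickstart's `-x "tests"` example suggests, that a pattern may match anywhere in the path), files over 512 KiB skipped, and the tree prepended.

```bash
# Hypothetical preset: Python sources only, skip tests/migrations,
# skip files over 512 KiB, prepend the tree, and write to scan.md.
# Usage: scan_py [PATH]   (defaults to the current directory)
scan_py() {
  scanc -e py \
    -x 'tests|migrations' \
    --max-size $((512 * 1024)) \
    --tree \
    -o scan.md \
    "${1:-.}"
}
```

A single alternation (`tests|migrations`) is used rather than repeating `-x`, since this sketch does not assume how multiple patterns are passed.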

---

## CLI Reference

```bash
scanc [OPTIONS] [PATHS...]
```

* `-e, --ext EXTS`          Comma‑separated extensions to include (e.g. `py,js`).
* `-i, --include-regex`     Regex patterns to include (matched against the full path).
* `-x, --exclude-regex`     Regex patterns to exclude (matched against the full path).
* `--no-default-excludes`   Disable built‑in ignore list.
* `-t, --tree`              Prepend directory tree (fenced code block).
* `-T, --tokens MODEL`      Print only the token count for the given LLM model (requires the `tiktoken` extra).
* `--max-size BYTES`        Skip files larger than BYTES (default 1 MiB).
* `--follow-symlinks`       Traverse symlinks when scanning.
* `-o, --out OUTFILE`       Write result to `OUTFILE` instead of stdout.
* `-f, --format FORMAT`     Output format: `markdown` (default) or `xml`.
* `-V, --version`           Show version and exit.
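Before passing a pattern to `-i` or `-x`, it can help to preview which files it hits. A rough sketch, assuming the pattern is tested against each file's path; `grep -E` (POSIX extended regex) stands in for scanc's own matching, and scanc's default excludes and `-e` filter are not applied, so treat the result as an approximation beyond simple patterns such as `tests|migrations`. The `preview_regex` name is hypothetical.

```bash
# Hypothetical helper: list files under DIR whose path matches REGEX.
# grep -E only approximates scanc's matching, and scanc's default
# excludes (node_modules, .git, ...) are not applied here.
# Usage: preview_regex REGEX [DIR]
preview_regex() {
  find "${2:-.}" -type f | grep -E -- "$1" | sort
}
```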

## Integration & Extensibility

- **Formatter Hook:** Customize output by plugging in your own formatter through an entry point.
- **Extras:** Use `scanc[tiktoken]` to enable token counting; more extras may follow.

## Docker usage

A ready-to-run container is published to GitHub Container Registry (GHCR).
It runs as **non-root** and scans the **mounted host directory** by default.

### Pull

```bash
docker pull ghcr.io/mqxym/scanc:latest
```

### Scan the current project (read-only mount)

```bash
# Linux/macOS (Bash/Zsh)
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc:latest .

# Windows PowerShell
docker run --rm -v "${PWD}:/work:ro" ghcr.io/mqxym/scanc:latest .
```

Because the container’s `WORKDIR` is `/work` and its `ENTRYPOINT` is `scanc`,
any arguments after the image name go straight to `scanc`, and `.` refers to the host folder mounted at `/work`.
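To avoid retyping the `docker run` prefix, a shell function can forward its arguments to the container's `scanc` entrypoint. A minimal sketch for Bash/Zsh, with a hypothetical function name `scanc_docker`; it mounts the current directory read-only, so save results with a host redirect rather than `-o`.

```bash
# Hypothetical wrapper: run the published image against the current directory.
# The mount is read-only; redirect stdout on the host to save the output.
scanc_docker() {
  docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc:latest "$@"
}

# Usage:
# scanc_docker -e py --tree . > scan.md
```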

### Write output to a file

Either redirect on the host:

```bash
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc:latest -e py --tree . > scan.md
```

...or mount as **writable** and write into `/work`:

```bash
docker run --rm -v "$PWD":/work ghcr.io/mqxym/scanc:latest -e py --tree -o /work/scan.md .
```

> Tip (Linux/macOS): when writing to the mount, map your UID/GID so the output file is owned by your host user:
>
> ```bash
> docker run --rm \
>   --user "$(id -u)":"$(id -g)" \
>   -v "$PWD":/work ghcr.io/mqxym/scanc:latest -o /work/scan.md .
> ```

### Examples

```bash
# Only Python & JS files, include directory tree
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc:latest -e py,js --tree .

# Token count only (needs the optional tiktoken extra, which is preinstalled in the image)
docker run --rm -v "$PWD":/work:ro ghcr.io/mqxym/scanc:latest --tokens gpt-4o .
```

## Licence

Released under the MIT Licence. See [LICENCE](LICENCE) for details.
