docs: update documentation for CLI
anonrig committed May 15, 2023
1 parent f558210 commit e166e55
Showing 1 changed file with 73 additions and 79 deletions.
152 changes: 73 additions & 79 deletions docs/cli.md
@@ -1,4 +1,4 @@
## adaparse
## Command line interface (CLI)

The adaparse command-line tool takes URL strings (ASCII/UTF-8) and validates, normalizes, and queries them efficiently.

@@ -13,56 +13,93 @@ The adaparse command tool takes URL strings (ASCII/UTF-8) and it validates, norm
- `-p`, `--path`: Process all the URLs in a given file
- `-o`, `--output`: Output the results of the parsing to a file

### Usage/Examples:
### Performance

Well-formatted URL:
Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).

```bash
adaparse "http://www.google.com"
<details>
<summary>With the wikipedia_100k dataset, adaparse generates normalized URLs about **three times faster than trurl**.</summary>
```
Output:
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
```
http://www.google.com
</details>

<details>
<summary>With the top100 dataset, the adaparse tool is **twice as fast as trurl**.</summary>
```
time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
```
</details>


#### Comparison

Ill-formatted URL:
```
wikipedia 100k
ada ▏ 55 ms ███████▋
trurl ▏ 180 ms █████████████████████████
top100
ada ▏ 61 ms █████████████▍
trurl ▏ 113 ms █████████████████████████
```

The results will vary depending on your system. We invite you to run your own benchmarks.
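
As a rough check on the summary figures, the speedups can be derived from the wall-clock times in the chart above. The helper name `speedup` is hypothetical and used only for illustration; it simply divides the slower time by the faster one.

```bash
# Hypothetical helper: ratio of two wall-clock times in milliseconds.
# LC_ALL=C keeps the decimal point as "." regardless of the system locale.
speedup() {
  LC_ALL=C awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.1f\n", slow / fast }'
}

speedup 180 55   # wikipedia_100k: trurl 180 ms vs. ada 55 ms -> 3.3
speedup 113 61   # top100:         trurl 113 ms vs. ada 61 ms -> 1.9
```

These ratios only restate the measurements reported here; numbers from your own machine will differ.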

### Usage/Examples

#### Well-formatted URL

```bash
adaparse "h^tp:ws:/www.g00g.com"
adaparse "http://www.google.com"
```
Output:

```
Invalid URL: h^tp:ws:/www.g00g.com
http://www.google.com
```


Diagram flag:
#### Diagram

```bash
adaparse -d http://www.google.com/bal\?a\=\=11\#fddfds
```
```

Output:

```
http://www.google.com/bal?a==11#fddfds [38 bytes]
     | |             |   |     |
     | |             |   |     `------ hash_start
     | |             |   `------------ search_start 25
     | |             `---------------- pathname_start 21
     | |             `---------------- host_end 21
     | `------------------------------ host_start 7
     | `------------------------------ username_end 7
     `-------------------------------- protocol_end 5
```
http://www.google.com/bal?a==11#fddfds [38 bytes]
     | |             |   |     |
     | |             |   |     `------ hash_start
     | |             |   `------------ search_start 25
     | |             `---------------- pathname_start 21
     | |             `---------------- host_end 21
     | `------------------------------ host_start 7
     | `------------------------------ username_end 7
     `-------------------------------- protocol_end 5
```
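
To make the offsets in the diagram concrete, here is a small sketch that cuts the example URL at those positions. It assumes the offsets are 0-based byte positions, which is consistent with the numbers shown. The `slice` helper is hypothetical, and the value 31 for `hash_start` is inferred from the position of `#`, since the sample output does not print it.

```bash
url='http://www.google.com/bal?a==11#fddfds'

# Hypothetical helper: print bytes [start, end) of a string (0-based start,
# exclusive end), translated to cut's 1-based inclusive ranges.
slice() {
  printf '%s\n' "$1" | cut -c"$(( $2 + 1 ))-$3"
}

slice "$url" 0 5     # up to protocol_end               -> http:
slice "$url" 7 21    # host_start .. host_end           -> www.google.com
slice "$url" 21 25   # pathname_start .. search_start   -> /bal
slice "$url" 25 31   # search_start .. hash_start       -> ?a==11
slice "$url" 31 38   # hash_start .. end (38 bytes)     -> #fddfds
```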

#### Pipe Operator


### Piping Example

Ada can process URLs from piped input, making it easy to integrate with other command-line tools that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada. Given a list of URLs, one per line, we may query the normalized URL string (`href`) and detect any malformed URL:
Ada can process URLs from piped input, making it easy to integrate with other command-line tools
that produce ASCII or UTF-8 outputs. Here's an example of how to pipe the output of another command into Ada.
Given a list of URLs, one per line, we may query the normalized URL string (`href`) and detect any malformed URL:

```bash
cat dragonball_url.txt | adaparse --get href
@@ -95,14 +132,16 @@ www.gohan.com
If you omit `-g`, it will only provide a list of invalid URLs. This might be
useful if you want to validate a list of URLs quickly.
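
For quick validation it can help to count the failures instead of reading them. The sketch below assumes that invalid entries are reported on standard output as lines starting with `Invalid URL:`, as in the sample outputs in this document; the `count_invalid` helper is hypothetical.

```bash
# Hypothetical helper: count "Invalid URL:" lines read from standard input.
# grep -c exits non-zero when the count is 0, so "|| true" keeps the status clean.
count_invalid() {
  grep -c '^Invalid URL:' || true
}

# Intended use (assumes adaparse is on PATH and urls.txt is your own list):
#   cat urls.txt | adaparse | count_invalid

# Demonstration on canned lines shaped like adaparse's output:
printf 'Invalid URL: 1968:_Die_Kinder_der_Diktatur\nInvalid URL: 650luc:_Gangsta_Grillz\n' | count_invalid
```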

### Benchmark Runner

The benchmark flag can be used to output the time it takes to process piped input:

```bash
cat wikipedia_100k.txt | adaparse -b
```

```bash
Output:
```
Invalid URL: 1968:_Die_Kinder_der_Diktatur
Invalid URL: 58957:_The_Bluegrass_Guitar_Collection
Invalid URL: 650luc:_Gangsta_Grillz
@@ -120,26 +159,29 @@ read 5209265 bytes in 32819917 ns using 100000 lines, used 160 loads
0.1587226744053009 GB/s
```
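
The throughput line follows directly from the byte and nanosecond counts in the report: bytes per nanosecond is numerically the same as GB/s (taking 1 GB as 10^9 bytes). A minimal sketch of that arithmetic, using a hypothetical `throughput_gbps` helper:

```bash
# Hypothetical helper: bytes / nanoseconds, which equals GB/s for 1 GB = 10^9 bytes.
# LC_ALL=C keeps the decimal point as "." regardless of the system locale.
throughput_gbps() {
  LC_ALL=C awk -v bytes="$1" -v ns="$2" 'BEGIN { printf "%.4f\n", bytes / ns }'
}

throughput_gbps 5209265 32819917   # -> 0.1587, matching the 0.1587... GB/s reported above
```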

#### Saving result to file system

There is an option to output to a file on disk:

```bash
cat wikipedia_100k.txt | adaparse -o wiki_output.txt
```

as well as read in from a file on disk without going through cat:
As well as read in from a file on disk without going through cat:

```bash
adaparse -p wikipedia_top_100_txt
```

#### Advanced Usage

You may also combine different flags. For example, to extract only the host from the URLs stored in wikipedia_top100.txt, write the result to the test_write.txt file, and report the processing time:

```bash
adaparse -p wikipedia_top100.txt -o test_write.txt -g host -b
```
Console output:
Output:
```bash
read 5209265 bytes in 26737131 ns using 100000 lines, total_bytes is 5209265 used 160 loads
0.19483260937757307 GB/s(base)
@@ -160,51 +202,3 @@ en.wikipedia.org
en.wikipedia.org
(---snip---)
```
### Performance
Our `adaparse` tool may outperform other popular alternatives. We offer a [collection of
sets of URLs](https://github.com/ada-url/url-various-datasets) for benchmarking purposes.
The following results are on a MacBook Air 2022 (M2 processor) using LLVM 14. We
compare against [trurl](https://github.com/curl/trurl) version 0.6 (libcurl/7.87.0).
<details><summary>
With the wikipedia_100k dataset, we get that adaparse can generate normalized URLs about three
times faster than trurl.</summary>
<pre>
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| trurl --url-file - &> /dev/null 1
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,01s system 3% cpu 0,179 total
trurl --url-file - &> /dev/null 0,14s user 0,03s system 98% cpu 0,180 total
time cat url-various-datasets/wikipedia/wikipedia_100k.txt| ./build/tools/cli/adaparse -g href &> /dev/null
cat url-various-datasets/wikipedia/wikipedia_100k.txt 0,00s user 0,00s system 10% cpu 0,056 total
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 93% cpu 0,055 total
</pre>
</details>
<details><summary>With the top100 dataset, the adaparse tool is twice as fast as the trurl.</summary>
<pre>
time cat url-various-datasets/top100/top100.txt| trurl --url-file - &> /dev/null 1
cat url-various-datasets/top100/top100.txt 0,00s user 0,00s system 4% cpu 0,115 total
trurl --url-file - &> /dev/null 0,09s user 0,02s system 97% cpu 0,113 total
time cat url-various-datasets/top100/top100.txt| ./build/tools/cli/adaparse -g href &> /dev/null
cat url-various-datasets/top100/top100.txt 0,00s user 0,01s system 11% cpu 0,062 total
./build/tools/cli/adaparse -g href &> /dev/null 0,05s user 0,00s system 94% cpu 0,061 total
</pre>
</details>
The results will vary depending on your system. We invite you to run your own benchmarks.
```
wikipedia 100k
ada ▏ 55 ms ███████▋
trurl ▏ 180 ms █████████████████████████
top100
ada ▏ 61 ms █████████████▍
trurl ▏ 113 ms █████████████████████████
```
