Bringing Tailscale up fails #43

jjo93sa · 2020-10-04T12:54:00Z

Describe the bug
Been using this role for a while. Recently, I guess something in Tailscale changed, because there's now a problem bringing up the Tailscale connection, which fails every time (I'm using tailscale_args to set subnet routes).

To Reproduce
Steps to reproduce the behavior:

Go to use this role to install Tailscale
See failure at start-up

Expected behavior
Tailscale should start.

Screenshots
N/A.

Desktop (please complete the following information):

OS: Ubuntu 18.04, 20.04

Additional context
I think Tailscale introduced extra logging as part of "tailscale up", such that the test

tailscale_status.stdout | length == 0

fails, even when the application has successfully started. As a workaround, I've changed the comparison in the test to >=, but this is clearly a hack.

artis3n · 2020-10-04T18:40:37Z

Thanks for this issue - I noticed this recently as well and haven't had time to look into it. Something has definitely changed and the role needs to be updated.

artis3n · 2020-10-05T20:29:42Z

tailscale up seems to always pass in CI, but I have encountered similar consistent failures on new bare metal hosts using this role recently. That is unfortunate. Probably won't happen until this weekend, but I'll spin up a bunch of spot instances and see if I can nail down the exact failing assumption and make it reproducible for testing.

jjo93sa · 2020-10-05T20:35:41Z

I was going to try the same. Might get around to it tomorrow. I tried repeating the command 'tailscale up' on a host where it had already been executed, and it didn't seem to generate any output. I suspect the output only causes a problem on the first run. At least, that's my working theory.

artis3n · 2020-10-06T02:39:54Z

If the machine is already connected, tailscale up returns exit code 0 with no stdout content. If --authkey has some issue during that first call, then tailscale up throws an OAuth URL to open in your browser and waits for that out of band auth to succeed. I am guessing that --authkey is perhaps not formatting correctly anymore and then the task fails timing out for the OAuth grant. This is a guess I need to isolate and reproduce.

jjo93sa · 2020-10-08T14:55:22Z

I've tried to repeat the Tailscale install using this role on a clean VM. When tailscale status is issued within task "Check if Tailscale is connected", I get the following logging in Ansible's debug output:

"stdout": "[L+V9o] tx= 0 rx= 0 10.10.20.251:41641, 172.17.0.1:41641", "stdout_lines": [ "[L+V9o] tx= 0 rx= 0 10.10.20.251:41641, 172.17.0.1:41641"

In the following task, "Bring Tailscale Up" there is the original test I described:

when: tailscale_status.stdout | length == 0

And this is what is causing this task to be skipped:

skipping: [gargantua] => { "changed": false, "skip_reason": "Conditional result was False" }

Thus, whatever test we apply to the stdout string captured in the "Check if Tailscale is connected" task needs to take into account that new Tailscale versions print text on success where they previously printed nothing. From the output, it is difficult to tell which part of the tailscale status output indicates success, although I do note that the exit code when running it manually is 0. Perhaps Tailscale is a good citizen and uses non-zero return codes to indicate failure; it certainly seems to, from my (very inexperienced) reading of the CLI code.

I've therefore opted (in the interim) to use the following conditional, which is tested to work:

when: tailscale_status.rc == 0
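For readers following along, here is a minimal sketch of how the two tasks named above might fit together with this interim conditional. The task names and the variables tailscale_auth_key and tailscale_args come from this thread; the exact module arguments are assumptions and may not match the role's real source.

```yaml
# Sketch only: module arguments are assumed, not copied from the role.
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  changed_when: false   # a status query does not change the host
  failed_when: false    # leave the decision to the next task

- name: Bring Tailscale Up
  command: "tailscale up --authkey={{ tailscale_auth_key }} {{ tailscale_args | default('') }}"
  no_log: true          # keep the auth key out of the play output
  # Original test: run only when status printed nothing.
  # when: tailscale_status.stdout | length == 0
  # Interim workaround: run whenever status exits with code 0.
  when: tailscale_status.rc == 0
```

If tailscale status exits with 0 on every run, this workaround would call tailscale up on every play, so it trades the idempotency check for reliability.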

HTH

artis3n · 2020-10-08T17:16:33Z

That is helpful, thank you! I have not had a chance to test myself yet.

I did file this against Tailscale a few months ago and I was seeing exit code 0 in all circumstances, so unless that has changed I do not think we can rely on exit code status. I think we'll be able to write a less brittle conditional check, though. Hoping for some time this weekend to dig into it.

jjo93sa · 2020-10-08T17:35:26Z

I definitely saw non-zero return codes from `tailscale status`, so perhaps things have moved on since you raised that issue? Anyway, do let me know if there's something I can help with.

artis3n · 2020-10-11T21:23:04Z

Brand new Ubuntu 20.04 AMI:

So the when conditional is still accurate. However:

ubuntu@ip-172-31-87-21:~$ sudo tailscale up --help
USAGE
  up [flags]

"tailscale up" connects this machine to your Tailscale network,
triggering authentication if necessary.

The flags passed to this command are specific to this machine. If you don't
specify any flags, options are reset to their default.

FLAGS
  -accept-dns true                           accept DNS configuration from the admin panel
  -accept-routes false                       accept routes advertised by other Tailscale nodes
  -advertise-routes ...                      routes to advertise to other nodes (comma-separated, e.g. 10.0.0.0/8,192.168.0.0/24)
  -advertise-tags ...                        ACL tags to request (comma-separated, e.g. eng,montreal,ssh)
  -authkey ...                               node authorization key
  -enable-derp true                          enable the use of DERP servers
  -host-routes true                          install host routes to other Tailscale nodes
  -hostname ...                              hostname to use instead of the one provided by the OS
  -login-server https://login.tailscale.com  base URL of control server
  -netfilter-mode on                         netfilter mode (one of on, nodivert, off)
  -shields-up false                          don't allow incoming connections
  -snat-subnet-routes true                   source NAT traffic to local routes advertised with -advertise-routes

The syntax for the flags has changed: instead of --authkey it is now -authkey.

But that doesn't seem to matter; I can still successfully auth with the -- form.

However, with both forms of authkey, putting in an invalid auth key hangs the process until I manually quit (or, I am supposing, Ansible times out).

Now on to debugging the Ansible role directly on an instance to see what's going on. The commands run by the role should be working correctly, unless some formatting issue appeared out of nowhere with the authkey variable. I think that is unlikely.

Can you elaborate on the circumstances where you saw non-zero return codes from tailscale status?

artis3n · 2020-10-11T21:24:42Z

When tailscale status is issued within task "Check if Tailscale is connected", I get the following logging in Ansible's debug output

The output from that command as you describe it is an already authenticated Tailscale node, so the role correctly skips running up.

artis3n · 2020-10-11T21:37:55Z

Testing steps:

Ubuntu -

sudo apt install python3 python3-pip ansible
ansible-galaxy install artis3n.tailscale

Copy molecule/default/converge.yml and modify the tailscale_auth_key appropriately
Set hosts: localhost and connection: local on the playbook

Amazon Linux 2:

sudo yum install python python-pip
pip install ansible
ansible-galaxy install artis3n.tailscale

Copy molecule/default/converge.yml and modify the tailscale_auth_key appropriately
Set hosts: localhost and connection: local on the playbook

Could not reproduce on Ubuntu 20.04 or Amazon Linux 2. The role successfully auth'd and connected the machine from a blank slate. I experienced an issue on PopOS 20.04 on a personal host but didn't dig into it. Can't reproduce that now.

jjo93sa · 2020-10-12T11:15:36Z

The logging I provided in #43 (comment) was generated on a fresh Ubuntu 18.04 installation running on a KVM host, using this role executed as part of a playbook run with verbose logging. I could try it again, but skip everything from the tailscale status task to the end, and run those final steps manually to see if there's any logging/difference.

jjo93sa · 2020-10-12T17:28:59Z

OK, I tried this again. I spun up a brand new Ubuntu 20.04.1 KVM guest. I applied the tag never to the tasks "Check if Tailscale is connected" and "Bring Tailscale Up" in the role, so Ansible did not execute them. I then logged into the VM and generated this output by manually executing the tasks:

This shows the logging that I see with tailscale status. It also shows the rc is 0.

So, now I run the tailscale up command with my authkey, and then status again.

Confirmed that the link is up: I can ping help.ipn.dev AOK. For reference,

ansible@gargantua:~$ sudo tailscale version
1.1.527-gf4f1e2e09

This experience is consistent for me on 18.04 and 20.04, both as KVM guests and on Raspberry Pi "bare metal" installs.

Let me know if any other testing would be useful.
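For anyone repeating this test, the tagging step described above could look roughly like the sketch below. never is a special Ansible tag: a task carrying it is skipped unless that tag is explicitly requested. The task body is an assumption; only the task name and the tag are taken from this thread.

```yaml
# Sketch only: the command line is assumed, not copied from the role.
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  tags:
    - never   # skipped unless the play is run with --tags never
```

The same tags entry would go on the "Bring Tailscale Up" task so that both steps can be run by hand on the guest instead.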

artis3n · 2020-10-12T17:41:35Z

Do the manual setup commands for Tailscale work in your virtualized Ubuntu host? I will try the role against an Ubuntu VM tonight. This role does not yet support Raspberry Pi.

jjo93sa · 2020-10-12T17:59:49Z

That last bunch of screenshots showed commands I executed manually on my Ubuntu guests, after the Tailscale role installed the package. Everything works fine, but you'll note the logging from the status command. For some reason you get no logging from the status command, but using the same OS, I get that one line of output. That's what is consistently borking my execution of the role.

I've had no trouble running this role on Raspberry Pi with Ubuntu. In fact, you can see in the screenshot there is a node "tpin1", which is a Raspberry Pi CM3+ running Ubuntu 20.04 for ARM. All my VMs are built from scratch using ISOs downloaded directly from Canonical; there's nothing custom about them.

If the purpose of running the status command after installation is to check the daemon is running, perhaps there's a better way?

artis3n · 2020-10-12T19:56:00Z

The purpose of running the status command is to check whether you have authenticated to Tailscale - so I don't think an alternate command would be better. I will try to reproduce on an Ubuntu VM. The return code is always 0 for status, so I can't use that. If needed, I can more intelligently regex on the status stdout to check whether the server is authenticated to Tailscale, but I'd really like to nail down this as a reproducible case on my end to make that happen correctly.

jjo93sa · 2020-10-12T20:09:36Z

I'd have thought authentication to Tailscale wouldn't happen until the `tailscale up` command, when we supply the authkey? Does the call to status need to come after the up command, not before?

artis3n · 2020-10-12T21:25:24Z

That check is for idempotency - to not attempt to re-authenticate if the node is already authenticated.

jjo93sa · 2020-10-13T08:15:25Z

Ah, yes, I knew that, sorry.

I've been checking through the source for the Tailscale CLI's status command, and this commit is where I think the output was changed. In particular, these lines were added:

if statusArgs.self && st.Self != nil {
  printPS(st.Self)
}

With comment: "cmd/tailscale: add local node's information to status output (by default)"

artis3n · 2020-10-21T14:14:03Z

Can you pass a copy of the way you invoke the role? e.g. to match

there's now a problem bringing up the Tailscale connection, which fails every time (I'm using tailscale_args to set subnet routes)

jjo93sa · 2020-10-22T03:15:02Z

This is what I have at the moment.

- name: Task 4 - Install Tailscale
  include_role:
    name: ansible-role-tailscale
  vars:
    release_stability: stable
    tailscale_args: "--accept-routes=false --advertise-routes={{ tailscale_subnets | join(',') }}"
    tailscale_auth_key: !vault |
              $ANSIBLE_VAULT;1.1;AES256
              35363...6361
  tags: [ tailscale, always ]

artis3n · 2020-10-24T19:31:10Z

This is my output on a guest Ubuntu 20.04 VM.

artis3n · 2020-10-24T19:32:05Z

---
- name: Test
  hosts: localhost
  connection: local
  tasks:
    - name: "Include artis3n.tailscale"
      include_role:
        name: artis3n.tailscale
      vars:
        tailscale_auth_key: !vault |
            $ANSIBLE_VAULT;1.2;AES256;tailscale
            ....

artis3n · 2020-10-24T20:03:15Z

Similar success with (reverted to clean snapshot):

---
- name: Test
  hosts: localhost
  connection: local
  tasks:
    - name: "Include artis3n.tailscale"
      include_role:
        name: artis3n.tailscale
      vars:
        tailscale_args: "--accept-routes=false --advertise-routes=10.0.0.0/24,10.0.1.0/24"
        tailscale_auth_key: !vault |
            $ANSIBLE_VAULT;1.2;AES256;tailscale
            ....

artis3n · 2020-10-25T01:34:15Z

Try running ansible-galaxy list and ensure you are using version v1.6.1 of this role
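One way to make the installed version explicit is to pin it in a role requirements file. This is a minimal sketch, assuming a file named requirements.yml (a hypothetical name for this example) and the v1.6.1 version mentioned above:

```yaml
# requirements.yml (hypothetical file name for this sketch)
roles:
  - name: artis3n.tailscale
    version: v1.6.1   # version referred to in this comment
```

Running ansible-galaxy install -r requirements.yml --force should replace any older copy of the role, and ansible-galaxy list can then confirm which version is installed.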

jjo93sa · 2020-10-25T09:16:32Z

Thanks for your help on this one. I've also been trying some experiments. I created a new Ubuntu 20.04 x86_64 VM under VMware Fusion. I executed the installation commands from the Tailscale "Getting Started" page.

Using the stable branch, I get the same result as you: no logging upon execution of tailscale status. The version of Tailscale was 1.0.5 (IIRC)

Using the unstable branch, I get the logging as described earlier (status command result doesn't depend on sudo):

So, it might seem that this is the solution to the problem. The only concern I have is that I'm sure I've tried both stable and unstable branches using the Ansible role, and experienced the problem in both cases. I guess that's the next thing to test? However, if the unstable branch is likely to migrate to stable, it might be worth addressing that in this role ahead of time?

artis3n · 2020-10-25T14:13:23Z

I am not sure why you are seeing that on the stable branch when executing the role, but I'm not able to reproduce that.

But if this is on the unstable branch, this is a good heads up that a fix will be needed. I will play with that but I'd rather see it become behavior on the stable branch first before I invest a lot of time in resolving it. This behavior may not make it into stable.

artis3n · 2020-10-30T20:47:55Z

Aaand tailscale 1.2.0 is released so let's see what happens with renewed testing

artis3n · 2020-11-01T23:24:43Z

1.2.0 doesn't appear to have broken the role, so I still cannot reproduce this. Going to leave this open in case you manage to identify why you are seeing this behavior. FWIW the VM testing I did with Ubuntu 20.04 was with VMware Workstation 15.x.

artis3n · 2020-11-07T15:27:22Z

Got it! On Tailscale 1.2.2 I am seeing the behavior in this issue. So it did make its way from unstable to stable.

TASK [artis3n.tailscale : Tailscale Version] ***********************************
ok: [instance] => {
    "tailscale_version.stdout": "1.2.2\n  tailscale commit: 76c2982d8832b9a70305a24abcc600486e39b523\n  go version: go1.15.4"
}

TASK [artis3n.tailscale : Bring Tailscale Up] **********************************
skipping: [instance]

TASK [artis3n.tailscale : Print Status if Tailscale Up Is Skipped - Please Include in GitHub Issue] ***
ok: [instance] => {
    "tailscale_status": {
        "changed": false,
        "cmd": [
            "tailscale",
            "status"
        ],
        "delta": "0:00:00.006949",
        "end": "2020-11-07 15:25:24.602192",
        "failed": false,
        "rc": 0,
        "start": "2020-11-07 15:25:24.595243",
        "stderr": "",
        "stderr_lines": [],
        "stdout": "[L+V9o]                                            tx=       0 rx=       0       ",
        "stdout_lines": [
            "[L+V9o]                                            tx=       0 rx=       0       "
        ]
    }
}
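A diagnostic task of the kind that produced the output above might be sketched as follows. The task name and the registered variable tailscale_status appear in this thread; the when condition is an assumption about when "Bring Tailscale Up" gets skipped, not the role's confirmed logic.

```yaml
# Sketch only: the condition is assumed, not copied from the role.
- name: Print Status if Tailscale Up Is Skipped - Please Include in GitHub Issue
  debug:
    var: tailscale_status   # dumps rc, stdout, stderr and timing fields
  when: tailscale_status.stdout | length > 0
```

Printing the whole registered result rather than only stdout keeps the return code and stderr visible, which is what made the one-line status output easy to spot here.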

Fixes #43

artis3n · 2020-11-07T19:26:21Z

Merging the PR auto-closed this issue - v1.8.0 is now on Ansible Galaxy. That version should fix this issue. Re-open if that is not the case

jjo93sa added the bug:needs-reproduction A reported bug that needs to be confirmed and reproduced. label Oct 4, 2020

artis3n added bug This bug is confirmed and can be reproduced. and removed bug:needs-reproduction A reported bug that needs to be confirmed and reproduced. labels Oct 4, 2020

artis3n added the hacktoberfest-accepted label Oct 11, 2020

artis3n added bug:needs-reproduction A reported bug that needs to be confirmed and reproduced. and removed bug This bug is confirmed and can be reproduced. labels Oct 11, 2020

artis3n removed the hacktoberfest-accepted label Nov 1, 2020

artis3n added a commit that referenced this issue Nov 7, 2020

Correctly verifies state with new status output

614ae1e

Fixes #43

artis3n closed this as completed in 795afd4 Nov 7, 2020

artis3n mentioned this issue Nov 11, 2020

Fix Arch container, Fix Breaking Change to 'tailscale status' #59

Merged

artis3n mentioned this issue May 4, 2023

Workflow and testing improvements #328

Merged
