Bringing Tailscale up fails #43
Comments
Thanks for this issue - I noticed this recently as well and haven't had time to look into it. Something has definitely changed and the role needs to be updated. |
|
I was going to try the same. Might get around to it tomorrow. I tried repeating the command `tailscale up` on a host where it had already been executed, and it didn't seem to generate any output. I suspect the output only causes a problem on the first run. At least that's my working theory.
60 % of the time, it works every time
… On 5 Oct 2020, at 21:29, Ari Kalfus ***@***.***> wrote:
tailscale up seems to always pass in CI, but I have encountered similar consistent failures on new bare metal hosts using this role recently. That is unfortunate. Probably won't happen until this weekend, but I'll spin up a bunch of spot instances and see if I can nail down the exact failing assumption and make it reproducible for testing.
|
If the machine is already connected, |
I've tried to repeat the Tailscale install using this role on a clean VM. When
In the following task, "Bring Tailscale Up" there is the original test I described:
And this is what is causing this task to be skipped:
Thus, whatever test we apply to the stdout string captured in the "Check if Tailscale is connected" task needs to take into account that new Tailscale versions print text on success where they once printed nothing. From the output I see, it is difficult to determine what content to test for. I've therefore opted (in the interim) to use the following conditional, which is tested to work:
HTH |
That is helpful, thank you! I have not had a chance to test myself yet. I did file this against Tailscale a few months ago and I was seeing exit code 0 in all circumstances, so unless that has changed I do not think we can rely on exit code status. I think we'll be able to write a less brittle conditional check, though. Hoping for some time this weekend to dig into it. |
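One way to settle the exit-code question is to have a playbook print what the check actually returns on the affected host. The tasks below are a minimal sketch, not the role's own code: the task name and the `tailscale_status` variable follow the names used in this thread, while `failed_when: false` and the debug task are assumptions added for illustration.

```yaml
# Sketch only: capture the status check and print its return code and output.
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  changed_when: false   # a read-only check should never report "changed"
  failed_when: false    # assumption: keep going on a non-zero rc so it can be inspected

- name: Show what the status check returned
  debug:
    msg:
      rc: "{{ tailscale_status.rc }}"
      stdout_lines: "{{ tailscale_status.stdout_lines }}"
      stderr_lines: "{{ tailscale_status.stderr_lines }}"
```

Running this once before and once after authenticating the node would show whether either the return code or the output differs between the two states.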
I definitely saw non-zero return codes from `tailscale status` so perhaps things have moved on since you raised that issue? Anyway, do let me know if there’s something I can help with
|
Brand new Ubuntu 20.04 AMI:
So the when conditional is still accurate. However:
ubuntu@ip-172-31-87-21:~$ sudo tailscale up --help
USAGE
up [flags]
"tailscale up" connects this machine to your Tailscale network,
triggering authentication if necessary.
The flags passed to this command are specific to this machine. If you don't
specify any flags, options are reset to their default.
FLAGS
-accept-dns true accept DNS configuration from the admin panel
-accept-routes false accept routes advertised by other Tailscale nodes
-advertise-routes ... routes to advertise to other nodes (comma-separated, e.g. 10.0.0.0/8,192.168.0.0/24)
-advertise-tags ... ACL tags to request (comma-separated, e.g. eng,montreal,ssh)
-authkey ... node authorization key
-enable-derp true enable the use of DERP servers
-host-routes true install host routes to other Tailscale nodes
-hostname ... hostname to use instead of the one provided by the OS
-login-server https://login.tailscale.com base URL of control server
-netfilter-mode on netfilter mode (one of on, nodivert, off)
-shields-up false don't allow incoming connections
-snat-subnet-routes true source NAT traffic to local routes advertised with -advertise-routes
The syntax for the flags has changed. Instead of
But that doesn't seem to matter, I can successfully auth with
However with both uses of
Now on to debugging the Ansible role directly on an instance to see what's going on. The commands run by the role should be working correctly, unless some formatting issue appeared out of nowhere with the authkey variable. I think that is unlikely. Can you elaborate on the circumstances where you saw non-zero return codes from `tailscale status`? |
The output from that command as you describe it is an already authenticated Tailscale node, so the role correctly skips running |
Testing steps:
Ubuntu:
sudo apt install python3 python3-pip ansible
ansible-galaxy install artis3n.tailscale
Amazon Linux 2:
sudo yum install python python-pip
pip install ansible
ansible-galaxy install artis3n.tailscale
Could not reproduce on Ubuntu 20.04 or Amazon Linux 2. The role successfully auth'd and connected the machine from a blank slate. I experienced an issue on PopOS 20.04 on a personal host but didn't dig into it. Can't reproduce that now. |
The logging I provided in #43 (comment) was generated on a fresh Ubuntu 18.04 installation running on a KVM host, using this role executed as part of a playbook run with verbose logging. I could try it again, but skip everything from the `tailscale status` check to the end, and run those final steps manually to see if there's any logging/difference. |
OK, I tried this again. I spun up a brand new Ubuntu 20.04.1 KVM guest. I applied the tag `never` to the tasks "Check if Tailscale is connected" and "Bring Tailscale Up" in the role, so these were not executed by Ansible. I logged into the VM and generated this output by manually executing the tasks:
This shows the logging that I see with
So, now I run the
Confirmed that the link is up: I can ping help.ipn.dev AOK. For reference,
ansible@gargantua:~$ sudo tailscale version
1.1.527-gf4f1e2e09
This experience is consistent for me on 18.04 and 20.04, both as KVM guests and on Raspberry Pi "bare metal" installs. Let me know if any other testing would be useful. |
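For anyone repeating this test, skipping the two tasks with Ansible's special `never` tag could look like the sketch below. The task bodies are assumptions based on the task names quoted in this thread, not a copy of the role, and `no_log` is added here only to keep the auth key out of the logs.

```yaml
# Sketch only: a task tagged "never" is skipped unless one of its tags
# is requested explicitly on the command line with --tags.
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  changed_when: false
  tags: [never]

- name: Bring Tailscale Up
  # Assumed command line; the role's real invocation may differ.
  command: "tailscale up --authkey={{ tailscale_auth_key }} {{ tailscale_args | default('') }}"
  no_log: true
  tags: [never]
```

With both tasks skipped, the role only installs the package, and the status and up commands can then be run by hand on the host.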
Do the manual setup commands for Tailscale work in your virtualized Ubuntu host? I will try the role against an Ubuntu VM tonight. This role does not yet support Raspberry Pi. |
That last bunch of screenshots was manually executed commands on my Ubuntu guests, after the Tailscale role installed the package. Everything works fine, but you’ll note the logging from the status command. For some reason you get no logging from the status command, but using the same O/S, I get that one line of output. That’s what is consistently borking my execution of the role.
I’ve had no trouble running this role on Raspberry Pi with Ubuntu. In fact you can see in screenshot there is a node “tpin1” which is a Raspberry Pi CM3+ running Ubuntu 20.04 for ARM.
All my VMs are built from scratch using ISOs downloaded directly from Canonical; there’s nothing custom about them.
If the purpose of running the status command after installation is to check the daemon is running, perhaps there’s a better way?
|
The purpose of running the status command is to check whether you have authenticated to Tailscale - so I don't think an alternate command would be better. I will try to reproduce on an Ubuntu VM. The return code is always 0 for status, so I can't use that. If needed, I can more intelligently regex on the status stdout to check whether the server is authenticated to Tailscale, but I'd really like to nail down this as a reproducible case on my end to make that happen correctly. |
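A regex-based check of the kind mentioned here might be sketched as follows. This is an illustration under stated assumptions, not the role's implementation: `tailscale_connected_pattern` is a hypothetical variable, and the thread does not show which text in the status output reliably indicates an authenticated node, so the pattern itself is left to the reader.

```yaml
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  changed_when: false
  failed_when: false   # assumption: do not abort the play on a non-zero rc

- name: Bring Tailscale Up
  # Assumed command line; the role's real invocation may differ.
  command: "tailscale up --authkey={{ tailscale_auth_key }} {{ tailscale_args | default('') }}"
  no_log: true
  # tailscale_connected_pattern is hypothetical: set it to a regex that
  # matches the status output only when the node is already authenticated.
  when: tailscale_status.stdout is not search(tailscale_connected_pattern)
```

Matching on a pattern rather than on output length keeps the check idempotent even if a release starts printing extra lines.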
I’d have thought authentication to Tailscale wouldn’t have happened until the `tailscale up` command, when we supply the authkey? Does the call to status need to come after the up command, not before?
|
That check is for idempotency - to not attempt to re-authenticate if the node is already authenticated. |
Ah, yes, I knew that, sorry. I'm checking through the Tailscale CLI status, and this commit is where I think the output was changed. In particular, these lines were added:
if statusArgs.self && st.Self != nil {
	printPS(st.Self)
}
With comment: "cmd/tailscale: add local node's information to status output (by default)" |
Can you pass a copy of the way you invoke the role? e.g. to match
|
This is what I have at the moment.
- name: Task 4 - Install Tailscale
  include_role:
    name: ansible-role-tailscale
  vars:
    release_stability: stable
    tailscale_args: "--accept-routes=false --advertise-routes={{ tailscale_subnets | join(',') }}"
    tailscale_auth_key: !vault |
      $ANSIBLE_VAULT;1.1;AES256
      35363...6361
  tags: [ tailscale, always ]
|
---
- name: Test
  hosts: localhost
  connection: local
  tasks:
    - name: "Include artis3n.tailscale"
      include_role:
        name: artis3n.tailscale
      vars:
        tailscale_auth_key: !vault |
          $ANSIBLE_VAULT;1.2;AES256;tailscale
          ....
|
Similar success with (reverted to clean snapshot):
---
- name: Test
  hosts: localhost
  connection: local
  tasks:
    - name: "Include artis3n.tailscale"
      include_role:
        name: artis3n.tailscale
      vars:
        tailscale_args: "--accept-routes=false --advertise-routes=10.0.0.0/24,10.0.1.0/24"
        tailscale_auth_key: !vault |
          $ANSIBLE_VAULT;1.2;AES256;tailscale
          ....
|
Try running |
Thanks for your help on this one. I've also been trying some experiments. I created a new Ubuntu 20.04 x86_64 VM under VMware Fusion and executed the installation commands from the Tailscale "Getting Started" page.
Using the stable branch, I get the same result as you: no logging upon execution of `tailscale status`.
Using the unstable branch, I get the logging as described earlier (the status command result doesn't depend on sudo):
So, it might seem that this is the solution to the problem. The only concern I have is that I'm sure I've tried both stable and unstable branches using the Ansible role, and experienced the problem in both cases. I guess that's the next thing to test? However, if the unstable branch is likely to migrate to stable, it might be worth addressing that in this role ahead of time? |
I am not sure why you are seeing that on the stable branch when executing the role, but I'm not able to reproduce that. But if this is on the unstable branch, this is a good heads up that a fix will be needed. I will play with that but I'd rather see it become behavior on the stable branch first before I invest a lot of time in resolving it. This behavior may not make it into stable. |
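To test the role against the unstable packages, a playbook along these lines could be used. It is a sketch modelled on the test playbooks earlier in this thread: `release_stability: unstable` is assumed to be the counterpart of the `stable` value shown above, and `vault_tailscale_auth_key` is a hypothetical variable standing in for a vaulted key.

```yaml
---
- name: Reproduce against the unstable branch
  hosts: localhost
  connection: local
  tasks:
    - name: "Include artis3n.tailscale"
      include_role:
        name: artis3n.tailscale
      vars:
        release_stability: unstable   # assumed value, mirroring "stable"
        tailscale_auth_key: "{{ vault_tailscale_auth_key }}"   # hypothetical variable
```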
Aaand tailscale 1.2.0 is released so let's see what happens with renewed testing |
1.2.0 doesn't appear to have broken the role, so I still cannot reproduce this. Going to leave this open in case you manage to identify why you are seeing this behavior. FWIW the VM testing I did with Ubuntu 20.04 was with VMware Workstation 15.x. |
Got it! On Tailscale 1.2.2 I am seeing the behavior in this issue. So it did make its way from unstable to stable.
|
Merging the PR auto-closed this issue - |
Describe the bug
Been using this role for a while. Recently, I guess something in Tailscale changed, because there's now a problem bringing up the Tailscale connection, which fails every time (I'm using tailscale_args to set subnet routes).
To Reproduce
Steps to reproduce the behavior:
Expected behavior
Tailscale should start.
Screenshots
N/A.
Desktop (please complete the following information):
Additional context
I think Tailscale introduced extra logging as part of "tailscale up", such that the test
tailscale_status.stdout | length == 0
fails, even when the application has successfully started. As a workaround, I've changed the comparison to >=, but this is clearly a hack.
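The pattern described above can be sketched as two tasks. The condition is the one quoted in this report; the command lines and `no_log` are assumptions for illustration and may not match the role exactly.

```yaml
- name: Check if Tailscale is connected
  command: tailscale status
  register: tailscale_status
  changed_when: false

- name: Bring Tailscale Up
  # Assumed command line; the role's real invocation may differ.
  command: "tailscale up --authkey={{ tailscale_auth_key }} {{ tailscale_args | default('') }}"
  no_log: true
  # Runs only when the status check printed nothing at all. If the status
  # command prints any text, this task is skipped.
  when: tailscale_status.stdout | length == 0
```

Changing the comparison to `>=` makes the condition always true, because a length is never negative. The task then runs on every play, which brings the connection up but gives up the idempotency the check was meant to provide.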