Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

OSX crash complaining of operation in progress in another thread when fork() was called #32499

Closed
elnur opened this issue Nov 2, 2017 · 16 comments
Labels
affects_2.4 This issue/PR affects Ansible v2.4 bug This issue/PR relates to a bug. macos macOS community support:core This issue/PR relates to code supported by the Ansible Engineering Team.

Comments

@elnur
Copy link

elnur commented Nov 2, 2017

ISSUE TYPE
  • Bug Report
COMPONENT NAME
  • lookup
ANSIBLE VERSION
ansible 2.4.1.0
  config file = /Users/elnur/proj/foo/ansible.cfg
  configured module search path = [u'/Users/elnur/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 2.7.14 (default, Oct  6 2017, 11:26:02) [GCC 4.2.1 Compatible Apple LLVM 9.0.0 (clang-900.0.37)]
CONFIGURATION
RETRY_FILES_ENABLED(/Users/elnur/proj/foo/ansible.cfg) = False

I have system variables set in .bash_profile:

export ANSIBLE_ETCD_URL=http://etcd.cloud.internal:2379
export ANSIBLE_ETCD_VERSION=v2
OS / ENVIRONMENT

Mac OS X High Sierra with latest updates. Ansible is installed using brew.

SUMMARY

Can't do an etcd lookup since upgrading to 2.4. Worked fine before 2.4.

STEPS TO REPRODUCE
- hosts: localhost
  gather_facts: yes

  pre_tasks:
    - debug:
        msg: "{{ lookup('etcd', 'deployment/environment') }}"
EXPECTED RESULTS
PLAY [localhost] **********************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [localhost]

TASK [debug] **************************************************************************************************************************
ok: [localhost] => {
    "changed": false,
    "msg": "development"
}

PLAY RECAP ****************************************************************************************************************************
localhost                  : ok=2    changed=0    unreachable=0    failed=0
ACTUAL RESULTS
PLAY [localhost] **********************************************************************************************************************

TASK [Gathering Facts] ****************************************************************************************************************
ok: [localhost]

TASK [debug] **************************************************************************************************************************
objc[22402]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[22402]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

Ansible just hangs after outputting that error message.

@ansibot ansibot added affects_2.4 This issue/PR affects Ansible v2.4 bug_report needs_triage Needs a first human triage before being processed. support:core This issue/PR relates to code supported by the Ansible Engineering Team. labels Nov 2, 2017
@nitzmahone
Copy link
Member

This is apparently due to some new security changes made in High Sierra that are breaking lots of Python things that use fork(). Rumor has it that adding
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before your Ansible run should clear it up. The code that's causing issues is well below Ansible in the stack.

@nitzmahone nitzmahone removed the needs_triage Needs a first human triage before being processed. label Nov 2, 2017
@nitzmahone nitzmahone changed the title etcd lookup is broken OSX crash complaining of operation in progress in another thread when fork() was called Nov 2, 2017
@elnur
Copy link
Author

elnur commented Nov 3, 2017

@nitzmahone, the question is why it works fine when I revert to Ansible 2.3. Doesn't it mean it's a regression introduced in 2.4?

@elnur
Copy link
Author

elnur commented Nov 5, 2017

It'd be really nice to have this fixed because I need new stuff from 2.4 but I'm stuck with 2.3 because of the bug.

At least acknowledging it's an Ansible regression instead of blaming OS X would be a step in the right direction.

@nitzmahone
Copy link
Member

It doesn't appear to be an Ansible regression- 2.3.0 fails for us the exact same way on High Sierra. Are you sure you didn't upgrade to High Sierra between 2.3.x and 2.4.1 (since it was released right around that time)?

This is the best examination of the underlying issue we've found: https://blog.phusion.nl/2017/10/13/why-ruby-app-servers-break-on-macos-high-sierra-and-what-can-be-done-about-it/ - so far the bugs opened on Python core for this issue have been closed (eg https://bugs.python.org/issue30837). Regardless, the issue appears to be well outside the Ansible codebase, and the behavior doesn't appear to have changed in any recent releases, so I don't think there's much we can do.

@s8i
Copy link

s8i commented Nov 7, 2017

@nitzmahone export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES helped me get past the initial fork safety issue - thank you

Winrm has issue still
These Windows will ping without pinging via Ansible
Also these Windows machines would also ping via Ansible previous to my High Sierra update

$ ansible all -m ping
descartes | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='descartes', port=5986): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x108f7e510>: Failed to establish a new connection: [Errno 61] Connection refused',))",
    "unreachable": true
}
schrodinger | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='schrodinger', port=5986): Max retries exceeded with url: /wsman (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x108d20290>: Failed to establish a new connection: [Errno 61] Connection refused',))",
    "unreachable": true
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

oracle | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
babbage | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
kitchenl1 | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

standup | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}
kitchenl2 | UNREACHABLE! => {
    "changed": false,
    "msg": "ssl: HTTPSConnectionPool(host='kitchenl2', port=5986): Max retries exceeded with url: /wsman (Caused by ConnectTimeoutError(<urllib3.connection.VerifiedHTTPSConnection object at 0x108f21b10>, 'Connection to kitchenl2 timed out. (connect timeout=30)'))",
    "unreachable": true
}
 [WARNING]: FATAL ERROR DURING FILE TRANSFER: Traceback (most recent call last):   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 276, in _winrm_exec
self._winrm_send_input(self.protocol, self.shell_id, command_id, data, eof=is_last)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-
packages/ansible/plugins/connection/winrm.py", line 256, in _winrm_send_input
protocol.send_message(xmltodict.unparse(rq))   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/protocol.py", line
207, in send_message     return self.transport.send_message(message)   File
"/usr/local/Cellar/ansible/2.4.1.0/libexec/lib/python2.7/site-packages/winrm/transport.py", line
202, in send_message     raise WinRMTransportError('http', error_message) WinRMTransportError:
(u'http', u'Bad HTTP response returned from server. Code 500')

shannon | FAILED! => {
    "failed": true,
    "msg": "winrm send_input failed"
}

@elnur
Copy link
Author

elnur commented Nov 9, 2017

It doesn't appear to be an Ansible regression- 2.3.0 fails for us the exact same way on High Sierra. Are you sure you didn't upgrade to High Sierra between 2.3.x and 2.4.1 (since it was released right around that time)?

brew lets me switch between different Ansible versions. So I switched back and forth between 2.3.0.0 and 2.4.1.0 and 2.3 works fine while 2.4 fails — both on High Sierra.

@elnur
Copy link
Author

elnur commented Nov 9, 2017

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES before your Ansible run should clear it up.

That fixed it for me. I can finally use 2.4. Thanks. 🎉

@mikedlr
Copy link
Contributor

mikedlr commented Jan 8, 2018

Further comment: this is almost certainly a bug in the python which apple distributes. Apple has changed their fork implementation to disallow forking with active threads. Taking the latest python distributed from python.org seems to fix this error without needing to set the environment variable (tested on python 3.6).

andrewkroh added a commit to andrewkroh/beats-tester that referenced this issue Feb 7, 2018
Add OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES to the environment to
workaround an issue with running Ansible on MacOS 10.13 (High Sierra).

See ansible/ansible#32499
@ansibot ansibot added bug This issue/PR relates to a bug. and removed bug_report labels Mar 7, 2018
@scottsb
Copy link
Contributor

scottsb commented Jun 28, 2018

For anybody else who comes across this running Ansible from Vagrant, you can work around this by adding this hack to your Vagrantfile (can be placed anywhere due to how Vagrant works). This allows vagrant up/provision to be run without having to export the variable yourself (especially useful if supporting a team using the same Vagrant config, as I do):

# Temporary workaround to Python bug in macOS High Sierra which can break Ansible
# https://github.com/ansible/ansible/issues/34056#issuecomment-352862252
# This is an ugly hack tightly bound to the internals of Vagrant.Util.Subprocess, specifically
# the jailbreak method, self-described as "quite possibly, the saddest function in all of Vagrant."
# That in turn makes this assignment the saddest line in all of Vagrantfiles.
ENV["VAGRANT_OLD_ENV_OBJC_DISABLE_INITIALIZE_FORK_SAFETY"] = "YES"

@dmrzzz
Copy link

dmrzzz commented Sep 8, 2018

Taking the latest python distributed from python.org seems to fix this error without needing to set the environment variable (tested on python 3.6).

I'm admittedly way out of my depth here, but AFAICT the above isn't as reassuring as it seems. Discussion in https://bugs.python.org/issue33725 suggests that

  • the reason we don't get this error using the python.org installers is because those installers aren't (yet) being built with a 10.13 SDK

  • the ultimate resolution is for framework and application code to either avoid or be very careful with os.fork()

If that's true, maybe this should be solvable in the Ansible codebase after all?

FWIW, I do see the error using a shiny new Python 3.7.0 from homebrew (unless I apply the environment variable workaround).

@mikedlr
Copy link
Contributor

mikedlr commented Sep 14, 2018

Trying to remember what this bug is about I came across this article which gives a good explanation.

Essentially, to safely support threaded applications we should eliminate all calls to fork except a) when immediately followed by an exec() b) at the very start of a program when creating a special worker which can then do subsequent forks but which never runs threads. Unfortunately "threaded applications" means all applications because there are now many libraries which privately embed threading.

donovanmuller added a commit to absa-subatomic-legacy/local-hadron-collider that referenced this issue Sep 25, 2018
* Verified working on OpenShift 3.10 cluster
* Added workaround for ansible/ansible#32499
* Tweaked Bitucket resource usage
@Sohair63
Copy link

I upgraded my system to Mojave

  • rails - 5.1.3
  • ruby - 2.3.3

I am getting this error which is not resolved by export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES

2.3.3 :002 > User.first
objc[56844]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called.
objc[56844]: +[__NSPlaceholderDictionary initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

exited rails console

@swaathi
Copy link

swaathi commented Dec 7, 2018

The export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES fix does not work in Mojave.

Did you get it to work @Sohair63?

@mikedlr
Copy link
Contributor

mikedlr commented Dec 7, 2018

@swaathi please could you open a new bug report with full information - especially all software versions (MacOS,Python,Ansible etc), the exact playbook you are using that triggers the crash and exactly how you run it. I'll upgrade soon and then attempt to debug if I can reproduce. Put a link to your new bug from #49207

N.B. you have to set the environment variable in the shell before you launch the ansible process - it's not enough to set it in the ansible process. That has occasionally been a problem with debugging this situation.

@Sohair63
Copy link

Sohair63 commented Dec 7, 2018

@swaathi Yes it is working fine for me now, but not sure how it got fixed.
maybe recent updates of Xcode fixed it.

@ghost
Copy link

ghost commented Dec 7, 2018

since I upgraded my system to Mojave
export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES fix was working I had to put it through more than once though, but now it completely stopped working.
rails -5.2.1
ruby -2.3.3

@ansible ansible locked and limited conversation to collaborators Dec 7, 2018
@dagwieers dagwieers added the macos macOS community label Jan 27, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
affects_2.4 This issue/PR affects Ansible v2.4 bug This issue/PR relates to a bug. macos macOS community support:core This issue/PR relates to code supported by the Ansible Engineering Team.
Projects
None yet
Development

No branches or pull requests

10 participants