Heisenbug or the true story of undefined behavior, part 1
22 Dec 2019 - Development Original link
One day I was developing a program to work with a python source code. A kind of analyzer for the static code analysis. I found a library Redbaron
and wanted conduct some research to understand its capabilities.No so fast cowboy
I wrote a lot of python code in Pycharm
created by the JetBrains
. After my tests via pytest
I’ve got the first bug. I’ve got the same problem like in #119 (Really, Is it a problem of IDE by the JetBrains?).
After debugging, I found some solutions, patches, and hacks.
Next, I’m telling you, dear reader, a story about a difference between hacks, patches, and solutions.
And Yeah, It hasn’t been a problem of IDE.
Let me start from excerpts of the code.
This part of the code for the version 0.6.1 that’s contained in pip:
This part of the code for the version of master in git:
In master branch, that problem didn’t appear, but the bug was not solved. It wasn’t appearing because last changes have double-negative condition:
- If RedBaron was running under debug then in repr method of Node class the condition if in_a_shell() is True and data flow was passing into str().
That’s true for thepython shell
andipython
shell too. And that’s why the bug was not occurring in the shell.
- If RedBaron was running without debug mode and not in the shell then the bug didn’t occur because in log function ‘redbaron.DEBUG` parameter is False. And repr function was not called. String representation of the object was not caused by the caller.
That’s right for the master branch. But for 0.6.1 it isn’t true.
Look carefully at this code in 0.6.1 and in master branch:
- the difference was in one comma in call of log function.
The cost of this bug was in one comma.=) But it is actually still deeper, check.
This problem occurs in 0.6.1 because in this code python tries to get a string before it calls a log function. It calls a repr function implicitly. In repr function in_a_shell()
was returned False for Pycharm or something else that’s not a python shell or ipython shell.
Next, it is trying to get a path :
https://github.com/PyCQA/redbaron/blob/0.6.1/redbaron/base_nodes.py#L951,
and, after a few intermediate calls, stops here:
https://github.com/PyCQA/redbaron/blob/master/redbaron/base_nodes.py#L121
, to get an index of the node in the parent node list, but a tree has not
synchronized yet with last changes:
https://github.com/PyCQA/redbaron/blob/0.6.1/redbaron/base_nodes.py#L1333.
Look, here we have already done insertion of the code. Node.data
includes insertion, but node.node_list
does not. Parent points to node_list
but node_list
doesn’t have a new insertion. Index
method raises the exception - ValueError(https://docs.python.org/2/tutorial/datastructures.html) because there is no such item.
Look at excerpt from https://github.com/PyCQA/redbaron/blob/master/redbaron/base_nodes.py#L121:
pos = parent.index(node.node_list if isinstance(node, ProxyList) else node)
for parent.index(node)
occurred ValueError
and message of this error calls repr again to get a representation of the object, I mean, that’s call is being called again and would try to get path and
index again. Because the Tree is not being synchronized before log call.
Yeap, Pycharm(py.test) tells about it =)
Let’s start the show!
- First solution(fast hack)write something like this into https://github.com/PyCQA/redbaron/blob/master/redbaron/utils.py#L33:
if os.getenv('PYCHARM_HOSTED'):
if os.getenv('IPYTHONENABLE'):
print("IPYTHONENABLE %s" % os.getenv('IPYTHONENABLE'))
print("PYCHARM_HOSTED : {0}".format(os.getenv('PYCHARM_HOSTED')))
if not os.getenv('PYCHARM_HOSTED'):
if __IPYTHON__:
print("IPYTHOSHELL %s" % __IPYTHON__)
I’ve tested this code and that’s why it looks how it looks to avoid exceptions and misunderstanding. But it was a hack. And maybe it was not useful.
- Second solution(patch)add
try, except
block to catchValueError
if an element cannot be found in the list https://github.com/PyCQA/redbaron/blob/master/redbaron/base_nodes.py#L121:
if isinstance(parent, NodeList):
try:
pos = parent.index(node.node_list if isinstance(node, ProxyList) else node)
except Exception:
return 0
In this part, we catch exception forValueError
inUserList
or another unknown exception(butUserList.index
raisesValueError
like a list). This is the hack too and code is not clean and clear. And this ain’t cool.
- Third solution(patch)
Delete
log
calls in https://github.com/PyCQA/redbaron/blob/master/redbaron/base_nodes.py#L1642. But the error occurred in another place. Or rewrite a logging function… But it has already done because a log function call was changed that’s satisfied declaration of this method. I described it above. - Fourth solution(solution)
#first test, test_insert failed, changes have not synchronized yet
def _synchronise(self):
# TODO(alekum): log method calls __repr__ implicitly to get a self.data representation
log("Before synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, self.data))
#log("Before synchronise, self.data = '%s' + '%s'", self.first_blank_lines, self.data)
super(LineProxyList, self)._synchronise()
#log("After synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, self.data))
log("After synchronise, self.data = '%s' + '%s'", self.first_blank_lines, self.data)
#second test, test_insert passed, changes have synchronized already
def _synchronise(self):
# TODO(alekum): log method calls __repr__ implicitly to get a self.data representation
#log("Before synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, elf.data))
log("Before synchronise, self.data = '%s' + '%s'", self.first_blank_lines, self.data)
super(LineProxyList, self)._synchronise()
log("After synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, self.data))
#log("After synchronise, self.data = '%s' + '%s'",self.first_blank_lines, self.data)
The solution, that I’ve described: actual state of the node are in node_list.
def _synchronise(self):
# TODO(alekum): log method calls __repr__ implicitly to get a self.data representation
log("Before synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, self.node_list))
#log("Before synchronise, self.data = '%s' + '%s'", self.first_blank_lines, self.data)
super(LineProxyList, self)._synchronise()
log("After synchronise, self.data = '%s' + '%s'" % (self.first_blank_lines, self.node_list))
#log("After synchronise, self.data = '%s' + '%s'",self.first_blank_lines, self.data)
I’ve tested it with master and 0.6.1, all was good.
That was a true solution because it was solving the problem instead of masking them. I think you are a good engineer if you are solving problems and know the difference between hack, patch, and solution.
PS> Update: My PR has been applied. https://github.com/PyCQA/redbaron/pull/122Thanks for your time, will wait for your comments.