Bug 24067

Summary: python-lxml new security issue CVE-2018-19787
Product: Mageia Reporter: David Walser <luigiwalser>
Component: Security Assignee: QA Team <qa-bugs>
Status: RESOLVED FIXED QA Contact: Sec team <security>
Severity: major    
Priority: Normal CC: geiger.david68210, herman.viaene, lewyssmith, makowski.mageia, marja11, sysadmin-bugs
Version: 6 Keywords: advisory, validated_update
Target Milestone: ---   
Hardware: All   
OS: Linux   
Whiteboard: MGA6-32-OK MGA6-64-OK
Source RPM: python-lxml-3.8.0-1.1.mga6.src.rpm CVE:
Status comment:

Description David Walser 2018-12-25 21:52:42 CET
Fedora has issued an advisory on December 21:
https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/3RVMDZTRGFNPQRD6MD74QL2A5IOBPFXQ/

The issue is fixed upstream in 4.2.5.
Comment 1 David Walser 2018-12-26 02:10:10 CET
Ubuntu has issued an advisory for this on December 10:
https://usn.ubuntu.com/3841-1/
Comment 2 David GEIGER 2018-12-26 05:48:56 CET
Fixed for mga6!

CC: (none) => geiger.david68210

Comment 3 Marja Van Waes 2018-12-26 08:01:00 CET
(In reply to David GEIGER from comment #2)
> Fixed for mga6!

Thanks David :-)

Does someone have time to write an advisory?

Assigning to the Python stack maintainers, CC'ing the registered maintainer.

Assignee: bugsquad => python
CC: (none) => makowski.mageia, marja11

Comment 4 David Walser 2018-12-26 15:57:18 CET
Advisory:
========================

Updated python-lxml packages fix security vulnerability:

An issue was discovered in lxml before 4.2.5. lxml/html/clean.py in the
lxml.html.clean module does not remove javascript: URLs that use escaping,
allowing a remote attacker to conduct XSS attacks, as demonstrated by
"j a v a s c r i p t:" in Internet Explorer (CVE-2018-19787).

References:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2018-19787
https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/3RVMDZTRGFNPQRD6MD74QL2A5IOBPFXQ/
========================

Updated packages in core/updates_testing:
========================
python2-lxml-4.2.5-1.mga6
python3-lxml-4.2.5-1.mga6
python-lxml-docs-4.2.5-1.mga6

from python-lxml-4.2.5-1.mga6.src.rpm

Assignee: python => qa-bugs

Comment 5 Lewis Smith 2018-12-28 11:19:47 CET
Pointers:
- From CVE refs, the only useful one is
  https://lists.debian.org/debian-lts-announce/2018/12/msg00001.html
  "LXML did not remove "javascript:" URLs that used escaping such as
  "j a v a s c r i p t". This is a similar issue to CVE-2014-3146."
- Of the bug references, only #13326 is relevant (for the CVE above).
- https://bugs.mageia.org/show_bug.cgi?id=13326#c9
Claire again! She gives an example for the old CVE we should be able to adapt here.
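
For illustration only, here is a minimal sketch of the kind of check the CVE text describes: strip characters that some browsers ignore inside a URL scheme (the CVE names Internet Explorer) before testing for "javascript:". This is not lxml's code; the helper name looks_like_script_url and the character range 0x00-0x20 are assumptions made for the example.

```python
import re

# Assumption for this sketch: treat ASCII control characters and the space
# (code points 0x00-0x20) as ignorable inside a URL scheme.
_IGNORABLE = re.compile(r"[\x00-\x20]+")


def looks_like_script_url(href):
    """Return True if href is a javascript: URL once ignorable characters are removed.

    Hypothetical helper, not part of lxml.
    """
    normalized = _IGNORABLE.sub("", href).lower()
    return normalized.startswith("javascript:")


samples = [
    "javascript:alert(0)",
    "j a v a s c r i p t:alert(1)",
    "javas\x01cript:alert(1)",
    "http://example.org/",
]
for href in samples:
    print(repr(href), looks_like_script_url(href))
```

A cleaner that compared the raw attribute value against "javascript:" would miss the second and third samples, which is the class of bypass the CVE describes.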

No time to continue just now.

CC: (none) => lewyssmith

Comment 6 Herman Viaene 2018-12-29 12:11:21 CET
MGA6-32 MATE on IBM Thinkpad R50e
No installation issues
Followed lead of older bug as mentioned above, copying literally:
$ python
Python 2.7.15 (default, May  1 2018, 17:07:49) 
[GCC 5.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html.clean import clean_html
>>> 
>>> html = '''\
... <html>
... <body>
... <a href="javascript:alert(0)">
... aaa</a>
... <a href="javas\x01cript:alert(1)">bbb</a>
... <a href="javas\x02cript:alert(1)">bbb</a>
... <a href="javas\x03cript:alert(1)">bbb</a>
... <a href="javas\x04cript:alert(1)">bbb</a>
... <a href="javas\x05cript:alert(1)">bbb</a>
... <a href="javas\x06cript:alert(1)">bbb</a>
... <a href="javas\x07cript:alert(1)">bbb</a>
... <a href="javas\x08cript:alert(1)">bbb</a>
... <a href="javas\x09cript:alert(1)">bbb</a>
... </body>
... </html>'''
>>> 
>>> print clean_html(html)
<div>
<body>
<a href="">
aaa</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
<a href="">bbb</a>
</body>
</div>
>>> quit()
So at least this works as before.
Tried the same for python3, following Claire's note on the print command (not copying all the previous html commands here, they are exactly the same), but at the end:
>>> 'print (clean_html(html))'
'print (clean_html(html))'
>>> 
and nothing is displayed.
Waiting for Lewis' comments.

CC: (none) => herman.viaene

Comment 7 Lewis Smith 2018-12-30 21:53:04 CET
Testing M6 x64

Somehow a pkg name mutates from 'python-lxml' to 'python2-lxml'.

BEFORE update: python-lxml-3.8.0-1.1.mga6, python3-lxml-3.8.0-1.1.mga6
? $ rpm -q python2-lxml
package python2-lxml is not installed

Trying Claire's script slightly tweaked re c5 example:-
 $ python
Python 2.7.15 (default, May  1 2018, 17:08:05) 
[GCC 5.4.0] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html.clean import clean_html
>>> html = '''\
... <html>
... <body>
... <a href="javascript:alert(0)">
... aaa</a>
... <a href="j a v a s c r i p t:alert(1)">bbb</a>
... <a href="j a v a s c r i p t:alert(1)">ccc</a>
... <a href="j a v a s c r i p t:alert(1)">ddd</a>
... </body>
... </html>'''
>>> print clean_html(html)
<div>
<body>
<a href="">
aaa</a>
<a href="">bbb</a>
<a href="">ccc</a>
<a href="">ddd</a>
</body>
</div>
 Alas - it gets properly cleaned. Try again:
>>> html = '''\
... <html>
... <body>
... <a href="javascript:alert(0)">
... aaa</a>
... <a href="j a v a \01s c r i p t:alert(1)">bbb</a>
... <a href="j a v a \02s c r i p t:alert(1)">ccc</a>
... <a href="j a v a \03s c r i p t:alert(1)">ddd</a>
... </body>
... </html>'''
>>> print clean_html(html)
<div>
<body>
<a href="">
aaa</a>
<a href="">bbb</a>
<a href="">ccc</a>
<a href="">ddd</a>
</body>
</div>
 All correct.
 $ python3
Python 3.5.3 (default, May 23 2018, 14:20:56) 
[GCC 5.4.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from lxml.html.clean import clean_html
>>> html = '''\
... <html>
... <body>
... <a href="javascript:alert(0)">aaa</a>
... <a href="j a v a \01s c r i p t:alert(1)">bbb</a>
... <a href="j a v a \02s c r i p t:alert(1)">ccc</a>
... <a href="j a v a \03s c r i p t:alert(1)">ddd</a>
... </body>
... </html>'''
>>> print (clean_html(html))
<div>
<body>
<a href="">aaa</a>
<a href="">bbb</a>
<a href="">ccc</a>
<a href="">ddd</a>
</body>
</div>
 which again is correct. So I cannot reproduce the example fault.
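
A side note on the test input, as a small sketch: in a normal (non-raw) Python string literal, "\01" is an octal escape, so the hrefs above contained the control character U+0001 rather than a backslash followed by digits. The variable names are only for illustration.

```python
# "\01" in a non-raw string literal is an octal escape for U+0001.
octal = "j a v a \01s"
hexed = "j a v a \x01s"

print(octal == hexed)   # both literals produce the same string
print(repr(octal))      # repr shows the control character as \x01

# A raw literal keeps the backslash and digits as three separate characters.
print(len(r"\01"), len("\01"))
```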
----------------------------------------------------------------
AFTER update: The update list now shows python2-lxml.
- python2-lxml-4.2.5-1.mga6.x86_64
- python3-lxml-4.2.5-1.mga6.x86_64
? $ rpm -q python-lxml
package python-lxml is not installed

BTAIM Results were identical (& correct) to before the update for Python[2] & Python3.
-----------------
@ Herman
> but at the end:
> >>> 'print (clean_html(html))'
> 'print (clean_html(html))'
> >>> 
> and nothing is displayed.
Doubtless the bounding quotes altered things.
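
To spell that out with a small sketch: a quoted line is just a string literal, which the interactive prompt echoes back without calling anything, while the unquoted form actually calls print(). fake_clean_html is a hypothetical stand-in for clean_html so the snippet runs without lxml installed.

```python
import io
from contextlib import redirect_stdout


def fake_clean_html(html):
    # Hypothetical stand-in for lxml.html.clean.clean_html.
    return "<div>cleaned</div>"


html = "<html><body>x</body></html>"

# Quoted: this is only a string; the function is never called.
quoted = 'print (fake_clean_html(html))'

# Unquoted: print() really runs; capture what it writes to stdout.
buffer = io.StringIO()
with redirect_stdout(buffer):
    print(fake_clean_html(html))
printed = buffer.getvalue()

print(repr(quoted))
print(repr(printed))
```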

We both agree that the updated pkgs behave as they should, even if they did so already. So OKs all round, validation, advisory from c4.

Keywords: (none) => advisory, validated_update
Whiteboard: (none) => MGA6-32-OK MGA6-64-OK
CC: (none) => sysadmin-bugs

Comment 8 Mageia Robot 2018-12-31 23:43:15 CET
An update for this issue has been pushed to the Mageia Updates repository.

https://advisories.mageia.org/MGASA-2018-0497.html

Status: NEW => RESOLVED
Resolution: (none) => FIXED