Bug 26403 - python-nltk new security issue CVE-2019-14751
Summary: python-nltk new security issue CVE-2019-14751
Status: RESOLVED FIXED
Alias: None
Product: Mageia
Classification: Unclassified
Component: Security
Version: 7
Hardware: All Linux
Priority: Normal, Severity: normal
Target Milestone: ---
Assignee: QA Team
QA Contact: Sec team
URL:
Whiteboard: MGA7-64-OK
Keywords: advisory, validated_update
Depends on:
Blocks:
 
Reported: 2020-03-31 23:33 CEST by David Walser
Modified: 2022-07-04 21:00 CEST

See Also:
Source RPM: python-nltk-3.0.3-3.mga7.src.rpm
CVE:
Status comment:



Description David Walser 2020-03-31 23:33:35 CEST
openSUSE has issued an advisory today (March 31):
https://lists.opensuse.org/opensuse-updates/2020-03/msg00173.html

The issue is fixed upstream in 3.4.5.
Comment 1 David Walser 2020-04-01 00:08:49 CEST
Fedora has issued an advisory for this on March 27:
https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/ZGZSSEJH7RHH3RBUEVWWYT75QU67J7SE/

Status comment: (none) => Fixed upstream in 3.4.5

Comment 2 David GEIGER 2020-04-01 06:22:30 CEST
Done for mga7!

CC: (none) => geiger.david68210

Comment 3 David Walser 2020-04-01 22:26:08 CEST
Advisory:
========================

Updated python-nltk package fixes security vulnerability:

NLTK Downloader before 3.4.5 is vulnerable to a directory traversal, allowing
attackers to write arbitrary files via a ../ in an NLTK package (ZIP archive)
that is mishandled during extraction (CVE-2019-14751).

References:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14751
https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/ZGZSSEJH7RHH3RBUEVWWYT75QU67J7SE/
========================

Updated packages in core/updates_testing:
========================
python-nltk-3.4.5-1.mga7

from python-nltk-3.4.5-1.mga7.src.rpm
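
The kind of check that blocks a ../ entry can be sketched in Python as follows. This is only an illustration of the idea, not the upstream NLTK patch: the function name unsafe_members, the in-memory demo archive and the nltk_data destination are all assumptions made for the example.

```python
import io
import os
import zipfile


def unsafe_members(zf, dest_dir):
    """Return the names of archive members that would land outside dest_dir."""
    root = os.path.realpath(dest_dir)
    bad = []
    for name in zf.namelist():
        # Resolve where a naive os.path.join() based extraction would write.
        target = os.path.realpath(os.path.join(root, name))
        if target != root and not target.startswith(root + os.sep):
            bad.append(name)
    return bad


def build_demo_zip():
    """Build an in-memory archive with one benign and one traversal entry."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("corpus/readme.txt", "harmless")
        zf.writestr("../../evil.txt", "This is an evil file")
    buf.seek(0)
    return buf


# Hypothetical destination directory; nothing is written to disk here.
demo = zipfile.ZipFile(build_demo_zip())
flagged = unsafe_members(demo, "nltk_data")
print(flagged)
```

An extractor that refuses to unpack when this list is non-empty would not write outside the download directory.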

Assignee: guillomovitch => qa-bugs
Status comment: Fixed upstream in 3.4.5 => (none)

Comment 4 Len Lawrence 2020-04-02 12:28:39 CEST
mga7, x86_64

CVE-2019-14751
https://github.com/mssalvatore/CVE-2019-14751_PoC
Downloaded index.xml and zip-slip.zip.

Running this PoC involves webservers so I am already out of my comfort zone.  Cannot figure out what is meant by these two instructions:

a) Place index.xml and zip-slip.zip in a directory where they will be served by a web server.

b) Change the value in the "Server Index" field to point to the index.xml

Running this PoC might give the user some insight into what the package is for.

So the question is "what would the directory be that is served by a webserver, (httpd presumably) so what - /var/www or create /var/nltk or something like that?"
And further, there is no "Server Index" field in index.xml so I would guess it must be in a config file somewhere, like /etc/httpd/conf?

CC: (none) => tarazed25

Comment 5 Len Lawrence 2020-04-02 12:39:05 CEST
Oops.  Answering one of those after running the downloader.  That provides a gui with a "Server index" field.  That just leaves "where to put index.xml?"
Comment 6 Len Lawrence 2020-04-02 13:04:39 CEST
Flying blind here; placed /var/www/ in the server index field after copying index.xml there and setting destination to current directory.  Hit download and saw a page of errors concluding with "unknown url type" - surprise, surprise.
However the interface continued on a refresh and downloaded a whole load of stuff.  No idea where it went though.
Comment 7 Len Lawrence 2020-04-02 13:52:14 CEST
Tried the downloader again with a valid URL:
Server Index = file:///var/www/index.xml

nltk.download() returned 
"showing info file:///var/www/index.xml"

but /tmp/evil.txt is a no-show.
zip-slip.zip is in the user's home directory and index.xml contains the pointer
url="http://localhost/zip-slip.zip"
so it is possible that that url does not mean what I think it means.
Comment 8 David Walser 2020-04-02 14:09:40 CEST
Directories served by the web server would be /var/www/html (that's the root, aka http://localhost/) or public_html in your homedir if you have mod_userdir.  The configuration for what to use as index files would indeed be in /etc/httpd/conf (httpd.conf is where I'd look first).
Comment 9 Len Lawrence 2020-04-02 17:33:17 CEST
Thanks David.  Modified the site index to file:///var/www/html/index.xml and tried download again.  Activity and no error messages but still no evil file in /tmp.
Next attempt was to modify the directory override section of httpd.conf.  The description warned about fools rushing in, nevertheless I tried it and apache duly choked and refused to restart.  Mended that but apache could not be restarted (error 98, address already in use).
Comment 10 David Walser 2020-04-02 17:48:18 CEST
Make sure you don't have another webserver running (nginx, lighttpd, etc).
Comment 11 Len Lawrence 2020-04-02 18:03:21 CEST
hiawatha was already installed; started that successfully then stopped it and disabled it.
Started apache successfully after that.
Don't know where to go from here.  The package is a standalone and presumably needs to be run in the same way that the PoC is handled.  Shall check for documentation but may have to drop this one altogether - one day wasted already.
Comment 12 Len Lawrence 2020-04-02 18:16:15 CEST
Found a good tutorial but unfortunately its terms require the source book to be acknowledged in full which amounts to advertising and that is against our charter I imagine.  To run some of the examples requires resources which would involve nltk.downloader which I have already failed to use successfully.
Comment 13 Len Lawrence 2020-04-02 19:30:20 CEST
Tried out the downloader using this link:
file:///var/www/html/index.html
and that worked a treat for downloading the all-corpora files.
<lightbulb>
Created /var/www/xml and moved index.xml to the new directory.
# chmod 644 xml/index.xml
Tried the download for the xml file and saw this:
Error downloading u'zipslip' from <http://localhost/zip-slip.zip>:
      <urlopen error [Errno 111] Connection refused>
</lightbulb>
So, no further forward on the PoC but on track for testing the package after the update.  My guess is that the target URL in index.xml needs to be modified.
Comment 14 David Walser 2020-04-02 19:35:10 CEST
If it needs to use a webserver, then you should be using http, not file.  /var/www is not accessible to apache, only /var/www/html.  It sounds like you need to put index.xml there, configure apache to recognize index.xml (and not just index.html) as an index file, and then look at http://localhost/
Comment 15 Len Lawrence 2020-04-02 19:58:08 CEST
This is what I have been doing:
Fiddled about - tried replacing the link in the xml file with url="file:///home/lcl/zip-slip.zip".  The file exists but the response from the downloader was 
"Error with downloaded zip file".  That seems rational because the file is corrupt so how can it be used as a PoC?  Just don't understand the context.

In response to David, comment 14.  What you outline is beyond my pay grade actually and as to the outcome there was no mention of the browser, just /tmp. 

Having already crashed apache with a damaged config file I am very reluctant to try that again without a week or so to familiarize myself with the layout of the file.  I am 81 and slower than molasses on a frosty morning.
Comment 16 Len Lawrence 2020-04-02 20:15:04 CEST
Looks like the wrong file was downloaded.  Starting all over again.  :-((
Comment 17 David Walser 2020-04-02 20:25:24 CEST
I'm just going off what you said in Comment 4, you said the PoC involves a web server, so that means you need to use an http URL, not a file URL.
Comment 18 Len Lawrence 2020-04-02 20:30:11 CEST
Point taken David.

Moved from production system to a test partition on another machine.
New index.xml file in /var/www/html with correct permissions.
Using server index http://localhost/index.xml from now on.  That of course causes problems currently so David's suggestions need to be implemented.
Starting on /etc/httpd/conf/httpd.conf .......
Comment 19 David Walser 2020-04-02 20:42:44 CEST
Mine has this in it:
<IfModule dir_module>
    DirectoryIndex index.html
</IfModule>

So I think you could change it to:
<IfModule dir_module>
    DirectoryIndex index.xml index.html
</IfModule>

and then: systemctl reload httpd
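
Before starting the downloader it may help to confirm that the web server really hands out both PoC files. The sketch below is one way to do that in Python 3; the helper name is_served is made up for the example, and the two URLs assume the files were copied to the Apache document root as described in Comment 8.

```python
import urllib.error
import urllib.request


def is_served(url, timeout=3.0):
    """Return True if an HTTP GET of url answers with status 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return getattr(resp, "status", None) == 200
    except (urllib.error.URLError, OSError, ValueError):
        # URLError/OSError: connection refused, 404 and similar failures.
        # ValueError: not a URL at all, e.g. a bare path ("unknown url type").
        return False


if __name__ == "__main__":
    # Assumed locations: both files sitting in /var/www/html.
    for name in ("index.xml", "zip-slip.zip"):
        url = "http://localhost/" + name
        print(url, "OK" if is_served(url) else "NOT reachable")
```

A bare path such as /var/www/ is rejected by urlopen as an unknown url type, so is_served returns False for it.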
Comment 20 Len Lawrence 2020-04-02 22:25:54 CEST
That was it David.  Mille grazie.
Edited /etc/httpd/conf/httpd.conf according to the recipe and copied the zip-slip.zip file there for good measure.

Reloaded apache and restarted the nltk download server.

Server index = http://localhost/index.xml
Download directory = ~/nltk_data

Hit download, which succeeded without errors.  zip-slip.zip appeared in ~/nltk_data/misc/ and evil.txt landed in /tmp.
$ cat /tmp/evil.txt
This is an evil file

Updated the natural language toolkit.
Deleted /tmp/evil.txt.
$ python
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/nltk/__init__.py", line 134, in <module>
    from nltk.text import *
  File "/usr/lib/python2.7/site-packages/nltk/text.py", line 26, in <module>
    from nltk.lm import MLE
  File "/usr/lib/python2.7/site-packages/nltk/lm/__init__.py", line 222, in <module>
    from nltk.lm.models import (
  File "/usr/lib/python2.7/site-packages/nltk/lm/models.py", line 12, in <module>
    from nltk.lm.api import LanguageModel, Smoothing
  File "/usr/lib/python2.7/site-packages/nltk/lm/api.py", line 19, in <module>
    from nltk.lm.vocabulary import Vocabulary
  File "/usr/lib/python2.7/site-packages/nltk/lm/vocabulary.py", line 23, in <module>
    from singledispatch import singledispatch
ImportError: No module named singledispatch

Something to investigate here.  Note that the upstream PoC specifies python3.  There is no indication on this bug that the package is version neutral, but I assumed it was and started off with python2.7, intending to try python3 in the tests.  Neutrality does not seem likely for a complex package and, sure enough, python3 cannot find nltk.  So we have a problem with the update under python2.7.  Do we send it back to the makers?
Comment 21 David Walser 2020-04-02 22:36:02 CEST
Yeah that doesn't look good.  Is it a regression?  If so, leave a feedback marker.
Comment 22 Len Lawrence 2020-04-03 00:55:21 CEST
Adding the feedback marker because it does look like a regression.

Keywords: (none) => feedback

Comment 23 David GEIGER 2020-04-03 06:43:34 CEST
If python version is less than 3.4 python-singledispatch is required, so let's try with python-nltk-3.4.5-1.1.mga7.
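
For reference, the usual way for code to cope with this dependency is a guarded import: functools.singledispatch exists from Python 3.4 onwards, and the separate singledispatch backport is needed on older interpreters. The sketch below shows the generic pattern only; whether nltk itself is written this way is not something this report confirms, and describe is a made-up example function.

```python
try:
    from functools import singledispatch  # Python 3.4 and later
except ImportError:
    # Older interpreters need the separate backport package.
    from singledispatch import singledispatch


@singledispatch
def describe(value):
    """Fallback used when no more specific type is registered."""
    return "something else"


@describe.register(int)
def _describe_int(value):
    return "an integer"


@describe.register(str)
def _describe_str(value):
    return "a string"


print(describe(3), "|", describe("nltk"), "|", describe(2.5))
```

On python2.7 the second import only succeeds when the backport is installed, which is what the added package requirement takes care of.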
Comment 24 Len Lawrence 2020-04-03 10:29:41 CEST
Installed the latest python-nltk, which pulled in python-singledispatch.

Tested the PoC again:

$ ls ~/nltk_data/misc
files/

$ python
>>> import nltk
>>> nltk.download( )
>>>

Download gui came up.
Server Index = http://localhost/index.xml
Download Directory = ~/nltk_data

Hit Download -> Finished downloading collection u'all-nltk'

$ ls ~/nltk_data/misc
files/  zipslip.zip
$ ls /tmp/evil.txt
ls: cannot access '/tmp/evil.txt': No such file or directory

Good result.
Testing later.
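
The two manual ls checks above can be rolled into one small script for re-testing. This is a hedged sketch: the function name check_poc_outcome and the keys of the returned dictionary are invented for the example, and the paths are simply the ones reported in this comment.

```python
import os


def check_poc_outcome(download_dir, marker="/tmp/evil.txt"):
    """Report whether the archive arrived and whether the PoC escaped."""
    archive = os.path.join(os.path.expanduser(download_dir), "misc", "zipslip.zip")
    downloaded = os.path.isfile(archive)
    escaped = os.path.exists(marker)
    return {
        "archive_downloaded": downloaded,
        "marker_written": escaped,
        # A valid pass needs the download to have happened AND no stray file.
        "fixed": downloaded and not escaped,
    }


print(check_poc_outcome("~/nltk_data"))
```

A result with fixed set to True corresponds to the "Good result" above: the archive was fetched but nothing was written to /tmp.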
Comment 25 Len Lawrence 2020-04-03 10:30:22 CEST
Oh, and thanks David for the prompt response.
Len Lawrence 2020-04-03 10:31:41 CEST

Keywords: feedback => (none)

Comment 26 Len Lawrence 2020-04-03 21:32:48 CEST
Continuing tests:
Used nltk.download( ) to obtain texts from a list of books held on the website https://www.nltk.org/book/.
The tutorial is actually chapters from the book so anybody serious about this package should buy the book.  It provides a gentle introduction to python as well.
Some of the examples require that the matplotlib package be installed.

All the examples I tried with interactive python worked as expected.

Giving this the green light.

Whiteboard: (none) => MGA7-64-OK

Comment 27 Len Lawrence 2020-04-03 21:38:05 CEST
To paraphrase the remark in comment 26:
Anybody serious about using this package and learning how by referencing this tutorial should buy the book.  That is only fair.
Comment 28 Thomas Andrews 2020-04-04 17:39:21 CEST
Len, David, your persistence is noted and greatly appreciated. This is what QA is all about.

Validating. Advisory in Comment 3, with the updated package number in Comment 23.

CC: (none) => andrewsfarm, sysadmin-bugs
Keywords: (none) => validated_update

Thomas Backlund 2020-04-05 18:33:20 CEST

CC: (none) => tmb
Keywords: (none) => advisory

Comment 29 Mageia Robot 2020-04-05 19:08:20 CEST
An update for this issue has been pushed to the Mageia Updates repository.

https://advisories.mageia.org/MGASA-2020-0160.html

Status: NEW => RESOLVED
Resolution: (none) => FIXED

David Walser 2022-07-04 21:00:33 CEST

Summary: python-ntlk new security issue CVE-2019-14751 => python-nltk new security issue CVE-2019-14751

