openSUSE has issued an advisory today (March 31): https://lists.opensuse.org/opensuse-updates/2020-03/msg00173.html The issue is fixed upstream in 3.4.5.
Fedora issued an advisory for this on March 27: https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/ZGZSSEJH7RHH3RBUEVWWYT75QU67J7SE/
Status comment: (none) => Fixed upstream in 3.4.5
Done for mga7!
CC: (none) => geiger.david68210
Advisory:
========================

Updated python-nltk package fixes security vulnerability:

NLTK Downloader before 3.4.5 is vulnerable to a directory traversal, allowing attackers to write arbitrary files via a ../ in an NLTK package (ZIP archive) that is mishandled during extraction (CVE-2019-14751).

References:
https://cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2019-14751
https://lists.fedoraproject.org/archives/list/package-announce@lists.fedoraproject.org/thread/ZGZSSEJH7RHH3RBUEVWWYT75QU67J7SE/
========================

Updated packages in core/updates_testing:
========================
python-nltk-3.4.5-1.mga7

from python-nltk-3.4.5-1.mga7.src.rpm
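For anyone testing this, the kind of check that stops a ../ entry from escaping the download directory can be sketched as below. This is only an illustration of the general technique, not the actual upstream fix in 3.4.5; the function name unsafe_members and the sample archive contents are made up for the example.

```python
import io
import os
import zipfile


def unsafe_members(zip_file, dest_dir):
    """Return the archive member names that would land outside dest_dir."""
    dest_root = os.path.realpath(dest_dir)
    bad = []
    with zipfile.ZipFile(zip_file) as archive:
        for name in archive.namelist():
            # Resolve where this member would end up if extracted naively.
            target = os.path.realpath(os.path.join(dest_root, name))
            # Safe targets are dest_root itself or anything beneath it.
            if target != dest_root and not target.startswith(dest_root + os.sep):
                bad.append(name)
    return bad


if __name__ == "__main__":
    # Hypothetical in-memory archive: one normal entry, one traversal entry.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as archive:
        archive.writestr("corpus/readme.txt", "harmless")
        archive.writestr("../../tmp/evil.txt", "escapes the target directory")
    buf.seek(0)
    print(unsafe_members(buf, "nltk_data"))
```

An extractor would refuse the archive (or skip the entries) whenever this list is non-empty.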
Assignee: guillomovitch => qa-bugs
Status comment: Fixed upstream in 3.4.5 => (none)
mga7, x86_64
CVE-2019-14751
https://github.com/mssalvatore/CVE-2019-14751_PoC
Downloaded index.xml and zip-slip.zip. Running this PoC involves web servers, so I am already out of my comfort zone. I cannot figure out what is meant by these two instructions:
a) Place index.xml and zip-slip.zip in a directory where they will be served by a web server.
b) Change the value in the "Server Index" field to point to the index.xml
Running this PoC might give the user some insight into what the package is for. So the question is: which directory is served by the web server (httpd presumably)? Is it /var/www, or should I create /var/nltk or something like that? Further, there is no "Server Index" field in index.xml, so I would guess it must be in a config file somewhere, like /etc/httpd/conf?
CC: (none) => tarazed25
Oops. Answering one of those after running the downloader. That provides a gui with a "Server index" field. That just leaves "where to put index.xml?"
Flying blind here; placed /var/www/ in the server index field after copying index.xml there and setting destination to current directory. Hit download and saw a page of errors concluding with "unknown url type" - surprise, surprise. However the interface continued on a refresh and downloaded a whole load of stuff. No idea where it went though.
Tried the downloader again with a valid URL:
Server Index = file:///var/www/index.xml
nltk.download() returned "showing info file:///var/www/index.xml" but /tmp/evil.txt is a no-show.
zip-slip.zip is in the user's home directory and index.xml contains the pointer url="http://localhost/zip-slip.zip", so it is possible that that url does not mean what I think it means.
Directories served by the web server would be /var/www/html (that's the root, aka http://localhost/) or public_html in your homedir if you have mod_userdir. The configuration for what to use as index files would indeed be in /etc/httpd/conf (httpd.conf is where I'd look first).
Thanks David. Modified the site index to file:///var/www/html/index.xml and tried download again. Activity and no error messages but still no evil file in /tmp. Next attempt was to modify the directory override section of httpd.conf. The description warned about fools rushing in, nevertheless I tried it and apache duly choked and refused to restart. Mended that but apache could not be restarted (error 98, address already in use).
Make sure you don't have another webserver running (nginx, lighttpd, etc).
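A quick way to see whether something is already listening on the web port, without touching any config, is sketched below. It is a minimal sketch assuming the server listens on 127.0.0.1 port 80; the helper name something_listening is made up for the example.

```python
import socket


def something_listening(port, host="127.0.0.1", timeout=1.0):
    """Return True if a TCP connection to host:port is accepted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        probe.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return probe.connect_ex((host, port)) == 0


if __name__ == "__main__":
    # Port 80 is an assumption; adjust if httpd is configured differently.
    print(something_listening(80))
```

If this prints True while httpd is stopped, some other server still holds the port, which would explain error 98 on restart.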
hiawatha was already installed; started that successfully then stopped it and disabled it. Started apache successfully after that. Don't know where to go from here. The package is a standalone and presumably needs to be run in the same way that the PoC is handled. Shall check for documentation but may have to drop this one altogether - one day wasted already.
Found a good tutorial but unfortunately its terms require the source book to be acknowledged in full which amounts to advertising and that is against our charter I imagine. To run some of the examples requires resources which would involve nltk.downloader which I have already failed to use successfully.
Tried out the downloader using this link: file:///var/www/html/index.html and that worked a treat for downloading the all-corpora files.
<lightbulb>
Created /var/www/xml and moved index.xml to the new directory.
# chmod 644 xml/index.xml
Tried the download for the xml file and saw this:
Error downloading u'zipslip' from <http://localhost/zip-slip.zip>: <urlopen error [Errno 111] Connection refused>
</lightbulb>
So, no further forward on the PoC but on track for testing the package after the update. My guess is that the target URL in index.xml needs to be modified.
If it needs to use a webserver, then you should be using http, not file. /var/www is not accessible to apache, only /var/www/html. It sounds like you need to put index.xml there, configure apache to recognize index.xml (and not just index.html) as an index file, and then look at http://localhost/
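To double-check which URLs an index file actually points at, something like the following could be used. It is only a sketch: the sample XML is a made-up stand-in for the PoC's index.xml (the real layout may differ), and the helper names are hypothetical. It simply collects every element carrying a url attribute and flags the ones that are not http or https.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Hypothetical index content; the real index.xml may be laid out differently.
SAMPLE_INDEX = """<nltk_data>
  <packages>
    <package id="zipslip" url="http://localhost/zip-slip.zip"/>
    <package id="local" url="file:///home/user/zip-slip.zip"/>
  </packages>
</nltk_data>
"""


def package_urls(index_xml):
    """Return (id, url) pairs for every element that has a url attribute."""
    root = ET.fromstring(index_xml)
    return [(el.get("id"), el.get("url")) for el in root.iter() if el.get("url")]


def non_http_urls(index_xml):
    """Return the urls whose scheme is neither http nor https."""
    return [
        url
        for _pkg_id, url in package_urls(index_xml)
        if urlparse(url).scheme not in ("http", "https")
    ]


if __name__ == "__main__":
    for pkg_id, url in package_urls(SAMPLE_INDEX):
        print(pkg_id, url)
    print("not served over http:", non_http_urls(SAMPLE_INDEX))
```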
This is what I have been doing: Fiddled about - tried replacing the link in the xml file to url="file:///home/lcl/zip-slip.zip". The file exists but the response from the downloader was "Error with downloaded zip file". That seems rational because the file is corrupt so how can it be used as a PoC? Just don't understand the context. In response to David, comment 14. What you outline is beyond my pay grade actually and as to the outcome there was no mention of the browser, just /tmp. Having already crashed apache with a damaged config file I am very reluctant to try that again without a week or so to familiarize myself with the layout of the file. I am 81 and slower than molasses on a frosty morning.
Looks like the wrong file was downloaded. Starting all over again. :-((
I'm just going off what you said in Comment 4, you said the PoC involves a web server, so that means you need to use an http URL, not a file URL.
Point taken David. Moved from production system to a test partition on another machine. New index.xml file in /var/www/html with correct permissions. Using server index http://localhost/index.xml from now on. That of course causes problems currently so David's suggestions need to be implemented. Starting on /etc/httpd/conf/httpd.conf .......
Mine has this in it:
<IfModule dir_module>
    DirectoryIndex index.html
</IfModule>

So I think you could change it to:
<IfModule dir_module>
    DirectoryIndex index.xml index.html
</IfModule>

and then:
systemctl reload httpd
That was it David. Mille grazie.
Edited /etc/httpd/conf/httpd.conf according to the recipe and copied the zip-slip.zip file there for good measure.
Reloaded apache and restarted the nltk download server.
Server index = http://localhost/index.xml
Download directory = ~/nltk_data
Hit download, which succeeded without errors. zip-slip.zip appeared in ~/nltk_data/misc/ and evil.txt landed in /tmp.
$ cat /tmp/evil.txt
This is an evil file

Updated the natural language toolkit. Deleted /tmp/evil.txt.
$ python
>>> import nltk
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/site-packages/nltk/__init__.py", line 134, in <module>
    from nltk.text import *
  File "/usr/lib/python2.7/site-packages/nltk/text.py", line 26, in <module>
    from nltk.lm import MLE
  File "/usr/lib/python2.7/site-packages/nltk/lm/__init__.py", line 222, in <module>
    from nltk.lm.models import (
  File "/usr/lib/python2.7/site-packages/nltk/lm/models.py", line 12, in <module>
    from nltk.lm.api import LanguageModel, Smoothing
  File "/usr/lib/python2.7/site-packages/nltk/lm/api.py", line 19, in <module>
    from nltk.lm.vocabulary import Vocabulary
  File "/usr/lib/python2.7/site-packages/nltk/lm/vocabulary.py", line 23, in <module>
    from singledispatch import singledispatch
ImportError: No module named singledispatch

Something to investigate here. Note that the upstream PoC specifies python3. There is no indication on this bug that the package is version neutral, but I assumed it was and started off with python2.7, intending to try python3 in the tests. Neutrality does not seem likely for a complex package and, sure enough, python3 cannot find nltk. So we have a problem with the update under python2.7. Do we send it back to the makers?
Yeah that doesn't look good. Is it a regression? If so, leave a feedback marker.
Adding the feedback marker because it does look like a regression.
Keywords: (none) => feedback
If the python version is less than 3.4, python-singledispatch is required, so let's try with python-nltk-3.4.5-1.1.mga7.
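For reference, the usual pattern behind this kind of dependency looks roughly like the sketch below: prefer functools.singledispatch (in the standard library since Python 3.4) and fall back to the singledispatch backport on older interpreters. This is an illustration, not a copy of the nltk source; needs_backport and describe are made-up names.

```python
import sys


def needs_backport(version_info=sys.version_info):
    """Return True when the interpreter predates functools.singledispatch."""
    return tuple(version_info[:2]) < (3, 4)


try:
    from functools import singledispatch
except ImportError:
    # Older interpreters (e.g. 2.7) need the separate backport package.
    from singledispatch import singledispatch


@singledispatch
def describe(value):
    """Fallback used when no more specific handler is registered."""
    return "object"


@describe.register(int)
def _describe_int(value):
    return "int"


if __name__ == "__main__":
    print(needs_backport((2, 7)), needs_backport((3, 8)))
    print(describe(3), describe("text"))
```

On python2.7 the except branch is the one that runs, which is why the package has to require python-singledispatch there.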
Installed the latest python-nltk, which pulled in python-singledispatch.
Tested the PoC again:
$ ls ~/nltk_data/misc
files/
$ python
>>> import nltk
>>> nltk.download( )
>>>
Download gui came up.
Server Index = http://localhost/index.xml
Download Directory = ~/nltk_data
Hit Download -> Finished downloading collection u'all-nltk'
$ ls ~/nltk_data/misc
files/ zipslip.zip
$ ls /tmp/evil.txt
ls: cannot access '/tmp/evil.txt': No such file or directory

Good result. Testing later.
Oh, and thanks David for the prompt response.
Keywords: feedback => (none)
Continuing tests:
Used nltk.downloader( ) to obtain texts from a list of books held on the website https://www.nltk.org/book/. The tutorial is actually chapters from the book, so anybody serious about this package should buy the book. It provides a gentle introduction to python as well. Some of the examples require that the matplotlib package be installed. All the examples I tried with interactive python worked as expected. Giving this the green light.
Whiteboard: (none) => MGA7-64-OK
To paraphrase the remark in comment 26: Anybody serious about using this package and learning how by referencing this tutorial should buy the book. That is only fair.
Len, David, your persistence is noted and greatly appreciated. This is what QA is all about. Validating. Advisory in Comment 3, with the updated package number in Comment 23.
CC: (none) => andrewsfarm, sysadmin-bugs
Keywords: (none) => validated_update
CC: (none) => tmb
Keywords: (none) => advisory
An update for this issue has been pushed to the Mageia Updates repository. https://advisories.mageia.org/MGASA-2020-0160.html
Status: NEW => RESOLVED
Resolution: (none) => FIXED
Summary: python-ntlk new security issue CVE-2019-14751 => python-nltk new security issue CVE-2019-14751