Update README.md

Lucky 2023-11-30 23:50:42 -04:00 committed by GitHub
parent 601ef7d30c
commit 8783d7ae58
GPG Key ID: 4AEE18F83AFDEB23

@@ -1,5 +1,8 @@
# 4 Chan Webscraper, Version 2
**Notice:**
I will be re-uploading all of the files with a date column added. I left the dates out of the CSVs, and I don't know how I overlooked this. Oops. I have 100+ CSVs to edit, but I'm busy with exams right now. Maybe over Christmas break.
Consider doing your own data analysis. If you save your CSV, and make a pull request, I can add it to this repository for plotting word usage changes over time.
**Highlights:**
@@ -13,6 +16,7 @@ Consider doing your own data analysis. If you save your CSV, and make a pull req
- Differs from V1 by scraping all replies to OP, and has a much larger noise filter.
- Sentiment analysis is also performed.
- **X number of "posts by this ID" with graphical representation.**
- Working on a "Word by Time Series" plot of word usage over time. I'll need to remove useless words to home in on the important (i.e., relevant) ones. For example, "woman" isn't a relevant word; it's just noise.
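
The "Word by Time Series" idea above can be sketched roughly as follows. This is a minimal illustration, not the scraper's actual code: `NOISE_WORDS`, `word_counts_by_date`, and the `(date, text)` row layout are all assumptions made for the example, and the real noise filter is much larger.

```python
import re
from collections import Counter, defaultdict

# Hypothetical noise list for illustration only; the project's real filter is far larger.
NOISE_WORDS = {"the", "a", "is", "and", "woman"}

def word_counts_by_date(rows, noise_words=NOISE_WORDS):
    """Count non-noise words per date.

    rows: iterable of (date_string, comment_text) pairs (assumed layout).
    Returns a dict mapping each date to a Counter of word frequencies.
    """
    counts = defaultdict(Counter)
    for date, text in rows:
        # Lowercase the text and split it into simple alphabetic tokens.
        words = re.findall(r"[a-z']+", text.lower())
        counts[date].update(w for w in words if w not in noise_words)
    return counts

# Made-up sample rows, one per scrape date.
rows = [
    ("2023-11-29", "The market is down"),
    ("2023-11-30", "The market is up and the market is loud"),
]
series = word_counts_by_date(rows)
print(series["2023-11-30"]["market"])
```

Plotting one word's count against the sorted dates would then give its usage over time, which is why each CSV needs a date column.
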
**Limitations**
- 4chan recycles poster IDs, so they are not unique identifiers. As a result, the data mining on n-pbtid isn't fully accurate at the upper bound, but it should be closer to representative at the lower bound.
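
To make the n-pbtid ("number of posts by this ID") count concrete, here is a small sketch. The function names and the sample IDs are hypothetical, and it assumes the IDs passed in all come from a single thread. Because IDs can be recycled, the high counts may merge more than one person.

```python
from collections import Counter

def posts_per_id(poster_ids):
    """Map each poster ID to the number of posts it made."""
    return Counter(poster_ids)

def id_count_distribution(poster_ids):
    """Map n to the number of IDs that made exactly n posts."""
    return Counter(posts_per_id(poster_ids).values())

# Made-up IDs, assumed to come from a single thread.
ids = ["aB3dE9xZ", "Q1w2E3r4", "aB3dE9xZ", "zzTop001", "aB3dE9xZ"]
dist = id_count_distribution(ids)
for n in sorted(dist):
    print(f"{dist[n]} ID(s) with {n} post(s)")
```
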