Aug 22, 2023

How I created UltraStar songs scrapper

When I met my friends recently we had a karaoke. Everything was fine until we wanted to sing latest songs. They were not available anywhere. Our main source of songs takes a lot of time to upload new songs, sometimes years. This meant I had to search for another solution.

I came across this reddit post that listed sources for UltraStar songs. After going through each of them, I saw that the german one had a lot of songs, even the latest one from past month. This was perfect! But..

Where is audio?

Turns out it was just a database of lyrics. They didn't include audio or video for any song. At that time I didn't knew how UltraStar works exactly under the hood, so I just looked at the few songs I had on my disk. They were made of:

That gave me the exact knowledge I was looking for. I created an account to download files from database, and downloaded I'm Just Ken. Or tried to, because I had to wait 25 seconds before I could access the site. I quickly opened dev tools to check what was going on.

A little bit of reverse engineering later

They encourage users to upload their own music. Once they upload, they would reduce waiting time. Reduced time wasn't enough for me though. After quick look into dev tools I noticed that after 25 seconds, a POST request is made with some form data. I opened Insomnia, put the data in along with login cookie and bahm! It worked.

The request is made to the same song link, but is using POST method instead of GET. It also includes this form data:

wd = 1

Where to get audio/music?

I've quickly realized that most of the songs had one comment. It was embedded YouTube video to the uploaded song. It was like a miracle. I knew that probably not every song had the link, but after checking 20 of songs, not a single one had missing link. It was good enough for me.

Also, every song had image cover provided. After checking the source, I saved this link:[ID].jpg

Joining everything together

After some time of testing and debugging I had my script ready and working (although in many pieces). Some songs used different format for metadata, some included path to cover image, some did not. I decided to change each song metadata before saving it. It saved me so much trouble early on, because video path was often made of 3 links to different websites and no, they didn't work with UltraStar.

Before the end of the day, the script was polished and ready to use. We have downloaded ~100 songs and discovered a few bugs that were easy to fix.

Things I learned

Regex was always a hard thing to grasp on for me, but I had to use it few times in this project. I've learned how to use lazy quantifier and I can spot where it should go before testing.

I had to recall how to read files and parse them into objects.

The thing that I had most trouble with - YouTube download. It was made to be asynchronous, but I couldn't await it because it was using callbacks to signalize the end. Not a single solution came to my mind. After some reading I realized I can just wrap everything in promise and either reject or resolve it in callbacks.

This project is available for anyone on my GitHub.

Happy singing!