Saturn Data

Saturn Data, Information Technology Company, 9407 NE VANCOUVER MALL Drive, STE 104 #890, Vancouver, WA.

Saturn Data is a leading provider of quality alternative data products and services, drawn from public web and mobile sources, that offer unique and timely insights into investment opportunities.

Blink Charging stations
09/11/2022

Data specification of charging stations at Blink

Job postings at Tesla and DocuSign
09/11/2022

Data specification of tech jobs at Tesla, DocuSign

09/11/2022

The Internet is becoming more and more important in our day-to-day lives. The amount of data produced by the Internet has increased from 0.1 zettabytes in 2013 to 4.4 zettabytes in 2020. Scraping, storing, and processing data at this volume has become a challenge.
Saturn Data built a scalable and cost-effective solution to handle these challenges:
- Scalable to scrape 1000+ mobile apps
- Reasonable crawling speed
- Store 10+ Petabytes of data
- Ensure data quality and adapt to mobile app changes
- Query 10+ Petabytes of data

The specs of scraping
The specs consist of what to scrape and how often to scrape it. Clients have very different requirements, and the Spec service and the Scheduler are our solutions for meeting them. We convert each client's requirements into structured data. The Scheduler starts jobs based on the configured cadence. The Spec service retrieves the structured specs and returns them to the Fetchers.
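As a minimal sketch of what "structured specs" and a cadence check could look like, the following assumes a hypothetical `ScrapeSpec` record and an `is_due` helper. Neither name comes from Saturn Data's system; they only illustrate the idea of turning a requirement into data that a scheduler can act on.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class ScrapeSpec:
    """Hypothetical structured form of one client requirement."""
    app: str             # which mobile app to scrape
    fields: tuple        # what to extract
    cadence: timedelta   # how often to scrape


def is_due(spec: ScrapeSpec, last_run: Optional[datetime], now: datetime) -> bool:
    """Scheduler check: a job is due if it never ran or its cadence has elapsed."""
    return last_run is None or now - last_run >= spec.cadence


spec = ScrapeSpec(app="example-app", fields=("price", "stock"), cadence=timedelta(days=1))
now = datetime(2022, 9, 11, 12, 0)
print(is_due(spec, None, now))                         # job has never run
print(is_due(spec, datetime(2022, 9, 11, 6, 0), now))  # job ran 6 hours ago
```

Keeping the spec as plain data means the same record can drive both the scheduling decision and what a fetcher extracts.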

Scraping services
The Scheduler kicks off the Seed Fetch service, which gets the seed URLs for a mobile app. All the URLs are sent to a message queue for asynchronous processing. The Fetcher/Renderer reads the URLs and parses out the desired information from iOS or Android based on the specs, then stores the docs in object storage (S3) and the desired information in a database. The Fetcher also calls the Doc Dedupe service to eliminate duplicate docs and sends next-level unprocessed docs to the URL extractor. The URL extractor extracts next-level URLs and saves them in the database. The Seed Fetch service reads the next-level URLs and repeats the above process.
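The flow above can be sketched in a single process, with in-memory stand-ins for each component. Everything here is an assumption for illustration: `FAKE_APP` replaces the real mobile app, a `deque` replaces the message queue, a set of content hashes replaces the Doc Dedupe service, and a dict replaces object storage. Skipping URL extraction for duplicate docs is also a simplification of this sketch.

```python
import hashlib
from collections import deque

# Hypothetical stand-in for an app's content: URL -> (document body, next-level URLs).
FAKE_APP = {
    "app://home": ("home page", ["app://store/1", "app://store/2"]),
    "app://store/1": ("store listing", ["app://item/9"]),
    "app://store/2": ("store listing", []),  # same body as store/1: a duplicate doc
    "app://item/9": ("item detail", []),
}


def crawl(seeds):
    queue = deque(seeds)   # stands in for the message queue
    seen_urls = set(seeds)
    seen_hashes = set()    # stands in for the Doc Dedupe service
    stored = {}            # stands in for object storage
    while queue:
        url = queue.popleft()
        body, links = FAKE_APP[url]  # Fetcher/Renderer step
        digest = hashlib.sha256(body.encode()).hexdigest()
        if digest in seen_hashes:
            continue                 # duplicate doc: skip storage and extraction
        seen_hashes.add(digest)
        stored[url] = body
        for link in links:           # URL extractor step
            if link not in seen_urls:
                seen_urls.add(link)
                queue.append(link)
    return stored


result = crawl(["app://home"])
print(sorted(result))
```

In this run the second store page is dropped as a duplicate, so only three documents are stored.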

Query engine
To allow querying 10+ Petabytes of data, we leverage Apache Spark, a unified engine for large-scale data analytics. Apache Spark has these benefits:
- Up to 100x faster than a relational database such as MySQL
- Ease of use. Spark SQL is used to query the data, and it is familiar to most engineers and scientists
- Cost-effective because of its in-memory data processing

Reporting services
The Validation service validates the results by inspecting the data volume, key columns, data aggregations, machine learning models, and so on. Monitors and alarms are in place to ensure reliability. The data is delivered in many formats, such as CSV. We also offer business intelligence reports that provide insights into the data (data mining).
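Two of the checks named above, data volume and key columns, can be illustrated with a short sketch. The `validate` function, its parameters, and the sample batch are hypothetical and are not taken from Saturn Data's Validation service.

```python
def validate(rows, key_columns, min_rows):
    """Return a list of human-readable problems; an empty list means the batch passed."""
    problems = []
    if len(rows) < min_rows:
        problems.append(f"volume: {len(rows)} rows, expected at least {min_rows}")
    for column in key_columns:
        missing = sum(1 for row in rows if row.get(column) in (None, ""))
        if missing:
            problems.append(f"key column '{column}': {missing} missing value(s)")
    return problems


batch = [
    {"id": "a1", "price": 19.99},
    {"id": "a2", "price": None},
    {"id": "", "price": 5.00},
]
for problem in validate(batch, key_columns=("id", "price"), min_rows=5):
    print(problem)
```

Returning a list of problems rather than raising on the first one lets a monitor report every failed check for a batch at once.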

Scalability
Saturn Data collects data at 500+ QPS from mobile apps across the world, so handling mobile requests at large scale is key to our business. We optimized the Fetcher/Renderer in these ways:
- Event-driven microservice. It can handle URL requests at any scale.
- Provisioning a machine is fast. The environment and code are pre-configured in a container image.
- When the machine is idle, no cost is incurred, so it is cost-effective.

Availability
Saturn Data's services are designed for 99.9% availability across multiple data regions. Here is how we achieve it:
- Built a custom Http Manager that sends requests to a mobile app at a pre-computed rate so that the target websites are not overburdened. The Http Manager retries failed requests to increase the availability of our services.
- Restore the scraping from the previously stored state at any time. If the target website is down or not available, our service stores the state in the database.
- Data replication and fault tolerance. Data is replicated with eventual consistency.
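To make the first point concrete, here is a minimal sketch of a client that spaces out requests at a fixed rate and retries failures. It is not the Http Manager itself: `RateLimitedClient`, its parameters, and the `flaky_send` fake transport are all hypothetical names chosen for this example.

```python
import time


class RateLimitedClient:
    """Hypothetical sketch: spaces out calls and retries failures."""

    def __init__(self, send, max_per_second, max_retries=3,
                 sleep=time.sleep, clock=time.monotonic):
        self._send = send
        self._min_interval = 1.0 / max_per_second
        self._max_retries = max_retries
        self._sleep = sleep
        self._clock = clock
        self._next_allowed = 0.0

    def request(self, url):
        last_error = None
        for _ in range(self._max_retries + 1):
            wait = self._next_allowed - self._clock()
            if wait > 0:
                self._sleep(wait)       # hold back to respect the rate
            self._next_allowed = self._clock() + self._min_interval
            try:
                return self._send(url)
            except ConnectionError as error:
                last_error = error      # retry on the next loop iteration
        raise last_error


calls = []


def flaky_send(url):
    """Fake transport: fails twice, then succeeds."""
    calls.append(url)
    if len(calls) < 3:
        raise ConnectionError("temporary failure")
    return f"ok:{url}"


client = RateLimitedClient(flaky_send, max_per_second=100)
print(client.request("app://store/1"))
```

Because retries go through the same pacing logic as first attempts, a burst of failures does not raise the request rate seen by the target.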

Conclusion
Scraping data from mobile apps in a scalable and cost-effective manner is very challenging and requires sophisticated infrastructure.

Saturn Data makes mobile app scraping simple and accessible to everyone. The price starts from $9.99. Contact us today!

Saturn Data built a scalable and cost-effective solution. This post covers the architecture, components, scalability, and availability.

09/10/2022

Alternative data is data published by sources outside of a company that offers unique and timely insights into investment opportunities. The types of alternative data include, but are not limited to, the following:
- Price tracker
- Product sales
- Product reviews
- Geolocation
- Website usage
- Service traffic
- Social media posts
- User activities
- Jobs

Big Data
With the explosive growth of data, it is becoming more and more challenging to handle alternative data with traditional tools such as Excel. That is where Big Data comes into play. Big Data refers to datasets that are too large, or too slow to process, for traditional data processing applications. For example, Carvana lists 50k+ cars, and collecting them daily produces a dataset of about 0.5 GB per month, 6 GB per year, and 30 GB in 5 years. With 100 such sources, that comes to 50 GB of data per month. It is very challenging to handle data at this scale with traditional tools. In the next post, I will talk about how Saturn Data handles big data with our scalable and cost-effective infrastructure.
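The arithmetic behind these figures can be checked in a few lines. The only input is the 0.5 GB per month quoted for a single source; the variable names are illustrative.

```python
monthly_gb_per_source = 0.5  # figure quoted in the text for one source

per_year = monthly_gb_per_source * 12
per_five_years = per_year * 5
monthly_for_100_sources = monthly_gb_per_source * 100

print(per_year, per_five_years, monthly_for_100_sources)
```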

Mobile App Scraping
One of the big data challenges is capturing the data. The traditional way is to scrape data from the web. Since 2017, mobile traffic has taken the lead in internet traffic (56% as of 2022), and companies have therefore put much more effort into mobile app development. A mobile app doesn’t function in the same way as a website, so a web scraping solution has disadvantages:
- A mobile app often has more data than the website. For example, Grab doesn’t show all the products for a specific store on its website, but it does in the mobile app.
- Some startups or small businesses don’t display their services or products on their websites, but they do in the mobile app. This makes web scraping impossible.
- The UI in a mobile app is neater and more organized than a website's, which makes mobile app scraping more efficient.
Although mobile app scraping is the future of data scraping, there are challenges in terms of security and cost. For example, most apps require login, which brings challenges in maintaining user accounts and data compliance.

Legality
Web/Mobile scraping is legal if the data is collected from public websites and mobile apps. In April 2022, the Ninth Circuit reaffirmed that scraping data that is publicly accessible on the internet is not a violation of the Computer Fraud and Abuse Act, or CFAA. This addresses common misconceptions such as “Web scraping is illegal” and “Web scraping is operating in a grey area of law”. As long as scrapers are ethical, the data they collect is lawful and can be used in investments, marketing, economics, academic research, and even a personal price tracker for a product on a wishlist.

Data Compliance
To build an ethical mobile app scraper, we should follow a set of guidelines, known as data compliance. Data compliance is a process that identifies the applicable governance for data collection, transformation, storage, delivery, and other activities, and ensures compliance with the terms and privacy policies of the crawled websites. Data compliance consists of these aspects:
- Review the terms and conditions associated with the crawled websites
- Restrict traffic to crawled websites and reduce potential interference as much as possible
- Do not scrape personal data or intellectual property

Conclusion
Alternative data is very useful in many areas and can help boost your investments and business and make your day-to-day life better. However, acquiring alternative data is very challenging and requires sophisticated technologies and data compliance.
Saturn Data makes mobile app scraping simple and accessible to everyone. The price starts from $9.99. Contact us today!

Address

9407 NE VANCOUVER MALL Drive, STE 104 #890
Vancouver, WA
98662
