SEO practice (1): preparing data before carrying out SEO


SEMWATCH has not been updated for a long time. Whatever its traffic, it is a non-profit group blog, and it is enough if it gives the people who really need them a few practical, useful articles. As one of its editors, I think it is necessary to carry that spirit on with my own modest efforts.

I don't think log analysis lends itself to a fixed routine. The logs are the original (raw) data source, so the dimensions we can choose to analyze are almost unlimited. The analysis and processing should therefore be driven by actual needs.
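As a rough illustration of choosing dimensions on demand, the sketch below counts already-parsed log records by whichever fields a question calls for. The record layout, the field names (`path`, `status`, `agent`) and the `count_by` helper are assumptions made for this example, not something taken from a particular tool.

```python
from collections import Counter

def count_by(records, *dimensions):
    """Count records grouped by one or more field names."""
    counts = Counter()
    for rec in records:
        # The key is a tuple so that any number of dimensions can be combined.
        key = tuple(rec.get(d) for d in dimensions)
        counts[key] += 1
    return counts

# Hypothetical, already-parsed records; values are placeholders.
records = [
    {"path": "/", "status": 200, "agent": "ExampleBot"},
    {"path": "/", "status": 200, "agent": "Browser"},
    {"path": "/old", "status": 404, "agent": "ExampleBot"},
    {"path": "/old", "status": 404, "agent": "ExampleBot"},
]

by_status = count_by(records, "status")
by_agent_and_status = count_by(records, "agent", "status")
```

The same function answers "how many 404s are there?" and "which client hits the most 404s?" without any change to the data; only the requested dimensions differ.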

For log analysis needs that are not particularly demanding, you can try an existing log analysis system. Although I personally do not care for tools built around a graphical interface, it does offer some very good ideas about which dimensions to analyze.

Web server logs

When we start SEO work, the first thing to do is to make sure that everything we do is backed by data rather than intuition. SEO data comes mainly from two sources: web server logs and third-party traffic analysis tools.

The built-in Combined log format commonly used with Apache, Nginx and other servers satisfies most SEO analysis requirements. A record looks like this:

111.111.111.111 - - [20/Feb/2012:18:09:25 +0800] "GET / HTTP/1.1" 200 3121 "http://semwatch.org/" "Mozilla/5.0 (compatible; bot/2.1; +http://…/bot.html)"

The information that must be recorded is the client IP, the access time, the requested page, the HTTP response status code, the referrer and the client identification (user agent). All of it is present in the Combined log format.

If the server log also has to meet the analysis needs of other departments, make sure that at least the items above are still recorded. But do not record every available field. Select only what is actually needed, otherwise the log volume becomes very large and analysis becomes less efficient. These points may need to be settled by talking with the operations team.
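A minimal sketch of pulling those fields out of one Combined-format line with a regular expression is shown here. The names `COMBINED_RE` and `parse_line` are made up for this example, and the sample line uses placeholder values (`example.com`, `ExampleBot`) rather than real log data. Real logs may contain malformed lines, so the function returns `None` instead of raising.

```python
import re
from datetime import datetime

# Combined format: host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED_RE = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one Combined-format line, or None if it does not match."""
    m = COMBINED_RE.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    rec["status"] = int(rec["status"])
    # A "-" in the size field means no body was sent.
    rec["size"] = 0 if rec["size"] == "-" else int(rec["size"])
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    # The request is normally "METHOD PATH PROTOCOL"; stay defensive if it is not.
    parts = rec["request"].split()
    if len(parts) >= 2:
        rec["method"], rec["path"] = parts[0], parts[1]
    else:
        rec["method"], rec["path"] = None, None
    return rec

# Hypothetical sample line; host names and bot name are placeholders.
SAMPLE = (
    '111.111.111.111 - - [20/Feb/2012:18:09:25 +0800] "GET / HTTP/1.1" '
    '200 3121 "http://example.com/" '
    '"Mozilla/5.0 (compatible; ExampleBot/1.0; +http://example.com/bot.html)"'
)

record = parse_line(SAMPLE)
```

Each parsed record then carries the IP, time, page, status code, referrer and user agent as separate fields, which is the form any later analysis needs.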

I have heard that a large travel website uses MongoDB combined with Map/Reduce for log analysis. I have also used MongoDB to implement an important part of the functionality of the Light Years log analysis system.
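To give a feel for the Map/Reduce idea, here is a plain-Python sketch of the pattern. It does not use MongoDB or its API. The function names (`map_record`, `reduce_values`, `map_reduce`) and the sample records are assumptions made for illustration only.

```python
from collections import defaultdict

def map_record(rec):
    """Map step: emit (key, value) pairs for one log record."""
    yield (rec["path"], rec["status"]), 1

def reduce_values(key, values):
    """Reduce step: combine all values emitted for one key."""
    return sum(values)

def map_reduce(records, mapper, reducer):
    """Group the mapper output by key, then reduce each group."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}

# Hypothetical parsed records; values are placeholders.
log_records = [
    {"path": "/", "status": 200},
    {"path": "/", "status": 200},
    {"path": "/old", "status": 404},
]

hits = map_reduce(log_records, map_record, reduce_values)
```

Because the map and reduce steps only see one record or one key at a time, the same two functions could in principle be run over logs split across many machines, which is what makes the pattern attractive for large sites.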
